Overall ADM Structure
We'll start with a brief introduction to the overall structure of the ADM. The ADM consists of a collection of elements, each of which describes an aspect of the audio. Each one is represented by an XML element that contains various attributes and sub-elements. The elements are connected to each other via references (except audioBlockFormat), as shown in this diagram:
The diagram shows the divide between the content part, the format part and the BW64 file. Together, the content and format parts make up the ADM metadata, which is written in XML and is usually carried within a chunk (the 'axml' chunk) in the BW64 file. The BW64 File part at the bottom contains the 'chna' chunk, which is a look-up table connecting the ADM metadata with the audio tracks in the file.
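To make the idea of chunks a little more concrete, here is a minimal sketch of walking the top-level chunks of a WAVE-family file held in memory and picking out 'axml' and 'chna'. It assumes the usual RIFF layout (a 12-byte header, then chunks made of a 4-byte ID, a 32-bit little-endian size and a payload padded to an even length). The function names `iter_chunks` and `make_chunk` and the demo payloads are made up for illustration, and the sketch ignores the 64-bit sizes that large files carry in a 'ds64' chunk.

```python
import struct

def iter_chunks(data: bytes):
    """Yield (chunk_id, payload) for each top-level chunk in the buffer."""
    # 12-byte header: container ID, 32-bit size, form type.
    container_id, _size, form = struct.unpack_from("<4sI4s", data, 0)
    if container_id not in (b"RIFF", b"RF64", b"BW64") or form != b"WAVE":
        raise ValueError("not a WAVE-family file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id.decode("ascii"), data[pos + 8 : pos + 8 + size]
        # Chunks are padded to an even number of bytes.
        pos += 8 + size + (size & 1)

def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one chunk (hypothetical helper used only for the demo)."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad

# Synthetic file: placeholder payloads, not real ADM metadata.
body = (b"WAVE"
        + make_chunk(b"axml", b"<metadata/>")
        + make_chunk(b"chna", b"\x00" * 4))
demo = b"BW64" + struct.pack("<I", len(body)) + body

chunks = dict(iter_chunks(demo))
```

In a real reader the 'axml' payload would then be handed to an XML parser, and the 'chna' payload decoded into the look-up table.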
The content part describes the technical content of the audio, such as whether it contains dialogue or a particular language, as well as loudness metadata. The format part describes what sort of channels the audio tracks have and how they are grouped together, for example the left and right channels in a stereo pair. The elements in the content part are generally unique to the audio and the programme, whereas the elements in the format part can be reused.
Short Glossary
Audio terminology varies between environments and standards bodies, so to clarify the terms as the ADM uses them, here's a short glossary:
- Track - A sequence of data representing the audio samples stored in a medium. The metadata includes the format of the data (e.g. PCM).
- Stream - One or more tracks that can be combined to make a complete set of one or more audio signals. A stream represents either a channel (when it is carrying one audio signal), or a pack (when it is carrying more than one audio signal). The metadata includes the audio codec used to generate the data in the tracks.
- Channel - (1) A mono sequence of audio samples that may have a particular spatial location (e.g. 'front left'), or other audio characteristics. Typical metadata for a channel includes the position of the sound and the loudspeaker it is intended for. (2) Audio that is intended for a particular loudspeaker (see audio types for more detail about channel-based audio).
- Block - A time-slice of a channel of a particular duration. Metadata in blocks allows channels to vary their properties (such as spatial location) over time.
- Pack - A group of related channels that ought to be kept together (e.g. 'stereo').
- Programme - A complete audio programme that contains everything required for playout. Typical metadata attached to a programme includes its duration and the language it is in. A programme contains one or more contents.
- Content - Part of a programme, for example the dialogue or background music. Typical metadata attached to content includes the language of the dialogue and the type of content (e.g. dialogue or music). Content contains one or more objects.
- Object - (1) A set of tracks of a finite duration with a particular pack and channel configuration. The metadata includes start time and duration. (2) A sound located in a particular location in 3D space (see audio types for more detail about object-based audio).
- Static - Something that does not change over time. For example, a static channel will contain positional metadata fixed to one location.
- Dynamic - Something that does change over time. For example, a dynamic channel will contain positional metadata that describes movement over time.
- HOA - Higher Order Ambisonics, a scene-based representation of audio (see audio types for more detail).
- Binaural - Sound intended to be delivered directly to a pair of ears (usually over headphones) that gives the impression of 3D immersion and externalisation (see audio types for more detail about binaural audio).
- Matrix - Channels that are derived from a combination of other channels via a mathematical matrix operation. For example, Mid and Side channels are a simple matrix of Left and Right channels (see audio types for more detail about matrix audio).
The diagram below helps illustrate how some of these terms relate to each other in the context of an audio file:
Here, the example audio file contains four tracks (2x PCM, 2x coded), which are grouped into three streams (2x PCM, 1x coded). The two PCM streams each contain a channel ("Left" and "Right"), and these channels are part of a "Stereo" pack. The coded stream contains a pack (a "3.0" layout) of three channels. Each of the two packs is the format of an object: one is a "Dialogue-1" object and the other a "Music-1" object. The diagram also shows that the two objects cover different time regions of the tracks and streams. The two objects belong to different contents ("Dialogue" and "Music"), and the "Main" programme contains both of these contents.
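The relationships in this example can also be written down as a small data model. The following is only a sketch in Python, not the ADM's XML representation: the class and field names are invented for illustration, the names of the three channels in the "3.0" pack are placeholders, and the pairing of "Dialogue-1" with the stereo pack (and "Music-1" with the 3.0 pack) is an assumption, since the text does not say which object uses which pack. Object start times and durations are left out for the same reason.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Track:
    number: int
    data_format: str              # "PCM" or "coded"

@dataclass
class Channel:
    name: str

@dataclass
class Pack:
    name: str
    channels: List[Channel]

@dataclass
class Stream:
    tracks: List[Track]
    carries: Union[Channel, Pack]  # one signal -> channel, several -> pack

@dataclass
class AudioObject:
    name: str
    pack: Pack

@dataclass
class Content:
    name: str
    objects: List[AudioObject]

@dataclass
class Programme:
    name: str
    contents: List[Content]

left, right = Channel("Left"), Channel("Right")
stereo = Pack("Stereo", [left, right])
# Placeholder channel names: the example does not name them.
three_oh = Pack("3.0", [Channel("A"), Channel("B"), Channel("C")])

tracks = [Track(1, "PCM"), Track(2, "PCM"), Track(3, "coded"), Track(4, "coded")]
streams = [
    Stream([tracks[0]], left),      # PCM stream carrying one channel
    Stream([tracks[1]], right),     # PCM stream carrying one channel
    Stream(tracks[2:], three_oh),   # coded stream carrying a whole pack
]

# Assumed pairing of objects and packs (see the note above).
main = Programme("Main", [
    Content("Dialogue", [AudioObject("Dialogue-1", stereo)]),
    Content("Music", [AudioObject("Music-1", three_oh)]),
])

def channel_names(programme: Programme) -> List[str]:
    """Walk programme -> content -> object -> pack -> channel."""
    return [ch.name
            for content in programme.contents
            for obj in content.objects
            for ch in obj.pack.channels]
```

Calling `channel_names(main)` walks from the programme down to the channels, which mirrors how the glossary terms nest inside one another.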
Starting Point
Do you read the metadata first to find out what is in the audio, or do you want to check each audio track and find out what the metadata is for it? Well, the ADM allows either of these entry points to be taken. If you want to start with the metadata, then starting at audioProgramme and working down from there is the way to go. If you want to start with the audio, then you work up from the 'chna' look-up table at the bottom.
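The two entry points can be sketched with a toy reference table based on the glossary example above. This is a heavily simplified illustration, not the real ADM structure: the IDs are made-up labels rather than ADM identifiers, the `chna` dictionary is a stand-in that maps a track number straight to a pack, and the track numbering and the pairing of objects with packs are assumptions.

```python
# "Parent -> children" references, using made-up labels (not real ADM IDs).
refs = {
    "programme:Main": ["content:Dialogue", "content:Music"],
    "content:Dialogue": ["object:Dialogue-1"],
    "content:Music": ["object:Music-1"],
    "object:Dialogue-1": ["pack:Stereo"],   # assumed pairing
    "object:Music-1": ["pack:3.0"],         # assumed pairing
}

# Simplified stand-in for the 'chna' table: track number -> pack.
chna = {1: "pack:Stereo", 2: "pack:Stereo", 3: "pack:3.0", 4: "pack:3.0"}

def walk_down(start):
    """Metadata first: list everything reachable from `start`, depth-first."""
    found = [start]
    for child in refs.get(start, []):
        found.extend(walk_down(child))
    return found

def walk_up(track):
    """Audio first: start at a track and follow references in reverse."""
    path = [chna[track]]
    while True:
        parents = [p for p, children in refs.items() if path[-1] in children]
        if not parents:
            return path
        path.append(parents[0])

top_down = walk_down("programme:Main")
bottom_up = walk_up(3)
```

Here `top_down` starts at the programme and ends at the packs, while `bottom_up` starts from track 3 and climbs to the programme, so both routes cover the same references from opposite ends.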
For this tutorial we'll start with the format part at the bottom where we work up from the 'chna' table; so, let's follow a worked example in the next step.