How do audio formats work?

FLAC, WAV, mp3, mp4, mp4pzd79, ... There are dozens of different audio formats, but I've seen confusion before as to what exactly they are and how they work. I figured I'd make this page to clarify some things.

Containers and codecs

A lot of people tend to conflate an audio file's container with its codec, but they're two different concepts, and separating them really helps clear up confusion as to what e.g. .mp4 and .aac are.

Now for the explanation:

  • A codec (well, technically an audio coding format) defines how raw audio data is stored on disk. The codec basically says how to convert what you want to hear into a bunch of bytes, also referred to as a bitstream.
  • A container format is a format that...well, contains the above bitstream. Codecs don't encode info such as music tags; remember, all they handle is the audio data. Anything extra is stored in the container.
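One practical consequence is that a file's first few bytes, not its extension, tell you which container you're dealing with. Here's a minimal sketch of that idea; sniff_container is a name I made up for illustration, and it only checks a handful of well-known signatures, so treat it as a rough guess rather than a real format detector.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container from the first dozen bytes of a file.

    This only looks at the wrapper; it says nothing about which
    codec produced the bitstream inside.
    """
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    if header[:4] == b"fLaC":
        return "FLAC"
    if header[:4] == b"OggS":
        return "OGG"
    if header[4:8] == b"ftyp":
        # MP4-family files start with a box whose type is "ftyp".
        return "MP4"
    if header[:3] == b"ID3":
        # An ID3v2 tag, which is usually followed by MP3 audio.
        return "ID3 tag"
    return "unknown"
```

To use it on a real file, read the first 12 bytes in binary mode and pass them in.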

There are two main types of formats: lossy and lossless (some containers, such as OGG, can hold either kind). Lossy encoding means that audio information deemed irrelevant (such as frequencies that are hard for the human ear to hear or inaudible on most speakers/headphones) is discarded to reduce the file size. Lossless encoding doesn't discard any data, though it can still compress the files to reduce their size.

Bitrate

Bitrate is simply the amount of data used to store each second of audio. For instance, a bitrate of 320 kbps (kbps -> kilobits per second) means that it takes 320 kilobits, or 40 kilobytes, to store one second of audio data.
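Since kbps counts bits rather than bytes, turning a bitrate into a file size means dividing by 8. A quick sketch of the arithmetic (the function name is my own, and it ignores tags and container overhead):

```python
def stream_size_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Approximate size of an audio stream at a constant bitrate."""
    bits = bitrate_kbps * 1000 * seconds  # kilo = 1000 here
    return bits / 8                       # 8 bits per byte


one_second = stream_size_bytes(320, 1)         # 40000.0 bytes
four_minutes = stream_size_bytes(320, 4 * 60)  # 9600000.0 bytes, about 9.6 MB
```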

Now's the interesting part: bitrate means far less than you'd think. For starters, how good a given bitrate sounds depends largely on the codec. Newer codecs can store more audio in less space, meaning that they achieve better quality than e.g. MP3 at a lower bitrate. Even with MP3, however, the "holy grail" bitrate of 320 kbps means very little. The MP3 format is designed to work best with variable bitrate, or VBR, meaning that different parts of the file will have different bitrates. For instance, if a section of the song has a piano solo, the encoder may be able to store 320 kbps-level sound quality in far fewer than 320 kilobits per second. Forcing it to use the full 320 means that you gain nothing except a larger file.

All this is to say: never hardcode a bitrate unless you know what you're doing. Any decent audio encoder should give you the ability to specify the desired audio quality, usually on a scale of 1 through 10 (sometimes 10 is best, sometimes it's the worst; double-check with your encoder first). This allows the encoder to then utilize VBR and choose the best bitrate for each part of the file. This will yield far better results than hardcoding a bitrate.
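As a concrete illustration, here's roughly what asking for a quality level instead of a bitrate looks like. This sketch assumes you have an ffmpeg build with the libmp3lame encoder, where -q:a runs from 0 (best) to 9 (worst); other encoders use different scales. The helper name and file names are made up, and the function only builds the command line without running it.

```python
def mp3_vbr_command(src: str, dst: str, quality: int = 2) -> list[str]:
    """Build an ffmpeg command that encodes MP3 at a VBR quality level.

    Assumes ffmpeg with libmp3lame, where 0 is the best quality and
    9 is the worst. Nothing is executed here.
    """
    if not 0 <= quality <= 9:
        raise ValueError("libmp3lame quality must be between 0 and 9")
    return [
        "ffmpeg", "-i", src,
        "-codec:a", "libmp3lame",
        "-q:a", str(quality),  # quality target; the encoder picks the bitrates
        dst,
    ]


command = mp3_vbr_command("input.wav", "output.mp3")
```

You could hand the resulting list to subprocess.run to actually do the encode. The hardcoded-bitrate equivalent would swap the -q:a pair for something like -b:a 320k, which is exactly what the paragraph above advises against.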

One last thing to note is that the law of diminishing returns applies heavily to both quality settings and bitrate. It's easy to tell the difference between a 32 kbps MP3 and a 128 kbps MP3, but it's harder to tell the difference between 256 kbps and 320 kbps. In newer, better codecs such as AAC and Vorbis, the differences are even smaller. The same thing goes for quality: it'll be easy to differentiate a quality level of 1 (worst) from 3 (better), but it'll be nearly impossible to tell 5 (better) from 7 (even better).

Lossless: WAV and FLAC

You've no doubt seen WAV files, with their .wav extension, before. This is the classic lossless audio format.

However, notice I didn't say codec above. The reason is that a WAV file is just a container: it stores the raw, unadulterated audio data, usually in the LPCM format. On one hand, this means you get the full quality. On the other hand, this also means that WAV files tend to use inordinate amounts of disk space.
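To put a number on "inordinate", uncompressed LPCM size is just sample rate times sample size times channel count times duration. A small sketch, assuming CD-style defaults of 44.1 kHz, 16-bit, stereo (the function name is mine, and the small WAV header is ignored):

```python
def lpcm_size_bytes(seconds: int, sample_rate: int = 44100,
                    bits_per_sample: int = 16, channels: int = 2) -> int:
    """Size of raw LPCM audio data, not counting the container header."""
    bytes_per_frame = channels * bits_per_sample // 8  # one sample per channel
    return seconds * sample_rate * bytes_per_frame


per_second = lpcm_size_bytes(1)   # 176400 bytes, i.e. 1411.2 kbps
per_minute = lpcm_size_bytes(60)  # 10584000 bytes, roughly 10.6 MB
```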

That's where FLAC comes in. FLAC consists of both a lossless codec and a container. The codec only compresses the audio data; it does not alter it in any way. If you convert a WAV file to FLAC and back to WAV, the decoded audio data should be byte-for-byte identical.
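The round-trip property is easy to demonstrate, though not with FLAC itself using only Python's standard library. The sketch below swaps in zlib, a general-purpose lossless compressor, purely as a stand-in: it is not how FLAC works internally, but it shows the same guarantee of getting back exactly the bytes you put in. The test tone and all the names are made up for the example.

```python
import math
import struct
import zlib


def sine_pcm(seconds: int = 1, sample_rate: int = 8000,
             freq: float = 440.0) -> bytes:
    """Generate a mono 16-bit little-endian LPCM test tone."""
    count = seconds * sample_rate
    samples = [
        int(16383 * math.sin(2 * math.pi * freq * i / sample_rate))
        for i in range(count)
    ]
    return struct.pack(f"<{count}h", *samples)


pcm = sine_pcm()
packed = zlib.compress(pcm, 9)     # stand-in for a lossless encoder
restored = zlib.decompress(packed)  # stand-in for the decoder

assert restored == pcm          # nothing was lost
assert len(packed) < len(pcm)   # but it takes up less space
```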

Lossy: MP3

MP3 is one of the most popular lossy storage formats, and it's, ironically enough, also one of the worst. Why?

Well, for starters, the MP3 format does not specify a container, and, in fact, all the .mp3 files you have are just the bitstream. How are tags stored then? As the MP3 format is somewhat loosely defined (most MP3 players simply skip arbitrary data until they find what looks like an audio stream), the ID3 tag format is able to exploit this to store tag metadata at the beginning of the file. (Originally it was stored at the end, but that changed in ID3v2.) ID3 itself is a very loosely defined standard, which is part of the reason why MP3 tags can be interpreted differently across different music players.
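To make the "tag bolted onto the front" idea concrete: an ID3v2 tag starts with a 10-byte header, which is the three characters "ID3", two version bytes, a flags byte, and a 4-byte size that uses only the low 7 bits of each byte. Here's a minimal sketch that works out how many bytes a player would need to skip to reach the audio. The function name is hypothetical, and real-world files have plenty of quirks this ignores.

```python
def id3v2_tag_length(header: bytes) -> int:
    """Return the total length of a leading ID3v2 tag, or 0 if absent."""
    if len(header) < 10 or header[:3] != b"ID3":
        return 0
    flags = header[5]
    size = 0
    for byte in header[6:10]:
        # "Syncsafe" integer: only 7 bits per byte carry data.
        size = (size << 7) | (byte & 0x7F)
    footer = 10 if flags & 0x10 else 0  # optional footer flag (ID3v2.4)
    return 10 + size + footer


example = b"ID3\x04\x00\x00\x00\x00\x02\x01"
skip = id3v2_tag_length(example)  # 10-byte header + 257 bytes of tag data
```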

Better but confusing: AAC and MP4

You've probably seen the .mp4 format before; it's hugely popular for storing video and also modestly popular for audio. However, MP4 (technically MPEG-4 Part 14) is actually just a container format! MP4 videos usually use the H.264 video codec, and audio usually uses the AAC codec.
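You can see the "just a container" part in the file layout itself. An MP4 file is a sequence of boxes, each starting with a 4-byte big-endian size and a 4-byte type, and the encoded audio or video sits inside one of them. Below is a rough sketch that lists the top-level boxes; the function name and the tiny sample data are invented for illustration, and it makes no attempt to look inside the boxes.

```python
import struct


def top_level_boxes(data: bytes) -> list[tuple[str, int]]:
    """List (type, size) for each top-level box in MP4-style data."""
    boxes = []
    pos = 0
    while pos + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, pos)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type.
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header_len = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - pos
        if size < header_len:
            break  # malformed; stop rather than loop forever
        boxes.append((kind.decode("latin-1"), size))
        pos += size
    return boxes


# Made-up sample: a 16-byte "ftyp" box followed by a 12-byte "mdat" box.
sample = (
    struct.pack(">I4s", 16, b"ftyp") + b"M4A " + b"\x00\x00\x00\x00"
    + struct.pack(">I4s", 12, b"mdat") + b"\x00\x00\x00\x00"
)
found = top_level_boxes(sample)
```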

AAC is a bit confusing as to how it's stored. Usually, AAC audio is stored inside MP4 containers; when an MP4 file contains just audio, the extension .m4a is often used instead. However, AAC can also be stored in MP4-inspired 3GP containers (extension .3gp or .3g2). Things get even more confusing: it can also be stored in Apple's AIFF container format (which is much like the above-mentioned WAV format, created by Microsoft). Although .aiff is technically the extension for AIFF containers, you'll also see .m4a used, just like for MP4 files.

In addition, .aac files are the raw AAC audio without a container. Of course, some people have MP4 files with the .aac extension, apparently just to mess with your head.

All this being said, AAC files are able to give far better quality while using less space than MP3s, and the codec is considered to be one of the best.

OGG, Vorbis, and Opus

OGG, created by the Xiph.Org Foundation, is interesting because it's a free, open container. This means that the format isn't subject to excessive licensing restrictions or fees, and it's easy for anyone to adopt it for their software. OGG containers can technically store both video and audio, though they're most commonly used for audio (with the .ogg extension, or less commonly, .oga).

Of course, a container is useless without a codec. Although OGG can technically store FLAC audio, its chief codec is Vorbis, which is also by Xiph.Org and was created with the same free software ideals. Vorbis is usually either close or equivalent to AAC when it comes to the quality:size ratio, although AAC tends to have the edge. Both are far better than MP3, of course.
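Because one OGG container can hold several different codecs, the only way to know what's inside is to look at the first packet. Here's a minimal sketch, assuming you've read the first page of the file: an Ogg page has a 27-byte header beginning with "OggS", followed by a segment table whose length is given in the header's last byte, and then the codec's own identification header. The function name is made up, and only three codecs are checked.

```python
def ogg_codec(first_page: bytes) -> str:
    """Guess the codec from the first page of an Ogg stream."""
    if len(first_page) < 27 or first_page[:4] != b"OggS":
        return "not Ogg"
    segment_count = first_page[26]
    # The payload starts after the fixed header and the segment table.
    payload = first_page[27 + segment_count:]
    if payload.startswith(b"\x01vorbis"):
        return "Vorbis"
    if payload.startswith(b"OpusHead"):
        return "Opus"
    if payload.startswith(b"\x7fFLAC"):
        return "FLAC"
    return "unknown"
```

Reading the first few hundred bytes of a file is more than enough to feed this.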

More interestingly, however, Xiph.Org has recently published Vorbis's successor: Opus (file extension is .opus). Opus has been ranked above MP3, AAC, Vorbis, and others in multiple listening tests. However, the main con at the moment is that it has less widespread support than the others because it is relatively new.

Bottom line

  • Disregard bitrate in favor of quality settings.
  • Favor FLAC for lossless audio.
  • Favor Opus or AAC for lossy audio.
  • If you use .mp4 for anything other than the MP4 container, you will be haunted by the ghosts of the IETF members (or at least those who are dead).