Sound is essentially vibrations in air. To convert those vibrations into a digital file, a computer needs to "sample" the sound at extremely short intervals — recording the exact state of the sound wave at each moment. The more frequently it samples and the more precisely it records, the closer the reproduction is to the original, but the larger the file becomes.
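To make the size trade-off concrete, here is a minimal sketch that estimates how much raw (uncompressed) audio data sampling produces. The function name and the example settings (16-bit stereo at 44,100 Hz versus 24-bit stereo at 96,000 Hz, both four minutes long) are assumptions chosen for illustration, and container headers are ignored.

```python
def pcm_size_bytes(sample_rate_hz: int, bit_depth: int,
                   channels: int, duration_s: float) -> float:
    """Approximate size of raw PCM audio data, ignoring any file header."""
    bytes_per_sample = bit_depth / 8
    return sample_rate_hz * bytes_per_sample * channels * duration_s


# Four minutes of stereo audio at two illustrative quality settings.
cd_quality = pcm_size_bytes(44_100, 16, 2, 240)
hi_res = pcm_size_bytes(96_000, 24, 2, 240)

print(f"16-bit / 44.1 kHz: {cd_quality / 1_000_000:.1f} MB")
print(f"24-bit / 96 kHz:   {hi_res / 1_000_000:.1f} MB")
```

Sampling more often and storing each sample more precisely multiplies the amount of data, which is why the higher setting comes out several times larger.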
The purpose of audio compression is to find a balance between "quality" and "file size." Imagine painting a picture: an uncompressed WAV is like a hyper-detailed painting with every pixel fully rendered; an MP3 is like a painting that preserves all the important details while omitting the subtle nuances most people won't notice.
Audio compression comes in two fundamentally different types:
Lossy compression permanently deletes some audio data to shrink the file. The data removed is typically detail that human ears don't easily perceive — ultra-high frequencies, sounds masked by louder ones, and so on. The advantage is that files can become extremely small.
Common lossy formats: MP3, AAC, OGG Vorbis, WMA.
Crucially, lossy compression is irreversible. Once the audio is compressed, the discarded data is gone for good. Converting an MP3 back to WAV produces a much larger file, but it contains only what the MP3 kept, so the quality is not restored.
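The irreversibility can be illustrated with a deliberately simplified toy. The sketch below throws away the low 8 bits of each 16-bit sample and then tries to scale the result back up. This bit truncation is an assumption made purely for the demo; real codecs such as MP3 decide what to discard with psychoacoustic models, not by dropping bits.

```python
# Toy "lossy" step: keep only the top 8 bits of each signed 16-bit sample.
# This is NOT how MP3 or AAC work; it only shows that discarded data
# cannot be recovered by converting back.
DROPPED_BITS = 8


def lossy_encode(sample: int) -> int:
    """Discard the low-order bits of a signed 16-bit sample."""
    return sample >> DROPPED_BITS


def decode(code: int) -> int:
    """Scale back to the 16-bit range; the discarded bits return as zeros."""
    return code << DROPPED_BITS


original = [12345, -20000, 37, -1, 32767]
restored = [decode(lossy_encode(s)) for s in original]
errors = [o - r for o, r in zip(original, restored)]

print("original:", original)
print("restored:", restored)
print("errors:  ", errors)
```

The restored values sit in the same 16-bit range as the originals, just as a WAV made from an MP3 has the same format as a real WAV, but the fine detail is no longer there.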
Lossless compression shrinks files through smarter data arrangement without discarding anything. The decompressed audio is bit-for-bit identical to the original. The principle is similar to ZIP compression — compress it, unzip it, the file is exactly the same as before.
Common lossless formats: FLAC, ALAC (Apple Lossless), APE.
Lossless compression achieves less dramatic size reduction than lossy compression, typically shrinking files to 50–70% of the original. The benefit is zero quality loss.
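The ZIP analogy can be tried directly. The sketch below uses Python's general-purpose `zlib` module as a stand-in for a lossless audio codec; FLAC and ALAC use audio-specific techniques, so the compression ratio here says nothing about theirs. The test signal (one second of a 440 Hz sine wave, 16-bit mono at 8,000 Hz) is an assumption chosen to keep the example small.

```python
import math
import struct
import zlib

# Build one second of a 440 Hz sine wave as 16-bit little-endian mono PCM.
sample_rate = 8_000
samples = [
    round(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / sample_rate))
    for n in range(sample_rate)
]
raw = struct.pack(f"<{len(samples)}h", *samples)

# Compress, then decompress, and compare against the original bytes.
compressed = zlib.compress(raw, 9)
restored = zlib.decompress(compressed)

print("raw bytes:       ", len(raw))
print("compressed bytes:", len(compressed))
print("bit-for-bit identical:", restored == raw)
```

The point of the round trip is the final comparison: whatever the size reduction turns out to be, the decompressed bytes match the input exactly.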
Bitrate is the amount of audio data per unit of time, measured in kbps (kilobits per second). Think of it like a water pipe — the wider the pipe (higher bitrate), the more water (audio detail) can flow through.
| Bitrate | Quality Description | Best Use Case |
|---|---|---|
| 64 kbps | Noticeable distortion, muffled sound | Voice memos, low-quality previews |
| 96 kbps | Acceptable, limited detail | Voice call quality |
| 128 kbps | Usable, adequate for casual listening | Basic streaming quality |
| 192 kbps | Good — most listeners are satisfied | Everyday music listening |
| 256 kbps | Excellent, near-original quality | iTunes Store default quality (AAC) |
| 320 kbps | Outstanding — the best MP3 quality | High-quality collections |
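For a fixed duration, file size is directly proportional to bitrate. A simple approximation:
File size (MB) ≈ Bitrate (kbps) × Duration (seconds) ÷ 8 ÷ 1024
Example: 128 kbps × 240 seconds (4 minutes) ÷ 8 ÷ 1024 ≈ 3.75 MB. Dividing by 8 converts kilobits to kilobytes, and dividing by 1024 converts kilobytes to megabytes; container overhead and metadata add a little on top.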
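The formula above translates directly into a small helper. This is a minimal sketch; the function name `estimated_size_mb` is hypothetical, and the result is an estimate that ignores metadata and container overhead.

```python
def estimated_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Estimate file size in MB: kbps x seconds / 8 / 1024."""
    return bitrate_kbps * duration_s / 8 / 1024


# A four-minute track at several of the bitrates from the table.
for kbps in (64, 128, 192, 256, 320):
    print(f"{kbps:>3} kbps -> {estimated_size_mb(kbps, 240):.2f} MB")
```

Because size scales linearly with bitrate, doubling the bitrate for the same track doubles the estimate.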
Sample rate is how many times per second the sound wave is "photographed," measured in Hz (Hertz). According to the Nyquist theorem, the sample rate must be at least twice the highest frequency in the signal for that frequency to be reproduced accurately.
A higher sample rate is not necessarily better. Human hearing tops out at around 20,000 Hz (and the upper limit falls with age), so a 44,100 Hz sample rate, which can represent frequencies up to 22,050 Hz, theoretically covers every frequency the human ear can hear. Higher sample rates are mainly used in professional settings: during recording and mixing, they can reduce distortion, such as aliasing, introduced by digital processing.
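The Nyquist limit can be worked through numerically. The sketch below computes the highest representable frequency for a given sample rate and shows where a pure tone above that limit would "fold back" (alias) if it were sampled without filtering. The function names and the example tones are assumptions for illustration.

```python
def nyquist_limit(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def alias_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone appears after sampling.

    Tones at or below the Nyquist limit are returned unchanged;
    tones above it fold back into the representable range.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


rate = 44_100
print("Nyquist limit:", nyquist_limit(rate), "Hz")
for tone in (10_000, 20_000, 30_000):
    print(f"{tone} Hz tone is captured as {alias_frequency(tone, rate)} Hz")
```

A 20,000 Hz tone passes through unchanged at 44,100 Hz, while a 30,000 Hz tone would show up at a different, audible frequency, which is why content above the limit is filtered out before sampling.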
If sample rate determines "how often you photograph," bit depth determines "the resolution of each photo." Higher bit depth means each sample captures volume information with finer precision.
Dynamic range is the gap between the loudest and quietest sounds a format can represent. Each bit adds roughly 6 dB, so 16-bit audio has a dynamic range of about 96 dB, enough to capture both a delicate pianissimo piano passage and a thundering rock concert in the same recording. For everyday music listening, this is more than sufficient.
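The 96 dB figure follows from the number of levels a sample can take. As a rough sketch, the theoretical dynamic range of an ideal N-bit format is 20 × log10(2^N), which works out to about 6 dB per bit. The function name below is hypothetical, and the calculation ignores real-world factors such as dither and noise in the playback chain.

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of ideal linear PCM at a given bit depth."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (8, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

Going from 16-bit to 24-bit adds eight more bits, and therefore roughly 48 dB of additional theoretical range.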
When compressing audio, there's another important choice: Constant Bitrate (CBR) or Variable Bitrate (VBR).
With CBR, the entire audio file uses the same bitrate throughout. Whether a section is quiet or complex, the same amount of data is used to represent it.
With VBR, the bitrate adjusts dynamically based on the complexity of the audio. Simple sections (like silence or solo voice) use a lower bitrate; complex sections (like a symphonic climax) use a higher bitrate.
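The difference between the two modes can be sketched with some arithmetic. All numbers below are hypothetical: the segment lengths, the per-segment VBR bitrates, and the 192 kbps CBR setting are made up for illustration, and a real encoder chooses VBR bitrates by analyzing the audio, not from a fixed table like this one.

```python
# (description, duration in seconds, hypothetical VBR bitrate in kbps)
segments = [
    ("silence", 10, 32),
    ("solo voice", 60, 96),
    ("symphonic climax", 50, 256),
]
cbr_kbps = 192  # hypothetical constant bitrate applied to every segment

total_seconds = sum(seconds for _, seconds, _ in segments)

# CBR spends the same number of kilobits per second everywhere.
cbr_kilobits = cbr_kbps * total_seconds

# VBR spends fewer kilobits on simple segments and more on complex ones.
vbr_kilobits = sum(seconds * kbps for _, seconds, kbps in segments)
vbr_average_kbps = vbr_kilobits / total_seconds

print(f"CBR total: {cbr_kilobits} kilobits at {cbr_kbps} kbps")
print(f"VBR total: {vbr_kilobits} kilobits, average {vbr_average_kbps:.1f} kbps")
```

In this made-up case the VBR file ends up smaller overall while still giving the most complex segment a higher bitrate than the CBR setting would.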
| Use Case | Format | Bitrate | Sample Rate |
|---|---|---|---|
| Everyday listening | MP3 | 256–320 kbps | 44,100 Hz |
| Podcast publishing | MP3 | 96–128 kbps (mono) | 44,100 Hz |
| Ringtones | MP3 / M4R | 192 kbps | 44,100 Hz |
| Video background music | WAV / AAC | Uncompressed / 256 kbps | 48,000 Hz |
| Studio recording | WAV | Uncompressed | 48,000–96,000 Hz |
| Music archive | FLAC | Lossless | Original sample rate |
| Web audio effects | MP3 / OGG | 128 kbps | 44,100 Hz |