Introduction & Context
As a music lover, I want to experience my music in its purest form. The purest form of all is live performance, but we can’t always be at a concert, so someone invented recorded music. Then someone realised that you can’t take a record player or CD player wherever you go, so they created compressed audio. There are many different compression formats, including MP3, Microsoft’s WMA, AAC (the format Apple uses), Sony’s ATRAC, and Ogg Vorbis. They all have different names and slightly different methods, but the overall concept is the same.
My aim in this series of posts is to explain what happens when you turn a CD into an MP3 or similar compressed format. In most cases, if you put a CD in your computer, PlayStation, Xbox, etc. and “rip” that music to a hard drive or portable music player, there’s a very good chance the music has been compressed.
Just like it sounds, compressing music is all about squishing the same length of song into a smaller amount of data. A music track of about 3 minutes 30 seconds takes up around 37MB as uncompressed CD-quality audio. That same track can be compressed at “high” quality to about 7MB. That’s a massive reduction, but you might be wondering what you’re losing to get the file to shrink to roughly a fifth of its original size. Over the next few posts I’ll explain the process and the pros and cons of compression in a simple, real-world way, so don’t worry if you’re not technically minded – you won’t need to be.
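If you do like to tinker, here’s a rough back-of-the-envelope sketch of where file sizes like these come from. The CD-quality settings (44,100 samples per second, 2 bytes per sample, 2 channels) are standard, but the 256 kbps figure for a “high” quality compressed file is my own assumption, and the function names are just made up for illustration.

```python
def uncompressed_bytes(seconds, sample_rate=44_100, bytes_per_sample=2, channels=2):
    """Size of plain CD-quality audio: every sample is stored in full."""
    return seconds * sample_rate * bytes_per_sample * channels


def compressed_bytes(seconds, kilobits_per_second):
    """Size of a compressed file at a fixed bitrate (8 bits per byte)."""
    return seconds * kilobits_per_second * 1000 // 8


track_seconds = 3 * 60 + 30                 # a 3 minute 30 second song
raw = uncompressed_bytes(track_seconds)
mp3 = compressed_bytes(track_seconds, 256)  # 256 kbps is an assumed "high" setting

print(f"uncompressed: {raw / 1_000_000:.1f} MB")   # uncompressed: 37.0 MB
print(f"compressed:   {mp3 / 1_000_000:.1f} MB")   # compressed:   6.7 MB
print(f"ratio:        {raw / mp3:.1f}x smaller")   # ratio:        5.5x smaller
```

Change the bitrate to 128 and the compressed file halves again, which is exactly the trade-off these posts are about: smaller files mean more has been thrown away.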
I should add that I’m not a fan of compressing music, but I recognise the need for it if we want portable music. The overall theme of these posts is to understand what you’re sacrificing when you choose compressed music. Once you know what you’re giving up, you can make an informed decision about what you’re willing to sacrifice in order to carry those extra songs. I hope the information is helpful and interesting.
The Physics of Hearing
To understand the impact of compression you need to understand how we hear sound. The process begins with a sound source (like a musical instrument) that creates vibrations in the air. These vibrations travel through the air until they hit our ears. Inside each ear is a thin membrane that we know as the ear drum. When the vibrations hit the ear drum, it is pushed around and vibrates in time with the incoming sound. Behind the ear drum are some small bones and our inner ear. The bones get pushed by the ear drum and vibrate accordingly, passing the vibrations on to our inner ear. You can think of the bones in your ears like the string between two tin can telephones – they just carry a simple vibration.
The inner ear receives the vibrations next and the vibrations “tickle” a bunch of nerves which translate the vibration to a new type of signal for our brain. Don’t worry about the final signal to the brain though, just think about the vibrations until they hit the inner ear. These vibrations are chaotic. They aren’t clear and defined with separate little vibrations for the drums and another set of vibrations for the guitar and another set for the singer, etc. No, the vibrations all pile up and create a big mess of vibration.
A single, perfect note looks like this:
A graph of a perfect note
This type of vibration is impossible to create with a musical instrument (other than a synthesizer) or voice. Here’s the type of vibration created by instruments and voices:
A graph of musical vibrations
Notice the mostly chaotic nature of the vibrations? There are definitely patterns there, but it’s a big mess of different vibrations. What this graph shows us is how our ear drum would move when receiving this music. The higher or lower each line is, the more our ear drum moves. Lines towards the top push our ear drum in. Lines towards the bottom pull our ear drum out. These movements are all tiny (if the music’s not too loud), but enough to send these crazy vibrations through to our ear nerves. The miracle of hearing is that our brain translates this crazy bunch of vibrations into beautiful melodies and harmonies.
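For anyone who’d like to see this for themselves, the difference between the two graphs can be sketched in a few lines of Python. This is only a toy: I’m assuming three pure tones with made-up pitches and volumes as stand-ins for three instruments, whereas real instruments are far messier. The function names are mine, not from any audio library.

```python
import math

SAMPLE_RATE = 8_000  # samples per second (arbitrary, just for this sketch)


def pure_tone(frequency_hz, amplitude, n_samples):
    """One 'perfect' note: a single smooth sine wave."""
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
        for n in range(n_samples)
    ]


def mix(*tones):
    """Vibrations in the air simply add up, sample by sample."""
    return [sum(samples) for samples in zip(*tones)]


def text_plot(samples, width=40):
    """Crude sideways graph: one row per sample, '*' marks the level."""
    peak = max(abs(s) for s in samples) or 1.0
    for s in samples:
        column = round((s / peak + 1) / 2 * width)
        print(" " * column + "*")


n = 40  # 5 milliseconds of sound at 8,000 samples per second
perfect_note = pure_tone(440, 1.0, n)
messy_music = mix(
    pure_tone(440, 1.0, n),   # made-up stand-in for one instrument
    pure_tone(554, 0.6, n),   # ...a second
    pure_tone(1760, 0.4, n),  # ...and a third, much higher
)

text_plot(perfect_note)   # a smooth, regular wave
print("-" * 41)
text_plot(messy_music)    # the same wave with lumps and wobbles piled on top
```

Even with only three tones added together, the second plot already looks far less tidy than the first. Real music piles up many more vibrations than that.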
Masking
The second key concept to understand is masking. Masking is the effect of a louder sound making it difficult to hear a quieter sound played at exactly the same time. Think about having dinner in a busy restaurant. You might find it difficult to hear what your friends are saying because of the noise in the restaurant – that’s masking. The combined noise of everyone else’s conversations is masking the voice of your friend across the table.
When some clever bunnies wanted to create a way to store music on computers and iPods (or similar devices) they needed to take some data out of our music. The only data in our music is sound, so they had to find a way to take some sounds out of the music. Sounds tricky, yes? That’s where masking comes into play.
Studies showed that people don’t notice when certain individual sounds are removed from the overall musical landscape. In basic terms, if two sounds occur simultaneously, the quieter one can be removed and we don’t really notice. That’s a slight over-simplification, but it sums up the concept. There are very complex mathematical algorithms and formulas that help determine which sounds will and won’t be missed. I don’t pretend to fully understand those algorithms, so I won’t try to explain them. It also doesn’t really matter how the maths works, because the key thing to understand is that compression involves removing small pieces of the music that you won’t miss (in theory).
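To make the idea a little more concrete, here’s a toy sketch of masking in Python. It is definitely not how MP3 or any real format decides what to throw away. I’m assuming a made-up rule (a sound is dropped if something close in pitch is at least 20 dB louder at the same moment), and the sounds, numbers and names are all invented for illustration.

```python
from typing import NamedTuple


class Component(NamedTuple):
    name: str
    frequency_hz: float
    level_db: float


# Made-up thresholds, purely for illustration
MASKING_MARGIN_DB = 20.0   # how much louder the masking sound must be
NEARBY_RATIO = 1.5         # "close in pitch" = within a factor of 1.5


def is_masked(quiet, loud):
    """True if `loud` is close in pitch to `quiet` and much louder."""
    high = max(quiet.frequency_hz, loud.frequency_hz)
    low = min(quiet.frequency_hz, loud.frequency_hz)
    close_in_pitch = high / low <= NEARBY_RATIO
    much_louder = loud.level_db - quiet.level_db >= MASKING_MARGIN_DB
    return close_in_pitch and much_louder


def remove_masked(components):
    """Keep only the sounds that nothing else in the same moment masks."""
    return [
        c for c in components
        if not any(is_masked(c, other) for other in components if other is not c)
    ]


# One instant of a hypothetical song
moment = [
    Component("crash cymbal", 6000, 85),
    Component("hi-hat tick", 7000, 60),
    Component("bass note", 80, 70),
    Component("vocal", 1000, 75),
]

kept = remove_masked(moment)
print([c.name for c in kept])   # ['crash cymbal', 'bass note', 'vocal']
```

The quiet hi-hat tick sits right next to the much louder cymbal, so it gets dropped. The bass note is quieter than the cymbal too, but it’s so far away in pitch that it survives. Real formats make this kind of decision many times a second using much more careful models of human hearing.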
End of Part 1
That’s the end of the first section. Hopefully you now understand how we hear and how masking works. In Part 2 I’ll explain how that knowledge is applied to compressing sound and how it affects what we hear after the compression is done.