Answer Geek: Error Correction Rule CDs


-- Q U E S T I O N: CDs can hold about 650 MB of data, so each individual bit must be a microscopic pit on a CD’s surface. Since dust particles are larger than these individual pits, how does a drive’s laser see the pits buried underneath dust? I’ve played CDs that were covered in dust (and fingerprints, etc.) and yet they still work: how is that possible? — Dave I.

A N S W E R: I don’t know about your house, Dave, but in my house, CDs take a lot of abuse. The problem: a toddler who is old enough to know how to operate the CD player and bossy enough to insist on doing it himself, but too young to understand that you should treat CDs with respect and care.

So the typical CD in my household has probably spent some time scraping along the floor, or served as a Frisbee, or spent time in CD purgatory between the cushions of the couch. It’s definitely not treatment recommended by the manufacturer, and the result is that you can hardly pick a CD up off the floor of my living room that isn’t a welter of scratches, pits, and gouges, with food smears and greasy fingerprints galore. The amazing thing is they all still seem to play just fine.

How is this possible? The answer is one of the minor miracles of the digital age: error correction.

Error Correction: Can You Digit

Without error correction, we’d still be listening to music on vinyl, or 8-tracks. For that matter, CD-ROMs and virtually any kind of digital storage would be impossible, because in a world where every scratch, scrape, and smudge meant lost data, the odds of actually finding a CD you could play, or a file you could open, would be very small indeed.

The key to error correction is something called Cross-Interleaved Reed-Solomon Code, or CIRC for short. (Irving S. Reed and Gustave Solomon were engineers who came up with the idea in 1960 while working at MIT’s Lincoln Laboratory.) CIRC operates on a couple of basic principles. First, extra data is added to the information recorded on the disk. This extra information is called “parity bits.” Next, the data is placed on a disk not in normal linear order, like you might expect, but spread around through a process called interleaving. Let’s start with parity bits.

As you probably know, an audio CD can hold up to 74 minutes of music. What you may not know is that the real capacity is closer to 100 minutes: 25 percent of the information recorded on a CD is actually there to help the CD player read the disk properly. Control bits make up a small chunk of that 25 percent and basically are responsible for keeping track of time and marking a song’s beginning and end point. Most of the rest are parity bits used for error correction.
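If you want to check that arithmetic yourself, here is a quick back-of-the-envelope sketch in Python. The variable names are mine, and the 25 percent figure is simply the round number quoted above, so treat the result as a rough estimate rather than a spec-sheet value.

```python
music_minutes = 74      # playing time quoted above
overhead_share = 0.25   # share of the disc used for control and parity data

# If music is only 75 percent of what is recorded, the raw capacity is larger.
raw_minutes = music_minutes / (1 - overhead_share)
print(round(raw_minutes, 1))  # 98.7
```

That works out to just under 99 minutes, which is where "closer to 100" comes from.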

Understanding Bit by Bit

In simplest terms, a parity bit is either a one or a zero that is assigned to a short row of bits. Suppose you have a row of eight bits that looks like this: 10010010. Add the individual ones and zeroes together and you get three. Three is an odd number, so it gets the parity bit of one. If, on the other hand, the ones and zeroes added up to an even number, the parity bit would be a zero.
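For the programmers in the audience, the rule above fits in a couple of lines of Python. This is only a sketch of the simple odd/even scheme just described; the function name parity_bit is my own invention, not anything out of a CD player.

```python
def parity_bit(bits: str) -> int:
    """Return 1 if the row contains an odd number of ones, else 0."""
    ones = bits.count("1")
    return ones % 2

row = "10010010"
print(row, "->", parity_bit(row))  # three ones, so the parity bit is 1
```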

Now, data on an audio CD are organized into groups of 24 bytes. For every 24 data bytes there are eight parity bytes, for a total of 32 bytes. That group of 32 is called a block. When your CD player reads the data stream, it reads each little row of numbers in a block and adds them up. If, for some reason (say, a smear of jelly), one of the bits is misread in a row and the result doesn’t match up with the value of the parity bit, the CD player knows that there is an error. One parity bit on its own can only flag the problem, but the eight parity bytes in each block carry enough extra information to pin down which byte went bad and put it right.
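How can extra bits actually repair damage, rather than just notice it? The real Reed-Solomon math is beyond the scope of this column, so here is a deliberately simplified stand-in, not what a CD player runs: keep one parity bit for every row and one for every column of a small grid. A single flipped bit then shows up as exactly one bad row and one bad column, and where they cross is the culprit. All of the names below are made up for the illustration.

```python
def parity(bits):
    """1 if the bits contain an odd number of ones, else 0."""
    return sum(bits) % 2

def add_parity(rows):
    """Return (row_parities, column_parities) for a grid of bits."""
    row_par = [parity(r) for r in rows]
    col_par = [parity(col) for col in zip(*rows)]
    return row_par, col_par

def fix_single_error(rows, row_par, col_par):
    """Locate and flip one bad bit in place; return its (row, col) or None."""
    bad_rows = [i for i, r in enumerate(rows) if parity(r) != row_par[i]]
    bad_cols = [j for j, c in enumerate(zip(*rows)) if parity(c) != col_par[j]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        rows[i][j] ^= 1  # flip the bit back
        return (i, j)
    return None  # nothing wrong, or too much damage for this simple scheme

original = [[1, 0, 0, 1, 0, 0, 1, 0],
            [0, 1, 1, 0, 1, 0, 0, 1],
            [1, 1, 0, 0, 0, 1, 1, 0]]
row_par, col_par = add_parity(original)

damaged = [row[:] for row in original]
damaged[1][4] ^= 1  # the jelly smear flips one bit

print(fix_single_error(damaged, row_par, col_par))  # (1, 4)
print(damaged == original)                          # True
```

Two or more flipped bits will fool this toy version, which is one reason real discs use the much stronger Reed-Solomon code.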

Got it? Good. Now on to interleaving. The 24 bytes of a single data block don’t sit in one nice little contiguous group right next to each other along the track of a CD. If they did, that jelly smear would wreak havoc with your music. Instead, they are distributed among neighboring blocks of bytes: the bytes that make up a single block are scattered about in a string of 109 physical data blocks on the disk. During playback, all of that information goes into a buffer and gets sorted out and reassembled. If necessary, errors are corrected, and if all goes well, the result will be a faithful reproduction of the intended sound.
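To see why scattering helps, here is a toy interleaver in Python. It is a plain "write in rows, read in columns" shuffle across four tiny blocks, which is far simpler than the cross-interleaving a real CD uses, and the function names are my own. The point is only to show a run of damage being split up among several blocks.

```python
def interleave(blocks):
    """Send byte 0 of every block, then byte 1 of every block, and so on."""
    return [byte for column in zip(*blocks) for byte in column]

def deinterleave(stream, num_blocks):
    """Undo interleave(): rebuild the original blocks from the stream."""
    return [stream[i::num_blocks] for i in range(num_blocks)]

blocks = [list("AAAA"), list("BBBB"), list("CCCC"), list("DDDD")]
stream = interleave(blocks)   # A B C D A B C D ...
stream[4:8] = ["?"] * 4       # a smear wipes out 4 bytes in a row

for block in deinterleave(stream, len(blocks)):
    print("".join(block))     # each block comes back with just one "?"
```

Four bad bytes in a row on the disk turn into one bad byte in each of four blocks, and one bad byte per block is the kind of damage error correction can handle.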

Outer Limits

Error correction can only go so far, however. Typically, between four and six errors can be corrected for every block. If that jelly smear has created more errors than that, you’re not completely hosed, at least not yet. Because interleaving scatters the data from a single block around the disk, there is a pretty good chance that an uncorrected byte still has a good byte on either side. If that’s the case, then your CD player will take the average of those two values and make an educated guess about what the missing value in between should be. This is called “interpolation.” If the number of missing bytes gets too large, the system will no longer be able to interpolate quickly or accurately enough, and it will suppress the error by muting the sound for a fraction of a second, which is hopefully too short a period of time to be detected. At a certain point, of course, you’ll start to hear the difference. Like when the CD starts to repeat.
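Here is roughly what that fallback looks like as a Python sketch. It assumes the player already knows which values are bad, it works on a short list of made-up sample values, and the function name conceal is hypothetical. A real player does this in hardware, and its rules for when to mute are more involved than this.

```python
def conceal(samples, bad):
    """Patch bad samples: average good neighbors, otherwise mute to 0."""
    out = list(samples)
    for i in sorted(bad):
        left_ok = i > 0 and (i - 1) not in bad
        right_ok = i + 1 < len(samples) and (i + 1) not in bad
        if left_ok and right_ok:
            out[i] = (samples[i - 1] + samples[i + 1]) // 2  # interpolate
        else:
            out[i] = 0  # no good value on both sides: mute
    return out

samples = [100, 120, 999, 160, 180, 999, 999, 240]  # 999 marks unreadable values
print(conceal(samples, bad={2, 5, 6}))
# [100, 120, 140, 160, 180, 0, 0, 240]
```

The lone bad value gets a believable guess of 140, while the two bad values sitting side by side have nothing trustworthy to average and are silenced instead.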

How robust is error correction? Or to put it another way, how much abuse can one CD take at the hands of a 2-year-old before it stops playing cleanly? CIRC can correct nearly 4,000 consecutive erroneous bits, which is equal to about 2.5 millimeters of the track on a CD. And it can hide more than 13,000 consecutive errors, or nearly 9 millimeters. You can test this by actually placing a piece of black tape on the CD. Or you can just turn it over to a 2-year-old for a while. Trust me, it takes a fair amount of scratching, scraping, Frisbee-playing, and ground-in food and grease before the quality of the sound starts to degrade noticeably.

Todd Campbell is a writer and Internet consultant living in Seattle. The Answer Geek appears weekly, usually on Thursdays.