Hierarchy of caches

May 13, 2013

Computers are a hierarchy of caches that are fast in the average case. I’m fond of saying this, because it pretty much explains all of the decisions made in the last 60 years of computer science.

If you’re a computer, there are quite a few choices for how to store the things you’re working on, beyond the hard drives and memory that most often get talked about. The fastest are registers, the hundred-or-so little number pockets embedded right in the same circuits that do the adding, subtracting, multiplying and dividing. Unfortunately, they hold a mere thimbleful of data.

Enter the L1 cache, which holds a coffee cup’s worth of data. This is still on the chip, and pretty close to the registers, but a bit slower. L1 has some even bigger, even slower cousins called the L2 and L3 caches, which live on the chip (often taking up half of it) and hold a large trash can’s worth of data.

Now we’re getting somewhere! But we’ve also run out of chip. Silicon is expensive, and those registers and caches take up a lot of space. Time to bring out the main memory, which holds a few railway tankers’ worth of data by squeezing each bit into a much smaller area.

But what about our boatloads of photos and videos we stash on the hard disk? Keeping up the analogy, this amount of data is comparable to the daily oil output of the Arctic National Wildlife Refuge (ANWR). For more data than that, we’ve got to go to the cloud, which is really just a bunch of disks over a network that can store as much data as US diesel consumption in one year.

All this extra space comes at a cost. Cache, memory and disks are increasingly far from where the calculation is taking place, so it takes increasingly long trips to get there.

A trip to the registers? That’s like going to the coffee shop downstairs. I wouldn’t think twice, and neither would an electron. L1 cache is a bit further, like heading to that tasty takeout place down the street. But to get to the L2/L3 cache, I’d want to hop on the subway for a few stops (New Yorkers are impatient, if you hadn’t heard).

Now, getting to the main memory is like me heading from NYC to Westchester on the Metro-North, perhaps a little over an hour. This is not a trip I’d take on a whim. But now we’ve got a really big jump from memory to disk drives. The fastest drive you can get, a fancy new SSD, is like taking the bus to Boston (4 hours or so), but that’s only for reading from the drive; writing is like going to California! It gets even worse with disk drives that actually have spinny discs in them. This leg of the journey is like going to China for the very fastest drives, and flying a quarter of the way to the moon for the slow ones!
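You don’t have to take my word for the size of that jump; you can feel it from a few lines of C. Here’s a rough sketch (the 64 MB array is an arbitrary pick, just big enough to blow past any cache, and the timing calls assume a POSIX system): it adds up the same array twice, once walking it in order and once in a shuffled order, and times both passes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (16 * 1024 * 1024)   /* 16M ints = 64 MB, far bigger than any L3 cache */

/* Wall-clock time in seconds, using the POSIX monotonic clock */
static double seconds(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + t.tv_nsec / 1e9;
}

int main(void) {
    int *data = malloc(N * sizeof(int));
    size_t *order = malloc(N * sizeof(size_t));
    for (size_t i = 0; i < N; i++) { data[i] = (int)i; order[i] = i; }

    /* Shuffle the visit order (Fisher-Yates) so the second pass jumps all over memory */
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }

    long long sum = 0;

    /* Sequential pass: cache lines and the prefetcher do most of the work */
    double t0 = seconds();
    for (size_t i = 0; i < N; i++) sum += data[i];
    double t1 = seconds();

    /* Shuffled pass: the same additions, but most reads have to go all the way to RAM */
    for (size_t i = 0; i < N; i++) sum += data[order[i]];
    double t2 = seconds();

    printf("sequential: %.3fs  shuffled: %.3fs  (sum=%lld)\n",
           t1 - t0, t2 - t1, sum);
    free(data); free(order);
    return 0;
}
```

Compile it with something like `gcc -O2` and the shuffled pass should come out many times slower, even though it performs exactly the same additions; the difference is nothing but trips up and down the hierarchy.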

It’s easy to get caught up in the jargon of megabytes and gigabytes and lose perspective on the vast numbers involved. I made a chart to help¹:

It fascinates me that each rung on the ladder is one order of magnitude bigger (nerd speak for 10 times bigger) than the previous, in both size and latency. I don’t think this is a coincidence. Here are the rough numbers:

Why are size and latency so close to linear? My best theory is that we’ve spent equal amounts of time and money making each part fast. Leaving a gap would be either too expensive, if you filled it with the small, fast thing, or too slow, if you filled it with the big, slow thing (maybe Siracusa or Anand have a better theory).

The reason this chart is so important is our hierarchy of caches: you pay a lot of latency to access data that isn’t already sitting in a fast cache. Much of the history of algorithms and data structures is a series of clever tricks² to keep the data you’re working on in the lower-left corner of the chart.

The term of art for this is data locality.
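To make that a little more concrete, here’s a minimal sketch of the simplest locality trick I know (again in C, with the same POSIX timing helper; the 4096×4096 matrix is an arbitrary size chosen to spill well out of the caches): summing a matrix row by row walks memory in order, while summing it column by column jumps a whole row ahead on every read.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 4096   /* 4096 x 4096 doubles = 128 MB, well past the caches */

/* Wall-clock time in seconds, using the POSIX monotonic clock */
static double seconds(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + t.tv_nsec / 1e9;
}

int main(void) {
    /* One flat allocation, indexed as m[row * N + col] (row-major layout) */
    double *m = malloc((size_t)N * N * sizeof(double));
    for (size_t i = 0; i < (size_t)N * N; i++) m[i] = 1.0;

    /* Row by row: consecutive addresses, so every cache line gets fully used */
    double t0 = seconds(), sum_rows = 0.0;
    for (size_t row = 0; row < N; row++)
        for (size_t col = 0; col < N; col++)
            sum_rows += m[row * N + col];
    double t1 = seconds();

    /* Column by column: a stride of N doubles, so nearly every read is a cache miss */
    double sum_cols = 0.0;
    for (size_t col = 0; col < N; col++)
        for (size_t row = 0; row < N; row++)
            sum_cols += m[row * N + col];
    double t2 = seconds();

    printf("row-major: %.3fs  column-major: %.3fs  (sums %.0f / %.0f)\n",
           t1 - t0, t2 - t1, sum_rows, sum_cols);
    free(m);
    return 0;
}
```

Both loops do the exact same arithmetic, but the column-by-column walk wastes nearly every cache line it fetches, so it should come out noticeably slower. Keeping the traversal order lined up with the memory layout is data locality in its plainest form.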


  1. I made the charts for CS for Hackers at General Assembly, so they’re licensed CC BY-NC-SA

  2. Next time we’ll talk more about why computers are fast in the average case