What is a relatively small high-speed memory that stores the most recently used instructions or data from the larger but slower main memory?

Cache memory plays a key role in computers. In fact, all modern computer systems, including desktop PCs, servers in corporate data centers, and cloud-based compute resources, have small amounts of very fast static random access memory (SRAM) positioned very close to the central processing unit (CPU). This memory is known as cache memory.

Despite its small size compared to primary memory (RAM) or secondary memory (storage resources), cache memory has a huge impact on the overall performance of the system.

What is Cache Memory?

Computer systems have hard disk drives or solid state drives (SSDs) to provide high-capacity, long-term data storage, as well as RAM, which is used to store data and program code that the central processing unit is using or is about to need in the very near future. RAM is much faster than hard disk drive or SSD storage. It is usually made of dynamic random access memory (DRAM), which is also more expensive per gigabyte of data stored.

But a CPU works much faster than RAM, so sometimes it can be forced to wait while instructions or data are read from RAM before it can continue processing, which reduces the overall performance of the computer system.

To prevent this from happening, computer systems are commonly equipped with cache memory: a small amount of static random access memory (SRAM) which is very fast, but very expensive, located very close to the CPU itself.

This cache memory stores data or instructions that the CPU is likely to use in the immediate future. Because this prevents the CPU from having to wait for RAM, caching is used to increase read performance.

Cache Memory and Performance

Cache memory increases a computer’s performance. The cache memory is located very close to the CPU, either on the CPU chip itself or on the motherboard in the immediate vicinity of the CPU and connected by a dedicated data bus. So instructions and data can be read from it (and written to it) much more quickly than is the case with normal RAM.

That means that the CPU is much less likely to be kept waiting – or wait times will be dramatically reduced. The result is that a very small amount of cache memory can result in a significant increase in the computer’s performance.

How Does Cache Memory Work?

Cache memory works by taking data or instructions at certain memory addresses in RAM and copying them into the cache memory, along with a record of the original address of those instructions or data.

This results in a table containing a small number of RAM memory addresses, and copies of the instructions or data that those RAM memory addresses contain.

Memory Cache “Hit”

When the processor requires instructions or data from a given RAM memory address, before retrieving them from RAM it checks to see whether the cache memory contains a reference to that RAM memory address. If it does, it reads the corresponding data or instructions from the cache memory instead of from RAM. This is known as a “cache hit”. Since the cache memory is faster than RAM, and because it is located closer to the CPU, the CPU can fetch the instructions or data and start processing them much more quickly.
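
As a minimal illustration, the lookup logic can be sketched in a few lines of Python. The ram and cache dictionaries and the read function are purely illustrative, not part of any real hardware interface: on a hit the value comes straight from the cache, and on a miss it is fetched from RAM and copied into the cache, as described below.

    # Minimal sketch of a cache lookup: the cache is modelled as a plain
    # dictionary mapping RAM addresses to copies of their contents.
    ram = {0x1000: "instruction A", 0x1004: "instruction B"}
    cache = {}

    def read(address):
        if address in cache:              # cache hit: serve directly from the cache
            return cache[address], "hit"
        value = ram[address]              # cache miss: fall back to (slower) RAM
        cache[address] = value            # copy into the cache for future reads
        return value, "miss"

    print(read(0x1000))   # ('instruction A', 'miss') - first access fills the cache
    print(read(0x1000))   # ('instruction A', 'hit')  - second access is served from cache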

The same procedure is carried out when data or instructions need to be written back to memory. However, in this case there is an additional step, because anything written to cache memory must ultimately also be written to RAM.

How this is done depends on the cache’s write policy. The simplest policy is known as “write-through”: with this policy anything written to the memory cache is also written to RAM straight away.

An alternative policy is “write-back.” Using a “write-back” policy, data written to cache memory is not immediately written to RAM as well. Anything written to cache memory is marked as “dirty,” meaning that it is different from the original data or instructions that were read from RAM. When it is removed from the cache memory, then and only then is it written to RAM, replacing the original information.

Intermediate policies allow “dirty” information to be queued up and written back to RAM in batches, which can be more efficient than multiple individual writes.
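
The contrast between the two write policies can also be sketched in Python. The class and method names below are illustrative only: the write-through cache updates RAM on every write, while the write-back cache only marks an entry as “dirty” and writes it to RAM when the entry is evicted.

    class WriteThroughCache:
        def __init__(self, ram):
            self.ram = ram
            self.cache = {}

        def write(self, address, value):
            self.cache[address] = value   # update the cache...
            self.ram[address] = value     # ...and RAM straight away (write-through)

    class WriteBackCache:
        def __init__(self, ram):
            self.ram = ram
            self.cache = {}
            self.dirty = set()            # addresses whose cached copy differs from RAM

        def write(self, address, value):
            self.cache[address] = value   # update only the cache for now
            self.dirty.add(address)       # mark the entry as "dirty"

        def evict(self, address):
            if address in self.dirty:     # dirty data reaches RAM only on eviction
                self.ram[address] = self.cache[address]
                self.dirty.discard(address)
            self.cache.pop(address, None)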

Memory Cache “Miss”

If data or instructions at a given RAM memory address are not found in cache memory, then this is known as a “cache miss.” In this case, the CPU is forced to wait while the information is retrieved from RAM.

In fact, the data or instructions are retrieved from RAM and written to cache memory, and then sent on to the CPU. The reason for this is that data or instructions that have been recently used are very likely to be required again in the near future. So anything that the CPU requests from RAM is always copied to cache memory.

(There is an exception to this. Data of a type that is rarely reused can be marked as non-cacheable. This prevents valuable cache memory space from being occupied by data unnecessarily.)

This raises the question of what happens if the cache memory is already full. The answer is that some of the contents of the cache memory have to be “evicted” to make room for the new information that needs to be written there.

When such a decision needs to be made, the memory cache applies a “replacement policy” to decide which information is evicted.

There are a number of possible replacement policies. One of the most common is a least recently used (LRU) policy. This policy uses the principle that if data or instructions have not been used recently, then they are less likely to be required in the immediate future than data or instructions that have been required more recently.
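
A minimal Python sketch of an LRU policy, using an ordered dictionary to track how recently each entry was used. The capacity of four entries and all of the names are illustrative choices, not taken from any real implementation.

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity=4):
            self.capacity = capacity
            self.entries = OrderedDict()            # ordered: oldest -> most recently used

        def access(self, address, ram):
            if address in self.entries:
                self.entries.move_to_end(address)   # hit: mark as most recently used
                return self.entries[address]
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)    # full: evict the least recently used entry
            value = ram[address]                    # miss: fetch from RAM
            self.entries[address] = value           # and cache it
            return value

    ram = {addr: f"data@{addr:#x}" for addr in range(0, 64, 4)}
    lru = LRUCache(capacity=4)
    for addr in (0, 4, 8, 12, 0, 16):               # the final access evicts address 4, the LRU entry
        lru.access(addr, ram)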

The Key Value of Cache Memory

Cache memory is needed to reduce performance bottlenecks between RAM and the CPU. Its usage is analogous to the use of RAM as a disk cache. In this case, frequently used data stored on secondary storage systems (such as hard drives or SSDs) is temporarily placed in RAM, where it can be accessed by the CPU much more quickly.

Since RAM is more expensive (but faster) than secondary storage, disk caches are smaller than hard drives or SSDs. Since SRAM is more expensive (but faster) than DRAM, memory caches are smaller than RAM.

Types of Cache Memory

  • Primary Cache  Most cache memory is physically located on the same die as the CPU itself, and the part closest to the CPU cores is sometimes called primary cache, although the term is not commonly used any more.
  • Secondary Cache  This often refers to a further piece of cache memory, which is located on a separate chip on the motherboard close to the CPU. This term is also not commonly used anymore, because most cache memory is now located on the CPU die itself.

Levels of Cache Memory

Modern computer systems have more than one piece of cache memory, and these caches vary in size and proximity to the processor cores, and therefore also in speed. These are known as cache levels.

The smallest and fastest cache memory is known as Level 1 cache, or L1 cache, and the next is L2 cache. Most systems now have L3 cache, and since the introduction of its Skylake chips, Intel has added L4 cache to some of its processors as well.

Level 1

L1 cache is cache memory that is built into the CPU itself. It runs at the same clock speed as the CPU. It is the most expensive type of cache memory so its size is extremely limited. But because it is very fast it is the first place that a processor will look for data or instructions that may have been buffered there from RAM.

In fact, in most modern CPUs, the L1 cache is divided into two parts: a data section (L1d) and an instruction section (L1i). These hold data and instructions, respectively.

A modern CPU may have on the order of 32 KB each of L1i and L1d cache per core.

Level 2

L2 cache may also be located in the CPU chip, although not as close to the core as L1 cache. Or more rarely, it may be located on a separate chip close to the CPU. L2 caches are less expensive and larger than L1 caches, so L2 cache sizes tend to be larger, and may be of the order of 256 KB per core.

Level 3

Level 3 cache tends to be much larger than either L1 or L2 cache, but it is also different in another important way. Whereas L1 and L2 caches are private to each core of a processor, L3 tends to be a shared cache that is common to all the cores. This allows it to play an important role in data sharing and inter-core communication. L3 cache may be of the order of 2 MB per core.

Cache Mapping

Cache memory, as has been discussed, is extremely fast – meaning that it can be read from very quickly.

But there is a potential bottleneck: before data can be read from cache memory, it has to be found. The processor knows the RAM memory address of the data or instruction that it wants to read. It has to search the memory cache to see if there is a reference to that RAM memory address in the memory cache, along with the associated data or instruction.

There are a number of ways that data or instructions from RAM can be mapped into memory cache, and these have direct implications for the speed at which they can be found. But there is a trade-off: schemes that minimize the search time also tend to reduce the likelihood of a cache hit, while schemes that maximize the chances of a cache hit tend to increase the search time.

The following cache mapping methods are commonly used:

Direct Mapping

With direct mapped cache, there is only one place in cache memory that a given block of data from RAM can be stored.

This means that the CPU only has to look in one place in the memory cache to see if the data or instructions that it is looking for are present, and if they are, they will be found very quickly. The drawback with direct mapped cache is that it severely limits which blocks of data or instructions can be held in the memory cache at the same time, so cache hits are less frequent than with other mapping schemes.
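
The placement rule can be sketched in a few lines of Python. The block size and number of cache lines below are arbitrary illustrative values: the block number is taken modulo the number of lines, so RAM blocks that are exactly one cache-size apart collide in the same line.

    BLOCK_SIZE = 64      # bytes per cache block (illustrative)
    NUM_LINES = 256      # number of lines in the cache (illustrative)

    def direct_mapped_line(address):
        block_number = address // BLOCK_SIZE
        return block_number % NUM_LINES     # the single line this block may occupy

    # Two addresses 16 KB apart map to the same line and therefore evict each other:
    print(direct_mapped_line(0x0000))   # 0
    print(direct_mapped_line(0x4000))   # 0 (conflict with the address above)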

Associative Mapping

Also known as fully associative mapping, this is the opposite of direct mapping. With an associative mapping scheme, any block of data or instructions from RAM can be placed in any cache memory block. That means that the CPU has to search the entire cache memory to see if it contains what it is looking for, but the chances of a cache hit are much higher.
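
A sketch of that search in Python, with purely illustrative names: every line has to be examined, although real hardware compares all the tags in parallel rather than one after another.

    def associative_lookup(lines, block_number):
        # 'lines' is a list of (tag, data) pairs; a block may be stored in any of them
        for tag, data in lines:          # search the whole cache
            if tag == block_number:
                return data              # hit
        return None                      # miss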

Set-Associative Mapping

A compromise between the two types of mapping is set associative mapping, which allows a block of RAM to be mapped to a limited number of different memory cache blocks.

A 2-way set-associative mapping system allows a RAM block to be placed in one of two places in cache memory. An 8-way set-associative mapping system, by comparison, allows a RAM block to be placed in any one of eight cache memory blocks.

A 2-way system takes twice as long to search as a direct mapped system, as the CPU has to look in two places instead of just one, but there is a much greater chance of a cache hit.
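
The following Python sketch combines the two ideas, with all sizes and names chosen purely for illustration: the address selects one set, and only the lines within that set need to be searched.

    BLOCK_SIZE = 64
    NUM_SETS = 128
    WAYS = 2                                          # 2-way set-associative

    def set_index(address):
        return (address // BLOCK_SIZE) % NUM_SETS     # which set this block belongs to

    def lookup(cache_sets, address):
        block_number = address // BLOCK_SIZE
        for tag, data in cache_sets[set_index(address)]:   # at most WAYS entries to check
            if tag == block_number:
                return data                           # hit
        return None                                   # miss

    def insert(cache_sets, address, data):
        chosen_set = cache_sets[set_index(address)]
        if len(chosen_set) >= WAYS:
            chosen_set.pop(0)                         # evict an entry (here: the oldest in the set)
        chosen_set.append((address // BLOCK_SIZE, data))

    cache_sets = [[] for _ in range(NUM_SETS)]        # each set holds up to WAYS lines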