Caches

Tom Kelliher, CS 240

Apr. 3, 2000

Administrivia

Simple cache, hits/misses.

Caches: Virtual Memory.

Consider:

Design tradeoff of unified vs. split caches: a unified cache gives a higher hit rate (better management of cache lines); split instruction and data caches give higher cache bandwidth.
Memory addresses come from PC or ALU.
Read hit sequencing.
Read miss sequencing.
Write-through, write-back caches: write-through keeps memory consistent (multiple bus masters); write-back reduces memory bus utilization. Tradeoffs.
Write hit/miss sequencing for a write-through cache.
Consider multiprocessor caches, DMA, bus mastering in write-back case. Snooping caches, MESI protocol (modified, exclusive, shared, invalid).
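
To make the write-through/write-back tradeoff concrete, here is a minimal sketch that counts how many writes reach memory under each policy. The one-block cache, the write-allocate behavior, and the name memory_writes are assumptions made for illustration, not part of the lecture.

```python
def memory_writes(stores, policy):
    """Count writes that reach memory for a sequence of store addresses.

    Toy model (assumed): the cache holds a single block, allocates on
    every write, and is flushed at the end of the run.
    """
    writes = 0
    cached = None   # block address currently in the cache
    dirty = False   # only meaningful for write-back
    for addr in stores:
        if policy == "write-through":
            writes += 1              # every store also updates memory
            cached = addr
        elif policy == "write-back":
            if cached is not None and cached != addr and dirty:
                writes += 1          # evicting a dirty block writes it back
            cached = addr
            dirty = True             # memory is now stale for this block
        else:
            raise ValueError("unknown policy: " + policy)
    if policy == "write-back" and dirty:
        writes += 1                  # final flush of the dirty block
    return writes


stores = [0x100] * 10 + [0x200] * 5
print(memory_writes(stores, "write-through"))  # 15
print(memory_writes(stores, "write-back"))     # 2
```

Between the first store and the eviction, memory holds stale data in the write-back case. That window is what snooping and MESI have to cover when there are other bus masters.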

Three cases:

Instruction fetch miss: stall pipeline until we load the instruction from a lower level in the hierarchy.
PC-4 business in the text.
Data fetch miss: stall pipeline as for an lw hazard. Or keep going until the data is actually needed.
How do you associate the incoming word with a reg file position? (Load/Store units)
Data store miss: stall pipeline. Or use a write buffer (a small queue between the cache and memory) to perform writes "later." Must stall if write buffer is full.
Loads must snoop the write buffer.
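
A rough sketch of the write buffer idea, including the stall-when-full rule and load snooping. The class name WriteBuffer, the default capacity of four entries, and the use of a dict to stand in for memory are all hypothetical choices for illustration.

```python
from collections import deque


class WriteBuffer:
    """Toy FIFO write buffer (assumed structure, not a real design)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = deque()   # (address, value) pairs, oldest first

    def store(self, addr, value):
        """Queue a write. Returns False when the pipeline must stall."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((addr, value))
        return True

    def drain_one(self, memory):
        """Retire the oldest pending write to memory (a dict here)."""
        if self.entries:
            addr, value = self.entries.popleft()
            memory[addr] = value

    def load(self, addr, memory):
        """Snoop the buffer newest-first, then fall back to memory."""
        for pending_addr, value in reversed(self.entries):
            if pending_addr == addr:
                return value
        return memory.get(addr, 0)


memory = {0x10: 1}
wb = WriteBuffer(capacity=2)
wb.store(0x10, 7)
print(wb.load(0x10, memory))  # 7: the load sees the buffered write
print(memory[0x10])           # 1: memory has not been updated yet
```

Without the snoop in load, the load above would return the stale value 1 from memory.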

Let's design another cache. Parameters:

Direct mapped.
Block size is four words.
4K blocks.
32-bit address space.

Questions:

How many bits/block? Total size of the cache?
Size of the data portion of the cache, in bytes? (This is the number you see quoted.)
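
One way to work these questions out, as a sketch. It assumes byte addressing, four-byte words, and one valid bit per block with no dirty bit. None of those is stated in the parameters above, so adjust if the design differs.

```python
ADDRESS_BITS = 32
NUM_BLOCKS = 4 * 1024      # 4K blocks
WORDS_PER_BLOCK = 4
BYTES_PER_WORD = 4         # assumption: 32-bit words, byte addressing
VALID_BITS = 1             # assumption: one valid bit, no dirty bit

bytes_per_block = WORDS_PER_BLOCK * BYTES_PER_WORD      # 16
offset_bits = bytes_per_block.bit_length() - 1          # log2(16) = 4
index_bits = NUM_BLOCKS.bit_length() - 1                # log2(4096) = 12
tag_bits = ADDRESS_BITS - index_bits - offset_bits      # 16

data_bits = bytes_per_block * 8                         # 128
bits_per_block = data_bits + tag_bits + VALID_BITS      # 145
total_bits = NUM_BLOCKS * bits_per_block                # 593920
data_bytes = NUM_BLOCKS * bytes_per_block               # 65536

print(bits_per_block)            # 145 bits/block
print(total_bits / 8 / 1024)     # 72.5 (KB in total)
print(data_bytes / 1024)         # 64.0 (KB of data, the quoted size)
```

Under these assumptions the tag and valid bits add about 13% on top of the quoted 64 KB.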

Organization of the cache:
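
As a sketch of how such a cache carves up an address: with 4K blocks of four words each, a 32-bit byte address splits into a 16-bit tag, a 12-bit index, and a 4-bit block offset. Byte addressing with four-byte words is assumed here, and split_address is a hypothetical helper name.

```python
OFFSET_BITS = 4    # 16 bytes per block (assumes four-byte words)
INDEX_BITS = 12    # 4K blocks


def split_address(addr):
    """Return (tag, index, byte offset within the block) for a 32-bit address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), hex(offset))  # 0x1234 0x567 0x8
print(offset >> 2)                        # 2: third word of the block
```

The index selects the cache entry, the stored tag is compared against the address tag to decide hit or miss, and the upper two offset bits pick the word within the block.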

Consider:

Why would we want a larger block size? (Exploit spatial locality)
Consider the hit rate when fetching eight words consecutively with block sizes of one and four words.
To a point, increasing block size increases the hit rate. At what point does this break down? Why? (Fewer blocks.)
Also, the miss penalty will increase because each miss transfers more data from memory.
How can a load instruction result in a memory write? How can a store instruction result in a memory read?
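
The eight-word question above can be checked with a small simulation. This is a sketch that assumes a cold direct-mapped cache and an access stream starting at word address 0, so the stream is block aligned. The function name hit_rate is made up for the example.

```python
def hit_rate(word_addresses, words_per_block, num_blocks=4096):
    """Hit rate of a cold direct-mapped cache over a list of word addresses."""
    tags = [None] * num_blocks          # None marks an invalid entry
    hits = 0
    for word in word_addresses:
        block = word // words_per_block
        index = block % num_blocks
        tag = block // num_blocks
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag           # miss: fetch the whole block
    return hits / len(word_addresses)


print(hit_rate(list(range(8)), words_per_block=1))  # 0.0
print(hit_rate(list(range(8)), words_per_block=4))  # 0.75
```

With one-word blocks every access is a compulsory miss. With four-word blocks only the first word of each block misses, so six of the eight accesses hit.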

Three memory organizations:

What is interleaving and its history?

Memory access delays:

Compute the bytes/cycle for each organization for four word blocks.
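
A sketch of the calculation. The delay values did not survive in these notes, so the defaults below (1 cycle to send the address, 15 cycles per DRAM access, 1 cycle per bus transfer) are illustrative assumptions only. The three organizations are also assumed to be a one-word-wide memory, a memory and bus as wide as the block, and an interleaved memory with one bank per word of the block.

```python
def miss_penalty_cycles(organization, words_per_block=4,
                        send_addr=1, dram_access=15, transfer=1):
    """Cycles to fetch one block. Default delays are assumed values."""
    if organization == "one-word-wide":
        # each word needs its own DRAM access and its own transfer
        return send_addr + words_per_block * (dram_access + transfer)
    if organization == "wide":
        # one access and one transfer move the whole block
        return send_addr + dram_access + transfer
    if organization == "interleaved":
        # banks are accessed in parallel, words are transferred one by one
        return send_addr + dram_access + words_per_block * transfer
    raise ValueError("unknown organization: " + organization)


def bytes_per_cycle(organization, words_per_block=4, bytes_per_word=4, **delays):
    cycles = miss_penalty_cycles(organization, words_per_block, **delays)
    return words_per_block * bytes_per_word / cycles


for org in ("one-word-wide", "wide", "interleaved"):
    print(org, miss_penalty_cycles(org), round(bytes_per_cycle(org), 2))
# one-word-wide 65 0.25
# wide 17 0.94
# interleaved 20 0.8
```

Substitute the delays from the lecture to get the intended numbers. The ordering of the three organizations should hold whenever the DRAM access time dominates the transfer time.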

Points to consider:

The real point of interleaving is that we now have several independent memories.
So we can perform several writes simultaneously, assuming we can write single words. This increases our write bandwidth, leading to fewer stalls.
But DRAM chip capacities keep increasing, so a given memory size needs fewer chips, which leaves fewer banks to interleave.
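
A small sketch of why the number of banks matters for write bandwidth. It assumes low-order interleaving (bank = word address mod number of banks), a 15-cycle access time, and ignores bus transfer time. The name write_cycles and all three assumptions are illustrative.

```python
from collections import Counter


def write_cycles(word_addresses, num_banks, access_cycles=15):
    """Cycles to complete a batch of single-word writes.

    Banks work in parallel; each bank performs its own writes one
    after another. Bus transfer time is ignored (assumption).
    """
    per_bank = Counter(addr % num_banks for addr in word_addresses)
    return max(per_bank.values(), default=0) * access_cycles


print(write_cycles([0, 1, 2, 3], num_banks=4))    # 15: all four overlap
print(write_cycles([0, 1, 2, 3], num_banks=2))    # 30
print(write_cycles([0, 1, 2, 3], num_banks=1))    # 60: fully serialized
print(write_cycles([0, 4, 8, 12], num_banks=4))   # 60: all hit bank 0
```

Halving the number of banks doubles the time for the same four writes, and an unlucky stride defeats interleaving altogether.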

Thomas P. Kelliher
Mon Apr 3 08:17:04 EDT 2000