Caches

Tom Kelliher, CS26

Nov. 14, 1996

From last time:

What is important:

  1. SRAM is faster than DRAM, but more expensive.

  2. DRAM is faster than disk, but more expensive.

How much faster? How much more expensive?

What do we really want?

  1. A main memory that runs at CPU speed.

  2. A main memory with capacity of a disk.

How do we achieve these goals?

Cache Concepts

Placed between memory and requestor:

Requestor: the CPU, or another cache in a multi-level hierarchy.

Characteristics:

  1. Memory: ``slow,'' but large.

  2. Cache: ``fast,'' but small; holds a representative subset of memory.

Hit rate: the fraction of memory accesses satisfied directly by the cache.

How can hit rates be so high? --- Locality of reference.
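To see it concretely, here is a small C fragment (illustrative, not from the lecture). The scan through a[] marches through consecutive addresses, so each fetched cache line serves several accesses (spatial locality), while sum and i are reused on every iteration (temporal locality):

#include <stdio.h>

int main(void)
{
    int a[1024];
    long sum = 0;
    int i;

    for (i = 0; i < 1024; i++)
        a[i] = i;            /* sequential writes: spatial locality      */

    for (i = 0; i < 1024; i++)
        sum += a[i];         /* sum reused every pass: temporal locality */

    printf("sum = %ld\n", sum);
    return 0;
}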

Terminology

  1. Miss --- Requested memory word is not in any cache line. (A hit: the word is found in some cache line.)

  2. Miss penalty --- the extra time required to access a block when it's not in cache.

  3. Block, cache line --- a contiguous group of memory words, aligned on a power-of-2 boundary. The unit of data transfer between cache and memory (which an interleaved memory can supply at high bandwidth).

  4. Mapping function --- the function that determines the cache line(s) in which a block may be placed.

  5. Replacement algorithm --- A rule for selecting a ``victim'' block for removal if all possible mapped cache lines are in use.

  6. Load through --- on a read miss, the desired word is forwarded to the CPU as soon as it arrives, in parallel with the rest of the block being written into the cache line.

  7. Average access time: t_avg = h t_c + (1 - h)(t_c + t_p), where h is the hit rate, t_c the cache access time, and t_p the miss penalty.

  8. Speed-up (5.6.2): the ratio of the average access time without the cache to the average access time with it.
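As a quick illustration with assumed numbers (95% hit rate, 1-cycle cache, 10-cycle miss penalty; these are not figures from the text):

#include <stdio.h>

int main(void)
{
    double h   = 0.95;   /* hit rate (assumed)               */
    double t_c = 1.0;    /* cache access time, cycles        */
    double t_p = 10.0;   /* miss penalty, cycles             */
    double t_m = 10.0;   /* memory access time with no cache */
    double t_avg = h * t_c + (1.0 - h) * (t_c + t_p);

    printf("average access time: %.2f cycles\n", t_avg);   /* 1.50 */
    printf("speed-up over no cache: %.2f\n", t_m / t_avg); /* 6.67 */
    return 0;
}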

Update protocols:

  1. Write through --- on a write, the in-cache and in-memory copies of the block are updated together. On a write miss the block is not brought into the cache.

  2. Write back, copy back --- on a write, only the in-cache copy of the block is updated (and marked dirty); memory is updated when the block is replaced. On a write miss the block is brought into the cache.
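A minimal sketch of the two protocols (structure and function names are illustrative):

typedef struct {
    int      valid;
    int      dirty;            /* used only by write back */
    unsigned tag;
    unsigned data[8];          /* 8-word block */
} line_t;

/* Write through: update cache (on a hit) and memory together;
   a write miss leaves the cache untouched. */
void write_through(line_t *line, int hit, int word, unsigned value,
                   unsigned *mem_word)
{
    if (hit)
        line->data[word] = value;
    *mem_word = value;         /* memory is always updated */
}

/* Write back: update and dirty only the cached copy; memory is
   brought up to date when the line is replaced.  On a write miss
   the block would first be fetched into line (fetch elided). */
void write_back(line_t *line, int word, unsigned value)
{
    line->data[word] = value;
    line->dirty = 1;
}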

Update protocol issues:

  1. Cache/memory consistency/coherence.

  2. Snooping caches in multiprocessor or bus-mastering systems.

  3. Memory traffic: write back reduces memory writes by eliminating redundant writes to the same block.

  4. Complexity.

  5. Actions on replacement.

Mapping Functions

Direct, fully-associative, set-associative.

Direct-Mapped Caches

A memory block is mapped to a single cache line.

How does the cache system know what block is in a cache line? (How do we check for a hit?)

Each cache entry contains:

  1. A tag.

  2. A memory block.

  3. Status bits: valid, dirty, etc.

Consider a system:

  1. 16MB memory, byte addressable.

  2. 32-bit word size.

  3. 8KB cache.

  4. Cache line is 8 words.

Questions:

  1. How many bits in an address?

  2. How many cache lines?

  3. How is a memory address partitioned?

  4. How many bits per tag?
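Working the numbers: 16 MB means 24-bit addresses; an 8-word line is 32 bytes, giving 5 offset bits; 8 KB / 32 B = 256 lines, giving an 8-bit block field; the remaining 24 - 8 - 5 = 11 bits form the tag. The same arithmetic as a C sketch:

#include <stdio.h>

int main(void)
{
    int addr_bits   = 24;                     /* 16 MB = 2^24 bytes */
    int line_bytes  = 8 * 4;                  /* 8 words x 4 bytes  */
    int offset_bits = 5;                      /* 32 = 2^5           */
    int lines       = 8 * 1024 / line_bytes;  /* 8 KB / 32 B = 256  */
    int block_bits  = 8;                      /* 256 = 2^8          */
    int tag_bits    = addr_bits - block_bits - offset_bits;

    printf("cache lines: %d\n", lines);
    printf("address: tag %d | block %d | offset %d\n",
           tag_bits, block_bits, offset_bits);
    return 0;
}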

Cache look-up on memory access:

  1. Use block field to locate cache line.

  2. Compare tags.

  3. If hit: access word/byte from cache line.

    If miss: start/continue memory operation.
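In software terms (a sketch with illustrative names; the hardware uses a comparator, not a function call):

#define NLINES 256

typedef struct {
    int      valid;
    unsigned tag;
    unsigned data[8];                       /* 8-word block */
} line_t;

line_t cache[NLINES];

/* Returns 1 and the word on a hit; 0 on a miss, at which point
   the memory operation would start/continue. */
int dm_lookup(unsigned addr, unsigned *word)
{
    unsigned offset = (addr >> 2) & 0x7;    /* word within block */
    unsigned block  = (addr >> 5) & 0xff;   /* block field       */
    unsigned tag    = addr >> 13;           /* high 11 bits      */
    line_t  *line   = &cache[block];

    if (line->valid && line->tag == tag) {  /* compare tags */
        *word = line->data[offset];
        return 1;                           /* hit */
    }
    return 0;                               /* miss */
}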

Fully-Associative Caches

A memory block can be placed in any cache line.

Cache look-up on memory access:

  1. Compare tag of memory address with all valid cache tags (in parallel).

  2. If hit: access word/byte from appropriate cache line.

    If miss: start/continue memory operation.

This requires an associative (content-addressable) memory for the tag store.

Consider a system:

  1. 16MB memory, byte addressable.

  2. 32-bit word size.

  3. 8KB cache.

  4. Cache line is 8 words.

Questions:

  1. How is a memory address partitioned?

  2. How many bits per tag?
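Worked: with no block field, the 24-bit address splits into just a 19-bit tag and a 5-bit offset. A sketch of the look-up, the loop standing in for the hardware's parallel tag comparison:

#define NLINES 256

typedef struct {
    int      valid;
    unsigned tag;
    unsigned data[8];
} line_t;

line_t cache[NLINES];

int fa_lookup(unsigned addr, unsigned *word)
{
    unsigned offset = (addr >> 2) & 0x7;    /* word within block */
    unsigned tag    = addr >> 5;            /* high 19 bits      */
    int i;

    for (i = 0; i < NLINES; i++)            /* parallel in hardware */
        if (cache[i].valid && cache[i].tag == tag) {
            *word = cache[i].data[offset];
            return 1;                       /* hit */
        }
    return 0;                               /* miss */
}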

Set-Associative Caches

A memory block can be placed into one of a small set of cache lines.

Cache look-up on memory access:

  1. Locate set and compare tag of memory address with valid cache tags within set (in parallel).

  2. If hit: access word/byte from appropriate cache line.

    If miss: start/continue memory operation.

Consider a system:

  1. 16MB memory, byte addressable.

  2. 32-bit word size.

  3. 8KB cache.

  4. Cache line is 8 words.

  5. Four lines per set.

Questions:

  1. How many sets?

  2. How is a memory address partitioned?

  3. How many bits per tag?
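Working the numbers: 256 lines in sets of four gives 64 sets, hence a 6-bit set field; the tag is the remaining 24 - 6 - 5 = 13 bits. As a sketch:

#include <stdio.h>

int main(void)
{
    int lines    = 256;                /* 8 KB / 32 B, as before */
    int ways     = 4;                  /* lines per set          */
    int sets     = lines / ways;       /* 64 sets                */
    int set_bits = 6;                  /* 64 = 2^6               */
    int tag_bits = 24 - set_bits - 5;  /* 13-bit tag             */

    printf("sets: %d\n", sets);
    printf("address: tag %d | set %d | offset 5\n", tag_bits, set_bits);
    return 0;
}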

Replacement

Needed whenever an incoming block can be placed in more than one line, i.e., in fully- and set-associative caches.

How do we pick the victim?

  1. Ideally: replace the block that won't be used for the longest time (optimal, but requires knowing the future).

  2. FIFO: replace the oldest block.

  3. LIFO: replace the newest block.

  4. LRU: replace the block unused for the longest time.

  5. Random: replace a randomly chosen block. Surprisingly effective.
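As one concrete possibility, LRU victim selection can be sketched with per-line timestamps (names are illustrative; real caches use cheaper approximations, which is the feasibility question below):

#define WAYS 4

typedef struct {
    int           valid;
    unsigned      tag;
    unsigned long last_used;       /* time of most recent access */
} way_t;

/* Pick the victim within a set: a free (invalid) line if one
   exists, otherwise the line unused for the longest time. */
int lru_victim(way_t set[WAYS])
{
    int i, oldest = 0;

    for (i = 0; i < WAYS; i++) {
        if (!set[i].valid)
            return i;                       /* free line */
        if (set[i].last_used < set[oldest].last_used)
            oldest = i;
    }
    return oldest;
}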

Implementation:

  1. Feasibility.

  2. Additional control bits?

Comparison

Issues:

  1. Complexity: placement, search, replacement.

  2. Ability to capture locality, i.e., to raise the hit rate.

Direct-mapped and fully-associative caches are the two ends of a continuum; set-associative organizations fill the middle.

The effect of block-size on hit rate.

The effect of set-size on hit rate.

On-chip vs. off-chip caches: speed, size. Design constraints.

L3 caches: write buffering (consider write-through, write-back), call-backs.

Lock-up: cache delays further requests while servicing a miss. Effect upon pipelined processors.

How would one design a cache system for a real system?


