Caches
Tom Kelliher, CS26
Nov. 14, 1996
From last time:
- Lots of lines, arrows, and rectangles. Mass confusion?
What is important:
- SRAM is faster than DRAM, but more expensive.
- DRAM is faster than disk, but more expensive.
How much faster? How much more expensive?
What do we really want?
- A main memory that runs at CPU speed.
- A main memory with the capacity of a disk.
How do we achieve these goals?
A cache is placed between a memory and a requestor:
Requestor: typically the CPU (or a higher-level cache in a multi-level hierarchy).
Characteristics:
- Memory: ``slow''; large.
- Cache: ``fast''; small, yet representative.
Hit rate: the fraction of accesses satisfied directly by the cache.
How can hit rates be so high? --- Locality of reference.
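For example, a simple summation loop exhibits both kinds of locality. A minimal C illustration (the function is hypothetical, just to make the point):

    /* Spatial locality: consecutive elements of a[] fall in the same
       block, so one miss brings in the next several elements for free.
       Temporal locality: sum and i are re-referenced on every iteration. */
    int sum_array(const int a[], int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }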
- Miss --- the requested memory word is not in any cache line. (If it is,
a hit.)
- Miss penalty --- the extra time required to access a block when it is
not in the cache.
- Block, cache line --- a contiguous group of memory words, aligned
on a power-of-2 boundary. Unit of data transfer between cache and memory
(interleaving).
- Mapping function --- the function used to determine which cache line(s)
a block may be placed in.
- Replacement algorithm --- A rule for selecting a ``victim'' block for
removal if all possible mapped cache lines are in use.
- Load through --- on a read miss, the desired word is forwarded to the CPU
at the same time the block is being written into the cache.
- Average access time: t_avg = h*t_cache + (1 - h)*(t_cache + t_penalty),
where h is the hit rate and t_penalty is the miss penalty.
- Speed-up (5.6.2): S = t_no_cache / t_avg.
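Plugging in made-up numbers (a 95% hit rate, 10 ns cache, 100 ns memory; illustrative values, not measurements):

    #include <stdio.h>

    int main(void)
    {
        double h = 0.95;          /* assumed hit rate */
        double t_cache = 10.0;    /* assumed cache access time, ns */
        double t_mem = 100.0;     /* assumed memory access time, taken
                                     as the miss penalty here, ns */

        /* t_avg = h*t_cache + (1 - h)*(t_cache + t_penalty) */
        double t_avg = h * t_cache + (1.0 - h) * (t_cache + t_mem);

        /* Speed-up vs. running every access from memory, no cache. */
        double speedup = t_mem / t_avg;

        printf("t_avg = %.1f ns, speed-up = %.1f\n", t_avg, speedup);
        return 0;
    }

With these assumed numbers, t_avg = 15 ns and the speed-up is about 6.7.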
Update protocols:
- Write through --- on a write, the in-cache and in-memory copies are
updated simultaneously. On a write miss the block is not cached.
- Write back, copy back --- on a write, only the in-cache copy is
updated and the line is marked dirty. On a write miss the block is cached.
Update protocol issues:
- Cache/memory consistency/coherence.
- Snooping caches in multiprocessor or bus-mastering systems.
- Reduction in memory traffic: write back coalesces repeated writes to a
block, so there are fewer redundant memory writes.
- Complexity.
- Actions on replacement.
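A minimal sketch of the two update protocols on a store. The line structure and the helpers memory_write and fetch_block are illustrative stand-ins, not from any real system:

    struct line {
        unsigned tag;
        int valid;
        int dirty;               /* needed only for write back */
        unsigned char data[32];  /* one 8-word block */
    };

    /* Assumed helpers, declared but not defined here. */
    void memory_write(unsigned addr, unsigned char byte);
    void fetch_block(struct line *l, unsigned addr);

    /* Write through: memory is always updated; a write miss does not
       allocate the block in the cache. */
    void store_wt(struct line *l, int hit, unsigned addr, unsigned char byte)
    {
        if (hit)
            l->data[addr % 32] = byte;  /* keep cached copy current */
        memory_write(addr, byte);
    }

    /* Write back: only the cached copy is updated; a write miss first
       brings the block in.  Memory is updated at replacement time. */
    void store_wb(struct line *l, int hit, unsigned addr, unsigned char byte)
    {
        if (!hit)
            fetch_block(l, addr);       /* allocate on write miss */
        l->data[addr % 32] = byte;
        l->dirty = 1;                   /* memory copy is now stale */
    }

Note the trade-off listed above: write through keeps memory consistent at the cost of a memory write per store; write back defers its memory writes until a dirty block is replaced.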
Mapping functions: direct, fully-associative, set-associative.
Direct mapping: a memory block is mapped to a single cache line:
How does the cache system know what block is in a cache line? (How do we
check for a hit?)
Each cache entry contains:
- A tag.
- A memory block.
- Status bits: valid, dirty, etc.
Consider a system:
- 16MB memory, byte addressable.
- 32-bit word size.
- 8KB cache.
- Cache line is 8 words.
Questions (worked out in the sketch following the look-up steps):
- How many bits in an address?
- How many cache lines?
- How is a memory address partitioned?
- How many bits per tag?
Cache look-up on memory access:
- Use block field to locate cache line.
- Compare tags.
- If hit: access word/byte from cache line.
If miss: start/continue memory operation.
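Working the questions through: 16MB of memory means 24-bit addresses; a line is 8 words x 4 bytes = 32 bytes, so the offset takes 5 bits; 8KB / 32 bytes = 256 lines, an 8-bit block field; the tag is the remaining 24 - 8 - 5 = 11 bits. A C sketch of the look-up (structure and names are illustrative):

    #define NLINES 256  /* 8KB cache / 32-byte lines */

    struct line {
        unsigned tag;             /* 11 bits */
        int valid;
        unsigned char data[32];   /* 8 words x 4 bytes */
    };

    struct line cache[NLINES];

    /* addr is a 24-bit byte address, partitioned as
       | tag (11) | block (8) | word (3) | byte (2) |.
       Returns 1 on a hit. */
    int lookup(unsigned addr)
    {
        unsigned block = (addr >> 5) & 0xFF;  /* block field locates the line */
        unsigned tag   = addr >> 13;          /* compare against stored tag */

        return cache[block].valid && cache[block].tag == tag;
    }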
Fully associative: a memory block can be placed in any cache line.
Cache look-up on memory access:
- Compare the tag of the memory address with all valid cache tags (in
parallel).
- If hit: access word/byte from appropriate cache line.
If miss: start/continue memory operation.
This requires an associative, or content-addressable, memory.
Consider a system:
- 16MB memory, byte addressable.
- 32-bit word size.
- 8KB cache.
- Cache line is 8 words.
Questions:
- How is a memory address partitioned?
- How many bits per tag?
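Worked through: with no block field, the 24-bit address splits into just a 19-bit tag and a 5-bit offset, and every valid tag in the cache must be compared. The hardware compares them all at once; the C loop below is only its sequential equivalent (names illustrative):

    #define NLINES 256

    struct line {
        unsigned tag;    /* 19 bits: 24 - 5 (offset) */
        int valid;
        unsigned char data[32];
    };

    struct line cache[NLINES];

    /* Returns the matching line's index, or -1 on a miss. */
    int lookup(unsigned addr)
    {
        unsigned tag = addr >> 5;

        for (int i = 0; i < NLINES; i++)   /* parallel in real hardware */
            if (cache[i].valid && cache[i].tag == tag)
                return i;
        return -1;
    }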
Set associative: a memory block can be placed into one of a small set of cache lines:
Cache look-up on memory access:
- Locate the set and compare the tag of the memory address with the valid
cache tags within the set (in parallel).
- If hit: access word/byte from appropriate cache line.
If miss: start/continue memory operation.
Consider a system:
- 16MB memory, byte addressable.
- 32-bit word size.
- 8KB cache.
- Cache line is 8 words.
- Four lines per set.
Questions:
- How many sets?
- How is a memory address partitioned?
- How many bits per tag?
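Worked through: 256 lines / 4 lines per set = 64 sets, so the set field is 6 bits and the tag is 24 - 6 - 5 = 13 bits. A sketch (illustrative names again):

    #define NSETS 64
    #define WAYS  4

    struct line {
        unsigned tag;    /* 13 bits */
        int valid;
        unsigned char data[32];
    };

    struct line cache[NSETS][WAYS];

    /* addr: | tag (13) | set (6) | word (3) | byte (2) |.
       Locate the set, then compare its four tags (in parallel in
       hardware).  Returns the way index, or -1 on a miss. */
    int lookup(unsigned addr)
    {
        unsigned set = (addr >> 5) & 0x3F;
        unsigned tag = addr >> 11;

        for (int way = 0; way < WAYS; way++)
            if (cache[set][way].valid && cache[set][way].tag == tag)
                return way;
        return -1;
    }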
Replacement algorithms: needed when an incoming block can replace any one of
a set of blocks. How do we pick the victim?
- Ideally: replace the block that won't be used for the longest time
(optimal, but requires knowing the future).
- FIFO: replace oldest block.
- LIFO: replace newest block.
- LRU: replace block not used in longest time.
- Random: Surprisingly effective.
Implementation:
- Feasibility.
- Additional control bits?
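One feasible scheme for a four-way set keeps a small age counter per line (exactly the ``additional control bits'' asked about). A sketch of this one common implementation, with illustrative names:

    #define WAYS 4

    struct line {
        unsigned tag;
        int valid;
        unsigned age;   /* 2 extra bits per line: 0 = most recently used */
    };

    /* On each reference, promote the used line and age the younger ones. */
    void touch(struct line set[WAYS], int used)
    {
        for (int way = 0; way < WAYS; way++)
            if (set[way].age < set[used].age)
                set[way].age++;
        set[used].age = 0;
    }

    /* Victim selection: an invalid line if any, else the oldest. */
    int victim(struct line set[WAYS])
    {
        int oldest = 0;
        for (int way = 0; way < WAYS; way++) {
            if (!set[way].valid)
                return way;
            if (set[way].age > set[oldest].age)
                oldest = way;
        }
        return oldest;
    }

Two bits per line suffice for four ways; true LRU at much higher associativity gets expensive, which is one reason random replacement is attractive.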
Issues:
- Complexity: placement, search, replacement.
- Ability to capture locality, i.e., to raise the hit rate.
Direct-mapped and fully-associative are the two ends of a continuum; set-associative fills in the middle.
The effect of block-size on hit rate.
The effect of set-size on hit rate.
On-chip vs. off-chip caches: speed, size. Design constraints.
L3 caches: write buffering (consider write-through, write-back), call-backs.
Lock-up: the cache delays further requests while servicing a miss. What is
the effect upon pipelined processors?
How would one design a cache system for a real system?