Caches

Tom Kelliher, CS26

Nov. 14, 1996

From last time:

What is important:

  1. SRAM is faster than DRAM, but more expensive.

  2. DRAM is faster than disk, but more expensive.

How much faster? How much more expensive?

What do we really want?

  1. A main memory that runs at CPU speed.

  2. A main memory with capacity of a disk.

How do we achieve these goals?

Cache Concepts

Placed between memory and requestor:

Requestor: the CPU, or another cache in a multi-level hierarchy.

Characteristics:

  1. Memory: ``slow,'' but large.

  2. Cache: ``fast,'' but small; holds a representative subset of memory.

Hit rate: the fraction of memory accesses satisfied directly by the cache.

How can hit rates be so high? --- Locality of reference.
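To see it concretely, here is a small C fragment (illustrative, not from the lecture). The scan through a[] marches through consecutive addresses, so each fetched cache line serves several accesses (spatial locality), while sum and i are reused on every iteration (temporal locality):

#include <stdio.h>

int main(void)
{
    int a[1024];
    long sum = 0;
    int i;

    for (i = 0; i < 1024; i++)
        a[i] = i;            /* sequential writes: spatial locality      */

    for (i = 0; i < 1024; i++)
        sum += a[i];         /* sum reused every pass: temporal locality */

    printf("sum = %ld\n", sum);
    return 0;
}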

Terminology

  1. Miss --- Requested memory word is not in any cache line. (A hit: the word is found in some cache line.)

  2. Miss penalty --- the extra time required to access a block when it's not in cache.

  3. Block, cache line --- a contiguous group of memory words, aligned on a power-of-2 boundary. The unit of data transfer between cache and memory (which an interleaved memory can supply at high bandwidth).

  4. Mapping function --- the function that determines the cache line(s) in which a block may be placed.

  5. Replacement algorithm --- A rule for selecting a ``victim'' block for removal if all possible mapped cache lines are in use.

  6. Load through --- on a read miss, the desired word is forwarded to the CPU as soon as it arrives, in parallel with the rest of the block being written into the cache line.

  7. Average access time: t_avg = h t_c + (1 - h)(t_c + t_p), where h is the hit rate, t_c the cache access time, and t_p the miss penalty.

  8. Speed-up (5.6.2): the ratio of the average access time without the cache to the average access time with it.
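As a quick illustration with assumed numbers (95% hit rate, 1-cycle cache, 10-cycle miss penalty; these are not figures from the text):

#include <stdio.h>

int main(void)
{
    double h   = 0.95;   /* hit rate (assumed)               */
    double t_c = 1.0;    /* cache access time, cycles        */
    double t_p = 10.0;   /* miss penalty, cycles             */
    double t_m = 10.0;   /* memory access time with no cache */
    double t_avg = h * t_c + (1.0 - h) * (t_c + t_p);

    printf("average access time: %.2f cycles\n", t_avg);   /* 1.50 */
    printf("speed-up over no cache: %.2f\n", t_m / t_avg); /* 6.67 */
    return 0;
}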

Update protocols:

  1. Write through --- on a write, the in-cache and in-memory copies of the block are updated together. On a write miss the block is not brought into the cache.

  2. Write back, copy back --- on a write, only the in-cache copy of the block is updated (and marked dirty); memory is updated when the block is replaced. On a write miss the block is brought into the cache.
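A minimal sketch of the two protocols (structure and function names are illustrative):

typedef struct {
    int      valid;
    int      dirty;            /* used only by write back */
    unsigned tag;
    unsigned data[8];          /* 8-word block */
} line_t;

/* Write through: update cache (on a hit) and memory together;
   a write miss leaves the cache untouched. */
void write_through(line_t *line, int hit, int word, unsigned value,
                   unsigned *mem_word)
{
    if (hit)
        line->data[word] = value;
    *mem_word = value;         /* memory is always updated */
}

/* Write back: update and dirty only the cached copy; memory is
   brought up to date when the line is replaced.  On a write miss
   the block would first be fetched into line (fetch elided). */
void write_back(line_t *line, int word, unsigned value)
{
    line->data[word] = value;
    line->dirty = 1;
}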

Update protocol issues:

  1. Cache/memory consistency/coherence.

  2. Snooping caches in multiprocessor or bus-mastering systems.

  3. Memory traffic: write back reduces memory writes by eliminating redundant writes to the same block.

  4. Complexity.

  5. Actions on replacement.

Mapping Functions

Direct, fully-associative, set-associative.

Direct-Mapped Caches

A memory block is mapped to a single cache line.

How does the cache system know what block is in a cache line? (How do we check for a hit?)

Each cache entry contains:

  1. A tag.

  2. A memory block.

  3. Status bits: valid, dirty, etc.

Consider a system:

  1. 16MB memory, byte addressable.

  2. 32-bit word size.

  3. 8KB cache.

  4. Cache line is 8 words.

Questions:

  1. How many bits in an address?

  2. How many cache lines?

  3. How is a memory address partitioned?

  4. How many bits per tag?
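Working the numbers: 16 MB means 24-bit addresses; an 8-word line is 32 bytes, giving 5 offset bits; 8 KB / 32 B = 256 lines, giving an 8-bit block field; the remaining 24 - 8 - 5 = 11 bits form the tag. The same arithmetic as a C sketch:

#include <stdio.h>

int main(void)
{
    int addr_bits   = 24;                     /* 16 MB = 2^24 bytes */
    int line_bytes  = 8 * 4;                  /* 8 words x 4 bytes  */
    int offset_bits = 5;                      /* 32 = 2^5           */
    int lines       = 8 * 1024 / line_bytes;  /* 8 KB / 32 B = 256  */
    int block_bits  = 8;                      /* 256 = 2^8          */
    int tag_bits    = addr_bits - block_bits - offset_bits;

    printf("cache lines: %d\n", lines);
    printf("address: tag %d | block %d | offset %d\n",
           tag_bits, block_bits, offset_bits);
    return 0;
}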

Cache look-up on memory access:

  1. Use block field to locate cache line.

  2. Compare tags.

  3. If hit: access word/byte from cache line.

    If miss: start/continue memory operation.
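In software terms (a sketch with illustrative names; the hardware uses a comparator, not a function call):

#define NLINES 256

typedef struct {
    int      valid;
    unsigned tag;
    unsigned data[8];                       /* 8-word block */
} line_t;

line_t cache[NLINES];

/* Returns 1 and the word on a hit; 0 on a miss, at which point
   the memory operation would start/continue. */
int dm_lookup(unsigned addr, unsigned *word)
{
    unsigned offset = (addr >> 2) & 0x7;    /* word within block */
    unsigned block  = (addr >> 5) & 0xff;   /* block field       */
    unsigned tag    = addr >> 13;           /* high 11 bits      */
    line_t  *line   = &cache[block];

    if (line->valid && line->tag == tag) {  /* compare tags */
        *word = line->data[offset];
        return 1;                           /* hit */
    }
    return 0;                               /* miss */
}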

Fully-Associative Caches

A memory block can be placed in any cache line.

Cache look-up on memory access:

  1. Compare tag of memory address with all valid cache tags (in parallel).

  2. If hit: access word/byte from appropriate cache line.

    If miss: start/continue memory operation.

This requires an associative (content-addressable) memory for the tag store.

Consider a system:

  1. 16MB memory, byte addressable.

  2. 32-bit word size.

  3. 8KB cache.

  4. Cache line is 8 words.

Questions:

  1. How is a memory address partitioned?

  2. How many bits per tag?
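Worked: with no block field, the 24-bit address splits into just a 19-bit tag and a 5-bit offset. A sketch of the look-up, the loop standing in for the hardware's parallel tag comparison:

#define NLINES 256

typedef struct {
    int      valid;
    unsigned tag;
    unsigned data[8];
} line_t;

line_t cache[NLINES];

int fa_lookup(unsigned addr, unsigned *word)
{
    unsigned offset = (addr >> 2) & 0x7;    /* word within block */
    unsigned tag    = addr >> 5;            /* high 19 bits      */
    int i;

    for (i = 0; i < NLINES; i++)            /* parallel in hardware */
        if (cache[i].valid && cache[i].tag == tag) {
            *word = cache[i].data[offset];
            return 1;                       /* hit */
        }
    return 0;                               /* miss */
}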

Set-Associative Caches

A memory block can be placed into one of a small set of cache lines.

Cache look-up on memory access:

  1. Locate set and compare tag of memory address with valid cache tags within set (in parallel).

  2. If hit: access word/byte from appropriate cache line.

    If miss: start/continue memory operation.

Consider a system:

  1. 16MB memory, byte addressable.

  2. 32-bit word size.

  3. 8KB cache.

  4. Cache line is 8 words.

  5. Four lines per set.

Questions:

  1. How many sets?

  2. How is a memory address partitioned?

  3. How many bits per tag?
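Working the numbers: 256 lines in sets of four gives 64 sets, hence a 6-bit set field; the tag is the remaining 24 - 6 - 5 = 13 bits. As a sketch:

#include <stdio.h>

int main(void)
{
    int lines    = 256;                /* 8 KB / 32 B, as before */
    int ways     = 4;                  /* lines per set          */
    int sets     = lines / ways;       /* 64 sets                */
    int set_bits = 6;                  /* 64 = 2^6               */
    int tag_bits = 24 - set_bits - 5;  /* 13-bit tag             */

    printf("sets: %d\n", sets);
    printf("address: tag %d | set %d | offset 5\n", tag_bits, set_bits);
    return 0;
}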

Replacement

Needed whenever an incoming block can be placed in more than one line, i.e., in fully- and set-associative caches.

How do we pick the victim?

  1. Ideally: replace the block that won't be used for the longest time (optimal, but requires knowing the future).

  2. FIFO: replace the oldest block.

  3. LIFO: replace the newest block.

  4. LRU: replace the block unused for the longest time.

  5. Random: replace a randomly chosen block. Surprisingly effective.
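As one concrete possibility, LRU victim selection can be sketched with per-line timestamps (names are illustrative; real caches use cheaper approximations, which is the feasibility question below):

#define WAYS 4

typedef struct {
    int           valid;
    unsigned      tag;
    unsigned long last_used;       /* time of most recent access */
} way_t;

/* Pick the victim within a set: a free (invalid) line if one
   exists, otherwise the line unused for the longest time. */
int lru_victim(way_t set[WAYS])
{
    int i, oldest = 0;

    for (i = 0; i < WAYS; i++) {
        if (!set[i].valid)
            return i;                       /* free line */
        if (set[i].last_used < set[oldest].last_used)
            oldest = i;
    }
    return oldest;
}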

Implementation:

  1. Feasibility.

  2. Additional control bits?

Comparison

Issues:

  1. Complexity: placement, search, replacement.

  2. Ability to capture locality, i.e., to raise the hit rate.

Direct-mapped and fully-associative caches are the two ends of a continuum; set-associative organizations fill the middle.

The effect of block-size on hit rate.

The effect of set-size on hit rate.

On-chip vs. off-chip caches: speed, size. Design constraints.

L3 caches: write buffering (consider write-through, write-back), call-backs.

Lock-up: cache delays further requests while servicing a miss. Effect upon pipelined processors.

How would one design a cache system for a real system?


