Cache Performance and Increasing It

Tom Kelliher, CS 240

Apr. 7, 2000

Administrivia

Announcements

Assignment

Study multilevel caches section, pg. 576, on your own.

Read 7.4.

From Last Time

Exploiting spatial locality.

Memory system design.

Outline

  1. Measuring cache performance.

  2. Using associativity to increase hit rate.

Coming Up

Virtual memory.

Measuring Cache Performance

Two Equations:
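
The standard pair of relations for this kind of analysis, and the pair the examples below rely on, can be written as follows. The symbol names are assumptions chosen for this sketch, not notation taken from the lecture.

```latex
\begin{align*}
\text{CPU time} &= (\text{CPU execution cycles} + \text{memory-stall cycles})
                   \times \text{clock cycle time} \\
\text{Memory-stall cycles} &= \text{instructions} \times
                   \frac{\text{misses}}{\text{instruction}} \times \text{miss penalty}
\end{align*}
```

Dividing both sides of the first relation by the instruction count gives an effective CPI: the base CPI plus the stall cycles per instruction.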

Example One

Consider a gcc run:

  1. I-cache miss rate is 2%. D-cache miss rate is 4%.

  2. CPI is 2, assuming no memory stalls.

  3. Miss penalty is 40 cycles.

  4. 36% of instructions are load or store.

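What would be the speed-up of a machine with a perfect cache? (About 1.69: stalls add 0.02 × 40 + 0.36 × 0.04 × 40 = 1.376 cycles per instruction, so the ratio is 3.376 / 2.)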

Example Two

Suppose we improve the pipeline so that the CPI is now 1. Repeat. (About 2.38: stall cycles per instruction are unchanged at 1.376, so the ratio is 2.376 / 1.)

Example Three

Compare the performance of a real machine with a doubled clock rate against the original real machine. Assume the memory system keeps the same absolute cycle time, so the miss penalty becomes 80 processor cycles. (About 1.42.)
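
The three examples can be checked with a short calculation. This is a minimal sketch: the function name and parameter names are invented for illustration, and it assumes every instruction is fetched through the I-cache while only loads and stores touch the D-cache.

```python
def cpi_with_stalls(base_cpi, i_miss, d_miss, mem_frac, penalty):
    """Effective CPI = base CPI + memory-stall cycles per instruction."""
    instr_stalls = i_miss * penalty             # every instruction is fetched
    data_stalls = mem_frac * d_miss * penalty   # only loads/stores access data
    return base_cpi + instr_stalls + data_stalls

# Example One: real machine (CPI 2 plus stalls) versus a perfect cache (CPI 2).
real = cpi_with_stalls(2.0, 0.02, 0.04, 0.36, 40)
ex1 = real / 2.0

# Example Two: same memory system, base CPI improved to 1.
ex2 = cpi_with_stalls(1.0, 0.02, 0.04, 0.36, 40) / 1.0

# Example Three: clock rate doubled, memory unchanged, so the penalty is
# 80 cycles and each cycle takes half as long.
fast = cpi_with_stalls(2.0, 0.02, 0.04, 0.36, 80)
ex3 = real / (fast / 2.0)

print(round(ex1, 2), round(ex2, 2), round(ex3, 2))
```

Without intermediate rounding the three ratios come out near 1.69, 2.38, and 1.42; rounding the data-stall term down before adding gives slightly smaller figures.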

Increasing Cache Performance: Associativity

See example on pg. 571 for performance speedup.

CAMs: Content Addressable Memories

  1. Traditional memories are addressed by an index.

    Finding a given value requires a sequential search.

  2. Suppose you could address by content, searching every location in parallel?

    (In practice you match on a key field, the tag, rather than on the whole entry.)

    Example: associative arrays in Perl.

  3. How would a CAM be organized? Components in addition to data storage:
    1. Storage for tag bits.

    2. Tag comparators for each CAM location.

    3. Data selection mux.
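
The three components listed above can be modeled in software. This is only a behavioral sketch: the class name `CAM` and its methods are hypothetical, the valid bits are an added assumption, and the loop stands in for comparators that real hardware evaluates simultaneously.

```python
class CAM:
    """Behavioral model of a content-addressable memory."""

    def __init__(self, size):
        self.valid = [False] * size   # assumed valid bit per location
        self.tags = [None] * size     # storage for tag bits
        self.data = [None] * size     # data storage

    def write(self, slot, tag, value):
        self.valid[slot] = True
        self.tags[slot] = tag
        self.data[slot] = value

    def lookup(self, tag):
        # One comparator per location; hardware does all of these at once.
        match = [v and t == tag for v, t in zip(self.valid, self.tags)]
        # Data selection mux: pass through the data of the matching location.
        for slot, hit in enumerate(match):
            if hit:
                return True, self.data[slot]
        return False, None


cam = CAM(4)
cam.write(2, 0xAB, "block A")
print(cam.lookup(0xAB))
print(cam.lookup(0xCD))
```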

Fully Associative Caches

  1. One set; all blocks in this set.

  2. The entire cache is organized as a CAM.

  3. Format of address: tag, word offset, byte offset.

  4. Benefit: Maximum flexibility because an incoming block can be placed anywhere, allowing us to optimize the replacement policy.

    (The ideal choice would require predicting the future.)

  5. Cost: One tag comparator per block, using up silicon resources and increasing access time.

Set-Associative Caches

A compromise.

  1. n blocks per set (an n-way set-associative cache).

    Blocks evenly distributed among sets.

  2. Number of sets a power of two, number of blocks/set a power of two. Why?

  3. Associative search within a set. Number of comparators is n.

  4. Format of address: tag, set index, word offset, byte offset.

  5. Some flexibility in block placement. Less cost than fully associative because of reduced number of comparators.

  6. Organization:
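
The address format in item 4 can be sketched as a few shifts and masks, which also suggests why powers of two are convenient: each field is a fixed run of bits, so no division is needed. The function name and defaults (4 words per block, 4 bytes per word) are assumptions for illustration.

```python
def split_address(addr, num_blocks, ways, words_per_block=4, bytes_per_word=4):
    """Split a byte address into (tag, set index, word offset, byte offset).

    All sizes are assumed to be powers of two.
    """
    num_sets = num_blocks // ways
    byte_bits = bytes_per_word.bit_length() - 1    # log2 of a power of two
    word_bits = words_per_block.bit_length() - 1
    set_bits = num_sets.bit_length() - 1

    byte_off = addr & (bytes_per_word - 1)
    word_off = (addr >> byte_bits) & (words_per_block - 1)
    set_index = (addr >> (byte_bits + word_bits)) & (num_sets - 1)
    tag = addr >> (byte_bits + word_bits + set_bits)
    return tag, set_index, word_off, byte_off


# 4096 blocks, 4-way set associative: 1024 sets, 10 index bits.
print(split_address(0x12345678, num_blocks=4096, ways=4))
# Fully associative (ways == num_blocks): one set, so the index is always 0.
print(split_address(0x12345678, num_blocks=4096, ways=4096))
```

Setting `ways=1` gives the direct-mapped case, and `ways=num_blocks` the fully associative case, where the set index field disappears and the tag grows accordingly.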

Design Example

Starting with a data size of 64KB, assuming 4 32-bit words per block, show the address organization and compute the total number of bits for these organizations:
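
A calculation along these lines can be sketched as below. Several things here are assumptions, not part of the exercise statement: 32-bit byte addresses, one valid bit per block, no extra bits for replacement state, and the choice of direct-mapped, 4-way, and fully associative as the organizations to compare.

```python
def cache_bits(data_bytes, ways, words_per_block=4, addr_bits=32, valid_bits=1):
    """Return (tag bits, index bits, total storage bits) for one organization."""
    block_bytes = words_per_block * 4             # 32-bit words
    num_blocks = data_bytes // block_bytes
    num_sets = num_blocks // ways
    offset_bits = block_bytes.bit_length() - 1    # word offset + byte offset
    index_bits = num_sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    per_block = block_bytes * 8 + tag_bits + valid_bits
    return tag_bits, index_bits, num_blocks * per_block


size = 64 * 1024
for name, ways in [("direct mapped", 1), ("4-way", 4), ("fully associative", 4096)]:
    tag, index, total = cache_bits(size, ways)
    print(f"{name}: tag={tag} index={index} total bits={total}")
```

Under these assumptions the tag grows from 16 to 18 to 28 bits as associativity increases, so the total storage rises even though the data capacity stays at 64KB.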

Replacement Policies

Choices:

  1. Optimal: Replace the block which will be accessed last (furthest into the future).

    Problem: can't predict future.

  2. A reasonable compromise: least recently used (LRU).

    Assume future accesses will resemble the recent past: a block that has not been used for a long time is unlikely to be needed soon.

    Implementation: Shift register access counters, linked lists.

  3. Other policies: stack, FIFO, MRU, LFU, MFU.
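
LRU within a single set can be sketched in software as follows. This is an illustration only: the class name `LRUSet` is hypothetical, and an ordered dictionary stands in for the counters or linked lists a hardware implementation would use.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with least-recently-used replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # tags, least recently used first

    def access(self, tag):
        """Return True on a hit, False on a miss (after filling the block)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)       # now the most recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)    # evict the least recently used
        self.blocks[tag] = None
        return False


lru = LRUSet(ways=2)
results = [lru.access(t) for t in [1, 2, 1, 3, 2]]
print(results)
```

In this trace the access to tag 3 evicts tag 2, because tag 1 was touched more recently, so the final access to tag 2 misses.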


