Cache Performance and Increasing It
Tom Kelliher, CS 240
Apr. 7, 2000
Study the section on multilevel caches (pg. 576) on your own.
Read 7.4.
Exploiting spatial locality.
Memory system design.
- Measuring cache performance.
- Using associativity to increase hit rate.
Virtual memory.
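Two equations:
$$\text{CPU time} = (\text{CPU execution clock cycles} + \text{Memory-stall clock cycles}) \times \text{Clock cycle time}$$
$$\text{Memory-stall clock cycles} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty}$$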
Consider a gcc run:
- I-cache miss rate is 2%. D-cache miss rate is 4%.
- CPI is 2, assuming no memory stalls.
- Miss penalty is 40 cycles.
- 36% of instructions are load or store.
What would be the speed-up of a machine with a perfect cache? (About 1.69.)
Suppose we improve the pipeline so that the CPI is now 1. Repeat. (About 2.38.)
Compare the speed-up of a real machine with a doubled clock rate against
the original real machine. Assume the memory system's access time is unchanged,
so the miss penalty doubles to 80 cycles. (About 1.42.)
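A minimal Python sketch of the gcc arithmetic, assuming one I-cache access per instruction, one D-cache access per load or store, and the same miss penalty for reads and writes. The helper names are illustrative only.

```python
# Sketch: CPI including memory stalls for the gcc example.
# Assumes each instruction makes one I-cache access and each load/store
# makes one D-cache access; every miss costs `penalty` cycles.

def stall_cycles_per_instr(i_miss, d_miss, ls_frac, penalty):
    """Memory-stall cycles per instruction = I-cache stalls + D-cache stalls."""
    return i_miss * penalty + ls_frac * d_miss * penalty

def cpi_with_stalls(base_cpi, i_miss, d_miss, ls_frac, penalty):
    return base_cpi + stall_cycles_per_instr(i_miss, d_miss, ls_frac, penalty)

I_MISS, D_MISS, LS_FRAC, PENALTY = 0.02, 0.04, 0.36, 40

# 1. Perfect cache vs. real cache, base CPI 2.
real = cpi_with_stalls(2, I_MISS, D_MISS, LS_FRAC, PENALTY)            # 3.376
speedup_perfect = real / 2

# 2. Same question with base CPI 1.
real_cpi1 = cpi_with_stalls(1, I_MISS, D_MISS, LS_FRAC, PENALTY)       # 2.376
speedup_perfect_cpi1 = real_cpi1 / 1

# 3. Doubled clock rate: memory time unchanged, so the penalty is 80 cycles.
#    Execution time = IC * CPI * cycle time; the new cycle time is half the old.
fast_clock = cpi_with_stalls(2, I_MISS, D_MISS, LS_FRAC, 2 * PENALTY)  # 4.752
speedup_clock = (real * 1.0) / (fast_clock * 0.5)

print(f"{speedup_perfect:.3f} {speedup_perfect_cpi1:.3f} {speedup_clock:.3f}")
# prints: 1.688 2.376 1.421
```

The unrounded results make clear how the parenthesized answers were rounded.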
See the example on pg. 571 for the performance speedup.
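- Traditional memories are addressed by an index; finding a particular value
requires a sequential search.
- Suppose you could address by content instead and search every location in
parallel: a content-addressable memory (CAM).
(In practice you address by one field of the content, used as a key.)
Example: associative arrays in Perl.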
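A small Python contrast between the two addressing styles. The memory contents and names are made up for illustration. Python's `dict` stands in for a Perl hash; note that it reaches an entry by hashing the key in software, not by a true parallel search as a CAM would.

```python
# An index-addressed memory: contents at addresses 0..4 (illustrative data).
memory = ["ant", "bee", "cat", "dog", "eel"]

def sequential_find(mem, value):
    """Scan addresses in order; return (address, probes), or (None, probes) on failure."""
    for addr, contents in enumerate(mem):
        if contents == value:
            return addr, addr + 1
    return None, len(mem)

# An associative array: entries are reached by a key, not by a position.
legs = {"ant": 6, "bee": 6, "dog": 4}

print(sequential_find(memory, "dog"))   # (3, 4): found at address 3 after 4 probes
print(legs["dog"])                      # 4
```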
- How would a CAM be organized? Components in addition to data
storage:
- Storage for tag bits.
- Tag comparators for each CAM location.
- Data selection mux.
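The three components listed above can be modeled in a few lines of Python. This is a behavioral sketch, not a hardware description: the list comprehension plays the role of the per-location comparators (which in hardware all evaluate at once), and indexing by the single match line plays the role of the data-selection mux. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CamEntry:
    valid: bool   # is this location holding real data?
    tag: int      # tag storage
    data: int     # data storage

def cam_lookup(entries, key):
    """Return the data whose tag matches `key`, or None on a miss."""
    # One comparator per location; each produces a match line.
    match = [e.valid and e.tag == key for e in entries]
    if sum(match) > 1:
        raise ValueError("multiple matches; tags are assumed to be unique")
    if not any(match):
        return None
    # Data-selection mux driven by the (one-hot) match lines.
    return entries[match.index(True)].data

cam = [CamEntry(True, 0x12, 100), CamEntry(False, 0x34, 200), CamEntry(True, 0x34, 300)]
print(cam_lookup(cam, 0x34))   # 300: the invalid entry with the same tag is ignored
```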
Fully associative caches:
- One set; all blocks are in this set.
- The entire cache is organized as a CAM.
- Format of address: tag, word offset, byte offset.
- Benefit: maximum flexibility. An incoming block can be placed anywhere,
so the replacement policy is free to choose any block as the victim.
(Ideally, by predicting the future.)
- Cost: One tag comparator per block, using up silicon resources and
increasing access time.
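A Python sketch of the fully associative address format and lookup, assuming 32-bit byte addresses, 4-byte words, and 4 words per block (the block size used in the 64KB exercise). There is no index field, so every stored tag must be compared. The names are hypothetical.

```python
BYTE_OFFSET_BITS = 2   # 4 bytes per 32-bit word (assumed)
WORD_OFFSET_BITS = 2   # 4 words per block (assumed)

def split_fully_assoc(addr):
    """Split a byte address into (tag, word offset, byte offset); no index field."""
    byte_off = addr & ((1 << BYTE_OFFSET_BITS) - 1)
    word_off = (addr >> BYTE_OFFSET_BITS) & ((1 << WORD_OFFSET_BITS) - 1)
    tag = addr >> (BYTE_OFFSET_BITS + WORD_OFFSET_BITS)
    return tag, word_off, byte_off

class FullyAssociativeCache:
    def __init__(self, num_blocks):
        self.tags = [None] * num_blocks        # None marks an invalid block

    def lookup(self, addr):
        """Hit if any block's tag matches: one comparison per block."""
        tag, _, _ = split_fully_assoc(addr)
        return any(t == tag for t in self.tags)

    def fill(self, addr):
        """Place the block in any invalid slot; a full cache needs a replacement policy."""
        tag, _, _ = split_fully_assoc(addr)
        for i, t in enumerate(self.tags):
            if t is None:
                self.tags[i] = tag
                return i
        raise RuntimeError("cache full: a replacement policy must pick a victim")
```

For example, `split_fully_assoc(0x12345678)` yields tag `0x1234567`, word offset 2, byte offset 0.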
Set-associative caches: a compromise.
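- n blocks per set: $\text{set} = (\text{block address}) \bmod (\text{number of sets})$,
where $\text{number of sets} = \text{number of blocks} / n$.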
Blocks evenly distributed among sets.
- Number of sets a power of two, number of blocks/set a power of two.
Why?
- Associative search within a set. Number of comparators is n.
- Format of address: tag, set index, word offset, byte offset.
- Some flexibility in block placement. Less cost than fully
associative because of reduced number of comparators.
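- Organization: the set index selects one set; the n tags in that set are
compared in parallel, and a mux selects the data from the matching block.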
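A Python sketch of an n-way set-associative lookup, assuming a 16-byte block and power-of-two sizes. With a power-of-two number of sets, the modulo in the set mapping reduces to masking off the low bits of the block address, which is one answer to the "Why?" above. The class name and the `fill` placement rule (use a free way, else overwrite way 0) are hypothetical placeholders, not a recommended replacement policy.

```python
class SetAssociativeCache:
    def __init__(self, num_sets, ways, block_bytes=16):
        # Powers of two let the index and offsets be plain bit fields of the address.
        assert num_sets & (num_sets - 1) == 0 and block_bytes & (block_bytes - 1) == 0
        self.num_sets = num_sets
        self.offset_bits = block_bytes.bit_length() - 1   # word + byte offset
        self.index_bits = num_sets.bit_length() - 1
        self.sets = [[None] * ways for _ in range(num_sets)]   # None = invalid

    def fields(self, addr):
        """Return (tag, set index) for a byte address."""
        block_addr = addr >> self.offset_bits
        index = block_addr & (self.num_sets - 1)   # equals block_addr % num_sets
        tag = block_addr >> self.index_bits
        return tag, index

    def lookup(self, addr):
        # Only the n tags of the selected set are compared: n comparators.
        tag, index = self.fields(addr)
        return tag in self.sets[index]

    def fill(self, addr):
        tag, index = self.fields(addr)
        ways = self.sets[index]
        # Placeholder placement: first free way, else overwrite way 0.
        victim = ways.index(None) if None in ways else 0
        ways[victim] = tag
```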
Starting with a data size of 64KB, assuming 4 32-bit words per block, show
the address organization and compute the total number of bits for these
organizations:
- Direct-mapped.
- Two-way set-associative.
- Four-way set-associative.
- Eight-way set-associative.
- Fully associative.
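One way to check the exercise is a short Python calculator. It assumes 32-bit byte addresses and counts only data, tag, and one valid bit per block (no dirty or LRU bits); change those assumptions and the totals change.

```python
def cache_bits(data_bytes=64 * 1024, words_per_block=4, word_bytes=4,
               ways=1, addr_bits=32):
    """Return (tag, index, word offset, byte offset, total bits).

    ways=None means fully associative. Assumes one valid bit per block
    and no other status bits.
    """
    block_bytes = words_per_block * word_bytes
    num_blocks = data_bytes // block_bytes
    ways = num_blocks if ways is None else ways
    num_sets = num_blocks // ways
    byte_off = (word_bytes - 1).bit_length()        # log2(bytes per word)
    word_off = (words_per_block - 1).bit_length()   # log2(words per block)
    index = (num_sets - 1).bit_length()             # log2(sets); 0 if one set
    tag = addr_bits - index - word_off - byte_off
    bits_per_block = block_bytes * 8 + tag + 1      # data + tag + valid
    return tag, index, word_off, byte_off, num_blocks * bits_per_block

for name, ways in [("direct-mapped", 1), ("2-way", 2), ("4-way", 4),
                   ("8-way", 8), ("fully associative", None)]:
    tag, index, w, b, total = cache_bits(ways=ways)
    print(f"{name:18} tag={tag:2} index={index:2} word={w} byte={b} total={total:,} bits")
```

Each doubling of associativity moves one bit from the index to the tag, so the total grows by one bit per block (4096 bits here).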
Replacement policy choices:
- Optimal: Replace the block which will be accessed last (furthest into
the future).
Problem: can't predict future.
- A reasonable compromise: least recently used (LRU).
Assume the recent past predicts the near future.
Implementation: Shift register access counters, linked lists.
- Other policies: stack, FIFO, MRU, LFU, MFU.
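A Python sketch comparing LRU with the optimal policy on a single fully associative set. LRU is modeled with an `OrderedDict` (a software stand-in for the counters or linked lists mentioned above); the optimal policy is simulated with hindsight, since it needs the whole future trace. The trace and function names are illustrative.

```python
from collections import OrderedDict

def lru_misses(trace, capacity):
    """Count misses for one set of `capacity` blocks under LRU."""
    cache = OrderedDict()              # ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)   # now the most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)   # evict the least recently used
            cache[block] = True
    return misses

def optimal_misses(trace, capacity):
    """Optimal policy: evict the block whose next use is furthest in the future."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) == capacity:
            def next_use(b):
                try:
                    return trace.index(b, i + 1)
                except ValueError:
                    return float("inf")   # never used again: the ideal victim
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses

trace = [0, 1, 2, 0, 3, 0, 1, 2]
print(lru_misses(trace, 3), optimal_misses(trace, 3))   # 6 5
```

On this trace LRU evicts block 1 just before it is needed again, while the optimal policy keeps it.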
Thomas P. Kelliher
Fri Apr 7 09:11:22 EDT 2000