The Memory Hierarchy
Tom Kelliher, CS 220
Nov. 30, 2007
Read 7.4.
Introduction to caching.
- Terminology.
- Basics of caches.
- Direct mapped caches.
Virtual memory.
- Memory is a hierarchy: registers, cache, main memory, disk, tape.
How does cost, speed, size vary over this hierarchy?
- Key idea: Trick the processor into believing it has a large, fast
memory. How can we accomplish this?
- Principle of locality allows us to keep a subset of the memory space
high in the hierarchy, where it fits and where access is fast. This
works because a program uses only a small part of its address space at
any instant.
(You know you need more RAM when you hear your disks seeking a
lot.)
Two types of locality:
- Temporal locality.
- Spatial locality.
We concentrate upon two points in the memory hierarchy: cache/main memory
and main memory/disk:
- Cache/main memory: handled by the hardware. Not a part of the
system architecture.
- Main memory/disk: handled by the OS. The virtual memory
system.
(Actually, the register file can be thought of as a
compiler/programmer-controlled cache.)
- Terminology: block, hit, hit rate, miss, miss penalty. (Define.)
Block sizes for caches, disks.
- Think of memory being partitioned into blocks, called lines
when in the cache.
- Think of the cache as being partitioned into lines and lines being
collected into sets, where the set size can be:
- One line: direct mapped cache.
- All lines: fully associative cache.
- Between one line and all lines, a power of two: set-associative cache.
A given memory block is always loaded into the same set.
With a direct mapped cache, two blocks in the same set can't be in cache at
the same time.
- Think of a memory address being partitioned into two pieces: the tag
(high-order bits) and the set selector (low-order bits).
- Basic parameters of cache design: size of cache (may include data,
tag, valid bit), size of line, set size.
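The address partitioning above can be sketched in code. This is a minimal illustration, assuming the parameters of the first cache below (eight one-word lines, 4-byte words, byte-addressable memory); the function name `split_address` is mine, not from the notes.

```python
# Sketch: splitting a byte address into tag / index / byte offset for a
# direct mapped cache. Assumed parameters: 8 one-word (4-byte) lines,
# so 3 index bits and 2 byte-offset bits.

NUM_LINES = 8          # lines in the cache -> 3 index bits
BYTES_PER_WORD = 4     # byte-addressable, one-word lines -> 2 offset bits

INDEX_BITS = NUM_LINES.bit_length() - 1        # log2(8) = 3
OFFSET_BITS = BYTES_PER_WORD.bit_length() - 1  # log2(4) = 2

def split_address(addr):
    """Return (tag, index, byte_offset) for a byte address."""
    byte_offset = addr & (BYTES_PER_WORD - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, byte_offset

# Byte address 88 = 0b1011000: tag 0b10, index 0b110, offset 0b00.
print(split_address(88))  # (2, 6, 0)
```

The low-order bits select the line because consecutive blocks then map to different lines, which helps with spatial locality.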
Direct mapped cache:
A cache structured such that a memory block is associated with exactly one
cache block, based upon the address of the memory block. The usual
algorithm is:

   cache index = (memory block address) mod (number of cache blocks)

By definition, multiple memory blocks map to the same cache block: all
block addresses congruent modulo the number of cache blocks compete for
the same line.
Let's design our first cache. Parameters:
- Direct mapped -- a memory block maps to exactly one cache block.
- Block size is one word.
- Eight blocks.
- Memory size is 64 words, byte addressable (always the case).
Questions:
- How many bits for the cache?
- Consider the memory trace: 22, 26, 22, 26, 16, 4, 16, 18, run on a
cold cache. How many hits, misses? What's the hit rate? What are we
missing from the cache design (tag bits)?
- Why don't we need to store the entire memory address?
- What about a valid bit?
- Now, how many bits for the cache?
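The trace question can be checked with a short simulation. A minimal sketch, assuming the trace values are word addresses (so the index is the address mod 8 and the tag is the remaining high-order bits):

```python
# Sketch: running the trace 22, 26, 22, 26, 16, 4, 16, 18 on a cold
# 8-line direct mapped cache with one-word lines. Assumption: the trace
# values are word addresses.

NUM_LINES = 8
cache = [None] * NUM_LINES   # each entry holds a tag; None = invalid

trace = [22, 26, 22, 26, 16, 4, 16, 18]
hits = 0
for addr in trace:
    index = addr % NUM_LINES
    tag = addr // NUM_LINES
    if cache[index] == tag:
        hits += 1
    else:
        cache[index] = tag   # miss: fetch the block, record its tag

print(hits, len(trace) - hits)   # 3 hits, 5 misses: hit rate 3/8
```

Note the final access: 18 and 26 share index 2 but have different tags, so 18 misses and evicts 26, which is exactly the direct mapped conflict described above.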
Let's design another cache. Parameters:
- Direct mapped.
- Block size is four words. Why would we want a larger block size?
- 4K blocks.
- 32 bit memory space.
Questions:
- How many bits/block? Total size of the cache?
- Size of the data portion of the cache, in bytes? (This is the number
you see quoted.)
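The sizing arithmetic for this second cache can be sketched directly. This is my worked computation from the stated parameters (direct mapped, 4-word lines, 4K lines, 32-bit byte addresses), not a figure from the notes:

```python
# Sketch of the sizing arithmetic: direct mapped, 4-word (16-byte)
# lines, 4K lines, 32-bit byte addresses.

NUM_LINES = 4 * 1024    # 4K lines -> 12 index bits
WORDS_PER_LINE = 4      # -> 2 word-offset bits
ADDRESS_BITS = 32       # byte addresses -> 2 byte-offset bits

index_bits = 12
offset_bits = 2 + 2                                  # word + byte offset
tag_bits = ADDRESS_BITS - index_bits - offset_bits   # 16 tag bits
data_bits = WORDS_PER_LINE * 32                      # 128 data bits

bits_per_line = 1 + tag_bits + data_bits   # valid + tag + data = 145
total_bits = NUM_LINES * bits_per_line     # 593,920 bits
data_bytes = NUM_LINES * WORDS_PER_LINE * 4   # the size usually quoted

print(bits_per_line, total_bits, data_bytes)
```

So the data portion alone is 65,536 bytes (a "64 KB cache"), while the actual storage, counting tags and valid bits, is 593,920 bits.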
Organization of the cache: