Increasing IPC in Multimedia Kernels

Tom Kelliher, CS 220

Oct. 22, 1997

Announcements

  1. We will continue Monday's discussion on Friday.

  2. Homework due Friday.

Assignment

Nothing new.

Definitions

  1. IPC --- instructions per cycle. Goal: maximize.

  2. Multimedia kernels --- building blocks of multimedia programs: MPEG encoding, decoding (DCT), Fourier analysis, etc. Lots of independent computations and branching.

  3. Static program --- instructions as they are stored in memory.

  4. Dynamic program --- instructions as they are executed.

  5. Traditional instruction cache --- stores groups (line size) of instructions in static order.

  6. Trace cache --- stores groups (trace size) of instructions in dynamic order.

  7. Functional unit (FU) --- adder, subtracter, address generator, floating point multiplier, etc.

  8. Structural hazard --- not enough FUs for the instructions that are ready to execute.

  9. Data hazard --- data dependencies (we've seen these already).

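  10. Register renaming --- a technique to eliminate false data hazards (those caused by reusing a register name rather than by a real flow of data):
    add r1, r2, r3   # 1
    add r4, r1, r5   # 2
    sub r1, r2, r3   # 3
    add r6, r1, r3   # 4
    
    2 depends upon 1. Does 3 really depend on 1 and 2? No: 3 only reuses the name r1. Giving 3 a different destination register (and having 4 read that register) removes the conflict.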

  11. Commit --- permanently modify programmer-visible state (registers, memory).

  12. Speculative execution --- executing (but not committing) instructions after conditional branches before knowing which way the branch went.

  13. Branch prediction --- hardware that guesses the outcome of a conditional branch before the branch is resolved. Accuracy can exceed 95%.

  14. Out of order execution --- technique used to keep FUs busy in the face of stalls:
    load r1, 5(r2)
    add r3, r4, r5
    
    The add does not use r1, so it can execute while the load is stalled. To maintain program semantics, instructions must still commit in order.

  15. Data flow --- an execution model where instructions execute when their operands are ready. Contrast with control flow, where instructions execute in program order.
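
Definitions 3 through 6 can be illustrated with a small sketch. The six-instruction program, the taken branch, and the group size of 4 are all made up for illustration; real line and trace sizes differ. The point is only that a traditional cache line follows static order while a trace follows dynamic order.

```python
def dynamic_order(static, taken, start=0):
    """Return the indices of instructions in the order they execute.

    static: instructions as stored in memory (static program).
    taken:  hypothetical map from a branch's index to its target index;
            any instruction not listed falls through to the next one.
    """
    pc, order = start, []
    while pc < len(static):
        order.append(pc)
        pc = taken.get(pc, pc + 1)
    return order


static = ["A", "B (branch to E)", "C", "D", "E", "F"]
taken = {1: 4}  # assumption: the branch at index 1 is taken

order = dynamic_order(static, taken)
GROUP = 4  # assumed line size and trace size, in instructions

icache_line = static[:GROUP]                      # static order
trace_line = [static[i] for i in order[:GROUP]]   # dynamic order
useful_in_line = [static[i] for i in order if i < GROUP]

print("instruction cache line:", icache_line)
print("executed from that line:", useful_in_line)
print("trace cache line:", trace_line)
```

In this made-up case only part of the instruction cache line is executed, because the taken branch skips the rest, while every instruction in the trace is executed.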

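A minimal sketch of register renaming, applied to the four-instruction example in definition 10. It assumes the first operand is the destination and gives every write a fresh register; the names p0, p1, ... and the function `rename` are hypothetical, and real hardware draws from a finite pool of physical registers or reservation stations.

```python
def rename(program):
    """Give each destination a fresh physical register.

    program: list of (op, dest, src1, src2) using architectural names.
    Registers never written in the snippet are treated as live-in
    values and keep their original names.
    """
    mapping = {}    # architectural name -> current physical name
    next_phys = 0
    renamed = []
    for op, dest, src1, src2 in program:
        # Sources read the most recent physical copy of each register.
        s1 = mapping.get(src1, src1)
        s2 = mapping.get(src2, src2)
        # Every write gets a new physical register, so a later write
        # to the same architectural name cannot clash with earlier uses.
        new_dest = f"p{next_phys}"
        next_phys += 1
        mapping[dest] = new_dest
        renamed.append((op, new_dest, s1, s2))
    return renamed


program = [
    ("add", "r1", "r2", "r3"),  # 1
    ("add", "r4", "r1", "r5"),  # 2
    ("sub", "r1", "r2", "r3"),  # 3
    ("add", "r6", "r1", "r3"),  # 4
]

for instr in rename(program):
    print(instr)
```

After renaming, 2 still reads the result of 1 and 4 still reads the result of 3, but 3 no longer writes the register that 1 wrote and 2 reads, so 3 is free to execute before either of them.
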
Background

  1. Properties of multimedia.

  2. The Intel approach: MMX. Limitations: requires a special instruction set and compiler techniques such as loop unrolling.

  3. Our approach is transparent: no modifications at the architectural level, no special compiler optimizations.

  4. The effect of a single bottleneck on a high IPC pipeline.

  5. Traditional pipeline:

  6. What are the bottlenecks?
    1. Effects of branches on fetch rate from cache, pipeline stalls.

    2. Single ALU.

    3. "Long" instructions.
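
The effect of a single bottleneck (item 4) can be sketched with a rough model in which each pipeline stage handles some number of instructions per cycle and sustained IPC cannot exceed the narrowest stage. The stage names and widths below are hypothetical, chosen only to illustrate the single-ALU case.

```python
def bottleneck(stage_widths):
    """Return (stage, width) for the narrowest stage.

    stage_widths: map from stage name to instructions handled per cycle.
    Under this simplified model, the narrowest stage is an upper bound
    on the IPC the whole pipeline can sustain.
    """
    stage = min(stage_widths, key=stage_widths.get)
    return stage, stage_widths[stage]


# Assumed widths, not measurements of any real machine.
wide = {"fetch": 4, "decode": 4, "execute": 4, "commit": 4}
single_alu = dict(wide, execute=1)

print(bottleneck(wide))
print(bottleneck(single_alu))
```

Widening fetch, decode, and commit does not help while a single ALU limits execution to one instruction per cycle.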

High IPC Architectural Features

  1. High IPC pipeline:

  2. Fixes:
    1. Trace caches.

    2. Branch prediction, speculative execution.

    3. Multiple specialized FUs.

    4. Dataflow.

    5. Out of order execution.

    6. Pools of ready instructions (reservation station, reservation station matrix).

    7. Register renaming (using reservation stations as "virtual" registers).
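
How dataflow, out of order execution, and in-order commit fit together can be sketched as follows. This is a simplified model, not the pipeline from lecture: it assumes unlimited FUs, registers that are already renamed, and made-up latencies. The first two instructions are the load/add pair from the definitions; the third is added here for illustration.

```python
def schedule(program):
    """Return (name, start, finish, commit) cycles per instruction.

    program: list of (name, dest, sources, latency) in program order.
    Only true (read-after-write) dependences delay an instruction.
    """
    ready_at = {}     # register -> cycle at which its value is available
    last_commit = 0
    result = []
    for name, dest, sources, latency in program:
        # Dataflow rule: start as soon as every operand is ready.
        start = max((ready_at.get(s, 0) for s in sources), default=0)
        finish = start + latency
        ready_at[dest] = finish
        # In-order commit: never commit before an earlier instruction.
        commit = max(finish, last_commit)
        last_commit = commit
        result.append((name, start, finish, commit))
    return result


program = [
    ("load r1, 5(r2)", "r1", ["r2"], 3),        # assumed 3-cycle load
    ("add r3, r4, r5", "r3", ["r4", "r5"], 1),  # independent of the load
    ("add r6, r1, r3", "r6", ["r1", "r3"], 1),  # hypothetical consumer
]

for row in schedule(program):
    print(row)
```

In this model the first add finishes before the load does, but it waits to commit until the load has committed, and the second add cannot start until the load's result is ready.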



Thomas P. Kelliher
Tue Oct 21 15:39:14 EDT 1997