Control Hazards and Exceptions

Tom Kelliher, CS 240

Feb. 9, 2000

Administrivia

Announcements

Written assignment Monday.

Assignment

Read 6.8--6.12.

From Last Time

Handling control hazards.

Outline

  1. Control hazards, optimizations.

  2. Exceptions, interrupt types, imprecise interrupts.

  3. Introduction to superscalar execution.

Coming Up

Superscalar execution and other advanced pipelining techniques.

Control Hazards

Picking up where we left off...

Delayed Branch

Use of this technique is waning due to:

  1. Deeper pipelines.

  2. Superscalar execution.

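The compiler can't cope: with deeper pipelines and multiple issue, there are more delay slots than it can fill with useful instructions.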

Exception Handling

Exception types:

  1. Arithmetic exceptions: overflow, divide by zero, etc.

  2. Hardware exceptions: parity error, etc.

  3. System call.

  4. Page fault, access to non-existent memory.

Responsibilities:

  1. Hardware: record the PC of the offending instruction (actually PC + 4) and the cause.

  2. Operating system: recover as required (terminate the program, switch to kernel mode to service a system call, bring the page in, etc.).

Points:

  1. An interrupt is like a conditional branch.

  2. Branches to a fixed location: 0x40000040.

  3. Can occur anywhere in the pipeline.

  4. Subsequent instructions must be flushed.

  5. Multiple interrupts can occur during a cycle. Prioritizing?
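
The points above can be sketched as a toy model. This is only a sketch, not the textbook's datapath: the stage names, the `Instr` record, the `take_exception` helper, and the priority rule (oldest faulting instruction wins) are all assumptions made for illustration. Only the handler address 0x40000040 and the saved PC + 4 come from these notes.

```python
from dataclasses import dataclass
from typing import Optional

HANDLER_PC = 0x40000040  # fixed exception entry point from the notes

# Stages ordered from the youngest instruction (IF) to the oldest (WB).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


@dataclass
class Instr:
    pc: int
    text: str
    fault: Optional[str] = None  # cause raised in the current cycle, if any


def take_exception(pipeline):
    """pipeline maps stage name -> Instr or None.

    Returns (epc, cause, new_pc, flushed_pipeline), or None when no
    instruction faulted this cycle.
    """
    # Assumed priority rule: the oldest faulting instruction (the one
    # furthest down the pipe) is handled first.
    for i in range(len(STAGES) - 1, -1, -1):
        instr = pipeline.get(STAGES[i])
        if instr is not None and instr.fault is not None:
            break
    else:
        return None

    epc = instr.pc + 4          # hardware records PC + 4 ...
    cause = instr.fault         # ... and the cause
    flushed = dict(pipeline)
    # Squash the faulting instruction and everything younger than it.
    for stage in STAGES[: i + 1]:
        flushed[stage] = None
    return epc, cause, HANDLER_PC, flushed
```

With an overflow in EX and a page fault in ID during the same cycle, this model reports the overflow, squashes IF, ID, and EX, and lets the older instruction in MEM finish.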

[Figure: pipeline with exception handling added.]

This won't handle memory exceptions. Why? (Consider a lw of a non-existent address.)

Imprecise interrupts:

  1. What are they? Why do we care?

  2. Example: consider the following floating-point code segment running on a CPU with two FP pipes:
          DIVF   F0, F2, F4
          ADDF   F10, F10, F8
          SUBF   F12, F12, F14
    
    This is an example of out-of-order completion on a superscalar machine.

    1. Are there any dependencies?

    2. What would the completion order be?

    3. Suppose the SUBF faults. What does the exception hardware do?

    4. Suppose the DIVF then faults, after the SUBF has already faulted. Now we're really in a mess: the DIVF comes first in program order, but the ADDF has already completed and committed.
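
The questions above can be explored with a small model. The latencies below are hypothetical (the notes give none), and the model assumes one instruction issues per cycle in program order with no stalls, which is reasonable here because the three instructions share no registers. The `is_precise` check is an assumed working definition: a fault is precise only if every older instruction has finished and no younger one has.

```python
# Hypothetical latencies in cycles; the notes do not give any.
LATENCY = {"DIVF": 10, "ADDF": 3, "SUBF": 3}

PROGRAM = [
    ("DIVF", "F0", "F2", "F4"),
    ("ADDF", "F10", "F10", "F8"),
    ("SUBF", "F12", "F12", "F14"),
]


def completion_times(program, latency):
    """Issue one instruction per cycle, in order; return finish cycles."""
    return [issue + latency[op] for issue, (op, *_regs) in enumerate(program)]


def completion_order(program, latency):
    """Indices of instructions sorted by the cycle in which they finish."""
    times = completion_times(program, latency)
    return sorted(range(len(program)), key=lambda i: times[i])


def is_precise(fault_index, fault_cycle, times):
    """True if all older instructions have finished by fault_cycle and
    no younger instruction has."""
    older_done = all(times[i] <= fault_cycle for i in range(fault_index))
    younger_pending = all(times[i] > fault_cycle
                          for i in range(fault_index + 1, len(times)))
    return older_done and younger_pending
```

Under these assumed latencies the ADDF and SUBF finish before the DIVF. A SUBF fault arrives while the older DIVF is still in flight, and a later DIVF fault arrives after younger instructions have finished, so neither is precise.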

Introduction to Superscalar Pipelining

  1. Historical progression of IPC (instructions per cycle): < 1, = 1, > 1. To sustain IPC > 1, the entire pipeline must be widened.

    Challenges: small register files, multiple-branch predictions, multiple line fetches from caches.

  2. Range of parallelism: coarse- to fine-grained.

  3. Superscalar techniques address instruction-level parallelism (ILP). Let's parallelize a sequential binary.

  4. What's the upper bound on IPC? It depends.

    Text processing: low, mostly.

    Image processing, multimedia: high.

    Median operation on an image example:

    medianImage(image dest, image src)
    {
       for each pixel, p, in src
          p in dest = medianPixel(p in src);
    }
    
    medianPixel(pixel p)
    {
       find the <= 8 neighboring pixels of p;
       compute and return the median value;
    }
    
    Challenges: exposing potential ILP to the compiler.

    Example. Parallelize the following:

    sum = 0;
    
    for (i = 0; i < last; ++i)
       sum += array[i];
    

  5. Compiler techniques: loop unrolling, loop-invariant code motion, strength reduction, etc.
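
One possible answer to the summation example, combining it with loop unrolling. In the original loop every add depends on the previous value of sum, so there is nothing to issue in parallel. A minimal sketch, assuming a four-way unroll with independent partial sums (the unroll factor and the function names are illustrative choices, and Python only shows the transformation, not the speedup):

```python
def sum_sequential(array):
    """The loop from the notes: every add depends on the previous one."""
    total = 0
    for x in array:
        total += x
    return total


def sum_unrolled(array):
    """Unrolled by four with independent partial sums.

    The four adds in the loop body do not depend on each other, so a
    superscalar machine could issue them in the same cycle.
    """
    s0 = s1 = s2 = s3 = 0
    last = len(array)
    main = last - last % 4          # largest multiple of 4 <= last
    for i in range(0, main, 4):
        s0 += array[i]
        s1 += array[i + 1]
        s2 += array[i + 2]
        s3 += array[i + 3]
    for i in range(main, last):     # leftover elements
        s0 += array[i]
    return (s0 + s1) + (s2 + s3)
```

The additions are reassociated, which is exact for integers but can change rounding for floating-point data. That is one reason a compiler may not do this on its own.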

Types of Data Dependencies

  1. RAR (read after read). Not a problem at all.

  2. RAW (read after write). A "true" dependency.

  3. WAR (write after read). A "false" dependency.

  4. WAW (write after write). Another "false" dependency.

Consider the code segment:
      r1 = r2 + r3
      r4 = r1 + r5
      r1 = r6 + r7
      r8 = r1 + r4

ISA registers vs. physical registers. Can register renaming remove the false dependencies?
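
The code segment above can be checked mechanically. This is a sketch under stated assumptions: each instruction is encoded as (destination, sources), the physical register names p0, p1, ... are made up, and `rename` uses the simplest possible policy (a fresh physical register for every write, with no free list or reclamation).

```python
PROGRAM = [
    ("r1", ("r2", "r3")),
    ("r4", ("r1", "r5")),
    ("r1", ("r6", "r7")),
    ("r8", ("r1", "r4")),
]


def find_dependencies(program):
    """Return a sorted list of (kind, earlier_idx, later_idx, register)."""
    deps = set()
    last_writer = {}   # register -> index of its most recent writer
    readers = {}       # register -> indices that read it since that write
    for j, (dest, srcs) in enumerate(program):
        for s in srcs:
            if s in last_writer:
                deps.add(("RAW", last_writer[s], j, s))
            readers.setdefault(s, []).append(j)
        for i in readers.get(dest, []):
            if i != j:  # an instruction does not conflict with itself
                deps.add(("WAR", i, j, dest))
        if dest in last_writer:
            deps.add(("WAW", last_writer[dest], j, dest))
        last_writer[dest] = j
        readers[dest] = []
    return sorted(deps)


def rename(program):
    """Give every write a fresh physical register (p0, p1, ...)."""
    mapping = {}       # ISA register -> current physical register
    next_phys = 0
    renamed = []
    for dest, srcs in program:
        # Sources are looked up before the destination is remapped.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        mapping[dest] = f"p{next_phys}"
        next_phys += 1
        renamed.append((mapping[dest], new_srcs))
    return renamed
```

Running `find_dependencies(PROGRAM)` reports three RAW dependencies plus one WAR and one WAW on r1. After `rename(PROGRAM)` only the three RAW dependencies remain.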



Thomas P. Kelliher
Fri Feb 11 10:09:36 EST 2000