Control Hazards and Exceptions
Tom Kelliher, CS 240
Feb. 9, 2000
Paper handouts.
Read 6.8--6.12.
Data hazards.
- Control hazards, optimizations.
- Exceptions, interrupt types, imprecise interrupts.
Advanced pipelining techniques.
How would you stall the pipeline to wait for a branch to resolve? When
would you begin stalling? For how long?
Consider this code segment executing on an un-optimized pipeline:

We "predict" branch-not-taken (speculative execution). This helps only
when the branch really is not taken, roughly 50% of the time.
- How does this conflict with the delayed branch behavior we've
discussed?
- Can you explain why it takes so long to load the branch target into
the PC?
- We have to kill three instructions. Will any of them have modified
architectural state? Suppose the instruction at address 44 was a
sw. Would that have modified state?
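The cost of killing instructions can be put in numbers. A minimal sketch, assuming an ideal base CPI and the three-instruction penalty mentioned above; the function name effective_cpi and any branch frequencies fed to it are illustrative assumptions, not measurements from these notes.

```c
/* Effective CPI for a pipeline that predicts branch-not-taken.
 *   base_cpi    : CPI with no control hazards (1.0 for an ideal pipeline)
 *   branch_freq : fraction of instructions that are branches (assumption)
 *   taken_frac  : fraction of branches that are taken, i.e. mispredicted
 *   penalty     : instructions killed per misprediction
 */
double effective_cpi(double base_cpi, double branch_freq,
                     double taken_frac, int penalty)
{
    /* Every mispredicted branch adds `penalty` wasted cycles. */
    return base_cpi + branch_freq * taken_frac * penalty;
}
```

With 20% branches, half of them taken, and a penalty of 3, this gives a CPI of about 1.3. Cutting the penalty to 1 brings it down to about 1.1, which is the motivation for deciding earlier.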
Can we improve this? Options: decide earlier, predict (guess), or use a delayed branch.
Make the decision during the ID stage:
- Test register file outputs for equality, using XNORs and an AND gate.
- Complicating factors? (Forwarding.)
- Conditionally flush the instruction in the IF stage.
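The equality test can be mirrored in software. A small sketch, assuming 32-bit registers; regs_equal is a hypothetical name, and the code models only the comparator, not the forwarding paths that complicate the real hardware.

```c
#include <stdint.h>

/* Returns 1 when a == b, 0 otherwise, built the way the hardware would:
 * one XNOR per bit position, then a single wide AND over the 32 results. */
int regs_equal(uint32_t a, uint32_t b)
{
    uint32_t xnor = ~(a ^ b);      /* bit i is 1 iff a and b agree at bit i */
    return xnor == UINT32_MAX;     /* 32-input AND: every bit must be 1 */
}
```

No subtraction and no carry chain are involved, which is why this test is fast enough to fit in the ID stage.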
The modified pipeline:

Branch history table:
- One bit entries: branch taken/not taken.
- Indexed by low-order bits of branch instruction's address.
- Aliasing? Not really a problem. Show for a two entry table.
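A one-bit branch history table is small enough to sketch directly. The version below assumes word-aligned instructions, so it discards the two low byte-offset bits before indexing; that choice and all of the names are assumptions for illustration.

```c
#include <stdint.h>

#define BHT_ENTRIES 2            /* two-entry table, so aliasing is easy to see */

static uint8_t bht[BHT_ENTRIES]; /* one bit per entry: 1 = taken, 0 = not taken */

/* Index with the low-order bits of the branch instruction's address. */
unsigned bht_index(uint32_t pc)
{
    return (pc >> 2) % BHT_ENTRIES;  /* drop the always-zero byte-offset bits */
}

/* Predict whatever this entry saw last time. */
int bht_predict(uint32_t pc)
{
    return bht[bht_index(pc)];
}

/* Record the actual outcome once the branch resolves. */
void bht_update(uint32_t pc, int taken)
{
    bht[bht_index(pc)] = taken ? 1 : 0;
}
```

In this sketch, branches at addresses 0x40 and 0x48 share entry 0, so each overwrites the other's history. The result is at worst a wrong guess, never a wrong answer, since every prediction is checked when the branch resolves.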
Consider the following C fragment:
for (i = 0; i < lastI; ++i)
    for (j = 0; j < lastJ; ++j)
        sum += array[i][j];
It compiles to this structure:

With one-bit prediction, the inner beq is mispredicted twice per
iteration of the outer loop. Why?
We can improve this with a two-bit saturating counter.
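The difference between the two schemes can be checked with a small simulation. This is a sketch under stated assumptions: the inner-loop branch is modeled as taken on every pass except the last one of each run (the real branch sense depends on the compiled structure), both predictors start out predicting taken, and the function names are hypothetical.

```c
/* Count mispredictions of the inner-loop branch over `outer` runs of an
 * inner loop that executes `inner` times (inner >= 2 assumed).
 * Outcome pattern per run: taken (inner - 1) times, then not taken once. */

int mispredicts_one_bit(int outer, int inner)
{
    int last = 1;                     /* one-bit state: the previous outcome */
    int misses = 0;
    for (int i = 0; i < outer; ++i) {
        for (int j = 0; j < inner; ++j) {
            int taken = (j < inner - 1);
            if (taken != last)
                ++misses;
            last = taken;             /* remember only the latest outcome */
        }
    }
    return misses;
}

int mispredicts_two_bit(int outer, int inner)
{
    int counter = 3;                  /* 0,1 predict not taken; 2,3 predict taken */
    int misses = 0;
    for (int i = 0; i < outer; ++i) {
        for (int j = 0; j < inner; ++j) {
            int taken = (j < inner - 1);
            int predict = (counter >= 2);
            if (taken != predict)
                ++misses;
            /* Saturating update: never move past 0 or 3. */
            if (taken) {
                if (counter < 3) ++counter;
            } else {
                if (counter > 0) --counter;
            }
        }
    }
    return misses;
}
```

For a 10 by 10 loop nest, the one-bit scheme misses 19 times (once on the first run, twice on each later run), while the two-bit counter misses 10 times (once per run, at loop exit). A single not-taken outcome is not enough to flip the two-bit prediction.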
Use of the delayed branch is waning due to:
- Deeper pipelines.
- Superscalar execution.
The compiler can't cope: there are too many delay slots to fill with useful instructions.
Exception types:
- Arithmetic exceptions: overflow, divide by zero, etc.
- Hardware exception: parity error.
- System call.
- Page fault, access to non-existent memory.
Responsibilities:
- Hardware: record the PC (actually PC + 4), and the cause.
- Operating system: recover. Terminate, kernel mode, bring the page
in, etc., as required.
Points:
- An interrupt is like a conditional branch.
- Branches to a fixed location: 0x40000040.
- Can occur anywhere in the pipeline.
- Subsequent instructions must be flushed.
- Multiple interrupts can occur during a cycle. Prioritizing?
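The points above can be sketched as a single routine. This is one possible policy, not necessarily the one in the figure that follows: when several stages raise an exception in the same cycle, service the oldest instruction first, then flush it and everything younger. The struct layouts, the five-stage ordering, and the cause codes are all assumptions for illustration; only the handler address comes from these notes.

```c
#include <stdint.h>

#define HANDLER_ADDR 0x40000040u   /* fixed exception entry point */
#define NUM_STAGES   5             /* IF, ID, EX, MEM, WB; index 0 is youngest */

struct stage {
    uint32_t pc;      /* address of the instruction in this stage */
    int      cause;   /* 0 = no exception; nonzero = hypothetical cause code */
    int      valid;   /* cleared when the instruction is flushed */
};

struct cpu {
    uint32_t pc;      /* next fetch address */
    uint32_t epc;     /* saved PC for the handler */
    int      cause;   /* saved cause for the handler */
};

/* Returns the index of the stage whose exception was taken, or -1 if none. */
int take_exception(struct cpu *c, struct stage pipe[NUM_STAGES])
{
    for (int s = NUM_STAGES - 1; s >= 0; --s) {      /* scan oldest first */
        if (pipe[s].valid && pipe[s].cause != 0) {
            c->epc   = pipe[s].pc + 4;               /* record PC + 4 */
            c->cause = pipe[s].cause;
            for (int y = s; y >= 0; --y)             /* flush it and all younger */
                pipe[y].valid = 0;
            c->pc = HANDLER_ADDR;                    /* "branch" to the handler */
            return s;
        }
    }
    return -1;
}
```

Scanning from the oldest stage means an exception raised by a younger instruction is dropped when an older one also faults. That is harmless, because the younger instruction is flushed and will raise its exception again if it is re-executed.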
Pipeline with exception handling added:

This won't handle memory exceptions. Why? (Consider a lw of a
non-existent address.)
Imprecise interrupts:
- What are they? Why do we care?
- Example: Consider the following floating point code segment running
on a CPU with two FP pipes:
DIVF F0, F2, F4
ADDF F10, F10, F8
SUBF F12, F12, F14
This is an example of out-of-order completion on a
superscalar machine.
- Are there any dependencies?
- What would the completion order be?
- Suppose the SUBF faults. What does the exception hardware do?
- Suppose the DIVF faults subsequent to the SUBF
faulting. Now we're really in a mess. (The ADDF has completed
and committed.)
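The completion order can be worked out with a toy timing model. This is a sketch only: the issue cycles and latencies are made-up assumptions (a long divide, short add and subtract), and the struct and function names are hypothetical.

```c
#include <stdlib.h>

struct fp_op {
    const char *name;
    int issue;     /* cycle the op enters its pipe, in program order */
    int latency;   /* assumed execution latency in cycles */
};

/* An op completes `latency` cycles after it issues. */
int completion_cycle(const struct fp_op *op)
{
    return op->issue + op->latency;
}

/* qsort comparator: earlier completion first. */
static int by_completion(const void *a, const void *b)
{
    const struct fp_op *x = a;
    const struct fp_op *y = b;
    return completion_cycle(x) - completion_cycle(y);
}

/* Reorder ops in place from program order into completion order. */
void completion_order(struct fp_op *ops, size_t n)
{
    qsort(ops, n, sizeof ops[0], by_completion);
}
```

If DIVF and ADDF issue together in cycle 0 with latencies of 20 and 4, and SUBF issues in cycle 1 with latency 4, the completion order is ADDF, SUBF, DIVF. The two later instructions finish long before the divide, so a fault in DIVF arrives after ADDF has already changed architectural state.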
Thomas P. Kelliher
Tue Feb 8 15:16:47 EST 2000