Tom Kelliher, CS 240
Apr. 24, 2002
No class Friday --- GIG.
Read 7.1--2.
Problems due 5/3, beginning of class: 7.7, 7.15--17, 7.32.
Data hazards in pipelines.
Caches.
Challenges: small register files, multiple-branch predictions, multiple line fetches from caches.
Available parallelism in text processing: mostly low.
Available parallelism in image processing and multimedia: high.
Median operation on an image example:
medianImage(image dest, image src)
{
   for each pixel, p, in src
      p in dest = medianPixel(p in src);
}
medianPixel(pixel p)
{
   find the <= 8 neighboring pixels of p;
   compute and return the median value;
}
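A minimal C sketch of the pseudocode above, to make the independence of the per-pixel work concrete. The 8-bit grayscale row-major layout, the names median_image, median_pixel, and cmp_u8, and the choice of the lower-middle value when the neighbor count is even are all assumptions made for illustration; the notes do not specify them.

```c
#include <stdlib.h>

/* qsort comparison callback for unsigned char values. */
static int cmp_u8(const void *a, const void *b)
{
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* Median of the (at most 8) neighbors of pixel (row, col).
   Edge and corner pixels simply have fewer neighbors. */
static unsigned char median_pixel(const unsigned char *src, int rows, int cols,
                                  int row, int col)
{
    unsigned char nbr[8];
    int n = 0;

    for (int dr = -1; dr <= 1; ++dr) {
        for (int dc = -1; dc <= 1; ++dc) {
            int r = row + dr, c = col + dc;
            if ((dr != 0 || dc != 0) && r >= 0 && r < rows && c >= 0 && c < cols)
                nbr[n++] = src[r * cols + c];
        }
    }
    if (n == 0)                       /* 1x1 image: no neighbors at all */
        return src[row * cols + col];

    qsort(nbr, (size_t)n, sizeof nbr[0], cmp_u8);
    return nbr[(n - 1) / 2];          /* assumption: lower-middle when n is even */
}

/* Every dest pixel reads only src, so no iteration depends on another. */
void median_image(unsigned char *dest, const unsigned char *src, int rows, int cols)
{
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            dest[r * cols + c] = median_pixel(src, rows, cols, r, c);
}
```

Because dest and src are separate buffers, the loop iterations can in principle be executed in any order or overlapped, which is why this kind of workload offers high parallelism.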
Challenges: exposing potential ILP to the compiler.
Example. Parallelize the following:
sum = 0; for (i = 0; i < last; ++i) sum += array[i];
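One possible answer, sketched in C: split the single dependence chain on sum into two independent chains that are combined at the end. The function name sum_two_chains and the variables sum0 and sum1 are hypothetical. The transformation is safe for integer addition (ignoring overflow); for floating point, reassociating the additions could change the rounded result.

```c
/* Sum array[0..last-1] using two independent accumulators.
   sum0 and sum1 never depend on each other inside the loop,
   so their additions can overlap. */
int sum_two_chains(const int *array, int last)
{
    int sum0 = 0, sum1 = 0;
    int i = 0;

    for (; i + 1 < last; i += 2) {
        sum0 += array[i];       /* chain 0: even indices */
        sum1 += array[i + 1];   /* chain 1: odd indices */
    }
    if (i < last)               /* leftover element when last is odd */
        sum0 += array[i];

    return sum0 + sum1;         /* combine the chains once, at the end */
}
```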
r1 = r2 + r3
r4 = r1 + r5
r1 = r6 + r7
r8 = r1 + r4
ISA registers vs. physical registers. Register renaming?
Rename the previous example where the Register Alias Table (RAT) is initially:
r1 -> p12   r2 -> p6    r3 -> p9    r4 -> p15
r5 -> p1    r6 -> p10   r7 -> p8    r8 -> p14
Free List: p5, p11, p13, p4.
Which dependencies were removed? Which remain?
Structural hazard stalls.
Only stall if no free list entries.
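The renaming step can be sketched as a small C routine. This is a simplified model under stated assumptions: the names struct instr, struct renamer, and rename_instr are hypothetical, every instruction has the form dest = src1 + src2, free registers are taken from the head of the list in order, and physical registers are never returned to the free list (no retirement is modeled).

```c
enum { NUM_ARCH = 9, FREE_MAX = 16 };    /* r0..r8; r0 is unused here */

struct instr { int dest, src1, src2; };  /* r[dest] = r[src1] + r[src2] */

struct renamer {
    int rat[NUM_ARCH];         /* rat[i] = physical register currently holding r_i */
    int free_list[FREE_MAX];   /* available physical registers, in allocation order */
    int free_head;             /* index of the next free entry to hand out */
    int free_count;            /* number of entries still available */
};

/* Rename one instruction. Returns 0 on success, or -1 if the free list
   is empty, which is the only case in which renaming must stall. */
int rename_instr(struct renamer *rn, const struct instr *in, struct instr *out)
{
    if (rn->free_count == 0)
        return -1;

    /* Look up the sources BEFORE remapping the destination, so that
       an instruction like r1 = r1 + r2 reads the old r1. */
    out->src1 = rn->rat[in->src1];
    out->src2 = rn->rat[in->src2];

    /* Give the destination a fresh physical register. */
    out->dest = rn->free_list[rn->free_head++];
    rn->free_count--;
    rn->rat[in->dest] = out->dest;
    return 0;
}
```

Run on the four-instruction example with the RAT and free list given above, this model produces p5 = p6 + p9, p11 = p5 + p1, p13 = p10 + p8, p4 = p13 + p11. The two writes to r1 now target different physical registers (p5 and p13), so the name dependencies on r1 are gone; the true dependencies through p5, p11, and p13 remain. A fifth instruction would find the free list empty and stall.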
Consider two-instruction issue.
Must consider:
The pipeline:

What are the inter- and intra-instruction pair dependencies?
What are our options in increasing the functionality of that second ALU? (reg = reg op immed instrs, reg = reg op reg instrs) Additional dependencies?
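As a rough illustration of the intra-pair question, the check an issue stage might make before sending two instructions down the pipeline together can be sketched in C. Everything here is an assumption for illustration: the names struct issue_op, reads, and can_pair, the use of -1 for an unused register field, and the simplification that only register dependencies matter (functional-unit restrictions and the special case of $0 are ignored).

```c
#include <stdbool.h>

/* One instruction, reduced to its register numbers; -1 means "unused". */
struct issue_op { int dest, src1, src2; };

/* Does instruction op read register reg? */
static bool reads(const struct issue_op *op, int reg)
{
    return reg >= 0 && (op->src1 == reg || op->src2 == reg);
}

/* True if second may issue in the same cycle as first. */
bool can_pair(const struct issue_op *first, const struct issue_op *second)
{
    if (reads(second, first->dest))
        return false;   /* RAW inside the pair: second needs first's result */
    if (first->dest >= 0 && first->dest == second->dest)
        return false;   /* WAW inside the pair: both write the same register */
    return true;        /* a WAR inside the pair is harmless when both
                           instructions read their operands in the same stage */
}
```

Inter-pair dependencies (between one issue pair and the next) are not covered by this check; those would be handled by forwarding or stalling, as in the single-issue pipeline.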
Consider the code segment:
sum = 0; for (i = 0; i < last; ++i) sum += array[i];
Which might compile to:
top: lw   $t0, 0($s1)
     addu $s2, $s2, $t0
     addi $s1, $s1, -4
     bne  $s1, $0, top
How will the code be scheduled?

The addi could be raised, but what would that gain?
Suppose we unroll once:
top: lw   $t0, 0($s1)
     addu $s2, $s2, $t0
     lw   $t0, -4($s1)
     addu $s2, $s2, $t0
     addi $s1, $s1, -8
     bne  $s1, $0, top
Where are the stalls? How can we introduce temp variables to eliminate
some stalls?
How will the improved code schedule?

Is this an improvement?
By the way, what happens with the lw offsets?
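A C-level sketch of the once-unrolled loop with two temporaries, with the index update hoisted to the top of the body. It is meant only to illustrate the two questions above, not to show the schedule itself. The function name sum_unrolled2 and the temporaries t0 and t1 are hypothetical, the index i stands in for $s1 (counting elements rather than bytes), and n is assumed to be even and non-negative, mirroring the exact-zero exit test of the bne.

```c
/* Sum array[0..n-1], walking downward two elements per iteration.
   Precondition (assumption): n is even and n >= 0, because the loop
   exits only when i reaches exactly 0, like bne $s1, $0, top. */
int sum_unrolled2(const int *array, int n)
{
    int sum = 0;
    int i = n;

    while (i != 0) {
        i -= 2;                  /* decrement raised above the loads */
        int t0 = array[i + 1];   /* offsets grow by the amount already subtracted */
        int t1 = array[i];       /* second temp: this load need not wait for the add */
        sum += t0;
        sum += t1;
    }
    return sum;
}
```

With distinct temporaries, both loads can be started before either add needs its operand. Moving the decrement earlier does not change which elements are read; it only shifts the constant offset on each access by the size of the decrement.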
Unroll twice more:

Is this an improvement?
When do you stop?