IA-64/Merced/EPIC/Itanium
Tom Kelliher, CS 240
Mar. 6, 2000
Midterm on Friday. "Design a Pentium IV."
PSU TEE.
- Driving forces.
- The demise of CISC and RISC.
- IA-64 architectural features.
- Comments.
Student-driven review, return homework, Transmeta?
Technology:
- 0.18 micron with 0.09 micron around the corner.
- Copper processes.
Today, tens of millions of transistors per chip; tomorrow, hundreds of millions.
Architecturally:
- More transistors mean more and more functional units (FUs).
- How do we control those FUs?
- CISC is already "dead." Why?
- RISC is gasping in a quicksand of control complexity. Why?
(Renaming and retiring many instructions in parallel.)
- Instruction format:
- 128 64-bit GP and 128 82-bit FP ISA registers, plus 64 one-bit predicate registers.
- Three instructions per 128 bit bundle.
- 41 bit instruction:
- Six bit predicate register ID.
- Three seven bit register IDs.
- So, 14 bits for opcode and opcode extensions. Why so many?
- The remaining 5 bits (128 - 3 * 41) are template bits, explicitly stating the
parallelism within this and succeeding bundles.
- EPIC: Explicitly Parallel Instruction Computing.
Why not LIW or VLIW?
- Predication.
- Used to "eliminate" branches.
Not all can be eliminated, so still need branch prediction.
How does branch elimination work and how does it promote EPIC?
Consider:
```c
if (x < 0)   /* MPEG MAD */
    diff -= x;
else
    diff += x;

if (y > z)   /* Compute max */
    max = y;
else
    max = z;
```
The compiler must analyze the tradeoff between eliminating a branch and predicting it.
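- Tag each instruction in a basic block with a predicate register.
Architecturally, each of the 64 predicate registers is a single bit, true or
false (p0 is hardwired to true); "not yet evaluated" is only a transient state
while the instruction that sets the predicate is still in flight.
Instructions whose predicate is false are killed: their results are discarded.
Predicates are set by compare instructions, which typically write a
complementary pair, one predicate for each side of the branch.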
- Speculative loads.
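- Hoist the load above where it's needed, possibly above branches.
Exceptions are deferred: the speculative load (ld.s) sets the target
register's NaT ("Not a Thing") bit instead of faulting.
- The original load is replaced with a check instruction (chk.s), which
branches to recovery code if an exception was deferred.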
- Once again, technology forces us to re-think.
- Problems addressed: a growing number of FUs and keeping them busy
efficiently; the memory/CPU latency problem.
- Work pushed back to the compiler.
- Code will now be optimized for a particular implementation. Multiple binaries
may be distributed. Is this so different? (Recall the Pentium Pro's poor
performance on 16-bit code.)
Maybe binaries can rearrange themselves based upon the underlying
implementation?
- How much work do the predicated-off paths waste? How much does this matter
if you have so much horsepower that you can search for multiple solutions
simultaneously, throwing out all but the real one?
- Now two markets: IA-32 and IA-64. What are the characteristics of
these markets? (Consumer multimedia (CPU bound) and scientific/server
(memory bound???).)
- Backward compatibility with IA-32 and PA-RISC? How? (Consider how
Alpha did it.)
Thomas P. Kelliher
Mon Mar 6 09:30:10 EST 2000