IA-64/Merced/EPIC/Itanium

Tom Kelliher, CS 240

Mar. 6, 2000

Administrivia

Announcements

Midterm on Friday. ``Design a Pentium IV.''

Assignment

From Last Time

PSU TEE.

Outline

  1. Driving forces.

  2. The demise of CISC and RISC.

  3. IA-64 architectural features.

  4. Comments.

Coming Up

Student-driven review, return homework, Transmeta?

IA-64

Driving Forces

Technology:

  1. 0.18 micron with 0.09 micron around the corner.

  2. Copper processes.

Today, tens of millions of transistors; tomorrow, hundreds of millions.

Architecturally:

  1. More transistors, so more and more functional units (FUs).

  2. How do we control those FUs?

The Demise of CISC and RISC

  1. CISC is already ``dead.'' Why?

  2. RISC is gasping in a quicksand of control complexity. Why? (Parallel renaming and retiring.)

IA-64 Architectural Features

  1. Instruction format:
    1. 128 GP (64-bit) and 128 FP (82-bit) architectural registers.

    2. Three instructions per 128 bit bundle.

    3. 41-bit instruction slot:
      1. Six-bit predicate register ID (64 predicate registers).

      2. Three seven-bit register IDs (128 registers per file).

      3. So, 41 - 6 - 3 x 7 = 14 bits for opcode. Why so many?

    4. The remaining 128 - 3 x 41 = 5 bits are template bits, explicitly stating the parallelism within this and succeeding bundles.

  2. EPIC: Explicitly Parallel Instruction Computing.

    Why not LIW or VLIW?

  3. Predication.
    1. Used to ``eliminate'' branches.

      Not all can be eliminated, so still need branch prediction.

      How does branch elimination work and how does it promote EPIC? Consider:

      if (x < 0)   /* MPEG MAD */
         diff -= x;
      else
         diff += x;
      
      if (y > z)   /* Compute max */
         max = y;
      else
         max = z;
      
      Compiler must analyze tradeoff of eliminating vs. predicting.

    2. Tag each instruction in a basic block with a predicate register. Architecturally, each of the 64 predicate registers holds a single bit (true or false); within the pipeline a predicate may also be not yet evaluated. Instructions whose predicate is false are killed. Predicates are set by condition evaluation (compare) instructions.

  4. Speculative loads.
    1. Hoist the load above where it's needed, possibly above branches. Exceptions are deferred.

    2. Original load replaced with a check instruction to handle the exceptions.

Comments

  1. Once again, technology forces us to re-think.

  2. Problems addressed: increasing the number of FUs and keeping them busy efficiently; the memory/CPU latency problem.

  3. Work pushed back to the compiler.

  4. Code will now be optimized for an implementation. Multiple binaries may be distributed. Is this so different? (Recall the Pentium Pro's slowdown on 16-bit code.)

    Maybe binaries can rearrange themselves based upon the underlying implementation?

  5. How much work do the predication paths waste? How important is this, if you have so much horsepower that you can search for multiple solutions simultaneously, throwing out all but the real one?

  6. Now two markets: IA-32 and IA-64. What are the characteristics of these markets? (Consumer multimedia (CPU bound) and scientific/server (memory bound???).)

  7. Backward compatibility with IA-32 and PA-RISC? How? (Consider how Alpha did it.)



Thomas P. Kelliher
Mon Mar 6 09:30:10 EST 2000