Floating Point Representation

Tom Kelliher, CS 220

Nov. 4, 2011

Administrivia

Milestone due.

Read 4.1-4.3 for Wednesday.

Binary addition, subtraction, multiplication.

Project day, then introduction to building a simple MIPS datapath.

Do we really need a way to represent floating point numbers?
Recall:

$$(-1)^{\rm sign} \times {\rm mantissa} \times 2^{\rm exponent},$$

where mantissa is normalized: $1 \leq {\rm mantissa} < 2$ .
Note: the mantissa is in sign-magnitude notation, while the exponent is in 2's complement!
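A minimal sketch of normalization in C, assuming a positive, finite value. The function name `normalize` is made up for illustration. It halves or doubles the mantissa until it lands in the range [1, 2), adjusting the exponent to compensate.

```c
#include <assert.h>

/* Hypothetical helper: split a positive, finite x into m * 2^e
   with 1 <= m < 2. Returns m and stores e through the pointer. */
double normalize(double x, int *e)
{
    assert(x > 0.0);
    double m = x;
    *e = 0;
    while (m >= 2.0) { m /= 2.0; (*e)++; }   /* too big: shift right */
    while (m < 1.0)  { m *= 2.0; (*e)--; }   /* too small: shift left */
    return m;
}
```

For example, 6.0 comes back as mantissa 1.5 with exponent 2, and 0.375 as mantissa 1.5 with exponent -2.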
This has to fit into one word. What are the tradeoffs?
Is there a way to increase the range, while keeping the same precision? (Hint: consider changing the base of the exponent.)
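One way to look at the hint, as a rough worked comparison. It assumes an 8-bit 2's complement exponent (so $-128 \leq {\rm exponent} \leq 127$) and the same mantissa field width in both cases:

$$2^{127} \approx 1.7 \times 10^{38} \qquad {\rm versus} \qquad 16^{127} = 2^{508} \approx 8.4 \times 10^{152}$$

The catch is that a mantissa normalized to a base-16 digit can begin with up to three zero bits, so the worst-case precision drops slightly.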
Finally, we would like to compare floating point numbers using integer compare hardware. How should we order the fields of the floating point representation in order to achieve this?
What problem will we run into, and how do we solve it?
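As a sketch of where these questions lead, the C fragment below compares two floats with an unsigned integer compare on their bit patterns. It assumes `float` is a 32-bit IEEE single precision value, laid out from most to least significant bit as sign, biased exponent, mantissa. The helper names are invented for illustration. Only non-negative values (sign bit 0) are handled; negative values need extra work because the format is sign-magnitude.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the raw bit pattern of a float into a 32-bit unsigned integer. */
static uint32_t float_bits(float x)
{
    uint32_t u;
    assert(sizeof x == sizeof u);
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Returns 1 if a < b, using only an integer compare.
   Valid only when both a and b have sign bit 0 and neither is a NaN. */
int less_than_nonneg(float a, float b)
{
    return float_bits(a) < float_bits(b);
}
```

Comparing 0.5 (true exponent -1) against 2.0 (true exponent +1) is the interesting case: with a 2's complement exponent field the negative exponent would look like the larger unsigned number, whereas the biased fields 126 and 128 compare correctly.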
Underflow and overflow.
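Both conditions are easy to provoke in C. This is a small sketch, assuming IEEE single precision arithmetic with the default rounding mode; the function names are made up, and the `volatile` qualifiers are there only to force each result through a 32-bit float.

```c
#include <float.h>

/* Doubling the largest finite float overflows to +infinity. */
float overflow_demo(void)
{
    volatile float big = FLT_MAX;
    volatile float r = big * 2.0f;
    return r;
}

/* Squaring the smallest normalized float (2^-126) gives 2^-252,
   which is far too small to represent, so it underflows to 0. */
float underflow_demo(void)
{
    volatile float tiny = FLT_MIN;
    volatile float r = tiny * tiny;
    return r;
}
```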
Formats:
1. Single precision (32 bits): 1 sign bit, 8 bit exponent, 23 bit mantissa.
2. Double precision (64 bits): 1 sign bit, 11 bit exponent, 52 bit mantissa.
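On a machine whose C compiler maps `float` and `double` to these two formats (an assumption, though a common one), the widths can be checked from `<float.h>`. The function name below is illustrative only. Note that `FLT_MANT_DIG` and `DBL_MANT_DIG` count the implicit leading 1 of a normalized mantissa, so they report one more bit than is stored.

```c
#include <float.h>
#include <stdio.h>

/* Print the significand width (implicit leading 1 included)
   and the storage size of each format. */
void print_formats(void)
{
    printf("float:  %d significand bits, %zu bytes\n",
           FLT_MANT_DIG, sizeof(float));
    printf("double: %d significand bits, %zu bytes\n",
           DBL_MANT_DIG, sizeof(double));
}
```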
Some real problems:
1. Accumulated errors in extended computations.
  Rounding techniques, in conjunction with guard, round, and sticky bits help with this.
2. Hardware that doesn't do floating point divide.
3. Getting floating point right is hard. (Ask Intel.)
  Doing it right and fast is even harder.
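A small C sketch of the first problem, assuming IEEE single precision with round-to-nearest-even. The function name `count_up` is made up. A float accumulator that is incremented by 1.0 stops growing at 2^24 = 16777216, because 2^24 + 1 needs 25 significant bits and rounds back down.

```c
/* Add 1.0f to a float accumulator n times and return the total. */
float count_up(long n)
{
    volatile float sum = 0.0f;   /* volatile: round to float at every step */
    for (long i = 0; i < n; i++)
        sum = sum + 1.0f;
    return sum;
}
```

Calling `count_up(1000L)` gives exactly 1000, but `count_up(20000000L)` gives 16777216 rather than 20000000.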

Fields of a single precision floating point number:

Mantissa sign, 1 bit.
Exponent: Bias 127, 8 bits.
How to use the bias:
1. ${\rm Value} \rightarrow {\rm Representation}$ : Add the bias.
2. ${\rm Representation} \rightarrow {\rm Value}$ : Subtract the bias.
Mantissa magnitude, 23 bits. Normalized.
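The three fields can be pulled apart in C with shifts and masks. This is a sketch that assumes `float` is a 32-bit IEEE single precision value; the type `sp_fields` and the function `sp_unpack` are names invented here. The true exponent it computes is meaningful only for normalized values (stored exponent 1 through 254).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the fields of a single precision value. */
typedef struct {
    unsigned sign;        /* 1 bit */
    unsigned biased_exp;  /* 8 bits, as stored */
    int      exponent;    /* true exponent: stored value minus the bias */
    uint32_t fraction;    /* 23 bits, as stored */
} sp_fields;

sp_fields sp_unpack(float x)
{
    uint32_t u;
    sp_fields f;

    assert(sizeof x == sizeof u);
    memcpy(&u, &x, sizeof u);                /* raw bit pattern */

    f.sign       = u >> 31;
    f.biased_exp = (u >> 23) & 0xFF;
    f.exponent   = (int) f.biased_exp - 127; /* Representation -> Value */
    f.fraction   = u & 0x7FFFFF;
    return f;
}
```

For -6.5, which is -1.625 x 2^2, this yields sign 1, stored exponent 129, true exponent 2, and fraction 0x500000.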

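Additional Feature: The hidden bit. A normalized mantissa always has a leading 1, so that bit is not stored; the 23 stored bits hold only the fraction.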

Values ($f$ is the fractional part of the mantissa):
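For reference, the standard IEEE 754 single precision cases can be tabulated as follows. This is a reconstruction from the standard, not necessarily the table shown in lecture. The symbols $s$ (sign bit), $e$ (stored 8-bit exponent field) and $f$ (23-bit fraction) are names chosen here.

$$
\begin{array}{lll}
e & f & {\rm Value} \\
0 & 0 & \pm 0 \\
0 & \neq 0 & (-1)^{s} \times 0.f \times 2^{-126} \ {\rm (denormalized)} \\
1 \ldots 254 & {\rm any} & (-1)^{s} \times 1.f \times 2^{e-127} \\
255 & 0 & \pm \infty \\
255 & \neq 0 & {\rm NaN}
\end{array}
$$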

Examples:

Represent the value 3.14159E8 in IEEE FP. (In integer hex, it's 0x12B9AF98.)

FYI, here's a C program to confirm the result:

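#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
   float x = 3.14159E8f;
   uint32_t bits;

   /* Copy the bytes out; casting &x to int * breaks C's aliasing rules. */
   memcpy(&bits, &x, sizeof bits);

   /* Print the value and its 32-bit IEEE bit pattern in hex. */
   printf("%E %X\n", x, (unsigned int) bits);

   return 0;
}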

How do we do this with pencil and paper?
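The pencil-and-paper steps can be sketched in C for a positive integer input: write the value in binary, locate the leading 1 (its position is the true exponent), keep the next 23 bits as the fraction while rounding to nearest even, then add the bias and pack the fields. The function name `encode_u32` is invented for this illustration, and only nonzero 32-bit unsigned inputs are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the IEEE single precision bit pattern
   for a nonzero unsigned integer n, by hand. */
uint32_t encode_u32(uint32_t n)
{
    assert(n != 0);

    /* Position of the leading 1 = true exponent. */
    int p = 31;
    while (!((n >> p) & 1))
        p--;

    uint32_t sig;                       /* 24 bits: hidden bit + fraction */
    if (p <= 23) {
        sig = n << (23 - p);            /* fits exactly, no rounding */
    } else {
        int s = p - 23;                 /* bits that do not fit */
        uint32_t rem  = n & ((1u << s) - 1);
        uint32_t half = 1u << (s - 1);
        sig = n >> s;
        if (rem > half || (rem == half && (sig & 1)))
            sig++;                      /* round to nearest, ties to even */
        if (sig >> 24) {                /* rounding carried out: renormalize */
            sig >>= 1;
            p++;
        }
    }

    /* Sign 0, exponent plus bias, fraction without the hidden bit. */
    return ((uint32_t) (p + 127) << 23) | (sig & 0x7FFFFF);
}
```

For 314159000 (0x12B9AF98) the leading 1 is at bit 28, so the stored exponent is 28 + 127 = 155, the five low-order bits 11000 round the fraction up, and the result is 0x4D95CD7D.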
Add decimal 9.567E3 and 5.678E2.
What is the algorithm?
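One possible answer, sketched in C for a toy decimal format with four significant digits: align the exponents, add the significands, normalize, then round. The type `dec4` and the function `dec4_add` are hypothetical, the sketch handles positive operands only, it rounds half up, and digits shifted past its three guard digits are simply dropped.

```c
#include <assert.h>

/* Hypothetical 4-digit decimal float: value = (sig / 1000) * 10^exp,
   with 1000 <= sig <= 9999, so 9.567E3 is {9567, 3}. Positive only. */
typedef struct { long sig; int exp; } dec4;

dec4 dec4_add(dec4 a, dec4 b)
{
    assert(a.sig >= 1000 && a.sig <= 9999);
    assert(b.sig >= 1000 && b.sig <= 9999);

    /* Step 1: align. Make a the operand with the larger exponent, then
       shift b's significand right one digit per unit of difference.
       Three extra digits are kept so shifted-out digits affect rounding. */
    if (a.exp < b.exp) { dec4 t = a; a = b; b = t; }
    long sa = a.sig * 1000, sb = b.sig * 1000;
    for (int i = 0; i < a.exp - b.exp && sb != 0; i++)
        sb /= 10;

    /* Step 2: add the significands. */
    long sum = sa + sb;

    /* Step 3: normalize. A carry out means one more digit to drop. */
    long scale = 1000;
    int exp = a.exp;
    if (sum >= 10000 * scale) { scale *= 10; exp++; }

    /* Step 4: round to 4 digits, renormalizing if rounding carries. */
    long sig = (sum + scale / 2) / scale;
    if (sig >= 10000) { sig /= 10; exp++; }

    dec4 r = { sig, exp };
    return r;
}
```

With 9.567E3 and 5.678E2, alignment turns the second operand into 0.5678E3, the sum is 10.1348E3, normalizing gives 1.01348E4, and rounding to four digits leaves 1.013E4.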
FP hardware:

[Figure: floating point addition hardware (Figures/fltPtAdd.eps)]

Explain all the blocks, particularly the muxes!
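The figure is not reproduced here, so the mapping in the comments below from code steps to hardware blocks (exponent-difference ALU, operand-select muxes, alignment shifter, significand ALU, normalize and round stages) is an assumption based on the usual textbook adder, not a description of this particular diagram. The sketch adds two positive, normalized single precision values and ignores overflow, denormals, infinities and NaNs. All function names are invented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t bits_of(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
static float float_of(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

float fp_add_sketch(float a, float b)
{
    uint32_t ua = bits_of(a), ub = bits_of(b);
    int ea = (ua >> 23) & 0xFF, eb = (ub >> 23) & 0xFF;
    assert(!(ua >> 31) && !(ub >> 31));                    /* positive only */
    assert(ea >= 1 && ea <= 254 && eb >= 1 && eb <= 254);  /* normalized */

    /* Small ALU: exponent difference. Its sign steers the muxes that
       pick the larger exponent and the significand to be shifted. */
    int diff = ea - eb;
    if (diff < 0) { uint32_t t = ua; ua = ub; ub = t; ea = eb; diff = -diff; }

    /* Restore the hidden bit; append guard, round and sticky bits. */
    uint64_t ma = (uint64_t) ((ua & 0x7FFFFF) | 0x800000) << 3;
    uint64_t mb = (uint64_t) ((ub & 0x7FFFFF) | 0x800000) << 3;

    /* Shift right: align the smaller operand, ORing lost bits into sticky. */
    if (diff > 26) {
        mb = 1;
    } else if (diff > 0) {
        uint64_t lost = mb & ((1ull << diff) - 1);
        mb = (mb >> diff) | (lost != 0);
    }

    /* Big ALU: add the significands. */
    uint64_t sum = ma + mb;
    int e = ea;

    /* Normalize: a carry out means shift right once and bump the exponent. */
    if (sum >> 27) { sum = (sum >> 1) | (sum & 1); e++; }

    /* Round to nearest even; renormalize if rounding carries out. */
    uint32_t m = (uint32_t) (sum >> 3);
    unsigned grs = (unsigned) (sum & 7);
    if (grs > 4 || (grs == 4 && (m & 1))) m++;
    if (m >> 24) { m >>= 1; e++; }

    return float_of(((uint32_t) e << 23) | (m & 0x7FFFFF));
}
```

For instance, `fp_add_sketch(1.5f, 2.25f)` returns 3.75, and `fp_add_sketch(16777216.0f, 1.0f)` returns 16777216 because the 1 is shifted entirely into the guard position and the tie rounds to even.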

Thomas P. Kelliher 2011-11-03