CS 2731 Recitation 3


CS 2731, Spring 2004
Computer Organization II

Recitation 3: January 26, 28
Floating Point Number Formats
(`double` in the IEEE standard)
MW 02:00-02:50 pm, SB 3.01.04
Due: 2004-02-04 23:59:59

Recitation 3 must be submitted following the submission directions, on or before:

2004-02-04 23:59:59 (that's Wednesday, 4 February 2004, 11:59:59 pm) for full credit.
2004-02-08 23:59:59 (that's Sunday, 8 February 2004, 11:59:59 pm) for 75% credit.

Outline: For this recitation, you are to experiment with the bit pattern that computers adhering to the IEEE standard use to represent a double. (See text, Section 4.8, pages 275-280.) Most hardware now follows the standard, although some floating point units implement a permitted extension (for example, the 80-bit extended format used internally by Intel x87 units).
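For reference while experimenting, the layout of a double and the value of a normalized double can be summarized as follows. The symbols s, e, and f are labels chosen for this summary only (in double_rep2.c below, Sign = s, Exponent = e, and Significand = f/2^52); the stored exponent values 0 and 2047 are reserved by the standard for zero, subnormal numbers, infinities, and NaNs.

```latex
% bit 63 | bits 62..52 | bits 51..0
\underbrace{s}_{1\ \text{bit}}\ \underbrace{e}_{11\ \text{bits}}\ \underbrace{f}_{52\ \text{bits}}
\qquad
x = (-1)^{s} \times \left(1 + \frac{f}{2^{52}}\right) \times 2^{\,e - 1023},
\quad 1 \le e \le 2046

% Example: -0.75 has the hex pattern bfe80000 00000000, so
s = 1,\quad e = \mathtt{0x3FE} = 1022,\quad f = 2^{51}:
\quad x = -\left(1 + \tfrac{1}{2}\right) \times 2^{-1} = -0.75
```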

Given either an "ordinary" floating point number or its bit pattern, it is easy to get the other representation with programs like the following ones in C and in Java. The Java program uses another class, GetData.java, to read a double. Both programs are run from the Unix command line below rather than from a development environment, but of course you may use JBuilder or some other development tool if you wish. The runs shown below are on Sun hardware. Beware: on little-endian hardware (e.g., Intel), the C program prints the two 32-bit halves of a double in reverse order, because its union overlays the halves in memory order. The Java program is not affected, since Double.doubleToLongBits always returns the sign bit as bit 63.

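C Program

/* double_rep.c: hex for doubles.
 * Assumes 32-bit unsigned int; p holds the high-order half of the
 * double only on big-endian hardware such as Sun SPARC. */
#include <stdio.h>
#include <ctype.h>

/* print the 32 bits of r, most significant bit first */
void print_bits(unsigned int r) {
   int i;
   for (i = 0; i < 32; i++) {
      printf("%1i", ( (r >> 31) ? 1 : 0));
      r = r << 1;
   }
}

int main() {
   /* the union overlays a double and two 32-bit halves */
   union {
      double d;
      struct {
         unsigned int p;
         unsigned int q;
      } b;
   } r;
   int ch;   /* int, not char, so that EOF can be detected */
   for( ; ; ) {
      while (isspace(ch = getchar()))
         ;
      if (ch == 'x') {         /* input two 32-bit hex words */
         if (scanf("%x %x", &r.b.p, &r.b.q) != 2) break;
      }
      else if (ch == 'f') {    /* input an ordinary double */
         if (scanf("%lf", &r.d) != 1) break;
      }
      else break;              /* anything else (e.g. q) quits */

      printf(" Dec: %20.16e\n", r.d);
      printf(" Hex: %08x %08x\n",
            r.b.p, r.b.q);
      printf(" Bin: ");
      print_bits(r.b.p); printf("\n      ");
      print_bits(r.b.q); printf("\n");
   }
   return 0;
}

Java Program
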
// DoubleRep.java: bin and hex
public class DoubleRep {

   // read doubles until 0 is entered; print each one's bits in binary and hex
   public static void main (String[] args) {
      GetData  getData = new GetData();
      double d;
      long drep;
      while (true) {
         d = getData.getNextDouble();
         if (d == 0.0) break;
         drep = Double.doubleToLongBits(d);
         String s = Long.toBinaryString(drep);
         while (s.length() < 64) s = "0" + s;   // pad to 64 bits
         String t = Long.toHexString(drep);
         while (t.length() < 16) t = "0" + t;   // pad to 16 hex digits
         System.out.print(" Bin: ");
         for (int i = 0; i < 64; i++) {
            if (i == 32)                         // start the low-order 32 bits
               System.out.print("\n      ");
            System.out.print(s.charAt(i));
         }
         System.out.println();
         System.out.println(" Hex: " + t);
      }
   } // end of main
}

C Run and Output

% cc -o double_rep double_rep.c
% double_rep
f -0.75
 Dec: -7.5000000000000000e-01
 Hex: bfe80000 00000000
 Bin: 10111111111010000000000000000000
      00000000000000000000000000000000
f 16.0
 Dec: 1.6000000000000000e+01
 Hex: 40300000 00000000
 Bin: 01000000001100000000000000000000
      00000000000000000000000000000000
f 1.5
 Dec: 1.5000000000000000e+00
 Hex: 3ff80000 00000000
 Bin: 00111111111110000000000000000000
      00000000000000000000000000000000
x 3fe80000 00000000
 Dec: 7.5000000000000000e-01
 Hex: 3fe80000 00000000
 Bin: 00111111111010000000000000000000
      00000000000000000000000000000000
q
%

Java Run and Output

% javac GetData.java
% javac DoubleRep.java
% java DoubleRep
-.75
 Bin: 10111111111010000000000000000000 
      00000000000000000000000000000000
 Hex: bfe8000000000000
16.0
 Bin: 01000000001100000000000000000000
      00000000000000000000000000000000
 Hex: 4030000000000000
.75
 Bin: 00111111111010000000000000000000
      00000000000000000000000000000000
 Hex: 3fe8000000000000
1.5
 Bin: 00111111111110000000000000000000
      00000000000000000000000000000000
 Hex: 3ff8000000000000
15.0
 Bin: 01000000001011100000000000000000
      00000000000000000000000000000000
 Hex: 402e000000000000
0
%

A More Elaborate C Program: Here is a more interesting program that tears an input double apart into its sign, exponent, and significand, then reassembles the pieces as in the book:

C Program


four06% cat double_rep2.c
/* double_rep2.c: input a double, tear it into its components, reassemble.
 *   See Patterson & Hennessy, Comp. Org. & Design, pages 278-279
 *   for notation.
 *   Here Result = (-1)^Sign x (1 + Significand) x 2^(Exponent - Bias)
 *   This formula covers normalized numbers only (Exponent 1..2046).
 *   As in double_rep.c, p holds the high-order half only on big-endian
 *   hardware such as Sun SPARC, and unsigned int is assumed to be 32 bits.
 */
#include <stdio.h>
/* this union allows one to extract bits from a double */
union double_tag {
   double d;
   struct {
      unsigned int p;
      unsigned int q;
   } b;
};
int main() {
   union double_tag r;                      /* holds the input */
   unsigned int Sign_mask     = 0x80000000; /* to extract Sign (bit 31) */
   unsigned int Exponent_mask = 0x7FF00000; /* to extract Exponent (bits 30-20) */
   unsigned int Signific_mask = 0x000FFFFF; /* to extract the top 20 Significand bits */
   unsigned int Sign;                       /* Sign */
   unsigned int Exponent;                   /* Exponent, as stored (biased) */
   unsigned int Signific;                   /* top 20 Significand bits, an int */
   double Significand;                      /* Significand, a fraction in [0, 1) */
   int Bias = 1023;                         /* Bias */
   double Result;                           /* result of reassembling */
   int TrueExponent;                        /* unbiased exponent = Exponent - Bias */
   int i;
   for( ; ; ) {
      if (scanf("%lf", &r.d) != 1) break;   /* stop at end of input */
      Sign = (r.b.p & Sign_mask) >> 31;
      Exponent = (r.b.p & Exponent_mask) >> 20;
      Signific  = r.b.p & Signific_mask;
      /* The 52 fraction bits are the 20 bits of Signific followed by the
       * 32 bits of r.b.q, so divide q by 2^32 and the sum by 2^20. */
      Significand = ( (double) Signific + r.b.q / 4294967296.0) / 1048576.0;
      printf("(-1)^Sign x (1 + Significand) x 2^(Exponent - Bias) = Result\n");
      Result = (Sign ? -1.0 : 1.0) * (1 + Significand);
      TrueExponent = (int) Exponent - Bias;  /* signed, so it can go negative */
      if (TrueExponent >= 0) {
          for (i = 0; i < TrueExponent; i++)
             Result = Result * 2.0;
      }
      else {
          for (i = 0; i < -TrueExponent; i++)
             Result = Result / 2.0;
      }
      printf("(-1)^%1u    x (1 + %11.9f) x 2^(%4u     - %4i) = %20.16f\n",
          Sign, Significand, Exponent, Bias, Result);
   }
   return 0;
}

C Run and Output

% cc -o double_rep2 double_rep2.c
% double_rep2
-.75
 (-1)^Sign x (1 + Significand) x 2^(Exponent - Bias) = Result
 (-1)^1    x (1 + 0.500000000) x 2^(1022     - 1023) = -0.7500000000000000
16.0
 (-1)^Sign x (1 + Significand) x 2^(Exponent - Bias) = Result
 (-1)^0    x (1 + 0.000000000) x 2^(1027     - 1023) = 16.0000000000000000
1.5
 (-1)^Sign x (1 + Significand) x 2^(Exponent - Bias) = Result
 (-1)^0    x (1 + 0.500000000) x 2^(1023     - 1023) =  1.5000000000000000
15.0
 (-1)^Sign x (1 + Significand) x 2^(Exponent - Bias) = Result
 (-1)^0    x (1 + 0.875000000) x 2^(1026     - 1023) = 15.0000000000000000
^C  (ctrl-C)
%

What you should submit: Refer to the submission directions and deadlines at the top of this page. The text file that you submit should begin with your name, the course number, and the recitation number. The rest of the file should contain the following items, in the order below, each clearly labeled with its number (1-6).

  Contents of submission for Recitation 3:
Last Name, First Name; Course Number; Recitation Number (3).

1. Download and run one of the first two programs above (double_rep.c or DoubleRep.java); either one shows the bits that represent a double. Try out the inputs shown above.

2. Try the following inputs, print the resulting bits in each case, and observe the pattern:

   0.125
   0.25
   0.5
   1.0
   2.0
   4.0
   8.0
   16.0
   32.0

3. Similarly, try:

   0.1875
   0.375
   0.75
   1.5
   3.0
   6.0
   12.0
   24.0

4. Next, try the following inputs:

   1.0                  = 2 - 1
   1.5                  = 2 - 1/2
   1.75                 = 2 - 1/4
   1.875                = 2 - 1/8
   1.9375               = 2 - 1/16
   1.96875              = 2 - 1/32
   1.984375             = 2 - 1/64
   1.99999999999999     (14 9's)
   1.999999999999999    (15 9's)
   1.9999999999999999   (16 9's)

5. Try the input number: 1.3333333333333333 (16 3's).
Using decimal numbers, you are used to the fact that

3.3333... = 3 + 3/10 + 3/100 + 3/1000 + 3/10000 + ... = 3*(1 + 1/10 + 1/100 + 1/1000 + 1/10000 + ...) = 3*(1/(1 - 1/10)) = 3*(10/9) = 10/3 = 3 1/3
which is true because of the formula for an infinite geometric series, 1 + r + r^2 + r^3 + ... = 1/(1 - r) for |r| < 1 (see your algebra book; here r = 1/10). Use the same formula to show mathematically (using equations like those above) that the bit representation for 4/3 above really would converge to that number if the expansion went on indefinitely.
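As a model of this kind of derivation in binary, using a different fraction so that the 4/3 case is still left for you, the repeating expansion 0.001001001... (base 2) sums to 1/7, because each block of three bits is worth 1/8 of the block before it:

```latex
0.001001001\ldots_2
  = \frac{1}{8} + \frac{1}{8^2} + \frac{1}{8^3} + \cdots
  = \frac{1}{8}\left(1 + \frac{1}{8} + \frac{1}{8^2} + \cdots\right)
  = \frac{1}{8}\cdot\frac{1}{1 - 1/8}
  = \frac{1}{8}\cdot\frac{8}{7}
  = \frac{1}{7}
```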

6. Do the same thing as 5 above with the number 0.1 (decimal).
(This example has been confusing for some students. You are to work with the number 1/10 (decimal): mathematically, ten copies of this number add up to exactly 1. The number 0.1 has a simple representation in decimal, but in binary the same number requires an infinite repeating sequence of bits. You are supposed to find that sequence and analyse it as in the previous example.)

Revision date: 2003-12-24. (Please use ISO 8601, the International Standard.)
