Random Number
Anyone who uses software to produce random numbers is in a "state of sin". [John von Neumann]
Note: You only need to know this first introductory section for exams.
Random numbers are very widely used in simulations, in statistical experiments, in the Monte Carlo methods of numerical analysis, in other randomized algorithms, and in cryptography.
As later sections will explain more completely, the most common way to use software to produce random numbers is to use the following iterative equation to generate a sequence of "random-looking" integers x_0, x_1, x_2, ..., starting from an initial "seed" value x_0 = s:
x_new = (k*x_old + a) % m
Here k is the multiplier, a is the increment (it is 0 in the examples below), and m is the modulus.
Consider an artificially simple case using k = 5, m = 17, and s = 11, and in the second run changing to k = 9.
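To see how the equation grinds out numbers, work the first few steps by hand. With k = 5, m = 17, a = 0, and seed s = 11: 5*11 = 55 and 55 % 17 = 4; then 5*4 = 20 and 20 % 17 = 3; then 5*3 = 15 and 15 % 17 = 15. These are exactly the first three seed values in the k = 5 run below.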
C program (ran.c):

```c
#include <stdio.h>

int seed = 11;
double ran();

int main()
{
   int i;

   printf(" i seed ran #\n\n");
   for (i = 0; i < 18; i++)
      printf(" %2i %2i %5.3f\n", i, seed, ran());
   return 0;
}

double ran()
{
   int k = 5;
   int m = 17;

   seed = k*seed % m;
   return (double)seed/m;
}
```

Run with k = 5:

```
% cc -o ran ran.c
% ran
 i seed ran #

  0  4 0.235
  1  3 0.176
  2 15 0.882
  3  7 0.412
  4  1 0.059
  5  5 0.294
  6  8 0.471
  7  6 0.353
  8 13 0.765
  9 14 0.824
 10  2 0.118
 11 10 0.588
 12 16 0.941
 13 12 0.706
 14  9 0.529
 15 11 0.647
 16  4 0.235
 17  3 0.176
```

Run with k = 9 (change int k = 5 to int k = 9):

```
% cc -o ran ran.c
% ran
 i seed ran #

  0 14 0.824
  1  7 0.412
  2 12 0.706
  3  6 0.353
  4  3 0.176
  5 10 0.588
  6  5 0.294
  7 11 0.647
  8 14 0.824
  9  7 0.412
 10 12 0.706
 11  6 0.353
 12  3 0.176
 13 10 0.588
 14  5 0.294
 15 11 0.647
 16 14 0.824
 17  7 0.412
```
In the simple example above, the values of the seed variable cycle through all the integers from 1 to 16, but in a "random-looking" order, starting from the initial "seed" value. When these integers are divided by the modulus m = 17, the program produces the sequence of doubles 1.0/17.0, 2.0/17.0, 3.0/17.0, ..., 16.0/17.0, again in a random-looking order, giving a sequence of real numbers that is uniformly distributed over the interval from 0.0 to 1.0.

The choice of the numbers m and k is critical for the performance of this generator. In the simple example above, k = 5 seems OK (it has a cycle length of 16), but k = 9 produces a cycle length of only 8 -- not as desirable. (The theory behind these cycle lengths is beyond the scope of this discussion. In the case above, the values k = 3, 5, 6, 7, 10, 11, 12, and 14 each produce the maximum cycle length of 16.)
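If you want to check the claim about cycle lengths, a brute-force program will do it. Here is a minimal sketch (not one of the course programs; the function name cycle_length is mine) that counts, for each multiplier k, how many steps it takes the seed to return to its starting value when m = 17 and a = 0:

```c
#include <stdio.h>

/* Count how many steps x -> k*x % m takes to return to its start. */
int cycle_length(int k, int m, int start)
{
   int x = start, n = 0;

   do {
      x = k*x % m;
      n++;
   } while (x != start && n <= m);
   return (x == start) ? n : -1;    /* -1 means it never came back */
}

int main()
{
   int k;

   for (k = 2; k < 17; k++)
      printf(" k = %2i   cycle length = %2i\n", k, cycle_length(k, 17, 11));
   return 0;
}
```

Running it confirms a cycle length of 16 for k = 3, 5, 6, 7, 10, 11, 12, and 14, and a cycle length of only 8 for k = 9.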
Historically the most common RNG used a scaled-up version of the example above, with m = 2^31 - 1 = 2147483647 and k = 7^5 = 16807. The seed s could be anything in the range from 1 to m - 1. This generator has the maximum cycle length, equal to m - 1 = 2^31 - 2 = 2147483646. There is a problem writing this RNG in C: the value of a signed int in C can be no bigger than 2^31 - 1 = 2147483647, so the multiplication in the equation will overflow. Instead, one can use a double in C, which gives exact integer arithmetic on integers up to 53 bits, which is enough for this equation. Here is an implementation that works in C, C++, or Java:
C RNG program (random.c):

```c
#include <stdio.h>
#include <math.h>

double seed = 11111.0;
double rand();

int main()
{
   int i;

   printf(" i seed ran#\n\n");
   for (i = 0; i < 18; i++)
      printf(" %2i %11.0f %11.8f\n", i, seed, rand());
   return 0;
}

double rand()
{
   double a = 16807.0, m = 2147483647.0;
   double q;

   seed = a*seed;
   q = floor(seed/m);
   seed = seed - q*m;
   return(seed/m);
}
```

Run with cc, gcc, or g++:

```
% cc -o random random.c -lm
% random
 i seed ran#

  0   186742577  0.08695879
  1  1108883372  0.51636406
  2  1139744538  0.53073491
  3   132318926  0.06161580
  4  1238614637  0.57677489
  5  1837213688  0.85551929
  6  1530577650  0.71273076
  7  1859439784  0.86586912
  8  1422418544  0.66236525
  9   800510604  0.37276680
 10   196672973  0.09158299
 11   505324478  0.23531005
 12  1838161508  0.85596065
 13   280719214  0.13072007
 14    26257239  0.01222698
 15  1071268238  0.49884815
 16   302379618  0.14080648
 17  1147930924  0.53454699
```

Run with CC (on a Sun):

```
% CC -o random_CC random.c
% random_CC
 i seed ran#

  0       11111  0.08695879
  1   186742577  0.51636406
  2  1108883372  0.53073491
  3  1139744538  0.06161580
  4   132318926  0.57677489
  5  1238614637  0.85551929
  6  1837213688  0.71273076
  7  1530577650  0.86586912
  8  1859439784  0.66236525
  9  1422418544  0.37276680
 10   800510604  0.09158299
 11   196672973  0.23531005
 12   505324478  0.85596065
 13  1838161508  0.13072007
 14   280719214  0.01222698
 15    26257239  0.49884815
 16  1071268238  0.14080648
 17   302379618  0.53454699
```
For this to work in C, you need to include the header <math.h> to get the declaration of floor, and on many systems you must also link with the math library (the -lm in the compile command above). To convert this to Java, one just needs to write Math.floor in place of floor. Notice that in C the operator % does not work for doubles (although it does in Java); C instead provides the library function fmod, or one can use two statements with floor to get the same effect, as above. This is still a fairly reasonable random number generator and is regarded as a "minimum standard".
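As an aside, C does provide fmod in <math.h> for exactly this kind of remainder, so the floor/subtract pair in rand() above could be collapsed into one statement. A sketch (the name ran_fmod is mine, not from the program above):

```c
#include <math.h>

double seed = 11111.0;

/* The same generator, written with fmod instead of floor.
   Still exact, because a*seed stays well below 2^53. */
double ran_fmod()
{
   double a = 16807.0, m = 2147483647.0;

   seed = fmod(a*seed, m);
   return seed/m;
}
```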
In the example above, the seed variable was initialized only once. Each time the seed is re-initialized, the generator starts over from that value and proceeds as before. So if you re-initialize the seed to the same value before each use, you will keep getting the same "random" number from the RNG: not very random.
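Here is a tiny illustration of that pitfall, using the toy k = 5, m = 17 generator from the top of the page (a sketch, not one of the programs above):

```c
#include <stdio.h>

int seed = 11;

double ran()                      /* the toy generator: k = 5, m = 17 */
{
   seed = 5*seed % 17;
   return (double)seed/17;
}

int main()
{
   int i;

   for (i = 0; i < 5; i++) {
      seed = 11;                  /* re-initializing the seed every time ... */
      printf("%5.3f\n", ran());   /* ... prints 0.235 all five times */
   }
   return 0;
}
```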
A similar facility is available in the C standard library: rand() returns a random integer, and srand(unsigned int) gives a new seed value to the RNG. The function rand() returns a random int in the range from 0 to RAND_MAX, where RAND_MAX is often the same as INT_MAX, which on our systems is 2147483647. To get a random double between 0 and 1, you can use (rand()/(double)RAND_MAX). The dice example, Dice program, shows how to use this library RNG in C. (It is called near the end of the code.)
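I won't reproduce the Dice program here, but the usual pattern with the library generator looks roughly like the sketch below (my own illustration, not the Dice program itself; seeding from time() is one common choice):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
   int i, die;

   srand((unsigned)time(NULL));     /* seed the library RNG once, up front */
   for (i = 0; i < 10; i++) {
      /* dividing by RAND_MAX + 1.0 keeps the value strictly below 1,
         so the die is always in the range 1 to 6 */
      die = 1 + (int)(6.0*(rand()/(RAND_MAX + 1.0)));
      printf("%i ", die);
   }
   printf("\n");
   return 0;
}
```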
By far the most common source of random numbers is some deterministic process, such as a software algorithm. These provide "random-looking" numbers, but the numbers are not really random -- there is always an exact algorithm for specifying them. This is the reason that researchers now describe such numbers using the word "pseudo", which means "false". These are not true random numbers, but for most applications they can be just as useful. Sometimes they can be more useful, as for example when one wants to repeat a simulation with exactly the same random or pseudo-random numbers.
At first one might think that the best way to get random-looking numbers is to use a "random" algorithm -- one that does crazy operations, everything imaginable, in every possible order. Donald Knuth tried out such an algorithm as an example, and showed that its performance was no good at all. In its first run, Knuth's "random" algorithm almost immediately converged to a fixed point. Knuth was arguing that one should use science and great care in generating pseudo-random numbers.
An early idea for a source of random numbers was to use the bits or digits in the expansion of a transcendental number such as pi, the ratio of the circumference of a circle to its diameter. See Pi for decimal and other expansions of the number pi.
3.14159 26535 89793 23846 26433 83279 50288 41972 ...  (in decimal)
3.11037 55242 10264 30215 14230 63050 56006 70163 21122 ...  (in octal)

It has long been conjectured that this is a very good source of pseudo-random numbers, a conjecture that has still not been proved. In 1852 an English mathematician named William Shanks published 527 digits of pi, and then in 1873 another 180 digits, for a total of 707. These digits were studied statistically, and an interesting excess of the digit 7 was observed in the last 180 of them. In 1945 von Neumann wanted to study the statistical properties of the sequence of digits and used one of the early computers to calculate several thousand of them. Fortunately for Shanks, his triumph was not spoiled during his lifetime, but his last 180 digits were in error, so his last 20 years of effort were wasted. Also, there was no "excess of 7s". The number pi has now been calculated to many billions of places, although the calculation of its digits or bits is too expensive to provide a practical source of random numbers. The later digits are harder to calculate than the earlier ones, although a more recent clever algorithm (the Bailey-Borwein-Plouffe formula) allows calculation of the nth binary (or hexadecimal) digit without calculating the previous ones.
Later work focused on a particularly simple approach using a congruence equation.
x_(n+1) = (k*x_n + a) mod m
where all terms are integers: k is the multiplier, a (often taken to be 0) is the increment, and m is the modulus. An initial seed s = x_0 starts things off, and each successive term is transformed into the next. With a = 0 and a prime modulus m, the pseudo-random terms lie in the range from 1 to m-1. To get (more or less) uniformly distributed floating-point numbers between 0 and 1, just do a floating-point division by m. Assuming that a = 0, the quality of the numbers produced depends heavily on k and m.
This type of generator can produce at most m-1 different numbers before it starts to repeat. To achieve that maximum, one can take m to be a prime number and take k to be a generator (a primitive root) modulo m, so that all m-1 nonzero values will be produced in a repeating cycle, starting from whatever the seed s is.
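As a sketch of this general recipe (my own illustration, not code from this page), a 64-bit unsigned type keeps k*x + a exact whenever k, a, and m all fit in 32 bits:

```c
#include <stdio.h>

/* One step of the general congruence x -> (k*x + a) mod m. */
unsigned long long lcg_state = 11111;      /* the seed */

unsigned long long lcg_next(unsigned long long k,
                            unsigned long long a,
                            unsigned long long m)
{
   lcg_state = (k*lcg_state + a) % m;
   return lcg_state;
}

int main()
{
   int i;

   for (i = 0; i < 5; i++) {
      unsigned long long x = lcg_next(16807, 0, 2147483647);
      printf(" %10llu   %10.8f\n", x, (double)x/2147483647.0);
   }
   return 0;
}
```

Called with k = 16807, a = 0, m = 2147483647, and seed 11111, as here, its first output is 186742577 -- the same first value the doubles program above produced.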
The old generator provided by the C standard library used 16-bit integers, and so had a maximum possible cycle length of 2^16 = 65536 -- a ridiculously small cycle, making the generator useless except for toy applications. By far the most common generator of the past was implemented on hardware with 32-bit integers and used the fact that 2^31 - 1 is a prime. The multiplier commonly used (starting in 1969) was 7^5 = 16807, and the constant a was taken to be zero. This generator is quite efficient and has a cycle length of 2^31 - 2. The multiplier was chosen so that various statistical properties of the sequence would be similar to the results for a true random sequence. In the 1970s, when I first started using this sequence, the cycle length seemed quite long -- now it seems quite short, since I frequently run experiments with hundreds of billions of trials.
Knuth, in his chapter on conventional random number generators, approves the values m = 2^31 - 1 and k = 16807 above as "adequate", but he has suggestions for better values, such as:
m | k |
---|---|
2^31 - 249 | 40692 |
2^31 - 1 | 48271 |
2^31 - 1 | 62089911 |
2^32 | 69069 |
2^48 | 31167285 |
2^64 - 1 | 6364136223846793005 |
Knuth suggests various generators, including one that combines the first two table entries above:
x_(n+1) = 48271*x_n mod (2^31 - 1)
y_(n+1) = 40692*y_n mod (2^31 - 249)
z_n = (x_n - y_n) mod (2^31 - 1)
where independent seeds are needed for x_0 and y_0, and the sequence of the z_n makes up the output random numbers. The period is nearly the square of the period of either component generator.
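A doubles implementation of exactly this combination appears at the end of this page as the suggested "minimum standard"; with 64-bit integers the same combination can be sketched directly (my own illustration, not code from this page; the two seeds are arbitrary):

```c
#include <stdio.h>

long long x = 11111, y = 22222;           /* independent seeds */

double knuth_combined()
{
   const long long m1 = 2147483647LL;     /* 2^31 - 1   */
   const long long m2 = 2147483399LL;     /* 2^31 - 249 */
   long long z;

   x = 48271LL*x % m1;
   y = 40692LL*y % m2;
   z = (x - y) % m1;
   if (z < 0) z += m1;                    /* C's % can come out negative */
   return (double)z/(double)m1;
}

int main()
{
   int i;

   for (i = 0; i < 5; i++)
      printf("%.8f\n", knuth_combined());
   return 0;
}
```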
Knuth also recommends the two-term recurrence

x_n = (271828183*x_(n-1) - 314159269*x_(n-2)) mod (2^31 - 1),
which has very good performance and whose period is roughly the square of m. Of course, two independent seeds x_0 and x_1 are needed to start the sequence off with x_2.
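No implementation of this two-term recurrence appears on this page, so here is a minimal sketch using 64-bit integers (my own illustration; the two starting seeds are arbitrary, and the products stay far below the 64-bit limit):

```c
#include <stdio.h>

long long xprev = 11111, xcurr = 22222;   /* x_(n-2) and x_(n-1) */

double knuth_lagged()
{
   const long long m = 2147483647LL;      /* 2^31 - 1 */
   long long t;

   t = (271828183LL*xcurr - 314159269LL*xprev) % m;
   if (t < 0) t += m;                     /* keep the result in 0 .. m-1 */
   xprev = xcurr;
   xcurr = t;
   return (double)t/(double)m;
}

int main()
{
   int i;

   for (i = 0; i < 5; i++)
      printf("%.8f\n", knuth_lagged());
   return 0;
}
```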
The classic generator

x_(n+1) = 16807*x_n mod (2^31 - 1)

poses problems in practice because the multiply step overflows a 32-bit integer. This generator was usually coded in assembly language on IBM 360-series machines (and their successors), where ready access to all 64 bits of a product makes the implementation easy. On machines with only 32-bit integers, or in a language like C, one can instead break the integers into pieces during the calculation (Schrage's method), as in the following code:
```c
// rand2: version using ints.  Works on all hardware, by
// breaking up numbers to avoid overflow.
int seed2 = 11111;

double rand2()
{
   int a = 16807, m = 2147483647, q = 127773, r = 2836;
   int hi, lo, test;

   hi = seed2/q;
   lo = seed2 - q*hi;
   test = a*lo - r*hi;
   if (test > 0) seed2 = test;
   else seed2 = test + m;
   return((double)seed2/(double)m);
}
```
This function works as written in C and C++, and in Java once it is wrapped inside a class (with seed2 as a static field).
In another approach, one can use the double type, which carries out integer arithmetic exactly as long as the integers stay below 2^53. The product 16807 * (2^31 - 1) is well below that limit, so for this multiplier everything can be done using doubles. The code for this approach was given in the first section above.
This particular generator once represented the minimum standard for a random number generator. I suggest that one now ought to use Knuth's double generator as the minimum standard, shown here in C:
```c
// rand: version using doubles.  Works on all hardware.
//   seed1 = 48271*seed1 mod (2^31 - 1)
//   seed2 = 40692*seed2 mod (2^31 - 249)
//   seed  = (seed1 - seed2) mod (2^31 - 1)
double seed1 = 11111.0;
double seed2 = 11111.0;
double seed;

double rand()
{
   double a1 = 48271.0, a2 = 40692.0,
          m  = 2147483647.0, m2 = 2147483399.0;
   double q1, q2;
   double q, diff;

   seed1 = a1*seed1;
   seed2 = a2*seed2;
   q1 = floor(seed1/m);
   q2 = floor(seed2/m2);
   seed1 = seed1 - q1*m;
   seed2 = seed2 - q2*m2;
   // now combine results
   if ((diff = seed1 - seed2) < 0.0) diff = diff + m;
   q = floor(diff/m);
   seed = diff - q*m;
   return(seed/m);
}
```
To convert this to Java, one just needs to write Math.floor in place of floor. (In Java, though, one could get a simpler implementation using the Java long type. Of course, Java also has good library RNGs.) In the past such a generator might have been slow because of all the floating-point operations, including four floating-point divides, but extremely fast floating-point hardware is now commonplace.
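Finally, just to show the calling pattern, here is a minimal driver for the rand() above (a sketch that assumes the seed1/seed2/seed declarations and rand() from the listing above are in the same file; the two seed values are arbitrary):

```c
#include <stdio.h>
#include <math.h>

/* ... seed1, seed2, seed, and rand() from the listing above go here ... */

int main()
{
   int i;

   seed1 = 123456789.0;      /* any value from 1 to 2^31 - 2   */
   seed2 = 362436069.0;      /* any value from 1 to 2^31 - 250 */
   for (i = 0; i < 10; i++)
      printf("%11.8f\n", rand());
   return 0;
}
```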