CS 3343/3341 Analysis of Algorithms |
Random Number Generators
|
Anyone who uses software to produce random numbers is in a "state of sin". [John von Neumann] |
xnew = (k*xold + a) % m
C program | Run: k = 5 | Run: k = 9 |
---|---|---|
#include <stdio.h> int seed = 11; double ran(); int main() { int i; printf(" i seed ran #\n\n"); for (i = 0; i < 18; i++) printf(" %2i %2i %5.3f\n", i, seed, ran()); return 0; } double ran() { int k = 5; int m = 17; seed = k*seed % m; return (double)seed/m; } | % cc -o ran ran.c % ran i seed ran # 0 4 0.235 1 3 0.176 2 15 0.882 3 7 0.412 4 1 0.059 5 5 0.294 6 8 0.471 7 6 0.353 8 13 0.765 9 14 0.824 10 2 0.118 11 10 0.588 12 16 0.941 13 12 0.706 14 9 0.529 15 11 0.647 16 4 0.235 17 3 0.176 | % cc -o ran ran.c % ran i seed ran # 0 14 0.824 1 7 0.412 2 12 0.706 3 6 0.353 4 3 0.176 5 10 0.588 6 5 0.294 7 11 0.647 8 14 0.824 9 7 0.412 10 12 0.706 11 6 0.353 12 3 0.176 13 10 0.588 14 5 0.294 15 11 0.647 16 14 0.824 17 7 0.412 |
C RNG program | Run with cc, gcc, or g++ | Run with CC (on a Sun) |
---|---|---|
#include <stdio.h> #include <math.h> double seed = 11111.0; double rand(); int main() { int i; printf(" i seed ran#\n\n"); for (i = 0; i < 18; i++) printf(" %2i %11.0f %11.8f\n", i, seed, rand()); return 0; } double rand() { double a = 16807.0, m = 2147483647.0; double q; seed = a*seed; q = floor(seed/m); seed = seed - q*m; return(seed/m); } | % cc -o random random.c -lm % random i seed ran# 0 186742577 0.08695879 1 1108883372 0.51636406 2 1139744538 0.53073491 3 132318926 0.06161580 4 1238614637 0.57677489 5 1837213688 0.85551929 6 1530577650 0.71273076 7 1859439784 0.86586912 8 1422418544 0.66236525 9 800510604 0.37276680 10 196672973 0.09158299 11 505324478 0.23531005 12 1838161508 0.85596065 13 280719214 0.13072007 14 26257239 0.01222698 15 1071268238 0.49884815 16 302379618 0.14080648 17 1147930924 0.53454699 | % CC -o random_CC random.c % random_CC i seed ran# 0 11111 0.08695879 1 186742577 0.51636406 2 1108883372 0.53073491 3 1139744538 0.06161580 4 132318926 0.57677489 5 1238614637 0.85551929 6 1837213688 0.71273076 7 1530577650 0.86586912 8 1859439784 0.66236525 9 1422418544 0.37276680 10 800510604 0.09158299 11 196672973 0.23531005 12 505324478 0.85596065 13 1838161508 0.13072007 14 280719214 0.01222698 15 26257239 0.49884815 16 1071268238 0.14080648 17 302379618 0.53454699 |
3.14159 26535 89793 23846 26433 83279 50288 41972 ... (in decimal) 3.11037 55242 10264 30215 14230 63050 56006 70163 21122 ... (in octal)It has long been conjectured that this is a very good source of pseudo-random numbers, a conjecture that has still not been proved. In 1852 an English mathematician named William Shanks published 527 digits of pi, and then in 1873 another 180 digits for a total of 707. These numbers were studied statistically, and an interesting excess of the number 7 was observed in the last 180 digits. In 1945 von Neumann wanted to study statistical properties of the sequence of digits and used one of the early computers to calculate several thousand. Fortunately for Shanks his triumph was not spoiled during his lifetime, but his last 180 digits were in error and his last 20 years of effort were wasted. Also there was no "excess of 7s". The number pi has now been calculated to many billions of places, although the calculation of its digits or bits is too hard to provide a good source of random numbers. The later digits are harder to calculate than earlier ones, although a recent clever algorithm allows calculation of the nth binary (or hexadecimal) digit without calculating the previous ones. Later work focused on a particularly simple approach using a congruence equation.
xn+1 = (k*xn + a) mod mwhere all terms are integers, k is the multiplier, a (often taken to be 0) is the increment, and m is the modulus. An initial seed is s = x0. Each successive term is transformed into the next. The pseudo-random terms are in the range from 1 to m-1. To get (more-or-less) uniformly distributed floating point numbers between 0 and 1, just do a floating point division by m. Assuming that a = 0, the quality of the numbers produced depends heavily on k and m. This type of generator can produce at most m-1 different numbers before it starts to repeat. To get this behavior, one can start with a prime number for m and use a generator for k so that all m-1 numbers will be produced in a repeating cycle, starting with whatever the seed s is. The old generator provided by the C standard library uses 16-bit integers, and so has a maximum possible cycle length of 216 = 65237 -- a ridiculously small cycle, making the generator useless except for toy applications. By far the most common generator of the past was implemented on hardware with 32-bit integers and used the fact that 231-1 is a prime. The multiplier commonly used (starting in 1969) was 75 = 16807 and the constant a was taken to be zero. This generator is quite efficient, and has a cycle length of 231 - 2. The multiplier was chosen so that various statistical properties of the sequence would be similar to the results for a true random sequence. In the 1970s when I first started using this sequence the cycle length seemed quite long -- now it seems quite short since I frequently run experiments with hundreds of billions of trials. Knuth in his chapter on conventional random number generators approves the values m = 231 - 1 and k = 16807 above as "adequate", but he has suggestions for better values, such as:
m | k |
---|---|
231-249 | 40692 |
231-1 | 48271 |
231-1 | 62089911 |
232 | 69069 |
248 | 31167285 |
264-1 | 6364136223846793005 |
xn+1 = 48271*xn mod (231 - 1), yn+1 = 40692*yn mod (231 - 249), zn = (xn - yn) mod (231 - 1), |
xn = (271828183*xn-1 - 314159269*xn-2) mod (231 - 1),
xn+1 = (16807*xn mod (231 - 1)
// rand2: version using ints. Works on all hardware, by // breaking up numbers to avoid overflow. int seed2 = 11111; double rand2() { int a = 16807, m = 2147483647, q = 127773, r = 2836; int hi, lo, test; hi = seed2/q; lo = seed2 - q*hi; test = a*lo - r*hi; if (test > 0) seed2 = test; else seed2 = test + m; return((double)seed2/(double)m); } |
|