NHSE Review: Random Number Generators for Parallel Computers -- Lagged Fibonnaci Generators

NHSE Review^TM 1996 Volume Second Issue

Random Number Generators for Parallel Computers

| <- HREF="node10.html" Prev | Index | Next -> |
NHSE Review^TM: Comments · Archive · Search

Lagged Fibonnaci Generators

Lagged Fibonacci generators (LFGs) [1, 5, 2] are becoming increasingly popular, since they offer a simple method for obtaining very long periods, and they can be very fast. The standard C and Unix generator RANDOM is of this type. The sequence is defined by

which we will denote by F( ), where p and q are the lags, p > q, and is any binary arithmetic operation, such as addition or subtraction modulo M, multiplication modulo M, or the bitwise exclusive OR function (XOR). The arithmetic operations are done modulo any large integer value M, or modulo 1 if the X's are represented as floating point numbers in the interval [0,1), as can be done if the operation is addition or subtraction. Multiplication must be done on the set of odd integers. The modulus function is not required when using XOR. This method requires storing the p previous values in the sequence in an array called a lag table.

As with the LCGs, it is important that the parameters (p,q,M) be carefully chosen in order to provide good randomness properties and the largest possible period. If M is taken to be (i.e. the X's have b-bit precision), the maximal period is obtained if the lags are the exponents of a primitive polynomial [1, 28]. In that case the period is for XOR, for addition and subtraction, and for multiplication [28, 7, 1, 2, 31]. Tables of suitable lags are available in the literature [1, 28, 30, 29, 31]. An advantage of this generator is that the period can be made arbitrarily large by just increasing the lag p. This also improves the randomness properties [2, 15], since smaller lags mean higher correlations between the numbers in the sequence. The low-order bits can have particularly poor randomness properties if small lags are used.

Empirical tests have shown that the randomness properties of LFGs are best when multiplication is used, with addition (or subtraction) being next best, and XOR being by far the worst [2, 15, 16, 17]. This is intuitively reasonable in that multiplication mixes the bits in two numbers much more than addition, which is in turn much better than XOR.

LFGs using addition (or subtraction) are the most popular because they are very simple and very fast. All the computation can be done in floating point, which avoids a costly integer to floating point conversion, and large periods can be obtained on 32-bit processors without having to use slow multi-precision arithmetic. Each pseudo-random number can be generated with just a single floating point addition and a modulus operation.

Great care needs be taken in choosing the lags for this type of generator. Many implementations use (or recommend in the documentation) lags that are much too small to give adequate randomness properties, even though it has been known for many years that additive LFGs fail some standard statistical tests for very small lags (such as p = 17) [2], and that increasing the lag improves the randomness properties of the generator. More recently, it was shown that even lags on the order of hundreds can give incorrect results in a number of tests based on common applications such as Monte Carlo simulation, percolation and random walks [14, 22, 15, 16, 17]. Unfortunately this information seems not to have percolated through the mathematical and computational science community, and it is still extremely rare to see a lag of greater than 1000 recommended for an additive LFG. We recommend using (p,q) of at least (1279,1063), and preferably much larger values. The standard Unix generator RANDOM is an additive LFG with a default lag of 31, which is much too small, however it is possible to initialize it with a larger lag.

Setting a minimum acceptable lag is obviously a moving target, since computers are continually becoming faster, allowing for more stringent randomness tests that use more random numbers. The recommendations given here are based on the results of the current set of application-specific tests mentioned above, using currently available computers. These values may not be adequate for future applications that use Teraflop computers. In any simulation, the largest feasible lag should be used, and the results should always be checked using a different generator.

Small lags were necessary in the past because of limited computer memory, however on current high-performance computers (and even personal computers), the additional memory requirement of a lag table with a few thousand entries can easily be handled. However we will see later that memory constraints may be a problem in implementing this algorithm on parallel computers for some applications. The choice of the lag may affect the speed of the generator, depending on the type of computer used. For example, if vector processors are used, a larger lag may improve performance, since the vector lengths are longer. If a scalar processor with limited cache memory is used, having a very large lag may cause cache misses and reduce the performance.

Multiplicative LFGs have seen little use, which is somewhat surprising considering their excellent randomness properties and extremely long period. Although slower than additive LFGs, they are just as fast as 32-bit LCGs, and much faster than LCGs that require multiple-precision arithmetic. Multiplicative LFGs can also be used with much smaller lags than for additive LFGs. Many different tests are failed by additive LCGs with small lags (less than 100) [2, 13, 14, 15, 16, 17], however no currently published results in any test show failure of a multiplicative LFG for a lag as small as 17. However we recommend using a lag of at least 100, to ensure a large period and better randomness properties.

One of the obstacles in implementing multiplicative LFGs is handling the possible overflow of the multiplication. In most languages a portable implementation would require multiple precision arithmetic, which would be quite slow. However if the C programming language is used, and the X's are specified to be of type unsigned int, then the language specification for the multiplication of two unsigned int values in C guarantees that the result will be correct modulo if b is the word size (the number of bits in X), without having to worry about overflow or the use of multiple precision arithmetic. Many Fortran compilers allow the calling of subroutines written in C.

The randomness properties of LFGs can be improved (without sacrificing too much in speed) by using multiple lags (or "taps") [39, 28, 22, 14, 15, 17], i.e. by combining three or more previous elements of the sequence, rather than two. This type of generator has not yet been widely used or studied, however it seems likely that a 3- or 4-lag additive LFG would be a very fast and effective random number generator.

| <- HREF="node10.html" Prev | Index | Next -> |
NHSE Review^TM: Comments · Archive · Search

Paul Coddington, paulc@npac.syr.edu

NHSE ReviewTM 1996 Volume Second Issue

Random Number Generators for Parallel Computers

Lagged Fibonnaci Generators

NHSE Review^TM 1996 Volume Second Issue