NHSE Review™ 1996 Volume, Second Issue

Random Number Generators for Parallel Computers



Independent Sequences 

The previous two methods were restricted to generators for which arbitrary elements of the sequence could be quickly and easily computed. This means they are impractical for LFGs using multiplication, which is unfortunate since these are among the best sequential generators available.

There is, however, an even simpler way to parallelize a lagged Fibonacci generator, which is to run the same sequential generator on each processor, but with different initial lag tables (or seed tables) [55, 56]. In fact, this technique is no different from what is done on a sequential computer when a simulation needs to be run many times using different random numbers. In that case, the user just chooses different seeds for each run, in order to get different random number streams.
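The idea can be sketched in a few lines of code. This is a minimal illustration, not any of the implementations discussed below: the lags (p, q) = (17, 5) are far too small for real use (as the text notes later), and Python's `random` module stands in for a carefully designed initialization routine.

```python
# Independent-sequences sketch: the same additive lagged Fibonacci
# generator (LFG) runs on every processor, but each processor starts
# from its own randomly filled seed (lag) table.
import random

P_LAG, Q_LAG, BITS = 17, 5, 32      # illustrative lags only
MASK = (1 << BITS) - 1

class AdditiveLFG:
    """x_n = (x_{n-p} + x_{n-q}) mod 2^b, over a circular buffer of length p."""
    def __init__(self, seed_table):
        assert len(seed_table) == P_LAG
        self.table = list(seed_table)
        self.i = 0  # index of the oldest entry, x_{n-p}

    def next(self):
        # table[i] is x_{n-p}; offset p-q from it is x_{n-q}
        new = (self.table[self.i] +
               self.table[(self.i + P_LAG - Q_LAG) % P_LAG]) & MASK
        self.table[self.i] = new        # overwrite x_{n-p} with x_n
        self.i = (self.i + 1) % P_LAG
        return new

def make_generators(num_procs, master_seed=12345):
    """One LFG per (simulated) processor, each with an independent seed table.
    random.Random is a stand-in for a proper, uncorrelated initializer."""
    rng = random.Random(master_seed)
    return [AdditiveLFG([rng.getrandbits(BITS) | 1 for _ in range(P_LAG)])
            for _ in range(num_procs)]   # "| 1": at least one odd seed

gens = make_generators(4)
streams = [[g.next() for _ in range(5)] for g in gens]
```

With distinct seed tables, the four simulated processors produce distinct streams, exactly as when a sequential user reruns a simulation with different seeds.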

This method is similar to sequence splitting, in that each processor generates a different, contiguous section of the sequence. However in this case the starting point in the sequence is chosen at random for each processor, rather than computed in advance using a regular increment. This has the advantage of avoiding (or at least reducing) possible long-range correlations, but only if the seed tables on each processor are random and independent.

The initialization of the seed tables on each processor is a critical part of this algorithm. Any correlations within the seed tables, or between different seed tables, could have dire consequences. This leads to the Catch-22 situation of requiring an excellent parallel random number generator in order to provide a good enough initialization routine to implement an excellent parallel random number generator. However, this is not as difficult as it seems: the initialization could be done by a combined LCG, or even by a different LFG (using different lags and perhaps a different operation).
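A possible bootstrapping step might look like the following sketch. For brevity it uses a single Lehmer LCG (x ← 16807·x mod 2³¹−1) rather than the combined LCG the text suggests, and the lag of 17 is again illustrative only; the function name and parameters are assumptions, not from any cited implementation.

```python
# Filling per-processor lag tables from a single LCG stream, so that no
# two tables contain overlapping runs of the initializing sequence.
LCG_A, LCG_M = 16807, 2**31 - 1     # minimal-standard Lehmer LCG

def lcg_stream(seed):
    x = seed % LCG_M or 1           # LCG state must be nonzero
    while True:
        x = (LCG_A * x) % LCG_M
        yield x

def fill_seed_tables(num_procs, lag=17, master_seed=42):
    """One lag table per processor, drawn consecutively from one LCG stream."""
    stream = lcg_stream(master_seed)
    tables = []
    for _ in range(num_procs):
        table = [next(stream) for _ in range(lag)]
        table[0] |= 1               # at least one odd entry, so an additive
        tables.append(table)        # LFG started from it has full period
    return tables
```

Because each table is a disjoint stretch of the one LCG sequence, the tables cannot accidentally duplicate each other, though this simplified scheme does not by itself rule out subtler correlations.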

A potential disadvantage of this method is that since the initial seeds are chosen at random, there is no guarantee that the sequences generated on different processors will not overlap. However, using a large lag eliminates this problem for all practical purposes, since the period of these generators is so enormous that the probability of overlap will be completely negligible (assuming that the initial lag tables are not correlated). In fact, the likelihood of overlap is even less than might be expected, due to a useful property of lagged Fibonacci generators. Any LCG produces a single periodic sequence of numbers, with the different seeds just providing different starting points in the sequence. However, LFGs have many disjoint full-period cycles [31, 63], so two different seed tables may produce two completely different non-overlapping periodic sequences of numbers. In fact, since the number of such disjoint cycles is 2^{(p-1)(b-1)} for suitably chosen parameters [31, 63], the probability of different processors producing the same cycle should be completely negligible (for example, values of p = 127 and b = 32 give roughly 10^1000 disjoint cycles!). Of course this assumes that the initial seed tables are really random, and do not have any correlations that might negate this argument.
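The quoted figure is easy to check: for p = 127 and b = 32, the exponent (p−1)(b−1) = 3906, and 2^3906 ≈ 10^1176, comfortably above the "roughly 10^1000" round figure in the text.

```python
# Number of disjoint full-period cycles of an additive LFG:
# 2^{(p-1)(b-1)} for lag p and b-bit words, here p = 127, b = 32.
from math import log10

p, b = 127, 32
exponent_base2 = (p - 1) * (b - 1)           # 126 * 31 = 3906
exponent_base10 = exponent_base2 * log10(2)  # about 1175.9, i.e. ~10^1176
```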

This type of generator is quite popular, and has been implemented for a number of parallel computers and parallel languages. The Connection Machine Scientific Software Library (CMSSL) routine FAST_RNG [57] uses an additive LFG with the seeds being initialized by a different parallel random number generator (CMF_RANDOM, described in Section 6.4). The interface to this routine allows the user to specify the lag, so in principle the routine can be very good, although the CMSSL documentation suggests using a lag of 17, which is much too small to ensure adequate randomness properties.

The Maspar (or DECmpp) uses p_random, a parallel version of random, the standard Unix LFG [60]. The initial implementation of this generator gave extremely poor quality pseudo-random numbers (the lower order bits were not even uniformly distributed), due to a poor initialization of the seed tables on each processor, which left them highly correlated. This was greatly improved in a later release, although the newer version still failed a Monte Carlo Ising model test [23, 24], presumably because the new method for initializing the generator still introduces some correlations between the seed tables on each processor.

As with sequence splitting, the fact that the sequences on each processor do not overlap does not necessarily mean they are uncorrelated. However, in this case, if each processor does indeed generate part of a disjoint full-period cycle, there are some theoretical arguments for why any correlations between processors might be expected to be small [63].

Mascagni et al. have proposed a method for initializing the lag tables on each processor which guarantees that each processor will generate a sequence from a different full-period cycle [63]. It has been suggested that this method be established as a standard parallel random number generator [64, 65], and it has been used for the intrinsic random number generator in the Portland Group PGHPF High Performance Fortran [62]. The method used for seeding the lag tables is similar in complexity to jumping ahead in the sequence, so the initialization time increases with the size of the largest lag, and the current implementation is restricted to additive LFGs with lags ranging from 17 to 127 [63]. Improved techniques may be needed to make the initialization fast enough to be practical with the much larger lags required for acceptable randomness properties. Note, however, that although an initialization time of, say, one minute would be unacceptable for a general-purpose random number generator, for specific applications such as large-scale Monte Carlo simulations, which usually require many hours of running time, it would be perfectly adequate.

A deficiency of the independent sequence method is that, like sequence splitting, it does not produce the same sequence for different numbers of processors. However, as with sequence splitting, this can be achieved by assigning a separate generator (i.e. a different lag table) to every abstract, rather than physical, processor. This method is used in the CMSSL VP_RNG generator [57], and the PGHPF implementation mentioned above. A major problem with this method is that each abstract processor needs to have its own lag table, which becomes an exorbitant memory requirement if the lag is large enough to ensure adequate randomness properties. Both the CMSSL and PGHPF implementations use a lag of 17, which is too small. It might be possible to overcome this problem by using a multiplicative, rather than additive, lagged Fibonacci generator, for which a lag as small as 17 is enough to pass all current empirical tests [2, 15].
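A rough back-of-the-envelope calculation shows why per-abstract-processor lag tables become prohibitive. The processor count and the "large" lag below are hypothetical figures chosen for illustration, not values from the text.

```python
# Memory for one b-bit lag table per abstract processor:
# num_abstract_procs * lag * bytes_per_word.
def lag_table_memory_bytes(num_abstract_procs, lag, word_bytes=4):
    return num_abstract_procs * lag * word_bytes

small = lag_table_memory_bytes(1_000_000, 17)    # lag 17, as in CMSSL/PGHPF
large = lag_table_memory_bytes(1_000_000, 1279)  # a lag large enough for
                                                 # good randomness properties
# small = 68 MB; large = 5.116 GB, far beyond mid-1990s memories
```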

Of course, if the user is willing to forgo the luxury of generating the same sequence for any number of processors, then memory is not a problem, and the independent sequence method for large-lag additive or multiplicative LFGs is one of the best methods currently available for generating random numbers on parallel computers, as long as the initialization of the lag tables is done properly.

Copyright © 1996




Paul Coddington, paulc@npac.syr.edu