Martinus H. V. Werts
Université d'Angers, CNRS
Numerical simulations for scientific and technological applications regularly require the generation of sequences of random numbers.[1] This is the case, for instance, for Monte Carlo methods.[2][3][4] Another example is the simulation of the motion of Brownian particles in fluids, where the sequence of numbers, in addition to being random, should follow a Gaussian distribution.[5]
This small C library provides all the basic functionality for such scientific random number generation. It is an integrated and curated collection of tried-and-tested code described in the literature. More background is provided at the end of this README document. The library is monolithic: only `randommw.c` and `randommw.h` need to be included in the project, and no other non-standard library is required. The generators are remarkably fast, enabling, in our case, the simulation of large numbers of Brownian particles with long trajectories.
The library includes five random number generators (RNGs): MWC8222,[6][7][8] Lehmer64,[9][10] PCG64DXSM,[11][12] Xoshiro256+,[13][14] and MELG19937-64.[15][16] These generate sequences of uniformly distributed integer random numbers and have been reported to pass the relevant statistical tests (see the cited references). There is a ziggurat algorithm, ZIGNOR, coded by J. A. Doornik,[6] for obtaining random floating-point numbers with a Gaussian distribution using these RNGs. The quality of the generated Gaussian distributions has been checked via their raw moments, following McFarland.[17]
Figure. Histogram (log scale) of 10 billion samples generated by randommw (ZIGNOR with MELG19937-64).
Here is a minimal example, which uses only two functions: `RanInit()` for initialization and `DRanNormalZig()` for normally distributed random numbers ('full double precision' floating-point values). By default, this uses the MWC8222 uniform RNG.
```c
#include <stdio.h>
#include "randommw.h"

int main(void) {
    unsigned int i;
    uint64_t zigseed = 10;
    double rval;

    RanInit("", zigseed, 0);
    for (i = 0; i < 20; i++) {
        rval = DRanNormalZig();
        printf("%10.6f\n", rval);
    }
    return 0;
}
```
Choose, initialize and use only a single RNG from `randommw.c` in each C program. The library was designed to provide a single random number stream from a single RNG in a single process. Simple (but quite effective) parallelization of simulations is possible by running several instances of the same program in parallel, using the same RNG with the same seed `uSeed` but a different `uJumpsize` (see `RanInit()`) for each process. For comparison purposes, it is possible to switch to a different RNG in the same program, but each switch completely re-initializes and restarts the RNG.
`RanInit()` initializes the ziggurat algorithm, sets the RNG and its random seed, and optionally "fast-forwards" the generator. The random seed should always be supplied by the user, in order to have reproducible random number streams. If a different stream is needed, provide a different seed.
If `sRan` is an empty string, the default generator (MWC8222) will be used. At present, the possible choices for `sRan` are `"MWC8222"`, `"Lehmer64"`, `"PCG64DXSM"`, `"Xoshiro256+"` and `"MELG19937"`. The string is case-sensitive and should correspond exactly to one of these options; otherwise, your program will crash.
The random seed `uSeed` is always an unsigned 64-bit integer, independently of the specific random number generator. An RNG-specific routine uses this seed to fully initialize the RNG.
For `uJumpsize > 0`, the initialization routine will "fast-forward" the generator, starting from the initially seeded state. This mechanism, often called "(block) splitting", is important for reliable parallelization of computer simulations.[2] In the case of `"PCG64DXSM"`, `"Xoshiro256+"` and `"MELG19937"`, long "jumps" of the generator are performed algorithmically. Each of the `uJumpsize` jumps fast-forwards the RNG, e.g. by 2^192 (Xoshiro256+) or 2^256 (MELG19937) steps, giving access to a stream of random numbers that is guaranteed to be independent of the other streams obtained from the same seed. `"MWC8222"` and `"Lehmer64"` do not provide such algorithmic jumps to independent sequences from the same seed. For these RNGs, we instead use `uJumpsize` to initialize the generator differently from the same `uSeed`, by forwarding the internal Splitmix64 generator used for the initialization of these RNGs. This very probably leads to independent sequences, although there is no formal guarantee in this case. It is still much better than simply using different seeds.
`DRanNormalZig()` calculates and returns the next random number in the normally distributed sequence, using the ziggurat algorithm.
Obtain a double-precision floating-point random number from a uniform distribution (0, 1) using the active RNG, with full 52-bit mantissa randomness.

Obtain an unsigned 32-bit integer random number from the active uniform RNG. In the case of a 64-bit RNG, the 32-bit number is typically obtained from the most significant bits (the other half is discarded). Only 32-bit unsigned values are supplied for now: the ZIGNOR algorithm relies in part on 32-bit unsigned integers. Interfaces supplying other types of random numbers may be developed.
The `randommw.c` library module and associated programs are developed exclusively using the `gcc` C compiler, on 64-bit x86-64 systems, both on Windows (via mingw-w64/w64devkit) and on standard Linux. The code relies on standard C (C99). Certain RNGs require `__uint128_t` arithmetic.
There is a Makefile in the root directory, and a separate Makefile for the test programs in `./tests`. With a working `gcc` environment, it is sufficient to simply run `make` from the respective directories.
The test programs in `./tests`, together with their Makefile, provide clear examples of how to integrate and use `randommw.c` in your own programs.
We have validated the RNGs and are using them for normally distributed random numbers in numerical simulations of colloidal systems. The code is functional and is now contained in a monolithic module (`randommw.c`) that can easily be included in a scientific computing project in C. The random numbers have a good Gaussian distribution (tested up to the 8th raw moment, see `tests/test_moments.c`). They are generated with high throughput, using MWC8222, Lehmer64, PCG64DXSM, Xoshiro256+ or MELG19937-64 as the underlying uniform RNG.
Generated normally distributed random numbers can be written to a binary file using `genzignor.c`. These numbers have been used successfully for Brownian simulations in DDM Toolkit,[18] giving consistent results between the simulation and the subsequent DDM analysis of the simulated image stack.
- The presently used raw-moments test (`tests/test_moments.c`) by McFarland [17] is already quite robust for testing the quality of the Gaussian distribution, but inspiration for additional tests may be found here and here.
- Explicitly test the final quality of randomness of the generated normally distributed random numbers, complementing the testing already done by Doornik.[6] See, e.g., [19] for feeding the output to standard random test suites. Also simply (re-)test the underlying uniform RNGs, using, e.g., Lemire's testingRNG,[9] or other test environments,[20][21][22] independently reproducing the already available test results for all included RNGs.
The numerical simulation of the Brownian motion of colloidal nanoparticles needs a source of random numbers; moreover, these random numbers should have a Gaussian distribution. Computers generate random numbers by specific algorithms. As a result, the numbers generated are not truly random (they are produced by a deterministic algorithm), but the sequences are, for all practical purposes, indistinguishable from truly random numbers in terms of their statistical properties, and can be used in many numerical applications requiring random numbers. Computer-generated random numbers are therefore sometimes referred to as 'pseudo-random numbers'.
The choice of the algorithm for the generation of random numbers is essential to the success of the numerical simulation.[1][2][3][4] There are many random number generators (RNGs) to choose from, and new ones are still appearing.[1][2][7][8][10][11][12][13][14][15][23][25] However, it is not always clear to the unsuspecting consumer of RNGs how to choose. Some developers go so far as to give the impression that their RNG algorithm is "the best" and that all other algorithms are "not good". There appears to be quite some sociology involved in how certain algorithms become the standard RNG in scientific environments such as Python/numpy, Fortran and Julia. The details of RNGs are somewhat hidden from the casual user of Python and other high-level languages, yet it is important to be aware of and explicit about the choice of RNG, in particular when performing scientific simulations. Luckily, there are scientists specialized in testing RNGs, and specific test programs for RNGs have been developed.[9][20][21][22]
The diversity of RNGs in modern scientific computing is a strength, especially with open-source code, and any serious flaw in RNGs for scientific computing will very likely be detected.[2][3][4] Also, looking for "the best" RNG is an elusive goal; one should instead find one that is "good enough" for the particular purpose. An RNG can be good enough for Brownian simulation while still failing certain very stringent statistical tests. A look at the literature and discussions with colleagues doing molecular dynamics simulations show that the Mersenne Twister (MT19937) is currently often used. This does not mean that other generators such as PCG64 and Xoshiro256++ will not work. For instance, we have interchangeably used MT19937 and PCG64 in basic Brownian simulations,[18] with equal success.
Standard RNGs generally generate uniformly distributed random numbers. Since we need a normal distribution for Brownian simulation, the initial uniform distribution has to be converted into a Gaussian one. There are several ways to do this,[19] with the Box-Muller transform and the ziggurat algorithm [26] being amongst the most well-known methods. Several modifications of the ziggurat algorithm now exist, improving statistical properties and/or numerical performance.[6][17]
A particularly portable, structured and relatively well-documented C code for the generation of normally distributed random numbers is provided by J. A. Doornik.[6] It has been tried and tested independently in the literature.[17] The Doornik code was found to compile and run correctly on both the Linux and Windows gcc implementations, and will very likely work with other systems and C compilers. By default, it uses Marsaglia's MWC8222 RNG (originally named MWC256 by Marsaglia[8] and others[7]) as the source of a uniform random variable, which is then converted into a random variable with a Gaussian distribution using a ziggurat algorithm. We used the Doornik ziggurat code as the starting point for our development.
In the folder `original_ziggurat_code_doornik`, the source code of the original ZIP archive by Doornik is conserved. The compiled executables from the ZIP file have been removed for security reasons, and the "makefile" folders for gcc have been renamed to emphasize the 32-bit vs 64-bit nature of the targeted executables. The file contents have been left intact.
The files necessary for development (`zignor.c`, `zignor.h`, `zigrandom.c` and `zigrandom.h`) have been copied from `original_ziggurat_code_doornik` to the root folder. `zignor` and `zigrandom` were merged into `randommw` and have undergone further changes. The modifications only concern the structure of the code, not the fundamental algorithms. The function `DRan_MWC8222()` generates random doubles with full 52-bit mantissa resolution (instead of 32-bit in Doornik's original) via two iterations of the MWC8222 generator (see [24]).
The MWC8222 RNG is a "lag-256 multiply-with-carry" 32-bit random number generator[25] with a period of about 2^8222. It is not to be confused with other multiply-with-carry generators called MWC256 that do not have such a large state and period.
Several implementations of ziggurat algorithms for the generation of normally distributed numbers from a uniform RNG can be found on the Internet. Here are two well-documented and reliable examples in the C programming language.
- Kschischang has made a very nicely documented and well-structured (yet platform-dependent) C implementation of McFarland's 2016 [17] ziggurat algorithm, called ZMG. In initial (unoptimized?) testing, ZMG seems to be about 2 times slower than `randommw` for the generation of sequences of random numbers with a Gaussian distribution.
- Voss provides a concise and well-structured ziggurat code that is part of the GNU Scientific Library (function `gsl_ran_gaussian_ziggurat()`). It is based on the original algorithm by Marsaglia & Tsang,[26] with some simplifications, and may suffer from the same randomness correlation problem found by Doornik.[6]
The random number generators included in `randommw.c` were chosen on the basis of published reports of statistical quality and speed, support in the scientific literature, and the availability of clear and working C source code, aiming for diversity of the underlying algorithms. It is not our aim to have an exhaustive collection of RNGs, but rather a well-chosen set of modern 'scientific-grade' RNGs that allows simulation code to be run with different RNGs as a consistency check.
| RNG | native output (bit) | state memory (bit) | arithmetic ops. (bit) |
|---|---|---|---|
| MWC8222 | 32 | 8224 | 64 |
| Lehmer64 | 64 | 128 | 128 |
| PCG64DXSM | 64 | 128 + 128 | 128 |
| Xoshiro256+ | 64 (a) | 256 | 64 |
| MELG19937 | 64 | 19968 | 64 |
- MWC8222 is a "lag-256 multiply-with-carry" generator.[25] It was the original RNG used by Doornik for ZIGNOR,[6] and was found there to have good statistical properties.
- Lehmer64 is a Lehmer generator proposed by Lemire as a very simple (and supposedly fast) algorithm generating numbers of sufficient statistical quality.[9][10] The implementation requires 128-bit `__uint128_t` integer arithmetic.
- PCG64DXSM[11] is the standard RNG of Numpy,[12] and is thus being used extensively. It has good statistical quality.[9] The implementation requires 128-bit `__uint128_t` integer arithmetic.
- Xoshiro256+ is a fast and efficient RNG algorithm that can be used to generate doubles (52-bit mantissa) and 32-bit integers, as needed by ZIGNOR, with good statistical quality. (a) For full 64-bit number generation, the slightly more elaborate Xoshiro256++ is recommended.[13]
- MELG19937 is a modern 64-bit variant of the well-known Mersenne Twister RNG.[15][16] It has good statistical behaviour.
[1] D. Jones, "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications", http://www0.cs.ucl.ac.uk/staff/d.jones/GoodPracticeRNG.pdf
[2] H. Bauke, S. Mertens, "Random Numbers for Large-Scale Distributed Monte Carlo Simulations", Phys. Rev. E 2007, 75, 066701. doi:10.1103/PhysRevE.75.066701.
[3] A. M. Ferrenberg, D. P. Landau, Y. J. Wong, "Monte Carlo Simulations: Hidden Errors from 'Good' Random Number Generators", Phys. Rev. Lett. 1992, 69, 3382-3384. doi:10.1103/PhysRevLett.69.3382.
[4] Click, T. H., Liu, A., & Kaminski, G. A., "Quality of random number generators significantly affects results of Monte Carlo simulations for organic and biological systems", Journal of computational chemistry 2011, 32, 513. https://doi.org/10.1002/jcc.21638
[5] G. Volpe, G. Volpe, "Simulation of a Brownian Particle in an Optical Trap", American Journal of Physics 2013, 81, 224–230. doi:10.1119/1.4772632.
[6] J. A. Doornik, "An Improved Ziggurat Method to Generate Normal Random Samples" (2005), https://www.doornik.com/research/ziggurat.pdf
[7] Collins, J. C. "Testing, Selection, and Implementation of Random Number Generators", Defense Technical Information Center: Fort Belvoir, VA, 2008. doi:10.21236/ADA486379.
[8] (a) G. Marsaglia, post to sci.math Usenet group, 25 Feb 2003. (b) G. Marsaglia, post to comp.lang.c Usenet group, 13 May 2003
[9] https://github.com/lemire/testingRNG
[10] P. L'Ecuyer, "Tables of linear congruential generators of different sizes and good lattice structure", Mathematics of Computation of the American Mathematical Society 1999, 68.225, 249-260.
[11] https://www.pcg-random.org/
[12] https://numpy.org/doc/stable/reference/random/bit_generators/pcg64dxsm.html
[13] (a) D. Blackman and S. Vigna. "Scrambled Linear Pseudorandom Number Generators", ACM Trans. Math. Softw. 2021, 47, 1–32. https://doi.org/10.1145/3460772. (b) https://prng.di.unimi.it/
[14] H. Bauke, "Tina’s Random Number Generator Library Version 4.24". Documentation. https://www.numbercrunch.de/trng/trng.pdf
[15] S. Harase, T. Kimoto, "Implementing 64-Bit Maximally Equidistributed F2-Linear Generators with Mersenne Prime Period", ACM Trans. Math. Softw. 2018, 44, 1-11. doi:10.1145/3159444.
[16] F. Le Floc’h, "Entropy of Mersenne-Twisters", arXiv doi:10.48550/arXiv.2101.11350
[17] C. D. McFarland, "A modified ziggurat algorithm for generating exponentially and normally distributed pseudorandom numbers", Journal of Statistical Computation and Simulation 2016, 7, 1281. https://dx.doi.org/10.1080/00949655.2015.1060234
[18] ddm-toolkit @ github
[19] D. B. Thomas, P. H. W. Leong, W. Luk, J. D. Villasenor, "Gaussian Random Number Generators", ACM Computing Surveys 2007, 39, 11. doi:10.1145/1287620.1287622. https://www.doc.ic.ac.uk/~wl/papers/07/csur07dt.pdf
[20] (a) P. L'Ecuyer and R. Simard, "TestU01: A C Library for Empirical Testing of Random Number Generators", ACM Transactions on Mathematical Software 2007, 33, 22. (b) http://simul.iro.umontreal.ca/testu01/tu01.html
[21] https://www.phy.duke.edu/~rgb/General/dieharder.php
[22] https://github.com/MartyMacGyver/PractRand
[23] F. Panneton, P. L'Ecuyer, M. Matsumoto, "Improved long-period generators based on linear recurrences modulo 2", ACM Transactions on Mathematical Software 2006, 32, 1. https://doi.org/10.1145/1132973.1132974
[24] J.A. Doornik, "Conversion of High-Period Random Numbers to Floating Point", ACM Trans. Model. Comput. Simul. 2007, 17, 3. https://dx.doi.org/10.1145/1189756.1189759
[25] G. Marsaglia, "Random Number Generators", J. Mod. App. Stat. Meth. 2003, 2, 2–13. doi:10.22237/jmasm/1051747320.
[26] G. Marsaglia and W. W. Tsang, "The Ziggurat Method for Generating Random Variables", Journal of Statistical Software 2000, 5, 1-7. https://doi.org/10.18637/jss.v005.i08