Skip to content

Generators of uniformly and normally distributed random numbers

License

Notifications You must be signed in to change notification settings

mhvwerts/randommw

Repository files navigation

randommw: Generator of random numbers with uniform or Gaussian distribution (in C)

Martinus H. V. Werts

Université d'Angers, CNRS

DOI

Description

Numerical simulations for scientific and technological applications regularly require the generation of sequences of random numbers.[1] This is the case, for instance, for Monte Carlo methods.[2][3][4] Another example is the simulation of the motion of Brownian particles in fluids, where the sequence of numbers, in addition to being random, should follow a Gaussian distribution.[5]

This small C library provides all the basic functionality for such scientific random number generation. It is an integrated and curated collection of tried & tested code described in the literature. More background is provided at the end of this README document. It is monolithic: only randommw.c and randommw.h need to be included in the project, and it does not need any other non-standard library. The generators are amazingly fast, enabling, in our case, simulation of large numbers of Brownian particles with long trajectories.

The library includes five random number generators (RNGs): MWC8222,[6][7][8] Lehmer64,[9][10] PCG64DXSM,[11][12] Xoshiro256+,[13][14] and MELG19937-64.[15][16] These generate sequences of uniformly distributed integer random numbers and have been reported to pass the relevant statistical tests (see the cited references). There is a ziggurat algorithm, ZIGNOR, coded by J. A. Doornik,[6] for obtaining random floating-point numbers with a Gaussian distribution using these RNGs. The quality of the generated Gaussian distributions has been checked via their raw moments, following McFarland.[17]

Figure. Histogram (log scale) of 10 billion samples generated by randommw (ZIGNOR with MELG19937-64).

Usage

Minimal example

Here is a minimal example, which only uses two functions: RanInit() for initialization and DRanNormalZig() for normally-distributed random numbers ('full double precision' floating point values). By default, this uses the MWC8222 uniform RNG.

#include <stdio.h>
#include "randommw.h"

int main(void) {
	unsigned int i;
	uint64_t zigseed = 10;
	double rval;
		
	RanInit("", zigseed, 0);
	
	for(i = 0; i < 20; i++)	{
		rval = DRanNormalZig();
		printf("%10.6f\n", rval);
	}
	
	return 0;
}

Important warning

Choose, initialize and use only a single RNG from randommw.c in each C program. The library was designed to provide a single random number stream from a single RNG in a single process. Simple (but quite effective) parallelization of simulations is possible by running several instances of the same program in parallel, using the same RNG with the same seed uSeed, but a different uJumpsize (see RanInit()) for each processes. For comparison purposes, it is possible to switch to a different RNG in the same program, but each switch completely re-initializes and re-starts the RNG.

void RanInit(const char *sRan, uint64_t uSeed, uint64_t uJumpsize)

Initialize the ziggurat algorithm, set the RNG and its random seed, and optionally "fast-forward" the generator. The random seed should always be supplied by the user, in order to have reproducible random number streams. If a different stream is needed, provide a different seed.

If sRan is an empty string, the default generator will be used: MWC8222. At present, the possible choices for sRan are "MWC8222", "Lehmer64", "PCG64DXSM", "Xoshiro256+" and "MELG19937". The string is case-sensitive, and should correspond exactly to one of these options; else, your program will crash.

The random seed uSeed is always an unsigned 64-bit integer, independently of the specific random number generator. A RNG-specific routine uses this seed to fully initialize the RNG.

For uJumpsize > 0, the initialization routine will "fast-forward" the generator, starting from the initially seeded state. This mechanism, often called "(block) splitting", is of importance for reliable parallelization of computer simulations.[2] In the case of "PCG64DXSM", "Xoshiro256+" and"MELG19937", long "jumps" of the generator are performed algorithmically. Each of the uJumpsize jumps fast-forwards the RNG, e.g. by 2^192 (Xoshiro256+) or 2^256 (MELG19937) steps, giving access to a stream of random numbers that is guaranteed to be independent of the other streams from the same seed. "MWC8222" and "Lehmer64" do not provide such algorithmic jumps to independent sequences from the same seed. For these RNGs, we have resorted to using uJumpsize to initialize these generators differently from the same uSeed by forwarding the internal Splitmix64 generator used for initialization of these RNGs. This very probably leads to independent sequences, although there is no formal guarantee in this case. It is still much better than simply using different seeds.

double DRanNormalZig(void)

Calculate and return the next random number in the normally distributed sequence using the ziggurat algorithm.

double DRanU(void)

Obtain a double-precision floating point random number from a uniform distribution (0, 1) using the active RNG. Full 52-bit mantissa randomness.

uint32_t U32RanU(void)

Obtain an unsigned 32-bit integer random number from the active uniform RNG. In case of a 64-bit RNG, the 32-bit number is typically obtained from the most significant bits (the other half is discarded). Only 32-bit unsigned values are supplied for now: the ZIGNOR algorithm relies in part on 32-bit unsigned integers. Interfaces supplying other types of random numbers may be developed.

Compilation, development and testing

The randommw.c library module and associated programs are developed exclusively using the gcc C compiler, on 64-bit x86-64 systems, both on Windows via mingw-w64/w64devkit and on standard Linux. The code relies on standard C (C99). Certain RNGs require __uint128_t arithmetic.

There is a Makefile in the root directory, and a separate Makefile for the test programs in ./tests. With a good gcc environment, it is sufficient to simply run make from the respective directories.

The test programs in ./tests, together with their makefile, provide clear examples how to integrate and use randommw.c in your own programs.

Status

We have validated the RNGs and are using them for normally distributed random numbers in numerical simulations of colloidal systems, The code is functional and is now contained in a monolithic module (randommw.c) that can be easily included in a scientific computing project in C. The random numbers have a good Gaussian distribution (tested up to 8 raw moments, see tests/test_moments.c). They are generated with high throughput, using MWC8222, Lehmer64, PCG64DXSM, Xoshiro256+ or MELG19937-64 as underlying uniform RNG.

Generated normally distributed random numbers can be written to a binary file using genzignor.c. These numbers have been used successfully for Brownian simulations in DDM Toolkit ,[18] giving consistent results between the simulation and subsequent DDM analysis of the simulated image stack.

Suggestions for future work

  • The presently used raw moments test (tests/test_moments.c) by McFarland [17] is already quite robust for testing the quality of the Gaussian distribution, but inspiration for additional tests may be found here and here.
  • Explicitly test final quality of randomness of generated normally distributed random numbers, complementing the testing already done by Doornik.[6] See, e.g., [19] for feeding output to standard random test suites. Also simply (re-)test the underlying uniform RNG, using, e.g., Lemire's testingRNG,[9] or other test environments,[20][21][22] independently reproducing the already available test results for all included RNGs.

Background

The numerical simulation of the Brownian motion of colloidal nanoparticles needs a source of random numbers. Moreover, these random numbers should have a Gaussian distribution. Computers generate random numbers by specific algorithms. As a result, the numbers generated are not really random (they are generated by a deterministic algorithm), but the sequences are (for all practical purposes) indistinguishable from really random numbers, in terms of statistical properties, and can be used in many numerical applications requiring random numbers. Computer-generated random numbers are sometimes referred to as 'pseudo-random numbers'.

The choice of the algorithm for the generation of random numbers is essential to the success of the numerical simulation.[1][2][3][4] There are many random number generators (RNGs) to choose from and new ones are still appearing.[1][2][7][8][10][11][12][13][14][15][23][25] However, it is not always very clear to the unsuspecting consumer of RNGs how to choose. Some developers are going so far as to give the impression that their RNG algorithm is "the best" and all other algorithms are "not good". There appears to be quite some sociology involved in as to how certain algorithms become the standard RNG in scientific environments such as Python/numpy, Fortran and Julia. The details about RNGs are somewhat hidden to the casual user of Python and other high-level languages, yet it is important to be aware and explicit about choosing RNGs, in particular when performing scientific simulations. Luckily, there are scientists specialized in testing RNGs, and specific test programs for RNGs have been developed.[9][20][21][22]

The diversity of RNGs in modern scientific computing is a strength, especially with open source code, and any serious flaw in RNGs for scientific computing will very likely be detected.[2][3][4] Also, it is elusive to look for "the best" RNG, but one should at least find one that is "good enough" for the particular purpose. A RNG can be good enough for Brownian simulation, while still failing certain very stringent statistical tests. A look in the literature and discussions with colleagues doing molecular dynamic simulations shows that currently the Mersenne Twister (MT19337) is often used. This does not mean that other generators such as PCG64 and Xoshiro256++ will not work. For instance, we have interchangeably used MT19937 and PCG64 in basic Brownian simulations,[18] with equal success.

Standard RNGs generally generate uniformly distributed random numbers. Since we need a normal distribution for Brownian simulation, the initial uniform should be converted into a random with a Gaussian distribution. There are several ways to do this,[19] with the Box-Muller transform and the ziggurat algorithm [26] being amongst the most well-known methods. Several modifications of the ziggurat algorithm now exist, improving statistical properties and/or numerical performance.[6][17]

Doornik's ziggurat code

A particularly portable, structured and relatively well-documented C code for the generation of normally distributed random numbers is provided by J. A. Doornik.[6] It has been tried and tested independently in the literature.[17] The Doornik code was found to compile and run correctly on both Linux and Windows gcc implementation, and will very likely work with other systems and C compilers. By default, it uses Marsaglias's MWC8222 (originally named MWC256 by Marsaglia,[8] and others[7]) RNG as the source of a uniform random variable, which is then converted to a random variable with a Gaussian distribution using a ziggurat algorithm. We used the Doornik ziggurat code as the starting point for our development.

In the folder original_ziggurat_code_doornik the source code of the original ZIP archive by Doornik is conserved. The compiled executables from the ZIP file have been removed for security reasons, and the "makefile" folders for gcc have been renamed to emphasize the 32-bit vs 64-bit nature of the targeted executables. The file contents have been left intact.

The files necessary for development (zignor.c, zignor.h, zigrandom.c and zigrandom.h) have been copied from original_ziggurat_code_doornik to the root folder. zignor and zigrandom were merged into randommw and have undergone further changes. The modifications only concern the structure of the code, not the fundamental algorithms. The functionDRan_MWC8222() generates random doubles with full 52-bit mantissa resolution (instead of 32-bit in Doornik's original) via two iterations of the MWC8222 generator (see [24]).

The MWC8222 RNG is a "lag-256 multiply-with-carry" 32-bit random number generator[25] with a period of about 2^8222. Not to be confused with other multiply-with-carry generators called MWC256, that do not have such a big state and period.

Other ziggurat codes

Several implementations of ziggurat algorithms for generation of normally distributed numbers from a uniform RNG can be found on the Internets. Here are two well-documented and reliable examples in the C programming language.

  • Kschischang has made a very nicely documented and well-structured (yet platform-dependent) C implementation of McFarland's 2016 [17] ziggurat algorithm, called ZMG. In initial (unoptimized?) testing, ZMG seems to be about 2 times slower than randommw for generation of sequences of random numbers with a Gaussian distribution.

  • Voss provides a concise and well-structured ziggurat code that is part of the GNU Scientific Library (function gsl_ran_gaussian_ziggurat() ). It is based on the original algorithm by Marsaglia & Tsang,[25] with some simplifications, and may suffer from the same randomness correlation problem found by Doornik.[6]

Selection of included RNGs

The random number generators included in randommw.c were chosen on basis of published reports of statistical quality and speed, support in the scientific literature, and availability of clear and working C source code, aiming for diversity of the underlying algorithms. It is not our aim to have an exhaustive collection of RNGs, but a well-chosen set of modern 'scientific grade' RNGs that allows for running simulation code using different RNGs to check consistency.

RNG native output state memory arithmetic ops.
(bit) (bit) (bit)
MWC8222 32 8224 64
Lehmer64 64 128 128
PCG64DXSM 64 128 + 128 128
Xoshiro256+ 64 (a) 256 64
MELG19937 64 19968 64
  • MWC8222 is a "lag-256 multiply-with-carry" generator.[25] It was the original RNG used by Doornik for ZIGNOR,[6] and was found there to have good statistical properties.
  • Lehmer64 is a Lehmer generator proposed by Lemire as a very simple (and supposedly fast) algorithm generating numbers of sufficient statistic quality.[9][10] The implementation requires 128-bit __uint128_t integer arithmetic.
  • PCG64DXSM[11] is the standard RNG of Numpy,[12] and is thus extensively being used. It has good statistical quality.[9] The implementation requires 128-bit __uint128_t integer arithmetic.
  • Xoshiro256+ is a fast and efficient RNG algorithm that can be used to generate doubles (52-bit mantissa) and 32-bit integers, as needed by ZIGNOR, with good statistical quality. (a) For 64-bit number generation, the slightly more elaborate Xoshiro256++ is recommended.[13]
  • MELG19937 is a modern 64-bit variant of the well-known Mersenne Twister RNG.[15][16] It has good statistical behaviour.

References

[1] D. Jones, "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications", http://www0.cs.ucl.ac.uk/staff/d.jones/GoodPracticeRNG.pdf

[2] H. Bauke, S. Mertens, "Random Numbers for Large-Scale Distributed Monte Carlo Simulations", Phys. Rev. E 2007, 75, 066701. doi:10.1103/PhysRevE.75.066701.

[3] A.M. Ferrenberg, D. P. Landau, D. P., Y. J. Wong, "Monte Carlo Simulations: Hidden Errors from ‘“Good”’ Random Number Generators", Phys. Rev. Lett. 1992, 69, 3382–3384. doi:10.1103/PhysRevLett.69.3382.

[4] Click, T. H., Liu, A., & Kaminski, G. A., "Quality of random number generators significantly affects results of Monte Carlo simulations for organic and biological systems", Journal of computational chemistry 2011, 32, 513. https://doi.org/10.1002/jcc.21638

[5] G. Volpe, G. Volpe, "Simulation of a Brownian Particle in an Optical Trap", American Journal of Physics 2013, 81, 224–230. doi:10.1119/1.4772632.

[6] J. A. Doornik, "An Improved Ziggurat Method to Generate Normal Random Samples" (2005), https://www.doornik.com/research/ziggurat.pdf

[7] Collins, J. C. "Testing, Selection, and Implementation of Random Number Generators", Defense Technical Information Center: Fort Belvoir, VA, 2008. doi:10.21236/ADA486379.

[8] (a) G. Marsaglia, post to sci.math Usenet group, 25 Feb 2003. (b) G. Marsaglia, post to comp.lang.c Usenet group, 13 May 2003

[9] https://github.com/lemire/testingRNG

[10] P. L'Ecuyer, "Tables of linear congruential generators of different sizes and good lattice structure", Mathematics of Computation of the American Mathematical Society 1999, 68.225, 249-260.

[11] https://www.pcg-random.org/

[12] https://numpy.org/doc/stable/reference/random/bit_generators/pcg64dxsm.html

[13] (a) D. Blackman and S. Vigna. "Scrambled Linear Pseudorandom Number Generators", ACM Trans. Math. Softw. 2021, 47, 1–32. https://doi.org/10.1145/3460772. (b) https://prng.di.unimi.it/

[14] H. Bauke, "Tina’s Random Number Generator Library Version 4.24". Documentation. https://www.numbercrunch.de/trng/trng.pdf

[15] S. Harase, T. Kimoto, "Implementing 64-Bit Maximally Equidistributed F 2 -Linear Generators with Mersenne Prime Period", ACM Trans. Math. Softw. 2018, 44, 1–11. doi:10.1145/3159444.

[16] F. Le Floc’h, "Entropy of Mersenne-Twisters", arXiv doi:10.48550/arXiv.2101.11350

[17] C. D. McFarland, "A modified ziggurat algorithm for generating exponentially and normally distributed pseudorandom numbers", Journal of Statistical Computation and Simulation 2016, 7, 1281. https://dx.doi.org/10.1080/00949655.2015.1060234

[18] ddm-toolkit @ github

[19] D. B. Thomas, P. H. W. Leong, W. Luk, J. D. Villasenor, "Gaussian Random Number Generators", ACM Computing Surveys 2007, 39, 11. doi:10.1145/1287620.1287622. https://www.doc.ic.ac.uk/~wl/papers/07/csur07dt.pdf

[20] (a) P. L'Ecuyer and R. Simard, "TestU01: A C Library for Empirical Testing of Random Number Generators", ACM Transactions on Mathematical Software 2007, 33, 22. (b) http://simul.iro.umontreal.ca/testu01/tu01.html

[21] https://www.phy.duke.edu/~rgb/General/dieharder.php

[22] https://github.com/MartyMacGyver/PractRand

[23] F. Panneton, P. L'Ecuyer, M. Matsumoto, "Improved long-period generators based on linear recurrences modulo 2", ACM Transactions on Mathematical Software 2006, 32, 1. https://doi.org/10.1145/1132973.1132974

[24] J.A. Doornik, "Conversion of High-Period Random Numbers to Floating Point", ACM Trans. Model. Comput. Simul. 2007, 17, 3. https://dx.doi.org/10.1145/1189756.1189759

[25] G. Marsaglia, "Random Number Generators", J. Mod. App. Stat. Meth. 2003, 2, 2–13. doi:10.22237/jmasm/1051747320.

[26] G. Marsaglia and W. W. Tsang, "The Ziggurat Method for Generating Random Variables", Journal of Statistical Software 2000, 5, 1-7. https://doi.org/10.18637/jss.v005.i08

About

Generators of uniformly and normally distributed random numbers

Resources

License

Stars

Watchers

Forks

Packages

No packages published