proposal: use PCG Source in math/rand for Go 2 #21835
Comments
@MichaelTJones perhaps you might be interested in this issue, as you already have an implementation of Melissa O'Neill's PCG generator at https://github.com/MichaelTJones/pcg.
I would very much support this change. In fact, my careful PCG implementation exists because of the weaknesses mentioned here. I long ago switched to this code and will not use the present library PRNG. In addition to being technically excellent, a new generator can be created in 2.72ns, it produces new 64-bit values in 5.56ns, and it has small state. Examine the tests in my implementation for details.
Why not bring it to Go 1?
@dongweigogo please read the proposal text carefully; it is explained there.
For some random generators, generating a bunch of new streams is statistically worse than one stream. Section 4.3.2 of the paper suggests that this is not a problem with PCG if the seeding is done properly. Just wanted to comment for anyone curious.
yes that's how i do it.
BenchmarkNew32-8 2000000000 0.67 ns/op
BenchmarkRandom32-8 1000000000 3.23 ns/op
BenchmarkBounded32-8 100000000 15.0 ns/op
BenchmarkNew64-8 300000000 6.64 ns/op
BenchmarkRandom64-8 100000000 15.8 ns/op
BenchmarkBounded64-8 50000000 45.3 ns/op
A new generator takes 0.67ns for 32 bits or 6.64ns for 64. Easy to have one or more
per concurrent function.
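The "one generator per concurrent function" idea can be sketched with today's standard library. This is an illustrative example only: parallelDraws and its id-based seeding are invented for the sketch, and real code would want unique, well-mixed seeds.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// parallelDraws gives each goroutine its own *rand.Rand, so no locking
// is needed (the top-level math/rand functions share a mutex-protected
// source). Seeding by worker id keeps the example deterministic.
func parallelDraws(n int) []int64 {
	var wg sync.WaitGroup
	out := make([]int64, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			r := rand.New(rand.NewSource(int64(id) + 1)) // private, cheap source
			out[id] = r.Int63()
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(parallelDraws(4))
}
```

The benchmark numbers above are the point: when creating a source costs a few nanoseconds, this per-goroutine pattern becomes practical.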
@MichaelTJones Notice how the bounded functions are significantly slower; that's because of the divisions. You can follow the new implementation at go/src/math/rand/rand.go and avoid divisions (with high probability) to get something faster... Once Go gets something like a full 64-bit multiplication (outputting both the high and low 64-bit halves), we will be able to similarly speed up Bounded64.
@robpike The implementation of the bounded RNGs at https://github.com/golang/exp/blob/master/rand/rand.go implies one or two 64-bit divisions per call, assuming no magical tricks from the compiler. That costs dozens of cycles. The divisions probably cost more than the generation of the random number itself! The 64-bit division is a monster on x64 processors; even the 32-bit division is surprisingly expensive, and it is not very different on ARM processors. A better alternative is to do a multiply followed by a shift: https://github.com/golang/go/blob/master/src/math/rand/rand.go#L138-L160
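The multiply-followed-by-a-shift reduction can be sketched as follows. boundedFast is an invented name for the sketch; note that a fully unbiased version needs an additional rejection step, which is why the text above says "with high probability".

```go
package main

import "fmt"

// boundedFast maps a uniform 32-bit value x into [0, n) using one
// 64-bit multiply and a shift instead of a division. The result is
// slightly biased unless combined with a rejection step; the bias is
// at most n/2^32.
func boundedFast(x uint32, n uint32) uint32 {
	return uint32((uint64(x) * uint64(n)) >> 32)
}

func main() {
	// The result always lies in [0, n).
	fmt.Println(boundedFast(0xFFFFFFFF, 10)) // 9: the largest input maps to n-1
	fmt.Println(boundedFast(0, 10))          // 0: the smallest input maps to 0
}
```

The multiply effectively scales the 32-bit input into the range [0, n), which is why it replaces the modulo-by-division reduction.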
I am reading the linked paper very carefully... (in the Long Beach airport)
There are at least three parts of math/rand: the source, the conversion/distribution routines, and the concurrency strategy. I think that it is worth revisiting all three for Go 2 and that all three should be considered together. This proposal as written addresses the source. The conversion-routine changes include #16213 and a handful of other math/rand decisions that were made to preserve existing output, e.g. #8013, #6721, #12290, #16124, #8731, and probably others. For the concurrency strategy, see #20387 and #21393.
Excellent comment and very helpful. Thank you for the work in crafting this
hub of information.
If we get per-P storage in Go 2, or if we are willing to let math/rand have privileged runtime access, I think we should consider adding a global Source that:
This would spare people from having to manually thread PRNG state through goroutines, which seems to be a significant complaint. Maybe it should even be the default Source, with people opting in to reproducibility by making their own Source.
cc @cespare for that last comment
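Until per-P storage exists, one workaround in today's Go is to pool seeded generators so callers need not thread state around manually. A hedged sketch, not the proposal's API: the pool and the counter-based seeding here are illustrative only, and real code would want better seeds.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"sync/atomic"
)

var seedCounter int64

// rngPool hands each caller a private *rand.Rand, avoiding the mutex
// that guards the package-level source. Seeds are a simple counter
// here purely for illustration.
var rngPool = sync.Pool{
	New: func() interface{} {
		return rand.New(rand.NewSource(atomic.AddInt64(&seedCounter, 1)))
	},
}

// randInt63 borrows a generator, draws one value, and returns the
// generator to the pool.
func randInt63() int64 {
	r := rngPool.Get().(*rand.Rand)
	n := r.Int63()
	rngPool.Put(r)
	return n
}

func main() {
	fmt.Println(randInt63())
}
```

Under contention this still pays for the pool's synchronization, which is why true per-P storage would be preferable.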
…e default source

PCG is supposed to be a higher-quality RNG: http://www.pcg-random.org/ See related: golang/go#21835
Xoshiro is also supposed to be a high-quality RNG: http://prng.di.unimi.it/
The PCG and Xoshiro authors each seem to think theirs is better than the other:
* http://www.pcg-random.org/posts/a-quick-look-at-xoshiro256.html
* http://www.pcg-random.org/posts/xoshiro-repeat-flaws.html
* http://pcg.di.unimi.it/pcg.php

PCGSource perf on Go 1.13 is worse though...

$ benchcmp old.txt new.txt
benchmark                old ns/op  new ns/op  delta
Benchmark/default1-8     9.74       10.0       +2.67%
Benchmark/default4-8     15.7       15.1       -3.82%
Benchmark/default8-8     22.1       20.6       -6.79%
Benchmark/default16-8    37.2       33.4       -10.22%
Benchmark/default32-8    67.1       59.4       -11.48%
Benchmark/default64-8    122        116        -4.92%
Benchmark/default128-8   234        214        -8.55%
Benchmark/dhrand1-8      18.0       17.7       -1.67%
Benchmark/dhrand4-8      47.2       47.5       +0.64%
Benchmark/dhrand8-8      83.0       90.5       +9.04%
Benchmark/dhrand16-8     141        156        +10.64%
Benchmark/dhrand32-8     277        287        +3.61%
Benchmark/dhrand64-8     524        542        +3.44%
Benchmark/dhrand128-8    976        1083       +10.96%
BenchmarkModule/1-8      24.8       23.0       -7.26%
BenchmarkModule/4-8      25.9       25.9       +0.00%
BenchmarkModule/8-8      30.2       31.4       +3.97%
BenchmarkModule/16-8     43.3       43.4       +0.23%
BenchmarkModule/32-8     67.4       65.6       -2.67%
BenchmarkModule/64-8     124        128        +3.23%
BenchmarkModule/128-8    212        231        +8.96%

The allocs and bytes comparisons are unchanged: every benchmark reports 0 allocs/op and 0 bytes/op before and after (+0.00% across the board).
I have stumbled on the code for PCGSource (we decided to teach Go as a first language to undergrads at my university), and in case you are considering it for inclusion in the standard library, I think it would IMHO be a good idea to add to the code some documentation about the perils of seeding the generator with states that share a large number of lower bits. For example:
This program creates, by unmarshaling, two PCGSource instances whose initial states differ only in the upper bit; it then advances the generators a billion times and prints ten outputs:
It is visible even to the naked eye that the outputs are the same, modulo a 16-bit rotation and a single-bit change, whereas we would expect to obtain, maybe after a few iterations, different and uncorrelated streams. The problem lies in the basic flaw of LCGs with power-of-2 modulus: if you create two LCGs with the same multiplier and additive constant, and start them from states with the same k lower bits, this will remain true forever, because the next-state operations all propagate bit changes to the left, never to the right. So the two generators above, even starting from different states, will have the same 127 lower bits forever. This is an extreme example (127 lower bits in common), but you can still experience a strong correlation between streams generated by different seeds with fewer low bits in common, even if you cannot see the correlation as easily. For example, the following program uses the standard seeding interface to create two different generators, and then writes their output to standard output in binary:
Since we are using two different seeds, we expect the two associated streams, which are non-overlapping, to be uncorrelated. In particular, if we interleave the streams, the resulting stream should still look random. But if we pipe the output of the program into PractRand, we obtain failures after a few seconds of testing.
All classical PRNGs (MCGs with prime modulus, F₂-linear generators, (C)MWC, etc.) fully mix their state even after the smallest state change (e.g., one bit). Some are very slow to do so (e.g., the Mersenne Twister, with 19937 bits of state, takes a few million iterations), but sooner or later the state change will propagate to all bits, and at that point they will pass the interleaving test above. All PRNGs based on cryptographic primitives will, too, as the smallest change in state generates a random perturbation of the output. This is not true for PCG because of the very weak base LCG. For the same reason, I suggest that if the "jump" functionality of PCG is made available in the future, the documentation should state clearly that jumping by a large power of 2 is very dangerous, as it will yield a new state with a large number of lower bits in common with the starting state, leading to the consequences above.
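The left-only carry propagation described above is easy to demonstrate with any power-of-2-modulus LCG. The constants below are Knuth's MMIX parameters, used purely for illustration; they are not the ones PCG uses, and the demonstration is independent of the particular (odd-increment) constants chosen.

```go
package main

import "fmt"

// Constants from Knuth's MMIX LCG, illustrative only.
const (
	mul = 6364136223846793005
	inc = 1442695040888963407
)

// step advances a 64-bit LCG state. Both the multiply and the add only
// carry information toward higher bits, never toward lower bits.
func step(s uint64) uint64 { return s*mul + inc }

func main() {
	a := uint64(0x0123456789abcdef)
	b := a | (1 << 63) // differs from a only in the top bit
	for i := 0; i < 1000; i++ {
		a, b = step(a), step(b)
	}
	// After any number of steps, the low 63 bits still agree.
	fmt.Printf("low bits equal: %v\n", a&(1<<63-1) == b&(1<<63-1))
}
```

Because the two recurrences are identical modulo 2^63 and start from identical residues, the program prints "low bits equal: true" no matter how many steps are taken, which is exactly the 127-shared-bits phenomenon scaled down to a 64-bit state.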
Shouldn't there be a discussion regarding the other potential alternative Source implementations? And regarding the requirements that such a replacement will have to fulfill? Currently math/rand is explicitly meant for applications that are not "security sensitive", but wouldn't it be in the spirit of Go to provide safe defaults? Standard library users will not always know, or figure out in time, that their application requires resistance to attacks on the PRNG. I think this applies, for example, when an algorithm that uses randomization is exposed on the public Internet. Randen is an alternative to PCG that seems to pass the relevant statistical tests (as PCG likewise does), but should also provide resistance against prediction of future outputs and empirical indistinguishability from randomness while the generator state is not compromised, and it ensures that past outputs can't be reconstructed even after the state is compromised. The Randen paper also shows that, given hardware acceleration of AES, Randen can be faster than PCG in real-world applications like shuffling and sampling. The Randen paper on arXiv (it includes a kind of overview of other available "modern" PRNGs): https://arxiv.org/abs/1810.02227 The Randen Git repository: https://github.com/google/randen However, considering Randen raises the question of whether the alternate Go PRNG should, given the same seed, generate the same pseudorandom bits across all supported computer architectures, because presumably one wouldn't want to use Randen on, e.g., old ARMs or Pentiums (which lack AES acceleration instructions). I guess PCG is good enough to be a default for the low-powered machines?
There seems to be much discussion of the PCG family on the NumPy GitHub, a lot of which is probably relevant to this issue. It's a bit much to consume, but I noticed at least these important facts:
Another thing that may be of import is how nontrivial or "challenging" predictability is explicitly a design goal for the PCG family. AFAIK, "challenging" just means (in the context of PCG) that there are or were no publicly known efficient attacks; without a stronger kind of guarantee, such as those that apply to a CSPRNG or to Randen, the claim is imprecise in a way that is somewhat deceptive, coming from an expert. This "challenging" property also seems to come at the cost of trade-offs in speed or statistical quality, even though it is of dubious utility (considering that no real guarantee is given, and that a relatively efficient attack has recently been demonstrated).
Sebastiano Vigna wrote a pretty scathing review of some of the claims made in O'Neill's paper:
Dr. Vigna's criticisms are mostly about (extraordinarily) highly correlated sequences given nearby initial states. This is a very important concern for people performing intensive mathematical, scientific, or statistical work. However, based on discussion here and in #26263, that does not seem to be the Go team's target audience for math/rand. In particular, such people are more likely to understand the reasons for choosing or avoiding a particular class of PRNG. Instead, the goals for a new math/rand generator seem to be speed, reasonable distribution quality, and a state size small enough that many goroutines can use independent generators. PCG fits these traits[n1][n2], as do many other generators, including several mentioned here. Now, since this proposal is for PCG, suggesting other generators here does not make progress. Instead, we should be looking at how people currently use math/rand – keeping in mind that it is already a weak generator – to verify that the problems proven with PCG are, in a sense, beside the point. Another idea is to look at how many projects implement or import specific PRNG algorithms such as PCG, xoshiro, Mersenne Twister, &c., or collections like gonum/mathext/prng that provide several PRNGs, to see the reasons people choose those algorithms over math/rand. [n1] The issue with speed on 32-bit platforms mentioned in #21835 (comment) does seem like an important concern, especially as Wasm becomes more prominent. [n2] Notably, the idea mentioned in the proposal to seed different PCGs with the generators' respective addresses, or having many goroutines each seed with the current nanosecond time, may exacerbate its problems with highly correlated states. On the other hand, a high-quality seeding algorithm might avoid that.
@zephyrtronium I was under the impression that wasm does have universal 64-bit integer support today? Is this not the case? Or do you mean something else by the remark "especially as Wasm becomes more prominent"? Or is there a particular issue with how Go generates wasm output? cc @jfbastien
It's worth noting that I responded to his critiques here.
So it does, my mistake. I must have misread something about Wasm at some point.
Also, as observed by some of the commentators here, more recently Vigna has raised concerns about self-similarity in PCG. If we interleave outputs from carefully chosen seedings of PCG, although the outputs are superficially different, the closer scrutiny applied by sensitive PRNG tests will reveal correlations indicating that the generator exhibits self-similarity between distant parts of its output. I think this kind of self-similarity is interesting (and well known for the LCGs that underlie PCG), but Vigna's apparent grave concern about this issue was rather undercut when, on the NumPy mailing list, Tyge Løvset pointed out that similar issues affected Vigna's own most recent PRNG, xoshiro** (comment, code). If users follow good practice for seeding PRNGs (which they should for a wide variety of other reasons), these kinds of issues won't arise (or rather, they are vanishingly unlikely). An alternative is to scramble the output more aggressively, but the trade-off is execution time. PCG's newest output permutation, DXSM, does represent an attempt to scramble better at no added cost on many platforms, but it is still making a trade-off in this space.
I wasn't sure of the impact of the claims made, but the general tone was harsh enough that it made it more difficult to take his critiques seriously, although I felt there were some interesting points. I'm happy you took the time to address them, and thank you for providing that information here. His post was certainly, at best, unintentionally misleading.
@lemire sorry for missing this; I don't look at GitHub notifications often. Yes, wasm supports 64-bit integers universally as a first-class type. The initial discussion in 2016 hedged a bit in case there were JS-based implementations in some browsers, but now all relevant browsers support wasm natively (that's 92% of all users on the web). Non-browser implementations of wasm also support 64-bit integers natively. Maybe the confusion is around "32-bit platforms"? wasm is currently ILP32, though there's a proposal for wasm64 with 64-bit address spaces. Limiting the address space to 32 bits doesn't limit the use of 64-bit integers.
There is no mention of a multiply-with-carry generator in these discussions. Unlike many of these other algorithms, it is documented on Wikipedia, and it is simply an LCG with fewer constants. If the multiply-with-carry generator had a lag of at least 2 (in the parlance used), it would be parallelizable by typical CPU dispatch units, as there would be fewer dependencies. Hopefully, whatever generator is used will allow 64-bit seeds or at least frequent reseeding.
This AES-NI-accelerated RNG is faster, with performance near a CPU's memory-transfer limit: dragontamer/aesrand. Adding an extra AES round might make it cryptographic.
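For reference, a minimal (lag-1) multiply-with-carry step looks like this. The multiplier is one that appears in Marsaglia's writings on MWC; both it and the seed values are illustrative, and a production generator would need a carefully verified multiplier and period analysis.

```go
package main

import "fmt"

// mwc is a minimal multiply-with-carry generator: compute the 64-bit
// product a*x + c, keep the low 32 bits as the new state x and the
// high 32 bits as the new carry c.
type mwc struct {
	x, c uint32
}

func (m *mwc) next() uint32 {
	const a = 4294957665 // illustrative MWC multiplier from Marsaglia's posts
	t := uint64(a)*uint64(m.x) + uint64(m.c)
	m.x = uint32(t)       // low half becomes the new state
	m.c = uint32(t >> 32) // high half becomes the new carry
	return m.x
}

func main() {
	g := mwc{x: 12345, c: 6789} // illustrative seed
	for i := 0; i < 3; i++ {
		fmt.Println(g.next())
	}
}
```

Note how the step is just one widening multiply and an add, which is what makes the "LCG with fewer constants" description apt; a lag-r variant would keep an array of r previous states so successive multiplies have no data dependency.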
Hi everyone! I'm a numpy developer with a particular focus on our PRNG capabilities. For some background, the older PCG64 with the XSL-RR output function has been our default PRNG for the past couple of years. The next release will have an optional implementation with the stronger DXSM output function. We expect we will make it the default some undetermined time down the road. I have written up what I hope is an accessible explanation of the consequences, so you don't have to reconstruct everything from those long, digressing threads. My prose style is prone to long, complicated sentences, so please ask if anything needs to be clarified; that will help me edit this down for my own audience. I hope to provide a point of view as a PRNG library maintainer that is helpfully complementary to those of our PRNG algorithm authors.
To be clear about where I'm writing from: numpy is an anchor library for scientific computing and machine learning, so we have a particular risk profile that is driving our decisions. The extreme conditions that would make this weakness non-negligibly likely are not really a practical use case right now, but you can see them on the horizon, at least for my audience. This is likely a different risk profile from yours, but I try to give the information needed to evaluate things for yourself rather than boil everything down to a universal safe/unsafe declaration. If I were still asked my opinion on what you should do: if you do decide to use a PCG algorithm, I would recommend the DXSM variant over XSL-RR. While the risks of XSL-RR are rare outside of extreme use cases and usually inconsequential, I think they are still worth documenting. And it's a pain to document, and a pain for the user to work through the math to determine that yes, they are safe using it. DXSM provides an extra safety margin that renders those computations (and my article) unnecessary. I can think about the DXSM variant with a mental model that is a lot less complicated than the one I would now use for XSL-RR. As a library maintainer and documenter, that's golden.
You, sir, have no idea what long and complicated sentences are... :-)
Thanks for the write-up! Nicely done. (It might be too tangential to include, but the only thing I'd add is that detectable self-similarity is not unique to LCGs (and PCG); it's findable in a number of non-cryptographic PRNGs.)
See #60751 for a GitHub Discussion that I hope will lead to a math/rand/v2. Closing as duplicate of that.
The original math/rand package had a number of flaws. Some of the flaws in the API have been addressed, such as the addition of the Source64 interface, but the most important issue remains: the default Source is poor.

The Source has several problems. This last point is not fatal if there is only one instance, but it couples poorly with another fundamental and perhaps unfixable problem: bad behavior when accessed concurrently. It must be protected with a mutex, but if it were much smaller (and cheaper to seed) it would be practical instead to have each client in the address space access a separate instance, seeded uniquely.

To put it another way, with a separate Source for each goroutine, no locking would be required, but the current implementation is impractically large for such an approach.

The proposal here is in several parts.
First, we create an alternate Source implementation. Our choice is the Permuted Congruential Generator (PCG) designed and thoroughly tested by O'Neill in:

PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation
Melissa E. O'Neill, Harvey Mudd College
http://www.pcg-random.org/pdf/toms-oneill-pcg-family-v1.02.pdf

A 128-bit PCG generator producing a 64-bit output is only two words of memory but provably has a period of 2**128, is cheap to seed, and efficient. (A 64-bit generator would be noticeably cheaper, but a period of only 2**64 is deemed too small for modern hardware.) It is practical and reasonable for every goroutine to have a local Source (a simple seeding algorithm is to use the address of the Source as the seed).

An implementation of this generator, specifically PCG XSL RR 128/64 (LCG), is checked in to golang.org/x/exp/rand as its default Source.
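To make the "two words of memory" concrete, here is a sketch of PCG XSL RR 128/64: a 128-bit LCG advanced with math/bits primitives, whose 64-bit output xors the state halves and rotates by the state's top six bits. The constants follow the published PCG reference parameters as I understand them, and details such as whether output is taken before or after the step may differ from the checked-in golang.org/x/exp/rand implementation.

```go
package main

import (
	"fmt"
	"math/bits"
)

// pcg128 holds the two-word (128-bit) LCG state.
type pcg128 struct {
	hi, lo uint64
}

// Reference PCG 128-bit multiplier and default (odd) increment,
// split into 64-bit halves; treat as illustrative.
const (
	mulHi = 0x2360ed051fc65da4
	mulLo = 0x4385df649fccf645
	incHi = 0x5851f42d4c957f2d
	incLo = 0x14057b7ef767814f
)

func (p *pcg128) next() uint64 {
	// 128-bit multiply-add: state = state*mul + inc.
	hi, lo := bits.Mul64(p.lo, mulLo)
	hi += p.hi*mulLo + p.lo*mulHi // cross terms; overflow past 128 bits is discarded
	var c uint64
	lo, c = bits.Add64(lo, incLo, 0)
	hi, _ = bits.Add64(hi, incHi, c)
	p.hi, p.lo = hi, lo
	// XSL RR output: xor the halves, rotate right by the top 6 state bits.
	return bits.RotateLeft64(p.hi^p.lo, -int(p.hi>>58))
}

func main() {
	g := pcg128{hi: 1, lo: 2} // illustrative seed, not a real seeding routine
	for i := 0; i < 3; i++ {
		fmt.Println(g.next())
	}
}
```

The bits.Mul64/bits.Add64 calls are exactly the spots where the compiler support discussed next would pay off: with a recognized 128-bit multiply, the whole step compiles to a handful of instructions.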
Second, we work on the compiler, possibly through math/bits, to make it a little more efficient. The current implementation, in pure Go, is somewhat slower than math/rand, but could be comparably fast or even faster given compiler support for the 128-bit multiply inside.

The third part is the critical piece.
We propose to make this generator the default one in math/rand for Go 2. The Go 1 compatibility decree could allow it to happen today, but we have chosen in the past to take the guarantee more literally than it is expressed, by also requiring that the generated stream of bits will never change. This decision was made when a fix was done but enough tests in production were discovered to depend on the exact stream that it was deemed simpler to lock down the implementation than to fix all the tests.

However, with Go 2 on the horizon, there is enough time, understanding, and process to offer an opportunity to use the rollout of an improved, API-compatible version of math/rand as an early test case for Go 2. Changing math/rand to use PCG is strictly incompatible, but not in a way that the Go 1 decree actually prohibits. A soft breakage at this level would be an excellent step on the way to understanding what it takes to roll out a breaking change for Go 2.

In essence, the proposal is to use the fixing of math/rand as a test case for introducing breaking changes in Go 2. And a much better random number generator would result, too: the breaking change is worth making.

The precise sequence of steps to roll it out needs discussion, but one possible outline is like this:
- Add the new generator to math/rand, but not as the default. PCGSource is the obvious name.
- Give the existing default generator an explicit name, LFSRSource.
- Encourage code that depends on the existing output stream to name LFSRSource rather than Source. Since Source is usually hidden, accessed through Rand, this may be ineffectual unless we also provide PCG functions explicitly for Int32, etc. Or perhaps provide LFSR versions and rotate towards those in legacy code.
- Announce that, as of Go 2, Source points to PCGSource and LFSRSource is no longer the default.
- Have math/bits support its operations more efficiently through the compiler.
- Eventually, deprecate LFSRSource.
At some point along the way, Source64 should become the default, under a similar sequence of events. Minor cleanups could occur too, tidying up the API and fixing some old-style names like Intn (which should be IntN).

There is much detail that remains to be worked out; this proposal is only an outline.