faster randn!(::MersenneTwister, ::Array{Float64}) #35078
Conversation
@nanosoldier
I just want to see how much of the ecosystem tests this will break.
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt
Wow, when I had the idea for this change last week, I was surprised not to have thought of it earlier, and I just found a 5+ y.o. local branch with this exact same change 🙈 (the performance improvement might not have been worthwhile at that time, though?) Anyway, someone willing to review?
Force-pushed from c235be7 to b44c0d5
Ok to merge?
Go for it!
Would be good to open issues at the packages that started failing due to this change.
I'm surprised by this suggestion, and would shy away from it; I would be worried about spamming. Is there precedent for that? I remember having fixed a few repos before Julia 0.7 to adjust them the last time similar changes were made, but now the random streams are expected to be subject to change in minor versions, and package authors will be notified anyway, either by reading the NEWS or by running their tests. I can do it, but am reluctant.
If upgrading Julia breaks the tests, I'd imagine most package authors would want to be proactively notified about that, so I'm not sure that's terribly spammy. They can always ignore or close issues.
Fine, but it will just end up with me filing these issues when I run PkgEval for the 1.5 release and have 63 extra regressions in the logs to go through, trying to figure out whether a test failure is due to changes in the random number stream or to something we broke in Julia.
Oh, I didn't realize you were doing that; then sure, I will file issues for these errors. But I'm afraid that in many cases this won't be handled by the time you run PkgEval for 1.5, and that you will still have some work :-/
Ok cool, I was just imagining that many maintainers don't build Julia from source and instead wait for prerelease builds before trying to fix the failures.
I will include "Julia 1.5" in the title, if this can help you find these issues. I might wait until after #29240 is merged in case this causes test failures in the same packages.
And `randexp!`.
Force-pushed from b44c0d5 to aaa8f3a
I fixed a small bug, which makes …
@nanosoldier
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt
You guys are making it very hard to use Julia for reproducible research when you change something as fundamental as this in a supposedly stable version of your language.
Use the same Julia version for reproducible research? Getting the exact random numbers for a given seed has nothing to do with the stability of the language.
True, but my end users keep grabbing the latest version and then complaining when something doesn't work, because you guys are changing so much post-supposedly-stable-1.0.x. This doesn't happen with Matlab. If a barely computer-literate user grabs the latest version of Matlab and wants to run your script from a paper published five years ago, it will usually just work. When is Julia planning on getting there? Also, I bet calling something like …
The default RNG is documented not to be reproducible across Julia versions. I can understand that you might find this annoying, but it is an intentional policy which allows us to make major improvements to the default RNGs, including these changes, which are all in the 1.5 release of Julia:
Arguably more importantly, it allows even deeper changes like #34852, which will likely be in Julia 1.6. That PR replaces the now-quite-old Mersenne Twister (MT) RNG with the much faster, smaller, and statistically better xoshiro* RNG. This not only speeds up sequential RNG performance even more than the changes to the default RNG in 1.5 do, but also gives the guarantee that, for the same Julia version, RNG sequences are reproducible even in the presence of multithreading. There is no way that kind of guarantee could be satisfied with an RNG as large as MT while maintaining acceptable performance characteristics.

None of this would be possible if we did what Matlab does and guaranteed the exact same RNG output forever. Matlab has made a different choice, favoring never changing the default RNG outputs over major speed gains. As a result, they are now stuck with an old, slow, bloated, and statistically unsound RNG (MT) and many suboptimal algorithms for using it to generate sequences.

To address this question:
The answer is "never"; that is not a goal for the project. If a new RNG comes along that is better than xoshiro*, we will change the behavior again. In fact, it may be a good idea for us to intentionally change RNG sequences with every release so that people are forced not to rely on them. There are various ways to deal with changing RNG sequences in Julia:
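One option, for example (a minimal sketch, assuming the third-party StableRNGs.jl package), is to draw from a dedicated RNG whose stream is documented to stay fixed across Julia releases, rather than from the default RNG:

```julia
# A minimal sketch assuming the StableRNGs.jl package: its stream is meant to
# stay fixed across Julia releases, unlike the default global RNG.
using StableRNGs

rng = StableRNG(2020)   # seed once, e.g. at the top of a script or test set
x = rand(rng, 10)       # these draws depend only on the seed,
y = randn(rng, 10)      # not on the Julia version
```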
I understand -- thanks for the detailed explanation, Stefan. I had always assumed Julia's goal was to be "Matlab done right" and assumed that it meant reaching some asymptotic behavior, but I can see the argument against that. This old dude will need to re-evaluate how he uses Julia for open-sourced research codes. Users want to get all the speed improvements of the new releases, but busy PIs like me don't want to have to patch old code bases every release either. It would be nice to have an advice document on how to use Julia for research codes so that this conflict is minimized, e.g. stuff like your bullet points above.
You definitely shouldn't have to patch things with each release. A best practices doc for RNGs would be good.
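One pattern such a doc could cover (a sketch of common practice, not an official recommendation; the `simulate` function below is hypothetical) is threading an explicit `AbstractRNG` argument through research code, so a fixed generator can be substituted without touching global state:

```julia
using Random

# Hypothetical simulation routine: the RNG is an explicit argument, so callers
# (and tests) can pin whichever generator they want.
function simulate(rng::AbstractRNG, n::Integer)
    return sum(randn(rng) for _ in 1:n) / n
end
simulate(n::Integer) = simulate(Random.default_rng(), n)  # convenience method

simulate(MersenneTwister(1234), 1_000)  # reproducible for a given Julia version
```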
The main idea is that `randn()` essentially needs one random `Float64` value (a bit more in rare cases). `rand!(rng::MersenneTwister, ::Array{Float64})` is optimized compared to calling `rand(rng, Float64)` in a loop. So for `randn!`, let's use this optimized version by filling the array with uniform float values, and then do a second pass to transform them into normal values. The same applies to `randexp!`, but with a smaller improvement.

There is a threshold on the length of the array, between 10 and 20, below which this refined algorithm is less performant, so we switch the algorithm depending on that length. Here are the speed-ups on my machine:
So for small arrays there is a small "speed-down".
As future work, it would be interesting to see if this also applies to some other `AbstractArray`s, and to other RNGs which have optimized array generation, in which case we could have a "trait" for enabling this improvement.
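To make the two-pass idea concrete, here is a minimal, self-contained sketch. It is not the actual stdlib code (which reuses Random's ziggurat machinery and a tuned cutoff); the Box-Muller transform and the threshold of 16 are stand-ins chosen only so the example runs on its own:

```julia
using Random

# Sketch of the two-pass approach: bulk-generate uniforms with the optimized
# rand!, then transform them in place into normal values on a second pass.
function randn_twopass!(rng::MersenneTwister, A::Vector{Float64})
    n = length(A)
    if n < 16                      # hypothetical cutoff; the PR uses one in the 10-20 range
        for i in 1:n
            A[i] = randn(rng)      # the scalar path wins for small arrays
        end
        return A
    end
    rand!(rng, A)                  # first pass: optimized bulk uniform generation
    i = 1
    while i + 1 <= n               # second pass: Box-Muller on pairs of uniforms
        u1, u2 = A[i], A[i+1]
        r = sqrt(-2 * log1p(-u1))  # log1p(-u1) == log(1 - u1), safe since u1 ∈ [0, 1)
        A[i]   = r * cospi(2 * u2)
        A[i+1] = r * sinpi(2 * u2)
        i += 2
    end
    isodd(n) && (A[n] = randn(rng))  # a leftover element falls back to the scalar path
    return A
end

# usage
rng = MersenneTwister(42)
x = Vector{Float64}(undef, 1_000)
randn_twopass!(rng, x)
```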