
faster randn!(::MersenneTwister, ::Array{Float64}) #35078

Merged · 2 commits · Apr 24, 2020

Conversation

@rfourquet (Member) commented Mar 11, 2020

The main idea is that

  1. randn() essentially needs one random Float64 value (a bit more in rare cases)
  2. rand!(rng::MersenneTwister, ::Array{Float64}) is optimized compared to calling rand(rng, Float64) in a loop

So for randn!, let's use this optimized version: fill the array with uniform float values, then do a second pass to transform them into normal values. The same applies to randexp!, with a smaller improvement.
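The two-pass idea can be sketched as follows. This is an illustrative version, not the PR's actual code: it uses a Box–Muller transform in the second pass for self-containedness, whereas the PR reuses the ziggurat-based transform from scalar randn; the name `randn_twopass!` is made up for this sketch.

```julia
using Random

# Pass 1: fill the array with uniforms via the optimized bulk rand!.
# Pass 2: transform in place to normal values (Box–Muller here).
function randn_twopass!(rng::AbstractRNG, A::Array{Float64})
    n = length(A)
    rand!(rng, A)                    # pass 1: fast bulk uniform fill
    i = 1
    while i + 1 <= n                 # pass 2: consume uniforms in pairs
        r = sqrt(-2 * log1p(-A[i]))  # log1p(-u) avoids log(0) at u == 0
        θ = 2π * A[i + 1]
        A[i], A[i + 1] = r * cos(θ), r * sin(θ)
        i += 2
    end
    isodd(n) && (A[n] = randn(rng))  # odd leftover element: scalar path
    return A
end
```

As in the PR, a real implementation would fall back to the plain scalar loop below some length threshold, since the second pass only pays off for larger arrays.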

Below an array-length threshold somewhere between 10 and 20, this refined algorithm is less performant, so we switch algorithms depending on the length. Here are the speed-ups on my machine:

| length   | 5    | 10   | 50   | 100  | 300  | 600  | 1 000 | 10 000 | 100 000 | 1 000 000 |
|----------|------|------|------|------|------|------|-------|--------|---------|-----------|
| randn!   | 0.88 | 1.04 | 1.57 | 1.70 | 1.81 | 1.89 | 1.92  | 1.94   | 1.92    | 1.85      |
| randexp! | 0.87 | 1.14 | 1.27 | 1.37 | 1.37 | 1.34 | 1.42  | 1.39   | 1.37    | 1.31      |

So for very small arrays there is a slight slowdown.

As future work, it would be interesting to see whether this also applies to other AbstractArrays, and to other RNGs that have optimized array generation; in that case we could add a "trait" for enabling this improvement.
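Such a trait might look like the following. This is purely hypothetical (nothing like it is in the PR); `has_fast_array_rand` and `pick_randn_path` are invented names for illustration:

```julia
using Random

# Hypothetical trait: RNG types opt in to the two-pass array path
# when they have an optimized bulk rand! for arrays.
has_fast_array_rand(::Type{<:AbstractRNG}) = false
has_fast_array_rand(::Type{MersenneTwister}) = true

# Dispatch on the trait to pick an algorithm (illustrative only).
pick_randn_path(rng::AbstractRNG) =
    has_fast_array_rand(typeof(rng)) ? :two_pass : :scalar_loop
```

An RNG package could then enable the fast path for its own type with a one-line method definition.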

@rfourquet rfourquet added performance Must go faster randomness Random number generation and the Random stdlib labels Mar 11, 2020
@KristofferC (Member)

@nanosoldier runtests(ALL; vs =":master")

@KristofferC (Member)

I just want to see how much of the ecosystem tests this will break.

@nanosoldier (Collaborator)

Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt

@rfourquet (Member, Author)

Wow, when I had the idea for this change last week, I was surprised not to have thought of it earlier, and then I just found a 5+ year-old local branch with this exact same change 🙈 (maybe the performance improvement wasn't worthwhile back then?)

Anyway, someone willing to review?

@rfourquet rfourquet force-pushed the rf/faster-randn-bang branch from c235be7 to b44c0d5 Compare April 21, 2020 10:48
@rfourquet (Member, Author)

Ok to merge?

@StefanKarpinski (Member)

Go for it!

@KristofferC (Member)

Would be good to open issues at the packages that started failing due to this change.

@rfourquet (Member, Author)

> Would be good to open issues at the packages that started failing due to this change.

I'm surprised by this suggestion and would shy away from it; I would be worried about spamming. Is there precedent for that? I remember fixing a few repos before Julia 0.7 to adjust them to similar changes at the time, but now the random streams are expected to change in minor versions, and package authors will be notified anyway, either by reading the NEWS or by running their tests. I can do it, but I'm reluctant.

@StefanKarpinski (Member)

If upgrading Julia breaks the tests, I'd imagine most package authors would want to be proactively notified about that, so I'm not sure that's terribly spammy. They can always ignore or close issues.

@KristofferC (Member) commented Apr 21, 2020

> I'm surprised by this suggestion, and would shy away from it

Fine, but it will just end up with me filing these issues when I run PkgEval for the 1.5 release, with 63 extra regressions in the logs to go through, trying to figure out whether each test failure is due to changes in the random number stream or to something we actually broke in Julia.

@rfourquet (Member, Author)

> but it will just end up with me filing these issues when I run PkgEval for the 1.5 release

Oh, I didn't realize you were doing that; then sure, I will file them for these errors. But I'm afraid that in many cases this won't be fixed by the time you run PkgEval for 1.5, so you will still have some work :-/

@KristofferC (Member)

Sure, but at least there will be an issue there and I can quickly move on.

People do tend to fix these issues though:

[screenshot omitted]

@rfourquet (Member, Author)

> People do tend to fix these issues though:

Ok cool, I was just imagining that many maintainers don't build Julia from source and would wait for prerelease builds before trying to fix the failures.

> there will be an issue there and I can quickly move on.

I will include "Julia 1.5" in the titles, in case that helps you find these issues. I might wait until after #29240 is merged, in case it causes test failures in the same packages.

@rfourquet rfourquet added the minor change Marginal behavior change acceptable for a minor release label Apr 22, 2020
@rfourquet rfourquet force-pushed the rf/faster-randn-bang branch from b44c0d5 to aaa8f3a Compare April 22, 2020 10:31
@rfourquet (Member, Author)

I fixed a small bug, which makes randn! even slightly faster, but unfortunately it means I have to re-run PkgEval...

@rfourquet (Member, Author)

@nanosoldier runtests(ALL; vs =":master")

@nanosoldier (Collaborator)

Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt

This was referenced Apr 24, 2020
@davidssmith (Contributor)

You guys are making it very hard to use Julia for reproducible research when you change something as fundamental as this in a supposedly stable version of your language.

@KristofferC (Member) commented Aug 23, 2020

Use the same Julia version for reproducible research? Getting the exact same random numbers for a given seed has nothing to do with the stability of the language.

@davidssmith (Contributor)

True, but my end users keep grabbing the latest version and then complaining when something doesn't work, because you guys are changing so much post-supposedly-stable-1.0.x. This doesn't happen with Matlab. If a barely computer literate user grabs the latest version of Matlab and wants to run your script from a paper published five years ago, it will usually just work. When is Julia planning on getting there?

Also, I bet calling something like Random.seed!(myseed) with an unannotated scalar is the most common type of initialization, and if you want people to port their algorithms to Julia, that use case needs to create the same sequence as it would in other languages. Maybe it does now, but it can't have done so both before and now, since it was changed.

@StefanKarpinski (Member) commented Aug 23, 2020

The default RNG is documented not to be reproducible across Julia versions. I can understand that you might find this annoying, but it is an intentional policy that allows us to make major improvements to the default RNGs, including several that landed in the 1.5 release of Julia.

Arguably more importantly, it allows even deeper changes like #34852, which will likely be in Julia 1.6. That PR replaces the now-quite-old Mersenne Twister (MT) RNG with the much faster, smaller, and statistically better xoshiro* RNG. This not only speeds up sequential RNG performance even more than the 1.5 changes to the default RNG do, but also guarantees that, for the same Julia version, RNG sequences are reproducible even in the presence of multithreading. There is no way that kind of guarantee could be satisfied with an RNG as large as MT while maintaining acceptable performance characteristics.

There is no way any of this would be possible if we did what Matlab does and guaranteed the same exact RNG output forever. Matlab has made a different choice, favoring never changing the default RNG outputs over major speed gains. As a result, they are now stuck with an old, slow, bloated and statistically unsound RNG (MT) and many suboptimal algorithms for using it to generate sequences. To address this question:

> When is Julia planning on getting there?

The answer is "never"—that is not a goal for the project. If a new, better RNG comes along that's better than xoshiro*, we will change the behavior again. In fact, it may be a good idea for us to intentionally change RNG sequences with every release so that people are forced not to rely on it.

There are various ways to deal with changing RNG sequences in Julia:

  • Don't give users the impression that they should be able to reproduce exact RNG sequences.
  • Explicitly document the version of Julia that is required in order to reproduce exact results.
  • Use https://github.com/rfourquet/StableRNGs.jl to provide explicitly reproducible RNG sequences.
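To make the distinction concrete, here is a stdlib-only sketch of the within-version guarantee; for cross-version stability, the StableRNGs.jl package mentioned above offers the same pattern with a version-stable stream:

```julia
using Random

# Seeding reproduces the stream within a single Julia version only.
# For cross-version stability, swap in StableRNGs.jl, e.g.:
#   using StableRNGs; rng = StableRNG(123)
a = randn(MersenneTwister(2020), 5)
b = randn(MersenneTwister(2020), 5)
@assert a == b   # same version + same seed => same stream
```

A script that records both its seed and the Julia version it ran under covers the second bullet point as well.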

@davidssmith (Contributor)

I understand -- thanks for the detailed explanation, Stefan. I had always assumed Julia's goal was to be "Matlab done right" and assumed that it meant reaching some asymptotic behavior, but I can see the argument against that.

This old dude will need to re-evaluate how he uses Julia for open-sourced research codes. Users want to get all the speed improvements of the new releases, but busy PIs like me don't want to have to patch old code bases every release either. It would be nice to have an advice document on how to use Julia for research codes so that this conflict is minimized, e.g. stuff like your bullet points above.

@StefanKarpinski (Member)

You definitely shouldn't have to patch things with each release. A best practices doc for RNGs would be good.
