Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework the Random module #3402

Merged
merged 5 commits into from
Oct 15, 2016
Merged

Conversation

oprypin
Copy link
Member

@oprypin oprypin commented Oct 9, 2016

The Random module currently has several problems:

  • The upper bound for the desired random number in rand(max : Int) can be any integer type, but it is actually limited to UInt32::MAX by the random number generator itself, moreover a cast to Int32 happens, sometimes causing negative numbers in the output (fix Raise when rand() will overflow #3350)
  • Because a simple modulo operation is used to produce a random number in rand(max : Int), the output is not uniformly distributed. With a large enough range, some numbers may appear even twice as often as others.
  • It works only with 32-bit random number generators, while many modern PRNGs produce 64-bit numbers, and there are also byte-based generators (like urandom).
  • rand(range : Range(Int, Int)) is especially fragile because it depends on rand(max : Int) and adds more possibilities for overflows with large numbers.
  • rand(max : Float) uses the same 32-bit resolution, which means that quite a few floating point values can never be reached.

That's why I decided to apply my previous experience with developing an RNG library to rework the Random module.

These changes make it possible to:

  • Generate uniformly distributed random integers of any type and any range. The result will always be the same type as the supplied argument.

    Even things like rand(Int64::MIN..Int64::MAX) work

  • Provide the same nice interface based on a random number generator that produces integers of any size, not just 32-bit integers

    Now it is easy to make an RNG class based on urandom's bytes, like in Python, but that's not included in this PR

Now each RNG operation can consume multiple numbers from the underlying random number generator if needed, but of course everything is still deterministic. For implementation details see the comments in the code.

The interface stays almost the same (though I would like to suggest some changes in the future), but classes based on Random now need to provide a next_u method instead of next_u32.

Disclaimer:
I have put a lot of thought into this topic over the years, but I'm still an amateur. The algorithms here are based on widely accepted ideas, but they don't closely follow a particular widely-used implementation.

@@ -67,16 +109,71 @@ describe "Random" do
end

it "allows creating a new default random with a seed" do
rand = Random.new(1234)
value1 = rand.rand
values = Array.new(2) {
Copy link
Member

@RX14 RX14 Oct 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use do/end. The formatter should really enforce this.

@asterite
Copy link
Member

@BlaXpirit I think we could use this PR as an example of what a PR should look like. The docs and comments in the code are extremely clear and useful, and the changes/addition are more than welcome. Even though you say you are an amateur in this subject you probably know more than most people, including me and I'd dare to say every other core committer.

I'd merge this right away, but I want to either double-check with @waj, or if someone else could review this and give a 👍 we can merge this.

@lbguilherme
Copy link
Contributor

How about naming the method next instead of next_u?

@asterite
Copy link
Member

I think next_u is fine, it must return an unsigned integer. next is too generic.

@refi64
Copy link
Contributor

refi64 commented Oct 13, 2016

Nice! Random numbers can be a pain point in a lot of statically typed languages (cough C++ cough).

@RX14
Copy link
Member

RX14 commented Oct 13, 2016

I think next_u looks weird, maybe next_unsigned? I don't think the longer name is an issue.

@asterite
Copy link
Member

@RX14 It's consistent with to_i, to_u, to_f, etc.

@RX14
Copy link
Member

RX14 commented Oct 15, 2016

While we're at it, why not add .uuid, .base64, .random_bytes and .hex methods like SecureRandom. Implement a Random implementation that uses SecureRandom's random source, and remove the unnecessary distinction between Random and SecureRandom's interfaces.

Maybe people want less-secure UUIDs from a fast source, or maybe they want to use a secure random source with rand().

@RX14
Copy link
Member

RX14 commented Oct 15, 2016

And probably merge #2716 or something like it too.

@oprypin
Copy link
Member Author

oprypin commented Oct 15, 2016

Implement a Random implementation that uses SecureRandom's random

@RX14 I had something like that planned (going step by step):

Now it is easy to make an RNG class based on urandom's bytes, like in Python, but that's not included in this PR

@RX14
Copy link
Member

RX14 commented Oct 15, 2016

@BlaXpirit adding helper methods for generating base64, hex, and at least a slice of random bytes probably should be in here though.

@asterite
Copy link
Member

@RX14 We can do small steps and make sure each of those steps is good. Then adding the next step becomes easier :-)

I'll merge this, I didn't get to review it with someone else but it looks really good to me, and it will give room to more improvements \o/

Big thank you, @BlaXpirit ❤️

@asterite asterite merged commit dc2e6d7 into crystal-lang:master Oct 15, 2016
@asterite asterite added this to the 0.20.0 milestone Oct 15, 2016
@asterite
Copy link
Member

@RX14 The problem with SecureRandom is that if you want several bytes then it's a single urandom call, but here we'll have to do one urandom call for each byte. I don't know what a solution for that is, but in the meantime both classes are usable like that so it's not a big issue.

@oprypin
Copy link
Member Author

oprypin commented Oct 16, 2016

Aww, this was squash-merged :(

@asterite
Copy link
Member

@BlaXpirit I did it because individual commits didn't compile. The first commit used rand_type which wasn't defined there, I think it got introduced in the third commit.

In any case, I don't consider commit history that important, and the whole PR makes sense as a whole. And sorry, maybe I should have asked or pointed that out.

@oprypin
Copy link
Member Author

oprypin commented Oct 16, 2016

@asterite, oh, sorry. That's what I get for trying to make the commit history make sense after writing everything. Good that you noticed this.

@RX14
Copy link
Member

RX14 commented Oct 16, 2016

I think commit history is very important, especially that all the commits compile. They're the only metadata to changes you get and so should have detailed explanations of anything that isn't clear in the commit.

@asterite
Copy link
Member

@BlaXpirit I noticed the missing method and thought it was easier to squash than to ask you to rewrite history (I'm lazy to do that when someone asks me to, maybe because I'm really bad at git), but next time I'll ask before merging :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Raise when rand() will overflow
5 participants