-
-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rand::sample's API should be fallible #194
Comments
Do you have a context where you're using this function? I'm not sure what the main use-cases are which makes it hard to say what it should do. |
The major point is how this behaves compared to functions like
panics with:
The way this currently behaves is more like python's slice mechanics, where doing The counterpoint is something like Edit: I consider |
Also, I a requesting this change as part of the rand crate stabilization: rust-lang/rfcs#2106 |
Is it? I don't remember writing that. Do you mean it was requested in a comment? Regarding the previous post: true. But another point related to So without use cases I still don't know what this function should do (I'd be tempted just not to stabilise it, e.g. put it behind a feature gate). |
sorry, I meant that I am requesting it as part of that effort :) |
I might be wrong, but I think rust allows you to overload to provide different implementations for different input types, so such efficiency could be provided. Also, correct me if I'm wrong but the function already requires an iterator of a known length right? ... Hmm, I'm looking at it and I think it just requires the input to be an iterator, so it doesn't necessarily know anything about the length. I think there are really two use cases here that should mabe be two separate functions. At the very least, one that knows about the length and one that doesn't. |
Iterators never have a known length in Rust. At best, they have a size-hint, but that's too unreliable for this use-case. You're right, two different functions are called for to cover this functionality fully. But I wonder which one(s) would actually be used? |
I think the current interface of sample kind of makes sense and is self-consistent, but it's a very tricky to use function. This is reservoir sampling, and the function includes a note that the order of elements in the output is not random! The interface is that you input a sequence of n elements and then you pick at most m of those elements, where each element has equal chance of ending up in the output. This generalizes quite nicely into supporting situations where n <= m; you simply get the whole sequence back. |
So I can see three major use cases and am starting to think that these should be implemented as traits This one is the same as what exists and would be implemented just for any iterator
This is similar, but is for slices. Edit: I've split it into two methods so that
Another implementation could hopefully be done on The point here is that each is implementation specific and I can see use cases for all of them:
|
@bluss I actually didn't realize that if |
I edited |
Sample is suggestive of a random order, so I would prefer something matter of fact - like naming it reservoir sample, explicitly after the algorithm it implements. |
Also, it might be possible to have only the traits @bluss that sounds fine to me. Actually, Edit: and calling it when |
Also, an implementation detail: for |
This function makes sense only when your sample includes a significant fraction of the output of the source iterator. You could build a stable version of this too, maybe by using a heap, or maybe through indexing:
An aborted Fisher–Yates shuffle makes more sense when you do not mind permuting your source. Also Robert Floyd's combination algorithm makes more sense when sampling from a much larger |
@burdges just to be clear when you say "this function" you mean rand::sample as it exists today right? |
I think the request now boils down to these three things:
Edit: Another possiblity would be a Edit2: Someone could convince me that |
@vitiral would you make a PR, either here or against my branch? (Not really sure what's best; since this is fairly independent of the other stuff in my branch it could just land here.) The other question: should |
I can see what I can do! I'm not sure I like the name |
I wasn't at all sure how to organise this functionality so counter-proposals are welcome. Another option would be to put all this in another crate? |
Yes I was only pointing out that different situations make different sampling schemes optimal. |
see #195 for an initial stab |
Most rust operations are fallible if they don't make sense, either through a panic or by returning an
Option
orResult
. However,rand::sample
has the following API:This function will silently return less than amount if the size of
I
is less thanamount
. I would say this is very odd, and returningOption<Vec<T>>
would have been a better design choice. You can imagine the confusion if you didrand::sample(rng, v, 100)
but then only had 2 elements!If it were changed to returning
None
ifI.len() < amount
then the current behavior could still be accomplished with:With the current behavior, user has to put guards around their code (which is not required by the compiler) to prevent getting less than amount, which is unusual in rust.
Another argument is that trying to sample when
amount > len()
will just return an "inadequately shuffled" version of the iterator. If someone wanted that behavior it would be more correct to justcollect()
the iterator and callshuffle()
on it.The text was updated successfully, but these errors were encountered: