-
Notifications
You must be signed in to change notification settings - Fork 10.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modifying Quantiles to return ordered HashMap #2989
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed, please reply here (e.g.
|
/cc @PeteGillin |
This sounds reasonable to me in principle, but if we're going to do it then I think we should make documentation changes (an API that is specified as preserving ordering is much more useful than one that happens to preserve ordering by implementation but doesn't specify it) and I think we should therefore add tests for the behaviour. |
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. |
Usually, when someone computes percentiles, they would end up iterating over the resulting map to print the percentiles or store them somewhere. It is useful to return the percentiles in the order in which they were requested. e.g, If one computes `Quantiles.percentiles().indexes(50, 90, 99).compute(times)` and iterates over the result, it should be iterated in the order 50, 90, 99. Currently, we use HashMap, so the order is unpredictable. Making it a LinkedHashMap will ensure results in the expected order with very minimal perf impact.
CLAs look good, thanks! |
Makes sense. Updated the docs and added a test. One thing to note is, since the indexes are de-duped, if the same index is repeated, the order of the result will only contain the first instance. e.g, if indexes are |
This looks like a sensible approach to me. (I think @cgdecker is doing the actual review.) Aside: It's a shame we can't use ImmutableMap here, of course, but c.g.common.math can't depend on c.g.common.collect without creating a circular dependency :-(. |
@cgdecker Is this ready to submitted internally? |
I don't see an internal review, though I could be missing it. A question to anyone still following along here: Would it make sense to instead sort the indexes? That wouldn't make a difference for |
By and large I've totally missed when things have been assigned to me on GitHub like this, so no, there wasn't any internal review. I'm not clear enough about the use cases/user's expectations to know whether sorting making sense, though to hazard a guess I'd say it probably makes sense. Of course, the user can always copy the result to a |
If we do user-specified-order, as @hshankar originally suggested, and the user wanted sorted, then the user has another option: specify the indexes in sorted order. I just spot-checked about a dozen callers in our codebase, and most of them were hard-coding the indexes, so putting in order would be trivial (and the ones who weren't hard-coding them were generating them in order anyway). That seems like another argument for preferring user-specified-order, on top of the one that @cgdecker mentions (that converting to a TreeMap is easier than vice versa). In practice, all of the callers I spot-checked in our codebase were already specifying the indexes in order, so it wouldn't make any difference to them which we chose (and both would be more predictable than the current behaviour). |
…map with entries in the order in which the indexes were provided in the call to indexes(). Fixes #2989 RELNOTES=`math`: `Quantiles` `compute()` methods which return a `Map` now return a map with entries in the same order in which the indexes were provided. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=272459623
…map with entries in the order in which the indexes were provided in the call to indexes(). Fixes #2989 RELNOTES=`math`: `Quantiles` `compute()` methods which return a `Map` now return a map with entries in the same order in which the indexes were provided. ------------- Created by MOE: https://github.com/google/moe MOE_MIGRATED_REVID=272459623
Usually, when someone computes percentiles, they would end up iterating over the resulting map to print the percentiles or store them somewhere. It is useful to return the percentiles in the order in which they were requested.
e.g,
If one computes
Quantiles.percentiles().indexes(50, 90, 99).compute(times)
and iterates over the result, it should be iterated in the order 50, 90, 99. Currently, we use HashMap, so the order is unpredictable. Making it a LinkedHashMap will ensure results in the expected order with very minimal perf impact.