Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modifying Quantiles to return ordered HashMap #2989

Closed
wants to merge 1 commit into from

Conversation

hshankar
Copy link
Contributor

@hshankar hshankar commented Nov 9, 2017

Usually, when someone computes percentiles, they would end up iterating over the resulting map to print the percentiles or store them somewhere. It is useful to return the percentiles in the order in which they were requested.
e.g,
If one computes Quantiles.percentiles().indexes(50, 90, 99).compute(times) and iterates over the result, it should be iterated in the order 50, 90, 99. Currently, we use HashMap, so the order is unpredictable. Making it a LinkedHashMap will ensure results in the expected order with very minimal perf impact.

@googlebot
Copy link
Collaborator

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
  • In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

@kluever kluever requested a review from cgdecker November 10, 2017 14:38
@kluever
Copy link
Member

kluever commented Nov 10, 2017

/cc @PeteGillin

@PeteGillin
Copy link

This sounds reasonable to me in principle, but if we're going to do it then I think we should make documentation changes (an API that is specified as preserving ordering is much more useful than one that happens to preserve ordering by implementation but doesn't specify it) and I think we should therefore add tests for the behaviour.

@googlebot
Copy link
Collaborator

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for the commit author(s). If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

Usually, when someone computes percentiles, they would end up iterating over the resulting map to print the percentiles or store them somewhere. It is useful to return the percentiles in the order in which they were requested.
e.g,
If one computes `Quantiles.percentiles().indexes(50, 90, 99).compute(times)` and iterates over the result, it should be iterated in the order 50, 90, 99. Currently, we use HashMap, so the order is unpredictable. Making it a LinkedHashMap will ensure results in the expected order with very minimal perf impact.
@googlebot
Copy link
Collaborator

CLAs look good, thanks!

@hshankar
Copy link
Contributor Author

Makes sense. Updated the docs and added a test. One thing to note is, since the indexes are de-duped, if the same index is repeated, the order of the result will only contain the first instance. e.g, if indexes are 0, 1, 2, 1, 3, the results will be in order 0, 1, 2, 3. I think this should be ok though, since it doesn't change the existing behavior guarantees.

@PeteGillin
Copy link

This looks like a sensible approach to me. (I think @cgdecker is doing the actual review.)

Aside: It's a shame we can't use ImmutableMap here, of course, but c.g.common.math can't depend on c.g.common.collect without creating a circular dependency :-(.

@ronshapiro
Copy link
Contributor

@cgdecker Is this ready to submitted internally?

@cpovirk
Copy link
Member

cpovirk commented Oct 2, 2019

I don't see an internal review, though I could be missing it.

A question to anyone still following along here: Would it make sense to instead sort the indexes? That wouldn't make a difference for (50, 90, 99), of course, but maybe users would want sorted order in most cases? (We could even return a NavigableMap, though we might not bother to change the declared return type at this point.) I suspect that a TreeMap may be a little slower to iterate than a LinkedHashMap, but the difference is likely noise. I'm not sure how the two compare in size.

@cgdecker
Copy link
Member

cgdecker commented Oct 2, 2019

By and large I've totally missed when things have been assigned to me on GitHub like this, so no, there wasn't any internal review.

I'm not clear enough about the use cases/user's expectations to know whether sorting making sense, though to hazard a guess I'd say it probably makes sense. Of course, the user can always copy the result to a TreeMap if that's what they want; preserving the initial order is slightly harder, though still possible, by creating a LinkedHashMap manually and putting each key in the initial order and its value from the returned Map into that.

@PeteGillin
Copy link

If we do user-specified-order, as @hshankar originally suggested, and the user wanted sorted, then the user has another option: specify the indexes in sorted order. I just spot-checked about a dozen callers in our codebase, and most of them were hard-coding the indexes, so putting in order would be trivial (and the ones who weren't hard-coding them were generating them in order anyway).

That seems like another argument for preferring user-specified-order, on top of the one that @cgdecker mentions (that converting to a TreeMap is easier than vice versa).

In practice, all of the callers I spot-checked in our codebase were already specifying the indexes in order, so it wouldn't make any difference to them which we chose (and both would be more predictable than the current behaviour).

@cpovirk cpovirk mentioned this pull request Oct 2, 2019
cpovirk pushed a commit that referenced this pull request Oct 2, 2019
…map with entries in the order in which the indexes were provided in the call to indexes().

Fixes #2989

RELNOTES=`math`: `Quantiles` `compute()` methods which return a `Map` now return a map with entries in the same order in which the indexes were provided.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=272459623
@cpovirk cpovirk closed this in #3636 Oct 2, 2019
cpovirk pushed a commit that referenced this pull request Oct 2, 2019
…map with entries in the order in which the indexes were provided in the call to indexes().

Fixes #2989

RELNOTES=`math`: `Quantiles` `compute()` methods which return a `Map` now return a map with entries in the same order in which the indexes were provided.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=272459623
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants