-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
datasketches perf: SketchAggregatorFactory.combine(..) returns Union object now #3471
Conversation
…it can be reused across multiple combine(..) calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 assuming rolling updates still work (it looks like they should)
public void serialize(Union union, JsonGenerator jgen, SerializerProvider provider) | ||
throws IOException, JsonProcessingException | ||
{ | ||
jgen.writeBinary(union.getResult(true, null).toByteArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that serialized format is unchanged, so rolling updates are still possible, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, changes here are totally backwards compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @himanshug
👍 @himanshug what are the perf benefits? I think this is fine for 0.9.2 |
👍 These changes produce an 80% speed-up on some timeseries queries that we were running. Given that the old groupBy aggregates using the aggregator (which builds a Union implicitly and keeps unioning into it), I don't think this will greatly affect performance there. But it will significantly speed up timeseries, topN and any other query that depends on the |
moving to 0.9.2 since it is ready to be merged. |
@himanshug can you please do a backport PR to 0.9.2? |
…it can be reused across multiple combine(..) calls (apache#3471)
…it can be reused across multiple combine(..) calls (apache#3471)
so that it can be reused across multiple combine(..) calls which are extensively used for merging in various places.
improvement would depend upon the type of query and amount of data being merged at brokers, for some of the timeseries queries it brought down the response time from 10 secs to 1.5 secs.
existing UTs in the datasketches module cover the change.
Note: I can move the milestone to 0.9.2 if this gets merged in time or else, we will cherry-pick it in our internal distro.