Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finalize fields in postaggs #2883

Closed
wants to merge 13 commits into from
Closed

Finalize fields in postaggs #2883

wants to merge 13 commits into from

Conversation

jaehc
Copy link
Contributor

@jaehc jaehc commented Apr 26, 2016

According to #2433, automatically finalizing aggregators in a post aggregator sounds really good to druid client programs.
I gave it a try to make it work and it seems good. Now, you can write a query like the below. However, there are some duplicate code blocks here and there. Any comments on refactoring would be welcomed.

  "aggregations": [
     {"type": "hyperUnique", "fieldName": "user_unique", "name" : "user_unique"}
  ],
  "postAggregations" : [
     { "type" : "finalizingFieldAccess", "name": "user_unique", "fieldName" : "user_unique" }
  ],

instead of

  "aggregations": [
     {"type": "hyperUnique", "fieldName": "user_unique", "name" : "user_unique"}
  ],
  "postAggregations" : [
     { "type" : "hyperUniqueCardinality", "name": "user_unique", "fieldName" : "user_unique" }
  ],

This change is Reviewable

@fjy
Copy link
Contributor

fjy commented May 4, 2016

Thanks @jaehc, we'll try to get to reviewing this soon. We are really backlogged on PRs right now.

@jaehc
Copy link
Contributor Author

jaehc commented Aug 24, 2016

rebased to master

@fjy fjy added this to the 0.9.3 milestone Aug 24, 2016
@@ -84,6 +85,7 @@ public AggregatorsModule()
@JsonSubTypes(value = {
@JsonSubTypes.Type(name = "arithmetic", value = ArithmeticPostAggregator.class),
@JsonSubTypes.Type(name = "fieldAccess", value = FieldAccessPostAggregator.class),
@JsonSubTypes.Type(name = "finalFieldAccess", value = FinalFieldAccessPostAggregator.class),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call this FinalizingFieldAccessPostAggregator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will handle it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@fjy
Copy link
Contributor

fjy commented Oct 11, 2016

@nishantmonu51 can you help with review here?

@@ -55,6 +61,18 @@ public static void verifyAggregations(
"Missing fields [%s] for postAggregator [%s]", missing, postAgg.getName()
);
Preconditions.checkArgument(combinedAggNames.add(postAgg.getName()), "[%s] already defined", postAgg.getName());

if (postAgg instanceof HasDependentAggFactories) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably will not work when a dependent post agg is used within Arithmetic post aggregator.
We need to have a better way to inject aggregators instead of using instanceOf.

Copy link
Contributor Author

@jaehc jaehc Oct 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Ut for a finalizing post aggregator with arithmetic post aggregator has beed added. It seems work, but do you mean this case or other situation? Of course, I agree that we can use better ways to get aggregators injected if some. Would you give me some recommandation for this?

Copy link
Member

@nishantmonu51 nishantmonu51 Nov 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you are referring to testComputedInArithmeticPostAggregator.
there its only testing computation instead of dependency injection.
can you add/modify the test to call Queries.prepareAggregations() for arithmetic aggregator and verify that all dependencies are set properly ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nishantmonu51 You're right. Injections of post aggs in Arithmetic post aggregator does not work because jackson injects the post aggs while creating an instance which means there is no chance to go through Queries.prepareAggregations()

Copy link
Contributor Author

@jaehc jaehc Dec 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nishantmonu51 I've made ArithmeticPostAggregator implements HasDependentAggFactories and modify a existing Ut for that(testIngestAndQueryWithArithmeticPostAggregator). It seems work so far now. But, do you think things are getting ugly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, looking at this further, there seems to be other aggregators too which act as a wrapper for other postAggs e.g DoubleGreatestPostAggregator, LongGreatestPostAggregator etc.

To make it more cleaner, I think it might just be simpler to add new method setDependentAggFactories in interface PostAggregator itself. Any implementation not having any dependent post aggregator will have a no-op implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nishantmonu51 It makes sense to me, I will add setDependentAggFactories in PostAggregator and maybe I guess we would probably need another contextual information later and think about it then.

@nishantmonu51
Copy link
Member

@jaehc: can you address #2883 (comment) ?

@fjy
Copy link
Contributor

fjy commented Dec 21, 2016

ping @jaehc

@jaehc
Copy link
Contributor Author

jaehc commented Dec 25, 2016

@fjy @nishantmonu51 sorry for being late. I will handle it

@jaehc
Copy link
Contributor Author

jaehc commented Jan 4, 2017

@nishantmonu51
PostAggregator interface now has got to have decorate() which opens flexibility of implementing the computation with more contextual information (e.g. injecting aggregator factory dependencies).
Would you mind going though the idea of decorate()? We are using this interface in our local fork indeed.

@nishantmonu51
Copy link
Member

LGTM, with the changes for decorate. @fjy Can you also review ?

@nishantmonu51
Copy link
Member

updated the master comment to change finalFieldAccess -> finalizingFieldAccess

Copy link
Member

@nishantmonu51 nishantmonu51 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@drcrallen
Copy link
Contributor

Incompatibilities need called out in the master comment, please

@jaehc
Copy link
Contributor Author

jaehc commented Feb 7, 2017

@nishantmonu51 Thanks for giving a review. Is there somethign I need to handle about this PR? By the way, what is the master comment you mentioned?

@drcrallen
Copy link
Contributor

@jaehc master comment is the top comment in the issue at #2883 (comment)

@gianm gianm assigned gianm and unassigned fjy Feb 7, 2017
@gianm
Copy link
Contributor

gianm commented Feb 7, 2017

Taking a look.

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaehc thanks for the contribution. I just had some small comments and a request to add Comparator support too.

@@ -35,4 +35,6 @@
public Object compute(Map<String, Object> combinedAggregators);

public String getName();

public PostAggregator decorate(Map<String, AggregatorFactory> aggregators);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a javadoc to this explaining what it's supposed to do, and what aggregators is expected to contain? It's not clear from the method type signature how this would work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. It needs more explanation. I put some comments on it.

@Override
public FinalizingFieldAccessPostAggregator decorate(final Map<String, AggregatorFactory> aggregators)
{
return new FinalizingFieldAccessPostAggregator(name, fieldName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should override getComparator too, so the post-aggregator can be sorted on. This should be possible by using the comparator of aggregators.get(fieldName).getComparator() if that exists, and Ordering.natural().nullsFirst() otherwise (hoping for the best).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sound good to me. But, why do we use Ordering.natural().nullsFirst() other than only Ordering.natural()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case there's a null return value from a postaggregator, I guess.

public String toString()
{
return "FinalizingFieldAccessPostAggregator{" +
"name'" + name + '\'' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name=' is missing the equals sign

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return result;
}

public static FinalizingFieldAccessPostAggregator buildDecorated(String name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VisibleForTesting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -120,12 +120,15 @@ public GroupByQuery(
Preconditions.checkArgument(spec != null, "dimensions has null DimensionSpec");
}
this.aggregatorSpecs = aggregatorSpecs == null ? ImmutableList.<AggregatorFactory>of() : aggregatorSpecs;
this.postAggregatorSpecs = postAggregatorSpecs == null ? ImmutableList.<PostAggregator>of() : postAggregatorSpecs;
this.postAggregatorSpecs = Queries.prepareAggregations(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like postAggregatorSpecs == null ? ImmutableList.<PostAggregator>of() : postAggregatorSpecs is a common pattern, might as well have Queries.prepareAggregations do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there somewhere to apply it other than the line number of 125?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant, Queries.prepareAggregations could do this… return an empty list if postAggregatorSpecs is null. This is minor thing and not really necessary though.

import java.util.List;
import java.util.Map;

public class FinalizingFieldAccessPostAggregatorTest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for the comparator too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with testComparatorsWithFinalizing and the below Ut.

import java.util.Set;

/**
*/
public class Queries
{
public static void verifyAggregations(
public static List<PostAggregator> decorate(List<PostAggregator> postAggs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: name this decoratePostAggregators? Or move to PostAggregators.decorate? The name "Queries.decorate" sounds a little too generic for what this does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sound good to me. It was somewhat obscure. For now, I rather lean to rename it. But, when we find out another things doing stuff with post aggregators, it would be good to be in a separated class like PostAggregators.decorate()

Copy link
Contributor Author

@jaehc jaehc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A patch to address comments.

@@ -120,12 +120,15 @@ public GroupByQuery(
Preconditions.checkArgument(spec != null, "dimensions has null DimensionSpec");
}
this.aggregatorSpecs = aggregatorSpecs == null ? ImmutableList.<AggregatorFactory>of() : aggregatorSpecs;
this.postAggregatorSpecs = postAggregatorSpecs == null ? ImmutableList.<PostAggregator>of() : postAggregatorSpecs;
this.postAggregatorSpecs = Queries.prepareAggregations(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there somewhere to apply it other than the line number of 125?

public String toString()
{
return "FinalizingFieldAccessPostAggregator{" +
"name'" + name + '\'' +
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return result;
}

public static FinalizingFieldAccessPostAggregator buildDecorated(String name,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import java.util.Set;

/**
*/
public class Queries
{
public static void verifyAggregations(
public static List<PostAggregator> decorate(List<PostAggregator> postAggs,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sound good to me. It was somewhat obscure. For now, I rather lean to rename it. But, when we find out another things doing stuff with post aggregators, it would be good to be in a separated class like PostAggregators.decorate()

@@ -35,4 +35,6 @@
public Object compute(Map<String, Object> combinedAggregators);

public String getName();

public PostAggregator decorate(Map<String, AggregatorFactory> aggregators);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. It needs more explanation. I put some comments on it.

@Override
public FinalizingFieldAccessPostAggregator decorate(final Map<String, AggregatorFactory> aggregators)
{
return new FinalizingFieldAccessPostAggregator(name, fieldName) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sound good to me. But, why do we use Ordering.natural().nullsFirst() other than only Ordering.natural()?

@gianm
Copy link
Contributor

gianm commented Feb 13, 2017

@jaehc thanks, re-reviewing.

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaehc, LGTM after fixing conflicts; looks like they are all from #3899 which added cache keys to post-aggregators.

@gianm
Copy link
Contributor

gianm commented Feb 21, 2017

I will take a look at resolving conflicts for this patch for 0.10.0.

@gianm
Copy link
Contributor

gianm commented Feb 22, 2017

Opened #3957 with conflicts fixed, otherwise unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants