Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds hard_bounds to histogram aggregations #59175

Merged
merged 8 commits into from
Jul 13, 2020

Conversation

imotov
Copy link
Contributor

@imotov imotov commented Jul 7, 2020

Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.

Closes #50109

Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.

Closes elastic#50109
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 7, 2020
@imotov imotov requested a review from not-napoleon July 7, 2020 17:32
Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good, thanks for fielding it. I know some users will be really happy to see this land :)

I left a couple of nits, but nothing major.

@@ -139,7 +145,10 @@ public DateHistogramAggregationBuilder(StreamInput in) throws IOException {
minDocCount = in.readVLong();
dateHistogramInterval = new DateIntervalWrapper(in);
offset = in.readLong();
extendedBounds = in.readOptionalWriteable(ExtendedBounds::new);
extendedBounds = in.readOptionalWriteable(LongBounds::new);
if (in.getVersion().onOrAfter(Version.V_8_0_0)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll change this after the backport, I presume?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's just until backport.

/** Set hard bounds on this histogram, specifying boundaries outside which buckets cannot be created. */
public DateHistogramAggregationBuilder hardBounds(LongBounds hardBounds) {
if (hardBounds == null) {
throw new IllegalArgumentException("[hardBounds] must not be null: [" + name + "]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not allow setting the bounds to null? It's not a required parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is consistent with all other optional parameters in this builder like order or externed_bounds.


import static org.elasticsearch.common.xcontent.ConstructingObjectParser.optionalConstructorArg;

public class DoubleBounds implements ToXContentFragment, Writeable {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct that there's no common interface between DoubleBound and LongBound because we'd have to autobox in the collection loop? If so, maybe let's just leave a comment to that effect and maybe link the two in Javadoc, so there's at least something indicating there are two implementations here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow. Where do you expect them to intersect?

@@ -87,6 +91,7 @@ public static void registerAggregators(ValuesSourceRegistry.Builder builder) {
private double offset = 0;
private double minBound = Double.POSITIVE_INFINITY;
private double maxBound = Double.NEGATIVE_INFINITY;
private DoubleBounds hardBounds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we convert minBound and maxBound into a DoubleBounds instance? if not, should probably just leave a comment that it's done this way for legacy reasons, not out of any specific need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to refactor minBounds and maxBounds in a follow up PR. I just didn't feel like making this one more complicated than necessary. I will add TODO comment.

@@ -218,6 +215,16 @@ public Long getMax() {
return max;
}

public boolean contain(long time) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time seems like a confusing name for this parameter. Maybe just value like the double version uses?

);
}

public void testOverlappingBounds() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to have a test for hard bounds exactly equal to extended bounds. That's a bit of an edge case, and we want to make sure it doesn't break down the line.

@imotov imotov merged commit 0af410a into elastic:master Jul 13, 2020
@imotov
Copy link
Contributor Author

imotov commented Jul 14, 2020

This PR requires a bit tricky backport, so we will let it slip into 7.10 in order to not disrupt feature freeze today.

@imotov imotov added the v7.10.0 label Jul 15, 2020
imotov added a commit to imotov/elasticsearch that referenced this pull request Jul 15, 2020
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
imotov added a commit that referenced this pull request Jul 16, 2020
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
@imotov imotov deleted the issue-50109-add-restrictive-bounds branch July 17, 2020 17:52
imotov added a commit that referenced this pull request Aug 4, 2020
Refactors extendedBounds to use DoubleBounds instead
of 2 variables.

This is a follow up for #59175
imotov added a commit to imotov/elasticsearch that referenced this pull request Aug 4, 2020
Refactors extendedBounds to use DoubleBounds instead
of 2 variables.

This is a follow up for elastic#59175
imotov added a commit that referenced this pull request Aug 4, 2020
Refactors extendedBounds to use DoubleBounds instead
of 2 variables.

This is a follow up for #59175
@consulthys
Copy link
Contributor

Thanks for fixing this @imotov 👍

@ClaudioConsolmagno
Copy link

Should the renaming of ExtendedBounds to LongBounds have appeared in the 7.10 release notes or the migration guide?

Was upgrading elastic and dependencies today and came across this and was looking to see if there was any impact in simply renaming ExtendedBounds to LongBounds in my code so looked first in the notes and couldn't see anything. Had to search the repo until I tracked it down to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.10.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

date_histogram of date_range with null end points triggers a circuit breaker
6 participants