-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds hard_bounds to histogram aggregations #59175
Adds hard_bounds to histogram aggregations #59175
Conversation
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram can generate. This is especially useful in case of open ended ranges that can produce a very large number of buckets. Closes elastic#50109
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good, thanks for fielding it. I know some users will be really happy to see this land :)
I left a couple of nits, but nothing major.
@@ -139,7 +145,10 @@ public DateHistogramAggregationBuilder(StreamInput in) throws IOException { | |||
minDocCount = in.readVLong(); | |||
dateHistogramInterval = new DateIntervalWrapper(in); | |||
offset = in.readLong(); | |||
extendedBounds = in.readOptionalWriteable(ExtendedBounds::new); | |||
extendedBounds = in.readOptionalWriteable(LongBounds::new); | |||
if (in.getVersion().onOrAfter(Version.V_8_0_0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll change this after the backport, I presume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's just until backport.
/** Set hard bounds on this histogram, specifying boundaries outside which buckets cannot be created. */ | ||
public DateHistogramAggregationBuilder hardBounds(LongBounds hardBounds) { | ||
if (hardBounds == null) { | ||
throw new IllegalArgumentException("[hardBounds] must not be null: [" + name + "]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we not allow setting the bounds to null? It's not a required parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is consistent with all other optional parameters in this builder like order or externed_bounds.
...ava/org/elasticsearch/search/aggregations/bucket/histogram/DateRangeHistogramAggregator.java
Outdated
Show resolved
Hide resolved
...ava/org/elasticsearch/search/aggregations/bucket/histogram/DateRangeHistogramAggregator.java
Outdated
Show resolved
Hide resolved
|
||
import static org.elasticsearch.common.xcontent.ConstructingObjectParser.optionalConstructorArg; | ||
|
||
public class DoubleBounds implements ToXContentFragment, Writeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct that there's no common interface between DoubleBound
and LongBound
because we'd have to autobox in the collection loop? If so, maybe let's just leave a comment to that effect and maybe link the two in Javadoc, so there's at least something indicating there are two implementations here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I follow. Where do you expect them to intersect?
@@ -87,6 +91,7 @@ public static void registerAggregators(ValuesSourceRegistry.Builder builder) { | |||
private double offset = 0; | |||
private double minBound = Double.POSITIVE_INFINITY; | |||
private double maxBound = Double.NEGATIVE_INFINITY; | |||
private DoubleBounds hardBounds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we convert minBound
and maxBound
into a DoubleBounds
instance? if not, should probably just leave a comment that it's done this way for legacy reasons, not out of any specific need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to refactor minBounds and maxBounds in a follow up PR. I just didn't feel like making this one more complicated than necessary. I will add TODO comment.
@@ -218,6 +215,16 @@ public Long getMax() { | |||
return max; | |||
} | |||
|
|||
public boolean contain(long time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time
seems like a confusing name for this parameter. Maybe just value
like the double version uses?
...in/java/org/elasticsearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java
Outdated
Show resolved
Hide resolved
...in/java/org/elasticsearch/search/aggregations/bucket/histogram/RangeHistogramAggregator.java
Outdated
Show resolved
Hide resolved
); | ||
} | ||
|
||
public void testOverlappingBounds() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to have a test for hard bounds exactly equal to extended bounds. That's a bit of an edge case, and we want to make sure it doesn't break down the line.
…ket/histogram/DateRangeHistogramAggregator.java Co-authored-by: Mark Tozzi <[email protected]>
…ket/histogram/DateRangeHistogramAggregator.java Co-authored-by: Mark Tozzi <[email protected]>
…ket/histogram/RangeHistogramAggregator.java Co-authored-by: Mark Tozzi <[email protected]>
…ket/histogram/RangeHistogramAggregator.java Co-authored-by: Mark Tozzi <[email protected]>
This PR requires a bit tricky backport, so we will let it slip into 7.10 in order to not disrupt feature freeze today. |
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram can generate. This is especially useful in case of open ended ranges that can produce a very large number of buckets.
Refactors extendedBounds to use DoubleBounds instead of 2 variables. This is a follow up for #59175
Refactors extendedBounds to use DoubleBounds instead of 2 variables. This is a follow up for elastic#59175
Thanks for fixing this @imotov 👍 |
Should the renaming of Was upgrading elastic and dependencies today and came across this and was looking to see if there was any impact in simply renaming |
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
Closes #50109