-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a cluster setting configuring TDigestExecutionHint
#96943
Conversation
More work needed for TDigestPercentile*Tests and the TDigestTest (and the rest of the tests) in the tdigest lib to pass.
Remove wrong asserts from tests and MergingDigest.
Remove redundant serializing interfaces from the library.
These tests don't address compatibility issues in mixed cluster tests as the latter contain a mix of older and newer nodes, so the output depends on which node is picked as a data node since the forked TDigest library is not backwards compatible (produces slightly different results).
# Conflicts: # libs/tdigest/src/main/java/org/elasticsearch/tdigest/SortingDigest.java # server/src/main/java/org/elasticsearch/TransportVersion.java # server/src/main/java/org/elasticsearch/index/query/SearchExecutionContext.java # server/src/main/java/org/elasticsearch/search/aggregations/metrics/AbstractTDigestPercentilesAggregator.java # server/src/main/java/org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationAggregationBuilder.java # server/src/main/java/org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationAggregator.java # server/src/main/java/org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationAggregatorFactory.java # server/src/main/java/org/elasticsearch/search/aggregations/metrics/PercentilesConfig.java # server/src/main/java/org/elasticsearch/search/aggregations/metrics/TDigestState.java # server/src/test/java/org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationAggregatorTests.java # server/src/test/java/org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationTests.java # server/src/test/java/org/elasticsearch/search/aggregations/metrics/TDigestExecutionHintTests.java # server/src/test/java/org/elasticsearch/search/aggregations/metrics/TDigestPercentileRanksAggregatorTests.java # server/src/test/java/org/elasticsearch/search/aggregations/metrics/TDigestPercentilesAggregatorTests.java # server/src/test/java/org/elasticsearch/search/aggregations/metrics/TDigestStateTests.java # x-pack/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/aggregations/metrics/AbstractHistoBackedTDigestPercentilesAggregator.java # x-pack/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/boxplot/BoxplotAggregationBuilder.java # x-pack/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/boxplot/BoxplotAggregatorFactory.java # x-pack/plugin/analytics/src/test/java/org/elasticsearch/xpack/analytics/aggregations/metrics/TDigestPreAggregatedPercentileRanksAggregatorTests.java # x-pack/plugin/analytics/src/test/java/org/elasticsearch/xpack/analytics/aggregations/metrics/TDigestPreAggregatedPercentilesAggregatorTests.java # x-pack/plugin/analytics/src/test/java/org/elasticsearch/xpack/analytics/boxplot/BoxplotAggregationBuilderTests.java
Hi @kkrik-es, I've created a changelog YAML for you. |
SearchExecutionContext
to configure TDigestExecutionHint
TDigestExecutionHint
Pinging @elastic/es-analytics-geo (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should try to rely on the default defined in the new cluster setting and make execution hint an optional field in all the builders.
@@ -21,6 +22,14 @@ public enum TDigestExecutionHint implements Writeable { | |||
DEFAULT(0), // Use a TDigest that is optimized for performance, with a small penalty in accuracy. | |||
HIGH_ACCURACY(1); // Use a TDigest that is optimize for accuracy, at the expense of performance. | |||
|
|||
public static final Setting<String> SETTING = Setting.simpleString( | |||
"search.aggs.tdigest_execution_hint", | |||
"", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe define the default here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - empty string was already mapped to DEFAULT, but made it more explicit.
@@ -165,7 +165,7 @@ public void parseExecutionHint(String executionHint) { | |||
} | |||
|
|||
public TDigestExecutionHint getExecutionHint() { | |||
return executionHint; | |||
return executionHint == null ? TDigestExecutionHint.DEFAULT : executionHint; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should allow executionHint to be null here so that we can rely on the default defined in the new cluster setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this should only apply in cases where the executionHint hasn't been set through a call to a method that accesses the AggregationContext. In general, I think it's better to guarantee that this won't be used as null, otherwise we'd better use an Optional and modify all consumers accordingly. This does feel like overengineering, though - we have a default hint for this exact reason..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, fixed as suggested to cover for the case where the object is written without a hint but read when one is set in ClusterSettings.
} | ||
|
||
@Override | ||
public void writeTo(StreamOutput out) throws IOException { | ||
super.writeTo(out); | ||
out.writeDouble(compression); | ||
if (out.getTransportVersion().onOrAfter(TransportVersion.V_8_500_014)) { | ||
executionHint.writeTo(out); | ||
getExecutionHint().writeTo(out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use writeOptionalWriteable(...)
here? That way we can allow exection hint to be null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unfortunately means we do need to increase transport version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of leaving this as null at this point? Do we expect mutations after an object gets serialized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used a writeBoolean explicitly to combine with TDigestExecutionHint.writeTo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few more comments. I think it is almost ready.
Mainly around using: readOptionalWriteable()
and writeOptionalWriteable()
.
...org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationAggregationBuilder.java
Outdated
Show resolved
Hide resolved
...org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationAggregationBuilder.java
Outdated
Show resolved
Hide resolved
@@ -21,6 +22,14 @@ public enum TDigestExecutionHint implements Writeable { | |||
DEFAULT(0), // Use a TDigest that is optimized for performance, with a small penalty in accuracy. | |||
HIGH_ACCURACY(1); // Use a TDigest that is optimize for accuracy, at the expense of performance. | |||
|
|||
public static final Setting<String> SETTING = Setting.simpleString( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Setting#enumSetting(...)
should be used here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That one is case sensitive - setting to HIGH_ACCURACY works but high_accuracy doesn't. That's why I think using ::parse is preferable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I see this method always uses Enum.valueOf(...)
which is case sensitive and only allows values as is defined in the enum itself. (and in ExecutionHint
all enums are defined in uppercase).
Ok, let's keep it this way.
server/src/main/java/org/elasticsearch/search/aggregations/metrics/PercentilesConfig.java
Outdated
Show resolved
Hide resolved
if (out.getTransportVersion().onOrAfter(TransportVersion.V_8_500_014)) { | ||
executionHint.writeTo(out); | ||
if (out.getTransportVersion().onOrAfter(TransportVersion.V_8_500_017)) { | ||
if (executionHint != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
writeOptional?
@@ -650,6 +651,7 @@ public SearchExecutionContext newSearchExecutionContext( | |||
shardId, | |||
shardRequestIndex, | |||
indexSettings, | |||
clusterService != null ? clusterService.getClusterSettings() : ClusterSettings.createBuiltInClusterSettings(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curios, when is this null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't spot any, I was just defensive here. Removed the check.
/** | ||
* Factory method generating a dummy context with partial functionality. Strictly test-only. | ||
*/ | ||
public static SearchExecutionContext newDummyForTests( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is test code, but gets included in the release. Can we move it to test/framework/main/java ? Maybe as factory method in a class in org.elasticsearch.search package in the test framework?
...ytics/src/main/java/org/elasticsearch/xpack/analytics/boxplot/BoxplotAggregationBuilder.java
Outdated
Show resolved
Hide resolved
...ytics/src/main/java/org/elasticsearch/xpack/analytics/boxplot/BoxplotAggregationBuilder.java
Outdated
Show resolved
Hide resolved
Move test-only SearchExecutionContext method in helper class under test.
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersion.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
test/framework/src/main/java/org/elasticsearch/index/query/SearchExecutionContextHelper.java
Show resolved
Hide resolved
@@ -21,6 +22,14 @@ public enum TDigestExecutionHint implements Writeable { | |||
DEFAULT(0), // Use a TDigest that is optimized for performance, with a small penalty in accuracy. | |||
HIGH_ACCURACY(1); // Use a TDigest that is optimize for accuracy, at the expense of performance. | |||
|
|||
public static final Setting<String> SETTING = Setting.simpleString( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I see this method always uses Enum.valueOf(...)
which is case sensitive and only allows values as is defined in the enum itself. (and in ExecutionHint
all enums are defined in uppercase).
Ok, let's keep it this way.
This was replaced by optional serialization in version V_8_500_018 (elastic#96943), breaking backwards compatibility for serverless. Related to elastic#95903
Provide a
Setting
forTDigestExecutionHint
that's configured throughClusterSettings
. WireClusterSettings
throughSearchExecutionContext
andAggregationContext
, to access the setting in agg builders that use percentiles. Add a helper factory method forSearchExecutionContext
to simplify initialization in tests.Any hint set directly inside the tdigest spec of a percentile query takes precedence over the one from
ClusterSettings
.Related to #95903