Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a cluster setting configuring TDigestExecutionHint #96943

Merged
merged 176 commits into from
Jun 20, 2023

Conversation

kkrik-es
Copy link
Contributor

@kkrik-es kkrik-es commented Jun 20, 2023

Provide a Setting for TDigestExecutionHint that's configured through ClusterSettings. Wire ClusterSettings through SearchExecutionContext and AggregationContext, to access the setting in agg builders that use percentiles. Add a helper factory method for SearchExecutionContext to simplify initialization in tests.

Any hint set directly inside the tdigest spec of a percentile query takes precedence over the one from ClusterSettings.

Related to #95903

kkrik-es and others added 30 commits May 9, 2023 10:23
More work needed for TDigestPercentile*Tests and the TDigestTest (and
the rest of the tests) in the tdigest lib to pass.
Remove wrong asserts from tests and MergingDigest.
Remove redundant serializing interfaces from the library.
These tests don't address compatibility issues in mixed cluster tests as
the latter contain a mix of older and newer nodes, so the output depends
on which node is picked as a data node since the forked TDigest library
is not backwards compatible (produces slightly different results).
kkrik-es added 2 commits June 20, 2023 11:06
# Conflicts:
#	libs/tdigest/src/main/java/org/elasticsearch/tdigest/SortingDigest.java
#	server/src/main/java/org/elasticsearch/TransportVersion.java
#	server/src/main/java/org/elasticsearch/index/query/SearchExecutionContext.java
#	server/src/main/java/org/elasticsearch/search/aggregations/metrics/AbstractTDigestPercentilesAggregator.java
#	server/src/main/java/org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationAggregationBuilder.java
#	server/src/main/java/org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationAggregator.java
#	server/src/main/java/org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationAggregatorFactory.java
#	server/src/main/java/org/elasticsearch/search/aggregations/metrics/PercentilesConfig.java
#	server/src/main/java/org/elasticsearch/search/aggregations/metrics/TDigestState.java
#	server/src/test/java/org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationAggregatorTests.java
#	server/src/test/java/org/elasticsearch/search/aggregations/metrics/MedianAbsoluteDeviationTests.java
#	server/src/test/java/org/elasticsearch/search/aggregations/metrics/TDigestExecutionHintTests.java
#	server/src/test/java/org/elasticsearch/search/aggregations/metrics/TDigestPercentileRanksAggregatorTests.java
#	server/src/test/java/org/elasticsearch/search/aggregations/metrics/TDigestPercentilesAggregatorTests.java
#	server/src/test/java/org/elasticsearch/search/aggregations/metrics/TDigestStateTests.java
#	x-pack/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/aggregations/metrics/AbstractHistoBackedTDigestPercentilesAggregator.java
#	x-pack/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/boxplot/BoxplotAggregationBuilder.java
#	x-pack/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/boxplot/BoxplotAggregatorFactory.java
#	x-pack/plugin/analytics/src/test/java/org/elasticsearch/xpack/analytics/aggregations/metrics/TDigestPreAggregatedPercentileRanksAggregatorTests.java
#	x-pack/plugin/analytics/src/test/java/org/elasticsearch/xpack/analytics/aggregations/metrics/TDigestPreAggregatedPercentilesAggregatorTests.java
#	x-pack/plugin/analytics/src/test/java/org/elasticsearch/xpack/analytics/boxplot/BoxplotAggregationBuilderTests.java
@kkrik-es kkrik-es added >enhancement :Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jun 20, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

@kkrik-es kkrik-es changed the title Add cluster setting to SearchExecutionContext to configure TDigestExecutionHint Add a cluster setting configuring TDigestExecutionHint Jun 20, 2023
@kkrik-es kkrik-es requested a review from martijnvg June 20, 2023 09:57
@kkrik-es kkrik-es marked this pull request as ready for review June 20, 2023 09:57
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should try to rely on the default defined in the new cluster setting and make execution hint an optional field in all the builders.

@@ -21,6 +22,14 @@ public enum TDigestExecutionHint implements Writeable {
DEFAULT(0), // Use a TDigest that is optimized for performance, with a small penalty in accuracy.
HIGH_ACCURACY(1); // Use a TDigest that is optimize for accuracy, at the expense of performance.

public static final Setting<String> SETTING = Setting.simpleString(
"search.aggs.tdigest_execution_hint",
"",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe define the default here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - empty string was already mapped to DEFAULT, but made it more explicit.

@@ -165,7 +165,7 @@ public void parseExecutionHint(String executionHint) {
}

public TDigestExecutionHint getExecutionHint() {
return executionHint;
return executionHint == null ? TDigestExecutionHint.DEFAULT : executionHint;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should allow executionHint to be null here so that we can rely on the default defined in the new cluster setting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this should only apply in cases where the executionHint hasn't been set through a call to a method that accesses the AggregationContext. In general, I think it's better to guarantee that this won't be used as null, otherwise we'd better use an Optional and modify all consumers accordingly. This does feel like overengineering, though - we have a default hint for this exact reason..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, fixed as suggested to cover for the case where the object is written without a hint but read when one is set in ClusterSettings.

}

@Override
public void writeTo(StreamOutput out) throws IOException {
super.writeTo(out);
out.writeDouble(compression);
if (out.getTransportVersion().onOrAfter(TransportVersion.V_8_500_014)) {
executionHint.writeTo(out);
getExecutionHint().writeTo(out);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use writeOptionalWriteable(...) here? That way we can allow exection hint to be null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unfortunately means we do need to increase transport version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of leaving this as null at this point? Do we expect mutations after an object gets serialized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used a writeBoolean explicitly to combine with TDigestExecutionHint.writeTo

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few more comments. I think it is almost ready.
Mainly around using: readOptionalWriteable() and writeOptionalWriteable().

@@ -21,6 +22,14 @@ public enum TDigestExecutionHint implements Writeable {
DEFAULT(0), // Use a TDigest that is optimized for performance, with a small penalty in accuracy.
HIGH_ACCURACY(1); // Use a TDigest that is optimize for accuracy, at the expense of performance.

public static final Setting<String> SETTING = Setting.simpleString(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Setting#enumSetting(...) should be used here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That one is case sensitive - setting to HIGH_ACCURACY works but high_accuracy doesn't. That's why I think using ::parse is preferable here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I see this method always uses Enum.valueOf(...) which is case sensitive and only allows values as is defined in the enum itself. (and in ExecutionHint all enums are defined in uppercase).

Ok, let's keep it this way.

if (out.getTransportVersion().onOrAfter(TransportVersion.V_8_500_014)) {
executionHint.writeTo(out);
if (out.getTransportVersion().onOrAfter(TransportVersion.V_8_500_017)) {
if (executionHint != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeOptional?

@@ -650,6 +651,7 @@ public SearchExecutionContext newSearchExecutionContext(
shardId,
shardRequestIndex,
indexSettings,
clusterService != null ? clusterService.getClusterSettings() : ClusterSettings.createBuiltInClusterSettings(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curios, when is this null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't spot any, I was just defensive here. Removed the check.

/**
* Factory method generating a dummy context with partial functionality. Strictly test-only.
*/
public static SearchExecutionContext newDummyForTests(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is test code, but gets included in the release. Can we move it to test/framework/main/java ? Maybe as factory method in a class in org.elasticsearch.search package in the test framework?

kkrik-es added 3 commits June 20, 2023 20:00
Move test-only SearchExecutionContext method in helper class under
test.
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersion.java
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -21,6 +22,14 @@ public enum TDigestExecutionHint implements Writeable {
DEFAULT(0), // Use a TDigest that is optimized for performance, with a small penalty in accuracy.
HIGH_ACCURACY(1); // Use a TDigest that is optimize for accuracy, at the expense of performance.

public static final Setting<String> SETTING = Setting.simpleString(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I see this method always uses Enum.valueOf(...) which is case sensitive and only allows values as is defined in the enum itself. (and in ExecutionHint all enums are defined in uppercase).

Ok, let's keep it this way.

@kkrik-es kkrik-es merged commit 32bdd3b into elastic:main Jun 20, 2023
@kkrik-es kkrik-es deleted the fix/95903-clustersettings branch June 20, 2023 20:22
kkrik-es added a commit to kkrik-es/elasticsearch that referenced this pull request Jun 20, 2023
This was replaced by optional serialization in version
V_8_500_018 (elastic#96943), breaking backwards compatibility for serverless.

Related to elastic#95903
kkrik-es added a commit that referenced this pull request Jun 20, 2023
This was replaced by optional serialization in version
V_8_500_018 (#96943), breaking backwards compatibility for serverless.

Related to #95903
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants