Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TDigestState configurable #96794

Merged
merged 166 commits into from
Jun 18, 2023
Merged

Make TDigestState configurable #96794

merged 166 commits into from
Jun 18, 2023

Conversation

kkrik-es
Copy link
Contributor

@kkrik-es kkrik-es commented Jun 13, 2023

Add SortingDigest as a simple structure for percentile calculations that tracks all data points in a sorted array. This is a fast and perfectly accurate solution that leads to bloated memory allocation.

Add HybridDigest that uses SortingDigest for small sample counts, then switches to MergingDigest. This approach delivers extreme performance and accuracy for small populations while scaling indefinitely and maintaining acceptable performance and accuracy with constant memory allocation (15kB by default).

Make TDigestState a TDigest decorator instead of a subclass. Its factories hide the details of the underlying TDigest implementation, offering a default version and one optimized for accuracy. They both point to AVLTreeDigest for now, will
switch the default to HybridDigest in a follow-up PR.

Introduce param optimize_for_accuracy to switch to the corresponding TDigestState and wire it in percentile-related aggs. Add a related note to the agg documentation.

Related to #95903

kkrik-es and others added 30 commits May 9, 2023 10:23
More work needed for TDigestPercentile*Tests and the TDigestTest (and
the rest of the tests) in the tdigest lib to pass.
Remove wrong asserts from tests and MergingDigest.
Remove redundant serializing interfaces from the library.
These tests don't address compatibility issues in mixed cluster tests as
the latter contain a mix of older and newer nodes, so the output depends
on which node is picked as a data node since the forked TDigest library
is not backwards compatible (produces slightly different results).
kkrik-es added 2 commits June 16, 2023 07:32
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersion.java
@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@kkrik-es kkrik-es added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) and removed auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels Jun 16, 2023
@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

7 similar comments
@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@kkrik-es kkrik-es removed the :Analytics/Aggregations Aggregations label Jun 17, 2023
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jun 17, 2023
@kkrik-es kkrik-es added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/Aggregations Aggregations labels Jun 17, 2023
@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@kkrik-es kkrik-es removed the needs:triage Requires assignment of a team area label label Jun 17, 2023
@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

1 similar comment
@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

1 similar comment
@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@kkrik-es kkrik-es removed the test-full-bwc Trigger full BWC version matrix tests label Jun 18, 2023
@kkrik-es
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/part-1

@kkrik-es kkrik-es merged commit 6d62e09 into elastic:main Jun 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants