Make TDigestState configurable #96794
More work needed for TDigestPercentile*Tests and the TDigestTest (and the rest of the tests) in the tdigest lib to pass.
Remove wrong asserts from tests and MergingDigest.
Remove redundant serializing interfaces from the library.
These tests don't address compatibility issues in mixed-cluster tests. Such clusters contain a mix of older and newer nodes, so the output depends on which node is picked as a data node: the forked TDigest library is not backwards compatible and produces slightly different results.
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersion.java
@elasticsearchmachine run elasticsearch-ci/full-bwc
@elasticsearchmachine run elasticsearch-ci/part-1
Add SortingDigest as a simple structure for percentile calculations that tracks all data points in a sorted array. This is fast and perfectly accurate, but memory use grows with the number of data points.
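To make the trade-off concrete, here is a minimal sketch of the idea, not the SortingDigest class from this PR. The class name `SortedSamples`, its methods, and the linear-interpolation rule are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical illustration: keeps every sample, so quantiles are exact. */
final class SortedSamples {
    private double[] values = new double[16];
    private int size = 0;
    private boolean sorted = true;

    void add(double x) {
        if (size == values.length) {
            // The backing array doubles as needed: memory grows with the sample count.
            values = Arrays.copyOf(values, size * 2);
        }
        values[size++] = x;
        sorted = false;
    }

    int size() {
        return size;
    }

    /** Exact quantile, interpolating linearly between neighbouring ranks (an assumed rule). */
    double quantile(double q) {
        if (q < 0 || q > 1) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        if (size == 0) {
            return Double.NaN;
        }
        if (!sorted) {
            // Sort lazily, only when a quantile is requested after new samples arrived.
            Arrays.sort(values, 0, size);
            sorted = true;
        }
        double index = q * (size - 1);
        int lo = (int) Math.floor(index);
        int hi = Math.min(lo + 1, size - 1);
        double frac = index - lo;
        return values[lo] + frac * (values[hi] - values[lo]);
    }
}
```

Every added value is retained, which is why accuracy is perfect and why memory is unbounded for large populations.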
Add HybridDigest that uses SortingDigest for small sample counts, then switches to MergingDigest. This approach delivers extreme performance and accuracy for small populations while scaling indefinitely and maintaining acceptable performance and accuracy with constant memory allocation (15kB by default).
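The switching behaviour can be sketched as follows. This is an illustration under stated assumptions, not the HybridDigest from this PR: the `Sketch` interface, the class names, and the threshold semantics are hypothetical, and `RangeSketch` is a deliberately crude constant-memory stand-in where the real code would use MergingDigest.

```java
import java.util.Arrays;

/** Hypothetical common interface for the two strategies. */
interface Sketch {
    void add(double x);
    double quantile(double q);
}

/** Exact strategy: stores every sample (nearest-rank quantile for brevity). */
final class ExactSketch implements Sketch {
    double[] values = new double[8];
    int size = 0;

    public void add(double x) {
        if (size == values.length) values = Arrays.copyOf(values, size * 2);
        values[size++] = x;
    }

    public double quantile(double q) {
        if (size == 0) return Double.NaN;
        double[] sorted = Arrays.copyOf(values, size);
        Arrays.sort(sorted);
        return sorted[(int) Math.round(q * (size - 1))];
    }
}

/** Crude constant-memory stand-in for a real t-digest: tracks only min, max and count. */
final class RangeSketch implements Sketch {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private long count = 0;

    public void add(double x) {
        min = Math.min(min, x);
        max = Math.max(max, x);
        count++;
    }

    public double quantile(double q) {
        return count == 0 ? Double.NaN : min + q * (max - min);
    }
}

/** Starts exact, then replays the stored samples into the approximate sketch. */
final class HybridSketch implements Sketch {
    private final int threshold; // assumed meaning: max samples kept exactly
    private ExactSketch exact = new ExactSketch();
    private Sketch approximate; // null until the switch happens

    HybridSketch(int threshold) {
        this.threshold = threshold;
    }

    public void add(double x) {
        if (approximate != null) {
            approximate.add(x);
            return;
        }
        exact.add(x);
        if (exact.size > threshold) {
            Sketch target = new RangeSketch();
            for (int i = 0; i < exact.size; i++) target.add(exact.values[i]);
            approximate = target;
            exact = null; // release the per-sample storage
        }
    }

    public double quantile(double q) {
        return approximate != null ? approximate.quantile(q) : exact.quantile(q);
    }

    boolean isExact() {
        return approximate == null;
    }
}
```

The switch is one-way: once the exact buffer is replayed and released, later samples only ever touch the bounded-memory structure.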
Make TDigestState a TDigest decorator instead of a subclass. Its factories hide the details of the underlying TDigest implementation, offering a default version and one optimized for accuracy. Both point to AVLTreeDigest for now; the default will switch to HybridDigest in a follow-up PR.
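As a rough illustration of the decorator-plus-factories shape, consider the sketch below. None of these names come from the PR: `Digest`, `TreeBackedDigest`, `DigestState`, and both factory method names are hypothetical, and the delegate only counts samples to keep the example short.

```java
/** Hypothetical minimal digest interface. */
interface Digest {
    void add(double x);
    long size();
}

/** Stand-in for a tree-based implementation; it only counts samples here. */
final class TreeBackedDigest implements Digest {
    private long count = 0;

    public void add(double x) {
        count++;
    }

    public long size() {
        return count;
    }
}

/** Decorator: wraps a Digest instead of extending one. */
final class DigestState {
    private final Digest delegate;
    private final double compression;

    // Private constructor: callers must go through the factories,
    // so the concrete implementation can change without touching them.
    private DigestState(Digest delegate, double compression) {
        this.delegate = delegate;
        this.compression = compression;
    }

    /** Default flavour. */
    static DigestState create(double compression) {
        return new DigestState(new TreeBackedDigest(), compression);
    }

    /** Accuracy-optimized flavour; same delegate as the default for now. */
    static DigestState createOptimizedForAccuracy(double compression) {
        return new DigestState(new TreeBackedDigest(), compression);
    }

    void add(double x) {
        delegate.add(x);
    }

    long size() {
        return delegate.size();
    }

    double compression() {
        return compression;
    }

    String implementationName() {
        return delegate.getClass().getSimpleName();
    }
}
```

Because construction is funnelled through the factories, changing what the default factory returns later is a one-line change that callers never see.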
Introduce the `optimize_for_accuracy` parameter to switch to the corresponding TDigestState, and wire it into the percentile-related aggs. Add a related note to the agg documentation.
Related to #95903