Optimize sort on numeric long and date fields. #49732
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This rewrites long sort as a
DistanceFeatureQuery
, which canefficiently skip non-competitive blocks and segments of documents.
Depending on the dataset, the speedups can be 2 - 10 times.
The optimization can be disabled with setting the system property
es.search.rewrite_sort
tofalse
.Optimization is skipped when an index has 50% or more data with
the same value.
Optimization is done through:
Rewriting sort as
DistanceFeatureQuery
which canefficiently skip non-competitive blocks and segments of documents.
Sorting segments according to the primary numeric sort field(Sort leaves on search according to the primary numeric sort field #44021)
This allows to skip non-competitive segments.
Using collector manager.
When we optimize sort, we sort segments by their min/max value.
As a collector expects to have segments in order,
we can not use a single collector for sorted segments.
We use collectorManager, where for every segment a dedicated collector
will be created.
Using Lucene's shared TopFieldCollector manager
This collector manager is able to exchange minimum competitive
score between collectors, which allows us to efficiently skip
the whole segments that don't contain competitive scores.
When index is force merged to a single segment, Add a new merge policy that interleaves old and new segments on force merge #48533 interleaving
old and new segments allows for this optimization as well,
as blocks with non-competitive docs can be skipped.
Backport for #48804
Co-authored-by: Jim Ferenczi [email protected]