Optimize sort on numeric long and date fields. #49732

mayya-sharipova · 2019-11-29T19:49:28Z

This rewrites long sort as a DistanceFeatureQuery, which can
efficiently skip non-competitive blocks and segments of documents.
Depending on the dataset, speedups range from 2x to 10x.

The optimization can be disabled by setting the system property
es.search.rewrite_sort to false.
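
As a rough illustration of how such a switch is usually read, here is a minimal sketch. The property name comes from the description above; the class name, the method name, and the default of "true" are assumptions made for this example and are not taken from the Elasticsearch code.

```java
// Hypothetical helper, not the actual Elasticsearch implementation.
final class RewriteSortFlag {
    // Property name as given in the PR description.
    static final String PROPERTY = "es.search.rewrite_sort";

    // Assumption: the optimization is on unless the property is explicitly "false".
    static boolean isRewriteEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "true"));
    }

    public static void main(String[] args) {
        System.out.println("sort rewrite enabled: " + isRewriteEnabled());
    }
}
```

Java system properties are normally passed to the JVM on startup, for example as -Des.search.rewrite_sort=false.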

The optimization is skipped when 50% or more of the data in an index
has the same value.
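
The 50% rule can be pictured with the following sketch. It counts values directly, which is only practical for a toy example; how the real check obtains its numbers is not described here, so the class, the method, and the exact-count approach are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the "50% or more with the same value" rule.
final class DuplicateCheck {
    // Returns true when a single value accounts for at least half of all values.
    static boolean shouldSkipOptimization(long[] values) {
        Map<Long, Integer> counts = new HashMap<>();
        int maxCount = 0;
        for (long v : values) {
            // merge returns the updated count for this value
            int c = counts.merge(v, 1, Integer::sum);
            maxCount = Math.max(maxCount, c);
        }
        return values.length > 0 && 2L * maxCount >= values.length;
    }

    public static void main(String[] args) {
        System.out.println(shouldSkipOptimization(new long[] {1, 1, 2, 3}));
        System.out.println(shouldSkipOptimization(new long[] {1, 2, 3, 4}));
    }
}
```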

Optimization is done through:

1. Rewriting sort as a DistanceFeatureQuery, which can
   efficiently skip non-competitive blocks and segments of documents.
2. Sorting segments according to the primary numeric sort field (#44021).
   This allows skipping non-competitive segments.
3. Using a collector manager.
   When we optimize sort, we sort segments by their min/max value.
   As a collector expects to have segments in order,
   we cannot use a single collector for sorted segments.
   We use a collectorManager, where a dedicated collector
   is created for every segment.
4. Using Lucene's shared TopFieldCollector manager.
   This collector manager is able to exchange the minimum competitive
   score between collectors, which allows us to efficiently skip
   whole segments that don't contain competitive scores.
5. When an index is force merged to a single segment, interleaving
   old and new segments (#48533) allows for this optimization as well,
   as blocks with non-competitive docs can be skipped.
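
The interplay of segment ordering and a shared competitive bound can be sketched in plain Java. This is a simplified stand-in that assumes an ascending sort and uses a bounded heap in place of Lucene's collectors; the Segment class, topKAscending, and segmentsVisited are hypothetical names introduced only for this example and do not reflect the actual Lucene or Elasticsearch APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: top-k ascending sort that skips non-competitive segments.
final class SegmentSkipSketch {
    // Stand-in for an index segment: only the values of the sort field.
    static final class Segment {
        final long[] values;
        final long min;

        Segment(long... values) {
            this.values = values;
            this.min = Arrays.stream(values).min().orElse(Long.MAX_VALUE);
        }
    }

    // Number of segments actually scanned by the last call (for illustration only).
    static int segmentsVisited;

    static long[] topKAscending(List<Segment> segments, int k) {
        segmentsVisited = 0;
        if (k <= 0) {
            return new long[0];
        }
        // Visit segments in order of their minimum value.
        List<Segment> ordered = new ArrayList<>(segments);
        ordered.sort(Comparator.comparingLong(s -> s.min));

        // Max-heap holding the k smallest values seen so far; its head is the
        // current competitive bound shared across all segments.
        PriorityQueue<Long> heap = new PriorityQueue<>(Comparator.reverseOrder());
        for (Segment segment : ordered) {
            if (heap.size() == k && segment.min >= heap.peek()) {
                // This segment and every later one cannot improve the top k.
                break;
            }
            segmentsVisited++;
            for (long v : segment.values) {
                if (heap.size() < k) {
                    heap.add(v);
                } else if (v < heap.peek()) {
                    heap.poll();
                    heap.add(v);
                }
            }
        }
        return heap.stream().mapToLong(Long::longValue).sorted().toArray();
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment(100, 101, 102), new Segment(1, 2, 3), new Segment(50, 60));
        System.out.println(Arrays.toString(topKAscending(segments, 2)));
        System.out.println("segments visited: " + segmentsVisited);
    }
}
```

In this toy version, a segment is skipped as soon as its minimum is no better than the worst value currently in the top k, which is why visiting segments in min order pays off: once one segment is non-competitive, all remaining ones are too.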

Backport of #48804

Co-authored-by: Jim Ferenczi [email protected]

elasticmachine · 2019-11-29T19:49:59Z

Pinging @elastic/es-search (:Search/Search)

mayya-sharipova added the backport, v7.6.0, and :Search/Search labels Nov 29, 2019

mayya-sharipova merged commit 7cf1708 into elastic:7.x Nov 29, 2019

mayya-sharipova deleted the backport-long-sort-opt branch November 29, 2019 20:37
