Skip shards when querying constant keyword fields #96161

salvatore-campagna · 2023-05-16T13:34:24Z

When a query like a match phrase query or a wildcard query targets a constant keyword field we can skip the query execution on shards where the query builder is rewritten to a MatchNoneQueryBuilder.

This allows us to save the time and resources that would otherwise be spent executing queries on nodes that return no results. Constant keyword fields allow rewriting a MatchPhraseQueryBuilder to a TermsQueryBuilder which, in turn, is rewritten into a MatchNoneQueryBuilder when the constant keyword field does not match the value specified by the query.
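To make the rewrite chain described above concrete, here is a minimal sketch. It is a simplified stand-in model, not the Elasticsearch implementation: `ConstantKeywordRewriteSketch`, its nested query types, and the `constantKeywords` map (field name to constant value) are hypothetical names introduced for illustration, and rewriting a matching value to a match-all query is an assumption of the sketch.

```java
import java.util.Map;

/** Simplified stand-in model; these are not the real Elasticsearch classes. */
final class ConstantKeywordRewriteSketch {

    interface Query {}

    static final class MatchPhrase implements Query {
        final String field;
        final String text;
        MatchPhrase(String field, String text) { this.field = field; this.text = text; }
    }

    static final class Term implements Query {
        final String field;
        final String value;
        Term(String field, String value) { this.field = field; this.value = value; }
    }

    static final class MatchAll implements Query {}

    static final class MatchNone implements Query {}

    /** One rewrite step, using only the constant keyword values found in the mapping. */
    static Query rewriteOnce(Query query, Map<String, String> constantKeywords) {
        if (query instanceof MatchPhrase) {
            MatchPhrase phrase = (MatchPhrase) query;
            if (constantKeywords.containsKey(phrase.field)) {
                // On a constant keyword field a phrase can only match the whole value.
                return new Term(phrase.field, phrase.text);
            }
        } else if (query instanceof Term) {
            Term term = (Term) query;
            String constant = constantKeywords.get(term.field);
            if (constant != null) {
                // Assumption of this sketch: a matching value rewrites to match-all.
                return constant.equals(term.value) ? new MatchAll() : new MatchNone();
            }
        }
        return query; // nothing to rewrite
    }

    /** Applies rewrite steps until the query stops changing. */
    static Query rewrite(Query query, Map<String, String> constantKeywords) {
        Query current = query;
        Query next = rewriteOnce(current, constantKeywords);
        while (next != current) {
            current = next;
            next = rewriteOnce(current, constantKeywords);
        }
        return current;
    }
}
```

In this model, a phrase query for a value other than the field's constant ends up as `MatchNone`, which is the signal that the shard can be skipped.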

A match phrase query is often executed on constant keyword fields by different Elasticsearch integrations, which target multiple indices by means of index patterns or data streams. In a real-world scenario it is likely that the index pattern or data stream includes tens or hundreds of backing indices, each with multiple shards. As a result, skipping shards in such a scenario might result in better query performance and better cluster resource usage.

Note, however, that execution of the pre-filter and the corresponding "can match" phase currently depends on the overall number of shards involved and on whether at least one of them returns a non-empty result.

We do the rewrite on the data node in the so-called "can match" phase, taking advantage of the fact that, at that point, we can access the index mapping and extract information about constant keyword fields and their values. Doing this on the coordinator node is not possible because the index mapping is not available there.
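As a rough illustration of a mapping-only "can match" decision across many backing indices, consider the sketch below. `CanMatchSketch`, `ShardMapping`, and `indicesToQuery` are hypothetical names, and the model only looks at a single field/value pair; it is meant to show the shape of the decision, not the actual SearchService code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a mapping-only "can match" check; not the real SearchService. */
final class CanMatchSketch {

    /** Per-shard view: the index name plus the constant keyword values from its mapping. */
    static final class ShardMapping {
        final String index;
        final Map<String, String> constantKeywords;
        ShardMapping(String index, Map<String, String> constantKeywords) {
            this.index = index;
            this.constantKeywords = constantKeywords;
        }
    }

    /** Decides from the mapping alone; no searcher or reader is involved. */
    static boolean canMatch(ShardMapping shard, String field, String value) {
        String constant = shard.constantKeywords.get(field);
        // Unknown field or no constant value: stay conservative and do not skip the shard.
        return constant == null || constant.equals(value);
    }

    /** Returns the indices whose shards still need to run the query. */
    static List<String> indicesToQuery(List<ShardMapping> shards, String field, String value) {
        List<String> result = new ArrayList<>();
        for (ShardMapping shard : shards) {
            if (canMatch(shard, field, value)) {
                result.add(shard.index);
            }
        }
        return result;
    }
}
```

The conservative branch matters: a shard is skipped only when the mapping proves that nothing can match.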

Resolves #95541

elasticsearchmachine · 2023-05-16T13:35:53Z

Hi @salvatore-campagna, I've created a changelog YAML for you.

elasticsearchmachine · 2023-05-16T13:35:53Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

elasticsearchmachine · 2023-05-16T13:35:54Z

Pinging @elastic/es-search (Team:Search)

jimczi

I looked at the PR because I thought it would be difficult to achieve but looks like we already handle the case where the SearchExecutionContext doesn't have a searcher. Nice one!
I left one comment to simplify further.
I also wonder why the optim is disabled when runtime fields are added in the request?
What are we protecting against?

jimczi · 2023-05-16T17:56:51Z

server/src/main/java/org/elasticsearch/search/SearchService.java

+            if (rewritten.query() instanceof MatchNoneQueryBuilder) {
+                return new CanMatchShardResponse(false, null);
+            }
+        }


The MappingAwareRewriteContext doesn't seem to be needed. Why not create a new SearchExecutionContext when the index service is available?
You should also reuse queryStillMatchesAfterRewrite(request, context) to ensure that all cases are checked (alias filter and global aggregations).

The MappingAwareRewriteContext doesn't seem to be needed. Why not create a new SearchExecutionContext when the index service is available?

If we use SearchExecutionContext then range queries (and maybe others?) could also be resolved to a MatchNoneQueryBuilder instance. But if there are pending refreshes we can't trust it. The idea was to have a special context that only rewrites with the mappings in mind (for constant keyword fields). If the main query gets rewritten to MatchNoneQueryBuilder then we can trust it even when there are pending refreshes.

You should also reuse queryStillMatchesAfterRewrite(request, context) to ensure that all cases are checked (alias filter and global aggregations).

Good point. This should be possible even with MappingAwareRewriteContext.

If we use SearchExecutionContext then range queries (and maybe others?) could also be resolved to a MatchNoneQueryBuilder instance.

The initial change already used SearchExecutionContext but it sets the searcher to null. That's why I said that the special context is not needed. There's some logic to handle SearchExecutionContext with a null searcher, we should add some javadocs to explain the use case (rewriting with the mapping only) but it works.

Right, I see this now. Yes, if we pass a null searcher then this should work. 👍

I had to add an additional null check in isFieldWithinQuery. That logic uses the reader, which might be null in this case and triggers an NPE at some point.
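The kind of null guard discussed here can be sketched as follows. This is a simplified model under stated assumptions: `FieldWithinQuerySketch`, `FieldStats`, and the `long` range bounds are hypothetical, and only the idea of answering "may match" when no reader is available is taken from the discussion and from the commit titled "fix: return intersects instead of disjoint".

```java
/** Hypothetical model of a range check that tolerates a missing reader. */
final class FieldWithinQuerySketch {

    enum Relation { WITHIN, INTERSECTS, DISJOINT }

    /** Stand-in for the min/max values that would normally be read from the index reader. */
    static final class FieldStats {
        final long min;
        final long max;
        FieldStats(long min, long max) { this.min = min; this.max = max; }
    }

    /** stats is null when no reader (and so no index data) is available. */
    static Relation isFieldWithinQuery(FieldStats stats, long from, long to) {
        if (stats == null) {
            // Without a reader we cannot prove anything about the data, so report
            // "may match" rather than DISJOINT; the shard is then not skipped.
            return Relation.INTERSECTS;
        }
        if (stats.max < from || stats.min > to) {
            return Relation.DISJOINT;
        }
        if (stats.min >= from && stats.max <= to) {
            return Relation.WITHIN;
        }
        return Relation.INTERSECTS;
    }
}
```

Returning `INTERSECTS` for the null case keeps range queries out of the mapping-only shortcut, which matches the concern that range results cannot be trusted while refreshes are pending.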

We just need a search execution context that includes the index service and the mappings to take advantage of query rewriting based on mappings.

salvatore-campagna · 2023-05-17T08:03:03Z

I looked at the PR because I thought it would be difficult to achieve but looks like we already handle the case where the SearchExecutionContext doesn't have a searcher. Nice one! I left one comment to simplify further. I also wonder why the optim is disabled when runtime fields are added in the request? What are we protecting against?

The issue with runtime mappings, I think, was happening because I used Collections.emptyMap() in the new SearchExecutionContext instead of passing request.getRuntimeMappings(). So there was a mismatch between the actual runtime mappings and the ones used in the new SearchExecutionContext.
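One way to picture why that mismatch matters is the sketch below. It assumes that a runtime field defined in the search request takes precedence over a mapped field of the same name; `RuntimeMappingsSketch` and `canSkip` are hypothetical names, and the model is deliberately reduced to a single field/value comparison.

```java
import java.util.Map;

/** Hypothetical model: request-level runtime fields shadow mapped constant keyword fields. */
final class RuntimeMappingsSketch {

    /** Returns true only when the mapping alone proves that the shard cannot match. */
    static boolean canSkip(
        Map<String, String> constantKeywords,
        Map<String, Object> runtimeMappings,
        String field,
        String value
    ) {
        if (runtimeMappings.containsKey(field)) {
            // The runtime field's values are computed at query time, so the constant
            // value in the mapping says nothing about whether this shard matches.
            return false;
        }
        String constant = constantKeywords.get(field);
        return constant != null && !constant.equals(value);
    }
}
```

Calling `canSkip` with an empty runtime map when the request actually defines a runtime field for `field` would report `true` and skip a shard that might match, which is the failure mode that passing the request's runtime mappings avoids in this model.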

When rewriting a query before the searcher is available the search execution context uses a null searcher which, later on, results in a NPE.

salvatore-campagna · 2023-05-17T08:59:52Z

@elasticsearchmachine test this please

salvatore-campagna · 2023-05-23T10:31:21Z

I have just created issue #96280 describing the prerequisite refactoring needed for this.

salvatore-campagna · 2023-06-05T08:44:57Z

@elasticsearchmachine run elasticsearch-ci/part-1 please.

salvatore-campagna · 2023-06-05T10:19:33Z

@javanna @romseygeek @martijnvg I adjusted this PR after merging #96353

Now we just use a QueryRewriteContext instead of a SearchExecutionContext with null IndexSearcher.

martijnvg

LGTM

martijnvg · 2023-06-07T08:34:23Z

server/src/main/java/org/elasticsearch/search/SearchService.java

+     * making the shard search active and waiting for refreshes. As a result, we only wait for refreshes to happen on shards that have
+     * actual data.
+     */
+    private static boolean canMatchAfterRewrite(final ShardSearchRequest request, final IndexService indexService) throws IOException {


Maybe reword "This allows us to avoid extra work other than making the shard search active and waiting for refreshes." to "This allows us to avoid extra work for example making the shard search active and waiting for refreshes."?

javanna

I left a couple of comments, especially one on testing, LGTM otherwise.

javanna · 2023-06-08T14:15:56Z

server/src/main/java/org/elasticsearch/index/IndexService.java

+     * Creates a new {@link QueryRewriteContext}.
+     * This class is used to rewrite queries before we are able to get a valid {@link SearchExecutionContext} and
+     * allows us to anticipate rewriting queries before the query is executed on the data node. Not using a full
+     * SearhcExecutionContext allows us to save on query latency and IO operations.


SearchExecutionContext is mistyped. Shall we also clarify that this is for cases where access to the index is not required as we can make decisions based on mappings alone? Also, expand on what saves query latency and IO operations? e.g. not pulling a searcher removes the cost of pulling a searcher as well as the associated cost of refreshing idle shards.

javanna · 2023-06-08T14:24:28Z

server/src/main/java/org/elasticsearch/index/query/SearchExecutionContext.java

-     *  for instance if this rewrite context is used to index queries (percolation). */
+     * which happens in two cases:
+     * 1. if this rewrite context is used to index queries (percolation)
+     * 2. if we use mapping information to skip shards while doing shards pre-filtering in the 'can match' phase


Is the second still true given the recent improvements you made? Are there places where we do a mappings-based can match with a null searcher?

javanna · 2023-06-08T14:31:49Z

...nt-keyword/src/test/java/org/elasticsearch/xpack/constantkeyword/mapper/SearchIdleTests.java

+        assertEquals(refreshStatsBefore.size(), refreshStatsAfter.size());
+        assertTrue(refreshStatsAfter.containsAll(refreshStatsBefore));
+        assertTrue(refreshStatsBefore.containsAll(refreshStatsAfter));
+    }


It is odd that we don't have unit tests, yet I see that SearchServiceTests that already exists is also an ESSingleNodeTestCase. You may want to simplify your test taking inspiration from it, in that it pulls the search service from the node and calls canMatch directly against it.

…shards

… rewrite without a SearchExecutionContext. With this change, both query builders can rewrite without using a search context, because QueryRewriteContext often has all the mapping and other index metadata available. With this change, the `TermQueryBuilder` can resolve to a `MatchAllQueryBuilder` without needing a `SearchExecutionContext`, which during the can_match phase means that no searcher needs to be acquired, therefore avoiding making a shard search active and potentially triggering a refresh. The `AbstractQueryBuilder#doRewrite(...)` method is altered to attempt a coordination rewrite by default, then fall back to a search rewrite, and finally fall back to an index metadata aware rewrite. This was forgotten as part of elastic#96161 and is needed to complete elastic#95776.

…t a SearchExecutionContext. (#96905) With this change, both query builders can rewrite without using a search context, because QueryRewriteContext often has all the mapping and other index metadata available. With this change, the TermQueryBuilder can resolve to a MatchNoneQueryBuilder without needing a SearchExecutionContext, which during the can_match phase means that no searcher needs to be acquired, therefore avoiding making a shard search active and potentially triggering a refresh. The AbstractQueryBuilder#doRewrite(...) method is altered to attempt a coordination rewrite by default, then fall back to a search rewrite, and finally fall back to an index metadata aware rewrite. This is in line with what was discussed here: #96161 (comment) This change was forgotten as part of #96161 and is needed to complete #95776.
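The fallback order described in this commit message can be sketched roughly as follows. This is a toy model, not the actual `AbstractQueryBuilder` code: `RewriteFallbackSketch`, the `Capability` enum, and the returned labels are hypothetical and exist only to show the ordering.

```java
import java.util.Set;

/** Toy model of the rewrite fallback order; not the real AbstractQueryBuilder. */
final class RewriteFallbackSketch {

    /** What the current rewrite context is able to provide. */
    enum Capability { COORDINATOR, SEARCHER, INDEX_METADATA }

    /** Picks the first applicable rewrite in the order described above. */
    static String chooseRewrite(Set<Capability> available) {
        if (available.contains(Capability.COORDINATOR)) {
            return "coordinator rewrite";
        }
        if (available.contains(Capability.SEARCHER)) {
            return "search rewrite";
        }
        if (available.contains(Capability.INDEX_METADATA)) {
            // Mapping and index metadata only: no searcher has to be acquired.
            return "index metadata rewrite";
        }
        return "no rewrite";
    }
}
```

In this model the can_match case corresponds to a context that offers only `INDEX_METADATA`, so the last branch is taken and no searcher is involved.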

elasticsearchmachine · 2023-06-28T10:26:46Z

@salvatore-campagna according to this PR's labels, I need to update the changelog YAML, but I can't because the PR is closed. Please either update the changelog yourself on the appropriate branch, or adjust the labels. Specifically:

The PR is labelled release highlight but the changelog has no highlight section

elasticsearchmachine added needs:triage Requires assignment of a team area label v8.9.0 labels May 16, 2023

salvatore-campagna self-assigned this May 16, 2023

salvatore-campagna added >enhancement :Search/Search Search-related issues that do not fall into other categories :StorageEngine/TSDB You know, for Metrics and removed needs:triage Requires assignment of a team area label labels May 16, 2023

elasticsearchmachine added Team:Search Meta label for search team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels May 16, 2023

Update docs/changelog/96161.yaml

fa255a0

salvatore-campagna added 9 commits May 16, 2023 15:40

fix: use just one value for the 'area' field

2f321ec

fix: rename class o match naming conventions

1ce1c61

fix: rename class o match naming conventions

6f2dd52

fix: prevent possible NPE

1262e57

fix: prevent possible NPE

ea8e00d

test: reduce the number of shards to kick in shard pre-filtering

9f9e878

fix: use variable instead of raw value

3805907

fix: do not try to rewrite match none queries

12bd7c5

fix: do not skip shards targeted by queries using runtime mappings

73fbf4c

jimczi reviewed May 16, 2023

fix: simplify logic removing MappingAwareRewriteContext

d80d825

We just need a search execution context that includes the index service and the mappings to take advantage of query rewriting based on mappings.

salvatore-campagna added 2 commits May 17, 2023 10:48

fix: prevent possible NPE when searcher is null

d10b7e9

When rewriting a query before the searcher is available the search execution context uses a null searcher which, later on, results in a NPE.

fix: return intersects instead of disjoint

9318d5c

Merge branch 'main' into feature/95541-avoid-unnecessary-search-idle-…

2fdeed1

…shards

salvatore-campagna mentioned this pull request May 26, 2023

Query rewrite context and search execution context refactoring #96353

Merged

salvatore-campagna added 3 commits June 5, 2023 09:30

Merge branch 'main' into feature/95541-avoid-unnecessary-search-idle-…

be1292c

…shards

refactor: use a QueryRewriteContext instead of SearchExecutionContext

b5b824d

fix: typo in Javadoc

dfe9a6f

salvatore-campagna added 2 commits June 5, 2023 11:12

fix: number of acquireSearchSupplier invocations

af18a36

fix: not using null searcher anymore

40140cc

salvatore-campagna requested a review from martijnvg June 5, 2023 10:17

martijnvg approved these changes Jun 7, 2023

javanna approved these changes Jun 8, 2023

salvatore-campagna added 5 commits June 9, 2023 11:22

docs: uodate javadocs

92d5856

test: can match on matching and non matching field value

59b8445

fix: remove unused variable

00af0d1

Merge branch 'main' into feature/95541-avoid-unnecessary-search-idle-…

b27fe03

…shards

fix: method parameter name

78f7ba6

salvatore-campagna merged commit a732ecd into elastic:main Jun 9, 2023

martijnvg mentioned this pull request Jun 18, 2023

Update MatchPhrase- and TermQueryBuilder to be able to rewrite without a SearchExecutionContext. #96905

Merged

salvatore-campagna added the release highlight label Jun 28, 2023

salvatore-campagna mentioned this pull request Jun 28, 2023

Include release highlight for query rewrite #97178

Merged

elasticsearchmachine pushed a commit that referenced this pull request Jun 28, 2023

Include release highlight for query rewrite (#97178)

f9b22a9

Update the changelog after making the original PR a `release highlight`. See #96161

salvatore-campagna added a commit to salvatore-campagna/elasticsearch that referenced this pull request Jun 28, 2023

Include release highlight for query rewrite (elastic#97178)

2e4feb6

Update the changelog after making the original PR a `release highlight`. See elastic#96161

elasticsearchmachine pushed a commit that referenced this pull request Jun 28, 2023

Include release highlight for query rewrite (#97178) (#97202)

9a17eb3

Update the changelog after making the original PR a `release highlight`. See #96161

salvatore-campagna mentioned this pull request Aug 8, 2023

Aggs and constant_keyword #94637

Closed

costin mentioned this pull request Aug 31, 2023

Prefilter nodes/shard queries #99073

Open

5 tasks
