-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip shards when querying constant keyword fields #96161
Skip shards when querying constant keyword fields #96161
Conversation
When a query like a match phrase query or a wildcard query targets a constant keyword field we can skip the query execution on shards where the query builder is rewritten to a MatchNoneQueryBuilder. This allows us to save time and resources executing queries on nodes that do not return any result. Constant keyword fields allow rewriting a MatchPhraseQueryBuider to TermsQueryBuilder which, in turn, is rewritten into a MatchNoneQueryBuilder in case a constant keyword field does not match the value specified by the query. A match phrase query is often executed on constnt keyword fields by different Elasticsearch integrations which target multiple indices by means of index patterns or data streams. In a real world scenario it is likely that the index pattern or data stream includes tens or hundreds of backing indices. As a result, skipping shards in such scenario might result in better query prformance and better cluster resource usage. Note, anyway, that currently execution of the pre filter and the corresponding "can match" phase depends on the voerall number of shards involved and on whether there is at least one of them returning a non-empty result. We do the rewrite operation on the data node in the so called "can match" phase, taking advantage of the fact that, at that moment, we can access the index mapping and extract information about constant keyword fields and their value.
Hi @salvatore-campagna, I've created a changelog YAML for you. |
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the PR because I thought it would be difficult to achieve but looks like we already handle the case where the SearchExecutionContext
doesn't have a searcher. Nice one!
I left one comment to simplify further.
I also wonder why the optim is disabled when runtime fields are added in the request?
What are we protecting against?
if (rewritten.query() instanceof MatchNoneQueryBuilder) { | ||
return new CanMatchShardResponse(false, null); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MappingAwareRewriteContext
doesn't seem to be needed. Why not creating a new SearchExecutionContext
when the index service is available.
You should also reuse queryStillMatchesAfterRewrite(request, context)
to ensure that all cases are checked (alias filter and global aggregations).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MappingAwareRewriteContext doesn't seem to be needed. Why not creating a new SearchExecutionContext when the index service is available.
If we use SearchExecutionContext
then also range queries (and maybe others?) could be resolved to MatchNoneQueryBuilder
instance. But if there are pending refreshes we can't trust it. The idea was to have a special context that only rewrites with the mappings in mind (for constant keyword fields). If the main query gets rewritten to MatchNoneQueryBuilder
then we can trust it even when there are pending refreshes.
You should also reuse queryStillMatchesAfterRewrite(request, context) to ensure that all cases are checked (alias filter and global aggregations).
Good point. This should be possible even with MappingAwareRewriteContext
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use SearchExecutionContext then also range queries (and maybe others?) could be resolved to MatchNoneQueryBuilder instance.
The initial change already used SearchExecutionContext
but it sets the searcher to null
. That's why I said that the special context is not needed. There's some logic to handle SearchExecutionContext
with a null searcher, we should add some javadocs to explain the use case (rewriting with the mapping only) but it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I see this now. Yes, if we pass a null searcher then this should work. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to add an additional null check in isFieldWithinQuery
. That logic uses the reader which might be null in this case and triggers a NPE at some point.
We just need a search execution context that includes the index service and the mappings to take advantage of query rewriting based on mappings.
The issue with runtime mappings, I think, was happening because I used |
When rewriting a query before the searcher is available the search execution context uses a null searcher which, later on, results in a NPE.
@elasticsearchmachine test this please |
I have just create issue #96280 describing the pre-requisite refactoring needed for this. |
@elasticsearchmachine run elasticsearch-ci/part-1 please. |
@javanna @romseygeek @martijnvg I adjusted this PR after merging #96353 Now we just use a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* making the shard search active and waiting for refreshes. As a result, we only wait for refreshes to happen on shards that have | ||
* actual data. | ||
*/ | ||
private static boolean canMatchAfterRewrite(final ShardSearchRequest request, final IndexService indexService) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe reword: This allows us to avoid extra work other than making the shard search active and waiting for refreshes.
to This allows us to avoid extra work for example making the shard search active and waiting for refreshes.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of comments, especially one on testing, LGTM otherwise.
* Creates a new {@link QueryRewriteContext}. | ||
* This class is used to rewrite queries before we are able to get a valid {@link SearchExecutionContext} and | ||
* allows us to anticipate rewriting queries before the query is executed on the data node. Not using a full | ||
* SearhcExecutionContext allows us to save on query latency and IO operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SearchExecutionContext is mistyped. Shall we also clarify that this is for cases where access to the index is not required as we can make decisions based on mappings alone? Also, expand on what saves query latency and IO operations? e.g. not pulling a searcher removes the cost of pulling a searcher as well as the associated cost of refreshing idle shards.
* for instance if this rewrite context is used to index queries (percolation). */ | ||
* which happens in two cases: | ||
* 1. if this rewrite context is used to index queries (percolation) | ||
* 2. if we use mapping information to skip shards while doing shards pre-filtering in the 'can match' phase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the second still true given the recent improvements you made? Are there places where we do mappings based can match with a null searcher?
assertEquals(refreshStatsBefore.size(), refreshStatsAfter.size()); | ||
assertTrue(refreshStatsAfter.containsAll(refreshStatsBefore)); | ||
assertTrue(refreshStatsBefore.containsAll(refreshStatsAfter)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is odd that we don't have unit tests, yet I see that SearchServiceTests that already exists is also an ESSingleNodeTestCase
. You may want to simplify your test taking inspiration from it, in that it pulls the search service from the node and calls canMatch directly against it.
… rewrite without a SearchExecutionContext. With this change, both query builders can rewrite without using a search context, because QueryRewriteContext often has all the mapping and other index metadata available. The `TermQueryBuilder` can with this resolve to a `MatchAllQueryBuilder` with needing a `SearchExecutionContext`, which during the can_match phase means that no searcher needs to be acquired and therefor avoiding making a shard search active / potentially refresh. The `AbstractQueryBuilder#doRewrite(...)` method is altered to by default attempt a coordination rewrite, then fall back to attempt a search rewrite, then finally fall back to do an index metadata aware rewrite. This was forgotten as part of elastic#96161 and is needed to complete elastic#95776.
…t a SearchExecutionContext. (#96905) With this change, both query builders can rewrite without using a search context, because QueryRewriteContext often has all the mapping and other index metadata available. The TermQueryBuilder can with this change resolve to a MatchNoneQueryBuilder without needing a SearchExecutionContext, which during the can_match phase means that no searcher needs to be acquired and therefor avoid making a shard search active and doing a potentially refresh. The AbstractQueryBuilder#doRewrite(...) method is altered to by default attempt a coordination rewrite, then fall back to attempt a search rewrite, then finally fall back to do an index metadata aware rewrite. This is inline with what was discussed here: #96161 (comment) This change was forgotten as part of #96161 and is needed to complete #95776.
@salvatore-campagna according to this PR's labels, I need to update the changelog YAML, but I can't because the PR is closed. Please either update the changelog yourself on the appropriate branch, or adjust the labels. Specifically:
|
Update the changelog after making the original PR a `release highlight`. See #96161
Update the changelog after making the original PR a `release highlight`. See elastic#96161
When a query like a match phrase query or a wildcard query targets a constant keyword field we can skip the query execution on shards where the query builder is rewritten to a MatchNoneQueryBuilder.
This allows us to save time and resources executing queries on nodes that do not return any result. Constant keyword fields allow rewriting a MatchPhraseQueryBuider to TermsQueryBuilder which, in turn, is rewritten into a MatchNoneQueryBuilder in case a constant keyword field does not match the value specified by the query.
A match phrase query is often executed on constant keyword fields by different Elasticsearch integrations which target multiple indices by means of index patterns or data streams. In a real world scenario it is likely that the index pattern or data stream includes tens or hundreds of backing indices each of them with multiple shards involved. As a result, skipping shards in such scenario might result in better query performance and better cluster resource usage.
Note, anyway, that currently, execution of the pre-filter and the corresponding "can match" phase depends on the overall number of shards involved and on whether there is at least one of them returning a non-empty result.
We do the rewrite operation on the data node in the so called "can match" phase, taking advantage of the fact that, at that moment, we can access the index mapping and extract information about constant keyword fields and their value. Doing this on the coordinator node is not possible due to the unavailability of the index mapping.
Resolves #95541