-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Missing enrich policies on skip_unavailable=true clusters no longer fail the query #116972
Conversation
6670870
to
2bb0629
Compare
…etClusters in EnrichPolicyResolver.
…usters when looking for inconsistencies between enrich policy mappings
…policy resolution error handling
…Alias-to-skipUn-setting
2bb0629
to
ebc931e
Compare
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @quux00, I've created a changelog YAML for you. |
Map<String, EsField> mappings = new HashMap<>(); | ||
Map<String, String> concreteIndices = new HashMap<>(); | ||
ResolvedEnrichPolicy last = null; | ||
// loop over clusters with a ResolvedEnrichPolicy - ensure no errors within the policy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that mismatches across policies (being checked in the section below here) are still fatal errors for skip_unavailable=true clusters. I started down the road of having these be skippable errors, but that looks rather tricky to pull off. At a minimum, you'd have to partition the policies by "skip_unavailable" and build a canonical list of fields/types/etc. from skip_un=false clusters and then compare the skip_un=true clusters and then if any mismatches are found, those aren't fatal, but you pull that cluster out of the list to be resolved for field-caps. Not impossible but this section would require a significant rewrite, so I decided to only handle missing enrich policies (and policies that have errors on the remote cluster during resolution), but still fail them based on mismatches between policies.
final String reason; | ||
if (failures.isEmpty()) { | ||
List<String> missingClusters = targetClusters.stream().filter(c -> policies.containsKey(c) == false).sorted().toList(); | ||
reason = missingPolicyError(policyName, targetClusters, missingClusters); | ||
List<String> missingClusters = targetClusters.keySet() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced this block (under (if failures.isEmpty
)) will ever execute now, but I was apprehensive to remove it. I couldn't find a way to enter this block based on my testing.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/EnrichPolicyResolver.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/EnrichPolicyResolver.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/EnrichPolicyResolver.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/EnrichResolution.java
Show resolved
Hide resolved
… during enrich CrossClusterEsqlRCS2EnrichUnavailableRemotesIT
@quux00 Can you explain why we choose to treat a missing enrich policy as unavailable? I think we should either fail the query in this case or return a partial result if partial_results is specified. This approach would allow users to fix the query or clusters accordingly. |
The definition of So a missing enrich policy on a remote cluster is a fatal error only if the cluster is skip_unavailable=false (or if it is the local cluster).
I'm not sure what you are referring to. At present ES|QL does not support partial results, except for the skip_unavailable handling we've been adding. Marking a cluster as skip_unavailable=true is effectively saying you want partial data from the query, just at a cluster level, not a shard/node level. |
I think a missing enrich policy error is different from an unavailable one. Could we narrow the scope of this PR to focus only on unavailable nodes or connections? |
I think generally our assumption has been that virtually any error on a cluster that is marked as |
See #33915 and #27182 (comment) |
Yeah it looks like there was the same kind of discussion on _search side, and while the "unavailable" thing is indeed quite misleading, I think the end resolution has been that this option works as "ignore the errors that is possible to ignore" rather than "only ignore errors that have to do with network connections". If I understand this correctly, then it makes sense to follow the same road with ES|QL? It may be a bit confusing that "unavailable" works this way, but IMHO it'd be much more confusing if it worked differently for different types of search. In any case, if there are concerns about it then we probably should raise it on the PM level to ensure we don't misunderstand what is supposed to happen? |
That is already done: #115266 The two in progress PRs around |
I think we should avoid the mistake we made with the _search API. This flag is inconsistent in _search, depending on whether ccs_minimized_round_trip is enabled. We can't easily fix this because of BWC. |
But for ES|QL MRT is always on, so there's no room for such inconsistency? |
Closing this as we will not be extending skip_unavailable=true to handle this use case. |
Missing enrich policies (or failures while looking up the policies on remote clusters) are no longer
fatal errors for skip_unavailable=true clusters. Those clusters will simply not be included
in the rest of the query.
Partially addresses #114531