ESQL: Missing enrich policies on skip_unavailable=true clusters no longer fail the query #116972

quux00 · 2024-11-18T17:18:46Z

Missing enrich policies (or failures while looking up the policies on remote clusters) are no longer
fatal errors for skip_unavailable=true clusters. Those clusters will simply not be included
in the rest of the query.

Partially addresses #114531

elasticsearchmachine · 2024-11-19T21:56:51Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2024-11-19T21:56:52Z

Hi @quux00, I've created a changelog YAML for you.

quux00 · 2024-11-19T22:02:12Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/EnrichPolicyResolver.java

        Map<String, EsField> mappings = new HashMap<>();
        Map<String, String> concreteIndices = new HashMap<>();
        ResolvedEnrichPolicy last = null;
+        // loop over clusters with a ResolvedEnrichPolicy - ensure no errors within the policy


Note that mismatches across policies (checked in the section below) are still fatal errors for skip_unavailable=true clusters. I started down the road of making these skippable errors, but that looks rather tricky to pull off. At a minimum, you'd have to partition the policies by skip_unavailable, build a canonical list of fields/types/etc. from the skip_un=false clusters, and then compare the skip_un=true clusters against it. Any mismatches found there would not be fatal; instead, you would pull that cluster out of the list to be resolved for field-caps. That's not impossible, but this section would require a significant rewrite. So I decided to handle only missing enrich policies (and policies that hit errors on the remote cluster during resolution) and to still fail the query on mismatches between policies.
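For illustration only, the partitioning approach described above (which this PR deliberately does not implement) might look roughly like the following. This is a hedged sketch, not code from EnrichPolicyResolver: the class name PolicyMismatchSketch, the method clustersToDrop, and the representation of a resolved policy as a plain field-name-to-type map are all assumptions made for the example. It also assumes that, when no skip_un=false cluster is present, the first skip_un=true cluster in sorted order serves as the reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names exist in the Elasticsearch code base. */
final class PolicyMismatchSketch {

    /**
     * @param skipUnavailable     cluster alias -> skip_unavailable setting
     * @param fieldTypesByCluster cluster alias -> (field name -> field type) of the resolved policy
     * @return aliases of skip_unavailable=true clusters whose policy disagrees with the reference
     * @throws IllegalStateException if two skip_unavailable=false clusters disagree (still fatal)
     */
    static List<String> clustersToDrop(
        Map<String, Boolean> skipUnavailable,
        Map<String, Map<String, String>> fieldTypesByCluster
    ) {
        // Sort by alias so the result and any error message are deterministic.
        Map<String, Map<String, String>> sorted = new TreeMap<>(fieldTypesByCluster);

        // Pass 1: skip_unavailable=false clusters define the canonical fields/types.
        Map<String, String> reference = null;
        for (Map.Entry<String, Map<String, String>> e : sorted.entrySet()) {
            if (isSkippable(skipUnavailable, e.getKey())) {
                continue;
            }
            if (reference == null) {
                reference = e.getValue();
            } else if (reference.equals(e.getValue()) == false) {
                throw new IllegalStateException("enrich policy mismatch on cluster [" + e.getKey() + "]");
            }
        }

        // Pass 2: skip_unavailable=true clusters are compared; mismatches are dropped, not fatal.
        List<String> dropped = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : sorted.entrySet()) {
            if (isSkippable(skipUnavailable, e.getKey()) == false) {
                continue;
            }
            if (reference == null) {
                reference = e.getValue(); // assumption: first skippable cluster becomes the reference
            } else if (reference.equals(e.getValue()) == false) {
                dropped.add(e.getKey());
            }
        }
        return dropped;
    }

    private static boolean isSkippable(Map<String, Boolean> skipUnavailable, String alias) {
        return Boolean.TRUE.equals(skipUnavailable.get(alias));
    }

    public static void main(String[] args) {
        Map<String, Boolean> skip = Map.of("", false, "r1", true, "r2", true);
        Map<String, String> expected = Map.of("ip", "ip", "host", "keyword");
        Map<String, Map<String, String>> fields = Map.of(
            "", expected,
            "r1", expected,
            "r2", Map.of("ip", "keyword", "host", "keyword")
        );
        System.out.println(clustersToDrop(skip, fields)); // prints [r2]
    }
}
```

The dropped clusters would then have to be removed from the set resolved for field-caps, which is the part that would require the larger rewrite mentioned above.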

quux00 · 2024-11-19T22:03:22Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/EnrichPolicyResolver.java

            final String reason;
            if (failures.isEmpty()) {
-                List<String> missingClusters = targetClusters.stream().filter(c -> policies.containsKey(c) == false).sorted().toList();
-                reason = missingPolicyError(policyName, targetClusters, missingClusters);
+                List<String> missingClusters = targetClusters.keySet()


I'm not convinced this block (under if (failures.isEmpty())) will ever execute now, but I was apprehensive about removing it. I couldn't find a way to enter this block in my testing.

dnhatn · 2024-11-22T21:48:00Z

@quux00 Can you explain why we choose to treat a missing enrich policy as unavailable? I think we should either fail the query in this case or return a partial result if partial_results is specified. This approach would allow users to fix the query or clusters accordingly.

quux00 · 2024-11-22T22:17:28Z

> Can you explain why we choose to treat a missing enrich policy as unavailable?

The definition of skip_unavailable that we are using means that if an error occurs on a remote cluster with skip_unavailable=true, then that should not be a fatal error that fails the query. Instead, we return partial data from other clusters. Since the missing enrich policy will only affect that one remote cluster, it can be safely left out of the query and marked as SKIPPED with a failure notice in the ccs_metadata of the response.

So a missing enrich policy on a remote cluster is a fatal error only if the cluster is skip_unavailable=false (or if it is the local cluster).

> return a partial result if partial_results is specified

I'm not sure what you are referring to. At present ES|QL does not support partial results, except for the skip_unavailable handling we've been adding. Marking a cluster as skip_unavailable=true is effectively saying you want partial data from the query, just at a cluster level, not a shard/node level.
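As a minimal sketch of the rule stated in this reply (a missing enrich policy is fatal only when the cluster is skip_unavailable=false or is the local cluster), the decision could be expressed as below. The class name MissingPolicySketch and the method clustersToSkip are hypothetical and do not come from the PR. The sketch assumes the cluster-alias-to-skip_unavailable map already maps the local cluster (shown here with the empty-string alias, also an assumption) to false, as one of the commit titles on this PR suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; not the actual EnrichPolicyResolver logic. */
final class MissingPolicySketch {

    /**
     * @param policyName         name of the enrich policy being resolved
     * @param targetClusters     cluster alias -> skip_unavailable (local cluster assumed mapped to false)
     * @param clustersWithPolicy aliases of clusters where the policy was found
     * @return sorted aliases of clusters that can be left out of the query and marked SKIPPED
     * @throws IllegalArgumentException if the policy is missing on a skip_unavailable=false cluster
     */
    static List<String> clustersToSkip(
        String policyName,
        Map<String, Boolean> targetClusters,
        Set<String> clustersWithPolicy
    ) {
        List<String> skipped = new ArrayList<>();
        List<String> fatal = new ArrayList<>();
        // TreeMap gives a stable, sorted iteration order over cluster aliases.
        for (Map.Entry<String, Boolean> e : new TreeMap<>(targetClusters).entrySet()) {
            String alias = e.getKey();
            if (clustersWithPolicy.contains(alias)) {
                continue; // policy resolved on this cluster, nothing to decide
            }
            if (Boolean.TRUE.equals(e.getValue())) {
                skipped.add(alias); // skip_unavailable=true: not fatal
            } else {
                fatal.add(alias);   // skip_unavailable=false (or local): fatal
            }
        }
        if (fatal.isEmpty() == false) {
            throw new IllegalArgumentException("enrich policy [" + policyName + "] not found on clusters " + fatal);
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<String, Boolean> targets = Map.of("", false, "remote1", true, "remote2", false);
        System.out.println(clustersToSkip("hosts", targets, Set.of("", "remote2"))); // prints [remote1]
    }
}
```

In this sketch a skipped cluster is only reported back to the caller; recording the failure notice in the ccs_metadata of the response is outside its scope.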

dnhatn · 2024-11-22T22:29:07Z

I think a missing enrich policy error is different from an unavailable one. Could we narrow the scope of this PR to focus only on unavailable nodes or connections?

smalyshev · 2024-11-22T22:38:29Z

I think generally our assumption has been that virtually any error on a cluster that is marked as skip_unavailable is going to lead us to ignore this cluster, but not fail the whole request - provided the request can be performed at all ignoring that cluster (e.g. if the request only contained data from that cluster, it'll still fail). That includes missing indexes and missing policies too. If that assumption is not correct we need to re-sync and define the behavior we want, but that was also the underlying assumption for #112886

dnhatn · 2024-11-22T22:49:10Z

See #33915 and #27182 (comment)

smalyshev · 2024-11-22T23:00:50Z

Yeah, it looks like there was the same kind of discussion on the _search side, and while the "unavailable" thing is indeed quite misleading, I think the end resolution has been that this option works as "ignore the errors that are possible to ignore" rather than "only ignore errors that have to do with network connections". If I understand this correctly, then it makes sense to follow the same road with ES|QL? It may be a bit confusing that "unavailable" works this way, but IMHO it'd be much more confusing if it worked differently for different types of search. In any case, if there are concerns about it, then we should probably raise it at the PM level to ensure we don't misunderstand what is supposed to happen?

quux00 · 2024-11-22T23:48:42Z

> Could we narrow the scope of this PR to focus only on unavailable nodes or connections?

That is already done: #115266
as well as handling missing indices errors based on the skip_unavailable setting: #116348

The two in-progress PRs around skip_unavailable handle missing enrich policies (this PR) and arbitrary errors at execution time (#116365). This will make ES|QL consistent with current skip_unavailable handling in _search CCS.

dnhatn · 2024-11-22T23:53:17Z

I think we should avoid the mistake we made with the _search API. This flag is inconsistent in _search, depending on whether ccs_minimize_roundtrips is enabled. We can't easily fix this because of BWC.

smalyshev · 2024-11-25T14:56:12Z

But for ES|QL MRT is always on, so there's no room for such inconsistency?

quux00 · 2025-01-21T13:36:08Z

Closing this as we will not be extending skip_unavailable=true to handle this use case.

quux00 added auto-backport, v9.0.0, v8.17.0 labels Nov 18, 2024

quux00 changed the title ~~ESQL: Enrich policy failures on skip_unavailable=true do not fail the query~~ ESQL: Enrich policy failures on skip_unavailable=true clusters do not fail the query Nov 18, 2024

quux00 force-pushed the esql-ccs/skip_un-enrich-policy-t3 branch 4 times, most recently from 6670870 to 2bb0629 Compare November 19, 2024 20:25

quux00 added 8 commits November 19, 2024 16:51

Init commit EnrichResolution changed

c53f65c

Sympatico with the original branch - new changes start after this

ad7c1c4

Changed away from Tuple<String, Boolean> to Map<String, Boolean> targetClusters in EnrichPolicyResolver.

65e3654

Minor cleanup after deciding to not handle errors for skip_un=true clusters when looking for inconsistencies between enrich policy mappings

4733271

Changed local cluster to map to skip_un=false for purposes of enrich policy resolution error handling

d7e4df5

Modified EnrichPolicyResolverTests to match new skip_unavailable behaviors

10d13cc

The enrich policy re-resolution now also passes in the map of clusterAlias-to-skipUn-setting

3705eb3

Added missing enrich policy tests to RemoteClusterSecurityEsqlIT

ebc931e

quux00 force-pushed the esql-ccs/skip_un-enrich-policy-t3 branch from 2bb0629 to ebc931e Compare November 19, 2024 21:51

quux00 added Team:Analytics, :Analytics/ES|QL, >enhancement labels Nov 19, 2024

quux00 changed the title ~~ESQL: Enrich policy failures on skip_unavailable=true clusters do not fail the query~~ ESQL: Missing enrich policies on skip_unavailable=true clusters do not fail the query Nov 19, 2024

quux00 marked this pull request as ready for review November 19, 2024 21:56

quux00 requested a review from dnhatn November 19, 2024 21:56

quux00 requested review from smalyshev and pawankartik-elastic November 19, 2024 21:56

Update docs/changelog/116972.yaml

7e4cfaf

quux00 commented Nov 19, 2024

smalyshev reviewed Nov 19, 2024

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/EnrichPolicyResolver.java (outdated)

smalyshev reviewed Nov 19, 2024

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/EnrichPolicyResolver.java

smalyshev reviewed Nov 19, 2024

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/EnrichPolicyResolver.java (outdated)

smalyshev reviewed Nov 20, 2024

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

pawankartik-elastic reviewed Nov 20, 2024

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/EnrichResolution.java

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_un-enrich-policy-t3

e12d7dd

quux00 changed the title ~~ESQL: Missing enrich policies on skip_unavailable=true clusters do not fail the query~~ ESQL: Missing enrich policies on skip_unavailable=true clusters no longer fail the query Nov 20, 2024

quux00 added 5 commits November 20, 2024 11:09

Minor changes based on PR feedback

8acb25c

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_un-enrich-policy-t3

22961c2

Added test from Pawan Karthik for RCS2 testing of unavailable remotes during enrich CrossClusterEsqlRCS2EnrichUnavailableRemotesIT

c9517ac

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_un-enrich-policy-t3

ac92ab1

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_un-enrich-policy-t3

404a01a

elasticsearchmachine added v8.18.0 and removed v8.17.0 labels Nov 20, 2024

quux00 added 4 commits November 21, 2024 08:47

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_un-enrich-policy-t3

920f1d4

PR feedback changes and added Pawan's CrossClusterEsqlRCS1EnrichUnavailableRemotesIT

d94a7c5

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_un-enrich-policy-t3

8117530

Merge remote-tracking branch 'elastic/main' into esql-ccs/skip_un-enrich-policy-t3

f503eac

quux00 closed this Jan 21, 2025
