-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL CCS uses skip_unavailable setting for handling disconnected remote clusters #115266
ES|QL CCS uses skip_unavailable setting for handling disconnected remote clusters #115266
Conversation
2647bf8
to
4d5ec71
Compare
…rectly; needs cleanup and more testing
Created ConnectionErrorLookupResponse as subclass of LookupResponse
…ich policy API handling. One key change: if after the enrich policy resolution we know we could not contact one or more clusters, we remove them from the index expression that is sent to field-caps so we don't trigger another round of network errors and ExecutionInfo updates.
… enrich API calls though
…usterSecurityEsqlIT to match the new error output
Fixed some bugs in EnrichPolicyResolver and/or adjusted logic around handling missing remotes in enrich based on testing.
…e error swallowing functionality
… indexResolution.isValid block Other minor cleanup
4d5ec71
to
67454fd
Compare
…lable-disconnected-t1
…r SKIPPED clusters
@@ -261,22 +281,34 @@ private void lookupPolicies( | |||
if (remotePolicies.isEmpty() == false) { | |||
for (String cluster : remoteClusters) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By this point, as I understand, we have already run field caps resolution. So we may have some skipped clusters. Do we exclude them somewhere from the list of clusters for which we're going to resolve policies? Because it looks like it's pointless to try to resolve those policies if the cluster will be skipped anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the enrich policy resolution is done before the field-caps. I'm not sure why or if that's necessary for some reason or just historical accident. IMO, the field-caps call seems the logical one to do first, but that's not how it is, so we need to track unavailable clusters here and then pass that list to the callback listener which is the one that does the field-caps lookup: https://github.com/elastic/elasticsearch/pull/115266/files#diff-40060e2ec9003953a228c4a03bdc80a301949b0d4e3dccc1978798e33a992a73R345
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ok, I was for some reason assuming field caps go first... then of course yes, enrich should be telling field caps which clusters are skipped, not the other way around.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
Show resolved
Hide resolved
// if the preceding call to the enrich policy API found unavailable clusters, recreate the index expression to search | ||
// based only on available clusters (which could now be an empty list) | ||
String indexExpressionToResolve = createIndexExpressionFromAvailableClusters(executionInfo); | ||
if (indexExpressionToResolve.equals("")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not isEmpty
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next push.
@@ -491,6 +495,10 @@ public void testCrossClusterQueryWithRemoteDLSAndFLS() throws Exception { | |||
assertThat(flatList, containsInAnyOrder("engineering")); | |||
} | |||
|
|||
/** | |||
* Note: invalid_remote is "invalid" because it has a bogus API key, but the cluster does actually exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does invalid_remote
also always have skip_unavailable=true
? Because as I understand if it's not, it should error out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I think my understanding of this test is wrong. I'll work on it next and update the comments and make sure this is behaving as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the javadoc comment. The test is actually missing and we get different errors from the security layer depending on its skip_un=true/false flag, which explains the different behavior in the test seen below (for the remote-only test; the local+remote test has a different issue).
{ | ||
var q = "FROM invalid_remote:employees,employees | SORT emp_id DESC | LIMIT 10"; | ||
Response response = performRequestWithRemoteSearchUser(esqlRequest(q)); | ||
assertLocalOnlyResults(response); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, here - if skipUnavailable
is false, shouldn't it be an error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I analyzed this case and the problem is that field-caps is returning nothing from this cluster, rather than an Exception, so it is indistinguishable from the scenario where no matching indices could be found. The "no matching indices" found is going to be handled in a follow-on PR, so I will investigate this tests behavior under RCS 2.0 in that test.
...erTest/java/org/elasticsearch/xpack/esql/action/CrossClusterEnrichUnavailableClustersIT.java
Outdated
Show resolved
Hide resolved
…lable-disconnected-t1
…lable-disconnected-t1
DRY'd up CrossClusterEnrichUnavailableClustersIT to reuse set up from CrossClustersEnrichIT Created NoClustersToSearchException sentinel exception Clarified how RemoteClusterSecurityEsqlIT works and flagged it for more work in a follow-on PR.
…lable-disconnected-t1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great @quux00.
I've left some comments.
I was also about to add something related to testing scenarios involving inexistent indices (excluding the remote clusters not being available case), but I think you are going to cover those in the second PR right?
For example, the case of from remote:existent, inexistent*
where there should be no error and the case of from remote:existent, inexistent
where there should be an error. This is not strictly related to CCS and skip_unvailable
but it would be good to see that the above behavior involving missing local indices (with and without wildcard) is also tested.
...erTest/java/org/elasticsearch/xpack/esql/action/CrossClusterEnrichUnavailableClustersIT.java
Outdated
Show resolved
Hide resolved
String query = "FROM *:events | eval ip= TO_STR(host) | " + enrichHosts(mode) + " | stats c = COUNT(*) by os | SORT os"; | ||
try (EsqlQueryResponse resp = runQuery(query, requestIncludeMeta)) { | ||
List<List<Object>> rows = getValuesList(resp); | ||
assertThat(rows.size(), equalTo(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here it would be important to also check the number of columns. I think this case should be treated the same as this test here https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/EsqlActionIT.java#L1414
That <no-fields>
thingy comes from https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java#L211 where the use case is the one of an empty mappings index, meaning the index exists but it has no fields. In the special case of CCS-only remotes all which have skip_unavailable = true
I am wondering if "empty" response should actually be a one-column response that has <no-fields>
in it and it's of the null
type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting. Thanks for flagging that. I'll work on adding that next. I've seen that response in my manual testing and didn't know what it meant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I added it and did some manual testing. So here's a remote-only test against remote1
which is offline and skip_unavailable true. Is this the output that looks best to you?
{
"is_running": false,
"took": 7,
"all_columns": [
{
"name": "<no-fields>",
"type": "null"
}
],
"columns": [],
"values": [],
"_clusters": {
"total": 1,
"successful": 0,
"running": 0,
"skipped": 1,
"partial": 0,
"failed": 0,
"details": {
"remote1": {
"status": "skipped",
"indices": "bl*",
"took": 7,
"_shards": {
"total": 0,
"successful": 0,
"skipped": 0,
"failed": 0
},
"failures": [
{
"shard": -1,
"index": null,
"reason": {
"type": "remote_transport_exception",
"reason": "[connect_transport_exception - unable to connect to remote cluster]"
}
}
]
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added that behavior and modified two tests in this IT test in this commit.
assertTrue(ExceptionsHelper.isRemoteUnavailableException(exception)); | ||
} else { | ||
try (EsqlQueryResponse resp = runQuery(query, requestIncludeMeta)) { | ||
assertThat(getValuesList(resp).size(), equalTo(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, about the number of columns.
assertClusterMetadataInResponse(resp, responseExpectMeta); | ||
} | ||
|
||
// close remote-cluster-12that it is also unavailable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remote-cluster-2 I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
clusterInfo.put("remote.index", remoteIndex); | ||
|
||
if (numClusters == 3) { | ||
String remoteIndex2 = "logs-2"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it wasn't clear for me if remoteIndex2
should have the same name as remoteIndex
or not. If it should be the same, then I suggest using remoteIndex
already defined value instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
// close remote cluster 2 so that it is also unavailable | ||
cluster(REMOTE_CLUSTER_2).close(); | ||
|
||
// query only the REMOTE_CLUSTER_1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think this should be different, as the query queries two remote clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
) | ||
) { | ||
List<List<Object>> values = getValuesList(resp); | ||
assertThat(values, hasSize(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing about the number of columns check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
String policyName = unresolved.name; | ||
if (targetClusters.isEmpty()) { | ||
return Tuple.tuple(null, "enrich policy [" + policyName + "] cannot be resolved since remote clusters are unavailable"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find a test for this situation. Does it make sense to add one now? (I know there are several other follow up PRs in this area)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was added in response to one of the IT tests I added failing:
CrossClusterEnrichUnavailableClustersIT..testEnrichWithHostsPolicyAndDisconnectedRemotesWithSkipUnavailableTrue, and in particular when the 2nd cluster is taken offline: https://github.com/elastic/elasticsearch/pull/115266/files#diff-67efeda55a8a497d595e7ce7737ee3ae3acc3ec966db031190bd6954a610f88aR240
In that case invariants in the code below are not met and assertion failures are thrown, so I added this to just leave the code immediately when there are no targetClusters, which can happen when the Enrich Mode type is "REMOTE" and the local cluster was not included in the query.
/** | ||
* Whether to return an empty result (HTTP status 200) for a CCS rather than a top level 4xx/5xx error. | ||
* | ||
* For cases where field-caps had no indices to search and the remotes were unavailable, we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is an edge case here that I might not have covered here: from inexistent, remote:inexistent
meaning when there is no index in remotes but also no index in local cluster. It doesn't matter if inexistent
uses wildcard or not in any of the local or remote clusters. I do also think this is in the same area as this use case from #112886 table:
from remote:inexistent | failure coming from post-_field_caps check, at the Verifier level. See IndexResolver:93
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First comment - I was planning to handle the "missing indices" questions in "ticket 2" of the skip_unavailable work (this is ticket 1 dealing with just disconnected clusters).
That said, let's clarify here what behavior you are proposing:
-
If, at the end of cluster resolution (and checking for connectivity) and index resolution, there are no indices to query (either because the clusters are unavailable and/or the available clusters have no matching indices), then we should throw a VerificationException, rather than an empty "successful" result.
-
The above is true even if it is a remote-only query and wildcards were used for either the cluster names or the index names or both.
If I have that right - do you think there is any scenario where at planning time (before starting ComputeService execution) we should return an empty "successful" result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If, at the end of cluster resolution (and checking for connectivity) and index resolution, there are no indices to query (either because the clusters are unavailable and/or the available clusters have no matching indices), then we should throw a VerificationException, rather than an empty "successful" result.
No :-). My comment was specific to from inexistent, remote:inexistent
and from inexistent, remote:inexistent*
. This is a scenario that I missed to cover when we discussed about behavior of CCS queries. This involves the local cluster as well and I think in this case it doesn't matter how CCS behaves. The fact that the local cluster doesn't have an explicitly specified index should fail no matter what. In the two tables from #112886 (skip_un true/false) a better comparison query than the one in my previous review comment is
from remote:existent, inexistent
where the behavior is
failure coming from ES at query time complaining about "inexistent" from the "local" cluster
But, in our case the failure should be coming from the Verifier because the field_caps
call returned no indices.
Another two use cases are from inexistent*, remote:inexistent
and from inexistent*, remote:inexistent*
and the principle for both these two cases and the previous two should be if the local cluster is involved and the overall - remote and local - resolution of indices ends up with no indices this should be an error because the user specified a missing indices pattern for the local cluster; even if the user cannot control how the remote clusters behave, the user is in control of the index pattern for the local cluster.
skip_unavailable = true/false
index pattern | behavior |
---|---|
from inexistent*, remote:inexistent | failure coming from post-_field_caps check, at the Verifier level |
from inexistent*, remote:inexistent* | failure coming from post-_field_caps check, at the Verifier level |
ESQL's current behavior where no CCS is used is to reject queries that have no indices after the index pattern is resolved:
from inexistent*
from inexistent1*,inexistent2*
from inexistent
from inexistent1,inexistent2
from inexistent,inexistent1*
I have updated the tables from #112886
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sounds good. We are on the same page. This PR is not the one to handle those cases though. That is the next PR ("ticket 2" in #114531). Thanks for updating the tables in 112886. I will use that as a guide for the work on the next PR.
* on any of the requested clusters. | ||
*/ | ||
private boolean returnSuccessWithEmptyResult(Exception e) { | ||
if (executionInfo.isCrossClusterSearch() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My review above about from inexistent, remote:inexistent
I think it started from this line where this ESQL query is considered to be a CCS search. If this ESQL query flow goes through the next condition I think returnSuccessWithEmptyResult
can return true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't think I follow what you are saying here. It sounds like there is a scenario where you think we should return an empty successful result at planning. Could you outline what that is again? (Same question I just asked in the previous comment, so you can just answer there.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was trying to say is that this method may return true
for a query like from inexistent, remote:inexistent
which I believe shouldn't be the case.
…otesIT. Some minor code cleanup
…lable-disconnected-t1
…er.NO_FIELDS as the schema; tests updated
…otesIT when empty result in returned
…lable-disconnected-t1
…lable-disconnected-t1
// 1) the local cluster indexExpression and REMOTE_CLUSTER_2 indexExpression match no indices | ||
// 2) the REMOTE_CLUSTER_1 is unavailable | ||
// 3) both remotes are marked as skip_un=true | ||
String query = "FROM nomatch*," + REMOTE_CLUSTER_1 + ":logs-*," + REMOTE_CLUSTER_2 + ":nomatch* | STATS sum (v)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specific test should fail following the second PR from the skip_un series, right? (and following this discussion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @quux00
…lable-disconnected-t1
💚 Backport successful
|
…ote clusters (elastic#115266) As part of ES|QL planning of a cross-cluster search, a field-caps call is done to each cluster and, if an ENRICH command is present, the enrich policy-resolve API is called on each remote. If a remote cluster cannot be connected to in these calls, the outcome depends on the skip_unavailable setting. For skip_unavailable=false clusters, the error is fatal and the error will immediately be propagated back to the client with a top level error message with a 500 HTTP status response code. For skip_unavailable=true clusters, the error is not fatal. The error will be trapped, recorded in the EsqlExecutionInfo object for the query, marking the cluster as SKIPPED. If the user requested CCS metadata to be included, the cluster status and connection failure will be present in the _clusters/details section of the response. If no clusters can be contacted, if they are all marked as skip_unavailable=true, no error will be returned. Instead a 200 HTTP status will be returned with no column and no values. If the include_ccs_metadata: true setting was included on the query, the errors will listed in the _clusters metadata section. (Note: this is also how the _search endpoint works for CCS.) Partially addresses elastic#114531
…ote clusters (#115266) (#115881) As part of ES|QL planning of a cross-cluster search, a field-caps call is done to each cluster and, if an ENRICH command is present, the enrich policy-resolve API is called on each remote. If a remote cluster cannot be connected to in these calls, the outcome depends on the skip_unavailable setting. For skip_unavailable=false clusters, the error is fatal and the error will immediately be propagated back to the client with a top level error message with a 500 HTTP status response code. For skip_unavailable=true clusters, the error is not fatal. The error will be trapped, recorded in the EsqlExecutionInfo object for the query, marking the cluster as SKIPPED. If the user requested CCS metadata to be included, the cluster status and connection failure will be present in the _clusters/details section of the response. If no clusters can be contacted, if they are all marked as skip_unavailable=true, no error will be returned. Instead a 200 HTTP status will be returned with no column and no values. If the include_ccs_metadata: true setting was included on the query, the errors will listed in the _clusters metadata section. (Note: this is also how the _search endpoint works for CCS.) Partially addresses #114531
…cted remote clusters (elastic#115266)" This reverts commit b6d2d4b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaving feedback here after this PR was merged for future enhancement (as I understand there's going to be more work).
It mainly resolved around the Enrich and CCS intricacies being spread through-out EsqlSession which makes evolving this problematic - as this class is at the code of doing cluster calls.
The Enrich and CCS parts need encapsulation (such as moving to a separate class like CcsUtils, defining a dedicated listener, etc...) so that EsqlSession focused on its core purpose : working on the LogicalPlan.
Please make this a priority moving forward - thanks!
if (failures == null) { | ||
this.failures = List.of(); | ||
} else { | ||
this.failures = failures; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.failures = failures == null ? emptyList() : failures
@@ -362,6 +370,11 @@ public Cluster(StreamInput in) throws IOException { | |||
this.failedShards = in.readOptionalInt(); | |||
this.took = in.readOptionalTimeValue(); | |||
this.skipUnavailable = in.readBoolean(); | |||
if (in.getTransportVersion().onOrAfter(TransportVersions.ESQL_CCS_EXEC_INFO_WITH_FAILURES)) { | |||
this.failures = Collections.unmodifiableList(in.readCollectionAsList(ShardSearchFailure::readShardSearchFailure)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why the collection is unmodifiable - within the ESQL code we're not using the defensive style since the collections are never modified, rather copied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EsqlExecution.Cluster class, like the SearchResponse.Cluster class it is based on, is immutable. You can only change state by coping an existing Cluster and swapping it in. I can change line 374 to use a mutable list, but it won't make any difference since the coding practice for this class is to never modify in place. In fact, the line below where I do this.failures = List.of();
should likely be changed to an immutable list as well to enforce this model.
new ActionListenerResponseHandler<>(lookupListener.delegateResponse((l, e) -> { | ||
if (ExceptionsHelper.isRemoteUnavailableException(e) | ||
&& remoteClusterService.isSkipUnavailable(cluster)) { | ||
l.onResponse(new LookupResponse(e)); | ||
} else { | ||
l.onFailure(e); | ||
} | ||
}), LookupResponse::new, threadPool.executor(ThreadPool.Names.SEARCH)) | ||
); | ||
} | ||
|
||
@Override | ||
public void onFailure(Exception e) { | ||
if (ExceptionsHelper.isRemoteUnavailableException(e) && remoteClusterService.isSkipUnavailable(cluster)) { | ||
lookupListener.onResponse(new LookupResponse(e)); | ||
} else { | ||
lookupListener.onFailure(e); | ||
} | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the code is similar between onFailure and onResponse, pull it out in a separate method.
* Any Exception sent to onFailure stops processing, but not all are fatal (return a 4xx or 5xx), so | ||
* the onFailure handler determines whether to return an empty successful result or a 4xx/5xx error. | ||
*/ | ||
class LogicalPlanActionListener implements ActionListener<LogicalPlan> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, the reason for this listener is to pick up remote CCS error and properly report them. These occur during the preAnalysis phase or the execution.
Hence why I suggest to:
- use a different name that better conveys the purpose (e.g. CcsAwareActionListener or CcsPartialFailuresActionListener)
- update the javadoc description accordingly
|
||
@Override | ||
public void onFailure(Exception e) { | ||
if (returnSuccessWithEmptyResult(e)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To improve readability, please extract this code into a CcsUtils class and call it here as a one liner:
CcsUtils.handleException(Exception, ExecutionInfo, ActionListener)
@@ -302,6 +440,30 @@ private void preAnalyzeIndices( | |||
} | |||
} | |||
|
|||
// visible for testing | |||
static String createIndexExpressionFromAvailableClusters(EsqlExecutionInfo executionInfo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this into a separate class CcsUtils
executionInfo.swapCluster( | ||
clusterAlias, | ||
(k, v) -> new EsqlExecutionInfo.Cluster.Builder(v).setStatus(EsqlExecutionInfo.Cluster.Status.SKIPPED).build() | ||
static void updateExecutionInfoWithUnavailableClusters(EsqlExecutionInfo execInfo, Map<String, FieldCapabilitiesFailure> unavailable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this into a separate CcsUtils
} | ||
|
||
// visible for testing | ||
static Set<String> determineUnavailableRemoteClusters(List<FieldCapabilitiesFailure> failures) { | ||
Set<String> unavailableRemotes = new HashSet<>(); | ||
static Map<String, FieldCapabilitiesFailure> determineUnavailableRemoteClusters(List<FieldCapabilitiesFailure> failures) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above- look into moving this into a CcsUtils class
…ote clusters (elastic#115266) As part of ES|QL planning of a cross-cluster search, a field-caps call is done to each cluster and, if an ENRICH command is present, the enrich policy-resolve API is called on each remote. If a remote cluster cannot be connected to in these calls, the outcome depends on the skip_unavailable setting. For skip_unavailable=false clusters, the error is fatal and the error will immediately be propagated back to the client with a top level error message with a 500 HTTP status response code. For skip_unavailable=true clusters, the error is not fatal. The error will be trapped, recorded in the EsqlExecutionInfo object for the query, marking the cluster as SKIPPED. If the user requested CCS metadata to be included, the cluster status and connection failure will be present in the _clusters/details section of the response. If no clusters can be contacted, if they are all marked as skip_unavailable=true, no error will be returned. Instead a 200 HTTP status will be returned with no column and no values. If the include_ccs_metadata: true setting was included on the query, the errors will listed in the _clusters metadata section. (Note: this is also how the _search endpoint works for CCS.) Partially addresses elastic#114531
As part of ES|QL planning of a cross-cluster search, a field-caps call is done to each cluster and,
if an ENRICH command is present, the enrich policy-resolve API is called on each remote. If a
remote cluster cannot be connected to in these calls, the outcome depends on the
skip_unavailable setting.
For skip_unavailable=false clusters, the error is fatal and the error will immediately be propagated
back to the client with a top level error message with a 500 HTTP status response code.
For skip_unavailable=true clusters, the error is not fatal. The error will be trapped, recorded in the
EsqlExecutionInfo object for the query, marking the cluster as SKIPPED. If the user requested
CCS metadata to be included, the cluster status and connection failure will be present in the
_clusters/details section of the response.
If no clusters can be contacted, if they are all marked as skip_unavailable=true, no error will be
returned. Instead a 200 HTTP status will be returned with no column and no values. If the
include_ccs_metadata: true setting was included on the query, the errors will listed in the
_clusters metadata section. (Note: this is also how the _search endpoint works for CCS.)
Partially addresses #114531