Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TransportTasksAction does not respect an empty set of nodes #53281

Closed
hendrikmuhs opened this issue Mar 9, 2020 · 5 comments
Closed

TransportTasksAction does not respect an empty set of nodes #53281

hendrikmuhs opened this issue Mar 9, 2020 · 5 comments
Labels
:Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@hendrikmuhs
Copy link

This is a follow up of #52887

With #53277 it has been clarified that an empty list of nodes resolves to all nodes. https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/action/support/tasks/TransportTasksAction.java#L241 is in-effective.If an implementer sets an empty list its implicitly resolved to all nodes:
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/action/support/tasks/TransportTasksAction.java#L230

Iff request.getNodes() returns an empty array, the resolver should be skipped, so that network calls are not sent in L241 as originally designed.

@hendrikmuhs hendrikmuhs added :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. team-discuss labels Mar 9, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Task Management)

@DaveCTurner
Copy link
Contributor

I don't understand the change you're proposing @hendrikmuhs. If request.getNodes() is empty then we operate on all nodes: for instance, GET _tasks returns tasks from all nodes. I don't think we should change that.

Can you explain the context of this in a bit more detail?

@hendrikmuhs
Copy link
Author

I don't understand the change you're proposing @hendrikmuhs. If request.getNodes() is empty then we operate on all nodes: for instance, GET _tasks returns tasks from all nodes. I don't think we should change that.

If you are not setting the nodes, the default is used, the default is defined in BaseTasksRequest like this:

private String[] nodes = ALL_NODES;

But if you set the nodes to an empty list - which means you explicitly override ALL_NODES - in BaseTasksRequest this is not respected, because it calls a resolver that resolves the empty list back to ALL_NODES.

Context: Tasks are used in transform, a transform can have a running task or not, depending on whether the transform is active or not. For _transform/_stats
transform collects the nodes where tasks are running to avoid unnecessary network calls. This list might be empty.

Note, that I can fix the problem directly in the implementation of transform, however I think it would be cleaner to fix the API. Its counter-intuitive to say: do not call anyone and it does quite the opposite.

The same problem exists in the ML plugin.

@DaveCTurner
Copy link
Contributor

Thanks for the clarification. In your case, I guess the whole resolution process is inappropriate since you already know exactly which nodes you want to query so you're looking for a way to provide a list of (potentially zero) node IDs? Using resolveNodes() in that situation seems a little risky for other reasons too: we don't distinguish node IDs from node names and hostnames in this API, and users could in theory set a node's name to the ID of a different node.

I wonder if we should extend this behaviour to allow things to bypass the resolver and interpret the node IDs as IDs alone. That's roughly the direction that #50836 is taking, and I'm sure I've seen other such cases.

FWIW ALL_NODES is defined as an empty list of node filters:

public static final String[] ALL_NODES = Strings.EMPTY_ARRAY;

(this could of course be changed, but there are BWC implications that make such a change nontrivial)

@hendrikmuhs
Copy link
Author

ok, thanks for the feedback. Changing the request object (from the transform code) is probably not clean either as it should be considered immutable.

Will dig a bit deeper and propose something better if I find it.

FWIW the original case was a cluster with nodes that had some plugins disabled (this is a problem on its own) and as nodes could not answer, the user got errors in the response although those nodes should not have been called in the 1st place.

@rjernst rjernst added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 4, 2020
@ywelsch ywelsch closed this as completed Jul 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Projects
None yet
Development

No branches or pull requests

5 participants