Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Ensure data frame analytics jobs don't run on a node that's too new #62749

Merged
merged 3 commits into from
Oct 2, 2020

Conversation

przemekwitek
Copy link
Contributor

@przemekwitek przemekwitek commented Sep 22, 2020

This PR modifies data frame analytics job assignment code so that the nodes running 7.9 or earlier are not allowed to be assigned data frame analytics whose persistent task params were created in 7.10 or later.

Relates #61325

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

@przemekwitek
Copy link
Contributor Author

run elasticsearch-ci/1

+ StartDataFrameAnalyticsAction.TaskParams.VERSION_INTRODUCED + "] or higher";
+ TaskParams.VERSION_INTRODUCED + "] or higher";
}
if (node.getVersion().onOrAfter(TaskParams.VERSION_DESTINATION_INDEX_MAPPINGS_CHANGED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should support rolling upgrades for data frame analytics jobs.

The experience here would be really bad. The user kicks off a rolling upgrade and the job basically stays in unassigned state forever.

We should check the job version and if it is less than 7.10, then we need to set the state back before reindex, and start from the reindex state again.

Copy link
Contributor

@droberts195 droberts195 Sep 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should support rolling upgrades for data frame analytics jobs

We said we wouldn't put the effort into that while it's experimental.

The experience here would be really bad. The user kicks off a rolling upgrade and the job basically stays in unassigned state forever.

True. We should probably add logic so that if an assignment is attempted and no node in the cluster has the required version then the job gets stopped. Then the user can restart it and it will go through the logic of #62960.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#62960 deletes the destination index if it is too old. Which covers this branch.

This predicate should be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

+ "], because the data frame analytics created for version [" + params.getVersion() + "] requires a node of version "
+ "before [" + TaskParams.VERSION_DESTINATION_INDEX_MAPPINGS_CHANGED + "]";
}
if (node.getVersion().before(TaskParams.VERSION_DESTINATION_INDEX_MAPPINGS_CHANGED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ok. During a rolling upgrade, all nodes will eventually get to the latest version. And since we don't support upgrade rollbacks, this seems like a good check

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the part of the problem it's solving.

I just noticed one typo in a comment.

The solution for jobs that are stopped during the upgrade can be a separate PR.

@przemekwitek przemekwitek force-pushed the dfa_dont_run_on_new_nodes branch from 8f7ac5c to 098ad15 Compare September 23, 2020 08:19
@przemekwitek przemekwitek force-pushed the dfa_dont_run_on_new_nodes branch from 098ad15 to e210935 Compare September 30, 2020 09:21
@przemekwitek przemekwitek force-pushed the dfa_dont_run_on_new_nodes branch from e210935 to 35c30e9 Compare October 2, 2020 08:52
Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants