-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Ensure data frame analytics jobs don't run on a node that's too new #62749
[ML] Ensure data frame analytics jobs don't run on a node that's too new #62749
Conversation
Pinging @elastic/ml-core (:ml) |
run elasticsearch-ci/1 |
+ StartDataFrameAnalyticsAction.TaskParams.VERSION_INTRODUCED + "] or higher"; | ||
+ TaskParams.VERSION_INTRODUCED + "] or higher"; | ||
} | ||
if (node.getVersion().onOrAfter(TaskParams.VERSION_DESTINATION_INDEX_MAPPINGS_CHANGED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should support rolling upgrades for data frame analytics jobs.
The experience here would be really bad. The user kicks off a rolling upgrade and the job basically stays in unassigned state forever.
We should check the job version and if it is less than 7.10, then we need to set the state back before reindex, and start from the reindex state again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should support rolling upgrades for data frame analytics jobs
We said we wouldn't put the effort into that while it's experimental.
The experience here would be really bad. The user kicks off a rolling upgrade and the job basically stays in unassigned state forever.
True. We should probably add logic so that if an assignment is attempted and no node in the cluster has the required version then the job gets stopped. Then the user can restart it and it will go through the logic of #62960.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#62960 deletes the destination index if it is too old. Which covers this branch.
This predicate should be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
+ "], because the data frame analytics created for version [" + params.getVersion() + "] requires a node of version " | ||
+ "before [" + TaskParams.VERSION_DESTINATION_INDEX_MAPPINGS_CHANGED + "]"; | ||
} | ||
if (node.getVersion().before(TaskParams.VERSION_DESTINATION_INDEX_MAPPINGS_CHANGED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ok. During a rolling upgrade, all nodes will eventually get to the latest version. And since we don't support upgrade rollbacks, this seems like a good check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the part of the problem it's solving.
I just noticed one typo in a comment.
The solution for jobs that are stopped during the upgrade can be a separate PR.
...test/java/org/elasticsearch/xpack/ml/action/TransportStartDataFrameAnalyticsActionTests.java
Outdated
Show resolved
Hide resolved
8f7ac5c
to
098ad15
Compare
098ad15
to
e210935
Compare
e210935
to
35c30e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This PR modifies data frame analytics job assignment code so that the nodes running 7.9 or earlier are not allowed to be assigned data frame analytics whose persistent task params were created in 7.10 or later.
Relates #61325