Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Automatically rollover legacy ml indices #120405

Merged
merged 10 commits into from
Jan 23, 2025
Merged

Conversation

davidkyle
Copy link
Member

@davidkyle davidkyle commented Jan 17, 2025

Indices created in 7.x cannot be written to in 9. For ml to continue working in 9 any pre-8 indices must be rolled over and a new index created. Non legacy (i.e. created in 8) indices are not rolled over. This PR uses the existing MlAutoUpdateService to trigger the rollover. It will happen soon after the cluster is upgraded.

The indices rolled over in this PR are

  • .ml-state-X
  • .ml-stats-X
  • .ml-annotations-X

TODO

The remaining ml associated indices need extra work which is not covered in this PR

  • .ml-notifications needs is missing an alias. Added in [ML] Change the auditor to write via an alias #120064
  • .ml-inference-XXXXXX (DFA) requires an alias for rollover
  • .ml-inference-native-XXXXXX requires an alias for rollover (added in 8.0)
  • .ml-anomalies- needs copy mappings

@davidkyle davidkyle added >upgrade :ml Machine learning auto-backport Automatically create backport pull requests when merged v9.0.0 v8.18.0 labels Jan 17, 2025
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Jan 17, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Copy link
Collaborator

Hi @davidkyle, I've created a changelog YAML for you.

@valeriy42 valeriy42 added the cloud-deploy Publish cloud docker image for Cloud-First-Testing label Jan 21, 2025
public boolean isMinTransportVersionSupported(TransportVersion minTransportVersion) {
// Automatic rollover does not require any new features
// but wait for all nodes to be upgraded anyway
return minTransportVersion.onOrAfter(TransportVersions.ML_ROLLOVER_LEGACY_INDICES);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will effectively wait until the entire cluster is on 8.x or later, guaranteeing that the index will be created as v8?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've updated the comment to that effect

PlainActionFuture<Boolean> rolloverIndices = new PlainActionFuture<>();
rolloverLegacyIndices(latestState, indexPatternAndAlias.indexPattern(), indexPatternAndAlias.alias(), rolloverIndices);
try {
rolloverIndices.actionGet();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any risk of a deadlock here? Or is that avoided because the rolloverLegacyIndices and submethods are answered on a transport/management thread, and this waiting thread is an ML utility thread? Asking because I feel like refactoring this would be tricky, since I think SubscribableListener doesn't have a good way to ignore failures and carry on with the next call

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any risk of a deadlock here? Or is that avoided because the rolloverLegacyIndices and submethods are answered on a transport/management thread,

This got me thinking 🫢

The good news is that PlainActionFuture has assertions that throw if the request could deadlock due to threadpool exhaustion https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/PlainActionFuture.java#L372

But this code path isn't testing in CI because we don't have testing route for 7 -> 8 -> 9 so I hacked the code to always rollover the index and ran an 8 -> 9 upgrade test. See commit 97995d4

I also added an assertion to check which thread the response comes from and it is the clusterApplierService#updateTask thread not one of the ml_utility threads. This makes sense as rollover and alias are master node actions and this code only runs on the master node. There is very little work performed on the responding thread it basically completes the listener and that is ok to do on a cluster update thread.

The for loop in runUpdate(ClusterState) is executed on a ml_utility thread so the chain of actions for each index is started on that thread and it is the ml_utility thread is the one blocked by the actionGet().

https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MlAutoUpdateService.java#L64

Blocking with actionGet() is not usually recommended but in this case it has the advantage that requests are processed serially instead of firing off a bunch of requests at once. I'm not sure it is worth a complicated refactor to completely unblock the code.

@davidkyle davidkyle merged commit 928040e into elastic:main Jan 23, 2025
17 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Jan 23, 2025
Rollover ml indices created in 7.x and create new indices that version 9 can 
read and write to. This is required for ml to continue to run after during 
upgrade and reindex of 7.x indices
elasticsearchmachine added a commit that referenced this pull request Jan 24, 2025
* [ML] Automatically rollover legacy ml indices (#120405)

Rollover ml indices created in 7.x and create new indices that version 9 can 
read and write to. This is required for ml to continue to run after during 
upgrade and reindex of 7.x indices

* fix for backport

* annotations index can be ignored

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged cloud-deploy Publish cloud docker image for Cloud-First-Testing :ml Machine learning Team:ML Meta label for the ML team >upgrade v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants