[ML] Automatically rollover legacy ml indices #120405

davidkyle · 2025-01-17T15:54:55Z

Indices created in 7.x cannot be written to in 9. For ml to continue working in 9, any pre-8 index must be rolled over so that a new index is created. Non-legacy indices (i.e. those created in 8) are not rolled over. This PR uses the existing MlAutoUpdateService to trigger the rollover, which happens soon after the cluster is upgraded.

The indices rolled over in this PR are

.ml-state-X
.ml-stats-X
.ml-annotations-X
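
As a rough illustration of the selection rule described above (only indices created before 8 are rolled over), here is a minimal sketch. It is not the code from this PR: the class `LegacyIndexFilter`, the method `legacyIndices`, the numeric index suffixes and the map of index name to creation major version are all hypothetical stand-ins for the cluster state metadata.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LegacyIndexFilter {
    // Assumption: indices created in major version 8 or later are not legacy.
    static final int FIRST_NON_LEGACY_MAJOR = 8;

    /** Returns the names that start with the prefix and were created before major version 8. */
    static List<String> legacyIndices(Map<String, Integer> createdMajorByIndex, String prefix) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : createdMajorByIndex.entrySet()) {
            boolean matchesPattern = entry.getKey().startsWith(prefix);
            boolean isLegacy = entry.getValue() < FIRST_NON_LEGACY_MAJOR;
            if (matchesPattern && isLegacy) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical index names and creation versions, for illustration only.
        Map<String, Integer> indices = new LinkedHashMap<>();
        indices.put(".ml-state-000001", 7);
        indices.put(".ml-state-000002", 8);
        indices.put(".ml-stats-000001", 7);
        System.out.println(legacyIndices(indices, ".ml-state-")); // prints [.ml-state-000001]
    }
}
```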

TODO

The remaining ml associated indices need extra work which is not covered in this PR

.ml-notifications is missing an alias. One is added in [ML] Change the auditor to write via an alias #120064
.ml-inference-XXXXXX (DFA) requires an alias for rollover
.ml-inference-native-XXXXXX requires an alias for rollover (added in 8.0)
.ml-anomalies- needs its mappings copied

elasticsearchmachine · 2025-01-17T15:55:20Z

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine · 2025-01-17T15:55:20Z

Hi @davidkyle, I've created a changelog YAML for you.

prwhelan · 2025-01-21T17:10:27Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MlIndexRollover.java

+    public boolean isMinTransportVersionSupported(TransportVersion minTransportVersion) {
+        // Automatic rollover does not require any new features
+        // but wait for all nodes to be upgraded anyway
+        return minTransportVersion.onOrAfter(TransportVersions.ML_ROLLOVER_LEGACY_INDICES);


This will effectively wait until the entire cluster is on 8.x or later, guaranteeing that the index will be created as v8?

Yes, I've updated the comment to that effect
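
For readers unfamiliar with this kind of gate, a minimal sketch of the idea follows: the update only runs once the lowest version in the cluster has reached the required one. This is an illustration, not the Elasticsearch API: `MinVersionGate`, `allNodesUpgraded` and the numeric version ids are hypothetical.

```java
import java.util.Collections;
import java.util.List;

public class MinVersionGate {
    // Hypothetical numeric id standing in for the version that introduced the rollover.
    static final int REQUIRED_VERSION_ID = 9_000_000;

    /** True only when every node reports a version id at or after the required one. */
    static boolean allNodesUpgraded(List<Integer> nodeVersionIds) {
        if (nodeVersionIds.isEmpty()) {
            return false; // no nodes known yet, so do not run the update
        }
        int minVersion = Collections.min(nodeVersionIds);
        return minVersion >= REQUIRED_VERSION_ID;
    }

    public static void main(String[] args) {
        // Mixed cluster: one node is still on an older version, so the gate stays closed.
        System.out.println(allNodesUpgraded(List.of(9_000_000, 8_500_000))); // prints false
        // Fully upgraded cluster: the gate opens.
        System.out.println(allNodesUpgraded(List.of(9_000_000, 9_000_001))); // prints true
    }
}
```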

prwhelan · 2025-01-21T17:18:20Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MlIndexRollover.java

+            PlainActionFuture<Boolean> rolloverIndices = new PlainActionFuture<>();
+            rolloverLegacyIndices(latestState, indexPatternAndAlias.indexPattern(), indexPatternAndAlias.alias(), rolloverIndices);
+            try {
+                rolloverIndices.actionGet();


Is there any risk of a deadlock here? Or is that avoided because the rolloverLegacyIndices and submethods are answered on a transport/management thread, and this waiting thread is an ML utility thread? Asking because I feel like refactoring this would be tricky, since I think SubscribableListener doesn't have a good way to ignore failures and carry on with the next call

Is there any risk of a deadlock here? Or is that avoided because the rolloverLegacyIndices and submethods are answered on a transport/management thread,

This got me thinking 🫢

The good news is that PlainActionFuture has assertions that throw if the request could deadlock due to threadpool exhaustion https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/PlainActionFuture.java#L372

But this code path isn't tested in CI because we don't have a testing route for 7 -> 8 -> 9, so I hacked the code to always roll over the index and ran an 8 -> 9 upgrade test. See commit 97995d4

I also added an assertion to check which thread the response comes from, and it is the clusterApplierService#updateTask thread, not one of the ml_utility threads. This makes sense, as rollover and alias are master node actions and this code only runs on the master node. Very little work is performed on the responding thread: it basically completes the listener, and that is ok to do on a cluster update thread.

The for loop in runUpdate(ClusterState) is executed on an ml_utility thread, so the chain of actions for each index is started on that thread, and the ml_utility thread is the one blocked by the actionGet().

https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MlAutoUpdateService.java#L64

Blocking with actionGet() is not usually recommended but in this case it has the advantage that requests are processed serially instead of firing off a bunch of requests at once. I'm not sure it is worth a complicated refactor to completely unblock the code.
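
To make the pattern concrete, here is a minimal sketch using only java.util.concurrent. It assumes `CompletableFuture` as a stand-in for PlainActionFuture and a single-thread executor as a stand-in for the thread that answers the request; `SerialRolloverSketch`, `rolloverAsync` and `rolloverAll` are hypothetical names, not code from this PR. The calling thread blocks on each future in turn, so requests are issued one at a time, and a failure is recorded without stopping the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SerialRolloverSketch {
    /** Hypothetical async action: the listener is completed on the responder thread, not the caller. */
    static void rolloverAsync(String index, Set<String> failing, ExecutorService responder,
                              CompletableFuture<Boolean> listener) {
        responder.execute(() -> {
            if (failing.contains(index)) {
                listener.completeExceptionally(new IllegalStateException("rollover failed for " + index));
            } else {
                listener.complete(true);
            }
        });
    }

    /** Processes the indices one at a time and returns the names of those that failed. */
    static List<String> rolloverAll(List<String> indices, Set<String> failing, ExecutorService responder) {
        List<String> failures = new ArrayList<>();
        for (String index : indices) {
            CompletableFuture<Boolean> future = new CompletableFuture<>();
            rolloverAsync(index, failing, responder, future);
            try {
                future.join(); // blocks the calling thread until the responder completes the future
            } catch (CompletionException e) {
                failures.add(index); // record the failure and carry on with the next index
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        ExecutorService responder = Executors.newSingleThreadExecutor();
        try {
            List<String> indices = List.of(".ml-state-X", ".ml-stats-X", ".ml-annotations-X");
            // Simulate a failure for one index to show that the loop continues.
            System.out.println(rolloverAll(indices, Set.of(".ml-stats-X"), responder)); // prints [.ml-stats-X]
        } finally {
            responder.shutdown();
        }
    }
}
```

The sketch cannot deadlock because the future is completed on a different thread from the one that waits on it, which mirrors the argument made above.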

This reverts commit 97995d4.

elasticsearchmachine · 2025-01-23T10:57:41Z

💚 Backport successful

Status	Branch	Result
✅	8.x

Roll over ml indices created in 7.x and create new indices that version 9 can read and write to. This is required for ml to continue to run during the upgrade and reindex of 7.x indices.

* [ML] Automatically rollover legacy ml indices (#120405)

  Roll over ml indices created in 7.x and create new indices that version 9 can read and write to. This is required for ml to continue to run during the upgrade and reindex of 7.x indices.
* fix for backport
* annotations index can be ignored
* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <[email protected]>

davidkyle added the >upgrade, :ml, auto-backport, v9.0.0 and v8.18.0 labels Jan 17, 2025

elasticsearchmachine added the Team:ML label Jan 17, 2025

davidkyle added 2 commits January 20, 2025 15:34

Automatically rollover legacy ml indices

af0a6b5

Update docs/changelog/120405.yaml

7a46d80

davidkyle force-pushed the rollover branch from 70a183f to 7a46d80 January 20, 2025 15:44

Merge branch 'main' into rollover

5d7cc65

valeriy42 added the cloud-deploy label Jan 21, 2025

Merge branch 'main' into rollover

aa45bcc

prwhelan approved these changes Jan 21, 2025


davidkyle and others added 6 commits January 22, 2025 10:44

Merge branch 'main' into rollover

28b4a7f

better comment

6372c41

Test rollover in upgrade

97995d4

Revert "Test rollover in upgrade"

8ced46f

This reverts commit 97995d4.

Merge branch 'main' into rollover

3113867

[CI] Auto commit changes from spotless

fce11b3

davidkyle merged commit 928040e into elastic:main Jan 23, 2025
17 checks passed

davidkyle mentioned this pull request Jan 23, 2025

[8.x] [ML] Automatically rollover legacy ml indices (#120405) #120699

Merged

benwtrent mentioned this pull request Jan 24, 2025

[CI] XPackRestIT class failing #120816

Open

davidkyle mentioned this pull request Jan 27, 2025

[ML] Automatically rollover legacy .ml-anomalies indices #120885

Merged
