Add index and reindex request settings to speed up reindex #119780

Merged


parkertimmins
Contributor

Just a couple of optimizations to speed up the reindexing of a single data stream index:

  • set slices:auto on the reindex request
  • set refresh_interval: -1 on destination index before reindexing into it
  • set number_of_replicas: 0 on destination index before reindexing into it
  • reset refresh_interval and number_of_replicas to previous value or default after reindex
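
To make the list above concrete, here is a minimal sketch of the values involved, modelled with plain maps rather than the real Elasticsearch request classes. The class and method names are hypothetical and not taken from this PR; the full setting keys `index.refresh_interval` and `index.number_of_replicas` are assumed to correspond to the short names used in the list.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: plain maps stand in for the real request and settings objects.
public class ReindexSpeedupSketch {
    static final String REFRESH = "index.refresh_interval";
    static final String REPLICAS = "index.number_of_replicas";

    /** Overrides applied to the destination index before documents are copied into it. */
    static Map<String, String> speedupOverrides() {
        Map<String, String> overrides = new TreeMap<>();
        overrides.put(REFRESH, "-1"); // disable periodic refresh during the copy
        overrides.put(REPLICAS, "0"); // do not replicate each write during the copy
        return overrides;
    }

    /** Stand-in for the reindex request; only source, dest, and slices are modelled. */
    static Map<String, String> reindexRequest(String source, String dest) {
        Map<String, String> request = new TreeMap<>();
        request.put("source", source);
        request.put("dest", dest);
        request.put("slices", "auto"); // let the server choose the number of slices
        return request;
    }

    public static void main(String[] args) {
        System.out.println(speedupOverrides());
        System.out.println(reindexRequest("source-index", "dest-index"));
    }
}
```

The overrides are temporary: once the reindex finishes, both settings go back to their previous value or to the default, as the last bullet says.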

@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jan 8, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Collaborator

Hi @parkertimmins, I've created a changelog YAML for you.

- if source index was read-only, setting update needs to happen before
  test is made read only
- default test was failing due to a template causing number_of_replicas
  to not come from setting default
// random_index_template sets value for number_of_replicas, remove template so default value is used instead
assertAcked(
    indicesAdmin().execute(TransportDeleteIndexTemplateAction.TYPE, new DeleteIndexTemplateRequest("random_index_template"))
);
Contributor Author


This is a bit hacky, but not sure of a better way to force number_of_replicas to use the default value.

    builder.copy(setting, settingsBefore);
} else {
    // otherwise, delete from dest index so that it loads from the settings default
    builder.putNull(setting);
Contributor Author


Alternatively, we could get the current setting value using one of the methods that fall back to the default, then set this value on the dest index. But I prefer the current version, as it avoids explicitly adding a setting to the dest index that had been unset (and using the default) on the source index.
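
The difference between the two approaches can be sketched with plain maps standing in for index settings. This is a rough model, not the code from this PR: `restoreUpdate` and `apply` are hypothetical helpers, and a `null` value in the update is assumed to play the role of `builder.putNull(setting)`, that is, "remove the explicit value so the default applies".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps of explicitly set values stand in for index settings.
public class RestoreSettingSketch {
    /**
     * Builds the update for one setting. If the source index set it explicitly,
     * copy that value; otherwise map it to null, modelling putNull(setting).
     */
    static Map<String, String> restoreUpdate(String setting, Map<String, String> sourceExplicit) {
        Map<String, String> update = new HashMap<>();
        if (sourceExplicit.containsKey(setting)) {
            update.put(setting, sourceExplicit.get(setting));
        } else {
            update.put(setting, null);
        }
        return update;
    }

    /** Applies an update to the destination: null removes the explicit value. */
    static void apply(Map<String, String> dest, Map<String, String> update) {
        for (Map.Entry<String, String> e : update.entrySet()) {
            if (e.getValue() == null) {
                dest.remove(e.getKey());
            } else {
                dest.put(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        // Source sets replicas explicitly but leaves refresh_interval at its default.
        Map<String, String> source = new HashMap<>();
        source.put("index.number_of_replicas", "2");

        // Destination still carries the temporary reindex overrides.
        Map<String, String> dest = new HashMap<>();
        dest.put("index.number_of_replicas", "0");
        dest.put("index.refresh_interval", "-1");

        apply(dest, restoreUpdate("index.number_of_replicas", source));
        apply(dest, restoreUpdate("index.refresh_interval", source));
        System.out.println(dest);
    }
}
```

In this model the destination ends up with only the settings the source set explicitly. The alternative would instead write the resolved default into the destination, which would pin that value even if the default later changed.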

Member

@masseyke masseyke left a comment


We talked offline about maybe using a higher-priority template rather than deleting random_index_template, but I think either one gives the same result (and either one requires that you know that random_index_template sometimes randomly changes your default replicas).

@parkertimmins parkertimmins merged commit c4024dc into elastic:main Jan 10, 2025
16 checks passed
@parkertimmins parkertimmins deleted the reindex-datastream-reindex-args branch January 10, 2025 15:13
@parkertimmins parkertimmins added auto-backport Automatically create backport pull requests when merged and removed auto-backport Automatically create backport pull requests when merged labels Jan 10, 2025
@parkertimmins
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

parkertimmins added a commit to parkertimmins/elasticsearch that referenced this pull request Jan 10, 2025
…19780)

- set slices:auto on the reindex request
- set refresh_interval: -1 on destination index before reindexing into it
- set number_of_replicas: 0 on destination index before reindexing into it
- reset refresh_interval and number_of_replicas to previous value or default after reindex

(cherry picked from commit c4024dc)
parkertimmins added a commit that referenced this pull request Jan 13, 2025
…119992)

- set slices:auto on the reindex request
- set refresh_interval: -1 on destination index before reindexing into it
- set number_of_replicas: 0 on destination index before reindexing into it
- reset refresh_interval and number_of_replicas to previous value or default after reindex

(cherry picked from commit c4024dc)
Labels
:Data Management/Data streams Data streams and their lifecycles >enhancement Team:Data Management Meta label for data/management team v8.18.0 v9.0.0