
[Feature Proposal] Snapshot Interoperability with Remote Store #6483

Closed
harishbhakuni opened this issue Feb 24, 2023 · 10 comments
Labels
distributed framework, enhancement, feedback needed, idea, Indexing & Search, RFC, Search, Storage:Durability

Comments

@harishbhakuni
Contributor

harishbhakuni commented Feb 24, 2023

Problem Description:
Today, the snapshot feature in OpenSearch allows users to create an on-demand snapshot of the entire cluster's data and metadata in a configured repository. The snapshot mechanism is used for recovering red indices, or even for cluster-level recoveries in the event of failures.

Now, with the remote store feature, we give users the option to create indices that store translog data in a remote translog store on each request, and segment data in a remote segment store at refresh intervals. This means that for such indices the data is already present in a remote store repository.

As the data for remote-store-backed indices is already stored in a repository, creating a snapshot of these indices would duplicate the segment data. For cost reasons, users may not want to store the data in multiple repositories: a) one as part of a snapshot and b) another as part of remote-store-backed indices. For these use cases, it would be beneficial to still provide the out-of-the-box snapshot experience but keep only one copy of the data. There can be other use cases where copying data as part of a snapshot is required, so we should also give users the flexibility to choose between the two options.

Proposed Solution:
The idea is to keep a reference to the remote store metadata file in the snapshot shard metadata.
The remote store feature uses a metadata file to record the names of the segment files that are live at refresh time. At a high level, we want to keep those metadata files for remote-store-backed indices in the snapshot metadata; during a snapshot restore operation, we will call into the remote store feature to restore the segment files using those metadata files.
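To make the idea concrete, here is a minimal sketch (purely illustrative — the field names are hypothetical, not OpenSearch's actual metadata schema) of how a "shallow" shard snapshot could store a reference to the remote store metadata file instead of copying every segment file into the snapshot repository:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClassicShardSnapshot:
    # A conventional snapshot lists (and copies) every live segment file.
    snapshot_name: str
    segment_files: List[str] = field(default_factory=list)

@dataclass
class ShallowShardSnapshot:
    # A shallow snapshot stores only a reference to the remote store's
    # metadata file, which itself names the live segment files.
    snapshot_name: str
    remote_store_repository: str
    remote_metadata_file: str  # name of the remote store metadata file

shallow = ShallowShardSnapshot(
    snapshot_name="snap-1",
    remote_store_repository="my-remote-segment-repo",
    remote_metadata_file="metadata__1__42",
)
print(shallow.remote_metadata_file)
```

On restore, the snapshot machinery would resolve this reference and hand the metadata file to the remote store feature, which downloads the segment files it names.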

Some Key Points:

  • We still need to support renaming indices during the restore operation, so that an existing index can be restored under a different name.
  • Users can have different repositories for snapshot and remote store.
  • Restoring a remote-store-backed index from a snapshot as a searchable snapshot index will not be supported in phase 1. However, for cases where a complete data copy is taken in the snapshot repository, it should still work out of the box. For the other case, we will provide support in phase 2.
  • As we only reference the data stored in the remote segment store, the Snapshot Status API will not include incremental file size and file count details for remote-store-backed indices.
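
On the rename point: the existing snapshot restore API already accepts `rename_pattern` and `rename_replacement`, and the same request shape would apply to remote-store-backed indices. A sketch of the request body (index, repository, and snapshot names are placeholders):

```python
import json

# Body for POST _snapshot/<repository>/<snapshot>/_restore.
# rename_pattern / rename_replacement rewrite the names of restored
# indices, so an existing index need not be closed or deleted first.
restore_body = {
    "indices": "my-remote-index",
    "rename_pattern": "(.+)",
    "rename_replacement": "restored_$1",
}
print(json.dumps(restore_body, indent=2))
```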

Next Steps:

  1. Any general comments about the feature are helpful. Also, let us know whether this feature would help any of your current use cases.
  2. We will share design and POCs for the proposed solution going forward.
  3. We will create and work on different issues needed to support this feature.
@harishbhakuni harishbhakuni added enhancement Enhancement or improvement to existing feature or request untriaged labels Feb 24, 2023
@kartg kartg added feedback needed Issue or PR needs feedback idea Things we're kicking around. Indexing & Search distributed framework Storage:Durability Issues and PRs related to the durability framework RFC Issues requesting major changes Search Search query, autocomplete ...etc and removed untriaged labels Feb 27, 2023
@nandi-github

Do we need to consider both options: snapshot in place (a label/reference) and/or create a copy?

@andrross
Member

Thanks @harishbhakuni, this looks like a great enhancement to remote-backed storage and snapshots!

Other than the callout about searchable snapshots, are there any other limitations for the snapshot restore API? Some of the potential cases I'm thinking about:

  • restoring a remote-backed index snapshot to a non-remote-backed index
  • restoring a remote-backed index snapshot to a different remote store repository
  • restoring a traditional snapshot to a remote-backed index

Since the snapshot restore API allows for changing index settings, these cases (and likely many more) seem possible via the API and it would be great to keep the functionality as transparent as possible so the user doesn't need to be concerned with where the data is being stored under the hood.
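
As a sketch of the first case above, assuming the restore API's existing `index_settings` option could be used to override the remote store setting (whether `index.remote_store.enabled` can actually be changed at restore time is exactly the open question here):

```python
import json

# Hypothetical restore request: restore a remote-backed index snapshot
# as a non-remote-backed index by overriding index settings at restore
# time. This only shows the shape such a request would take through the
# existing restore API; support for it is what the POCs need to verify.
restore_body = {
    "indices": "my-remote-index",
    "index_settings": {
        "index.remote_store.enabled": False,
    },
}
print(json.dumps(restore_body, indent=2))
```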

@anasalkouz
Member

Thanks @harishbhakuni for your suggested enhancements.

For these use cases, it would be beneficial to still provide the out-of-the-box snapshot experience but keep only one copy of the data. There can be other use cases where copying data as part of a snapshot is required, so we should also give users the flexibility to choose between the two options.

Since a remote-backed index can be considered a continuous snapshot and gives the user point-in-time restore capability, why do we need the on-demand snapshot capability, regardless of whether the data is copied? Could you provide some of the use cases that could still benefit from on-demand snapshots?

In my opinion, I would improve our snapshot restore user experience to give the user the flexibility to easily restore a remote-backed index to a point in time, and deprecate on-demand snapshots, or at least disable them if the index has the remote-backed storage feature enabled.

@harishbhakuni
Contributor Author

harishbhakuni commented Mar 1, 2023

Since the snapshot restore API allows for changing index settings, these cases (and likely many more) seem possible via the API and it would be great to keep the functionality as transparent as possible so the user doesn't need to be concerned with where the data is being stored under the hood.

@andrross Good point, and I agree on keeping the functionality transparent. Logically, all the above use cases should be possible, but we will perform some POCs to verify that all of them are supported when the snapshot interop feature is enabled for a snapshot.

@harishbhakuni
Contributor Author

harishbhakuni commented Mar 1, 2023

Since a remote-backed index can be considered a continuous snapshot and gives the user point-in-time restore capability, why do we need the on-demand snapshot capability, regardless of whether the data is copied? Could you provide some of the use cases that could still benefit from on-demand snapshots?

@anasalkouz The remote store feature today stores only the live segments of indices, so it does not support point-in-time restore. To restore indices back into the cluster, users still need to use snapshots.

@anasalkouz
Member

@harishbhakuni That's why I suggested improving our snapshot restore user experience: give the user the flexibility to easily restore a remote-backed index to a point in time, and deprecate on-demand snapshots, or at least disable them if the index has the remote-backed storage feature enabled.

@harishbhakuni
Contributor Author

@anasalkouz Even for remote-backed storage, we cannot support point-in-time restore; as mentioned above, we only store live segments in the remote store right now. Supporting PITR is totally a different discussion. In this proposal, we are describing how, if a customer does not want data duplication for remote-store-backed indices, we can use the segment data already in the remote store and still provide the same snapshot experience as today.

@andrross
Member

andrross commented Mar 2, 2023

supporting PITR is totally a different discussion

@harishbhakuni It is a different discussion, but the question is whether on-demand snapshots will have any use case once PITR is supported. If the answer is "no", then this proposal is talking about introducing functionality that will presumably be deprecated once PITR is implemented, and the obvious follow-up is whether PITR should be prioritized in lieu of implementing on-demand snapshots.

I honestly don't know the answer. There's certainly an argument for meeting users where they are today, and users do have systems and workflows built around the existing snapshot functionality. That being said, a fully implemented PITR feature does seem like it would obviate the need for on-demand snapshots.

@sohami
Collaborator

sohami commented Mar 2, 2023

For a snapshot to be consumable by users, it needs to copy the data to a repository and also keep a reference marker to that data (the snapshot metadata). With remote store, index data is uploaded continuously, but there is no reference to that copy that can be used for restore. The current snapshot mechanism can be used to create that reference (metadata) to the already-uploaded index data, along with its other existing functionality (snapshot of the entire cluster state, snapshot of one or multiple indices, etc.).

One can also use the existing snapshot mechanism to provide a PITR-like capability (not saying this is the only way). For example, using the snapshot management plugin or custom workflows, one can continue to run the existing snapshot functionality over a remote-store-backed index to create these markers in the snapshot repository at the desired frequency. A snapshot of a remote-store-backed index will mostly be a metadata-only operation, which is what this RFC is talking about. Each snapshot thus generated can be used as a point-in-time reference for restoring an index.
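
The "markers at a desired frequency" idea could be sketched as follows; `create_snapshot` is a hypothetical stand-in for the real snapshot API call (PUT _snapshot/<repository>/<name>), not an actual client method:

```python
import datetime

# Periodically create a (mostly metadata-only) snapshot of a
# remote-store-backed index so that each snapshot name becomes a
# point-in-time reference that can later be restored.
def snapshot_name(prefix: str, when: datetime.datetime) -> str:
    return f"{prefix}-{when.strftime('%Y%m%d-%H%M%S')}"

def create_snapshot(repo: str, name: str) -> dict:
    # Placeholder for the actual snapshot API call.
    return {"repository": repo, "snapshot": name, "accepted": True}

name = snapshot_name("pit-marker", datetime.datetime(2023, 3, 2, 12, 0, 0))
result = create_snapshot("snapshot-repo", name)
print(result["snapshot"])
```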

The RFC also talks about supporting the option to copy the data from one repository to another in snapshot format. One use case I can think of: as a service provider, I may want to keep the data repository not exposed or directly accessible to users, but users may still need to copy the data into their own repository for compliance purposes, or for use cases like powering a different OpenSearch cluster with the same data without incurring the cost of indexing again. For such use cases, using the snapshot mechanism to actually copy data across repositories will be useful. Since users already have systems built on the existing snapshot mechanism, it will work out of the box for them.

@andrross
Member

andrross commented Mar 2, 2023

Copying data from one repository to another definitely makes sense, and the existing snapshot API will work well for that use case. Perhaps similarly, I would expect any PITR feature to have a retention policy rather than keeping all data indefinitely, but there will probably be use cases for manually "pinning" a particular point-in-time snapshot to keep indefinitely, even within the same repository, and the existing snapshot API should work for that as well.

@anasalkouz What do you think?

Projects
Status: 2.11.0 - (Launched)

6 participants