
[Feature Proposal] Snapshot Interoperability with Remote Store #6483

Closed
harishbhakuni opened this issue Feb 24, 2023 · 10 comments
Labels
distributed framework, enhancement, feedback needed, idea, Indexing & Search, RFC, Search, Storage:Durability

Comments

@harishbhakuni
Contributor

harishbhakuni commented Feb 24, 2023

Problem Description:
Today, the snapshot feature in OpenSearch allows users to create an on-demand snapshot of the entire cluster's data and metadata in a configured repository. The snapshot mechanism is used for recovering red indices, or even for cluster-level recoveries in the event of failures.

Now, with the remote store feature, we give users the option to create indices that store translog data in a remote translog store on each request, and segment data in a remote segment store at refresh intervals. This means that for such indices the data is already present in a remote store repository.

As the data for remote-store-backed indices is already stored in a repository, creating a snapshot of these indices would duplicate the segment data. For cost reasons, users may not want to store the data in multiple repositories: a) one as part of a snapshot and b) another as part of remote-store-backed indices. For these use cases, it would be beneficial to still provide the out-of-the-box snapshot experience but keep only one copy of the data. There can be other use cases where copying data as part of a snapshot is required, so we should also give users the flexibility to choose between the two options.

Proposed Solution:
The idea is to keep a reference to the remote store metadata file in the snapshot shard metadata.
The remote store feature uses a metadata file to record the names of the segment files that are live at refresh time. At a high level, we want to keep those metadata files for remote-store-backed indices in the snapshot metadata; during a snapshot restore operation, we will call into the remote store feature to restore the segment files using those metadata files.
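To make the idea concrete, here is a minimal sketch (purely illustrative — the field names are hypothetical, not OpenSearch's actual metadata schema) of how a "shallow" shard snapshot could store a reference to the remote store metadata file instead of copying every segment file into the snapshot repository:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClassicShardSnapshot:
    # A conventional snapshot lists (and copies) every live segment file.
    snapshot_name: str
    segment_files: List[str] = field(default_factory=list)

@dataclass
class ShallowShardSnapshot:
    # A shallow snapshot stores only a reference to the remote store's
    # metadata file, which itself names the live segment files.
    snapshot_name: str
    remote_store_repository: str
    remote_metadata_file: str  # name of the remote store metadata file

shallow = ShallowShardSnapshot(
    snapshot_name="snap-1",
    remote_store_repository="my-remote-segment-repo",
    remote_metadata_file="metadata__1__42",
)
print(shallow.remote_metadata_file)
```

On restore, the snapshot machinery would resolve this reference and hand the metadata file to the remote store feature, which downloads the segment files it names.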

Some Key Points:

  • We still need to support renaming indices during the restore operation, so that an existing index can be restored under a different name.
  • Users can have different repositories for snapshot and remote store.
  • Restoring a remote-store-backed index from a snapshot as a searchable snapshot index will not be supported in phase 1. However, for cases where a complete data copy is taken in the snapshot repository, it should still work out of the box. For the other case, we will provide support in phase 2.
  • As we only reference the data stored in the remote segment store, the Snapshot Status API will not include incremental file size and file count details for remote-store-backed indices.
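
On the rename point: the existing snapshot restore API already accepts `rename_pattern` and `rename_replacement`, and the same request shape would apply to remote-store-backed indices. A sketch of the request body (index, repository, and snapshot names are placeholders):

```python
import json

# Body for POST _snapshot/<repository>/<snapshot>/_restore.
# rename_pattern / rename_replacement rewrite the names of restored
# indices, so an existing index need not be closed or deleted first.
restore_body = {
    "indices": "my-remote-index",
    "rename_pattern": "(.+)",
    "rename_replacement": "restored_$1",
}
print(json.dumps(restore_body, indent=2))
```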

Next Steps:

  1. Any general comments about the feature are helpful. Also, let us know whether this feature would help any of your current use cases.
  2. We will share design and POCs for the proposed solution going forward.
  3. We will create and work on different issues needed to support this feature.
@harishbhakuni harishbhakuni added enhancement Enhancement or improvement to existing feature or request untriaged labels Feb 24, 2023
@kartg kartg added feedback needed Issue or PR needs feedback idea Things we're kicking around. Indexing & Search distributed framework Storage:Durability Issues and PRs related to the durability framework RFC Issues requesting major changes Search Search query, autocomplete ...etc and removed untriaged labels Feb 27, 2023
@nandi-github

Do we need to consider both options: snapshot in place (a label/reference) and/or create a copy?

@andrross
Member

Thanks @harishbhakuni, this looks like a great enhancement to remote-backed storage and snapshots!

Other than the callout about searchable snapshots, are there any other limitations for the snapshot restore API? Some of the potential cases I'm thinking about:

  • restoring a remote-backed index snapshot to a non-remote-backed index
  • restoring a remote-backed index snapshot to a different remote store repository
  • restoring a traditional snapshot to a remote-backed index

Since the snapshot restore API allows for changing index settings, these cases (and likely many more) seem possible via the API and it would be great to keep the functionality as transparent as possible so the user doesn't need to be concerned with where the data is being stored under the hood.
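
As a sketch of the first case above, assuming the restore API's existing `index_settings` option could be used to override the remote store setting (whether `index.remote_store.enabled` can actually be changed at restore time is exactly the open question here):

```python
import json

# Hypothetical restore request: restore a remote-backed index snapshot
# as a non-remote-backed index by overriding index settings at restore
# time. This only shows the shape such a request would take through the
# existing restore API; support for it is what the POCs need to verify.
restore_body = {
    "indices": "my-remote-index",
    "index_settings": {
        "index.remote_store.enabled": False,
    },
}
print(json.dumps(restore_body, indent=2))
```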

@anasalkouz
Member

Thanks @harishbhakuni for your suggested enhancements.

For these use cases, it would be beneficial to still provide the out-of-the-box snapshot experience but keep only one copy of the data. There can be other use cases where copying data as part of a snapshot is required, so we should also give users the flexibility to choose between the two options.

Since a remote-backed index can be considered a continuous snapshot and gives the user point-in-time restore capability, why do we need the on-demand snapshot capability, regardless of whether the data is copied? Could you provide some of the use cases that could still benefit from on-demand snapshots?

In my opinion, I would improve our snapshot restore user experience to give the user the flexibility to easily restore a remote-backed index to a point in time, and deprecate on-demand snapshots, or at least disable them if the index has the remote-backed storage feature enabled.

@harishbhakuni
Contributor Author

harishbhakuni commented Mar 1, 2023

Since the snapshot restore API allows for changing index settings, these cases (and likely many more) seem possible via the API and it would be great to keep the functionality as transparent as possible so the user doesn't need to be concerned with where the data is being stored under the hood.

@andrross Good point, and I agree on keeping the functionality transparent. Logically, all the above use cases should be possible, but we will perform some POCs to verify that all of them are supported when the snapshot interop feature is enabled for a snapshot.

@harishbhakuni
Contributor Author

harishbhakuni commented Mar 1, 2023

Since a remote-backed index can be considered a continuous snapshot and gives the user point-in-time restore capability, why do we need the on-demand snapshot capability, regardless of whether the data is copied? Could you provide some of the use cases that could still benefit from on-demand snapshots?

@anasalkouz The remote store feature today stores only the live segments of indices, so it does not support point-in-time restore. To restore indices back into the cluster, users still need to use snapshots.

@anasalkouz
Member

@harishbhakuni That's why I suggested improving our snapshot restore user experience: give the user the flexibility to easily restore a remote-backed index to a point in time, and deprecate on-demand snapshots, or at least disable them if the index has the remote-backed storage feature enabled.

@harishbhakuni
Contributor Author

@anasalkouz Even for remote-backed storage, we cannot support point-in-time restore; as mentioned above, we only store live segments in the remote store right now. Supporting PITR is totally a different discussion. In this proposal, we are describing how, if a customer does not want data duplication for remote-store-backed indices, we can use the segment data already in the remote store and still provide the same snapshot experience as today.

@andrross
Member

andrross commented Mar 2, 2023

supporting PITR is totally a different discussion

@harishbhakuni It is a different discussion, but the question is whether on-demand snapshots will have any use case once PITR is supported. If the answer is "no", then this proposal is talking about introducing functionality that will presumably be deprecated once PITR is implemented, and the obvious follow-up is whether PITR should be prioritized in lieu of implementing on-demand snapshots.

I honestly don't know the answer. There's certainly an argument for meeting users where they are today, and users do have systems and workflows built around the existing snapshot functionality. That being said, a fully implemented PITR feature does seem like it would obviate the need for on-demand snapshots.

@sohami
Collaborator

sohami commented Mar 2, 2023

For a snapshot to be consumable by users, it needs to copy the data to a repository and also keep a reference marker to that data (the snapshot metadata). With remote store, index data is uploaded continuously, but there is no reference to that copy that can be used for restore. The current snapshot mechanism can be used to create that reference (metadata) to the already-uploaded index data, along with its other existing functionality (snapshot of the entire cluster state, snapshot of one or multiple indices, etc.).

One can also use the existing snapshot mechanism to provide a PITR-like capability (not saying this is the only way). For example, using the snapshot management plugin or custom workflows, one can continue to run the existing snapshot functionality over a remote-store-backed index to create these markers in the snapshot repository at the desired frequency. A snapshot of a remote-store-backed index will mostly be a metadata-only operation, which is what this RFC is talking about. Each snapshot thus generated can be used as a point-in-time reference for restoring an index.
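
The "markers at a desired frequency" idea could be sketched as follows; `create_snapshot` is a hypothetical stand-in for the real snapshot API call (PUT _snapshot/<repository>/<name>), not an actual client method:

```python
import datetime

# Periodically create a (mostly metadata-only) snapshot of a
# remote-store-backed index so that each snapshot name becomes a
# point-in-time reference that can later be restored.
def snapshot_name(prefix: str, when: datetime.datetime) -> str:
    return f"{prefix}-{when.strftime('%Y%m%d-%H%M%S')}"

def create_snapshot(repo: str, name: str) -> dict:
    # Placeholder for the actual snapshot API call.
    return {"repository": repo, "snapshot": name, "accepted": True}

name = snapshot_name("pit-marker", datetime.datetime(2023, 3, 2, 12, 0, 0))
result = create_snapshot("snapshot-repo", name)
print(result["snapshot"])
```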

The RFC also talks about supporting the option to copy the data from one repository to another in snapshot format. One use case I can think of: as a service provider, I may want to keep the data repository not exposed or directly accessible to users, but users may still need to copy the data into their own repository for compliance purposes, or for use cases like powering a different OpenSearch cluster with the same data without incurring the cost of indexing again. For such use cases, using the snapshot mechanism to actually copy data across repositories will be useful. Since users already have systems built on the existing snapshot mechanism, it will work out of the box for them.

@andrross
Member

andrross commented Mar 2, 2023

Copying data from one repository to another definitely makes sense, and the existing snapshot API will work well for that use case. Perhaps similarly, I would expect any PITR feature to have a retention policy rather than keeping all data indefinitely, but there will probably be use cases for manually "pinning" a particular point-in-time snapshot to keep indefinitely, even within the same repository, and the existing snapshot API should work for that as well.

@anasalkouz What do you think?

Projects
Status: 2.11.0 - (Launched)

6 participants