Restoring a snapshot destroys history between the restored primary and existing replicas #26544
Labels
blocker
:Distributed Indexing/Engine
Anything around managing Lucene and the Translog in an open shard.
v6.0.0-rc1
Comments
jasontedor added a commit that referenced this issue on Sep 8, 2017
This commit removes a norelease from the codebase now that there is a CI job that fails on the norelease pattern being present. Instead, a new issue has been opened to track this one. Relates #26544
Note that the same issue applies when force-allocating an empty or stale primary using the reroute commands.
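For reference, that force-allocation path goes through the cluster reroute API's `allocate_stale_primary` / `allocate_empty_primary` commands. Below is a minimal sketch using the Python `elasticsearch` client; the index and node names are placeholders, and the exact call signature varies between client versions.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Force-allocate an existing but stale copy as the new primary. This accepts
# data loss and rewinds the shard's history, which is the same concern this
# issue describes for snapshot restore.
es.cluster.reroute(body={
    "commands": [
        {
            "allocate_stale_primary": {
                "index": "my-index",      # placeholder index name
                "shard": 0,
                "node": "node-1",         # placeholder node name
                "accept_data_loss": True
            }
        }
    ]
})

# allocate_empty_primary behaves analogously, but starts the shard empty.
```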
bleskes added a commit that referenced this issue on Sep 19, 2017
…#26694) Restoring a shard from a snapshot throws the primary back in time, violating assumptions and calling the validity of global checkpoints into question. To avoid problems, we should make sure that a shard that was restored will never be the source of an ops-based recovery to a shard that existed before the restore. To this end we introduced the notion of a `history_uuid` in #26577 and required that both source and target have the same history to allow ops-based recoveries. This PR makes sure that a shard gets a new uuid after restore. As suggested by @ywelsch, I derived the creation of a `history_uuid` from the `RecoverySource` of the shard. Store recovery will only generate a uuid if it doesn't already exist (we can make this stricter once we no longer need to deal with 5.x indices). Peer recovery follows the same logic (note that this is different from the approach in #26557; I went this way because it means that shards always have a history uuid after being recovered on a 6.x node, and it also means that a rolling restart is enough for old indices to step over to the new seq-no model). Local shards and snapshot recovery force the generation of a new translog uuid. Relates #10708 Closes #26544
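For illustration only (not part of the PR above): the `history_uuid` lives in the Lucene commit user data of each shard copy, so it can be inspected through shard-level index stats. The sketch below uses the Python `elasticsearch` client with a placeholder index name; the response layout is an approximation.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Shard-level stats include the commit user data of every copy of each shard.
stats = es.indices.stats(index="my-index", level="shards")

for copy in stats["indices"]["my-index"]["shards"]["0"]:
    role = "primary" if copy["routing"]["primary"] else "replica"
    user_data = copy.get("commit", {}).get("user_data", {})
    print(copy["routing"]["node"], role, user_data.get("history_uuid"))

# A copy restored from a snapshot reports a fresh history_uuid, so it can no
# longer serve as the source of an ops-based recovery to a pre-restore replica.
```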
Restoring a snapshot means that history on any replicas is no longer valid. Without a way to detect this situation, we can end up with a primary divergent from its replicas. We will add a new history UUID to address this situation.
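For context, a sketch of the scenario with the Python `elasticsearch` client: restoring over an existing index rewinds the primary while any surviving replica copies still carry the old history. Repository, snapshot, and index names below are placeholders, and the call signatures follow the older-style client.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# The index must be closed before restoring over it; its existing replica
# copies still hold history written before the restore.
es.indices.close(index="my-index")

# Restore the primary from the snapshot. The restored shard is now "back in
# time" relative to those replicas, which is the divergence described above.
es.snapshot.restore(
    repository="my-repo",       # placeholder repository name
    snapshot="my-snapshot",     # placeholder snapshot name
    body={"indices": "my-index"},
    wait_for_completion=True,
)
```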