storage: fix disappearing RHS merge bug #31842
Conversation
Ok, this is RFAL.
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/storage/replica_command.go, line 630 at r1 (raw file):
retryOpts := base.DefaultRetryOptions()
retryOpts.MaxRetries = 5
retryLoop:
Feel free to dismiss this suggestion, but I always try to avoid labels because they're hard to follow. Here it seems that you could just drop the `continue` below and add a check `if lastErr != nil { lastErr = nil; continue }` before the `return nil`. You don't need to propagate `lastErr` outside of the retry loop because at that point you know that all replicas caught the same error, so you can just return one you manufacture right at the end.
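A rough sketch of the label-free shape being suggested; `checkOneReplica` here is a hypothetical stand-in for the per-replica body, not the actual code:

```go
// Sketch only: label-free retry loop. checkOneReplica is a hypothetical
// stand-in for the per-replica check in the real code.
retryOpts := base.DefaultRetryOptions()
retryOpts.MaxRetries = 5
var lastErr error
for retrier := retry.StartWithCtx(ctx, retryOpts); retrier.Next(); {
	for _, repl := range desc.Replicas {
		if err := checkOneReplica(ctx, repl); err != nil {
			lastErr = err
			break
		}
	}
	if lastErr != nil {
		lastErr = nil
		continue
	}
	return nil
}
// Every attempt failed the same way, so manufacture the error to return
// right here instead of threading lastErr out of the loop.
return errors.New("replicas did not report as initialized")
```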
pkg/storage/replica_command.go, line 631 at r1 (raw file):
retryOpts.MaxRetries = 5
retryLoop:
for retrier := retry.StartWithCtx(ctx, retryOpts); retrier.Next(); {
Can we fail-fast if we're not the leader? Seems silly to spend a number of retries here and block merges that could actually proceed.
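Something like the following, as a sketch only (assuming the Raft status is already in hand at this point):

```go
// Sketch: bail out immediately instead of burning retries when this
// replica isn't the Raft leader.
if raftStatus == nil || raftStatus.RaftState != raft.StateLeader {
	return errors.New("cannot wait for replica initialization: not the raft leader")
}
```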
pkg/storage/replica_command.go, line 640 at r1 (raw file):
// that the replicas are initialized, and the merge will (hopefully) be
// retried on the leader.
if raftStatus.Progress[uint64(r.ReplicaID)].Match == 0 {
`raftStatus` could be `nil`. Probably not in reality since you've already somehow made sure that it's initialized before getting here, but still.
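i.e. a defensive guard along these lines (sketch only; the sentinel error is hypothetical):

```go
// Sketch: guard against a nil Raft status before touching Progress.
raftStatus := r.RaftStatus()
if raftStatus == nil {
	return errors.New("raft group not initialized")
}
if pr, ok := raftStatus.Progress[uint64(r.ReplicaID)]; !ok || pr.Match == 0 {
	return errReplicaNotCaughtUp // hypothetical sentinel; the merge gets retried
}
```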
with @tschottdorf's comments
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
Tobi and I chatted and decided that it would be vastly simpler to just introduce a WaitForReplicaInit RPC into the stores server. Neither of us were fully confident that we wouldn't get stuck in a leader-but-not-leaseholder loop with the previous approach, and the code to transfer Raft leadership to match the leaseholdership is, uhm, hairy. (For example: do you want transferRaftLeader or transferRaftLeadership?)
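Very roughly, the client side of that RPC looks something like this sketch (the dialer and the exact message fields here are illustrative, not the final API):

```go
// Sketch only: ask the store holding one LHS replica to block until
// that replica is initialized. Names and fields are illustrative.
conn, err := dialer.Dial(ctx, repl.NodeID) // hypothetical node dialer
if err != nil {
	return err
}
client := NewPerReplicaClient(conn)
_, err = client.WaitForReplicaInit(ctx, &WaitForReplicaInitRequest{
	StoreRequestHeader: StoreRequestHeader{NodeID: repl.NodeID, StoreID: repl.StoreID},
	RangeID:            desc.RangeID,
})
return err
```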
PTAL!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/replica_command.go, line 630 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Feel free to dismiss this suggestion, but I always try to avoid labels because they're hard to follow. Here it seems that you could just drop the `continue` below and add a check `if lastErr != nil { lastErr = nil; continue }` before the `return nil`. You don't need to propagate `lastErr` outside of the retry loop because at that point you know that all replicas caught the same error, so you can just return one you manufacture right at the end.
No longer relevant.
pkg/storage/replica_command.go, line 631 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Can we fail-fast if we're not the leader? Seems silly to spend a number of retries here and block merges that could actually proceed.
No longer relevant.
pkg/storage/replica_command.go, line 640 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
`raftStatus` could be `nil`. Probably not in reality since you've already somehow made sure that it's initialized before getting here, but still.
No longer relevant.
I am curious to know if this has implications for backporting. I don't love the idea of introducing a new RPC in a patch release.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
Adding this RPC in a patch release is fine since it'll only be used if the cluster setting is enabled, and that setting isn't changing in a patch release. Even if the user upgrades directly from 2.1.0 to 2.2.0 (which will flip the default), the loop that calls this RPC should just bail out and abort the merge if it gets an unknown RPC error.
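For concreteness, that bail-out would be roughly the following sketch; gRPC reports an RPC the remote node doesn't implement as codes.Unimplemented:

```go
// Sketch: if the remote node predates the new RPC, gRPC surfaces
// codes.Unimplemented; treat that as "abort the merge" rather than retrying.
if _, err := client.WaitForReplicaInit(ctx, req); err != nil {
	if status.Code(err) == codes.Unimplemented {
		return errors.Wrap(err, "remote node does not support WaitForReplicaInit; aborting merge")
	}
	return err
}
```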
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/stores_server.go, line 119 at r2 (raw file):
resp := &WaitForReplicaInitResponse{}
err := is.execStoreCommand(req.StoreRequestHeader, func(s *Store) error {
	retryOpts := retry.Options{InitialBackoff: 10 * time.Millisecond}
There should be an upper bound on this retry loop. Better to return an error and abort the merge than to hang indefinitely.
The upper bound is enforced by context timeouts. (This is how WaitForApplication works too.) The client sets the timeout at 5s currently. Is that sufficient or do you want me to add a server-side upper bound too?
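i.e. roughly this on the client (sketch only):

```go
// Sketch: the caller bounds the server-side retry loop by putting a
// deadline on the RPC context; the deadline propagates over gRPC, so the
// server's retry.StartWithCtx loop stops once the context expires.
ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
if _, err := client.WaitForReplicaInit(ctx, req); err != nil {
	return err // includes a DeadlineExceeded error if the 5s budget runs out
}
```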
Reviewed 5 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
pkg/storage/replica_command.go, line 633 at r2 (raw file):
g := ctxgroup.WithContext(ctx)
for _, repl := range desc.Replicas {
	repl := repl
Things would get really awkward (and incorrect) without this line. I hate that this is necessary but that's the Go way. Please add a comment.
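For anyone reading along later: the goroutines launched by the group close over the loop variable, so without the rebinding they could all observe whatever value `repl` holds when they finally run. A minimal sketch (the per-replica call is hypothetical):

```go
// Sketch: rebind the loop variable so each goroutine captures its own copy.
g := ctxgroup.WithContext(ctx)
for _, repl := range desc.Replicas {
	repl := repl // without this, all goroutines share the single loop variable
	g.GoCtx(func(ctx context.Context) error {
		return waitForReplicaInit(ctx, repl) // hypothetical per-replica call
	})
}
if err := g.Wait(); err != nil {
	return err
}
```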
pkg/storage/stores_server.go, line 119 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
There should be an upper bound on this retry loop. Better to return an error and abort the merge than to hang indefinitely.
Having the caller use context cancellation works for me, but add a comment here.
That's fine for now. I generally prefer limits set on both sides, but it doesn't need to change until we get around to doing a general review of all of our retry loops.
06ef494 to 7095be2
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
pkg/storage/replica_command.go, line 633 at r2 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Things would get really awkward (and incorrect) without this line. I hate that this is necessary but that's the Go way. Please add a comment.
Done.
pkg/storage/stores_server.go, line 119 at r2 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Having the caller use context cancellation works for me, but add a comment here.
Done.
Wanna fix your imports and get this baby in?
Reviewed 3 of 3 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale)
bors r=tschottdorf,bdarnell
31842: storage: fix disappearing RHS merge bug r=tschottdorf,bdarnell a=benesch

The strategy used by the replica GC queue to determine whether a subsumed range can be GC'd is flawed. If a replica of the LHS was uninitialized at the time the merge committed, there is a small window where the replica GC queue can think that it's safe to clean up an RHS replica when in fact the uninitialized LHS replica could still initialize and apply a merge trigger that required that RHS to be present.

Make the replica GC queue's strategy valid by requiring that all replicas of the LHS are initialized before beginning a merge transaction. This closes the window during which a replica of the RHS could be incorrectly GC'd with a patch that is small enough to be backported to v2.1.1.

Fix #31719.

Release note: None

Co-authored-by: Nikhil Benesch <[email protected]>
Build succeeded
Whew, got it in for this night's test run. 🤞
The strategy used by the replica GC queue to determine whether a
subsumed range can be GC'd is flawed. If a replica of the LHS was
uninitialized at the time the merge committed, there is a small window
where the replica GC queue can think that it's safe to clean up an RHS
replica when in fact the uninitialized LHS replica could still
initialize and apply a merge trigger that required that RHS to be
present.
Make the replica GC queue's strategy valid by requiring that all
replicas of the LHS are initialized before beginning a merge
transaction. This closes the window during which a replica of the RHS
could be incorrectly GC'd with a patch that is small enough to be
backported to v2.1.1.
Fix #31719.
Release note: None