-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: Avoid deadlock between competing pushers #5710
Conversation
Perhaps this is unrelated but using the notion of priority seems wrong. Thanks On Wed, Mar 30, 2016 at 10:15 AM Tobias Schottdorf [email protected]
|
"timestamp and transaction ID" missing comma? "* can easily happen if t..." shouldn't have a bullet on this line "* and also higher than that of B." shouldn't have a bullet on this line and the first commit message doesn't describe the actual fix =/ Reviewed 3 of 3 files at r1, 1 of 1 files at r2. storage/replica_command.go, line 1090 [r1] (raw file): storage/replica_command.go, line 1091 [r1] (raw file): storage/replica_command.go, line 1124 [r1] (raw file): storage/replica_command.go, line 996 [r2] (raw file): storage/replica_command.go, line 1083 [r2] (raw file): util/uuid/uuid.go, line 36 [r1] (raw file): Comments from the review on Reviewable.io |
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful. util/uuid/uuid.go, line 36 [r1] (raw file): Comments from the review on Reviewable.io |
LGTM Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful. storage/replica_command.go, line 1080 [r2] (raw file): Comments from the review on Reviewable.io |
@vivekmenezes maybe. Wanna file the above and include the footnote from my commit message as well? Review status: 2 of 3 files reviewed at latest revision, 7 unresolved discussions. storage/replica_command.go, line 1091 [r1] (raw file): storage/replica_command.go, line 1124 [r1] (raw file): storage/replica_command.go, line 996 [r2] (raw file): storage/replica_command.go, line 1083 [r2] (raw file): util/uuid/uuid.go, line 36 [r1] (raw file): Comments from the review on Reviewable.io |
Review status: 2 of 3 files reviewed at latest revision, 6 unresolved discussions. storage/replica_command.go, line 1080 [r2] (raw file): Comments from the review on Reviewable.io |
Review status: 2 of 3 files reviewed at latest revision, 4 unresolved discussions. storage/replica_command.go, line 1083 [r2] (raw file): Comments from the review on Reviewable.io |
Reviewed 1 of 1 files at r3. storage/replica_command.go, line 996 [r2] (raw file): storage/replica_command.go, line 1083 [r2] (raw file): util/uuid/uuid.go, line 36 [r1] (raw file): Comments from the review on Reviewable.io |
Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed. storage/replica_command.go, line 996 [r2] (raw file): Comments from the review on Reviewable.io |
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. util/uuid/uuid.go, line 36 [r1] (raw file): Comments from the review on Reviewable.io |
Reviewed 1 of 1 files at r4, 6 of 6 files at r5. Comments from the review on Reviewable.io |
Reviewed 6 of 6 files at r6, 6 of 6 files at r7. Comments from the review on Reviewable.io |
When two transactions try to push each other, one of them must be able to win in order to avoid deadlock. This was (seemingly) achieved by, in that order, using priority, timestamp, and transaction ID as criterions. In particular, when priorities were equal, we chose to let the transaction with the lower timestamp win based on seniority. The issue with that lies in the asynchronicity between the active KV client and the persisted record. While the record is the source of truth, the client may actually operate at a higher timestamp and will supply that higher timestamp with its Pushes. On the other hand, when being pushed, the timestamp used is that of the record, which may be smaller. Consequently, the following situation could occur: * Transactions A and B end up with the same priority, B has the larger ID. This can easily happen if they both lose against a higher-priority transaction[1]. * Based on their persisted record's timestamps, B should lose: it has the higher timestamp. * A operates at a timestamp which is higher than that of its persisted record, and also higher than that of B. * B operates at its true timestamp (for simplicity). Consequently, since priorities are equal, * A can't push B: it uses its client timestamp, which is higher than that of B's record. * B can't push A: it uses its client=real timestamp, but that can't win against A's persisted record (which is lower than B's). This changes fixes this by removing the timestamp from the equation. The larger priority wins, and the greater transaction ID otherwise. There is priority skew as well, but we have already broken this by letting the Txn update from its persisted state when it fails to Push in a prior change. A simple alternative fix would have been to let the larger timestamp win, but this is also dangerous (though it should ultimately work): The persisted timestamp can also be larger than the active timestamp; this commonly happens when a transaction gets pushed but does not know it yet. We update the transaction entry after a failed Push, so this scenario would resolve in practice, but it demonstrates that the lack of synchronicity leads to surprising situations which are best avoided. Fixes cockroachdb#5685 (though low throughput still happens, I'll investigate this more). Also contains a drive-by fix: Use OrigTimestamp as the basis of the heartbeat timeout computation. An abandoned transaction which gets pushed continuously could otherwise be kept alive erroneously (since its timestamp would keep increasing). The fact that this timestamp was persisted to disk exacerbated this: the GC queue would not be able to tell that this was an abandoned record as well. This has not happened in practice. [1]: Independently from this change, we may want to reevaluate this strategy at some point. If A at priority 1 and B at priority 2^16 both lose against priority 2^25, it does not seem fair that both should end up at priority 2^25-1. Instead, it would seem more reasonable that their priority be increased as a fraction of the difference, for instance bumping A by (2^25-1)/2 and B by (2^25-2^16)/2, which preserves the relative ordering of A and B but still lets them approach the priority of their mutual enemy.
Removed txn.Short() (for uuid.Short()) and txn.GetMeta().
When two transactions try to push each other, one of them must be able to win
in order to avoid deadlock. This was (seemingly) achieved by, in that order,
using priority, timestamp and transaction ID as criterions.
In particular, when priorities were equal, we chose to let the transaction with
the lower timestamp win based on seniority.
The issue with that lies in the asynchronicity between the active KV client and
the persisted record. While the record is the source of truth, the client may
actually operate at a higher timestamp and will supply that higher timestamp
with its Pushes. On the other hand, when being pushed, the timestamp used is
that of the record, which may be smaller.
Consequently, the following situation could occur:
timestamp.
Consequently, since priorities are equal,
B's record.
A's persisted record (which is lower than B's).
A simple alternative fix would have been to let the larger timestamp win, but
this is also dangerous (though it should ultimately work): The persisted
timestamp can also be larger than the active timestamp; this commonly happens
when a transaction gets pushed but does not know it yet. We update the
transaction entry after a failed Push, so this scenario would resolve in
practice, but it demonstrates that the lack of synchronicity leads to
surprising situations which are best avoided.
Fixes #5685 (though low throughput still happens, I'll investigate this more).
[1]: Independently from this change, we may want to reevaluate this strategy at
some point. If A at priority 1 and B at priority 2^16 both lose against
priority 2^25, it does not seem fair that both should end up at priority
2^25-1. Instead, it would seem more reasonable that their priority be increased
as a fraction of the difference, for instance bumping A by (2^25-1)/2 and B by
(2^25-2^16)/2, which preserves the relative ordering of A and B but still lets
them approach the priority of their mutual enemy.
This change is