storage: create new tracing span for each replicatedCmd during application #39425
Conversation
This allows us to remove some messy code around unsetting a local proposal in replicaStateMachine.ApplySideEffects.

Release note: None
…ation

This change adjusts our handling of contexts and replicatedCmds. We no longer assign a local proposal's context to its replicatedCmds directly. Instead, we always create a new tracing span that follows from this context. This eliminates a whole class of bugs that we have fought to fix in changes like cockroachdb#39203. In fact, we are still seeing issues after the recent refactor when stressing cockroachdb#39390.

The change also paves the way for tracing command application on follower replicas.

Release note: None
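To make the span handoff concrete, here is a minimal sketch of the follows-from pattern described above, written against the generic opentracing-go API rather than CockroachDB's internal tracing package; the helper name newApplicationSpan and its shape are illustrative assumptions, not code from this PR.

```go
package sketch

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// newApplicationSpan derives a fresh span for a single command's application
// instead of handing the proposal's own span to the command. The new span
// merely "follows from" the proposal's span, so finishing it is always safe
// no matter how many commands are eventually derived from the proposal.
// (Hypothetical helper; not CockroachDB's actual tracing API.)
func newApplicationSpan(ctx context.Context, opName string) (context.Context, opentracing.Span) {
	parent := opentracing.SpanFromContext(ctx)
	if parent == nil {
		// Common case: the proposal's context carries no span, so nothing
		// needs to be created or finished for this command.
		return ctx, nil
	}
	sp := opentracing.GlobalTracer().StartSpan(
		opName, opentracing.FollowsFrom(parent.Context()))
	return opentracing.ContextWithSpan(ctx, sp), sp
}
```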
LGTM
Reviewed 1 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/storage/replica_application_decoder.go, line 104 at r2 (raw file):
// proposal from the map. We expect this entry to be rejected by
// checkForcedErr.
cmd.raftCmd.MaxLeaseIndex == cmd.proposal.command.MaxLeaseIndex
so this is... a fix unrelated to the tracing changes?
pkg/storage/replica_application_state_machine.go, line 854 at r1 (raw file):
    log.Fatalf(ctx, "command already applied: %+v; unexpected successful result", cmd)
}
cmd.proposal = nil
how come this went away?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/storage/replica_application_decoder.go, line 104 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
so this is... a fix unrelated to the tracing changes?
Well, it was a workaround that we got to remove now that we don't have to worry about use-after-free bugs. Forking the context span fixes all of these kinds of issues.
pkg/storage/replica_application_state_machine.go, line 854 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
how come this went away?
Because we weren't allowed to call ProposalData.finishApplication multiple times. Doing this wasn't really a great way to approach the problem.
bors r+
39425: storage: create new tracing span for each replicatedCmd during application r=nvanbenschoten a=nvanbenschoten

This change adjusts our handling of contexts and replicatedCmds. We no longer assign a local proposal's context to its replicatedCmds directly. Instead, we always create a new tracing span that follows from this context. This eliminates a whole class of bugs that we have fought to fix in changes like #39203. In fact, we are still seeing issues after the recent refactor when stressing #39390.

The change also paves the way for tracing command application on follower replicas.

Co-authored-by: Nathan VanBenschoten <[email protected]>
Build succeeded
@@ -112,10 +118,10 @@ func (c *replicatedCmd) Rejected() bool {

// FinishAndAckOutcome implements the apply.AppliedCommand interface.
func (c *replicatedCmd) FinishAndAckOutcome() error {
It's documented that this must only be called once, right?
It is in #39468.
Curious whether we're just throwing in the towel because it's not worth fixing, or whether you found something that means we would never be able to fix it.
I had a different fix for this, but it was going to make things even harder on @andreimatei in #39431. Talking to him set me straight and made me realize that this is free in the common case where there is no tracing span already in the proposal's context, and that it avoids the entire class of bugs that we had been seeing. I consider this the correct fix given the lifetimes of these objects.
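As a rough illustration of the ownership model argued for here, the sketch below gives each command its own span that can be finished exactly once without touching the proposal's span. The type and member names (replicatedCmdSketch, sp, finishTracingSpan) are hypothetical stand-ins, not the real replicatedCmd implementation, and the generic opentracing-go API is used instead of CockroachDB's tracing package.

```go
package sketch

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// replicatedCmdSketch stands in for replicatedCmd: each command owns its own
// tracing span (nil when the proposal's context carried no span), so the
// command can finish it unconditionally without risking a double-finish of
// the proposal's span.
type replicatedCmdSketch struct {
	ctx context.Context
	sp  opentracing.Span // owned exclusively by this command; may be nil
}

// finishTracingSpan is the piece of FinishAndAckOutcome relevant to this
// discussion: it runs once per command, which is easy to guarantee because
// no other command shares this span.
func (c *replicatedCmdSketch) finishTracingSpan() {
	if c.sp != nil {
		c.sp.Finish()
		c.sp = nil
	}
}
```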
TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when cockroachdb#39425 was merged. Prior to that change there was an invariant that if a command was reproposed at a higher MaxLeaseIndex, then the client would only be acknowledged by a command which applied at that higher MaxLeaseIndex. That change also worked to simplify context lifecycle management for replicatedCmds by creating individual child spans for each application. This was not a complete solution, however, because all of the commands derived from the same proposal share a context when used for reproposals. As a consequence, commands which are reproposed at a higher MaxLeaseIndex than an already applied command would use a context which carried a tracing span that might already be finished.

Several approaches were explored to fix this issue. The one chosen here seems to be the least invasive. The rotten test has been simplified to cover the now-problematic case. The enabling mechanism for the testing is a hammer of a TestingKnob which unconditionally refreshes all pending proposals in the proposals map at the end of a raft ready iteration. The new test fails reliably under stress in ~10s of iterations and <5s before making the change to delete proposals after they've been applied.

An alternative approach would have been to partially revert cockroachdb#39425 and ensure that only commands which carry the highest MaxLeaseIndex for a proposal may be locally applied. If that's deemed cleaner and simpler then we can go with it. That approach prevents some reproposals and allows clients of commands which will fail due to non-equivalent lease changes to be acknowledged of their need to retry sooner.

Fixes cockroachdb#40478

Release note: None
40505: storage: prevent reproposals of applied commands r=nvanbenschoten a=ajwerner

TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when #39425 was merged. Prior to that change there was an invariant that if a command was reproposed at a higher MaxLeaseIndex, then the client would only be acknowledged by a command which applied at that higher MaxLeaseIndex. That change also worked to simplify context lifecycle management for replicatedCmds by creating individual child spans for each application. This was not a complete solution, however, because all of the commands derived from the same proposal share a context when used for reproposals. As a consequence, commands which are reproposed at a higher MaxLeaseIndex than an already applied command would use a context which carried a tracing span that might already be finished.

Several approaches were explored to fix this issue. The one chosen here seems to be the least invasive. The rotten test has been simplified to cover the now-problematic case. The enabling mechanism for the testing is a hammer of a TestingKnob which unconditionally refreshes all pending proposals in the proposals map at the end of a raft ready iteration. The new test fails reliably under stress in ~10s of iterations and <5s before making the change to delete proposals after they've been applied.

An alternative approach would have been to partially revert #39425 and ensure that only commands which carry the highest MaxLeaseIndex for a proposal may be locally applied. If that's deemed cleaner and simpler then we can go with it. That approach prevents some reproposals and allows clients of commands which will fail due to non-equivalent lease changes to be acknowledged of their need to retry sooner.

Fixes #40478

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
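A loose sketch of the fix direction described above, deleting a proposal from the pending map once it has applied so that a later refresh cannot repropose it. The names replicaSketch, proposalSketch, idKey, and onApplied are invented for illustration and do not match the real kvserver types.

```go
package sketch

import "sync"

// proposalSketch stands in for a pending proposal; idKey identifies it in
// the pending-proposals map.
type proposalSketch struct {
	idKey   string
	applied bool
}

// replicaSketch stands in for a replica with its map of pending proposals.
type replicaSketch struct {
	mu struct {
		sync.Mutex
		proposals map[string]*proposalSketch
	}
}

// onApplied removes a proposal from the pending map as soon as one of its
// commands has applied and acknowledged the client, so a later refresh
// cannot repropose it with a context whose tracing span has already been
// finished.
func (r *replicaSketch) onApplied(p *proposalSketch) {
	r.mu.Lock()
	defer r.mu.Unlock()
	p.applied = true
	if cur, ok := r.mu.proposals[p.idKey]; ok && cur == p {
		delete(r.mu.proposals, p.idKey)
	}
}
```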
Before this patch, the decoding and application of a Raft command was not included in the recording of the request that generated the respective command, even in the case where consensus is synchronous with respect to the request (i.e. non-AsyncConsensus requests). This was because, although we plumb tracing information below Raft, the span under which Raft processing was occurring was Fork()ed from the parent span (i.e. the request evaluation's span). The reasons for that were not good:

1) Forking (as opposed to creating a regular child) was introduced in cockroachdb#39425. I'm not sure whether there was a particular reason for this decision. Perhaps there was fear at the time about the child outliving the parent - although generally it doesn't because, in the case of async consensus, we fork a parent which I think will outlive the child: https://github.com/cockroachdb/cockroach/blame/13669f9c9bd92a4c3b0378a558d7735f122c4e72/pkg/kv/kvserver/replica_raft.go#L157 In the case of sync consensus requests, it's possible for the Raft application span to outlive the evaluation span when the application part (as opposed to the consensus part) can be done async (see CanAckBeforeApplication). Still, regardless of the exact details of the lifetimes, with time it has become clear that it's appropriate to create a child when it's expected that it will usually not outlive the parent, even if, on occasion, it can.

2) Forked spans used to be included in the parent's recording until cockroachdb#59815. This explains why we regressed here - Raft application used to be included in recordings properly.

This patch makes the application span a regular child, and so the Raft application span is included in the request trace.

Touches cockroachdb#70864. That issue asks for a new tracing feature, but I think what motivated it is the application info missing from query traces. This patch is sufficient for that.

Release note (general change): Tracing transaction commits now includes details about replication.
72642: democluster: improve logging/tracing r=otan a=knz

All commits but the last 3 from #72644 (Reviewers: only review last 3 commits)

Informs #58938

72727: tracing: misc cleanups r=andreimatei a=andreimatei

See individual commits.

72738: kvserver: include Raft application on leaseholder in request trace r=andreimatei a=andreimatei

Before this patch, the decoding and application of a Raft command was not included in the recording of the request that generated the respective command, even in the case where consensus is synchronous with respect to the request (i.e. non-AsyncConsensus requests). This was because, although we plumb tracing information below Raft, the span under which Raft processing was occurring was Fork()ed from the parent span (i.e. the request evaluation's span). The reasons for that were not good:

1) Forking (as opposed to creating a regular child) was introduced in #39425. I'm not sure whether there was a particular reason for this decision. Perhaps there was fear at the time about the child outliving the parent - although I don't think that it does because, in the case of async consensus, we fork a parent which I think will outlive the child: https://github.com/cockroachdb/cockroach/blame/13669f9c9bd92a4c3b0378a558d7735f122c4e72/pkg/kv/kvserver/replica_raft.go#L157 Regardless of the exact details of the lifetimes, with time it has become clear that it's appropriate to create a child when it's expected that it will usually not outlive the parent, even if, on occasion, it can.

2) Forked spans used to be included in the parent's recording until #59815. This explains why we regressed here - Raft application used to be included in recordings properly.

This patch makes the application span a regular child, and so the Raft application span is included in the request trace.

Touches #70864. That issue asks for a new tracing feature, but I think what motivated it is the application info missing from query traces. This patch is sufficient for that.

Release note (general change): Tracing transaction commits now includes details about replication.

72896: server: don't mix Tracers in test tenant r=andreimatei a=andreimatei

Before this patch, TestServer.StartTenant() would share the Stopper between the server and the tenant, and it would also proceed to give the tenant a new Tracer. The stopper in question has the server's Tracer in it, and so this sharing is no bueno, because the tenant ends up having two different Tracers, and it uses either of them fairly arbitrarily at different points. Intermingling two Tracers like this was never intended, and attempting to create a child with a different Tracer than the parent will become illegal shortly. This shows that sharing stoppers between different "servers" is a bad idea. This patch stops the practice.

The patch also removes the Tracer config option for test tenants because it was unused, and supporting it in combination with the stopper configuration would require more sanity checks.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
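To show the distinction this later patch turns on, here is a small sketch in generic opentracing-go terms (not CockroachDB's tracing package): the pre-patch "forked" span roughly corresponds to a FollowsFrom reference, while the patch makes the application span a regular ChildOf child so it shows up in the request's recording. The function name startApplySpan and the operation name are assumptions for illustration.

```go
package sketch

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// startApplySpan starts the Raft-application span either as a regular child
// of the request's span (included in the request's trace, the post-patch
// behavior) or as a follows-from span (the pre-patch "forked" behavior,
// which no longer appears in the parent's recording).
func startApplySpan(ctx context.Context, asChild bool) (context.Context, opentracing.Span) {
	parent := opentracing.SpanFromContext(ctx)
	if parent == nil {
		return ctx, nil
	}
	var rel opentracing.StartSpanOption
	if asChild {
		rel = opentracing.ChildOf(parent.Context())
	} else {
		rel = opentracing.FollowsFrom(parent.Context())
	}
	sp := opentracing.GlobalTracer().StartSpan("raft application", rel)
	return opentracing.ContextWithSpan(ctx, sp), sp
}
```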