
storage: create new tracing span for each replicatedCmd during application #39425

Merged

Conversation

nvanbenschoten
Member

This change adjusts our handling of contexts and replicatedCmds. We no longer
assign a local proposal's context to its replicatedCmds directly. Instead,
we always create a new tracing span that follows from this context. This
eliminates a whole class of bugs that we have fought to fix in changes
like #39203. In fact, we are still seeing issues after the recent refactor
when stressing #39390.

The change also paves the way for tracing command application on follower
replicas.

This allows us to remove some messy code around unsetting a local proposal
in replicaStateMachine.ApplySideEffects.

Release note: None
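
Below is a minimal sketch of the approach described above, written against plain opentracing-go rather than CockroachDB's util/tracing wrapper; the decodedCmd type and forkSpanForCmd helper are stand-ins invented for illustration, not the actual storage package code.

```go
package tracingdemo

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// decodedCmd is a simplified stand-in for a replicatedCmd: it owns its own
// span rather than borrowing the proposal's.
type decodedCmd struct {
	ctx  context.Context
	span opentracing.Span // finished by the command once application completes
}

// forkSpanForCmd derives a per-command span that follows from the proposal's
// span instead of assigning the proposal's context to the command directly.
func forkSpanForCmd(proposalCtx context.Context, tr opentracing.Tracer) decodedCmd {
	var opts []opentracing.StartSpanOption
	if parent := opentracing.SpanFromContext(proposalCtx); parent != nil {
		// FollowsFrom records the causal link without tying the new span's
		// lifetime to the proposer's span.
		opts = append(opts, opentracing.FollowsFrom(parent.Context()))
	}
	sp := tr.StartSpan("raft application", opts...)
	return decodedCmd{
		ctx:  opentracing.ContextWithSpan(proposalCtx, sp),
		span: sp,
	}
}
```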
@cockroach-teamcity
Member

This change is Reviewable

Contributor

@andreimatei andreimatei left a comment

LGTM

Reviewed 1 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)


pkg/storage/replica_application_decoder.go, line 104 at r2 (raw file):

			// proposal from the map. We expect this entry to be rejected by
			// checkForcedErr.
			cmd.raftCmd.MaxLeaseIndex == cmd.proposal.command.MaxLeaseIndex

so this is... a fix unrelated to the tracing changes?


pkg/storage/replica_application_state_machine.go, line 854 at r1 (raw file):

				log.Fatalf(ctx, "command already applied: %+v; unexpected successful result", cmd)
			}
			cmd.proposal = nil

how come this went away?

Member Author

@nvanbenschoten nvanbenschoten left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/storage/replica_application_decoder.go, line 104 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

so this is... a fix unrelated to the tracing changes?

Well, it was a workaround that we got to remove now that we don't have to worry about use-after-free bugs. Forking the context span fixes all of these kinds of issues.


pkg/storage/replica_application_state_machine.go, line 854 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

how come this went away?

Because we weren't allowed to call ProposalData.finishApplication multiple times. Doing this wasn't really a great way to approach the problem.

@nvanbenschoten
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Aug 7, 2019
39425: storage: create new tracing span for each replicatedCmd during application r=nvanbenschoten a=nvanbenschoten

This change adjusts our handling of contexts and replicatedCmds. We no longer
assign a local proposal's context to its replicatedCmds directly. Instead,
we always create a new tracing span that follows from this context. This
eliminates a whole class of bugs that we have fought to fix in changes
like #39203. In fact, we are still seeing issues after the recent refactor
when stressing #39390.

The change also paves the way for tracing command application on follower
replicas.

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig
Contributor

craig bot commented Aug 7, 2019

Build succeeded

@craig craig bot merged commit de8b047 into cockroachdb:master Aug 7, 2019
@@ -112,10 +118,10 @@ func (c *replicatedCmd) Rejected() bool {

// FinishAndAckOutcome implements the apply.AppliedCommand interface.
func (c *replicatedCmd) FinishAndAckOutcome() error {
Member

It's documented that this must only be called once, right?

Member Author

It is in #39468.
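
For readers unfamiliar with the contract being discussed, the sketch below shows one hypothetical way to make the call-once requirement explicit; it is only an illustration and is not the actual change made in #39468.

```go
package applydemo

import "errors"

// cmdOutcome is a stand-in type used only for this illustration.
type cmdOutcome struct {
	finished bool
}

// FinishAndAckOutcome must be called exactly once per command. The boolean
// guard makes a second call fail loudly instead of double-finishing the
// command's span or double-acknowledging the client.
func (c *cmdOutcome) FinishAndAckOutcome() error {
	if c.finished {
		return errors.New("FinishAndAckOutcome called more than once")
	}
	c.finished = true
	// ... finish the command's tracing span and acknowledge the waiting client ...
	return nil
}
```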

@tbg
Member

tbg commented Aug 8, 2019

In fact, we are still seeing issues after the recent refactor
when stressing #39390.

Curious whether we're just throwing in the towel because it's not worth fixing, or whether you found something that means we would never be able to fix it.

@nvanbenschoten
Member Author

Curious whether we're just throwing in the towel because it's not worth fixing, or whether you found something that means we would never be able to fix it.

I had a different fix for this, but it was going to make things even harder on @andreimatei in #39431. Talking to him set me straight and made me realize that this approach is free in the common case where there is no tracing span already in the proposal's context, and that it avoids the entire class of bugs we had been seeing (see the sketch below).

I consider this the correct fix given the lifetimes of these replicatedCmds and how they interact with the lifetime of the proposer's context. That's especially true once we start acking clients before entry application more frequently.
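
To make the "free in the common case" point concrete, here is a sketch using plain opentracing-go with an assumed helper name (maybeForkSpan), not the actual CockroachDB tracing code: when the proposal's context carries no span, no span is allocated and the context is returned unchanged.

```go
package tracingdemo

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// maybeForkSpan returns the context to use for command application and a
// cleanup function. If tracing is not active on the proposal's context, both
// are effectively no-ops, which is the common case.
func maybeForkSpan(ctx context.Context, tr opentracing.Tracer, op string) (context.Context, func()) {
	parent := opentracing.SpanFromContext(ctx)
	if parent == nil {
		// Common case: no span in the proposal's context, so no new span is
		// created and cleanup does nothing.
		return ctx, func() {}
	}
	sp := tr.StartSpan(op, opentracing.FollowsFrom(parent.Context()))
	return opentracing.ContextWithSpan(ctx, sp), sp.Finish
}
```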

@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/traceSpanPerCmd branch August 9, 2019 03:06
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 5, 2019
TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when cockroachdb#39425 was
merged. Prior to that change there was an invariant that if a command was
reproposed at a higher MaxLeaseIndex, the client would only be acknowledged
by a command which applied at that higher MaxLeaseIndex. That change also
simplified context lifecycle management for replicatedCmds by creating
individual child spans for each application. This was not a complete solution,
however, because all of the commands derived from the same proposal share a
context when used for reproposals. As a consequence, a command that is
reproposed at a higher MaxLeaseIndex than an already applied command
would use a context carrying a tracing span that might already be finished.

Several approaches were explored to fix this issue. The one chosen here seems
to be the least invasive.

The rotten test has been simplified to cover the now-problematic case. The
enabling mechanism for the test is a hammer of a TestingKnob which
unconditionally refreshes all pending proposals in the proposals map
at the end of a raft ready iteration. Before making the change to delete
proposals after they have been applied, the new test fails reliably under
stress within tens of iterations and in under 5s.

An alternative approach would have been to partially revert cockroachdb#39425 and
ensure that only commands which carry the highest MaxLeaseIndex for a proposal
may be locally applied. If that is deemed cleaner and simpler, we can go with
it. That approach prevents some reproposals and lets clients of commands
which will fail due to non-equivalent lease changes learn sooner that they
need to retry.

Fixes cockroachdb#40478

Release note: None
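
To make the failure mode described in the commit message above concrete, here is a small runnable sketch (simplified stand-in types, not the actual storage code) of how an original command and its reproposal end up sharing one span, so that whichever applies first finishes a span the other still holds.

```go
package main

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// cmd is a simplified stand-in for a command decoded from a raft entry.
type cmd struct {
	ctx context.Context
}

func main() {
	tr := opentracing.GlobalTracer()
	sp := tr.StartSpan("proposal")
	proposalCtx := opentracing.ContextWithSpan(context.Background(), sp)

	// Before the fix, an original command and its reproposal at a higher
	// MaxLeaseIndex both captured the proposal's context, and with it the
	// same tracing span.
	original := cmd{ctx: proposalCtx}
	reproposal := cmd{ctx: proposalCtx}

	// The original command applies and finishes the shared span.
	opentracing.SpanFromContext(original.ctx).Finish()

	// The reproposal still carries the now-finished span; logging or tagging
	// through its context would be a use-after-finish.
	_ = opentracing.SpanFromContext(reproposal.ctx)
}
```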
craig bot pushed a commit that referenced this pull request Sep 5, 2019
40505: storage: prevent reproposals of applied commands r=nvanbenschoten a=ajwerner

TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when #39425 was
merged. Prior to that change there was an invariant that if a command was
reproposed at a higher MaxLeaseIndex, the client would only be acknowledged
by a command which applied at that higher MaxLeaseIndex. That change also
simplified context lifecycle management for replicatedCmds by creating
individual child spans for each application. This was not a complete solution,
however, because all of the commands derived from the same proposal share a
context when used for reproposals. As a consequence, a command that is
reproposed at a higher MaxLeaseIndex than an already applied command
would use a context carrying a tracing span that might already be finished.

Several approaches were explored to fix this issue. The one chosen here seems
to be the least invasive.

The rotten test has been simplified to cover the now-problematic case. The
enabling mechanism for the test is a hammer of a TestingKnob which
unconditionally refreshes all pending proposals in the proposals map
at the end of a raft ready iteration. Before making the change to delete
proposals after they have been applied, the new test fails reliably under
stress within tens of iterations and in under 5s.

An alternative approach would have been to partially revert #39425 and
ensure that only commands which carry the highest MaxLeaseIndex for a proposal
may be locally applied. If that is deemed cleaner and simpler, we can go with
it. That approach prevents some reproposals and lets clients of commands
which will fail due to non-equivalent lease changes learn sooner that they
need to retry.

Fixes #40478

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Nov 15, 2021
Before this patch, the decoding and application of a Raft command was
not included in the recording of the request that generated the
respective command, even in the case where consensus is synchronous with
respect to the request (i.e. non-AsyncConsensus requests). This was
because, although we plumb tracing information below Raft, the span
under which Raft processing was occurring was Fork()ed from the parent
span (i.e. the request evaluation's span). The reasons why that was the case
are not good:

1) forking (as opposed to creating a regular child) was introduced in
   cockroachdb#39425. I'm not sure whether there was a particular reason for this
   decision. Perhaps there was fear at the time about the child
   outliving the parent - although generally it doesn't because,
   in the case of async consensus, we fork a parent which I think will
   outlive the child:
   https://github.com/cockroachdb/cockroach/blame/13669f9c9bd92a4c3b0378a558d7735f122c4e72/pkg/kv/kvserver/replica_raft.go#L157

   In the case of sync consensus requests, it's possible for the Raft application
   span to outlive the evaluation span in cases when the application part
   (as opposed to the consensus part) can be done async (see
   CanAckBeforeApplication). Still, regardless of the exact details of the
   lifetimes, with time it has become clear that it's appropriate to create
   a child when it's expected that it will usually not outlive the parent
   even if, on occasion, it can outlive it.

2) forked spans used to be included in the parent's recording until cockroachdb#59815.
   This explains why we regressed here - Raft application used to be
   included in recordings properly.

This patch makes the application span a regular child, so the
Raft application span is included in the request trace.

Touches cockroachdb#70864. That issue asks for a new tracing feature but I think
what motivated it is the application info missing from query traces.
This patch is sufficient for that.

Release note (general change): Tracing transaction commits now includes
details about replication.
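
The ChildOf versus FollowsFrom distinction discussed in the commit message above, sketched with plain opentracing-go (CockroachDB's own tracing package differs); the startApplicationSpan helper and its asChild parameter are assumptions made for illustration only.

```go
package tracingdemo

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// startApplicationSpan starts the span under which a command is applied.
// With asChild=true it is a regular child span (included in the parent's
// recording, the behavior this patch moves to); with asChild=false it is a
// forked, FollowsFrom span (the behavior introduced in #39425).
func startApplicationSpan(
	ctx context.Context, tr opentracing.Tracer, asChild bool,
) (context.Context, opentracing.Span) {
	parent := opentracing.SpanFromContext(ctx)
	if parent == nil {
		sp := tr.StartSpan("raft application")
		return opentracing.ContextWithSpan(ctx, sp), sp
	}
	var ref opentracing.SpanReference
	if asChild {
		ref = opentracing.ChildOf(parent.Context())
	} else {
		ref = opentracing.FollowsFrom(parent.Context())
	}
	sp := tr.StartSpan("raft application", ref)
	return opentracing.ContextWithSpan(ctx, sp), sp
}
```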
craig bot pushed a commit that referenced this pull request Nov 18, 2021
72642: democluster: improve logging/tracing r=otan a=knz

All commits but the last 3 from #72644
(Reviewers: only review last 3 commits)

Informs #58938

72727: tracing: misc cleanups r=andreimatei a=andreimatei

See individual commits.

72738: kvserver: include Raft application on leaseholder in request trace  r=andreimatei a=andreimatei

Before this patch, the decoding and application of a Raft command was
not included in the recording of the request that generated the
respective command, even in the case where consensus is synchronous with
respect to the request (i.e. non-AsyncConsensus requests). This was
because, although we plumb tracing information below Raft, the span
under which Raft processing was occurring was Fork()ed from the parent
span (i.e. the request evaluation's span). The reasons why that was the case
are not good:

1) forking (as opposed to creating a regular child) was introduced in
   #39425. I'm not sure whether there was a particular reason for this
   decision. Perhaps there was fear at the time about the child
   outliving the parent - although I don't think that it does because,
   in the case of async consensus, we fork a parent which I think will
   outlive the child:
   https://github.com/cockroachdb/cockroach/blame/13669f9c9bd92a4c3b0378a558d7735f122c4e72/pkg/kv/kvserver/replica_raft.go#L157

   Regardless of the exact details of the lifetimes, with time it has become
   clear that it's appropriate to create a child when it's expected that
   it will usually not outlive the parent even if, on occasion, it can
   outlive it.

2) forked spans used to be included in the parent's recording until #59815.
   This explains why we regressed here - Raft application used to be
   included in recordings properly.

This patch makes the application span a regular child, so the
Raft application span is included in the request trace.

Touches #70864. That issue asks for a new tracing feature but I think
what motivated it is the application info missing from query traces.
This patch is sufficient for that.

Release note (general change): Tracing transaction commits now includes
details about replication.

72896: server: don't mix Tracers in test tenant r=andreimatei a=andreimatei

Before this patch, TestServer.StartTenant() would share the Stopper
between the server and the tenant, and it would also proceed to give the
Tenant a new Tracer. The stopper in question has the server's Tracer in
it, and so this sharing is no bueno, because the tenant ends up having
two different Tracers, and it uses either of them fairly arbitrarily at
different points. Intermingling two Tracers like this was never
intended, and attempting to create a child with a different tracer than
the parent will become illegal shortly. This shows that sharing stoppers
between different "servers" is a bad idea. This patch stops the practice.

The patch also removes the tracer config options for test tenants
because it was unused and supporting it in combination with the stopper
configuration would require more sanity checks.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>