
storage: create new tracing span for each replicatedCmd during application #39425

Merged

Conversation

nvanbenschoten
Member

This change adjusts our handling of contexts and replicatedCmds. We no longer
assign a local proposal's context to its replicatedCmds directly. Instead,
we always create a new tracing span that follows from this context. This
eliminates a whole class of bugs that we have fought to fix in changes
like #39203. In fact, we are still seeing issues after the recent refactor
when stressing #39390.

The change also paves the way for tracing command application on follower
replicas.

This allows us to remove some messy code around unsetting a local proposal
in replicaStateMachine.ApplySideEffects.

Release note: None
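
Below is a minimal sketch of the approach described above, written against plain opentracing-go rather than CockroachDB's util/tracing wrapper; the decodedCmd type and forkSpanForCmd helper are stand-ins invented for illustration, not the actual storage package code.

```go
package tracingdemo

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// decodedCmd is a simplified stand-in for a replicatedCmd: it owns its own
// span rather than borrowing the proposal's.
type decodedCmd struct {
	ctx  context.Context
	span opentracing.Span // finished by the command once application completes
}

// forkSpanForCmd derives a per-command span that follows from the proposal's
// span instead of assigning the proposal's context to the command directly.
func forkSpanForCmd(proposalCtx context.Context, tr opentracing.Tracer) decodedCmd {
	var opts []opentracing.StartSpanOption
	if parent := opentracing.SpanFromContext(proposalCtx); parent != nil {
		// FollowsFrom records the causal link without tying the new span's
		// lifetime to the proposer's span.
		opts = append(opts, opentracing.FollowsFrom(parent.Context()))
	}
	sp := tr.StartSpan("raft application", opts...)
	return decodedCmd{
		ctx:  opentracing.ContextWithSpan(proposalCtx, sp),
		span: sp,
	}
}
```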
@cockroach-teamcity
Member

This change is Reviewable

Contributor

@andreimatei andreimatei left a comment

LGTM

Reviewed 1 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)


pkg/storage/replica_application_decoder.go, line 104 at r2 (raw file):

			// proposal from the map. We expect this entry to be rejected by
			// checkForcedErr.
			cmd.raftCmd.MaxLeaseIndex == cmd.proposal.command.MaxLeaseIndex

so this is... a fix unrelated to the tracing changes?


pkg/storage/replica_application_state_machine.go, line 854 at r1 (raw file):

				log.Fatalf(ctx, "command already applied: %+v; unexpected successful result", cmd)
			}
			cmd.proposal = nil

how come this went away?

Member Author

@nvanbenschoten nvanbenschoten left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/storage/replica_application_decoder.go, line 104 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

so this is... a fix unrelated to the tracing changes?

Well, it was a workaround that we got to remove now that we don't have to worry about use-after-free bugs. Forking the context span fixes all of these kinds of issues.


pkg/storage/replica_application_state_machine.go, line 854 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

how come this went away?

Because we weren't allowed to call ProposalData.finishApplication multiple times. Doing this wasn't really a great way to approach the problem.

@nvanbenschoten
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Aug 7, 2019
39425: storage: create new tracing span for each replicatedCmd during application r=nvanbenschoten a=nvanbenschoten

This change adjusts our handling of contexts and replicatedCmds. We no longer
assign a local proposal's context to its replicatedCmds directly. Instead,
we always create a new tracing span that follows from this context. This
eliminates a whole class of bugs that we have fought to fix in changes
like #39203. In fact, we are still seeing issues after the recent refactor
when stressing #39390.

The change also paves the way for tracing command application on follower
replicas.

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig
Contributor

craig bot commented Aug 7, 2019

Build succeeded

@craig craig bot merged commit de8b047 into cockroachdb:master Aug 7, 2019
@@ -112,10 +118,10 @@ func (c *replicatedCmd) Rejected() bool {

// FinishAndAckOutcome implements the apply.AppliedCommand interface.
func (c *replicatedCmd) FinishAndAckOutcome() error {
Member

It's documented that this must only be called once, right?

Member Author

It is in #39468.
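
For readers unfamiliar with the contract being discussed, the sketch below shows one hypothetical way to make the call-once requirement explicit; it is only an illustration and is not the actual change made in #39468.

```go
package applydemo

import "errors"

// cmdOutcome is a stand-in type used only for this illustration.
type cmdOutcome struct {
	finished bool
}

// FinishAndAckOutcome must be called exactly once per command. The boolean
// guard makes a second call fail loudly instead of double-finishing the
// command's span or double-acknowledging the client.
func (c *cmdOutcome) FinishAndAckOutcome() error {
	if c.finished {
		return errors.New("FinishAndAckOutcome called more than once")
	}
	c.finished = true
	// ... finish the command's tracing span and acknowledge the waiting client ...
	return nil
}
```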

@tbg
Member

tbg commented Aug 8, 2019

In fact, we are still seeing issues after the recent refactor
when stressing #39390.

Curious whether we're just throwing in the towel because it's not worth fixing, or whether you found something that means we would never be able to fix it.

@nvanbenschoten
Member Author

Curious whether we're just throwing in the towel because it's not worth fixing, or whether you found something that means we would never be able to fix it.

I had a different fix for this, but it was going to make things even harder on @andreimatei in #39431. Talking to him set me straight and made me realize that this approach is free in the common case where there is no tracing span already in the proposal's context, and that it avoids the entire class of bugs we had been seeing (see the sketch below).

I consider this the correct fix given the lifetimes of these replicatedCmds and how they interact with the lifetime of the proposer's context. That's especially true once we start acking clients before entry application more frequently.
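
To make the "free in the common case" point concrete, here is a sketch using plain opentracing-go with an assumed helper name (maybeForkSpan), not the actual CockroachDB tracing code: when the proposal's context carries no span, no span is allocated and the context is returned unchanged.

```go
package tracingdemo

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// maybeForkSpan returns the context to use for command application and a
// cleanup function. If tracing is not active on the proposal's context, both
// are effectively no-ops, which is the common case.
func maybeForkSpan(ctx context.Context, tr opentracing.Tracer, op string) (context.Context, func()) {
	parent := opentracing.SpanFromContext(ctx)
	if parent == nil {
		// Common case: no span in the proposal's context, so no new span is
		// created and cleanup does nothing.
		return ctx, func() {}
	}
	sp := tr.StartSpan(op, opentracing.FollowsFrom(parent.Context()))
	return opentracing.ContextWithSpan(ctx, sp), sp.Finish
}
```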

@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/traceSpanPerCmd branch August 9, 2019 03:06
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Sep 5, 2019
TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when cockroachdb#39425 was
merged. Prior to that change there was an invariant that if a command was
reproposed at a higher MaxLeaseIndex, the client would only be acknowledged
by a command which applied at that higher MaxLeaseIndex. That change also
simplified context lifecycle management for replicatedCmds by creating
individual child spans for each application. This was not a complete solution,
however, because all of the commands derived from the same proposal share a
context when used for reproposals. As a consequence, a command that is
reproposed at a higher MaxLeaseIndex than an already applied command
would use a context carrying a tracing span that might already be finished.

Several approaches were explored to fix this issue. The one chosen here seems
to be the least invasive.

The rotten test has been simplified to cover the now-problematic case. The
enabling mechanism for the test is a hammer of a TestingKnob which
unconditionally refreshes all pending proposals in the proposals map
at the end of a raft ready iteration. Before making the change to delete
proposals after they have been applied, the new test fails reliably under
stress within tens of iterations and in under 5s.

An alternative approach would have been to partially revert cockroachdb#39425 and
ensure that only commands which carry the highest MaxLeaseIndex for a proposal
may be locally applied. If that is deemed cleaner and simpler, we can go with
it. That approach prevents some reproposals and lets clients of commands
which will fail due to non-equivalent lease changes learn sooner that they
need to retry.

Fixes cockroachdb#40478

Release note: None
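
To make the failure mode described in the commit message above concrete, here is a small runnable sketch (simplified stand-in types, not the actual storage code) of how an original command and its reproposal end up sharing one span, so that whichever applies first finishes a span the other still holds.

```go
package main

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// cmd is a simplified stand-in for a command decoded from a raft entry.
type cmd struct {
	ctx context.Context
}

func main() {
	tr := opentracing.GlobalTracer()
	sp := tr.StartSpan("proposal")
	proposalCtx := opentracing.ContextWithSpan(context.Background(), sp)

	// Before the fix, an original command and its reproposal at a higher
	// MaxLeaseIndex both captured the proposal's context, and with it the
	// same tracing span.
	original := cmd{ctx: proposalCtx}
	reproposal := cmd{ctx: proposalCtx}

	// The original command applies and finishes the shared span.
	opentracing.SpanFromContext(original.ctx).Finish()

	// The reproposal still carries the now-finished span; logging or tagging
	// through its context would be a use-after-finish.
	_ = opentracing.SpanFromContext(reproposal.ctx)
}
```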
craig bot pushed a commit that referenced this pull request Sep 5, 2019
40505: storage: prevent reproposals of applied commands r=nvanbenschoten a=ajwerner

TestHighestMaxLeaseIndexReproposalFinishesCommand rotted when #39425 was
merged. Prior to that change there was an invariant that if a command was
reproposed at a higher MaxLeaseIndex, the client would only be acknowledged
by a command which applied at that higher MaxLeaseIndex. That change also
simplified context lifecycle management for replicatedCmds by creating
individual child spans for each application. This was not a complete solution,
however, because all of the commands derived from the same proposal share a
context when used for reproposals. As a consequence, a command that is
reproposed at a higher MaxLeaseIndex than an already applied command
would use a context carrying a tracing span that might already be finished.

Several approaches were explored to fix this issue. The one chosen here seems
to be the least invasive.

The rotten test has been simplified to cover the now-problematic case. The
enabling mechanism for the test is a hammer of a TestingKnob which
unconditionally refreshes all pending proposals in the proposals map
at the end of a raft ready iteration. Before making the change to delete
proposals after they have been applied, the new test fails reliably under
stress within tens of iterations and in under 5s.

An alternative approach would have been to partially revert #39425 and
ensure that only commands which carry the highest MaxLeaseIndex for a proposal
may be locally applied. If that is deemed cleaner and simpler, we can go with
it. That approach prevents some reproposals and lets clients of commands
which will fail due to non-equivalent lease changes learn sooner that they
need to retry.

Fixes #40478

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Nov 15, 2021
Before this patch, the decoding and application of a Raft command was
not included in the recording of the request that generated the
respective command, even in the case where consensus is synchronous with
respect to the request (i.e. non-AsyncConsensus requests). This was
because, although we plumb tracing information below Raft, the span
under which Raft processing was occurring was Fork()ed from the parent
span (i.e. the request evaluation's span). The reasons why that was the case
are not good:

1) forking (as opposed to creating a regular child) was introduced in
   cockroachdb#39425. I'm not sure whether there was a particular reason for this
   decision. Perhaps there was fear at the time about the child
   outliving the parent - although generally it doesn't because,
   in the case of async consensus, we fork a parent which I think will
   outlive the child:
   https://github.com/cockroachdb/cockroach/blame/13669f9c9bd92a4c3b0378a558d7735f122c4e72/pkg/kv/kvserver/replica_raft.go#L157

   In the case of sync consensus requests, it's possible for the Raft application
   span to outlive the evaluation span in cases when the application part
   (as opposed to the consensus part) can be done async (see
   CanAckBeforeApplication). Still, regardless of the exact details of the
   lifetimes, with time it has become clear that it's appropriate to create
   a child when it's expected that it will usually not outlive the parent
   even if, on occasion, it can outlive it.

2) forked spans used to be included in the parent's recording until cockroachdb#59815.
   This explains why we regressed here - Raft application used to be
   included in recordings properly.

This patch makes the application span a regular child, so the
Raft application span is included in the request trace.

Touches cockroachdb#70864. That issue asks for a new tracing feature but I think
what motivated it is the application info missing from query traces.
This patch is sufficient for that.

Release note (general change): Tracing transaction commits now includes
details about replication.
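
The ChildOf versus FollowsFrom distinction discussed in the commit message above, sketched with plain opentracing-go (CockroachDB's own tracing package differs); the startApplicationSpan helper and its asChild parameter are assumptions made for illustration only.

```go
package tracingdemo

import (
	"context"

	opentracing "github.com/opentracing/opentracing-go"
)

// startApplicationSpan starts the span under which a command is applied.
// With asChild=true it is a regular child span (included in the parent's
// recording, the behavior this patch moves to); with asChild=false it is a
// forked, FollowsFrom span (the behavior introduced in #39425).
func startApplicationSpan(
	ctx context.Context, tr opentracing.Tracer, asChild bool,
) (context.Context, opentracing.Span) {
	parent := opentracing.SpanFromContext(ctx)
	if parent == nil {
		sp := tr.StartSpan("raft application")
		return opentracing.ContextWithSpan(ctx, sp), sp
	}
	var ref opentracing.SpanReference
	if asChild {
		ref = opentracing.ChildOf(parent.Context())
	} else {
		ref = opentracing.FollowsFrom(parent.Context())
	}
	sp := tr.StartSpan("raft application", ref)
	return opentracing.ContextWithSpan(ctx, sp), sp
}
```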
craig bot pushed a commit that referenced this pull request Nov 18, 2021
72642: democluster: improve logging/tracing r=otan a=knz

All commits but the last 3 from #72644
(Reviewers: only review last 3 commits)

Informs #58938

72727: tracing: misc cleanups r=andreimatei a=andreimatei

See individual commits.

72738: kvserver: include Raft application on leaseholder in request trace  r=andreimatei a=andreimatei

Before this patch, the decoding and application of a Raft command was
not included in the recording of the request that generated the
respective command, even in the case where consensus is synchronous with
respect to the request (i.e. non-AsyncConsensus requests). This was
because, although we plumb tracing information below Raft, the span
under which Raft processing was occurring was Fork()ed from the parent
span (i.e. the request evaluation's span). The reasons why that was the case
are not good:

1) forking (as opposed to creating a regular child) was introduced in
   #39425. I'm not sure whether there was a particular reason for this
   decision. Perhaps there was fear at the time about the child
   outliving the parent - although I don't think that it does because,
   in the case of async consensus, we fork a parent which I think will
   outlive the child:
   https://github.com/cockroachdb/cockroach/blame/13669f9c9bd92a4c3b0378a558d7735f122c4e72/pkg/kv/kvserver/replica_raft.go#L157

   Regardless of the exact details of the lifetimes, with time it has become
   clear that it's appropriate to create a child when it's expected that
   it will usually not outlive the parent even if, on occasion, it can
   outlive it.

2) forked spans used to be included in the parent's recording until #59815.
   This explains why we regressed here - Raft application used to be
   included in recordings properly.

This patch makes the application span a regular child, so the
Raft application span is included in the request trace.

Touches #70864. That issue asks for a new tracing feature but I think
what motivated it is the application info missing from query traces.
This patch is sufficient for that.

Release note (general change): Tracing transaction commits now includes
details about replication.

72896: server: don't mix Tracers in test tenant r=andreimatei a=andreimatei

Before this patch, TestServer.StartTenant() would share the Stopper
between the server and the tenant, and it would also proceed to give the
Tenant a new Tracer. The stopper in question has the server's Tracer in
it, and so this sharing is no bueno, because the tenant ends up having
two different Tracers, and it uses either of them fairly arbitrarily at
different points. Intermingling two Tracers like this was never
intended, and attempting to create a child with a different tracer than
the parent will become illegal shortly. This shows that sharing stoppers
between different "servers" is a bad idea. This patch stops the practice.

The patch also removes the tracer config options for test tenants
because it was unused and supporting it in combination with the stopper
configuration would require more sanity checks.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>