
ccl/importccl: TestImportCSVStmt failed under stress #31719

Closed
cockroach-teamcity opened this issue Oct 23, 2018 · 6 comments · Fixed by #31842
@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/d3f39a5cb60eb9cea64635f78e25f67a435057a5

Parameters:

TAGS=
GOFLAGS=

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=TestImportCSVStmt PKG=github.com/cockroachdb/cockroach/pkg/ccl/importccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=980299&tab=buildLog


I181023 05:34:49.586318 33503 storage/store_snapshot.go:636  [n1,raftsnapshot,s1,r30/1:/Table/75/2/"{J"/4326-N"/1808}] sending Raft snapshot b2c8df3b at applied index 19
I181023 05:34:49.588054 33503 storage/store_snapshot.go:679  [n1,raftsnapshot,s1,r30/1:/Table/75/2/"{J"/4326-N"/1808}] streamed snapshot to (n3,s3):2: kv pairs: 684, log entries: 9, rate-limit: 8.0 MiB/sec, 2ms
I181023 05:34:49.594509 33506 storage/replica_raftstorage.go:807  [n3,s3,r30/2:{-}] applying Raft snapshot at index 19 (id=b2c8df3b, encoded size=31478, 1 rocksdb batches, 9 log entries)
I181023 05:34:49.595766 33506 storage/replica_raftstorage.go:813  [n3,s3,r30/2:/Table/75/2/"{J"/4326-N"/1808}] applied Raft snapshot in 1ms [clear=0ms batch=0ms entries=0ms commit=0ms]
I181023 05:34:49.609580 9149 sql/event_log.go:126  [n1,client=127.0.0.1:60632,user=root] Event: "create_database", target: 81, info: {DatabaseName:csv19 Statement:CREATE DATABASE csv19 User:root}
I181023 05:34:49.616381 8021 storage/store.go:2580  [n1,s1,r29/1:/Table/75/2/"{G"/1931-J"/4326}] removing replica r30/1
I181023 05:34:49.617863 8518 storage/store.go:2580  [n2,s2,r29/3:/Table/75/2/"{G"/1931-J"/4326}] removing replica r30/3
I181023 05:34:49.619024 33294 storage/store_snapshot.go:636  [n1,raftsnapshot,s1,r29/1:/Table/75/2/"{G"/1931-N"/1808}] sending Raft snapshot ed24128a at applied index 19
I181023 05:34:49.620528 33294 storage/store_snapshot.go:679  [n1,raftsnapshot,s1,r29/1:/Table/75/2/"{G"/1931-N"/1808}] streamed snapshot to (n3,s3):2: kv pairs: 684, log entries: 9, rate-limit: 8.0 MiB/sec, 12ms
I181023 05:34:49.620976 33605 storage/replica_raftstorage.go:807  [n3,s3,r29/2:{-}] applying Raft snapshot at index 19 (id=ed24128a, encoded size=31954, 1 rocksdb batches, 9 log entries)
I181023 05:34:49.622244 33575 storage/store.go:2580  [n3,replicaGC,s3,r30/2:/Table/75/2/"{J"/4326-N"/1808}] removing replica r30/2
I181023 05:34:49.625133 33575 storage/replica.go:863  [n3,replicaGC,s3,r30/2:/Table/75/2/"{J"/4326-N"/1808}] removed 680 (671+9) keys in 0ms [clear=0ms commit=0ms]
I181023 05:34:49.622765 33605 storage/replica_raftstorage.go:813  [n3,s3,r29/2:/Table/75/2/"{G"/1931-J"/4326}] applied Raft snapshot in 1ms [clear=0ms batch=0ms entries=0ms commit=0ms]
F181023 05:34:49.657430 8751 storage/replica.go:5713  [n3,s3,r29/2:/Table/75/2/"{G"/1931-J"/4326}] RHS of merge r29:/Table/75/2/"{G"/1931-N"/1808} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4, gen=2] <- r30:/Table/75/2/"{J"/4326-N"/1808} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4, gen=1] not present on store; found r30:{-} [<no replicas>, next=0, gen=0] in place of the RHS
goroutine 8751 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc4200c6e01, 0xc4200c6ea0, 0x3eb4500, 0x12)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:997 +0xcf
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x44763a0, 0xc400000004, 0x3eb45d0, 0x12, 0x1651, 0xc4256d4c80, 0x13b)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:866 +0x8e1
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x303b040, 0xc425515aa0, 0x4, 0x2, 0x2b96eaf, 0x48, 0xc423d969f8, 0x3, 0x3)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:85 +0x2e5
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x303b040, 0xc425515aa0, 0x1, 0x4, 0x2b96eaf, 0x48, 0xc423d969f8, 0x3, 0x3)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(0x303b040, 0xc425515aa0, 0x2b96eaf, 0x48, 0xc423d969f8, 0x3, 0x3)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:172 +0x7e
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).acquireMergeLock(0xc422374f00, 0x303b040, 0xc425515aa0, 0xc424fe3e00, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:5713 +0x1e5
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).maybeAcquireSplitMergeLock(0xc422374f00, 0x303b040, 0xc425515aa0, 0x100000001, 0x1, 0x5, 0x0, 0x7, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:5648 +0x66
@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Oct 23, 2018
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Oct 23, 2018
@tbg tbg assigned benesch and unassigned dt Oct 23, 2018
@tbg
Member

tbg commented Oct 23, 2018

@benesch this is actually the same error I saw in #31409, so you'll probably get a nice repro out of this

benesch added a commit to benesch/cockroach that referenced this issue Oct 23, 2018
There are unfortunately still known issues with range merges (cockroachdb#31719,
perhaps among others). It's time to use the escape hatch before v2.1
ships.

Release note: None
@benesch
Contributor

benesch commented Oct 23, 2018

[05:34:50]	[Step 2/2] I181023 05:34:47.518560 30742 storage/replica_command.go:300  [n1,s1,r28/1:/Table/75/{2/"C"/4…-3/4063/…}] initiating a split of this range at key /Table/75/2/"G"/1931 [r29]
[05:34:50]	[Step 2/2] E181023 05:34:47.539021 31392 storage/queue.go:791  [n1,raftsnapshot,s1,r29/1:/Table/75/{2/"G"/1…-3/4063/…}] snapshot failed: (n3,s3):2: remote couldn't accept Raft snapshot a761a898 at applied index 11 with error: [n3,s3],r29: cannot apply snapshot: snapshot intersects existing range; initiated GC: [n3,s3,r28/2:/Table/75/{2/"C"/4…-3/4063/…}]
[05:34:50]	[Step 2/2] I181023 05:34:49.300001 33355 storage/replica_command.go:432  [n1,merge,s1,r29/1:/Table/75/2/"{G"/1931-J"/4326}] initiating a merge of r30:/Table/75/2/"{J"/4326-N"/1808} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4, gen=1] into this range
[05:34:50]	[Step 2/2] I181023 05:34:49.619024 33294 storage/store_snapshot.go:636  [n1,raftsnapshot,s1,r29/1:/Table/75/2/"{G"/1931-N"/1808}] sending Raft snapshot ed24128a at applied index 19
[05:34:50]	[Step 2/2] I181023 05:34:49.620528 33294 storage/store_snapshot.go:679  [n1,raftsnapshot,s1,r29/1:/Table/75/2/"{G"/1931-N"/1808}] streamed snapshot to (n3,s3):2: kv pairs: 684, log entries: 9, rate-limit: 8.0 MiB/sec, 12ms
[05:34:50]	[Step 2/2] I181023 05:34:49.620976 33605 storage/replica_raftstorage.go:807  [n3,s3,r29/2:{-}] applying Raft snapshot at index 19 (id=ed24128a, encoded size=31954, 1 rocksdb batches, 9 log entries)
[05:34:50]	[Step 2/2] I181023 05:34:49.622244 33575 storage/store.go:2580  [n3,replicaGC,s3,r30/2:/Table/75/2/"{J"/4326-N"/1808}] removing replica r30/2
[05:34:50]	[Step 2/2] I181023 05:34:49.625133 33575 storage/replica.go:863  [n3,replicaGC,s3,r30/2:/Table/75/2/"{J"/4326-N"/1808}] removed 680 (671+9) keys in 0ms [clear=0ms commit=0ms]
[05:34:50]	[Step 2/2] I181023 05:34:49.622765 33605 storage/replica_raftstorage.go:813  [n3,s3,r29/2:/Table/75/2/"{G"/1931-J"/4326}] applied Raft snapshot in 1ms [clear=0ms batch=0ms entries=0ms commit=0ms]
[05:34:50]	[Step 2/2] F181023 05:34:49.657430 8751 storage/replica.go:5713  [n3,s3,r29/2:/Table/75/2/"{G"/1931-J"/4326}] RHS of merge r29:/Table/75/2/"{G"/1931-N"/1808} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4, gen=2] <- r30:/Table/75/2/"{J"/4326-N"/1808} [(n1,s1):1, (n3,s3):2, (n2,s2):3, next=4, gen=1] not present on store; found r30:{-} [<no replicas>, next=0, gen=0] in place of the RHS

Womp womp. The merge commits before one of the LHS replicas has been replicated to all of the stores. Then the RHS is GC'd on the store with the missing LHS because there is no initialized LHS replica on that store. Then that store gets a Raft snapshot pre-merge, applies the merge, and there's no RHS, so 💥.

@benesch
Contributor

benesch commented Oct 23, 2018

Ok, there are two ways I see of fixing this.

The first is to require that every replica of the LHS is initialized before committing a merge transaction. This would be somewhat akin to the existing check of the RHS's applied index, in that I think it would be best implemented as a side-channel RPC to each replica in the range. This is kind of gross, but it has the nice property of being an easily backportable change.

The second is to implement a proper range graveyard. I'm envisioning something like /System/<range-id> -> <subsuming-range-desc>. The replica GC queue would then work like this. If it discovers that meta2 has a range ID that differs from the local descriptor, it looks up the new range ID in the range graveyard to find out who subsumed it. Note that the range descriptor in the graveyard is the range descriptor at the time of subsumption. There are two possibilities:

  1. The local store was not a member of the subsuming range at the time of subsumption. This means that the range was merged away after the local store's replica of the LHS was removed. It's safe to remove the local replica in this case.
  2. The local store was a member of the subsuming range at the time of subsumption. We need to prove not just that we don't have a replica of the LHS, but that we will never, ever receive a replica of the LHS (e.g. because all Raft messages about that range were super delayed, for some reason). So we need to avoid GC'ing our local replica until we have a replica tombstone for the LHS.

I'm inclined towards the second approach, but the downside is backwards compatibility. It's not going to be easy to backport, it's going to be wholly incompatible with the merge implementation that's currently on v2.1, and it will require a complicated cluster versioning story. Unless we decide to declare bankruptcy on merges for v2.1 and disable the setting entirely.
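
For what it's worth, here is a rough, hypothetical Go sketch of the graveyard lookup the replica GC queue could perform under the second approach. None of these types or names exist in CockroachDB; they only encode the two cases above, assuming the proposed /System/<range-id> -> <subsuming-range-desc> mapping.

package main

import "fmt"

type StoreID int
type RangeID int

// RangeDescriptor is a simplified stand-in: the stores that held replicas of
// the subsuming range at the time the merge committed.
type RangeDescriptor struct {
	RangeID  RangeID
	Replicas []StoreID
}

// graveyard models the proposed mapping from a subsumed range ID to the
// subsuming range's descriptor as of the moment of subsumption.
type graveyard map[RangeID]RangeDescriptor

// canGCSubsumedReplica returns true only in case 1 above: the local store was
// not a member of the subsuming range when the merge committed, so no merge
// trigger on this store can ever need the subsumed (RHS) replica.
func canGCSubsumedReplica(g graveyard, subsumedID RangeID, localStore StoreID) bool {
	desc, ok := g[subsumedID]
	if !ok {
		return false // no graveyard entry; we can't prove safety, so hold off on GC
	}
	for _, s := range desc.Replicas {
		if s == localStore {
			// Case 2: wait for a replica tombstone for the LHS before GC'ing.
			return false
		}
	}
	return true
}

func main() {
	g := graveyard{30: {RangeID: 29, Replicas: []StoreID{1, 2, 3}}}
	fmt.Println(canGCSubsumedReplica(g, 30, 3)) // false: store 3 held the subsuming range
	fmt.Println(canGCSubsumedReplica(g, 30, 4)) // true: store 4 was never a member
}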

@benesch
Contributor

benesch commented Oct 23, 2018

Actually, I think I see a trivial solution to this problem that's along the lines of the first approach but requires no new RPCs. I'm going to pursue that in the hopes that it can be backported.

@benesch
Contributor

benesch commented Oct 24, 2018

Ok, I'm not going to have time to try the solution tonight. But here's a sketch of what I think could work.

When an AdminMerge request arrives on the LHS, before launching a merge transaction, we look at the Raft status of the replica. If it doesn't exist, then we're not the Raft leader, and we bail out. No big deal. We can retry the merge later. If the Raft status does exist, then we verify that every replica of the LHS has a progress.Match > 0, which (I think) indicates that the replica has been initialized. We need to abort the merge if any of the replicas have progress.Match == 0, as that would indicate that the replica of the LHS might be uninitialized and, if the merge were to go through, the RHS could be GC'd before the LHS gets initialized.
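
For concreteness, a minimal, self-contained Go sketch of that pre-merge check follows. This is not the actual CockroachDB change; the RaftStatus/RaftProgress types and lhsSafeForMerge are illustrative stand-ins for the leader-side Raft progress state, assuming only that Match > 0 implies the follower has appended at least one log entry.

package main

import "fmt"

// RaftProgress is a simplified stand-in for the per-follower progress a Raft
// leader tracks; Match is the highest log index known to be replicated to
// that follower.
type RaftProgress struct {
	Match uint64
}

// RaftStatus is a simplified stand-in for the leader's view of the group.
// Only the leader has follower progress; a follower would see nil here.
type RaftStatus struct {
	Progress map[uint64]RaftProgress // keyed by replica ID
}

// lhsSafeForMerge sketches the proposed check: bail out (and let the merge be
// retried later) if we are not the leader, and abort if any replica has
// Match == 0, since that replica of the LHS may still be uninitialized and
// the RHS could be GC'd on its store before the LHS catches up.
func lhsSafeForMerge(status *RaftStatus) bool {
	if status == nil {
		return false // not the Raft leader; no big deal, retry the merge later
	}
	for replicaID, pr := range status.Progress {
		if pr.Match == 0 {
			fmt.Printf("replica %d may be uninitialized; aborting merge\n", replicaID)
			return false
		}
	}
	return true
}

func main() {
	allCaughtUp := &RaftStatus{Progress: map[uint64]RaftProgress{1: {Match: 19}, 2: {Match: 19}, 3: {Match: 19}}}
	oneLagging := &RaftStatus{Progress: map[uint64]RaftProgress{1: {Match: 19}, 2: {Match: 0}, 3: {Match: 19}}}
	fmt.Println(lhsSafeForMerge(allCaughtUp)) // true: safe to launch the merge transaction
	fmt.Println(lhsSafeForMerge(oneLagging))  // false: abort the AdminMerge
	fmt.Println(lhsSafeForMerge(nil))         // false: not the leader
}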

@tbg
Member

tbg commented Oct 24, 2018

Ok, I'm not going to have time to try the solution tonight. But here's a sketch of what I think could work.

Sounds good to me. Just pointing out that this is very similar in spirit to calling WaitForApplication on the LHS with a nonzero LeaseAppliedIndex, though we're probably better off not making this call blocking (plus the lease applied index isn't really what we care about here).

benesch added a commit to benesch/cockroach that referenced this issue Oct 25, 2018
The strategy used by the replica GC queue to determine whether a
subsumed range can be GC'd is flawed. If a replica of the LHS was
uninitialized at the time the merge committed, there is a small window
where the replica GC queue can think that it's safe to clean up an RHS
replica when in fact the uninitialized LHS replica could still
initialize and apply a merge trigger that required that RHS to be
present.

Make the replica GC queue's strategy valid by requiring that all
replicas of the LHS are initialized before beginning a merge
transaction. This closes the window during which a replica of the RHS
could be incorrectly GC'd with a patch that is small enough to be
backported to v2.1.1.

Fix cockroachdb#31719.

Release note: None
benesch added a commit that referenced this issue Oct 25, 2018
benesch added a commit to benesch/cockroach that referenced this issue Oct 26, 2018
benesch added a commit to benesch/cockroach that referenced this issue Oct 28, 2018
craig bot pushed a commit that referenced this issue Oct 28, 2018
31842: storage: fix disappearing RHS merge bug r=tschottdorf,bdarnell a=benesch

Co-authored-by: Nikhil Benesch <[email protected]>
@craig craig bot closed this as completed in #31842 Oct 28, 2018
benesch added a commit to benesch/cockroach that referenced this issue Nov 28, 2018
benesch added a commit to benesch/cockroach that referenced this issue Nov 28, 2018