roachtest: import/tpch/nodes=32 failed #31184

Closed
cockroach-teamcity opened this issue Oct 10, 2018 · 18 comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/ac2f39fcc6be7366bc786d231890ee91e84f1c3c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=955173&tab=buildLog

The test failed on master:
	test.go:570,cluster.go:1327,import.go:114: unexpected node event: 10: dead

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Oct 10, 2018
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Oct 10, 2018
@tbg
Member

tbg commented Oct 10, 2018

I181010 10:19:04.223150 79371 storage/replica_raftstorage.go:803  [n22,s17,r5809/3:{-}] applying Raft snapshot at index 15 (id=a06ce429, encoded size=53266610, 204 rocksdb batches, 5 log entries)
W181010 10:19:05.381440 79371 storage/engine/rocksdb.go:1911  batch [1515946/53264451/5] commit took 569.771763ms (>500ms):
goroutine 79371 [running]:
runtime/debug.Stack(0x305ec060, 0xed34fc918, 0x0)
	/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
github.com/cockroachdb/cockroach/pkg/storage/engine.(*rocksDBBatch).commitInternal(0xc4251bfd40, 0x0, 0x1, 0xc47409f3a0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:1912 +0x1b3
github.com/cockroachdb/cockroach/pkg/storage/engine.(*rocksDBBatch).Commit(0xc4251bfd40, 0xc42038c401, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/engine/rocksdb.go:1828 +0x7ed
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).applySnapshot(0xc42781bb00, 0x30aaf40, 0xc4274967b0, 0x294a459d29e46ca0, 0x9001fde31f28d7bb, 0xc42aed0000, 0xcc, 0x100, 0xc429687c80, 0x5, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_raftstorage.go:925 +0xc98
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc42781bb00, 0x294a459d29e46ca0, 0x9001fde31f28d7bb, 0xc42aed0000, 0xcc, 0x100, 0xc429687c80, 0x5, 0x5, 0xc425273800, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:4250 +0x2040
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftSnapshotRequest.func1(0x30aaf40, 0xc420840b70, 0xc42781bb00, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3745 +0x7fe
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc420dbe580, 0x30aaf40, 0xc420840b70, 0xc425273048, 0xc47679d838, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3521 +0x135
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRaftSnapshotRequest(0xc420dbe580, 0x30aaf40, 0xc42670d380, 0xc425273000, 0x294a459d29e46ca0, 0x9001fde31f28d7bb, 0xc42aed0000, 0xcc, 0x100, 0xc429687c80, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3570 +0xd9
github.com/cockroachdb/cockroach/pkg/storage.(*Store).receiveSnapshot(0xc420dbe580, 0x30aaf40, 0xc42670d380, 0xc425273000, 0x7f1f16923500, 0xc4202c5140, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store_snapshot.go:514 +0x330
github.com/cockroachdb/cockroach/pkg/storage.(*Store).HandleSnapshot(0xc420dbe580, 0xc425273000, 0x7f1f169234d0, 0xc4202c5140, 0xc4202c5140, 0xc42885c300)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3399 +0x206
github.com/cockroachdb/cockroach/pkg/storage.(*RaftTransport).RaftSnapshot.func1.1(0x30c61a0, 0xc4202c5140, 0xc4206f02c0, 0x30aaf40, 0xc42670d320, 0x6db3a9, 0xc42955e470)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_transport.go:386 +0x146
github.com/cockroachdb/cockroach/pkg/storage.(*RaftTransport).RaftSnapshot.func1(0x30aaf40, 0xc42670d320)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/raft_transport.go:387 +0x5d
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc420a00000, 0x30aaf40, 0xc42670d320, 0xc425daca40, 0x32, 0x0, 0x0, 0xc42670d350)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:324 +0xe6
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:319 +0x133
I181010 10:19:05.477890 79371 storage/replica_raftstorage.go:809  [n22,s17,r5809/3:/Table/53/9/6{02951/…-12333/…}] applied Raft snapshot in 1255ms [clear=0ms batch=588ms entries=0ms commit=666ms]
I181010 10:19:05.794110 79780 storage/replica_command.go:298  [n22,s17,r5682/3:/Table/53/9/9{32641/…-72311/…}] initiating a split of this range at key /Table/53/9/945765/19195707/5347938/3 [r5886]
E181010 10:19:05.924083 181 util/log/crash_reporting.go:477  [n22,s17,r5809/3:/Table/53/9/6{02951/…-12333/…}] Reported as error ca1717597b2a4baca71b38e8f101dffb
F181010 10:19:05.928144 181 storage/store.go:2373  [n22,s17,r5809/3:/Table/53/9/6{02951/…-12333/…}] raft group deleted
goroutine 181 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc4204ae600, 0xc4204ae6c0, 0x3fb7f00, 0x10)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:997 +0xcf
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x48a5620, 0xc400000004, 0x3fb7f9f, 0x10, 0x945, 0xc4273b5300, 0x47)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:864 +0x8fe
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x30aaf40, 0xc4274967b0, 0x4, 0x2, 0x0, 0x0, 0xc426df7ac0, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:85 +0x2e5
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x30aaf40, 0xc4274967b0, 0x1, 0xc400000004, 0x0, 0x0, 0xc426df7ac0, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatal(0x30aaf40, 0xc4274967b0, 0xc426df7ac0, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:181 +0x6c
github.com/cockroachdb/cockroach/pkg/storage.splitPostApply(0x30aaf40, 0xc4274967b0, 0x0, 0x155c37fb94ba2ecf, 0x0, 0x0, 0x32cb4bf, 0x172197, 0x2b90ccc, 0x172197, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:2373 +0x1e6
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleReplicatedEvalResult(0xc42781bb00, 0x30aaf40, 0xc4274967b0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_proposal.go:649 +0x117f
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleEvalResultRaftMuLocked(0xc42781bb00, 0x30aaf40, 0xc4274967b0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica_proposal.go:822 +0xaa
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).processRaftCommand(0xc42781bb00, 0x30aaf40, 0xc4274967b0, 0xc42604ed50, 0x8, 0x6, 0x10, 0x190000000c, 0x1, 0x8, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:5635 +0x979
github.com/cockroachdb/cockroach/pkg/storage.(*Replica).handleRaftReadyRaftMuLocked(0xc42781bb00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/replica.go:4460 +0x1356
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue.func1(0x30aaf40, 0xc4280f8450, 0xc42781bb00, 0x30aaf40)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3870 +0x109
github.com/cockroachdb/cockroach/pkg/storage.(*Store).withReplicaForRequest(0xc420dbe580, 0x30aaf40, 0xc4280f8450, 0xc425d1c680, 0xc439bc5ed0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3521 +0x135
github.com/cockroachdb/cockroach/pkg/storage.(*Store).processRequestQueue(0xc420dbe580, 0x30aaf40, 0xc4207bacf0, 0x16b1)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/store.go:3858 +0x229
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).worker(0xc42054b800, 0x30aaf40, 0xc4207bacf0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:225 +0x21b
github.com/cockroachdb/cockroach/pkg/storage.(*raftScheduler).Start.func2(0x30aaf40, 0xc4207bacf0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/scheduler.go:165 +0x3e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc42028ea80, 0xc420a00000, 0xc42028e950)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:199 +0xe9
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:192 +0xad

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/2215217e8ee38d28a14eb9fd2fe9af8b0b702e7d

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=958181&tab=buildLog

The test failed on master:
	test.go:950: test timed out (11h48m24.425139473s)
	test.go:575,cluster.go:1330,import.go:114: context canceled

@tbg tbg added the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Oct 12, 2018
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3e1c440300e4b41858e70ec9e44663bf35ec2134

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=963115&tab=buildLog

The test failed on master:
	test.go:950: test timed out (11h48m30.590650316s)
	test.go:575,cluster.go:1433,import.go:114: context canceled

@tbg
Member

tbg commented Oct 15, 2018

^- the health checker keeps logging

health: n17/s16 1.00 metrics requests.slow.raft

which means that Raft proposals got stuck.

We're going to have to catch this in the act.

@tbg
Member

tbg commented Oct 15, 2018

This failure is #21146.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/85ccbd67d3cb5b7d18ceade231fd01d63579bacd

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=965471&tab=buildLog

The test failed on master:
	test.go:575,cluster.go:1433,import.go:114: dial tcp 35.232.155.39:26257: connect: connection timed out

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a0b7cd4ebddf5ebc8f8c2119b119e57688f072f9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=968704&tab=buildLog

The test failed on master:
	test.go:584,test.go:596: /home/agent/work/.go/bin/roachprod create teamcity-968704-import-tpch-nodes-32 -n 32 --gce-machine-type=n1-standard-4 --gce-zones=us-central1-b,us-west1-b,europe-west2-b returned:
		stderr:
		
		stdout:
		2018/10/16 05:27:53 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		2018/10/16 05:27:53 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		Error:  failed to run: aws ec2 describe-instances --region us-east-2 --output json: exit status 255
		: exit status 1

craig bot pushed a commit that referenced this issue Oct 16, 2018
31343: roachtest: apply timeouts to import/restore tests r=petermattis a=tschottdorf

Make stuck tests less expensive, in particular since 32 node clusters are involved.

Touches #31184.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3e69f3acba8f66b4b8019f52890aaa3f63a848ee

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=969842&tab=buildLog

The test failed on release-2.1:
	test.go:584,test.go:596: /home/agent/work/.go/bin/roachprod create teamcity-969842-import-tpch-nodes-32 -n 32 --gce-machine-type=n1-standard-4 --gce-zones=us-central1-b,us-west1-b,europe-west2-b returned:
		stderr:
		
		stdout:
		2018/10/16 15:21:40 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		2018/10/16 15:21:40 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		Error:  failed to run: aws ec2 describe-instances --region us-east-2 --output json: exit status 255
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e6348bb4abbfd117424c382ce5ab42e8abbe88f0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=970034&tab=buildLog

The test failed on release-2.1:
	test.go:584,test.go:596: /home/agent/work/.go/bin/roachprod create teamcity-970034-import-tpch-nodes-32 -n 32 --gce-machine-type=n1-standard-4 --gce-zones=us-central1-b,us-west1-b,europe-west2-b returned:
		stderr:
		
		stdout:
		2018/10/16 15:43:34 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		2018/10/16 15:43:34 Unable to locate credentials. You can configure credentials by running "aws configure".
		
		Error:  failed to run: aws ec2 describe-instances --region us-west-2 --output json: exit status 255
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/2cbfb514fed209e9e4192bd07af6baa8dd073bab

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=970864&tab=buildLog

The test failed on master:
	test.go:606,cluster.go:1441,import.go:122: pq: split at key /Table/53/1/499397124/3 failed: aborted during DistSender.Send: context deadline exceeded

@tbg tbg assigned andreimatei and unassigned tbg Oct 17, 2018
@tbg
Member

tbg commented Oct 17, 2018

@andreimatei could you take a look? I think there's a good chance this particular failure is related to problems similar to #31409, but it'd be good to see whether there's a new thread to pull on here.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/310a04983cda8ab8d67cd401814341b9b7f8ce79

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=972821&tab=buildLog

The test failed on master:
	test.go:1002: test timed out (3h0m0s)
	test.go:606,cluster.go:1441,import.go:122: context canceled

@andreimatei
Contributor

Passing to @benesch in the hope that his GC fixes solve this.

@andreimatei andreimatei assigned benesch and unassigned andreimatei Oct 18, 2018
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e6348bb4abbfd117424c382ce5ab42e8abbe88f0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=974325&tab=buildLog

The test failed on release-2.1:
	test.go:1002: test timed out (3h0m0s)
	test.go:606,cluster.go:1453,import.go:122: context canceled

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/04cba2800919bdcf6a8467e8da97ae44b77a9626

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=974812&tab=buildLog

The test failed on master:
	test.go:1002: test timed out (3h0m0s)
	test.go:606,cluster.go:1453,import.go:122: context canceled

@tbg
Member

tbg commented Oct 19, 2018

W181019 10:01:02.709742 58136 storage/replica.go:3345 [n8,s8,r1447/5:/Table/53/1/435{38458…-65085…}] have been waiting 1m0s for proposing command AddSSTable [/Table/53/1/435384580/5/0,/Table/53/1/435650851/3)

Probably same as #31618.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3035b84a682e61fb1cd34db4027dd41f7f2f226a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=977057&tab=buildLog

The test failed on master:
	test.go:1037: test timed out (3h0m0s)
	test.go:639,cluster.go:1453,import.go:122: context canceled

@benesch benesch assigned tbg and unassigned benesch Oct 21, 2018
@tbg tbg removed the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Oct 22, 2018
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/2998190f18fab952357133aaca9fdda8bc52d5ac

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=import/tpch/nodes=32 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=978508&tab=buildLog

The test failed on master:
	test.go:1037: test timed out (3h0m0s)
	test.go:639,cluster.go:1453,import.go:122: context canceled

tbg added a commit to tbg/cockroach that referenced this issue Oct 22, 2018
The tracking of the uncommitted portion of the log had a bug where
it wasn't releasing everything as it should have. As a result, over
time, all proposals would be dropped. We hit this much earlier
in our import tests, which issue large proposals. As an intentional
implementation detail, a proposal that itself exceeds the max
uncommitted log size is allowed only if the uncommitted log is empty.
Due to the leak, the uncommitted log never appeared empty, so we
never hit this case and AddSSTable commands were often dropped
indefinitely.

Fixes cockroachdb#31184.
Fixes cockroachdb#28693.
Fixes cockroachdb#31642.

Optimistically:
Fixes cockroachdb#31675.
Fixes cockroachdb#31654.
Fixes cockroachdb#31446.

Release note: None
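
For readers following along, the accounting problem described in the commit message above can be sketched in a few lines of Go. This is a simplified model under stated assumptions, not the etcd/raft or CockroachDB code: the type `uncommittedTracker`, its methods, the helper `largeProposalAdmitted`, and all sizes are hypothetical names and numbers chosen for illustration. The sketch only shows why a small per-commit leak is enough to keep an oversized proposal (such as an AddSSTable command) from ever being admitted.

```go
package main

import "fmt"

// uncommittedTracker is a hypothetical, simplified model of the accounting
// described in the commit message. It is not the etcd/raft implementation.
type uncommittedTracker struct {
	max  uint64 // limit on the total size of uncommitted entries
	size uint64 // tracked size of currently uncommitted entries
}

// tryPropose admits a proposal of n bytes unless it would push the tracked
// size past max. A proposal is always admitted when the tracked size is
// zero, even if n alone exceeds max.
func (t *uncommittedTracker) tryPropose(n uint64) bool {
	if t.size > 0 && t.size+n > t.max {
		return false // proposal dropped
	}
	t.size += n
	return true
}

// release gives back n bytes once the corresponding entries have committed.
func (t *uncommittedTracker) release(n uint64) {
	if n > t.size {
		n = t.size
	}
	t.size -= n
}

// largeProposalAdmitted commits five 10-byte proposals, failing to release
// leak bytes after each one (leak must not exceed 10), and then reports
// whether a 500-byte proposal is admitted against a 100-byte limit.
func largeProposalAdmitted(leak uint64) bool {
	t := &uncommittedTracker{max: 100}
	for i := 0; i < 5; i++ {
		if t.tryPropose(10) {
			t.release(10 - leak)
		}
	}
	return t.tryPropose(500)
}

func main() {
	fmt.Println("correct accounting:", largeProposalAdmitted(0))
	fmt.Println("leaky accounting:  ", largeProposalAdmitted(1))
}
```

With correct accounting the tracked size returns to zero after every commit, so the oversized proposal takes the "log is empty" path and is admitted. With a one-byte leak per commit the tracked size never reaches zero again, so the same proposal is rejected on every attempt, which matches the stuck AddSSTable proposals seen in this issue.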
craig bot pushed a commit that referenced this issue Oct 22, 2018
31554: exec: initial commit of execgen tool r=solongordon a=solongordon

Execgen will be our tool for generating templated code necessary for
columnarized execution. So far it only generates the
EncDatumRowsToColVec function, which is used by the columnarizer to
convert a RowSource into a columnarized Operator.

Release note: None

31610: sql: fix pg_catalog.pg_constraint's confkey column r=BramGruneir a=BramGruneir

Prior to this patch, all columns in the index were included instead of only the
ones being used in the foreign key reference.

Fixes #31545.

Release note (bug fix): Fix pg_catalog.pg_constraint's confkey column from
including columns that were not involved in the foreign key reference.

31689: storage: pick up fix for Raft uncommitted entry size tracking r=benesch a=tschottdorf

Waiting for the upstream PR

etcd-io/etcd#10199

to merge, but this is going to be what the result will look like.

----

The tracking of the uncommitted portion of the log had a bug where
it wasn't releasing everything as it should have. As a result, over
time, all proposals would be dropped. We hit this much earlier
in our import tests, which issue large proposals. As an intentional
implementation detail, a proposal that itself exceeds the max
uncommitted log size is allowed only if the uncommitted log is empty.
Due to the leak, the uncommitted log never appeared empty, so we
never hit this case and AddSSTable commands were often dropped
indefinitely.

Fixes #31184.
Fixes #28693.
Fixes #31642.

Optimistically:
Fixes #31675.
Fixes #31654.
Fixes #31446.

Release note: None

Co-authored-by: Solon Gordon <[email protected]>
Co-authored-by: Bram Gruneir <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>
@craig craig bot closed this as completed in #31689 Oct 22, 2018