roachtest: restore2TB/nodes=32 failed #84132

Closed
cockroach-teamcity opened this issue Jul 9, 2022 · 2 comments
Labels
- branch-release-22.1: Used to mark GA and release blockers, technical advisories, and bugs for 22.1
- C-test-failure: Broken test (automatically or manually discovered).
- O-roachtest
- O-robot: Originated from a bot.
- release-blocker: Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
- T-disaster-recovery
Comments

@cockroach-teamcity

cockroach-teamcity commented Jul 9, 2022

roachtest.restore2TB/nodes=32 failed with artifacts on release-22.1 @ f9e7181a96fa72e48e3ac0df730843fed4a09ec4:

		Wraps: (2) output in run_085010.987640276_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: Get "https://storage.googleapis.com/cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank/8-1016.sst": stream error: stream ID 7; INTERNAL_ERROR; received from peer
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:510,test_runner.go:883: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:510
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:883
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

/cc @cockroachdb/bulk-io

This test on roachdash | Improve this report!

Jira issue: CRDB-17481

cockroach-teamcity added the branch-release-22.1, C-test-failure, O-roachtest, O-robot, and release-blocker labels Jul 9, 2022
cockroach-teamcity added this to the 22.1 milestone Jul 9, 2022
@cockroach-teamcity

roachtest.restore2TB/nodes=32 failed with artifacts on release-22.1 @ 64d9c24214b7d6912ecc1a8c8ed93073134c860d:

		Wraps: (2) output in run_085058.429263503_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: Get "https://storage.googleapis.com/cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank/2-690.sst": stream error: stream ID 7; INTERNAL_ERROR; received from peer
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:510,test_runner.go:883: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:510
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:883
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

@msbutler

Determined that this is a test infra flake. Closing this issue.

We see the same error message on the restore2TB* roachtests across master, 22.1, and 21.2.
The error relates to a failure to fetch an SST from GCS. In this failure, for example, we failed to get this specific SST:
"https://storage.googleapis.com/cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank/8-1016.sst"
The following error message in the logs was hit deep in Go's networking package:

I220709 08:53:38.156431 13689 jobs/registry.go:1134 ⋮ [n1] 8456 +  | type name: net/url/*url.Error
I220709 08:53:38.156431 13689 jobs/registry.go:1134 ⋮ [n1] 8456 +Wraps: (6) ‹stream error: stream ID 7; INTERNAL_ERROR; received from peer›
I220709 08:53:38.156431 13689 jobs/registry.go:1134 ⋮ [n1] 8456 +  |
I220709 08:53:38.156431 13689 jobs/registry.go:1134 ⋮ [n1] 8456 +  | (opaque error leaf)
I220709 08:53:38.156431 13689 jobs/registry.go:1134 ⋮ [n1] 8456 +  | type name: golang.org/x/net/http2/http2.StreamError

I could not repro this, which implies these roachtests are not failing consistently, but they are clearly failing quite often and across many releases. After scanning the commit histories of each branch, notably 21.2, which has the fewest commits, I do not see any commit (e.g. a GCS vendor upgrade) that could relate to these new networking issues. That makes me think the root cause is unrelated to restore or the cockroach binary.

It's also worth noting that we have hit this error before and that it was made retryable back in 2018: googleapis/google-cloud-go@d19004d
