-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: improve and de-flake version-upgrade #47268
Conversation
The code to get the cluster version was already written, so reuse it. Release note: None
The recent changes to this test seem to have uncovered a latent [bug], as a result of which the test was skipped. This commit lets the test start from v19.2 fixtures (instead of v19.1 before) which experimentally were verified to not exhibit the bug, so the test is unskipped. While I was there, I automated the base version discovery of this test. On a branch at version X, the test will now use the fixtures for the last major release preceding X. We'll occasionally have to bake new fixtures and update the mapping in PredecessorVersion() but this is better than having the test rot on older branches whenever we update it for a new dev cycle. The root cause of the bug will be fixed separately; this is tracked in issue cockroachdb#44732. [bug]: cockroachdb#47235 (comment) Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @spaskob and @tbg)
pkg/cmd/roachtest/versionupgrade.go, line 107 at r1 (raw file):
waitForUpgradeStep(), // NB: before the next step, cluster and binary version equals baseVersion,
nit: NB: at this point, cluster and binary version equal predecessorVersion,
pkg/cmd/roachtest/versionupgrade.go, line 117 at r1 (raw file):
preventAutoUpgradeStep(), // Roll nodes forward. binaryUpgradeStep("HEAD"),
nit: headVersion
also elsewhere in this file
What's the reason this is not followed by waitForUpgradeStep
?
pkg/cmd/roachtest/versionupgrade.go, line 122 at r1 (raw file):
// exercises this in more detail, so here we just rely on things working // as they ought to. binaryUpgradeStep("19.2.1"),
shouldn't this be predecessorVersion?
pkg/cmd/roachtest/versionupgrade.go, line 129 at r1 (raw file):
waitForUpgradeStep(), cp.createCheckpointsAndFail, // noop by default
please add a comment on why the checkpoint creation step needs to be last in the sequence
pkg/cmd/roachtest/versionupgrade.go, line 394 at r1 (raw file):
// v1.0 and maintain it as a fixture for this test. // // The checkpoints will be created in the log directories. The test will fail
I tried to use this instructions yesterday and could not figure out if we have to run the roachtest locally to be able to run the shell script below.
pkg/cmd/roachtest/versionupgrade.go, line 433 at r1 (raw file):
name := checkpointName(u.binaryVersion(ctx, t, 1).String()) c.Stop(ctx, nodes) c.Run(ctx, c.All(), "./cockroach-HEAD", "debug", "rocksdb", "--db={store-dir}",
I was confused initially why the name ./cockroach-HEAD
then I found out this is how uploadVersion
names the binary, it may worthwhile to have this as a constant to prevent some hard to find bugs in the future if headVerdsion
changes from HEAD
for some reason.
Now that version-upgrade automatically starts up on the predecessor version, it is more important to be able to create fixtures easily. Streamline and document that process so that "anyone can do it". Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR, PTAL
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @spaskob)
pkg/cmd/roachtest/versionupgrade.go, line 117 at r1 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
nit: headVersion
also elsewhere in this fileWhat's the reason this is not followed by
waitForUpgradeStep
?
It's not upgrading - we cycled the binaries but the cluster version is not being bumped. waitForUpgradeStep
waits for the cluster version bump.
pkg/cmd/roachtest/versionupgrade.go, line 122 at r1 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
shouldn't this be predecessorVersion?
Absolutely, thanks for catching this.
pkg/cmd/roachtest/versionupgrade.go, line 129 at r1 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
please add a comment on why the checkpoint creation step needs to be last in the sequence
Done.
pkg/cmd/roachtest/versionupgrade.go, line 394 at r1 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
I tried to use this instructions yesterday and could not figure out if we have to run the roachtest locally to be able to run the shell script below.
You don't, the artifacts get downloaded into the same place either way. But I've never tried it remotely.
pkg/cmd/roachtest/versionupgrade.go, line 433 at r1 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
I was confused initially why the name
./cockroach-HEAD
then I found out this is howuploadVersion
names the binary, it may worthwhile to have this as a constant to prevent some hard to find bugs in the future ifheadVerdsion
changes fromHEAD
for some reason.
I realized we could just rip out headVersion, PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @spaskob)
TFTR! bors r=spaskob |
Build failed |
bors r=spaskob Flake in distsql_automatic_stats, so not this PR. |
Build succeeded |
47275: roachtest: generalize version-upgrade to per-node op r=spaskob a=tbg First three commits from #47268, which is borsing right now. ---- In anticipation of tests that want to use mixed version testing on clusters that contain workload-only nodes, set up the harness so that this is natively supported. Release note: None Co-authored-by: Tobias Schottdorf <[email protected]>
roachtest: automate and de-flake version-upgrade
The recent changes to this test seem to have uncovered a latent bug,
as a result of which the test was skipped. This commit lets the test
start from v19.2 fixtures (instead of v19.1 before) which experimentally
were verified to not exhibit the bug, so the test is unskipped.
While I was there, I automated the base version discovery of this test.
On a branch at version X, the test will now use the fixtures for the
last major release preceding X. We'll occasionally have to bake new
fixtures and update the mapping in PredecessorVersion() but this is
better than having the test rot on older branches whenever we update
it for a new dev cycle.
The root cause of the bug will be fixed separately; this is tracked in
issue #44732.
Release note: None