Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: improve and de-flake version-upgrade #47268

Merged
merged 3 commits into from
Apr 9, 2020

Conversation

tbg
Copy link
Member

@tbg tbg commented Apr 9, 2020

roachtest: automate and de-flake version-upgrade

The recent changes to this test seem to have uncovered a latent bug,
as a result of which the test was skipped. This commit lets the test
start from v19.2 fixtures (instead of v19.1 before) which experimentally
were verified to not exhibit the bug, so the test is unskipped.

While I was there, I automated the base version discovery of this test.
On a branch at version X, the test will now use the fixtures for the
last major release preceding X. We'll occasionally have to bake new
fixtures and update the mapping in PredecessorVersion() but this is
better than having the test rot on older branches whenever we update
it for a new dev cycle.

The root cause of the bug will be fixed separately; this is tracked in
issue #44732.

Release note: None

tbg added 2 commits April 9, 2020 13:28
The code to get the cluster version was already written, so reuse it.

Release note: None
The recent changes to this test seem to have uncovered a latent [bug],
as a result of which the test was skipped. This commit lets the test
start from v19.2 fixtures (instead of v19.1 before) which experimentally
were verified to not exhibit the bug, so the test is unskipped.

While I was there, I automated the base version discovery of this test.
On a branch at version X, the test will now use the fixtures for the
last major release preceding X. We'll occasionally have to bake new
fixtures and update the mapping in PredecessorVersion() but this is
better than having the test rot on older branches whenever we update
it for a new dev cycle.

The root cause of the bug will be fixed separately; this is tracked in
issue cockroachdb#44732.

[bug]: cockroachdb#47235 (comment)

Release note: None
@tbg tbg requested a review from spaskob April 9, 2020 12:19
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @spaskob and @tbg)


pkg/cmd/roachtest/versionupgrade.go, line 107 at r1 (raw file):

		waitForUpgradeStep(),

		// NB: before the next step, cluster and binary version equals baseVersion,

nit: NB: at this point, cluster and binary version equal predecessorVersion,


pkg/cmd/roachtest/versionupgrade.go, line 117 at r1 (raw file):

		preventAutoUpgradeStep(),
		// Roll nodes forward.
		binaryUpgradeStep("HEAD"),

nit: headVersion
also elsewhere in this file

What's the reason this is not followed by waitForUpgradeStep?


pkg/cmd/roachtest/versionupgrade.go, line 122 at r1 (raw file):

		// exercises this in more detail, so here we just rely on things working
		// as they ought to.
		binaryUpgradeStep("19.2.1"),

shouldn't this be predecessorVersion?


pkg/cmd/roachtest/versionupgrade.go, line 129 at r1 (raw file):

		waitForUpgradeStep(),

		cp.createCheckpointsAndFail, // noop by default

please add a comment on why the checkpoint creation step needs to be last in the sequence


pkg/cmd/roachtest/versionupgrade.go, line 394 at r1 (raw file):

// v1.0 and maintain it as a fixture for this test.
//
// The checkpoints will be created in the log directories. The test will fail

I tried to use this instructions yesterday and could not figure out if we have to run the roachtest locally to be able to run the shell script below.


pkg/cmd/roachtest/versionupgrade.go, line 433 at r1 (raw file):

	name := checkpointName(u.binaryVersion(ctx, t, 1).String())
	c.Stop(ctx, nodes)
	c.Run(ctx, c.All(), "./cockroach-HEAD", "debug", "rocksdb", "--db={store-dir}",

I was confused initially why the name ./cockroach-HEAD then I found out this is how uploadVersion names the binary, it may worthwhile to have this as a constant to prevent some hard to find bugs in the future if headVerdsion changes from HEAD for some reason.

Now that version-upgrade automatically starts up on the predecessor
version, it is more important to be able to create fixtures easily.
Streamline and document that process so that "anyone can do it".

Release note: None
@tbg tbg force-pushed the version-upgrade-fix branch from da7f7ee to 6e0b72f Compare April 9, 2020 13:28
@tbg tbg requested a review from spaskob April 9, 2020 13:28
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR, PTAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @spaskob)


pkg/cmd/roachtest/versionupgrade.go, line 117 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

nit: headVersion
also elsewhere in this file

What's the reason this is not followed by waitForUpgradeStep?

It's not upgrading - we cycled the binaries but the cluster version is not being bumped. waitForUpgradeStep waits for the cluster version bump.


pkg/cmd/roachtest/versionupgrade.go, line 122 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

shouldn't this be predecessorVersion?

Absolutely, thanks for catching this.


pkg/cmd/roachtest/versionupgrade.go, line 129 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

please add a comment on why the checkpoint creation step needs to be last in the sequence

Done.


pkg/cmd/roachtest/versionupgrade.go, line 394 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

I tried to use this instructions yesterday and could not figure out if we have to run the roachtest locally to be able to run the shell script below.

You don't, the artifacts get downloaded into the same place either way. But I've never tried it remotely.


pkg/cmd/roachtest/versionupgrade.go, line 433 at r1 (raw file):

Previously, spaskob (Spas Bojanov) wrote…

I was confused initially why the name ./cockroach-HEAD then I found out this is how uploadVersion names the binary, it may worthwhile to have this as a constant to prevent some hard to find bugs in the future if headVerdsion changes from HEAD for some reason.

I realized we could just rip out headVersion, PTAL

Copy link
Contributor

@spaskob spaskob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @spaskob)

@tbg
Copy link
Member Author

tbg commented Apr 9, 2020

TFTR!

bors r=spaskob

@craig
Copy link
Contributor

craig bot commented Apr 9, 2020

Build failed

@tbg
Copy link
Member Author

tbg commented Apr 9, 2020

bors r=spaskob

Flake in distsql_automatic_stats, so not this PR.

@craig
Copy link
Contributor

craig bot commented Apr 9, 2020

Build succeeded

@craig craig bot merged commit 3967538 into cockroachdb:master Apr 9, 2020
craig bot pushed a commit that referenced this pull request Apr 10, 2020
47275: roachtest: generalize version-upgrade to per-node op r=spaskob a=tbg

First three commits from #47268, which is borsing right now.

----

In anticipation of tests that want to use mixed version testing on
clusters that contain workload-only nodes, set up the harness so that
this is natively supported.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants