forked from cockroachdb/cockroach
server: rework cluster version initialization
A central invariant around cluster versions is that when an update arrives, we need to persist it before exposing it through the version setting. This was not true during server start time, as described in cockroachdb#47235 (comment).

In short, we were registering the callback to persist to the engines *after* Gossip had already connected. This opened a window during which a cluster version bump simply would not be persisted. In the acceptance/version-upgrade test, this manifested as nodes refusing to start because their binary version had progressed too far beyond the persisted version. At the time of writing and before this commit, this happened in perhaps 1-10% of local runs on a Linux machine (rarely on OSX).

The buggy code was originally written when the startup sequence was a lot more intricate and we were, as I recall, under pressure to deliver. Now, with recent refactors around the `initServer`, we're in a really good place to solve this while also simplifying the whole story.

In this commit, we:

- persist the cluster version on *all* engines, not just the initialized ones. This completely decouples the timing of when the engines get initialized from when we can set up the persistence callback, and makes everything *much simpler*.
- stop opportunistically backfilling the cluster version. It is now done once at the beginning, in the right place, without FUD later on.
- remove the cluster version persistence from engine initialization. Anyone initializing an engine must put a cluster version on it first. In a running server, this happens at initServer creation time, way before there are any moving parts in the system. In tests, extra writes were added as needed.
- set up the callback with Gossip before starting Gossip, and make sure (via an assertion) that this property does not rot. By setting up the callback before Gossip starts, we make sure there is no period during which Gossip receives an update but doesn't have the callback yet.
- as a result of all of the above, take all knowledge of cluster version initialization away from `*Node` and `*Stores`.

As a last note, we are planning to stop using Gossip for this version business in 20.1, and this change will make that easier too.

Release note (bug fix): Avoid a condition during rapid version upgrades where a node would refuse to start, claiming "[a store is] too old for running version". Before this bug fix, the workaround was to decommission the node, delete the store directory, and re-add it to the cluster as a new node.
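The startup ordering described above can be illustrated with a minimal Go sketch. It rests on a few assumptions: `Version`, `Engine`, `VersionSetting`, `PersistToAllEngines`, and `Gossip` are hypothetical simplifications, not CockroachDB's actual types or APIs, and the sketch runs on a single goroutine without the locking a real server would need. It demonstrates three properties from the message. First, an update is persisted before the setting exposes it. Second, the persistence callback writes to every engine, whether or not the engine is initialized. Third, starting Gossip asserts that the callback is already registered.

```go
package main

import (
	"errors"
	"fmt"
)

// Version is a simplified, hypothetical stand-in for a cluster version.
type Version struct{ Major, Minor int }

// Less reports whether v orders before o.
func (v Version) Less(o Version) bool {
	return v.Major < o.Major || (v.Major == o.Major && v.Minor < o.Minor)
}

// Engine is a hypothetical stand-in for a store's storage engine.
type Engine struct {
	Initialized bool    // whether the store has been bootstrapped
	Persisted   Version // last cluster version written to this engine
}

// VersionSetting exposes a version only after the registered callback has
// persisted it. Hypothetical type; this single-goroutine sketch omits the
// locking a real server would need.
type VersionSetting struct {
	active  Version
	persist func(Version) error
}

func (s *VersionSetting) SetPersistCallback(fn func(Version) error) { s.persist = fn }

func (s *VersionSetting) Active() Version { return s.active }

// Update persists v first; the exposed version changes only if that succeeds.
func (s *VersionSetting) Update(v Version) error {
	if s.persist == nil {
		return errors.New("version update arrived before persistence callback was registered")
	}
	if err := s.persist(v); err != nil {
		return err
	}
	s.active = v
	return nil
}

// PersistToAllEngines returns a callback that writes v to every engine,
// initialized or not, and refuses to move any engine's version backwards.
func PersistToAllEngines(engines []*Engine) func(Version) error {
	return func(v Version) error {
		for _, e := range engines {
			if v.Less(e.Persisted) {
				return fmt.Errorf("refusing to regress persisted version %v to %v", e.Persisted, v)
			}
		}
		for _, e := range engines {
			e.Persisted = v
		}
		return nil
	}
}

// Gossip is a hypothetical stand-in that forwards version updates to the setting.
type Gossip struct{ setting *VersionSetting }

// Start asserts that the persistence callback is already registered, so no
// update can arrive through Gossip without being persisted first.
func (g *Gossip) Start() error {
	if g.setting.persist == nil {
		return errors.New("gossip started before version persistence callback was registered")
	}
	return nil
}

// Deliver simulates Gossip receiving a cluster version bump.
func (g *Gossip) Deliver(v Version) error { return g.setting.Update(v) }

func main() {
	engines := []*Engine{
		{Initialized: true, Persisted: Version{19, 2}},
		{Initialized: false, Persisted: Version{19, 2}}, // version written before initialization
	}
	setting := &VersionSetting{active: Version{19, 2}}
	setting.SetPersistCallback(PersistToAllEngines(engines)) // must precede Start
	g := &Gossip{setting: setting}
	if err := g.Start(); err != nil {
		panic(err)
	}
	if err := g.Deliver(Version{20, 1}); err != nil {
		panic(err)
	}
	for i, e := range engines {
		fmt.Printf("engine %d (initialized=%t): persisted %v\n", i, e.Initialized, e.Persisted)
	}
	fmt.Println("active:", setting.Active())
}
```

In `main`, moving `g.Start()` ahead of `SetPersistCallback` causes `Start` to return an error. This mirrors, in simplified form, the kind of assertion the commit adds to keep the ordering from rotting.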
Showing 16 changed files with 318 additions and 296 deletions.