storage: support rolling upgrades of storage format changes #1780
From what I understood, it's not clear that we're going to implement the StorageKey solution. @bdarnell is dusting off some of his WIP so that we can examine the issues again and weigh our options once more.
@tschottdorf a little bit of recent discussion has put the StorageKey proposal back in vogue. The gist of the new thinking is that we're all having a really hard time convincing ourselves that the variously proposed measures actually handle the races we know about, much less the ones we don't know about. Performance is the main concern with implementing the proposal; on the other hand, correctness issues are what we most urgently need to address.
Even if we don't implement StorageKey, we should think before beta about how we're going to handle backwards-incompatible changes to our lowest-level storage systems, since they'll surely happen one way or another (see #1772).
In Gmail, backwards-incompatible changes were performed via the replication mechanism and rolling migrations of users. I think something similar would work here and would be both general and lightweight in the code, though with a bit more work for administrators. The gist is that a node only knows how to read and write a single format. We store that format version inside the RocksDB engine and refuse to start if the on-disk format differs from what the software supports. To change the format we do a rolling upgrade of the servers in the cluster: drain the data off a server, delete the data so the server will start up empty, upgrade the software (or flip a flag), and then bring the server back up. We can add various complexities to this, but I much prefer this scheme to one that tries to simultaneously support multiple versions of our lowest-level formats.
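A minimal sketch (in Go, with a hypothetical key name, version constant, and Engine interface, not CockroachDB's actual storage API) of the startup check this scheme implies: the store is stamped with a single format version, and the node refuses to start on a mismatch:

```go
package storage

import "fmt"

// supportedFormatVersion is the single on-disk format this binary can
// read and write.
const supportedFormatVersion byte = 2

// formatVersionKey is an illustrative engine-local key for the stamp.
var formatVersionKey = []byte("\x00\x00storage-format-version")

type Engine interface {
	Get(key []byte) ([]byte, error)
	Put(key, value []byte) error
}

// checkFormatVersion runs at node startup: an empty store is stamped with
// the current format; a store in any other format refuses to start, which
// forces the drain/wipe/upgrade cycle described above.
func checkFormatVersion(e Engine) error {
	v, err := e.Get(formatVersionKey)
	if err != nil {
		return err
	}
	if len(v) == 0 {
		// Empty store: stamp it with the format this binary writes.
		return e.Put(formatVersionKey, []byte{supportedFormatVersion})
	}
	if v[0] != supportedFormatVersion {
		return fmt.Errorf("on-disk format %d differs from supported format %d; drain, wipe, and restart this store",
			v[0], supportedFormatVersion)
	}
	return nil
}
```

The wipe-and-rejoin step in the rolling upgrade is what keeps this check simple: a node never has to translate between formats, only to detect a mismatch and bail.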
@petermattis The situation is more complicated than it was with Gmail because we will have a mix of versions within a Raft replica group (this is necessarily true for any zero-downtime upgrade). A rolling update system also implies certain work that must be done in the version before the incompatible change: we must be able to administratively drain a node and ensure that new ranges will not be rebalanced onto it; the rebalancer should avoid adding new load to old-version nodes as much as possible; and so on. Bigtable maintained support for its old on-disk formats and lazily converted everything to the new format as part of the regular compaction cycle. This was also how schema changes related to locality groups were made.
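For contrast, a rough sketch of the Bigtable-style alternative mentioned above, assuming values carry a version-byte prefix (the prefix scheme and names are illustrative, not a real format): old-format values are rewritten in the current format whenever compaction touches them, so the store converges on the new format without a drain/wipe cycle:

```go
package storage

import "fmt"

const currentVersion byte = 2

// upgradeValueForCompaction re-encodes an old-format value in the current
// format as compaction rewrites it; over time the whole store converges on
// the new format without a stop-the-world migration.
func upgradeValueForCompaction(raw []byte) ([]byte, error) {
	if len(raw) == 0 {
		return raw, nil
	}
	switch v := raw[0]; v {
	case currentVersion:
		return raw, nil // already current: pass through unchanged
	case 1:
		// Rewrite the v1 payload under the current version byte. A real
		// conversion would also transform the payload as needed.
		return append([]byte{currentVersion}, raw[1:]...), nil
	default:
		return nil, fmt.Errorf("unrecognized value format version %d", v)
	}
}
```

The trade-off is the one named in the comments above: this avoids the administrative drain cycle but requires every binary to carry read support for all old formats indefinitely.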
Yes, we must be able to administratively drain a node. That functionality is necessary for administration independent of handling backwards-incompatible on-disk format changes.
The StorageKey proposal was rejected, so this issue is moot.
Does it matter that the StorageKey proposal was rejected? This issue is about allowing changes to our in-RocksDB data representation; it just happened to piggyback on the StorageKey name.
If we remove the StorageKey part then there's some overlap with #2718, although I guess there are parts here that are reasonably independent (e.g. the drain functionality). I'll reopen and edit the title and first message. |
I'm closing this issue due to advanced age and the advent of proposer-evaluated KV (although that's clearly not a complete fix for the issues raised here). Let's not have open issues for things which aren't currently a problem.
In order to support (hypothetical) storage format changes, we need to be able to drain a node, let it rewrite its database, and undrain it. We need an administrative tool to put a node in the draining state, the rebalancer must respect this state (and probably move data off a draining node faster than it otherwise would), and the admin tool needs to be able to watch for the drain to be completed.
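As a concrete illustration, a minimal sketch of the rebalancer-side check, assuming a hypothetical Draining flag on the node descriptor (the types and helpers below are illustrative, not CockroachDB's actual rebalancer API): draining nodes are excluded as rebalance targets, and the admin tool polls until the node holds no replicas:

```go
package storage

type nodeDescriptor struct {
	NodeID       int
	Draining     bool // set by the admin tool; shared via gossiped node state
	ReplicaCount int
}

// filterRebalanceTargets excludes draining nodes so no new replicas are
// placed on a node that is being emptied for an upgrade.
func filterRebalanceTargets(candidates []nodeDescriptor) []nodeDescriptor {
	var targets []nodeDescriptor
	for _, n := range candidates {
		if !n.Draining {
			targets = append(targets, n)
		}
	}
	return targets
}

// drainComplete is what the admin tool polls for: the drain is done once
// the node holds no replicas and can be shut down and wiped.
func drainComplete(n nodeDescriptor) bool {
	return n.Draining && n.ReplicaCount == 0
}
```

A natural extension, as noted above, is for the rebalancer to actively prefer moving replicas off a draining node rather than merely refusing to add new ones.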
This feature complements the version number to be added in #2718.
Original message follows:
After we implement the current StorageKey proposal meant to address the Raft races, we should figure out a general system that will allow us to change the StorageKey encoding without breaking existing deployments.
The benefits here can't be overstated; together with #629, this will allow us to provide a 1.0-esque stability guarantee without appreciably constraining our ability to make rather invasive modifications.
Tagging for beta.