Add schema version number to MVCC encoded key #1772

Closed
spencerkimball opened this issue Jul 22, 2015 · 8 comments
Comments

@spencerkimball
Member

Problem:

  • After a schema change, extant queries for older schemas may run into unexpected data (e.g. a schema change that converts column 'foo' from type 'string' to type 'int').
  • After a schema change, extant updates for older schemas may overwrite data in bad ways (e.g. schema change to column type, dropping a column, etc.).

Proposal:

Currently we append the timestamp value to encoded keys. We could instead append <schema version><timestamp>. All operations would be augmented to include a schema version, set at the gateway which processed the SQL. Reads would be serviced by any version with a less-than-or-equal schema version and, of course, a less-than-or-equal timestamp. Reads with old schema versions would simply ignore newer schema versions. Writes would fail if they encounter a key containing a schema version newer than the proposed one.
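A minimal sketch of the proposed key layout and visibility rules, in Go. The encoding details here (the suffix layout, the function names, and inverting the bytes so newer versions sort first) are illustrative assumptions for this proposal, not CockroachDB's actual MVCC encoding:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeMVCCKey appends a hypothetical <schemaVersion><timestamp> suffix
// to the raw key. Both fields are big-endian and bit-inverted so that
// newer versions sort before older ones, matching how MVCC timestamps
// are typically ordered.
func encodeMVCCKey(raw []byte, schemaVersion uint32, ts uint64) []byte {
	k := append([]byte(nil), raw...)
	var buf [12]byte
	binary.BigEndian.PutUint32(buf[0:4], ^schemaVersion)
	binary.BigEndian.PutUint64(buf[4:12], ^ts)
	return append(k, buf[:]...)
}

// decodeSuffix recovers the schema version and timestamp from an encoded key.
func decodeSuffix(k []byte) (schemaVersion uint32, ts uint64) {
	n := len(k)
	schemaVersion = ^binary.BigEndian.Uint32(k[n-12 : n-8])
	ts = ^binary.BigEndian.Uint64(k[n-8:])
	return
}

// visibleToRead implements the read rule: a stored version is visible only
// if both its schema version and timestamp are less than or equal to the
// read's; versions written under a newer schema are simply ignored.
func visibleToRead(storedVer, readVer uint32, storedTS, readTS uint64) bool {
	return storedVer <= readVer && storedTS <= readTS
}

// checkWrite implements the write rule: fail if the newest existing version
// of the key carries a schema version newer than the proposed write's.
func checkWrite(existingVer, writeVer uint32) error {
	if existingVer > writeVer {
		return fmt.Errorf("write at schema version %d conflicts with existing version %d",
			writeVer, existingVer)
	}
	return nil
}

func main() {
	k := encodeMVCCKey([]byte("foo"), 2, 100)
	ver, ts := decodeSuffix(k)
	fmt.Println(ver, ts)                       // 2 100
	fmt.Println(visibleToRead(2, 1, 100, 200)) // false: newer schema is ignored
	fmt.Println(checkWrite(3, 2) != nil)       // true: the write must fail
}
```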

This solution allows old-schema queries to proceed unhindered by potentially destructive schema changes and protects us from overwriting tuples inserted under newer schemas with data for older schemas. Additionally, the schema version on the key provides an elegant way to track and revert all changes made under each successive schema revision in turn.

@tbg
Member

tbg commented Jul 23, 2015

Aren't online schema changes going to take care of that? If we have a registry of which node is using which schema (some table somewhere) we should be able to cut everything into a series of schema changes which can be done phase by phase, with all nodes required to move into the next phase before going further (guaranteeing that nothing incompatible with that current schema is executed any more).

We could even seamlessly allow clients to use both the old and the new schema in some situations: for instance, when converting a column type, we could re-write the AST for queries and do the conversion for requests on the old schema, while really using the new one. That would allow clients to roll over asynchronously.
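The phase-by-phase coordination described above could be sketched roughly as follows; the `coordinator` type and its methods are hypothetical stand-ins for whatever registry table and gossip machinery would actually drive the transitions:

```go
package main

import "fmt"

// coordinator tracks which phase of a schema change each node has
// acknowledged; the change only advances once every node has moved,
// guaranteeing nothing incompatible with the current phase still runs.
type coordinator struct {
	phase int            // current phase of the schema change
	acked map[string]int // node → highest phase acknowledged
}

func newCoordinator(nodes []string) *coordinator {
	c := &coordinator{acked: make(map[string]int)}
	for _, n := range nodes {
		c.acked[n] = -1 // nothing acknowledged yet
	}
	return c
}

// ack records that a node has reached the given phase.
func (c *coordinator) ack(node string, phase int) {
	if phase > c.acked[node] {
		c.acked[node] = phase
	}
}

// tryAdvance moves to the next phase only once every node has
// acknowledged the current one.
func (c *coordinator) tryAdvance() bool {
	for _, p := range c.acked {
		if p < c.phase {
			return false
		}
	}
	c.phase++
	return true
}

func main() {
	c := newCoordinator([]string{"n1", "n2"})
	c.ack("n1", 0)
	fmt.Println(c.tryAdvance()) // false: n2 has not acknowledged phase 0
	c.ack("n2", 0)
	fmt.Println(c.tryAdvance()) // true: all nodes moved, phase is now 1
}
```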

@tamird
Contributor

tamird commented Jul 23, 2015

which node is using which schema

What do you mean by node? A gateway? A node containing the data range? They are almost guaranteed to not be the same nodes.

@tbg
Member

tbg commented Jul 23, 2015

You're right, the tricky bit is coordinating those two sides. During each phase, both schemas are valid, but switching to the newer version may require a synchronous rewrite of all the data, which is not something the gateway can be in charge of.
If that rewrite is (hopefully) idempotent, it should be OK for the gateways and nodes to simply enforce the latest schema. So if, for example, an index is added (in which case the first step will probably be something like making sure the index is there and up to date, but not actually using it for anything), and the gateway knows about that already but the range it writes to doesn't, it'll just update the index anyway on a write. Once the range does the migration, it'll have to go through all of its data and update the index, but no biggie.
I'm sure there are a lot more complex cases, if you have something in mind let's discuss those specifically.
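A toy illustration of why idempotence makes the index example above safe: both the write path and the backfill derive the same deterministic index entry from the primary data, so replays and reorderings converge to the same state. The `store` type and its map-backed secondary index are stand-ins, not real storage code:

```go
package main

import "fmt"

// store holds primary data plus a hypothetical secondary index
// mapping each value back to its key.
type store struct {
	primary map[string]string // key → value
	index   map[string]string // value → key
}

// put writes the primary row and eagerly maintains the index entry,
// as a gateway aware of the new schema would. Writing the same entry
// twice is a no-op, so replays are harmless.
func (s *store) put(k, v string) {
	s.primary[k] = v
	s.index[v] = k
}

// backfill rebuilds index entries from the primary data, as a range
// performing the migration would. Because entries are derived
// deterministically, it is safe to run before or after any put.
func (s *store) backfill() {
	for k, v := range s.primary {
		s.index[v] = k
	}
}

func main() {
	s := &store{primary: map[string]string{}, index: map[string]string{}}
	s.put("a", "1")
	s.backfill()
	s.put("a", "1") // replayed write: no change
	fmt.Println(s.index["1"]) // a
}
```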

@tamird
Contributor

tamird commented Jul 23, 2015

@tschottdorf the idea of using such a table is pretty onerous. Remember that there is not just one schema in the system. If we're multitenant we will need to maintain a registry of size len(schemas) * len(nodes) in the kv map which will have the same problem as the schemas in terms of distribution and wanting to avoid consistent reads.

I think this discussion is a bit hand-wavy altogether - let's revisit this when we're ready to discuss how to implement schema changes (before that, we should discuss what types of schema changes we're going to allow). We should be prepared for the possibility of a storage-format-breaking change when we do, though.

@spencerkimball
Member Author

I agree we can "table" this discussion for now. But if the Raft race issues have taught us anything, it's that it's helpful to version the data.

@tbg
Member

tbg commented Jul 23, 2015

@tamird You'll persist that data anyway, so it's really just a question of where to put it. Schema is per-tenant, and a list of nodes per tenant seems fine. The table actually wouldn't have to be accessed or updated much in normal operation. A schema change could be signaled via Gossip, and each node would update its entry with a single CPut. If you insist that all nodes which hold data for a certain tenant register themselves in that table before doing anything, you can atomically switch to the next phase (you need to deal with nodes going down during a schema change, but that'll always be an issue and a Gossip-based timeout may actually do the trick).
In any case, I also think we'll need more time to get anything serious out of this.
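The registry update described above might look roughly like this; `kv`, `onGossip`, and `canSwitch` are hypothetical stand-ins for the KV store, the node's gossip callback, and the coordinator's phase check:

```go
package main

import "fmt"

// kv is a stand-in for the KV map holding the per-tenant registry.
type kv struct {
	m map[string]int
}

// cput mimics a conditional put: set key to newVal only if its
// current value equals expVal.
func (s *kv) cput(key string, expVal, newVal int) bool {
	if s.m[key] != expVal {
		return false
	}
	s.m[key] = newVal
	return true
}

// onGossip is what a node might run when a schema change is gossiped:
// bump its own registry entry from the old phase to the new one with
// a single CPut.
func onGossip(s *kv, node string, oldPhase, newPhase int) bool {
	return s.cput("registry/"+node, oldPhase, newPhase)
}

// canSwitch checks whether every registered node has reached the given
// phase, i.e. whether the coordinator may atomically move to the next one.
func canSwitch(s *kv, nodes []string, phase int) bool {
	for _, n := range nodes {
		if s.m["registry/"+n] < phase {
			return false
		}
	}
	return true
}

func main() {
	s := &kv{m: map[string]int{"registry/n1": 0, "registry/n2": 0}}
	onGossip(s, "n1", 0, 1)
	fmt.Println(canSwitch(s, []string{"n1", "n2"}, 1)) // false: n2 lags
	onGossip(s, "n2", 0, 1)
	fmt.Println(canSwitch(s, []string{"n1", "n2"}, 1)) // true
}
```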

@spencerkimball
Member Author

@tschottdorf there's currently no way to list nodes, and no plans to change that. Even if we did list them, that would be manual and you can imagine that a failing node would have to be manually (and somehow synchronously to all tables) delisted. This would introduce very non-trivial latency to any kind of schema changes in node failure scenarios.

@petermattis
Collaborator

This issue was filed before we had our SQL story in place. Our SQL implementation now supports asynchronous schema changes. There is no work currently planned here, though see #1780 for discussion of handling low-level data format changes.
