Add schema version number to MVCC encoded key #1772
Comments
Aren't online schema changes going to take care of that? If we have a registry of which node is using which schema (some table somewhere) we should be able to cut everything into a series of schema changes which can be done phase by phase, with all nodes required to move into the next phase before going further (guaranteeing that nothing incompatible with that current schema is executed any more). We could even seamlessly allow clients to use both the old and the new schema in some situations: for instance, when converting a column type, we could rewrite the AST for queries and do the conversion for requests on the old schema, while really using the new one. That would allow clients to roll over asynchronously.
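The phase-by-phase coordination described above can be sketched as a small state machine: a coordinator advances to the next schema-change phase only once every node has acknowledged the current one, so no node ever executes anything incompatible with the active schema. This is a hypothetical illustration; none of these names (`coordinator`, `ack`, `tryAdvance`) come from the CockroachDB code base.

```go
package main

import "fmt"

// coordinator tracks which phase of a schema change each node has
// acknowledged. Sketch only; not CockroachDB's actual implementation.
type coordinator struct {
	phase int            // current schema-change phase
	acks  map[string]int // node ID -> highest phase acknowledged
}

func newCoordinator(nodes []string) *coordinator {
	c := &coordinator{phase: 1, acks: make(map[string]int)}
	for _, n := range nodes {
		c.acks[n] = 0 // no node has acknowledged phase 1 yet
	}
	return c
}

// ack records that a node has moved into the current phase.
func (c *coordinator) ack(node string) { c.acks[node] = c.phase }

// tryAdvance moves to the next phase only when every node has
// acknowledged the current one; otherwise it reports false.
func (c *coordinator) tryAdvance() bool {
	for _, p := range c.acks {
		if p < c.phase {
			return false
		}
	}
	c.phase++
	return true
}

func main() {
	c := newCoordinator([]string{"n1", "n2"})
	c.ack("n1")
	fmt.Println(c.tryAdvance()) // false: n2 has not acknowledged yet
	c.ack("n2")
	fmt.Println(c.tryAdvance()) // true: all nodes caught up
}
```

Requiring unanimity before advancing is what guarantees that at most two adjacent schema versions are ever live at once, which is the precondition for the AST-rewriting trick described above.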
What do you mean by node? A gateway? A node containing the data range? They are almost guaranteed to not be the same nodes.
You're right, the tricky bit is coordinating those two sides. During each phase, both schemas are valid, but switching to the newer version may require a synchronous rewrite of the whole data, which is not something the gateway can be in charge of.
@tschottdorf the idea of using such a table is pretty onerous. Remember that there is not just one schema in the system. If we're multitenant we will need to maintain a registry of size

I think this discussion is a bit hand-wavy altogether - let's revisit this when we're ready to discuss how to implement schema changes (ahead of this, we should discuss what types of schema changes we're going to allow). We should be prepared for the possibility of a storage-format-breaking change when we do that, though.
I agree we can "table" this discussion for now. But if the Raft race issues have taught us anything, it's that it's helpful to version the data.
@tamird You'll persist that data anyway, so it's really just a question of where to put it. Schema is per-tenant, and a list of nodes per tenant seems fine. The table actually wouldn't have to be accessed or updated much in normal operation. A schema change could be signaled via Gossip, and each node would update its entry with a single CPut. If you insist that all nodes which hold data for a certain tenant register themselves in that table before doing anything, you can atomically switch to the next phase (you need to deal with nodes going down during a schema change, but that'll always be an issue and a Gossip-based timeout may actually do the trick).
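The "single CPut per node" registration described above can be sketched as a conditional update against a registry table: a node's entry moves to the next phase only if it still holds the expected old value, and the switch happens once every entry has caught up. This is an assumed illustration of the idea; `cput` and `allAt` are made-up names, not CockroachDB's KV API.

```go
package main

import "fmt"

// cput mimics a conditional put: update a node's registry entry only
// if its current value matches expected. Hypothetical sketch.
func cput(reg map[string]int, node string, expected, next int) bool {
	if reg[node] != expected {
		return false
	}
	reg[node] = next
	return true
}

// allAt reports whether every registered node has reached phase p,
// i.e. whether the schema change can atomically move on.
func allAt(reg map[string]int, p int) bool {
	for _, v := range reg {
		if v != p {
			return false
		}
	}
	return true
}

func main() {
	reg := map[string]int{"n1": 0, "n2": 0}
	fmt.Println(cput(reg, "n1", 0, 1)) // true: n1 moves to phase 1
	fmt.Println(cput(reg, "n1", 0, 1)) // false: entry already moved
	fmt.Println(allAt(reg, 1))         // false: n2 is lagging
	cput(reg, "n2", 0, 1)
	fmt.Println(allAt(reg, 1)) // true: safe to switch phases
}
```

The conditional update is what makes a node's registration idempotent and race-free; a Gossip-based timeout, as suggested above, would cover entries belonging to nodes that go down mid-change.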
@tschottdorf there's currently no way to list nodes, and no plans to change that. Even if we did list them, that would be manual, and you can imagine that a failing node would have to be manually (and somehow synchronously across all tables) delisted. This would introduce very non-trivial latency to any kind of schema change in node failure scenarios.
This issue was filed before we had our SQL story in place. Our SQL implementation has support for asynchronous schema changes. There is no work currently planned here, though see #1780 for discussion of handling low-level data format changes.
Problem:
Proposal:
Currently we append the timestamp value to encoded keys. We could instead append <schema version><timestamp>. Every operation would be augmented to include a schema version, set at the gateway which processed the SQL. Reads would be serviced for any version with a less-than-or-equal schema version and, of course, a less-than-or-equal timestamp. Reads with old schema versions would simply ignore newer schema versions. Writes would fail on encountering a key containing a newer schema version than the proposed one.

This solution allows old-schema queries to proceed unhindered by potentially destructive schema changes, and protects us from overwriting tuples inserted with newer schemas with data for older schemas. Additionally, the schema version on the key provides an elegant way to track and revert all changes made under each successive schema revision in turn.
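The read and write rules proposed above can be sketched directly: a stored version is visible to a read only if both its schema version and its timestamp are less than or equal to the read's, and a write is rejected if the key already carries a newer schema version. This is a minimal model of the proposal, not CockroachDB's actual MVCC encoding; `versionedKey`, `visible`, and `checkWrite` are illustrative names.

```go
package main

import "fmt"

// versionedKey models the proposed encoding: the MVCC key carries a
// schema version alongside the timestamp. Sketch only.
type versionedKey struct {
	key       string
	schemaVer int
	ts        int64
}

// visible reports whether a stored version can be returned to a read
// at (readVer, readTs): both schema version and timestamp must be <=.
func visible(k versionedKey, readVer int, readTs int64) bool {
	return k.schemaVer <= readVer && k.ts <= readTs
}

// checkWrite rejects a write whose schema version is older than an
// existing version of the same key, so older-schema writers cannot
// overwrite tuples inserted under a newer schema.
func checkWrite(existing []versionedKey, key string, writeVer int) error {
	for _, k := range existing {
		if k.key == key && k.schemaVer > writeVer {
			return fmt.Errorf("key %q already written at newer schema version %d", key, k.schemaVer)
		}
	}
	return nil
}

func main() {
	store := []versionedKey{
		{"a", 2, 100}, // written under schema v2
		{"a", 1, 50},  // older value, written under schema v1
	}
	fmt.Println(visible(store[0], 1, 200))        // false: reader still on v1
	fmt.Println(visible(store[1], 1, 200))        // true: v1 value is visible
	fmt.Println(checkWrite(store, "a", 1) != nil) // true: v1 write blocked by v2 data
}
```

Note how the visibility rule is what lets old-schema queries "simply ignore newer schema versions", while the write check is what prevents the overwrite hazard described above.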