-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: expose MVCC timestamps to SQL #50102
Comments
Another proposal is to expose the MVCC timestamp as a builtin. It's not clear exactly what the signature of such a builtin would be but it seems conceivable that we could invent one. I have a harder time envisioning how such a builtin would get planned or implemented. |
We already "collect" information about each kv read when buffering data for a row. It wouldn't be hard to keep a running max mvcc timestamp that is reset every time a row is emitted from the fetcher.
This seems reasonable. I might hack on this and see how possible it is. I'll update the thread with results if I get anywhere. |
Turns out the fetcher level implementation of this is really easy -- it already tracks timestamps and buffers them across families for each row:
The next step is just planning -- we need to advertise this timestamp column in the catalog, ensure that only the primary index contains it (which forces an index join), and then tell the fetcher in what column slot to place the timestamp. @RaduBerinde do you think you could give me a pointer about where to look to do this in the optimizer/is this something that would be easily done? |
One question is whether decoding the timestamp has some cost here (I anticipate it does) and thus whether we want to only decode it if it we need it? |
There is going to be some small amount of overhead, but it looks pretty easy to only decode the timestamp when we need it |
cc @RaduBerinde, did you get a change to look at this? It seems like this is a pretty desired feature from customers, so we might look at changing the priority of it. |
For the optimizer, it should be as easy as changing the catalog ( |
Hmm, it doesn't seem to be as easy as just changing the |
It's up to the |
Fixes cockroachdb#50102. This PR adds introspection into the KV layer's concurrency control from the SQL level. In particular, we now expose the MVCC HLC timestamp of a row as a special system column on every table. This system column is exposed as a decimal, and is computed as `wall time * 10^10 + logical time`. To accomplish this, this PR adds planning and execution infrastructure for implicit system columns that can be produced by the execution layer for a particular row. Release note (sql change): Expose the MVCC timestamp of each row as a system column on tables.
Fixes cockroachdb#50102. This PR adds introspection into the KV layer's concurrency control from the SQL level. In particular, we now expose the MVCC HLC timestamp of a row as a special system column on every table. This system column is exposed as a decimal, and is computed as `wall time * 10^10 + logical time`. To accomplish this, this PR adds planning and execution infrastructure for implicit system columns that can be produced by the execution layer for a particular row. Release note (sql change): Expose the MVCC timestamp of each row as a system column on tables. This column is named `crdb_internal_mvcc_timestamp` and is accessible only in a limited set of contexts.
Fixes cockroachdb#50102. This PR adds introspection into the KV layer's concurrency control from the SQL level. In particular, we now expose the MVCC HLC timestamp of a row as a special system column on every table. This system column is exposed as a decimal, and is computed as `wall time * 10^10 + logical time`. To accomplish this, this PR adds planning and execution infrastructure for implicit system columns that can be produced by the execution layer for a particular row. Release note (sql change): Expose the MVCC timestamp of each row as a system column on tables. This column is named `crdb_internal_mvcc_timestamp` and is accessible only in a limited set of contexts.
Fixes cockroachdb#50102. This PR adds introspection into the KV layer's concurrency control from the SQL level. In particular, we now expose the MVCC HLC timestamp of a row as a special system column on every table. This system column is exposed as a decimal, and is computed as `wall time * 10^10 + logical time`. To accomplish this, this PR adds planning and execution infrastructure for implicit system columns that can be produced by the execution layer for a particular row. Release note (sql change): Expose the MVCC timestamp of each row as a system column on tables. This column is named `crdb_internal_mvcc_timestamp` and is accessible only in a limited set of contexts.
Fixes cockroachdb#50102. This PR adds introspection into the KV layer's concurrency control from the SQL level. In particular, we now expose the MVCC HLC timestamp of a row as a special system column on every table. This system column is exposed as a decimal, and is computed as `wall time * 10^10 + logical time`. To accomplish this, this PR adds planning and execution infrastructure for implicit system columns that can be produced by the execution layer for a particular row. Release note (sql change): Expose the MVCC timestamp of each row as a system column on tables. This column is named `crdb_internal_mvcc_timestamp` and is accessible only in a limited set of contexts.
Fixes cockroachdb#50102. This PR adds introspection into the KV layer's concurrency control from the SQL level. In particular, we now expose the MVCC HLC timestamp of a row as a special system column on every table. This system column is exposed as a decimal, and is computed as `wall time * 10^10 + logical time`. To accomplish this, this PR adds planning and execution infrastructure for implicit system columns that can be produced by the execution layer for a particular row. Release note (sql change): Expose the MVCC timestamp of each row as a system column on tables. This column is named `crdb_internal_mvcc_timestamp` and is accessible only in a limited set of contexts.
51494: sql: expose mvcc timestamps to SQL r=raduberinde a=rohany Fixes #50102. This PR adds introspection into the KV layer's concurrency control from the SQL level. In particular, we now expose the MVCC HLC timestamp of a row as a special system column on every table. This system column is exposed as a decimal, and is computed as `wall time * 10^10 + logical time`. To accomplish this, this PR adds planning and execution infrastructure for implicit system columns that can be produced by the execution layer for a particular row. Release note (sql change): Expose the MVCC timestamp of each row as a system column on tables. Co-authored-by: Rohan Yadav <[email protected]>
Before this commit, we'd return a zero-value MVCC timestamp when reading an intent from the intent history. This was problematic because elsewhere in the code we assume that we will always get a non-zero MVCC timestamp when a read returns a value. This is especially bizarre given that a read of the latest intent will return its write timestamp. The semantics here are such that we'll return the timestamp of the MVCC metadata for the row. I think this is the most reasonable thing to do as that timestamp ought to reflect the timestamp we return when we return the current intent and furthermore is the only timestamp we really have around. We could return the transactions current read or write timestamp but those both seem like worse choices. It's also worth noting that in the case where we need to read a non-zero value, we don't really care what that value is and the fact that we are reading this intent itself is somewhat suspect. That being said, I still think we should make this change in addition to any change made to prevent the code referenced in cockroachdb#49266 from needing it. Fixes cockroachdb#49266. Informs cockroachdb#50102. Release note: None
Before this commit, we'd return a zero-value MVCC timestamp when reading an intent from the intent history. This was problematic because elsewhere in the code we assume that we will always get a non-zero MVCC timestamp when a read returns a value. This is especially bizarre given that a read of the latest intent will return its write timestamp. The semantics here are such that we'll return the timestamp of the MVCC metadata for the row. I think this is the most reasonable thing to do as that timestamp ought to reflect the timestamp we return when we return the current intent and furthermore is the only timestamp we really have around. We could return the transactions current read or write timestamp but those both seem like worse choices. It's also worth noting that in the case where we need to read a non-zero value, we don't really care what that value is and the fact that we are reading this intent itself is somewhat suspect. That being said, I still think we should make this change in addition to any change made to prevent the code referenced in cockroachdb#49266 from needing it. Fixes cockroachdb#49266. Informs cockroachdb#50102. Release note: None
Before this commit, we'd return a zero-value MVCC timestamp when reading an intent from the intent history. This was problematic because elsewhere in the code we assume that we will always get a non-zero MVCC timestamp when a read returns a value. This is especially bizarre given that a read of the latest intent will return its write timestamp. The semantics here are such that we'll return the timestamp of the MVCC metadata for the row. I think this is the most reasonable thing to do as that timestamp ought to reflect the timestamp we return when we return the current intent and furthermore is the only timestamp we really have around. We could return the transactions current read or write timestamp but those both seem like worse choices. It's also worth noting that in the case where we need to read a non-zero value, we don't really care what that value is and the fact that we are reading this intent itself is somewhat suspect. That being said, I still think we should make this change in addition to any change made to prevent the code referenced in cockroachdb#49266 from needing it. Fixes cockroachdb#49266. Informs cockroachdb#50102. Release note: None
48842: sql: fix portals after exhausting rows r=yuzefovich a=yuzefovich Previously, we would erroneously restart the execution from the very beginning of empty, unclosed portals after they have been fully exhausted when we should be returning no rows or an error in such scenarios. This is now fixed by tracking whether a portal is exhausted or not and intercepting the calls to `execStmt` when the conn executor state machine is in an open state. Note that the current solution has known deviations from Postgres: - when attempting to execute portals of statements that don't return row sets, on the second and consequent attempt PG returns an error while we are silently doing nothing (meaning we do not run the statement at all and return 0 rows) - we incorrectly populate "command tag" field of pgwire messages of some rows-returning statements after the portal suspension (for example, a suspended UPDATE RETURNING in PG will return the total count of "rows affected" while we will return the count since the last suspension). These deviations are deemed acceptable since this commit fixes a much worse problem - re-executing an exhausted portal (which could be a mutation meaning, previously, we could have executed a mutation multiple times). The reasons for why this commit does not address these deviations are: - Postgres has a concept of "portal strategy" (see https://github.com/postgres/postgres/blob/2f9661311b83dc481fc19f6e3bda015392010a40/src/include/utils/portal.h#L89). - Postgres has a concept of "command" type (these are things like SELECTs, UPDATEs, INSERTs, etc, see https://github.com/postgres/postgres/blob/1aac32df89eb19949050f6f27c268122833ad036/src/include/nodes/nodes.h#L672). CRDB doesn't have these concepts, and without them any attempt to simulate Postgres results in a very error-prone and brittle code. Fixes: #48448. Release note (bug fix): Previously, CockroachDB would erroneously restart the execution of empty, unclosed portals after they have been fully exhausted, and this has been fixed. 52098: sql: better distribute distinct processors r=yuzefovich a=yuzefovich **sql: better distribute distinct processors** The distinct processors are planned in two stages - first, a "local" stage is planned on the same nodes as the previous stage, then, a "global" stage is added. Previously, the global stage would be planned on the gateway, and this commit changes that to make it distributed - by planning "global" distinct processors on the same nodes as the "local" ones and connecting them via a hash router hashing the distinct columns. Release note: None **sql: implement ConstructDistinct in the new factory** Addresses: #47473. Release note: None 52358: engine: set the MVCC timestamp on reads due to historical intents r=ajwerner a=ajwerner Before this commit, we'd return a zero-value MVCC timestamp when reading an intent from the intent history. This was problematic because elsewhere in the code we assume that we will always get a non-zero MVCC timestamp when a read returns a value. This is especially bizarre given that a read of the latest intent will return its write timestamp. The semantics here are such that we'll return the timestamp of the MVCC metadata for the row. I think this is the most reasonable thing to do as that timestamp ought to reflect the timestamp we return when we return the current intent and furthermore is the only timestamp we really have around. We could return the transactions current read or write timestamp but those both seem like worse choices. It's also worth noting that in the case where we need to read a non-zero value, we don't really care what that value is and the fact that we are reading this intent itself is somewhat suspect. That being said, I still think we should make this change in addition to any change made to prevent the code referenced in #49266 from needing it. Fixes #49266. Informs #50102. Release note: None 52384: sql: properly reset extraTxnState in COPY r=ajwerner a=ajwerner Apparently we support some sort of COPY protocol that I know nothing about. We allow operations in that protocol in the open state and in the noTxn state in the connExecutor. In the noTxn state we let the `copyMachine` handle its transaction lifecycle all on its own. In that case, we also hand have the `connExecutor` in a fresh state when calling `execCopyIn()`. During the execution of `COPY`, it seems like sometime we can pick up table descriptor leases. In the noTxn case we'd like to drop those leases and generally leave the executor in the fresh state in which it was handed to us. To deal with that, we call `resetExtraTxnState` before returning in the happy noTxn case. Fixes #52065. Release note (bug fix): Fixed a bug when using the COPY protocol which could prevent schema changes for up to 5 minutes. Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Andrew Werner <[email protected]>
Before this commit, we'd return a zero-value MVCC timestamp when reading an intent from the intent history. This was problematic because elsewhere in the code we assume that we will always get a non-zero MVCC timestamp when a read returns a value. This is especially bizarre given that a read of the latest intent will return its write timestamp. The semantics here are such that we'll return the timestamp of the MVCC metadata for the row. I think this is the most reasonable thing to do as that timestamp ought to reflect the timestamp we return when we return the current intent and furthermore is the only timestamp we really have around. We could return the transactions current read or write timestamp but those both seem like worse choices. It's also worth noting that in the case where we need to read a non-zero value, we don't really care what that value is and the fact that we are reading this intent itself is somewhat suspect. That being said, I still think we should make this change in addition to any change made to prevent the code referenced in cockroachdb#49266 from needing it. Fixes cockroachdb#49266. Informs cockroachdb#50102. Release note: None
Before this commit, we'd return a zero-value MVCC timestamp when reading an intent from the intent history. This was problematic because elsewhere in the code we assume that we will always get a non-zero MVCC timestamp when a read returns a value. This is especially bizarre given that a read of the latest intent will return its write timestamp. The semantics here are such that we'll return the timestamp of the MVCC metadata for the row. I think this is the most reasonable thing to do as that timestamp ought to reflect the timestamp we return when we return the current intent and furthermore is the only timestamp we really have around. We could return the transactions current read or write timestamp but those both seem like worse choices. It's also worth noting that in the case where we need to read a non-zero value, we don't really care what that value is and the fact that we are reading this intent itself is somewhat suspect. That being said, I still think we should make this change in addition to any change made to prevent the code referenced in cockroachdb#49266 from needing it. Fixes cockroachdb#49266. Informs cockroachdb#50102. Release note: None
For posterity, the hidden column that was added here is |
Is your feature request related to a problem? Please describe.
Cockroach associates each row in its KV store with an HLC timestamp for the purpose of MVCC. Currently this information is not accessible to client applications. Making this data accessible would be valuable to allow client applications to detect when data changed. We know for a fact that this has been a customer request on more than one occasion.
It also loosely relates to #39275 as using these timestamp might make detecting differences between rows easier. It seems like this task would be something of a requirement to implementing #19749.
Describe the solution you'd like
I'd like to see cockroach expose a
system
column that a sql query can ask for to retrieve the MVCC timestamp of a row. Postgres offers such system columns (see https://www.postgresql.org/docs/current/ddl-system-columns.html) that provide introspection into its concurrency control.One wrinkle in all of this is that different column families in a row can have different MVCC timestamps. Perhaps the name of the column will need to somehow refer to the name of the column family. Perhaps then we'd need some way to refer to the upper bound of all of the column families.
Another related problem has to do with retrieving this column from indexes. Only in some cases will we know that the write timestamp on an index will be the same as the timestamp on a secondary index. Figuring out when the timestamp off of the secondary index can be used vs. needing to go back and consult the primary index may also be something of a complex problem. I suspect for many cases it's not a big deal -- the need to do an index join back on the primary index when using this feature would likely be acceptable for v0.
Describe alternatives you've considered
Not much. Perhaps some sort of auto-updating column that updates for every write to the row?
Triggers could relieve the need for this in some cases but it's not ideal.
Additional context
The changes required for this should be pretty isolated. The row-fetcher already always retrieves the MVCC timestamps in get and scan requests. It's a matter of telling the row-fetcher to decode it and use it and to plan that retrieval of these columns. This feels like a squarely SQL features project and might even make for a good intern project.
The remaining complication as far as I'm concerned is that today schema change backfills will update rows and their MVCC timestamps. The solution to this will be to construct new indexes and swapping over (#47989). This issue shouldn't necessarily precede the work to expose MVCC columns but it has major bearing on their semantics. I suspect the work should proceed in parallel.
The text was updated successfully, but these errors were encountered: