SERIAL type #362

jseldess · 2016-06-09T03:25:02Z

Adding docs for the new SERIAL type. Fixes #356.

HTML version: http://cockroach-draft-docs.s3-website-us-east-1.amazonaws.com/serial.html

@paperstreet, @petermattis, not sure if the last example is clear enough.

uptimeDBA · 2016-06-09T04:27:09Z

Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion.

The default value for a `SERIAL` column is the combination of the insert timestamp and the ID of the node executing the insert. This combination is guaranteed to be globally unique, and because value generation does not require talking to other nodes, it is much faster than sequentially auto-incrementing a value, which requires distributed coordination.

{{site.data.alerts.callout_info}}This data type is <strong>experimental</strong>. We believe it is a better solution than PostgreSQL's <code>SERIAL</code> and MySQL's <code>AUTO_INCREMENT</code> types, both of which auto-increment integers but not necessarily in a strictly sequential fashion (see the <a href="#auto-incrementing-is-not-always-sequential">Auto-Incrementing Is Not Always Sequential</a> example below). However, if you find that this feature is incompatible with your application, please <a href="https://github.com/cockroachdb/cockroach/issues">open an issue</a> or <a href="https://gitter.im/cockroachdb/cockroach">chat with us on Gitter</a>.{{site.data.alerts.end}}

The name(s) for this feature across all databases I'm aware of have caused no end of grief between DBAs and developers. The only thing this type of column can guarantee is uniqueness (and even that can be called into question if you're allowed to insert your own values). There can be gaps, because values are consumed even if the transaction is rolled back (as the example shows). Are the values strictly increasing based on transaction time? In the case of Oracle, sequence values can appear out of transaction order due to multiple instances and caching.

Comments from Reviewable

danhhz · 2016-06-09T14:09:45Z

Review status: 0 of 3 files reviewed at latest revision, 7 unresolved discussions.

serial.md, line 46 [r1] (raw file):

When we insert 3 rows without column a values and return the new rows, we see that each row has defaulted to a unique value in column a.

"without column a values" reads a bit awkward to me

serial.md, line 59 [r1] (raw file):

Of course, we can explicitly insert an integer value into the SERIAL column as long as it meets the unique and non-null constraints.

Not sure we want to call this out in the docs. If you're using a SERIAL column, it's generally because you're not manually specifying the ids. Same for "## Format" above

serial.md, line 73 [r1] (raw file):

Auto-Incrementing Is Not Always Sequential

Did we cut the out of order example?

serial.md, line 75 [r1] (raw file):

### Auto-Incrementing Is Not Always Sequential

Some people assume that the auto-incrementing types in PostgreSQL and MySQL generate strictly sequential values that provide an accurate count of rows in a table. In fact, each insert increases the sequence by one, even when the insert is not committed. This means that these auto-incrementing types may leave gaps in a sequence.

"Some people" feels accusatory. How about "A common misconception"?

serial.md, line 75 [r1] (raw file):

### Auto-Incrementing Is Not Always Sequential

Some people assume that the auto-incrementing types in PostgreSQL and MySQL generate strictly sequential values that provide an accurate count of rows in a table. In fact, each insert increases the sequence by one, even when the insert is not committed. This means that these auto-incrementing types may leave gaps in a sequence.

There are other reasons this is a bad idea besides the "accurate count of rows" example. Maybe delete the example or make it clear it's one of many ways this assumption can bite you.

serial.md, line 77 [r1] (raw file):

Some people assume that the auto-incrementing types in PostgreSQL and MySQL generate strictly sequential values that provide an accurate count of rows in a table. In fact, each insert increases the sequence by one, even when the insert is not committed. This means that these auto-incrementing types may leave gaps in a sequence.

To demonstrate this possibility, we create a table and treat the `INT` column like an auto-incrementing column in PostgreSQL or MySQL.

This reads a lot clearer if you're actually using Postgres. Are we intentionally avoiding that?

jseldess · 2016-06-09T16:38:15Z

Review status: 0 of 3 files reviewed at latest revision, 7 unresolved discussions.

serial.md, line 10 [r1] (raw file):

Previously, uptimeDBA (Paul Steffensen) wrote…

The name(s) for this feature across all databases I'm aware of have caused no end of grief between DBAs and developers. The only thing this type of column can guarantee is uniqueness (and even that can be called into question if you're allowed to insert your own values). There can be gaps, because values are consumed even if the transaction is rolled back (as the example shows). Are the values strictly increasing based on transaction time? In the case of Oracle, sequence values can appear out of transaction order due to multiple instances and caching.

I think the values are strictly increasing based on transaction time, yes. But Dan, can you confirm?

serial.md, line 46 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

"without column a values" reads a bit awkward to me

Done.

serial.md, line 59 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

Not sure we want to call this out in the docs. If you're using a SERIAL column, it's generally because you're not manually specifying the ids. Same for "## Format" above

Done. Let me know if the revisions need more work.

serial.md, line 75 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

"Some people" feels accusatory. How about "A common misconception"?

Done.

danhhz · 2016-06-09T17:08:44Z

Review status: 0 of 3 files reviewed at latest revision, 8 unresolved discussions.

serial.md, line 94 [r2] (raw file):

Since each insert increased the sequence in column a by one, the first committed insert got the value 2, and the second committed insert got the value 4. As you can see, the values aren't strictly sequential, and the last value doesn't give an accurate count of rows in the table.

When we discussed this offline, I think the plan for the end of this example was to specifically call out that Postgres and Cockroach are thus doing the same thing, we're just doing it more often in exchange for speed.
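
For readers following the thread without the rendered page, here is a minimal Python sketch of the behavior the quoted passage describes. It is not PostgreSQL code: the `Sequence` class and `run_inserts` helper are hypothetical names, and the alternating rollback/commit pattern is an assumption chosen so that the committed rows end up with the values 2 and 4 mentioned above.

```python
class Sequence:
    """Toy stand-in for an auto-increment sequence (hypothetical, not a database API)."""

    def __init__(self):
        self.last = 0

    def nextval(self):
        # The counter advances immediately and is never rolled back.
        self.last += 1
        return self.last


def run_inserts(outcomes):
    """Simulate one insert per outcome and return the IDs of the committed rows."""
    seq = Sequence()
    table = []
    for committed in outcomes:
        row_id = seq.nextval()  # consumed whether or not the insert commits
        if committed:
            table.append(row_id)
    return table


# Assumed pattern: rolled back, committed, rolled back, committed.
rows = run_inserts([False, True, False, True])
print(rows)       # [2, 4]
print(len(rows))  # 2
```

The last ID (4) overstates the row count (2), which is the gap behavior the docs example is meant to show.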

jseldess · 2016-06-09T17:26:42Z

Review status: 0 of 3 files reviewed at latest revision, 8 unresolved discussions.

serial.md, line 73 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

Did we cut the out of order example?

Discussed with Dan. Decided to leave the out of order example out.

serial.md, line 75 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

There are other reasons this is a bad idea besides the "accurate count of rows" example. Maybe delete the example or make it clear it's one of many ways this assumption can bite you.

Removed example.

serial.md, line 77 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

This reads a lot clearer if you're actually using Postgres. Are we intentionally avoiding that?

Changed example to use postgres.

serial.md, line 94 [r2] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

When we discussed this offline, I think the plan for the end of this example was to specifically call out that Postgres and Cockroach are thus doing the same thing, we're just doing it more often in exchange for speed.

Done.

bdarnell · 2016-06-13T22:02:09Z

"Guaranteed to be globally unique" is a stronger claim than I'm comfortable making for a 64-bit integer generated in a decentralized way. IMHO we're providing this type for compatibility with existing systems and the fact that people are really attracted to integers as IDs, but if you want a guarantee of global uniqueness you should be using a UUID in a BYTES field instead.

jseldess · 2016-06-14T01:45:23Z

Chatted with @bdarnell. He had assumed that SERIAL combines a timestamp with a random value. Since it actually combines a timestamp with the node ID, it is in fact guaranteed to be globally unique. That is, as Ben mentioned, "until you reach some very large number of nodes." For future reference:

In a large enough cluster (32K nodes), it breaks down. Why? Because the node ID portion of the value is allocated 15 bits, which can represent only 2^15 = 32,768 distinct node IDs. The remaining 49 bits hold the timestamp.
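
To make the 32K limit concrete, here is a rough Python sketch of packing a 49-bit timestamp and a 15-bit node ID into one 64-bit integer. Only the two bit widths come from the comment above. The function name `make_id`, the placement of the timestamp in the high bits, the truncation of oversized node IDs, and the sample timestamp are all assumptions for illustration, not a description of the actual CockroachDB implementation.

```python
NODE_ID_BITS = 15     # from the comment above
TIMESTAMP_BITS = 49   # from the comment above

MAX_NODES = 1 << NODE_ID_BITS  # 32768 distinct node IDs


def make_id(timestamp, node_id):
    """Pack timestamp and node ID into a single 64-bit value.

    Assumed layout: timestamp in the high 49 bits, node ID in the low 15 bits.
    """
    if not 0 <= timestamp < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp does not fit in 49 bits")
    # Assumption: a node ID wider than 15 bits is truncated to its low 15 bits.
    return (timestamp << NODE_ID_BITS) | (node_id & (MAX_NODES - 1))


ts = 1_465_868_723_000  # arbitrary sample timestamp, unit unspecified

a = make_id(ts, 1)
b = make_id(ts, 2)
c = make_id(ts, 1 + MAX_NODES)  # one past the 15-bit range

print(a != b)  # True
print(a == c)  # True
```

Under these assumptions, two nodes inserting at the same timestamp get different values as long as their IDs differ within the low 15 bits, and a node ID beyond 32,767 collides with a smaller one.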

Jesse Seldess added 2 commits June 8, 2016 17:30

starting serial docs (832e186)

finishing serial type docs (1067a26)

jseldess assigned danhhz Jun 9, 2016

jseldess mentioned this pull request Jun 9, 2016

sql: implement SERIAL column type cockroachdb/cockroach#7032 (Merged)

revisions for dan (e495a44)

adding conclusion to auto-increment example (1a3a5dc)

jseldess merged commit e0502d6 into gh-pages Jun 9, 2016

jseldess deleted the serial-type branch June 9, 2016 17:27
