SERIAL type #362

Merged
merged 4 commits into from
Jun 9, 2016

Conversation

jseldess
Contributor

@jseldess jseldess commented Jun 9, 2016

Adding docs for the new SERIAL type. Fixes #356.

HTML version: http://cockroach-draft-docs.s3-website-us-east-1.amazonaws.com/serial.html

@paperstreet, @petermattis, not sure if the last example is clear enough.



@uptimeDBA
Contributor

Review status: 0 of 3 files reviewed at latest revision, 1 unresolved discussion.


serial.md, line 10 [r1] (raw file):

The default value for a `SERIAL` column is the combination of the insert timestamp and the ID of the node executing the insert. This combination is guaranteed to be globally unique, and because value generation does not require talking to other nodes, it is much faster than sequentially auto-incrementing a value, which requires distributed coordination.

{{site.data.alerts.callout_info}}This data type is <strong>experimental</strong>. We believe it is a better solution than PostgreSQL's <code>SERIAL</code> and MySQL's <code>AUTO_INCREMENT</code> types, both of which auto-increment integers but not necessarily in a strictly sequential fashion (see the <a href="#auto-incrementing-is-not-always-sequential">Auto-Incrementing Is Not Always Sequential</a> example below). However, if you find that this feature is incompatible with your application, please <a href="https://github.com/cockroachdb/cockroach/issues">open an issue</a> or <a href="https://gitter.im/cockroachdb/cockroach">chat with us on Gitter</a>.{{site.data.alerts.end}}

The name(s) for this feature across all databases I'm aware of have caused no end of grief between DBAs and developers. The only thing this type of column can guarantee is uniqueness (and even that can be called into question if you're allowed to insert your own values). There can be gaps, because values are consumed even if the transaction is rolled back (as the example shows). Are the values strictly increasing based on transaction time? In the case of Oracle, sequence values can appear out of transaction order due to multiple instances and caching.
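
To illustrate the Oracle point about multiple instances and caching, here is a minimal sketch in Python. It is a simplified model, not Oracle's implementation: the `SharedSequence` and `Instance` classes and the cache size of 20 are hypothetical names and values chosen for illustration. Each instance grabs a block of values up front, so inserts that arrive in wall-clock order can receive values that are unique but not increasing.

```python
class SharedSequence:
    """Hypothetical shared sequence that hands out contiguous blocks of values."""

    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.next_block_start = 1

    def allocate_block(self):
        # Reserve the next block of values for a single instance.
        start = self.next_block_start
        self.next_block_start += self.cache_size
        return iter(range(start, start + self.cache_size))


class Instance:
    """Hypothetical database instance with its own cache of sequence values."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.cache = iter(())  # empty until the first nextval() call

    def nextval(self):
        try:
            return next(self.cache)
        except StopIteration:
            # Cache exhausted: fetch a fresh block from the shared sequence.
            self.cache = self.sequence.allocate_block()
            return next(self.cache)


def demo():
    seq = SharedSequence(cache_size=20)
    a, b = Instance(seq), Instance(seq)
    # Inserts arrive in this wall-clock order: a, b, a, b.
    return [a.nextval(), b.nextval(), a.nextval(), b.nextval()]


print(demo())  # [1, 21, 2, 22]
```

The third insert happens after the second but gets a smaller value, which is the out-of-order behavior described above.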


Comments from Reviewable

@danhhz
Contributor

danhhz commented Jun 9, 2016

Review status: 0 of 3 files reviewed at latest revision, 7 unresolved discussions.


serial.md, line 46 [r1] (raw file):

When we insert 3 rows without column a values and return the new rows, we see that each row has defaulted to a unique value in column a.

"without column a values" reads a bit awkward to me


serial.md, line 59 [r1] (raw file):

Of course, we can explicitly insert an integer value into the SERIAL column as long as it meets the unique and non-null constraints.

Not sure we want to call this out in the docs. If you're using a SERIAL column, it's generally because you're not manually specifying the ids. Same for "## Format" above


serial.md, line 73 [r1] (raw file):

Auto-Incrementing Is Not Always Sequential

Did we cut the out of order example?


serial.md, line 75 [r1] (raw file):

### Auto-Incrementing Is Not Always Sequential

Some people assume that the auto-incrementing types in PostgreSQL and MySQL generate strictly sequential values that provide an accurate count of rows in a table. In fact, each insert increases the sequence by one, even when the insert is not committed. This means that these auto-incrementing types may leave gaps in a sequence.

"Some people" feels accusatory. How about "A common misconception"


serial.md, line 75 [r1] (raw file):

### Auto-Incrementing Is Not Always Sequential

Some people assume that the auto-incrementing types in PostgreSQL and MySQL generate strictly sequential values that provide an accurate count of rows in a table. In fact, each insert increases the sequence by one, even when the insert is not committed. This means that these auto-incrementing types may leave gaps in a sequence.

There are other reasons this is a bad idea besides the "accurate count of rows" example. Maybe delete the example or make it clear it's one of many ways this assumption can bite you


serial.md, line 77 [r1] (raw file):

Some people assume that the auto-incrementing types in PostgreSQL and MySQL generate strictly sequential values that provide an accurate count of rows in a table. In fact, each insert increases the sequence by one, even when the insert is not committed. This means that these auto-incrementing types may leave gaps in a sequence.

To demonstrate this possibility, we create a table and treat the `INT` column like an auto-incrementing column in PostgreSQL or MySQL. 

This reads a lot clearer if you're actually using Postgres. Are we intentionally avoiding that?


Comments from Reviewable

@jseldess
Contributor Author

jseldess commented Jun 9, 2016

Review status: 0 of 3 files reviewed at latest revision, 7 unresolved discussions.


serial.md, line 10 [r1] (raw file):

Previously, uptimeDBA (Paul Steffensen) wrote…

The name(s) for this feature across all databases I'm aware of have caused no end of grief between DBAs and developers. The only thing this type of column can guarantee is uniqueness (and even that can be called into question if you're allowed to insert your own values). There can be gaps, because values are consumed even if the transaction is rolled back (as the example shows). Are the values strictly increasing based on transaction time? In the case of Oracle, sequence values can appear out of transaction order due to multiple instances and caching.

I think the values are strictly increasing based on transaction time, yes. But Dan, can you confirm?

serial.md, line 46 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

"without column a values" reads a bit awkward to me

Done.

serial.md, line 59 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

Not sure we want to call this out in the docs. If you're using a SERIAL column, it's generally because you're not manually specifying the ids. Same for "## Format" above

Done. Let me know if the revisions need more work.

serial.md, line 75 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

"Some people" feels accusatory. How about "A common misconception"

Done.

Comments from Reviewable

@danhhz
Contributor

danhhz commented Jun 9, 2016

:lgtm:

Previously, jseldess wrote…

SERIAL type

Adding docs for the new SERIAL type. Fixes #356.

HTML version: http://cockroach-draft-docs.s3-website-us-east-1.amazonaws.com/serial.html

@paperstreet, @petermattis, not sure if the last example is clear enough.


Review status: 0 of 3 files reviewed at latest revision, 8 unresolved discussions.


serial.md, line 94 [r2] (raw file):

Since each insert increased the sequence in column a by one, the first committed insert got the value 2, and the second committed insert got the value 4. As you can see, the values aren't strictly sequential, and the last value doesn't give an accurate count of rows in the table.

When we discussed this offline, I think the plan for the end of this example was to specifically call out that Postgres and Cockroach are thus doing the same thing, we're just doing it more often in exchange for speed.
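
For readers following the thread without the draft open, the behavior under discussion can be sketched in a few lines of Python. This is a toy model, not PostgreSQL or CockroachDB code: the `Sequence` and `Table` classes and the `commit` flag are hypothetical stand-ins for a sequence-backed column and a transaction outcome. The point it shows is that the sequence value is consumed at insert time, so a rolled-back insert still leaves a gap.

```python
class Sequence:
    """Hypothetical auto-incrementing counter backing a column."""

    def __init__(self):
        self.value = 0

    def nextval(self):
        self.value += 1
        return self.value


class Table:
    """Hypothetical table whose id column draws from a Sequence."""

    def __init__(self):
        self.seq = Sequence()
        self.rows = []

    def insert(self, payload, commit):
        # The sequence value is consumed whether or not the insert commits.
        row_id = self.seq.nextval()
        if commit:
            self.rows.append((row_id, payload))
        return row_id


def demo():
    table = Table()
    table.insert("first", commit=False)   # rolled back, consumes 1
    table.insert("second", commit=True)   # committed, gets 2
    table.insert("third", commit=False)   # rolled back, consumes 3
    table.insert("fourth", commit=True)   # committed, gets 4
    return [row_id for row_id, _ in table.rows]


print(demo())  # [2, 4]
```

Two rows are stored, but the largest id is 4, which matches the outcome quoted from the draft: the values are unique, not gap-free.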


Comments from Reviewable

@jseldess
Contributor Author

jseldess commented Jun 9, 2016

Review status: 0 of 3 files reviewed at latest revision, 8 unresolved discussions.


serial.md, line 73 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

Did we cut the out of order example?

Discussed with Dan. Decided to leave the out of order example out.

serial.md, line 75 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

There are other reasons this is a bad idea besides the "accurate count of rows" example. Maybe delete the example or make it clear it's one of many ways this assumption can bite you

Removed example.

serial.md, line 77 [r1] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

This reads a lot clearer if you're actually using Postgres. Are we intentionally avoiding that?

Changed example to use postgres.

serial.md, line 94 [r2] (raw file):

Previously, paperstreet (Daniel Harrison) wrote…

When we discussed this offline, I think the plan for the end of this example was to specifically call out that Postgres and Cockroach are thus doing the same thing, we're just doing it more often in exchange for speed.

Done.

Comments from Reviewable

@jseldess jseldess merged commit e0502d6 into gh-pages Jun 9, 2016
@jseldess jseldess deleted the serial-type branch June 9, 2016 17:27
@bdarnell
Contributor

"Guaranteed to be globally unique" is a stronger claim than I'm comfortable making for a 64-bit integer generated in a decentralized way. IMHO we're providing this type for compatibility with existing systems and the fact that people are really attracted to integers as IDs, but if you want a guarantee of global uniqueness you should be using a UUID in a BYTES field instead.

@jseldess
Contributor Author

Chatted with @bdarnell. He was thinking that SERIAL is the combination of a timestamp and a random value. Since it's actually a timestamp and a node ID, it is in fact guaranteed to be globally unique. That is, as Ben mentioned, "until you reach some very large number of nodes." For future reference:

In a large enough cluster (32K nodes), it breaks down. Why? Because the node ID portion of the value is only 15 bits wide, and 2^15 = 32,768. The value is 15 bits of node ID and 49 bits of timestamp.
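
A rough sketch of why the 15-bit limit matters, in Python. Only the bit widths (49 and 15) come from the comment above. The function name `make_serial`, the choice to put the timestamp in the high bits, and the truncation of oversized node IDs are assumptions made for illustration, and the sketch does not model how the real system keeps timestamps distinct on a single node.

```python
NODE_ID_BITS = 15     # from the thread: 15 bits of node ID
TIMESTAMP_BITS = 49   # from the thread: 49 bits of timestamp

NODE_ID_MASK = (1 << NODE_ID_BITS) - 1
TIMESTAMP_MASK = (1 << TIMESTAMP_BITS) - 1


def make_serial(timestamp, node_id):
    """Pack a timestamp and node ID into one 64-bit value.

    Assumed layout: timestamp in the high 49 bits, node ID in the low 15 bits.
    Values that do not fit are truncated, which is also an assumption.
    """
    return ((timestamp & TIMESTAMP_MASK) << NODE_ID_BITS) | (node_id & NODE_ID_MASK)


# Same timestamp on different nodes: the node ID bits keep the values distinct.
print(make_serial(1000, 1) != make_serial(1000, 2))        # True

# A node ID of 2**15 does not fit in 15 bits, so it collides with node 0.
print(make_serial(1000, 0) == make_serial(1000, 1 << 15))  # True
```

Under these assumptions, uniqueness holds only while every node ID fits in 15 bits, that is, for node IDs 0 through 32,767.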
