Recommend against using autoincremented integer primary keys #3420

rmloveland · 2018-07-24T20:01:25Z

... unless it's absolutely necessary for a specific use case, in which case you should be aware of the effects on INSERT performance (and possibly time required for range rebalancing across nodes? I'm much murkier on this point).

We do mention this issue obliquely in the performance best practices docs, but in terms of general transaction contention.

However, this issue keeps coming up in the forum in one guise or another:

cockroachDB single hot node
is this insert performance normal
performance of JDBC batch inserts & upserts (potentially JDBC issue, but the schema looks very sequential-integer-y)
how does cockroachdb scale
cockroachdb - what are the read/write perf goals?

Therefore, in addition to our current docs on this issue, we should state explicitly in the primary key docs, the CREATE TABLE docs, and possibly others that most users should use random values as primary keys most of the time. Users who are aware of and OK with the tradeoffs that will result from using sequential values can still do so of course, but we should describe those tradeoffs.

Perhaps a variation of @knz's comment on the last forum post in the list above would suffice, i.e.,

Meanwhile, for this type of use case, you can tune write performance effectively with CockroachDB by using some random value as PK and use a secondary index on the time column. In this way, inserts are properly scattered through the cluster, resulting in better throughput scalability of writes (more inserts/sec when you increase the number of nodes), at the cost of slightly higher latency for individual read queries.

(Aside: the latter point about read latency may be mitigated somewhat in a magical follower reads future, AIUI, but that is beside the point at this time.)

(cc @tim-o FYI)

The text was updated successfully, but these errors were encountered:

rmloveland · 2019-01-31T20:59:27Z

Also, this FAQ entry on generated IDs should be updated to deemphasize / discourage INT PKs.

Fixes #4183, #4254. Addresses part of #3420, #4298. Summary of changes: - Add `default_int_size` to session vars page - Update `INT` data type page with new subsections of the **Size** section: - Description of compatibility issues caused by 64-bit integers vs. e.g. JavaScript runtimes, including when using autogenerated JS clients - Why you should avoid integer primary keys, with links to more information in Perf Best Practices and the Architecture docs

rmloveland · 2019-02-01T14:55:57Z

I drafted this recently while working on updates to the INT docs for #4183, but it doesn't belong in that unit of work. Stashing here for when this issue gets worked on - I think this should be a new section on the INT page (note: links don't work in this context).

Integer primary keys

If you are planning to use a primary key of type INT, please note that it will lead to performance degradation due to write contention (since sequential values write to contiguous areas in the same and/or adjacent ranges on disk). If fast range scans are important to your app's design, but INSERT/UPDATE performance is not, then using INT primary keys can work.

If INSERT/UPDATE performance is important for your application, we recommend using UUID primary keys. Performance is better because writes can be spread more evenly across the cluster's ranges, avoiding hot spots. For more information, see:

jseldess added the T-missing-info label Jul 26, 2018

jseldess added this to the 2.1 milestone Jul 26, 2018

jseldess added the A-sql label Oct 30, 2018

jseldess modified the milestones: 2.1, Later Oct 30, 2018

jseldess added the O-external Origin: Issue comes from external users. label Nov 9, 2018

rmloveland added the P-2 Normal priority; secondary task label Jan 16, 2019

rmloveland mentioned this issue Jun 13, 2019

Update Primary Key constraint page to discuss data layout / perf considerations #4911

Closed

awoods187 added P-1 High priority; must be done this release and removed P-2 Normal priority; secondary task labels Jul 15, 2020

ericharmeling mentioned this issue Oct 26, 2020

Updated primary key performance guidance #8756

Merged

ericharmeling closed this as completed in #8756 Oct 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Recommend against using autoincremented integer primary keys #3420

Recommend against using autoincremented integer primary keys #3420

rmloveland commented Jul 24, 2018

rmloveland commented Jan 31, 2019 •

edited

Loading

rmloveland commented Feb 1, 2019

Recommend against using autoincremented integer primary keys #3420

Recommend against using autoincremented integer primary keys #3420

Comments

rmloveland commented Jul 24, 2018

rmloveland commented Jan 31, 2019 • edited Loading

rmloveland commented Feb 1, 2019

Integer primary keys

rmloveland commented Jan 31, 2019 •

edited

Loading