Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recommend against using autoincremented integer primary keys #3420

Closed
rmloveland opened this issue Jul 24, 2018 · 2 comments · Fixed by #8756
Closed

Recommend against using autoincremented integer primary keys #3420

rmloveland opened this issue Jul 24, 2018 · 2 comments · Fixed by #8756
Labels
O-external Origin: Issue comes from external users. P-1 High priority; must be done this release T-missing-info

Comments

@rmloveland
Copy link
Contributor

... unless it's absolutely necessary for a specific use case, in which case you should be aware of the effects on INSERT performance (and possibly time required for range rebalancing across nodes? I'm much murkier on this point).

We do mention this issue obliquely in the performance best practices docs, but in terms of general transaction contention.

However, this issue keeps coming up in the forum in one guise or another:

Therefore, in addition to our current docs on this issue, we should state explicitly in the primary key docs, the CREATE TABLE docs, and possibly others that most users should use random values as primary keys most of the time. Users who are aware of and OK with the tradeoffs that will result from using sequential values can still do so of course, but we should describe those tradeoffs.

Perhaps a variation of @knz's comment on the last forum post in the list above would suffice, i.e.,

Meanwhile, for this type of use case, you can tune write performance effectively with CockroachDB by using some random value as PK and use a secondary index on the time column. In this way, inserts are properly scattered through the cluster, resulting in better throughput scalability of writes (more inserts/sec when you increase the number of nodes), at the cost of slightly higher latency for individual read queries.

(Aside: the latter point about read latency may be mitigated somewhat in a magical follower reads future, AIUI, but that is beside the point at this time.)

(cc @tim-o FYI)

@jseldess jseldess added this to the 2.1 milestone Jul 26, 2018
@jseldess jseldess added the A-sql label Oct 30, 2018
@jseldess jseldess modified the milestones: 2.1, Later Oct 30, 2018
@jseldess jseldess added the O-external Origin: Issue comes from external users. label Nov 9, 2018
@rmloveland rmloveland added the P-2 Normal priority; secondary task label Jan 16, 2019
@rmloveland
Copy link
Contributor Author

rmloveland commented Jan 31, 2019

Also, this FAQ entry on generated IDs should be updated to deemphasize / discourage INT PKs.

rmloveland added a commit that referenced this issue Jan 31, 2019
Fixes #4183, #4254.

Addresses part of #3420, #4298.

Summary of changes:

- Add `default_int_size` to session vars page

- Update `INT` data type page with new subsections of the **Size**
  section:

  - Description of compatibility issues caused by 64-bit integers
    vs. e.g. JavaScript runtimes, including when using autogenerated JS
    clients

  - Why you should avoid integer primary keys, with links to more
    information in Perf Best Practices and the Architecture docs
@rmloveland
Copy link
Contributor Author

I drafted this recently while working on updates to the INT docs for #4183, but it doesn't belong in that unit of work. Stashing here for when this issue gets worked on - I think this should be a new section on the INT page (note: links don't work in this context).


Integer primary keys

If you are planning to use a primary key of type INT, please note that it will lead to performance degradation due to write contention (since sequential values write to contiguous areas in the same and/or adjacent ranges on disk). If fast range scans are important to your app's design, but INSERT/UPDATE performance is not, then using INT primary keys can work.

If INSERT/UPDATE performance is important for your application, we recommend using UUID primary keys. Performance is better because writes can be spread more evenly across the cluster's ranges, avoiding hot spots. For more information, see:

@awoods187 awoods187 added P-1 High priority; must be done this release and removed P-2 Normal priority; secondary task labels Jul 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-external Origin: Issue comes from external users. P-1 High priority; must be done this release T-missing-info
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants