-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: add unique_rowid_nonserial() builtin #7186
Comments
alternate proposal: implement this function by hashing the output of unique_rowid |
Why not just random? |
Right, of course. +1 for random |
Random makes sense to me. |
On a somewhat related note, how do we handle the situation in which there is "morally" a unique primary key but ordering is explicitly not desired? For example, ingesting many rows which are only ever going to be accessed individually is a bit tricky. You could assign random uuids, but then you must save that uuid for later retrieving the value (or a secondary index is needed, which again has hot spots). Instead, the reasonable thing to do is to chose as the "real" primary key a hash of the "moral" primary key. For example, if a table is used to look up users by email, then one would want a primary key column That makes sure that writes are spread out over the keyspace, but there is no overhead to look up data (though the lookup would have to query Do we have something like that already? |
We already have a @tschottdorf it sounds reasonable to me to use Another possibility would be to make the hashing a part of the index definition, so the SQL would just see the raw email, but the index would encode things using the hash. |
The latter option has the benefit of fitting in nicely as a special case of user-defined index functions (for example a secondary index on |
Supposing we had such a random function, would it make sense to use it as the default for newly created table from that point forward? Since SQL does not guarantee that rows are placed in the same order in storage (more specifically, that order on select is not guaranteed unless there's an explicit "order by" clause) as they were inserted, this would be semantically correct -- and wouldn't it buy us protection against hotspots during large batched inserts? |
I think we discussed this and decided that sometimes you do want rows to be inserted generally near each other, so it's not clear that this is always a good idea. |
Yeah, my sense is that approximate locality based on time of insertion is desired more often than random distribution. That's the whole reason behind the design of the unique_rowid function. It's not suitable in all cases but we have to pick one way or the other to be the default and I think we made the right choice. |
I am providing the integer random function in #25388. |
(Also since then we have built-in hash functions.) |
Closing as per #25388 (comment). Also note that the trade-offs are now documented: https://www.cockroachlabs.com/docs/stable/sql-faqs.html#what-are-the-differences-between-uuid-sequences-and-unique_rowid |
unique_rowid() creates values in the sequential order of time. This can lead to a write hotspot on writes on a table. We should provide another builtin unique_rowid_nonserial() that spreads new unique values around. Placing the nodeid in the higher order bits and the timestamp in the lower order bits is a solution--albeit not a very good one--because it spreads new values across a few buckets (number of nodes),
The text was updated successfully, but these errors were encountered: