sql: add unique_rowid_nonserial() builtin #7186

vivekmenezes · 2016-06-13T14:06:07Z

unique_rowid() generates values in time order, which can create a write hotspot when rows are inserted into a table. We should provide another builtin, unique_rowid_nonserial(), that spreads new unique values across the keyspace. Placing the node ID in the high-order bits and the timestamp in the low-order bits is one solution, albeit not a very good one: it spreads new values across only a few buckets (one per node).
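To make the two layouts concrete, here is a minimal Go sketch comparing a time-first ID with the node-first variant proposed above. The 15-bit node ID field, the abstract integer timestamps and the function names are assumptions for illustration only, not unique_rowid()'s actual implementation.

```go
package main

import "fmt"

// nodeIDBits is an ASSUMED width for the node ID field; the real
// unique_rowid() layout may differ. Plain uint64 is used, so the sign
// bit of SQL's signed INT is ignored in this sketch.
const nodeIDBits = 15

// nodeMask keeps only the low nodeIDBits bits of a node ID.
const nodeMask = 1<<nodeIDBits - 1

// tsBits is the number of low bits left for the timestamp when the
// node ID occupies the high bits.
const tsBits = 64 - nodeIDBits

// timeFirstID (hypothetical) puts the timestamp in the high bits and the
// node ID in the low bits: IDs from all nodes grow together over time,
// so new rows land at the end of the keyspace (the serial behavior).
func timeFirstID(ts, nodeID uint64) uint64 {
	return (ts << nodeIDBits) | (nodeID & nodeMask)
}

// nodeFirstID (hypothetical) puts the node ID in the high bits and the
// timestamp in the low bits: each node appends to its own region, so
// writes are spread over at most one bucket per node.
func nodeFirstID(ts, nodeID uint64) uint64 {
	return ((nodeID & nodeMask) << tsBits) | (ts & (1<<tsBits - 1))
}

func main() {
	// Three nodes generating IDs at the same three (abstract) timestamps.
	for ts := uint64(1000); ts < 1003; ts++ {
		for node := uint64(1); node <= 3; node++ {
			fmt.Printf("ts=%d node=%d time-first=%d node-first=%d\n",
				ts, node, timeFirstID(ts, node), nodeFirstID(ts, node))
		}
	}
}
```

In the time-first column every new ID exceeds all earlier ones regardless of node; in the node-first column IDs cluster into one increasing run per node, which is why this only spreads writes across as many buckets as there are nodes.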

danhhz · 2016-06-13T14:08:44Z

alternate proposal: implement this function by hashing the output of unique_rowid
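One way to read this proposal is sketched below in Go. It assumes a bijective 64-bit mixer (the SplitMix64 finalizer, chosen here for illustration; the thread does not name a hash). Because every step is invertible, distinct row IDs can never collide after hashing, which a truncated general-purpose hash would not guarantee. `mix64` is a hypothetical name.

```go
package main

import "fmt"

// mix64 is the SplitMix64 finalizer, used here as a hypothetical
// "hash of unique_rowid()". Each step is invertible on 64-bit integers
// (x ^= x >> k with k > 0, and multiplication by an odd constant), so
// the whole function is a bijection: distinct inputs always give
// distinct outputs, while consecutive inputs scatter across the range.
func mix64(x uint64) uint64 {
	x ^= x >> 30
	x *= 0xbf58476d1ce4e5b9
	x ^= x >> 27
	x *= 0x94d049bb133111eb
	x ^= x >> 31
	return x
}

func main() {
	// Consecutive (serial) inputs, as a time-ordered generator produces.
	for id := uint64(1); id <= 5; id++ {
		fmt.Printf("%d -> %d\n", id, mix64(id))
	}
}
```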

bdarnell · 2016-06-13T14:13:55Z

Why not just random?

danhhz · 2016-06-13T14:16:21Z

Right, of course. +1 for random

maddyblue · 2016-06-13T16:49:45Z

Random makes sense to me.

tbg · 2016-06-13T17:13:11Z

On a somewhat related note, how do we handle the situation in which there is "morally" a unique primary key but ordering is explicitly not desired? For example, ingesting many rows which are only ever going to be accessed individually is a bit tricky.

You could assign random UUIDs, but then you must store that UUID to retrieve the row later (or a secondary index is needed, which again has hotspots). Instead, the reasonable thing to do is to choose as the "real" primary key a hash of the "moral" primary key. For example, if a table is used to look up users by email, then one would want a primary key column int64 DEFAULT some_hash(email).

This spreads writes across the keyspace without adding lookup overhead (though lookups would have to query WHERE pk = some_hash($email)). It effectively emulates hash-based data distribution.
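The pattern can be sketched in Go with an in-memory map standing in for the table. The choice of SHA-256 truncated to 64 bits for `some_hash` is an assumption, and `emailKey`, `user` and `table` are hypothetical names. Writes and reads both go through the same hash, mirroring `WHERE pk = some_hash($email)`.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// emailKey is a hypothetical some_hash(email): the first 8 bytes of the
// SHA-256 digest interpreted as a signed 64-bit integer.
func emailKey(email string) int64 {
	sum := sha256.Sum256([]byte(email))
	return int64(binary.BigEndian.Uint64(sum[:8]))
}

type user struct {
	email string
	name  string
}

// table stands in for a table whose primary key is emailKey(email).
type table map[int64]user

// insert hashes the "moral" key to get the "real" primary key. A real
// schema needs a collision policy; this sketch simply overwrites.
func (t table) insert(u user) { t[emailKey(u.email)] = u }

// lookup mirrors "WHERE pk = some_hash($email)": hash, then point lookup.
// The stored email is re-checked because a truncated hash can collide.
func (t table) lookup(email string) (user, bool) {
	u, ok := t[emailKey(email)]
	if !ok || u.email != email {
		return user{}, false
	}
	return u, true
}

func main() {
	t := table{}
	t.insert(user{email: "alice@example.com", name: "Alice"})
	if u, ok := t.lookup("alice@example.com"); ok {
		fmt.Println("found", u.name, "under key", emailKey(u.email))
	}
}
```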

Do we have something like that already?

bdarnell · 2016-06-13T17:36:41Z

We already have a random() function that returns a float between 0 and 1, so it sounds like all we need is a random_int64() function. (-1 on giving it a name like unique_rowid_nonserial()).
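A random_int64() along these lines is sketched below in Go, using the OS CSPRNG as an assumed randomness source (the thread does not specify one); `randomInt64` and `collisionProbability` are hypothetical names. Random keys are not unique by construction, so the sketch also includes the standard birthday-bound estimate, which is not from the thread, of the chance that n values contain a duplicate.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"math"
)

// randomInt64 is a hypothetical stand-in for the proposed random_int64()
// builtin: 64 random bits from the operating system's CSPRNG.
func randomInt64() int64 {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	return int64(binary.LittleEndian.Uint64(b[:]))
}

// collisionProbability is the birthday-bound approximation of the chance
// that n uniformly random 64-bit values contain at least one duplicate:
// 1 - exp(-n(n-1) / 2^65). Expm1 keeps precision for tiny exponents.
func collisionProbability(n float64) float64 {
	return -math.Expm1(-n * (n - 1) / math.Exp2(65))
}

func main() {
	fmt.Println("sample:", randomInt64())
	for _, n := range []float64{1e6, 1e9} {
		fmt.Printf("n=%.0e  P(any duplicate) ~ %.2g\n", n, collisionProbability(n))
	}
}
```

For a billion rows the estimate is roughly 2.7%, so a random key still relies on the primary key's uniqueness check: a plain INSERT that collides fails with a uniqueness violation rather than overwriting a row.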

@tschottdorf it sounds reasonable to me to use id BYTES DEFAULT sha256(email) PRIMARY KEY, but we don't currently allow references to other columns in default values (nor do Postgres and MySQL). But since the application would need to do the hashing in its WHERE clauses as well, I'm not sure it's worth trying to make the default more magical.

Another possibility would be to make the hashing a part of the index definition, so the SQL would just see the raw email, but the index would encode things using the hash.

tbg · 2016-06-13T18:24:50Z

The latter option has the benefit of fitting in nicely as a special case of user-defined index functions (for example, a secondary index on LOWER(username) should work much the same as a primary key on HASH(username)).
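The idea of an index that applies a function to the column before encoding can be sketched in Go. This is a toy model under stated assumptions: `keyFunc`, `index`, `lowerKey` and `hashKey` are hypothetical, and a map stands in for the index's key-value storage. Callers pass the raw value while the index stores the transformed key, so LOWER(username) and HASH(email) differ only in the function plugged in.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// keyFunc transforms a column value into the bytes the index stores.
// Hypothetical abstraction for an expression-based index column.
type keyFunc func(string) []byte

// lowerKey models an index on LOWER(col).
func lowerKey(s string) []byte { return []byte(strings.ToLower(s)) }

// hashKey models an index on HASH(col), here assumed to be SHA-256.
func hashKey(s string) []byte {
	sum := sha256.Sum256([]byte(s))
	return sum[:]
}

// index maps encoded keys to rows; callers only ever see raw values.
type index struct {
	fn   keyFunc
	rows map[string]string
}

func newIndex(fn keyFunc) *index {
	return &index{fn: fn, rows: map[string]string{}}
}

func (ix *index) put(col, row string) { ix.rows[string(ix.fn(col))] = row }

func (ix *index) get(col string) (string, bool) {
	row, ok := ix.rows[string(ix.fn(col))]
	return row, ok
}

func main() {
	byName := newIndex(lowerKey)
	byName.put("Alice", "row 1")
	fmt.Println(byName.get("ALICE"))

	byEmail := newIndex(hashKey)
	byEmail.put("alice@example.com", "row 1")
	fmt.Println(byEmail.get("alice@example.com"))
}
```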

knz · 2016-11-08T16:01:38Z

Supposing we had such a random function, would it make sense to use it as the default for newly created tables from then on? SQL does not guarantee that rows are stored in insertion order (more precisely, the order of SELECT results is not guaranteed without an explicit ORDER BY clause), so this would be semantically correct. Wouldn't it also protect us against hotspots during large batched inserts?

maddyblue · 2016-11-08T18:39:11Z

I think we discussed this and decided that sometimes you do want rows to be inserted generally near each other, so it's not clear that this is always a good idea.

bdarnell · 2016-11-10T14:34:34Z

Yeah, my sense is that approximate locality based on time of insertion is desired more often than random distribution. That's the whole reason behind the design of the unique_rowid function. It's not suitable in all cases but we have to pick one way or the other to be the default and I think we made the right choice.

knz · 2018-05-09T19:34:03Z

I am providing the integer random function in #25388.
This, together with computed columns (which can be used as index keys), satisfies the various use cases discussed in the history of this issue.

knz · 2018-05-09T19:34:16Z

(Also since then we have built-in hash functions.)

knz · 2018-05-14T13:12:35Z

Closing as per #25388 (comment).

Also note that the trade-offs are now documented: https://www.cockroachlabs.com/docs/stable/sql-faqs.html#what-are-the-differences-between-uuid-sequences-and-unique_rowid

vivekmenezes assigned maddyblue Jun 13, 2016

maddyblue removed their assignment Jul 6, 2016

knz added the labels C-enhancement ("Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)") and C-performance ("Perf of queries or internals. Solution not expected to change functional behavior.") Oct 27, 2016

knz mentioned this issue Oct 27, 2016 in sql: support expression-based index columns #9682 (closed)

petermattis added this to the Later milestone Feb 22, 2017

knz mentioned this issue May 9, 2018 in sql: provde a new function random_int64() #25388 (closed)

knz closed this as completed May 14, 2018
