Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: introduce virtual sequences and use with SERIAL #28575

Merged
merged 3 commits into from
Aug 16, 2018

Conversation

knz
Copy link
Contributor

@knz knz commented Aug 14, 2018

Very good idea by @BramGruneir

New features: virtual sequences (opt-in) and create-sequence-upon-SERIAL (opt-in). See individual commits for details and release notes.

For reference:

  • I made everything opt-in here (i.e. default to previous CockroachDB behavior) to minimize risk.
  • UX with Hibernate will thus not automatically be improved. We need to inform users (and upgrade our tutorials/docs) that this experimental feature is available and how to use it to unblock problems.
  • It is possible that making the new default be virtual_sequence will both be backward-compatible with performance and unlock UX for Hibernate users. This has @awoods187 preference I think. However I propose to only flip that switch after we've seen a week or two pass with this patch merged.

Fixes #22607 (using the new setting set to sql_sequence).
Fixes #24062 (using the new setting set to sql_sequence).
Fixes #26730 (using the new setting set to either virtual_sequence or sql_sequence).

@knz knz requested review from vilterp and BramGruneir August 14, 2018 13:16
@knz knz requested review from a team August 14, 2018 13:16
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Aug 14, 2018

I tested locally and this seems to work well. Please have a look already. However I still need to iterate to iron out a couple of test failures.

@knz knz force-pushed the 20180814-vseq-serial branch from 7ac0ae7 to 3aa8703 Compare August 14, 2018 13:31
@knz
Copy link
Contributor Author

knz commented Aug 14, 2018

oh it so appears that there are no test failures. Wowza

craig bot pushed a commit that referenced this pull request Aug 14, 2018
28573: roachtest: fix queue failure message r=petermattis a=tschottdorf

It was using a global variable instead of the one it wanted.

Touches #28372.

Release note: None

28576: sql: cache sequence descriptors r=knz a=knz

Required for the tests in #28575.

`nextval()` was using uncached sequence descriptors.
There is no reason to do so anymore.

Release note (performance improvement): SQL sequences receive a slight
performance boost in the `nextval()` built-in function.

Co-authored-by: Tobias Schottdorf <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
@knz knz force-pushed the 20180814-vseq-serial branch from 3aa8703 to 1f76c1c Compare August 14, 2018 14:07
@knz knz changed the title [DNM] sql: introduce virtual sequences [DNM] sql: introduce virtual sequences and use with SERIAL Aug 15, 2018
@knz knz force-pushed the 20180814-vseq-serial branch from 1f76c1c to 149d85f Compare August 15, 2018 17:20
@knz knz requested a review from a team August 15, 2018 17:20
@knz
Copy link
Contributor Author

knz commented Aug 15, 2018

I just completed prototype support for sequences with SERIAL in CREATE TABLE and ALTER TABLE.

The thing is not (yet) configurable. I am pushing this early to get a CI run and see how much tests need adjusting.

Feel free to look but it's not ready yet.

Note to self: ensure that ALTER TABLE SET TYPE with SERIAL types does the same as pg.

@knz
Copy link
Contributor Author

knz commented Aug 15, 2018

@mjibson since you plan to look at this let me flesh out the configuration options that andy W, Bram and I have discussed.

There would be a new cluster setting (name TBD) to serve as cluster-wide default.
Then there would be a new session var (name TBD) which, in every session, defaults to the cluster setting. This is the same configuration mechanism we're using for distsql and experimental_opt already.

The setting would have the following 3 values:

  • rowid: SERIAL maps to INT NOT NULL DEFAULT unique_rowid()
  • virtual_sequence: SERIAL maps to INT NOT NULL DEFAULT nextval(seqname) with a sequence created as per CREATE SEQUENCE seqname VIRTUAL
  • sql_sequence: SERIAL maps to sequences like PostgreSQL.

@knz
Copy link
Contributor Author

knz commented Aug 15, 2018

We're thinking about having the cluster setting default to virtual_sequence for new clusters, rowid for existing clusters, with sql_sequence become opt-in for client apps that are still not happy with virtual sequences.

@maddyblue
Copy link
Contributor

Reading over the code, it looks like these new virtual sequences are a sugar wrapper over unique_rowid. The IMPORT stuff calls that same function, so I think the actual bytes generated by IMPORT are compatible with this. It appears that IMPORT just needs to be taught about SERIAL or something. Overall I think this will be an easy fix for IMPORT.

@knz knz force-pushed the 20180814-vseq-serial branch from 149d85f to 4ed2ef7 Compare August 15, 2018 18:21
@knz knz requested a review from a team as a code owner August 15, 2018 18:21
@knz
Copy link
Contributor Author

knz commented Aug 15, 2018

@mjibson you may have glanced only at the first commit. Either a virtual or real sequence needs to be created as a descriptor, get a table ID, and serve name resolution requests. I am not sure how to best integrate that and where it needs to be integrated.

@maddyblue
Copy link
Contributor

I think we have two options here. The first and easiest one is to have IMPORT convert SERIAL like the rowid option you describe above to avoid the sequence descriptor problem completely. I'm completely ok with that for now.

The second option is a full solution. The current IMPORT PGDUMP code supports sequences

desc, err := sql.MakeSequenceTableDesc(
by adding them to a list of descriptors in a resolver
fks.resolver[desc.Name] = &desc
and then passing that resolver in
desc, err := MakeSimpleTableDescriptor(evalCtx.Ctx(), settings, create, parentID, id, fks, walltime)
and at

I don't think it'd be too hard to do similar SERIAL processing in the IMPORT PGDUMP stuff.

A difficulty may arise when processing CSV, though, since it's never had to make more than 1 descriptor and so lacks some of the plumbing. I'm looking at CSV now to see what would be required for this.

@knz
Copy link
Contributor Author

knz commented Aug 16, 2018

@mjibson I modified importccl/import_stmt.go as we discussed. It would be nice if you could review just that.

Also there's another thing. PostgreSQL has the SERIAL pseudo-type always imply NOT NULL (we didn't do that, incorrectly, previously). This patch changes this, so I had to modify the import CSV test. Please have a look at that too. Thanks

@knz knz force-pushed the 20180814-vseq-serial branch 2 times, most recently from 699d2b8 to 3a0860a Compare August 16, 2018 20:27
@@ -191,6 +211,9 @@ func MakeSimpleTableDescriptor(
Sequence: &importSequenceOperators{},
}
affected := make(map[sqlbase.ID]*sqlbase.TableDescriptor)

log.Warningf(ctx, "WOO %q", create.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove.

@@ -46,11 +46,14 @@ func TestMakeSimpleTableDescriptorErrors(t *testing.T) {
stmt: "create table a (i int, constraint a foreign key (i) references c (id))",
error: `this IMPORT format does not support foreign keys`,
},
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test still accurate? I think this and the next test should be reverted.

if err != nil {
return nil, err
}
*refTable = tree.MakeUnqualifiedTableName(refTable.TableName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this.

@knz knz force-pushed the 20180814-vseq-serial branch from 3a0860a to 33f3636 Compare August 16, 2018 20:27
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/ccl/importccl/import_stmt.go, line 215 at r8 (raw file):

Previously, mjibson (Matt Jibson) wrote…

Remove.

Done.


pkg/ccl/importccl/csv_internal_test.go, line 49 at r8 (raw file):

Previously, mjibson (Matt Jibson) wrote…

Is this test still accurate? I think this and the next test should be reverted.

Done.

Copy link
Contributor

@maddyblue maddyblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CCL stuff LGTM.

craig bot pushed a commit that referenced this pull request Aug 16, 2018
28629: sql: better acknowledge fresh sequence descs in the current txn r=knz a=knz

To be used with #28575.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Copy link
Member

@BramGruneir BramGruneir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Just a few nits and comments.

Reviewed 8 of 11 files at r2, 1 of 1 files at r3, 20 of 20 files at r4, 9 of 29 files at r5, 8 of 11 files at r6, 1 of 1 files at r7, 27 of 27 files at r8, 2 of 2 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/sql/serial.go, line 116 at r4 (raw file):

		return nil, nil, nil, err
	}
	if res != nil {

Can you add the first attempt to the loop to?

Might make sense to pull it out into a function.


pkg/sql/logictest/testdata/logic_test/sequences, line 833 at r4 (raw file):

user root

# Test virtual sequences.

subtest please


pkg/sql/logictest/testdata/logic_test/serial, line 88 at r8 (raw file):

DROP TABLE serials; DROP TABLE smallbig; DROP TABLE serial

lint: extra lines


pkg/sql/logictest/testdata/logic_test/serial, line 91 at r8 (raw file):



subtest serial_virtual_sequence

Please add a test that has to loop 2 times to get a sequence name.


pkg/sql/logictest/testdata/logic_test/serial, line 182 at r8 (raw file):


statement ok
DROP TABLE serials; DROP TABLE smallbig; DROP TABLE serial

this can be a single drop table statement


pkg/sql/logictest/testdata/logic_test/serial, line 276 at r8 (raw file):


statement ok
DROP TABLE serials; DROP TABLE smallbig; DROP TABLE serial

can be a single drop table


pkg/sql/sqlbase/structured.proto, line 699 at r4 (raw file):

    // Start value of the sequence.
    optional int64 start = 4 [(gogoproto.nullable) = false];
	// Whether the sequence is virtual.

lint: tabs

@knz
Copy link
Contributor Author

knz commented Aug 16, 2018

All tests pass :)

Let me clean this up for you then we'll let the tires kick the (master) road.

knz added 2 commits August 16, 2018 22:57
Suggested/recommended by @BramGruneir.

This patch introduces **virtual sequences**, a CockroachDB-specific
extension to SQL sequences. Virtual sequences are sequences that do
not generate monotonically increasing values and instead produce
values like the built-in function `unique_rowid()`.

They are intended for use in combination with the PostgreSQL column
pseudo-type "SERIAL", where clients expect a sequence object to be
created but usually don't really care about the particular values
generated by the sequence. Virtual sequences buy us a behavior that's
compatible schema-wise with PostgreSQL but without the performance
scalability bottleneck of regular sequences.

Technically, a virtual sequence is supported by a regular sequence
descriptor with just an extra option called "virtual". Therefore it
behaves as a database object much alike a regular SQL sequence and can
be manipulated with the same DDL (CREATE/DROP/SHOW) etc. and thus
appears very much alike a regular SQL sequence to clients -- to ensure
they can be used as a substitute to regular SQL sequences without
clients noticing.

The difference between a virtual sequence and a regular sequence is
only to be found in the behavior of the `nextval()` built-in function:
with virtual sequences this skips the KV increment operation and
instead runs the code of `unique_rowid()`.

Release note (sql change): A CockroachDB-specific experimental
extension to SQL sequences, called "virtual sequences", is introduced
as an experiment to more transparently support PostgreSQL applications
using the SERIAL pseudo-type.
This will enable reuse in other places.

Release note: None
@knz knz force-pushed the 20180814-vseq-serial branch from 33f3636 to 5c5ad5c Compare August 16, 2018 21:01
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/serial.go, line 116 at r4 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Can you add the first attempt to the loop to?

Might make sense to pull it out into a function.

Done.


pkg/sql/logictest/testdata/logic_test/sequences, line 833 at r4 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

subtest please

Done.


pkg/sql/logictest/testdata/logic_test/serial, line 88 at r8 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

lint: extra lines

I liked the little boost to readability of the logic test file. Oh well. Done.


pkg/sql/logictest/testdata/logic_test/serial, line 91 at r8 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

Please add a test that has to loop 2 times to get a sequence name.

Done.


pkg/sql/logictest/testdata/logic_test/serial, line 182 at r8 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

this can be a single drop table statement

Done.


pkg/sql/logictest/testdata/logic_test/serial, line 276 at r8 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

can be a single drop table

Done.


pkg/sql/sqlbase/structured.proto, line 699 at r4 (raw file):

Previously, BramGruneir (Bram Gruneir) wrote…

lint: tabs

Done.

@knz knz added the docs-todo label Aug 16, 2018
Prior to this patch, SERIAL in CockroachDB was always alias for `INT
DEFAULT unique_rowid()`. However 3rd party apps developed for
PostgreSQL expect SERIAL to associate a column with a SQL sequence.

Although the particular values produced for the SERIAL column rarely
matter to the client app, we have found in practice that client
apps (or ORM frameworks) are likely to assert that there is a sequence
object created in the namespace with a particular name, and that they
are able to call `nextval()` / `currval()` on that object while
populating new rows on the table.

This patch increases compatibility with those apps by optionally
associating SERIAL columns to sequences like PostgreSQL.

This behavior is driven by a new session variable called
`experimental_serial_normalization`. Its default for new SQL sessions
is configured via the new cluster setting
`sql.defaults.serial_normalization`. The three values are:

- `rowid` (the default): SERIAL behaves like before and expands to
  `INT NOT NULL DEFAULT unique_rowid()`. The SERIAL
  width (SMALLSERIAL/SERIAL4/etc) is ignored.
- `virtual_sequence`: SERIAL creates a virtual sequence and expands
  to `INT NOT NULL DEFAULT nextval(...seqname...)`. The SERIAL
  width (SMALLSERIAL/SERIAL4/etc) is ignored.
- `sql_sequence`: This is the PostgreSQL "stricter" compatibility
  mode. SERIAL creates a regular SQL sequence and expands to
  `...inttype... NOT NULL DEFAULT nextval(...seqname...)`. The
  SERIAL width (SMALLSERIAL/SERIAL4/etc) is respected.

Finally, this patch ensures that SERIAL always implies NOT NULL, as
required for PostgreSQL compatibility, even when SERIAL is not used as
PRIMARY KEY.

Release note (sql change): CockroachDB now supports two experimental
compatibility modes with how PostgreSQL handles SERIAL and sequences,
to ease reuse of 3rd party frameworks or apps developed for
PostgreSQL. These modes can be enabled with the session variable
`experimental_serial_normalization` (per client) and cluster setting
`sql.defaults.serial_normalization` (cluster-wide). The first mode
`virtual_sequence` enables compatibility with many applications using
SERIAL with maximum performance and scalability. The second mode
`sql_sequence` enables maximum PostgreSQL compatibility but thus uses
regular SQL sequences and is thus subject to performance constraints.
@knz knz force-pushed the 20180814-vseq-serial branch from 5c5ad5c to 46e0449 Compare August 16, 2018 21:13
@knz
Copy link
Contributor Author

knz commented Aug 16, 2018

ok, let's let these tires kick the road

bors r+

craig bot pushed a commit that referenced this pull request Aug 16, 2018
28575: sql: introduce virtual sequences and use with SERIAL r=knz a=knz

Very good idea by @BramGruneir 

New features: virtual sequences (opt-in) and create-sequence-upon-SERIAL (opt-in). See individual commits for details and release notes.

For reference:

- I made everything opt-in here (i.e. default to previous CockroachDB behavior) to minimize risk.
- UX with Hibernate will thus not automatically be improved. We need to inform users (and upgrade our tutorials/docs) that this experimental feature is available and how to use it to unblock problems.
- It is possible that making the new default be `virtual_sequence` will both be backward-compatible with performance and unlock UX for Hibernate users. This has @awoods187 preference I think. However I propose to only flip that switch after we've seen a week or two pass with this patch merged.

Fixes #22607 (using the new setting set to `sql_sequence`).
Fixes #24062 (using the new setting set to `sql_sequence`).
Fixes #26730 (using the new setting set to either `virtual_sequence` or `sql_sequence`).



Co-authored-by: Raphael 'kena' Poss <[email protected]>
@craig
Copy link
Contributor

craig bot commented Aug 16, 2018

Build succeeded

@craig craig bot merged commit 46e0449 into cockroachdb:master Aug 16, 2018
@knz knz deleted the 20180814-vseq-serial branch August 16, 2018 21:45
@vilterp
Copy link
Contributor

vilterp commented Aug 16, 2018

Nice work; so speedy! It's not on by default, right? Have to use the VIRTUAL keyword? (Edit: I see you said this in the PR description.) Does it create it automatically if you use a SERIAL type?

@knz
Copy link
Contributor Author

knz commented Aug 16, 2018

@vilterp read the release notes :)

@jordanlewis
Copy link
Member

I'm confused - what does that cluster/session setting really do? Does it affect runtime behavior of existing SERIAL datatypes? Or does it only affect what's created when you make a table with SERIAL?

@knz
Copy link
Contributor Author

knz commented Aug 17, 2018

The setting only affects new tables created inside that session.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants