Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: ON CONFLICT DO UPDATE form of upsert #6591

Merged
merged 1 commit into from
May 11, 2016
Merged

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented May 9, 2016

name                    old time/op    new time/op    delta
Upsert1_Cockroach-8       1.15ms ± 5%    1.18ms ± 1%    ~     (p=0.151 n=5+5)
Upsert10_Cockroach-8      1.71ms ± 1%    1.75ms ± 1%  +2.26%  (p=0.008 n=5+5)
Upsert100_Cockroach-8     7.31ms ± 1%    7.41ms ± 2%    ~     (p=0.056 n=5+5)
Upsert1000_Cockroach-8    74.2ms ± 4%    74.2ms ± 3%    ~     (p=1.000 n=5+5)

name                    old alloc/op   new alloc/op   delta
Upsert1_Cockroach-8       82.4kB ± 0%    86.7kB ± 0%  +5.11%  (p=0.008 n=5+5)
Upsert10_Cockroach-8       164kB ± 0%     168kB ± 0%  +2.45%  (p=0.008 n=5+5)
Upsert100_Cockroach-8     1.05MB ± 0%    1.06MB ± 0%  +0.89%  (p=0.008 n=5+5)
Upsert1000_Cockroach-8    9.19MB ± 0%    9.27MB ± 0%  +0.90%  (p=0.008 n=5+5)

name                    old allocs/op  new allocs/op  delta
Upsert1_Cockroach-8        1.09k ± 0%     1.17k ± 0%  +7.40%  (p=0.008 n=5+5)
Upsert10_Cockroach-8       1.60k ± 0%     1.69k ± 0%  +5.58%  (p=0.008 n=5+5)
Upsert100_Cockroach-8      7.05k ± 0%     7.32k ± 0%  +3.87%  (p=0.008 n=5+5)
Upsert1000_Cockroach-8     60.9k ± 0%     63.0k ± 0%  +3.39%  (p=0.008 n=5+5)

The benchmarks will get better once I do the "all Puts" optimization, which is
coming next.

Closes #1962.

Followups:

  • ON CONFLICT DO UPDATE SET (a, b) = (1, 2)
  • the all Put optimization
  • RETURNING
  • ON CONFLICT DO NOTHING
  • Use the fetchCols stuff in all the right places

This change is Reviewable

@danhhz
Copy link
Contributor Author

danhhz commented May 10, 2016

cc @knz @nvanbenschoten @RaduBerinde

@RaduBerinde
Copy link
Member

Great stuff Dan! Some initial comments

Previously, paperstreet (Dan Harrison) wrote…

cc @knz @nvanbenschoten @RaduBerinde


Review status: 0 of 13 files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.


sql/select_qvalue.go, line 77 [r1] (raw file):

      }
  } else {
      for _, table := range qt.aliasedTables {

It seems like aliasedTables acts as a "fallback". If the actual table name is excluded what do we want to happen? (please add a test for that)

The semantics of aliasedTables do not match how we want this code to behave when we have multiple tables in a select. What we would want is described in the TODO you removed. Until we actually implement that, please leave the TODO somewhere.

We could implement this to satisfy both usecases:

  • don't add aliasedTables; change table to a slice
  • add a defaultTableIdx field

If defaultTableIdx is set, the code prefers the default table if Base is not set or if it matches that table. Otherwise it looks at the other tables.

If defaultTableIdx is not set, the code finds the table with the correct name. If Base is not given, it finds a table where the column name makes sense AND verifies that it is the ONLY table where that column name makes sense. This behavior can be left as a TODO.


sql/select_qvalue.go, line 267 [r1] (raw file):

func (q qvalMap) populateQVals(row parser.DTuple) {
  for ref, qval := range q {
      qval.datum = row[ref.colIdx+ref.table.columnOffset]

Why not just call populateQvals(row[offset:]) instead of adding the columnOffset?


sql/tablewriter.go, line 293 [r1] (raw file):

      for i, result := range b.Results {
          if len(result.Rows) == 0 {
              // No conflict for this row, so leave upsertRowPKs[i] as nil.

Can remove the first if and just leave the comment. Also, maybe panic if we get more than 1 row?


sql/tablewriter.go, line 326 [r1] (raw file):

          pkSpans = append(pkSpans, sqlbase.Span{Start: primaryKey, End: primaryKey.PrefixEnd()})
          if _, ok := rowIdxForPrimaryKey[string(primaryKey)]; ok {
              return nil, fmt.Errorf("UPSERT/ON CONFLICT DO UPDATE command cannot affect row a second time")

this error message is very unclear.. maybe see what error pg would return in this case and match that?
Also, is this error even possible with the UPSERT syntax?


sql/update.go, line 346 [r1] (raw file):

}

func (p *planner) namesForExprs(exprs parser.UpdateExprs) (parser.QualifiedNames, error) {

function needs comment


sql/upsert.go, line 103 [r1] (raw file):

}

func upsertExprsAndIndex(

Function needs a description


sql/upsert.go, line 163 [r1] (raw file):

  }
  if conflictIndex == nil {
      return nil, nil, util.Errorf("there is no unique or exclusion constraint matching the ON CONFLICT specification")

Isn't this a user-facing error? (if yes, use fmt.Errorf)


sql/testdata/upsert, line 68 [r1] (raw file):


statement ok
INSERT INTO abc VALUES (1, 2, 3) ON CONFLICT (c) DO UPDATE SET a = 4

are we no longer testing the UPSERT form?


Comments from Reviewable

@nvanbenschoten
Copy link
Member

Nice! Reviewed sql subpackages so far.

Previously, RaduBerinde wrote…

Great stuff Dan! Some initial comments


Reviewed 7 of 13 files at r1.
Review status: 7 of 13 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


sql/parser/insert.go, line 76 [r1] (raw file):

//
// The zero value for OnConflict is used to signal the UPSERT short form, which
// uses the primary key for Index and the values being inserted for Exprs.

Does this comment need to be updated? It looks like Index is gone.

Also maybe add exprs and wheres to the ON CONFLICT index DO UPDATE SET part of the comment to show how they fit in.


sql/parser/insert.go, line 85 [r1] (raw file):

// IsUpsertAlias returns true if the UPSERT syntactic sugar was used.
func (oc *OnConflict) IsUpsertAlias() bool {
  return oc.Columns == nil && oc.Exprs == nil && oc.Where == nil

Consider just comparing agains the zero value. return oc != nil && *oc == OnConflict{}

That would also pull in the nil check, which I think cleans up the semantics a bit and allows you to remove it at least for the first if node.OnConflict != nil && node.OnConflict.IsUpsertAlias() {


sql/testdata/upsert, line 68 [r1] (raw file):

Previously, RaduBerinde wrote…

are we no longer testing the UPSERT form?


I think @paperstreet pulled them up top. Still, it would be nice to double check that we're not losing any logical test coverage here.


Comments from Reviewable

@danhhz
Copy link
Contributor Author

danhhz commented May 10, 2016

Review status: 1 of 13 files reviewed at latest revision, 10 unresolved discussions, some commit checks pending.


sql/select_qvalue.go, line 77 [r1] (raw file):

Previously, RaduBerinde wrote…

It seems like aliasedTables acts as a "fallback". If the actual table name is excluded what do we want to happen? (please add a test for that)

The semantics of aliasedTables do not match how we want this code to behave when we have multiple tables in a select. What we would want is described in the TODO you removed. Until we actually implement that, please leave the TODO somewhere.

We could implement this to satisfy both usecases:

  • don't add aliasedTables; change table to a slice
  • add a defaultTableIdx field

If defaultTableIdx is set, the code prefers the default table if Base is not set or if it matches that table. Otherwise it looks at the other tables.

If defaultTableIdx is not set, the code finds the table with the correct name. If Base is not given, it finds a table where the column name makes sense AND verifies that it is the ONLY table where that column name makes sense. This behavior can be left as a TODO.


TIL: upserting into a table named "excluded" and using "excluded" in the set expr, gives you a table reference "excluded" is ambiguous error. Fixed and tested.

As discussed offline, postgres seems to use the "defaultTableIdx is not set" logic all the time. So I copied that.


sql/select_qvalue.go, line 267 [r1] (raw file):

Previously, RaduBerinde wrote…

Why not just call populateQvals(row[offset:]) instead of adding the columnOffset?


Yeah, I really don't like the columnOffset solution. Not sure this alternative works though. The qvalMap now contains references to column offsets in different tables so you can't really call this table-by-table.

I also considered offsetting all the indexes we store in qvalMap so they'd match up with the row that gets passed in here, but it really felt like the logic would be in the wrong place.


sql/tablewriter.go, line 293 [r1] (raw file):

Previously, RaduBerinde wrote…

Can remove the first if and just leave the comment. Also, maybe panic if we get more than 1 row?


Done.


sql/tablewriter.go, line 326 [r1] (raw file):

Previously, RaduBerinde wrote…

this error message is very unclear.. maybe see what error pg would return in this case and match that?
Also, is this error even possible with the UPSERT syntax?


This is copied exactly from pg (with the addition of our UPSERT shorthand) : - p

Definitely possible: UPSERT INTO foo VALUES (1, 2), (1, 3)

It may be working looking into fixing this for special cases, but the general case is hard enough that it's probably not worth it.


sql/update.go, line 346 [r1] (raw file):

Previously, RaduBerinde wrote…

function needs comment


Done.


sql/upsert.go, line 103 [r1] (raw file):

Previously, RaduBerinde wrote…

Function needs a description


Done.


sql/upsert.go, line 163 [r1] (raw file):

Previously, RaduBerinde wrote…

Isn't this a user-facing error? (if yes, use fmt.Errorf)


Oops


sql/parser/insert.go, line 76 [r1] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this comment need to be updated? It looks like Index is gone.

Also maybe add exprs and wheres to the ON CONFLICT index DO UPDATE SET part of the comment to show how they fit in.


Done.


sql/parser/insert.go, line 85 [r1] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider just comparing agains the zero value. return oc != nil && *oc == OnConflict{}

That would also pull in the nil check, which I think cleans up the semantics a bit and allows you to remove it at least for the first if node.OnConflict != nil && node.OnConflict.IsUpsertAlias() {


invalid operation: *oc == OnConflict literal (struct containing NameList cannot be compared)

I pulled in the != nil check though


sql/testdata/upsert, line 68 [r1] (raw file):

Previously, RaduBerinde wrote…

are we no longer testing the UPSERT form?


We have both, there are some tests for that in here, too. Sorry for the diff spam in this file, I was unhappy with the previous version and more or less rewrote it.


Comments from Reviewable

@nvanbenschoten
Copy link
Member

Reviewed 3 of 13 files at r1, 7 of 11 files at r2.
Review status: 8 of 13 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


Comments from Reviewable

@RaduBerinde
Copy link
Member

sql/select_qvalue.go, line 267 [r1] (raw file):

Previously, paperstreet (Dan Harrison) wrote…

Yeah, I really don't like the columnOffset solution. Not sure this alternative works though. The qvalMap now contains references to column offsets in different tables so you can't really call this table-by-table.

I also considered offsetting all the indexes we store in qvalMap so they'd match up with the row that gets passed in here, but it really felt like the logic would be in the wrong place.


Ah I see now. Yeah while the qvalues are able to refer to multiple tables, this API wasn't extended yet. We could pass the *tableInfo along with the row and only update the qvals that refer to that table. We would call the function once for each table.

That also maps nicely to what would have to happen for something like SELECT t1.a,t2.b FROM t1, t2 - we would do something along the lines of:

for each row in t1:
   populateQVals(t1, t1node.Value())
   for each row in t2:
      populateQVals(t2, t2node.Value())
      // evaluate
      ...

What do you think?


Comments from Reviewable

@RaduBerinde
Copy link
Member

Review status: 8 of 13 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


sql/select_qvalue.go, line 76 [r2] (raw file):

                  ref.table = table
                  ref.colIdx = idx
                  if len(qname.Base) > 0 {

qname.Base != "" (for consistency with the above check)


sql/select_qvalue.go, line 77 [r2] (raw file):

                  ref.colIdx = idx
                  if len(qname.Base) > 0 {
                      // If the base is non-empty then this is unambiguous. Short circuit

If we remove this check, we would detect the ambiguous table thing without having to actually check for it (in makeUpsertHelper) - though the error would be the more generic column reference is ambiguous. It would also allow the statement to work if it doesn't actually resolve any columns (e.g. ON CONFLICT DO UPDATE SET x = 15). I think it's ok either way though, this is a very weird cornercase.


sql/tablewriter.go, line 326 [r1] (raw file):

Previously, paperstreet (Dan Harrison) wrote…

This is copied exactly from pg (with the addition of our UPSERT shorthand) : - p

Definitely possible: UPSERT INTO foo VALUES (1, 2), (1, 3)

It may be working looking into fixing this for special cases, but the general case is hard enough that it's probably not worth it.


Cool! Makes sense, I did not understand exactly what this error means earlier.


Comments from Reviewable

@RaduBerinde
Copy link
Member

Feel free to ignore the nits.

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Nice! Reviewed sql subpackages so far.


Review status: 8 of 13 files reviewed at latest revision, 12 unresolved discussions, all commit checks successful.


sql/tablewriter.go, line 254 [r2] (raw file):

// conflicts.
//
// If the conflict index is the primary index, we can compute them directly.

[nit] I'd move this paragraph inside the if block, and the next paragraph in the else block. These are useful when reading that code, not necessarily that useful to understand the interface of the function.


sql/tablewriter.go, line 274 [r2] (raw file):

          upsertRowPKs[i] = upsertRowPK
      }
  } else {

[nit] could just return here to avoid indenting the entire else block.


sql/tablewriter.go, line 305 [r2] (raw file):

              }
          } else if len(result.Rows) > 1 {
              panic(fmt.Errorf("Expected <= 1 but got %d conflicts for row %s", len(result.Rows), tu.insertRows[i]))

long line


sql/upsert.go, line 160 [r2] (raw file):

  var conflictIndex *sqlbase.IndexDescriptor
  if indexMatch(tableDesc.PrimaryIndex) {
      conflictIndex = &tableDesc.PrimaryIndex

[nit] can just return onConflict.Exprs, &tableDesc.PrimaryIndex, nil


sql/upsert.go, line 164 [r2] (raw file):

      for _, index := range tableDesc.Indexes {
          if indexMatch(index) {
              conflictIndex = &index

[nit] return onConflict.Exprs, &index, nil


sql/sqlbase/table.go, line 537 [r2] (raw file):

      }
  }
  extractedValues := make([]parser.Datum, len(valueTypes))

[nit] maybe use len(index.ColumnIDs) for consistency with dirs and the code near the end


sql/sqlbase/table.go, line 553 [r2] (raw file):

      dirs[i] = encoding.Ascending
  }
  extractedImplicitValues := make([]parser.Datum, len(valueTypes))

[nit] len(index.ImplicitColumnIDs)


Comments from Reviewable

@danhhz
Copy link
Contributor Author

danhhz commented May 10, 2016

Review status: 5 of 13 files reviewed at latest revision, 12 unresolved discussions, all commit checks successful.


sql/select_qvalue.go, line 267 [r1] (raw file):

Previously, RaduBerinde wrote…

Ah I see now. Yeah while the qvalues are able to refer to multiple tables, this API wasn't extended yet. We could pass the *tableInfo along with the row and only update the qvals that refer to that table. We would call the function once for each table.

That also maps nicely to what would have to happen for something like SELECT t1.a,t2.b FROM t1, t2 - we would do something along the lines of:

for each row in t1:
   populateQVals(t1, t1node.Value())
   for each row in t2:
      populateQVals(t2, t2node.Value())
      // evaluate
      ...

What do you think?


Much better!


sql/select_qvalue.go, line 76 [r2] (raw file):

Previously, RaduBerinde wrote…

qname.Base != "" (for consistency with the above check)


Done.


sql/select_qvalue.go, line 77 [r2] (raw file):

Previously, RaduBerinde wrote…

If we remove this check, we would detect the ambiguous table thing without having to actually check for it (in makeUpsertHelper) - though the error would be the more generic column reference is ambiguous. It would also allow the statement to work if it doesn't actually resolve any columns (e.g. ON CONFLICT DO UPDATE SET x = 15). I think it's ok either way though, this is a very weird cornercase.


The UPSERT sugar leaks a little bit, but I like this a lot.

statement error column reference "excluded.b" is ambiguous
UPSERT INTO excluded VALUES (1, 1)

sql/tablewriter.go, line 254 [r2] (raw file):

Previously, RaduBerinde wrote…

[nit] I'd move this paragraph inside the if block, and the next paragraph in the else block. These are useful when reading that code, not necessarily that useful to understand the interface of the function.


Done.


sql/tablewriter.go, line 274 [r2] (raw file):

Previously, RaduBerinde wrote…

[nit] could just return here to avoid indenting the entire else block.


Done.


sql/tablewriter.go, line 305 [r2] (raw file):

Previously, RaduBerinde wrote…

long line


Done.


sql/upsert.go, line 160 [r2] (raw file):

Previously, RaduBerinde wrote…

[nit] can just return onConflict.Exprs, &tableDesc.PrimaryIndex, nil


Done.


sql/upsert.go, line 164 [r2] (raw file):

Previously, RaduBerinde wrote…

[nit] return onConflict.Exprs, &index, nil


Done.


sql/sqlbase/table.go, line 537 [r2] (raw file):

Previously, RaduBerinde wrote…

[nit] maybe use len(index.ColumnIDs) for consistency with dirs and the code near the end


Done.


sql/sqlbase/table.go, line 553 [r2] (raw file):

Previously, RaduBerinde wrote…

[nit] len(index.ImplicitColumnIDs)


Done.


Comments from Reviewable

@RaduBerinde
Copy link
Member

Reviewed 1 of 7 files at r3.
Review status: 6 of 13 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


sql/select_qvalue.go, line 258 [r3] (raw file):

func (q qvalMap) populateQVals(tableAlias string, row parser.DTuple) {
  for ref, qval := range q {
      if ref.table.alias == tableAlias {

It would be great if we could compare tableInfo pointers here instead of strings, given that this is in the "hot" path for select. Also, given the other change, won't we now have possibly two tables with the same name excluded, meaning we may mix column indexes (possibly going out of bounds)? BTW I think multiple tables with same name might be a possibility in the general select case e.g. SELECT .* FROM db1.t, db2.t


Comments from Reviewable

name                    old time/op    new time/op    delta
Upsert1_Cockroach-8       1.15ms ± 5%    1.18ms ± 1%    ~     (p=0.151 n=5+5)
Upsert10_Cockroach-8      1.71ms ± 1%    1.75ms ± 1%  +2.26%  (p=0.008 n=5+5)
Upsert100_Cockroach-8     7.31ms ± 1%    7.41ms ± 2%    ~     (p=0.056 n=5+5)
Upsert1000_Cockroach-8    74.2ms ± 4%    74.2ms ± 3%    ~     (p=1.000 n=5+5)

name                    old alloc/op   new alloc/op   delta
Upsert1_Cockroach-8       82.4kB ± 0%    86.7kB ± 0%  +5.11%  (p=0.008 n=5+5)
Upsert10_Cockroach-8       164kB ± 0%     168kB ± 0%  +2.45%  (p=0.008 n=5+5)
Upsert100_Cockroach-8     1.05MB ± 0%    1.06MB ± 0%  +0.89%  (p=0.008 n=5+5)
Upsert1000_Cockroach-8    9.19MB ± 0%    9.27MB ± 0%  +0.90%  (p=0.008 n=5+5)

name                    old allocs/op  new allocs/op  delta
Upsert1_Cockroach-8        1.09k ± 0%     1.17k ± 0%  +7.40%  (p=0.008 n=5+5)
Upsert10_Cockroach-8       1.60k ± 0%     1.69k ± 0%  +5.58%  (p=0.008 n=5+5)
Upsert100_Cockroach-8      7.05k ± 0%     7.32k ± 0%  +3.87%  (p=0.008 n=5+5)
Upsert1000_Cockroach-8     60.9k ± 0%     63.0k ± 0%  +3.39%  (p=0.008 n=5+5)

The benchmarks will get better once I do the "all Puts" optimization, which is
coming next.

Closes cockroachdb#1962.
@danhhz
Copy link
Contributor Author

danhhz commented May 11, 2016

Review status: 4 of 13 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


sql/select_qvalue.go, line 258 [r3] (raw file):

Previously, RaduBerinde wrote…

It would be great if we could compare tableInfo pointers here instead of strings, given that this is in the "hot" path for select. Also, given the other change, won't we now have possibly two tables with the same name excluded, meaning we may mix column indexes (possibly going out of bounds)? BTW I think multiple tables with same name might be a possibility in the general select case e.g. SELECT .* FROM db1.t, db2.t


Done.


Comments from Reviewable

@RaduBerinde
Copy link
Member

Awesome! Thanks for the updates! :lgtm:

Previously, RaduBerinde wrote…

Feel free to ignore the nits.


Reviewed 4 of 11 files at r2, 2 of 7 files at r3, 6 of 6 files at r4.
Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


Comments from Reviewable

@danhhz danhhz merged commit be280e7 into cockroachdb:master May 11, 2016
@danhhz danhhz deleted the upsert branch May 11, 2016 11:52
@danhhz
Copy link
Contributor Author

danhhz commented May 11, 2016

Thanks for the review!

@jseldess
Copy link
Contributor

Docs updated with cockroachdb/docs#308

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants