Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: fetch minimal set of columns on returning mutations #38594

Merged

Conversation

ridwanmsharif
Copy link

Previously, we used to fetch all columns when a mutation contained
a RETURNING clause. This is an issue because it forces us to
retrieve unnecessary data and creates extra contention.
This change adds logic to compute the minimal set of required columns
and fetches only those.

Fixes #30618.
Unblocks #30624.

Release note: None

@ridwanmsharif ridwanmsharif requested a review from a team July 1, 2019 22:09
@ridwanmsharif ridwanmsharif requested a review from a team as a code owner July 1, 2019 22:09
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ridwanmsharif ridwanmsharif removed request for a team July 1, 2019 22:09
Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought about his deeply enough, but I was expecting this work to happen in optimization rules and/or execution engine code. The optbuilder mutation code is very tricky as-is, and I'd like to avoid adding any optimizations directly to that code. From what I can see, this PR is folding the "fetch minimal set of columns" optimization directly into the optbuilder code, making it even more difficult to reason about. Is there some reason why you went with this design? Do other alternatives not work well?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Author

@ridwanmsharif ridwanmsharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should've marked this as a WIP. My initial approach was to fix the creation of the returning columns correctly in the optbuilder instead of fixing it later as a norm rule. I wasn't certain how we'd extract the return cols and thought it'd be easier to add that in the optbuilder. But now that I'm extracting them from the projection, you're right that its better to have this elsewhere. I'll push up a change to add a norm rule to do this.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch from 6fd7498 to 93d6b63 Compare July 2, 2019 01:13
Copy link
Author

@ridwanmsharif ridwanmsharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a norm rule. The output columns don't look right though (and also the not null columns I suspect). I'll take a closer look in the morning.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch from 93d6b63 to 6872528 Compare July 2, 2019 02:13
@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch 6 times, most recently from 37436c8 to 2fac7f2 Compare July 3, 2019 02:27
@ridwanmsharif ridwanmsharif requested review from a team July 3, 2019 04:06
Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've made some good progress on this.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @ridwanmsharif)


pkg/sql/opt/norm/prune_cols.go, line 535 at r4 (raw file):

	// Find all the columns referenced by the projections.
	returningOrdSet := exec.ColumnOrdinalSet{}

We shouldn't take a dependency on the exec package from norm. Coupled with the comment below, I'd create a NeededMutationReturnCols function that returns an opt.ColSet that contains the column ids for the columns needed by the projection + primary key columns. That way, it fits better with the patterns already established in this file. In PruneMutationReturnCols, you can then iterate over the ReturnCols and only include them in the new MutationPrivate if they're part of that needed set.


pkg/sql/opt/norm/prune_cols.go, line 548 at r4 (raw file):

	})

	// The columns of the primary index are always returned regardless of

NIT: primary index => primary key


pkg/sql/opt/norm/prune_cols.go, line 567 at r4 (raw file):

	newPrivate.ReturnCols = newReturnCols
	newPrivate.ReturnPruned = true

We should avoid using ReturnPruned, because it means that we won't be able to repeatedly prune more columns as more of the expression is optimized. Like imagine something like:

SELECT a
FROM
(
  SELECT a, b
  FROM [UPDATE abcde SET c=5 WHERE d=100 RETURNING *]
)

We'd first prune according to the inner projection, which might prune away column 'e'. But then if we set ReturnPruned, we won't be able to prune column b. This is why we typically follow the pattern CanPruneXXX and PruneXXX. We don't want to assume that one pruning attempt is enough.

I'd recommend that you follow the pattern we established for other pruning rules:

// Not sure what 
(CanPruneMutationReturnCols $input $needed:(NeededMutationReturnCols $mutationPrivate))

pkg/sql/opt/norm/rules/prune_cols.opt, line 472 at r4 (raw file):

[PruneMutationReturnCols, Normalize]
(Project
    $input: (Insert | Update | Upsert | Delete

NIT: Remove blank space after $input:.


pkg/sql/opt/norm/testdata/rules/prune_cols, line 1941 at r4 (raw file):


# Do not prune columns that must be returned.
# TODO(justin): in order to prune e here we need a PruneMutationReturnCols rule.

Can update the comment now.


pkg/sql/rowcontainer/datum_row_container.go, line 216 at r4 (raw file):

func (c *RowContainer) AddRow(ctx context.Context, row tree.Datums) (tree.Datums, error) {
	if len(row) != c.numCols {
		//panic(fmt.Sprintf("invalid row length %d, expected %d", len(row), c.numCols))

Need to uncomment this.

@ridwanmsharif ridwanmsharif removed request for a team July 4, 2019 13:30
@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch 5 times, most recently from 6da5461 to cc8fa38 Compare July 8, 2019 16:03
@ridwanmsharif ridwanmsharif requested a review from justinj July 9, 2019 16:33
@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch from 00b30ad to b919eab Compare July 9, 2019 16:46
Copy link
Contributor

@justinj justinj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, @RaduBerinde, and @ridwanmsharif)


pkg/sql/delete.go, line 337 at r12 (raw file):

		// public columns.
		resultValues := make(tree.Datums, d.run.rows.NumCols())
		for i := 0; i < d.run.rows.NumCols(); i++ {

can use copy for this, but this is doing an additional copy now, is that needed? why can we no longer just pass in the existing slice?


pkg/sql/opt_exec_factory.go, line 1290 at r12 (raw file):

	var returnCols sqlbase.ResultColumns
	if rowsNeeded {
		// Update always returns the minimal set of non-mutation columns required,

super nit: it's not really, conceptually, the minimal set here, it's just the set that is specified in the operator. no need to change this but it's a little bit abstraction-leaky


pkg/sql/opt_exec_factory.go, line 1292 at r12 (raw file):

		// Update always returns the minimal set of non-mutation columns required,
		// in the same order they are defined in the table.
		var returnColDescs []sqlbase.ColumnDescriptor

no need to var this, i think, just use := below


pkg/sql/opt/memo/expr.go, line 385 at r12 (raw file):

// OutputOrd returns true if the i-th ordinal column should be part of the
// mutations output columns.

nit: mutation's, and maybe give this a more descriptive name, like IsColumnOutput.


pkg/sql/opt/memo/expr.go, line 395 at r12 (raw file):

// InsertOrd returns true if the i-th ordinal column is being inserted by
// the mutation.
func (m *MutationPrivate) InsertOrd(i int) bool {

ditto


pkg/sql/opt/memo/logical_props_builder.go, line 1091 at r12 (raw file):

	// Only non-mutation columns are output columns.
	for i, n := 0, tab.ColumnCount(); i < n; i++ {
		if private.OutputOrd(i) || private.InsertOrd(i) {

are these only referenced here? if so can they be combined into a single condition somehow?


pkg/sql/opt/norm/prune_cols.go, line 515 at r5 (raw file):

// CanPruneMutationReturnCols checks whether the mutations return columns can
// be pruned. This is the base case for the PruneMutationReturnCols rule.

nit: s/base case/precondition/?


pkg/sql/opt/norm/prune_cols.go, line 527 at r5 (raw file):

// keep ReturnCols that are not referenced by the RETURNING clause or are not
// part of the primary key. The caller must have already done the analysis to
// prove that these columns are not needed, by calling CanPruneMutationReturnCols.

possibly: s/that these columns are not needed/that such columns exist/? but nothing bad will happen if there are no such columns, right? it just won't be productive for the caller


pkg/sql/opt/norm/prune_cols.go, line 404 at r12 (raw file):

// to be pruned, then there must be logic in the PruneCols method to actually
// prune those columns when requested.
func DerivePruneCols(e memo.RelExpr) opt.ColSet {

I think there should be a corresponding change here in DerivePruneCols so that we can push down Projections to allow pruning to occur


pkg/sql/opt/norm/prune_cols.go, line 523 at r12 (raw file):

	tabID := c.mem.Metadata().TableMeta(private.Table).MetaID
	for i := 0; i < len(private.ReturnCols); i++ {

for i := range private.ReturnCols {


pkg/sql/opt/norm/prune_cols.go, line 543 at r12 (raw file):

	tabID := c.mem.Metadata().TableMeta(private.Table).MetaID

	for i := 0; i < len(private.ReturnCols); i++ {

nit: for i := range private.ReturnCols {


pkg/sql/opt/norm/prune_cols.go, line 547 at r12 (raw file):

			newReturnCols[i] = private.ReturnCols[i]
		} else {
			newReturnCols[i] = 0

nit: this is unnecessary since it will default to 0


pkg/sql/opt/norm/prune_cols.go, line 559 at r12 (raw file):

// the mutation.
func (c *CustomFuncs) NeededMutationReturnCols(
	private *memo.MutationPrivate, projections memo.ProjectionsExpr, passthrough opt.ColSet,

as mentioned in prune_cols.opt, I think this can be simplified


pkg/sql/opt/norm/rules/prune_cols.opt, line 475 at r12 (raw file):

        $innerInput:*
        $checks:*
        $mutationPrivate:*)

nit: ) on new line


pkg/sql/opt/norm/rules/prune_cols.opt, line 479 at r12 (raw file):

    $passthrough:* &
      (CanPruneMutationReturnCols
        $mutationPrivate $needed: (NeededMutationReturnCols $mutationPrivate $projections $passthrough))

do you need to pass the projections and passthrough directly to this function? a common pattern we use is:

$needed:(UnionCols3
    (ProjectionOuterCols $projections)
    $passthrough
    (NeededMutationReturnCols $mutationPrivate)
)

which I think keeps things more contained


pkg/sql/opt/norm/rules/prune_cols.opt, line 486 at r12 (raw file):

        $innerInput
        $checks
        (PruneMutationReturnCols $mutationPrivate $needed))

nit: ) on new line


pkg/sql/opt/norm/testdata/rules/prune_cols, line 2241 at r12 (raw file):

           └── CASE WHEN a IS NULL THEN column2 ELSE 10 END [type=int, outer=(7,10)]

nit: 2 extra blank lines


pkg/sql/rowcontainer/datum_row_container.go, line 198 at r12 (raw file):

	rsz := c.fixedColsSize
	for _, i := range c.varSizedColumns {
		if i < len(row) && row[i] != nil {

this smells to me like we've violated an existing contract. why was this change necessary? in general I think making nils valid where they were not before is a risky proposition, is there a way we can avoid having to do so?

@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch from de69d15 to ff6ce77 Compare July 9, 2019 20:09
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: (other than the outstanding comments)

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @ridwanmsharif)


pkg/sql/opt/norm/prune_cols.go, line 562 at r22 (raw file):

) opt.ColSet {
	// Find all the columns referenced by the projections.
	returningColSet := opt.ColSet{}

var returningColSet opt.ColSet


pkg/sql/opt/norm/prune_cols.go, line 566 at r22 (raw file):

	for _, projection := range projections {
		outCols := projection.ScalarProps(c.mem).OuterCols
		outCols.ForEach(func(colID opt.ColumnID) {

returningColSet.UnionWith(outCols) (same below)


pkg/sql/opt/norm/prune_cols.go, line 572 at r22 (raw file):

	passthrough.ForEach(func(colID opt.ColumnID) {
		if colID > 0 {

Why would passthrough contain a 0 ColumnID?


pkg/sql/opt/norm/rules/prune_cols.opt, line 479 at r22 (raw file):

    $passthrough:* &
      (CanPruneMutationReturnCols
        $mutationPrivate $needed: (NeededMutationReturnCols $mutationPrivate $projections $passthrough))

[nit] no space after :, and ) on new line


pkg/sql/opt/norm/testdata/rules/prune_cols, line 2270 at r22 (raw file):


# Fetch all the columns for the RETURN expression.
opt

any reason we're using opt instead of norm? (I see it's already used a lot in this file, not sure why)


pkg/sql/opt/norm/testdata/rules/prune_cols, line 2323 at r22 (raw file):

                └── a + d [type=int, outer=(9,12)]

# Fetch only whats being updated (not the (d, e, f, g)) family.

[nit] should "family" be inside the outer )


pkg/sql/opt/xform/testdata/rules/join, line 2127 at r21 (raw file):

 │    │    ├── side-effects, mutations
 │    │    ├── insert abc
 │    │    │    ├── columns: abc.a:5(int) abc.b:6(int) abc.c:7(int) abc.rowid:8(int!null)

how come we lost the FDs and !null info?

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I don't have much to add beyond what Radu/Justin/Andy already said.

Don't forget to squash the commits...

Reviewed 1 of 8 files at r5, 1 of 7 files at r15, 1 of 5 files at r16, 1 of 4 files at r18, 2 of 5 files at r19, 3 of 6 files at r20, 2 of 2 files at r21, 1 of 1 files at r22.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @ridwanmsharif)


pkg/sql/opt/norm/prune_cols.go, line 513 at r22 (raw file):

}

// CanPruneMutationReturnCols checks whether the mutations return columns can

[nit] mutations -> mutation's


pkg/sql/opt/norm/rules/prune_cols.opt, line 479 at r22 (raw file):

Previously, RaduBerinde wrote…

[nit] no space after :, and ) on new line

also put $mutationPrivate and $needed on different lines

@ridwanmsharif
Copy link
Author

There will me some more involved changes in the upcoming commits in sql. This PR doesn't correctly set the output columns in upsert and insert mutations and right now. Even though the mutations fetch the minimal set of columns it needs to, it populates the result row with all the fetch columns. That is why we have weird exec plans where the output columns include more columns than the returning clause specifies. I'll address and fix all the comments once I've made the appropriate changes.

@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch from 00c9100 to 1580d09 Compare July 11, 2019 19:34
@ridwanmsharif
Copy link
Author

cc @jordanlewis to get some eyes on the SQL changes.

@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch 2 times, most recently from ee94de8 to c4a93b6 Compare July 14, 2019 21:00
Copy link
Author

@ridwanmsharif ridwanmsharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @jordanlewis, @justinj, @RaduBerinde, and @rytaft)


pkg/sql/delete.go, line 337 at r12 (raw file):

Previously, justinj (Justin Jaffray) wrote…

can use copy for this, but this is doing an additional copy now, is that needed? why can we no longer just pass in the existing slice?

Good catch! This remained from one of my debugging sessions. A copy shouldn't be necessary, just appropriate indexing.


pkg/sql/opt_exec_factory.go, line 1290 at r12 (raw file):

Previously, justinj (Justin Jaffray) wrote…

super nit: it's not really, conceptually, the minimal set here, it's just the set that is specified in the operator. no need to change this but it's a little bit abstraction-leaky

Done.


pkg/sql/opt_exec_factory.go, line 1292 at r12 (raw file):

Previously, justinj (Justin Jaffray) wrote…

no need to var this, i think, just use := below

Will need this now (returnColumns is empty when no RETURNING clause is specified)


pkg/sql/opt/memo/expr.go, line 395 at r12 (raw file):

Previously, justinj (Justin Jaffray) wrote…

ditto

Done.


pkg/sql/opt/memo/logical_props_builder.go, line 1091 at r12 (raw file):

Previously, justinj (Justin Jaffray) wrote…

are these only referenced here? if so can they be combined into a single condition somehow?

Done.


pkg/sql/opt/norm/prune_cols.go, line 527 at r5 (raw file):

Previously, justinj (Justin Jaffray) wrote…

possibly: s/that these columns are not needed/that such columns exist/? but nothing bad will happen if there are no such columns, right? it just won't be productive for the caller

Done.


pkg/sql/opt/norm/prune_cols.go, line 404 at r12 (raw file):

Previously, justinj (Justin Jaffray) wrote…

I think there should be a corresponding change here in DerivePruneCols so that we can push down Projections to allow pruning to occur

I added a second commit to make the appropriate changes to DerivePruneCols to allow this to happen.


pkg/sql/opt/norm/prune_cols.go, line 523 at r12 (raw file):

Previously, justinj (Justin Jaffray) wrote…

for i := range private.ReturnCols {

Done.


pkg/sql/opt/norm/prune_cols.go, line 559 at r12 (raw file):

Previously, justinj (Justin Jaffray) wrote…

as mentioned in prune_cols.opt, I think this can be simplified

Done.


pkg/sql/opt/norm/prune_cols.go, line 513 at r22 (raw file):

Previously, rytaft wrote…

[nit] mutations -> mutation's

Done.


pkg/sql/opt/norm/rules/prune_cols.opt, line 479 at r12 (raw file):

Previously, justinj (Justin Jaffray) wrote…

do you need to pass the projections and passthrough directly to this function? a common pattern we use is:

$needed:(UnionCols3
    (ProjectionOuterCols $projections)
    $passthrough
    (NeededMutationReturnCols $mutationPrivate)
)

which I think keeps things more contained

Done.


pkg/sql/opt/xform/testdata/rules/join, line 2127 at r21 (raw file):

Previously, RaduBerinde wrote…

how come we lost the FDs and !null info?

It was a problem with how the OutputCols of mutations were collected. Fixed now.


pkg/sql/rowcontainer/datum_row_container.go, line 198 at r12 (raw file):

Previously, justinj (Justin Jaffray) wrote…

this smells to me like we've violated an existing contract. why was this change necessary? in general I think making nils valid where they were not before is a risky proposition, is there a way we can avoid having to do so?

We avoid needing to do this now.

@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch from f65ddde to 8910ee1 Compare July 15, 2019 00:26
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r31, 8 of 18 files at r33, 4 of 5 files at r34, 9 of 9 files at r35, 3 of 3 files at r36.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @jordanlewis, @justinj, @RaduBerinde, and @ridwanmsharif)


pkg/sql/delete.go, line 219 at r35 (raw file):

	// to the columns in the resultRowBuffer. A value of -1 is used to indicate
	// that the column at that index is not part of the resultRowBuffer
	// of the mutation. Otherwise, the value an the i-th index refers to the

an -> at


pkg/sql/insert.go, line 389 at r35 (raw file):

	// columns in the resultRowBuffer. A value of -1 is used to indicate
	// that the table column at that index is not part of the resultRowBuffer
	// of the mutation. Otherwise, the value an the i-th index refers to the

an -> at


pkg/sql/insert.go, line 392 at r35 (raw file):

	// index of the resultRowBuffer where the i-th column of the table is
	// to be returned.
	tabIdxToRetIdx []int

[nit] for consistency, either call this tabColIdxToRetIdx or change the above name to rowIdxToTabIdx. Or maybe just get rid of tab and use colIdx for both.


pkg/sql/opt_exec_factory.go, line 1166 at r35 (raw file):

	// Initialize the rowIdxToTabColIdx array.
	rowIdxToRetIdx := make([]int, len(origColDescs))
	for i := range origColDescs {

[nit] I'd do range rowIdxToRetIdx to make it obvious that there's no array out of bounds issue here


pkg/sql/opt_exec_factory.go, line 1348 at r35 (raw file):

		// Update the rowIdxToReturnColDescs for the mutation. Update returns
		// the of non-mutation columns specified, in the same order they are

should rowIdxToReturnColDescs be rowIdxToRetIdx?

the of -> the


pkg/sql/opt_exec_factory.go, line 1353 at r35 (raw file):

		// The Updater derives/stores the fetch columns of the mutation and
		// since the return columns are always a subset of the fetch columns,
		// we can use use the fetch columns generate the mapping for the

columns generate -> columns to generate


pkg/sql/opt_exec_factory.go, line 1354 at r35 (raw file):

		// since the return columns are always a subset of the fetch columns,
		// we can use use the fetch columns generate the mapping for the
		// returning rows.

returning -> returned (?)


pkg/sql/opt_exec_factory.go, line 1490 at r35 (raw file):

		returnColumns = sqlbase.ResultColumnsFromColDescs(returnColDescs)

		// Update the tabIdxToReturnColDescs for the mutation. Upsert returns

rowIdxToReturnColDescs -> rowIdxToRetIdx


pkg/sql/opt_exec_factory.go, line 1555 at r35 (raw file):

	// Find all the columns that are part of the rows returned by the delete.
	deleteDescs := make([]sqlbase.ColumnDescriptor, 0, len(tableColDescs))
	colMap := make(map[sqlbase.ColumnID]struct{}, len(tableColDescs))

Probably better to use FastIntSet for this


pkg/sql/opt_exec_factory.go, line 1638 at r35 (raw file):

	// Determine the relational type of the generated delete node.
	// If rows are not needed, no columns are returned.
	var returnColumns sqlbase.ResultColumns

this name is very similar to returnCols. I'd try to distinguish them a bit more. Perhaps call the input returnColSet and this variable returnCols?


pkg/sql/opt_exec_factory.go, line 1652 at r35 (raw file):

		}

		// Update the tabIdxToReturnColDescs for the mutation. Delete returns

fix name for this variable


pkg/sql/tablewriter_upsert_opt.go, line 77 at r35 (raw file):

	// columns in the resultRowBuffer. A value of -1 is used to indicate
	// that the table column at that index is not part of the resultRowBuffer
	// of the mutation. Otherwise, the value an the i-th index refers to the

an -> at


pkg/sql/update.go, line 494 at r35 (raw file):

	// columns in the resultRowBuffer. A value of -1 is used to indicate
	// that the column at that index is not part of the resultRowBuffer
	// of the mutation. Otherwise, the value an the i-th index refers to the

an -> at


pkg/sql/opt/norm/custom_funcs.go, line 364 at r35 (raw file):

	// The columns of the primary key are always returned regardless of
	// whether they are referenced.

This comment seems out of place


pkg/sql/opt/norm/custom_funcs.go, line 366 at r35 (raw file):

	// whether they are referenced.
	tab := c.mem.Metadata().Table(table)
	primaryIndex := tab.Index(0)

use cat.PrimaryIndex


pkg/sql/opt/norm/custom_funcs.go, line 369 at r35 (raw file):

	for i, n := 0, primaryIndex.KeyColumnCount(); i < n; i++ {
		primaryKeyCols.Add(tabID.ColumnID(primaryIndex.Column(i).Ordinal))
	}

Can't you use the TableMeta.IndexKeyColumns function here?


pkg/sql/opt/norm/prune_cols.go, line 515 at r36 (raw file):

		// Any of the RETURNING columns are candidates for pruning if they are
		// not required for the mutation.
		relProps.Rule.PruneCols = relProps.OutputCols

I think you need to do relProps.OutputCols.Copy(). Alternatively, you could change the last line to relProps.Rule.PruneCols = relProps.OutputCols.Difference(neededCols) (seems more efficient)

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to the planNodes :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @justinj, @RaduBerinde, and @ridwanmsharif)


pkg/sql/delete.go, line 348 at r35 (raw file):

		// public columns.
		resultValues := make(tree.Datums, d.run.rows.NumCols())
		for i := range d.run.rowIdxToRetIdx {

nit: it's idiomatic to do for i, retIdx := range to save a line of code


pkg/sql/opt_exec_factory.go, line 1226 at r35 (raw file):

		// In such cases, the newly added columns shouldn't be returned.
		// See regression logic tests for #29494.
		if len(tabDesc.Columns) < len(returnColDescs) {

Doesn't seem straightforward to me that returnColDescs would contain mutation columns, but I guess that's probably just the way things are?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @justinj, @RaduBerinde, and @ridwanmsharif)


pkg/sql/insert.go, line 424 at r36 (raw file):

		//
		// Also we need to re-order the values in the source, ordered by
		// insertCols, when writing them to resultRowBuffer, ordered by

"ordered by n.columns" doesn't quite make sense anymore no? Maybe just replace it with "according to rowIdxToTabColIdx"


pkg/sql/opt_exec_factory.go, line 1156 at r36 (raw file):

}

// mutationRowIdxToReturnIdx returns the mapping from the origColDescs to the

This description is hard to understand without more context. Should start by saying that returnColDescs is a subset of the columns in origColDescs.


pkg/sql/opt_exec_factory.go, line 1157 at r36 (raw file):

// mutationRowIdxToReturnIdx returns the mapping from the origColDescs to the
// returningColDescs. -1 is used for columns not part of the returnColDescs.

returnColDescs


pkg/sql/opt_exec_factory.go, line 1548 at r36 (raw file):

// of a row on the table. This will include the returnColDescs columns that
// are referenced in the RETURNING clause of the delete mutation. This
// is different from the fetch columns of the delete mutation as it doesn't

"it" naturally sounds like it refers to "this" but I think it refers to the fetch columns? How can fetch columns not include index keys and returning columns?


pkg/sql/opt_exec_factory.go, line 1657 at r36 (raw file):

		returnColumns = sqlbase.ResultColumnsFromColDescs(returnColDescs)

		// Find all the columns that the Deleter returns. The returning columns

The first sentence seems wrong, isn't this "all the columns that the deleteNode receives" ?

It's a little strange that we have to calculate this set in here, surprised we can't just use the fetchCols.

CC @andy-kimball


pkg/sql/opt_exec_factory.go, line 1658 at r36 (raw file):

		// Find all the columns that the Deleter returns. The returning columns
		// of the mutation are a subset of this column set and will use this

Unclear what "will use this for the mapping" means, I'd just remove it altogether.


pkg/sql/opt/exec/execbuilder/testdata/orderby, line 487 at r36 (raw file):

 │                     render 0       b                   ·                ·
 └── run               ·              ·                   (a, b)           ·
      └── insert       ·              ·                   (a, b)           ·

How come insert doesn't produce only b?


pkg/sql/opt/norm/testdata/rules/prune_cols, line 2569 at r36 (raw file):

 ├── fd: (1)-->(2,4)
 └── delete returning_test
      ├── columns: a:1(int!null) b:2(int) d:4(int) rowid:8(int!null)

How come rowid is still part of the columns produced by delete/upsert?

@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch from 8910ee1 to 40031b8 Compare July 16, 2019 18:50
Copy link
Author

@ridwanmsharif ridwanmsharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @jordanlewis, @justinj, @RaduBerinde, and @rytaft)


pkg/sql/delete.go, line 219 at r35 (raw file):

Previously, rytaft wrote…

an -> at

Done.


pkg/sql/insert.go, line 389 at r35 (raw file):

Previously, rytaft wrote…

an -> at

Done.


pkg/sql/insert.go, line 392 at r35 (raw file):

Previously, rytaft wrote…

[nit] for consistency, either call this tabColIdxToRetIdx or change the above name to rowIdxToTabIdx. Or maybe just get rid of tab and use colIdx for both.

Done.


pkg/sql/insert.go, line 424 at r36 (raw file):

Previously, RaduBerinde wrote…

"ordered by n.columns" doesn't quite make sense anymore no? Maybe just replace it with "according to rowIdxToTabColIdx"

Done.


pkg/sql/opt_exec_factory.go, line 1166 at r35 (raw file):

Previously, rytaft wrote…

[nit] I'd do range rowIdxToRetIdx to make it obvious that there's no array out of bounds issue here

Done.


pkg/sql/opt_exec_factory.go, line 1226 at r35 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Doesn't seem straightforward to me that returnColDescs would contain mutation columns, but I guess that's probably just the way things are?

Yep.


pkg/sql/opt_exec_factory.go, line 1348 at r35 (raw file):

Previously, rytaft wrote…

should rowIdxToReturnColDescs be rowIdxToRetIdx?

the of -> the

Done.


pkg/sql/opt_exec_factory.go, line 1353 at r35 (raw file):

Previously, rytaft wrote…

columns generate -> columns to generate

Done.


pkg/sql/opt_exec_factory.go, line 1354 at r35 (raw file):

Previously, rytaft wrote…

returning -> returned (?)

Done.


pkg/sql/opt_exec_factory.go, line 1490 at r35 (raw file):

Previously, rytaft wrote…

rowIdxToReturnColDescs -> rowIdxToRetIdx

Done.


pkg/sql/opt_exec_factory.go, line 1555 at r35 (raw file):

Previously, rytaft wrote…

Probably better to use FastIntSet for this

Good Call. Done.


pkg/sql/opt_exec_factory.go, line 1638 at r35 (raw file):

Previously, rytaft wrote…

this name is very similar to returnCols. I'd try to distinguish them a bit more. Perhaps call the input returnColSet and this variable returnCols?

Done.


pkg/sql/opt_exec_factory.go, line 1652 at r35 (raw file):

Previously, rytaft wrote…

fix name for this variable

Done.


pkg/sql/opt_exec_factory.go, line 1156 at r36 (raw file):

Previously, RaduBerinde wrote…

This description is hard to understand without more context. Should start by saying that returnColDescs is a subset of the columns in origColDescs.

Done.


pkg/sql/opt_exec_factory.go, line 1548 at r36 (raw file):

Previously, RaduBerinde wrote…

"it" naturally sounds like it refers to "this" but I think it refers to the fetch columns? How can fetch columns not include index keys and returning columns?

Yeah I'll word it better and elaborate. The fetch columns do include the columns this function returns yes, but the fetch columns also have columns (for example the ones referenced in the WHERE clause).


pkg/sql/opt_exec_factory.go, line 1657 at r36 (raw file):

Previously, RaduBerinde wrote…

The first sentence seems wrong, isn't this "all the columns that the deleteNode receives" ?

It's a little strange that we have to calculate this set in here, surprised we can't just use the fetchCols.

CC @andy-kimball

Done. Also, as discussed earlier, the fetchColumns include more columns than we need and that's why we calculate it here.


pkg/sql/opt_exec_factory.go, line 1658 at r36 (raw file):

Previously, RaduBerinde wrote…

Unclear what "will use this for the mapping" means, I'd just remove it altogether.

Done.


pkg/sql/tablewriter_upsert_opt.go, line 77 at r35 (raw file):

Previously, rytaft wrote…

an -> at

Done.


pkg/sql/update.go, line 494 at r35 (raw file):

Previously, rytaft wrote…

an -> at

Done.


pkg/sql/opt/exec/execbuilder/testdata/orderby, line 487 at r36 (raw file):

Previously, RaduBerinde wrote…

How come insert doesn't produce only b?

As discussed earlier, it's because in addition to the RETURNING clause columns, the mutation (currently) also returns the primary key columns.


pkg/sql/opt/norm/custom_funcs.go, line 364 at r35 (raw file):

Previously, rytaft wrote…

This comment seems out of place

Done.


pkg/sql/opt/norm/custom_funcs.go, line 366 at r35 (raw file):

Previously, rytaft wrote…

use cat.PrimaryIndex

Done.


pkg/sql/opt/norm/custom_funcs.go, line 369 at r35 (raw file):

Previously, rytaft wrote…

Can't you use the TableMeta.IndexKeyColumns function here?

Done.


pkg/sql/opt/norm/prune_cols.go, line 515 at r36 (raw file):

Previously, rytaft wrote…

I think you need to do relProps.OutputCols.Copy(). Alternatively, you could change the last line to relProps.Rule.PruneCols = relProps.OutputCols.Difference(neededCols) (seems more efficient)

Done.


pkg/sql/opt/norm/testdata/rules/prune_cols, line 2569 at r36 (raw file):

Previously, RaduBerinde wrote…

How come rowid is still part of the columns produced by delete/upsert?

As part of the needed set in the rules, all the primary key columns are returned as well.

@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch from 40031b8 to 01059a7 Compare July 16, 2019 19:18
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @jordanlewis, @justinj, @RaduBerinde, @ridwanmsharif, and @rytaft)


pkg/sql/opt_exec_factory.go, line 1558 at r37 (raw file):

	// Find all the columns that are part of the rows returned by the delete.
	deleteDescs := make([]sqlbase.ColumnDescriptor, 0, len(tableColDescs))
	deleteCols := util.FastIntSet{}

[nit] var deleteCols util.FastIntSet


pkg/sql/opt/norm/rules/prune_cols.opt, line 471 at r37 (raw file):

# conservative with the fetch columns.
# TODO(ridwanmsharif): Mutations shouldn't need to return the primary key
#  columns. Make appropriate changes to SQL to accommodate this.

[nit] extra space before columns. Also, say "SQL execution" or it can read as something else

Previously, we used to fetch all columns when a mutation contained
a `RETURNING` clause. This is an issue because it forces us to
retrieve unnecessary data and creates extra contention.
This change adds logic to compute the minimal set of required columns
and fetches only those.

This change passes down the minimal required return cols
to SQL so it knows to only return columns that's requested.
This fixes the output column calculation done by opt and
makes sure the execPlan's output columns are the same as
output cols of the opt plan.

Fixes cockroachdb#30618.
Unblocks cockroachdb#30624.

Release note: None
@ridwanmsharif ridwanmsharif force-pushed the fetch-minimal-columns-on-return branch from 01059a7 to 5a44334 Compare July 16, 2019 20:10
Copy link
Author

@ridwanmsharif ridwanmsharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball, @jordanlewis, @justinj, @RaduBerinde, and @rytaft)

craig bot pushed a commit that referenced this pull request Jul 16, 2019
38594: opt: fetch minimal set of columns on returning mutations r=ridwanmsharif a=ridwanmsharif

Previously, we used to fetch all columns when a mutation contained
a `RETURNING` clause. This is an issue because it forces us to
retrieve unnecessary data and creates extra contention.
This change adds logic to compute the minimal set of required columns
and fetches only those.

Fixes #30618.
Unblocks #30624.

Release note: None

Co-authored-by: Ridwan Sharif <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jul 16, 2019

Build succeeded

@craig craig bot merged commit 5a44334 into cockroachdb:master Jul 16, 2019
@ridwanmsharif ridwanmsharif deleted the fetch-minimal-columns-on-return branch July 16, 2019 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: when RETURNING expression is present, only fetch the specific columns used
7 participants