-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: computed columns not derived when pushed through lookup join #63735
Comments
Here's a possibly related INSERT that is slowed down by its foreign key check: stmt-bundle-650214558116610049.zip. |
I took a look at what was going wrong here and what it would take to push the constant join predicate through to the other side of the join so that its computed column could be derived and constrained. In doing so, I started playing around with other similar queries. It turned out that when using an INT column instead of the ENUM column, things actually worked, which I wasn't expecting: demo@127.0.0.1:26257/movr> CREATE TABLE stock (
s_i_id INT8 NOT NULL,
s_w_id INT8 NOT NULL,
s_quantity INT8 NULL,
crdb_region INT NOT VISIBLE NOT NULL AS (CASE WHEN s_w_id BETWEEN 0 AND 1665 THEN 100 WHEN s_w_id BETWEEN 1666 AND 3332 THEN 200 WHEN s_w_id BETWEEN 3333 AND 4999 THEN 300 END) STORED,
CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, s_w_id ASC, s_i_id ASC)
);
CREATE TABLE order_line (
ol_o_id INT8 NOT NULL,
ol_d_id INT8 NOT NULL,
ol_w_id INT8 NOT NULL,
ol_number INT8 NOT NULL,
ol_i_id INT8 NOT NULL,
crdb_region INT NOT VISIBLE NOT NULL AS (CASE WHEN ol_w_id BETWEEN 0 AND 1665 THEN 400 WHEN ol_w_id BETWEEN 1666 AND 3332 THEN 500 WHEN ol_w_id BETWEEN 3333 AND 4999 THEN 600 END) STORED,
CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, ol_w_id ASC, ol_d_id ASC, ol_o_id DESC, ol_number ASC)
);
ALTER TABLE stock INJECT STATISTICS '[
{
"columns": [
"s_i_id"
],
"created_at": "2021-04-13 19:20:01.481197",
"distinct_count": 100000,
"histo_col_type": "",
"name": "__auto__",
"null_count": 0,
"row_count": 500000000
},
{
"columns": [
"s_w_id"
],
"created_at": "2021-04-13 19:20:01.481198",
"distinct_count": 5000,
"histo_col_type": "",
"name": "__auto__",
"null_count": 0,
"row_count": 500000000
},
{
"columns": [
"s_quantity"
],
"created_at": "2021-04-13 19:20:01.481199",
"distinct_count": 91,
"histo_col_type": "",
"name": "__auto__",
"null_count": 0,
"row_count": 500000000
}
]';
EXPLAIN (opt) SELECT
count(*)
FROM
(
SELECT
DISTINCT s_i_id
FROM
order_line JOIN stock ON s_i_id = ol_i_id AND s_w_id = ol_w_id
WHERE
ol_w_id = 1 AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
);
info
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
scalar-group-by
├── columns: count:15
├── cardinality: [1 - 1]
├── stats: [rows=1]
├── cost: 8.4219795
├── key: ()
├── fd: ()-->(15)
├── prune: (15)
├── distinct-on
│ ├── columns: s_i_id:9
│ ├── grouping columns: s_i_id:9
│ ├── stats: [rows=0.817973447, distinct(9)=0.817973447, null(9)=0]
│ ├── cost: 8.38379977
│ ├── key: (9)
│ └── inner-join (lookup stock)
│ ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5 s_i_id:9 s_w_id:10 s_quantity:11
│ ├── key columns: [17 3 5] = [12 10 9]
│ ├── lookup columns are key
│ ├── stats: [rows=1, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(9)=0.817973447, null(9)=0, distinct(10)=0.821, null(10)=0]
│ ├── cost: 8.33562004
│ ├── fd: ()-->(2,3,10), (9)-->(11), (5)==(9), (9)==(5), (3)==(10), (10)==(3)
│ ├── project
│ │ ├── columns: "lookup_join_const_col_@12":17 ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5
│ │ ├── stats: [rows=0.821, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(17)=0.821, null(17)=0]
│ │ ├── cost: 5.02162
│ │ ├── fd: ()-->(2,3,17)
│ │ ├── scan order_line
│ │ │ ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5
│ │ │ ├── constraint: /6/3/2/-1/4: [/400/1/6/3022 - /400/1/6/3003]
│ │ │ ├── stats: [rows=0.821, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(2,3)=0.821, null(2,3)=0, distinct(1-3)=0.821, null(1-3)=0]
│ │ │ ├── cost: 4.9952
│ │ │ ├── fd: ()-->(2,3)
│ │ │ └── prune: (5)
│ │ └── projections
│ │ └── 100 [as="lookup_join_const_col_@12":17]
│ └── filters
│ ├── s_w_id:10 = 1 [outer=(10), constraints=(/10: [/1 - /1]; tight), fd=()-->(10)]
│ └── s_quantity:11 < 14 [outer=(11), constraints=(/11: (/NULL - /13]; tight)]
└── aggregations
└── count-rows [as=count_rows:15] Notice the project of However, when I switched back to the demo@127.0.0.1:26257/movr> CREATE TYPE region AS ENUM ('europe-west2','us-east1','us-west1');
CREATE TABLE stock (
s_i_id INT8 NOT NULL,
s_w_id INT8 NOT NULL,
s_quantity INT8 NULL,
crdb_region region NOT VISIBLE NOT NULL AS (CASE WHEN s_w_id BETWEEN 0 AND 1665 THEN 'us-east1' WHEN s_w_id BETWEEN 1666 AND 3332 THEN 'us-west1' WHEN s_w_id BETWEEN 3333 AND 4999 THEN 'europe-west2' END) STORED,
CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, s_w_id ASC, s_i_id ASC)
);
CREATE TABLE order_line (
ol_o_id INT8 NOT NULL,
ol_d_id INT8 NOT NULL,
ol_w_id INT8 NOT NULL,
ol_number INT8 NOT NULL,
ol_i_id INT8 NOT NULL,
crdb_region region NOT VISIBLE NOT NULL AS (CASE WHEN ol_w_id BETWEEN 0 AND 1665 THEN 'us-west1' WHEN ol_w_id BETWEEN 1666 AND 3332 THEN 'europe-west2' WHEN ol_w_id BETWEEN 3333 AND 4999 THEN 'us-east1' END) STORED,
CONSTRAINT "primary" PRIMARY KEY (crdb_region ASC, ol_w_id ASC, ol_d_id ASC, ol_o_id DESC, ol_number ASC)
);
ALTER TABLE stock INJECT STATISTICS '[
{
"columns": [
"s_i_id"
],
"created_at": "2021-04-13 19:20:01.481197",
"distinct_count": 100000,
"histo_col_type": "",
"name": "__auto__",
"null_count": 0,
"row_count": 500000000
},
{
"columns": [
"s_w_id"
],
"created_at": "2021-04-13 19:20:01.481198",
"distinct_count": 5000,
"histo_col_type": "",
"name": "__auto__",
"null_count": 0,
"row_count": 500000000
},
{
"columns": [
"s_quantity"
],
"created_at": "2021-04-13 19:20:01.481199",
"distinct_count": 91,
"histo_col_type": "",
"name": "__auto__",
"null_count": 0,
"row_count": 500000000
}
]';
EXPLAIN (opt,verbose) SELECT
count(*)
FROM
(
SELECT
DISTINCT s_i_id
FROM
order_line JOIN stock ON s_i_id = ol_i_id AND s_w_id = ol_w_id
WHERE
ol_w_id = 1 AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
);
info
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
scalar-group-by
├── columns: count:15
├── cardinality: [1 - 1]
├── stats: [rows=1]
├── cost: 15.0900571
├── key: ()
├── fd: ()-->(15)
├── prune: (15)
├── distinct-on
│ ├── columns: s_i_id:9
│ ├── grouping columns: s_i_id:9
│ ├── stats: [rows=0.817973447, distinct(9)=0.817973447, null(9)=0]
│ ├── cost: 15.0518773
│ ├── key: (9)
│ └── inner-join (lookup stock)
│ ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5 s_i_id:9 s_w_id:10 s_quantity:11
│ ├── key columns: [17 3 5] = [12 10 9]
│ ├── lookup columns are key
│ ├── stats: [rows=1, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(9)=0.817973447, null(9)=0, distinct(10)=0.821, null(10)=0]
│ ├── cost: 15.0036976
│ ├── fd: ()-->(2,3,10), (9)-->(11), (5)==(9), (9)==(5), (3)==(10), (10)==(3)
│ ├── inner-join (cross)
│ │ ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5 "lookup_join_const_col_@12":17
│ │ ├── multiplicity: left-rows(zero-or-more), right-rows(one-or-more)
│ │ ├── stats: [rows=2.463, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(17)=2.463, null(17)=0]
│ │ ├── cost: 5.1216975
│ │ ├── fd: ()-->(2,3)
│ │ ├── values
│ │ │ ├── columns: "lookup_join_const_col_@12":17
│ │ │ ├── cardinality: [3 - 3]
│ │ │ ├── stats: [rows=3, distinct(17)=3, null(17)=0]
│ │ │ ├── cost: 0.04
│ │ │ ├── ('europe-west2',)
│ │ │ ├── ('us-east1',)
│ │ │ └── ('us-west1',)
│ │ ├── scan order_line
│ │ │ ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5
│ │ │ ├── constraint: /6/3/2/-1/4: [/'us-west1'/1/6/3022 - /'us-west1'/1/6/3003]
│ │ │ ├── stats: [rows=0.821, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(2,3)=0.821, null(2,3)=0, distinct(1-3)=0.821, null(1-3)=0]
│ │ │ ├── cost: 4.9952
│ │ │ ├── fd: ()-->(2,3)
│ │ │ └── prune: (5)
│ │ └── filters (true)
│ └── filters
│ ├── s_w_id:10 = 1 [outer=(10), constraints=(/10: [/1 - /1]; tight), fd=()-->(10)]
│ └── s_quantity:11 < 14 [outer=(11), constraints=(/11: (/NULL - /13]; tight)]
└── aggregations
└── count-rows [as=count_rows:15] Notice the With a bit of code instrumentation, I found that inside of The problem was that To demonstrate this, a made the following patch and the problem disappeared. With it, we correctly constrained the --- a/pkg/sql/opt/xform/join_funcs.go
+++ b/pkg/sql/opt/xform/join_funcs.go
@@ -223,9 +223,9 @@ func (c *CustomFuncs) GenerateLookupJoins(
// Generate implicit filters from CHECK constraints and computed columns as
// optional filters to help generate lookup join keys.
- optionalFilters := c.checkConstraintFilters(scanPrivate.Table)
- computedColFilters := c.computedColFilters(scanPrivate, on, optionalFilters)
- optionalFilters = append(optionalFilters, computedColFilters...)
+ optionalFiltersBefore := c.checkConstraintFilters(scanPrivate.Table)
+ computedColFilters := c.computedColFilters(scanPrivate, on, optionalFiltersBefore)
+ optionalFilters := append(computedColFilters, optionalFiltersBefore...)
var pkCols opt.ColList
var iter scanIndexIter demo@127.0.0.1:26257/movr> EXPLAIN (opt,verbose) SELECT
count(*)
FROM
(
SELECT
DISTINCT s_i_id
FROM
order_line JOIN stock ON s_i_id = ol_i_id AND s_w_id = ol_w_id
WHERE
ol_w_id = 1 AND ol_d_id = 6 AND ol_o_id BETWEEN (3023 - 20) AND (3023 - 1) AND s_quantity < 14
);
info
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
scalar-group-by
├── columns: count:15
├── cardinality: [1 - 1]
├── stats: [rows=1]
├── cost: 8.4219795
├── key: ()
├── fd: ()-->(15)
├── prune: (15)
├── distinct-on
│ ├── columns: s_i_id:9
│ ├── grouping columns: s_i_id:9
│ ├── stats: [rows=0.817973447, distinct(9)=0.817973447, null(9)=0]
│ ├── cost: 8.38379977
│ ├── key: (9)
│ └── inner-join (lookup stock)
│ ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5 s_i_id:9 s_w_id:10 s_quantity:11
│ ├── key columns: [17 3 5] = [12 10 9]
│ ├── lookup columns are key
│ ├── stats: [rows=1, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(9)=0.817973447, null(9)=0, distinct(10)=0.821, null(10)=0]
│ ├── cost: 8.33562004
│ ├── fd: ()-->(2,3,10), (9)-->(11), (5)==(9), (9)==(5), (3)==(10), (10)==(3)
│ ├── project
│ │ ├── columns: "lookup_join_const_col_@12":17 ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5
│ │ ├── stats: [rows=0.821, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(17)=0.821, null(17)=0]
│ │ ├── cost: 5.02162
│ │ ├── fd: ()-->(2,3,17)
│ │ ├── scan order_line
│ │ │ ├── columns: ol_o_id:1 ol_d_id:2 ol_w_id:3 ol_i_id:5
│ │ │ ├── constraint: /6/3/2/-1/4: [/'us-west1'/1/6/3022 - /'us-west1'/1/6/3003]
│ │ │ ├── stats: [rows=0.821, distinct(1)=0.821, null(1)=0, distinct(2)=0.821, null(2)=0, distinct(3)=0.821, null(3)=0, distinct(5)=0.817973447, null(5)=0, distinct(2,3)=0.821, null(2,3)=0, distinct(1-3)=0.821, null(1-3)=0]
│ │ │ ├── cost: 4.9952
│ │ │ ├── fd: ()-->(2,3)
│ │ │ └── prune: (5)
│ │ └── projections
│ │ └── 'us-east1' [as="lookup_join_const_col_@12":17]
│ └── filters
│ ├── s_w_id:10 = 1 [outer=(10), constraints=(/10: [/1 - /1]; tight), fd=()-->(10)]
│ └── s_quantity:11 < 14 [outer=(11), constraints=(/11: (/NULL - /13]; tight)]
└── aggregations
└── count-rows [as=count_rows:15] So the issue seems to be that Should we be choosing the matching filter that results in the smallest number of diff --git a/pkg/sql/opt/xform/join_funcs.go b/pkg/sql/opt/xform/join_funcs.go
index cba1be97fa..8b43041514 100644
--- a/pkg/sql/opt/xform/join_funcs.go
+++ b/pkg/sql/opt/xform/join_funcs.go
@@ -906,13 +906,15 @@ func (c *CustomFuncs) mapInvertedJoin(
}
// findJoinFilterConstants tries to find a filter that is exactly equivalent to
-// constraining the given column to a constant value or a set of constant
-// values. If successful, the constant values and the index of the constraining
-// FiltersItem are returned. Note that the returned constant values do not
-// contain NULL.
+// constraining the given column to a constant value or a set of constant values
+// and minimizes the number of constant values. If successful, the constant
+// values and the index of the constraining FiltersItem are returned. Note that
+// the returned constant values do not contain NULL.
func (c *CustomFuncs) findJoinFilterConstants(
filters memo.FiltersExpr, col opt.ColumnID,
) (values tree.Datums, filterIdx int, ok bool) {
+ var bestValues tree.Datums
+ var bestFilterIdx int
for filterIdx := range filters {
props := filters[filterIdx].ScalarProps()
if props.TightConstraints {
@@ -927,12 +929,16 @@ func (c *CustomFuncs) findJoinFilterConstants(
break
}
}
- if !hasNull {
- return constVals, filterIdx, true
+ if !hasNull && (bestValues == nil || len(bestValues) > len(constVals)) {
+ bestValues = constVals
+ bestFilterIdx = filterIdx
}
}
}
- return nil, -1, false
+ if bestValues == nil {
+ return nil, -1, false
+ }
+ return bestValues, bestFilterIdx, true
}
// constructJoinWithConstants constructs a cross join that joins every row in cc. @mgartner as you appear to be the expert here. |
Unfortunately, this did not seem to help the foreign key checks. From my reading of the EXPLAIN output, we aren't even pushing the constant non-computed columns through to the other side of that lookup join (see "lookup expression"): root@localhost:26257/tpcc> EXPLAIN (opt, verbose) INSERT INTO history (h_c_id, h_c_d_id, h_c_w_id, h_d_id, h_w_id, h_amount, h_date, h_data) VALUES (2057, 4, 3, 4, 3, 2100.9, '2021-04-15 15:22:14', 'test data');
info
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
insert history
├── columns: <none>
├── insert-mapping:
│ ├── column21:21 => rowid:1
│ ├── column1:13 => history.h_c_id:2
│ ├── column2:14 => history.h_c_d_id:3
│ ├── column3:15 => history.h_c_w_id:4
│ ├── column4:16 => history.h_d_id:5
│ ├── column5:17 => history.h_w_id:6
│ ├── column7:19 => h_date:7
│ ├── h_amount:22 => history.h_amount:8
│ ├── column8:20 => h_data:9
│ └── column23:23 => history.crdb_region:10
├── check columns: check1:24
├── input binding: &1
├── cardinality: [0 - 0]
├── volatile, mutations
├── stats: [rows=0]
├── cost: 8.0713654
├── values
│ ├── columns: column1:13 column2:14 column3:15 column4:16 column5:17 column7:19 column8:20 column21:21 h_amount:22 column23:23 check1:24
│ ├── cardinality: [1 - 1]
│ ├── volatile
│ ├── stats: [rows=1, distinct(13)=1, null(13)=0, distinct(14)=1, null(14)=0, distinct(15)=1, null(15)=0, distinct(16)=1, null(16)=0, distinct(17)=1, null(17)=0]
│ ├── cost: 0.02
│ ├── key: ()
│ ├── fd: ()-->(13-17,19-24)
│ ├── prune: (13-17,19-24)
│ └── (2057, 4, 3, 4, 3, '2021-04-15 15:22:14', 'test data', gen_random_uuid(), 2100.9, 'us-east1', true)
└── f-k-checks
├── f-k-checks-item: history(h_c_w_id,h_c_d_id,h_c_id) -> customer(c_w_id,c_d_id,c_id)
│ └── anti-join (lookup customer)
│ ├── columns: h_c_w_id:25 h_c_d_id:26 h_c_id:27
│ ├── lookup expression
│ │ └── filters
│ │ ├── h_c_w_id:25 = c_w_id:30 [outer=(25,30), constraints=(/25: (/NULL - ]; /30: (/NULL - ]), fd=(25)==(30), (30)==(25)]
│ │ ├── h_c_d_id:26 = c_d_id:29 [outer=(26,29), constraints=(/26: (/NULL - ]; /29: (/NULL - ]), fd=(26)==(29), (29)==(26)]
│ │ ├── h_c_id:27 = c_id:28 [outer=(27,28), constraints=(/27: (/NULL - ]; /28: (/NULL - ]), fd=(27)==(28), (28)==(27)]
│ │ └── customer.crdb_region:49 IN ('europe-west2', 'us-east1', 'us-west1') [outer=(49), constraints=(/49: [/'europe-west2' - /'europe-west2'] [/'us-east1' - /'us-east1'] [/'us-west1' - /'us-west1']; tight)]
│ ├── lookup columns are key
│ ├── cardinality: [0 - 1]
│ ├── stats: [rows=1e-10]
│ ├── cost: 4.02000049
│ ├── key: ()
│ ├── fd: ()-->(25-27)
│ ├── cte-uses
│ │ └── &1: count=1 used-columns=(13-15)
│ ├── with-scan &1
│ │ ├── columns: h_c_w_id:25 h_c_d_id:26 h_c_id:27
│ │ ├── mapping:
│ │ │ ├── column3:15 => h_c_w_id:25
│ │ │ ├── column2:14 => h_c_d_id:26
│ │ │ └── column1:13 => h_c_id:27
│ │ ├── cardinality: [1 - 1]
│ │ ├── stats: [rows=1, distinct(25)=1, null(25)=0, distinct(26)=1, null(26)=0, distinct(27)=1, null(27)=0]
│ │ ├── cost: 0.01
│ │ ├── key: ()
│ │ ├── fd: ()-->(25-27)
│ │ └── cte-uses
│ │ └── &1: count=1 used-columns=(13-15)
│ └── filters (true)
└── f-k-checks-item: history(h_w_id,h_d_id) -> district(d_w_id,d_id)
└── anti-join (lookup district)
├── columns: h_w_id:52 h_d_id:53
├── lookup expression
│ └── filters
│ ├── h_w_id:52 = d_w_id:55 [outer=(52,55), constraints=(/52: (/NULL - ]; /55: (/NULL - ]), fd=(52)==(55), (55)==(52)]
│ ├── h_d_id:53 = d_id:54 [outer=(53,54), constraints=(/53: (/NULL - ]; /54: (/NULL - ]), fd=(53)==(54), (54)==(53)]
│ └── district.crdb_region:65 IN ('europe-west2', 'us-east1', 'us-west1') [outer=(65), constraints=(/65: [/'europe-west2' - /'europe-west2'] [/'us-east1' - /'us-east1'] [/'us-west1' - /'us-west1']; tight)]
├── lookup columns are key
├── cardinality: [0 - 1]
├── stats: [rows=1e-10]
├── cost: 4.02136491
├── key: ()
├── fd: ()-->(52,53)
├── cte-uses
│ └── &1: count=1 used-columns=(16,17)
├── with-scan &1
│ ├── columns: h_w_id:52 h_d_id:53
│ ├── mapping:
│ │ ├── column5:17 => h_w_id:52
│ │ └── column4:16 => h_d_id:53
│ ├── cardinality: [1 - 1]
│ ├── stats: [rows=1, distinct(52)=1, null(52)=0, distinct(53)=1, null(53)=0]
│ ├── cost: 0.01
│ ├── key: ()
│ ├── fd: ()-->(52,53)
│ └── cte-uses
│ └── &1: count=1 used-columns=(16,17)
└── filters (true) |
Here's an easy way to reproduce this foreign-key behavior locally: SET experimental_enable_unique_without_index_constraints = true;
CREATE TYPE region AS ENUM ('europe-west2','us-east1','us-west1');
CREATE TABLE district (
d_id INT8 NOT NULL,
d_w_id INT8 NOT NULL,
crdb_region region NOT VISIBLE NOT NULL AS (CASE WHEN d_w_id BETWEEN 0 AND 1665 THEN 'us-east1' WHEN d_w_id BETWEEN 1666 AND 3332 THEN 'us-west1' WHEN d_w_id BETWEEN 3333 AND 4999 THEN 'europe-west2' END) STORED,
CONSTRAINT "primary" PRIMARY KEY (crdb_region, d_w_id, d_id),
UNIQUE WITHOUT INDEX (d_w_id, d_id)
);
CREATE TABLE history (
rowid UUID NOT NULL DEFAULT gen_random_uuid(),
h_d_id INT8 NOT NULL,
h_w_id INT8 NOT NULL,
crdb_region region NOT VISIBLE NOT NULL AS (CASE WHEN h_w_id BETWEEN 0 AND 1665 THEN 'us-west1' WHEN h_w_id BETWEEN 1666 AND 3332 THEN 'europe-west2' WHEN h_w_id BETWEEN 3333 AND 4999 THEN 'us-east1' END) STORED,
CONSTRAINT "primary" PRIMARY KEY (crdb_region, h_w_id, rowid),
CONSTRAINT fk_h_w_id_ref_district FOREIGN KEY (h_w_id, h_d_id) REFERENCES district(d_w_id, d_id) NOT VALID
);
EXPLAIN (opt,verbose) INSERT INTO history (h_d_id, h_w_id) VALUES (4, 3);
info
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
insert history
├── columns: <none>
├── insert-mapping:
│ ├── column9:9 => rowid:1
│ ├── column1:7 => history.h_d_id:2
│ ├── column2:8 => history.h_w_id:3
│ └── column10:10 => history.crdb_region:4
├── check columns: check1:11
├── input binding: &1
├── cardinality: [0 - 0]
├── volatile, mutations
├── stats: [rows=0]
├── cost: 4.05639
├── values
│ ├── columns: column1:7 column2:8 column9:9 column10:10 check1:11
│ ├── cardinality: [1 - 1]
│ ├── volatile
│ ├── stats: [rows=1, distinct(7)=1, null(7)=0, distinct(8)=1, null(8)=0]
│ ├── cost: 0.02
│ ├── key: ()
│ ├── fd: ()-->(7-11)
│ ├── prune: (7-11)
│ └── (4, 3, gen_random_uuid(), 'us-west1', true)
└── f-k-checks
└── f-k-checks-item: history(h_w_id,h_d_id) -> district(d_w_id,d_id)
└── anti-join (lookup district)
├── columns: h_w_id:12 h_d_id:13
├── lookup expression
│ └── filters
│ ├── h_w_id:12 = d_w_id:15 [outer=(12,15), constraints=(/12: (/NULL - ]; /15: (/NULL - ]), fd=(12)==(15), (15)==(12)]
│ ├── h_d_id:13 = d_id:14 [outer=(13,14), constraints=(/13: (/NULL - ]; /14: (/NULL - ]), fd=(13)==(14), (14)==(13)]
│ └── district.crdb_region:16 IN ('europe-west2', 'us-east1', 'us-west1') [outer=(16), constraints=(/16: [/'europe-west2' - /'europe-west2'] [/'us-east1' - /'us-east1'] [/'us-west1' - /'us-west1']; tight)]
├── lookup columns are key
├── cardinality: [0 - 1]
├── stats: [rows=1e-10]
├── cost: 4.02639
├── key: ()
├── fd: ()-->(12,13)
├── cte-uses
│ └── &1: count=1 used-columns=(7,8)
├── with-scan &1
│ ├── columns: h_w_id:12 h_d_id:13
│ ├── mapping:
│ │ ├── column2:8 => h_w_id:12
│ │ └── column1:7 => h_d_id:13
│ ├── cardinality: [1 - 1]
│ ├── stats: [rows=1, distinct(12)=1, null(12)=0, distinct(13)=1, null(13)=0]
│ ├── cost: 0.01
│ ├── key: ()
│ ├── fd: ()-->(12,13)
│ └── cte-uses
│ └── &1: count=1 used-columns=(7,8)
└── filters (true) |
…join This commit stores two new columns in the order@order_idx so that the UPDATE performed by the Delivery transaction does not need to perform an index join. This was discovered while exploring cockroachdb#63735.
I take this back. After I recaptured table statistics for all tables, the foreign key lookups all seemed to properly derive tighter spans through their computed columns. Here's what the previous plan now looks like: root@localhost:26257/tpcc> EXPLAIN (opt, verbose) INSERT INTO history (h_c_id, h_c_d_id, h_c_w_id, h_d_id, h_w_id, h_amount, h_date, h_data) VALUES (2057, 4, 3, 4, 3, 2100.9, '2021-04-15 15:22:14', 'test data');
info
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
insert history
├── columns: <none>
├── insert-mapping:
│ ├── column21:21 => rowid:1
│ ├── column1:13 => history.h_c_id:2
│ ├── column2:14 => history.h_c_d_id:3
│ ├── column3:15 => history.h_c_w_id:4
│ ├── column4:16 => history.h_d_id:5
│ ├── column5:17 => history.h_w_id:6
│ ├── column7:19 => h_date:7
│ ├── h_amount:22 => history.h_amount:8
│ ├── column8:20 => h_data:9
│ └── column23:23 => history.crdb_region:10
├── check columns: check1:24
├── input binding: &1
├── cardinality: [0 - 0]
├── volatile, mutations
├── stats: [rows=0]
├── cost: 8.60451651
├── values
│ ├── columns: column1:13 column2:14 column3:15 column4:16 column5:17 column7:19 column8:20 column21:21 h_amount:22 column23:23 check1:24
│ ├── cardinality: [1 - 1]
│ ├── volatile
│ ├── stats: [rows=1, distinct(13)=1, null(13)=0, distinct(14)=1, null(14)=0, distinct(15)=1, null(15)=0, distinct(16)=1, null(16)=0, distinct(17)=1, null(17)=0]
│ ├── cost: 0.02
│ ├── key: ()
│ ├── fd: ()-->(13-17,19-24)
│ ├── prune: (13-17,19-24)
│ └── (2057, 4, 3, 4, 3, '2021-04-15 15:22:14', 'test data', gen_random_uuid(), 2100.9, 'us-east1', true)
└── f-k-checks
├── f-k-checks-item: history(h_c_w_id,h_c_d_id,h_c_id) -> customer(c_w_id,c_d_id,c_id)
│ └── anti-join (lookup customer)
│ ├── columns: h_c_w_id:25 h_c_d_id:26 h_c_id:27
│ ├── lookup expression
│ │ └── filters
│ │ ├── h_c_w_id:25 = c_w_id:30 [outer=(25,30), constraints=(/25: (/NULL - ]; /30: (/NULL - ]), fd=(25)==(30), (30)==(25)]
│ │ ├── h_c_d_id:26 = c_d_id:29 [outer=(26,29), constraints=(/26: (/NULL - ]; /29: (/NULL - ]), fd=(26)==(29), (29)==(26)]
│ │ ├── h_c_id:27 = c_id:28 [outer=(27,28), constraints=(/27: (/NULL - ]; /28: (/NULL - ]), fd=(27)==(28), (28)==(27)]
│ │ └── customer.crdb_region:49 IN ('europe-west2', 'us-west1') [outer=(49), constraints=(/49: [/'europe-west2' - /'europe-west2'] [/'us-west1' - /'us-west1']; tight)]
│ ├── lookup columns are key
│ ├── cardinality: [0 - 1]
│ ├── stats: [rows=1e-10]
│ ├── cost: 4.32545852
│ ├── key: ()
│ ├── fd: ()-->(25-27)
│ ├── cte-uses
│ │ └── &1: count=1 used-columns=(13-15)
│ ├── anti-join (lookup customer)
│ │ ├── columns: h_c_w_id:25 h_c_d_id:26 h_c_id:27
│ │ ├── lookup expression
│ │ │ └── filters
│ │ │ ├── h_c_w_id:25 = c_w_id:30 [outer=(25,30), constraints=(/25: (/NULL - ]; /30: (/NULL - ]), fd=(25)==(30), (30)==(25)]
│ │ │ ├── h_c_d_id:26 = c_d_id:29 [outer=(26,29), constraints=(/26: (/NULL - ]; /29: (/NULL - ]), fd=(26)==(29), (29)==(26)]
│ │ │ ├── h_c_id:27 = c_id:28 [outer=(27,28), constraints=(/27: (/NULL - ]; /28: (/NULL - ]), fd=(27)==(28), (28)==(27)]
│ │ │ └── customer.crdb_region:49 = 'us-east1' [outer=(49), immutable, constraints=(/49: [/'us-east1' - /'us-east1']; tight), fd=()-->(49)]
│ │ ├── lookup columns are key
│ │ ├── cardinality: [0 - 1]
│ │ ├── stats: [rows=0.666666667, distinct(25)=0.666666667, null(25)=0, distinct(26)=0.666666667, null(26)=0, distinct(27)=0.666666667, null(27)=0]
│ │ ├── cost: 2.43233936
│ │ ├── key: ()
│ │ ├── fd: ()-->(25-27)
│ │ ├── with-scan &1
│ │ │ ├── columns: h_c_w_id:25 h_c_d_id:26 h_c_id:27
│ │ │ ├── mapping:
│ │ │ │ ├── column3:15 => h_c_w_id:25
│ │ │ │ ├── column2:14 => h_c_d_id:26
│ │ │ │ └── column1:13 => h_c_id:27
│ │ │ ├── cardinality: [1 - 1]
│ │ │ ├── stats: [rows=1, distinct(25)=1, null(25)=0, distinct(26)=1, null(26)=0, distinct(27)=1, null(27)=0]
│ │ │ ├── cost: 0.01
│ │ │ ├── key: ()
│ │ │ ├── fd: ()-->(25-27)
│ │ │ └── cte-uses
│ │ │ └── &1: count=1 used-columns=(13-15)
│ │ └── filters (true)
│ └── filters (true)
└── f-k-checks-item: history(h_w_id,h_d_id) -> district(d_w_id,d_id)
└── anti-join (lookup district)
├── columns: h_w_id:52 h_d_id:53
├── lookup expression
│ └── filters
│ ├── h_w_id:52 = d_w_id:55 [outer=(52,55), constraints=(/52: (/NULL - ]; /55: (/NULL - ]), fd=(52)==(55), (55)==(52)]
│ ├── h_d_id:53 = d_id:54 [outer=(53,54), constraints=(/53: (/NULL - ]; /54: (/NULL - ]), fd=(53)==(54), (54)==(53)]
│ └── district.crdb_region:65 IN ('europe-west2', 'us-west1') [outer=(65), constraints=(/65: [/'europe-west2' - /'europe-west2'] [/'us-west1' - /'us-west1']; tight)]
├── lookup columns are key
├── cardinality: [0 - 1]
├── stats: [rows=1e-10]
├── cost: 4.24905799
├── key: ()
├── fd: ()-->(52,53)
├── cte-uses
│ └── &1: count=1 used-columns=(16,17)
├── anti-join (lookup district)
│ ├── columns: h_w_id:52 h_d_id:53
│ ├── lookup expression
│ │ └── filters
│ │ ├── h_w_id:52 = d_w_id:55 [outer=(52,55), constraints=(/52: (/NULL - ]; /55: (/NULL - ]), fd=(52)==(55), (55)==(52)]
│ │ ├── h_d_id:53 = d_id:54 [outer=(53,54), constraints=(/53: (/NULL - ]; /54: (/NULL - ]), fd=(53)==(54), (54)==(53)]
│ │ └── district.crdb_region:65 = 'us-east1' [outer=(65), immutable, constraints=(/65: [/'us-east1' - /'us-east1']; tight), fd=()-->(65)]
│ ├── lookup columns are key
│ ├── cardinality: [0 - 1]
│ ├── stats: [rows=0.666666667, distinct(52)=0.666666667, null(52)=0, distinct(53)=0.666666667, null(53)=0]
│ ├── cost: 2.39959628
│ ├── key: ()
│ ├── fd: ()-->(52,53)
│ ├── with-scan &1
│ │ ├── columns: h_w_id:52 h_d_id:53
│ │ ├── mapping:
│ │ │ ├── column5:17 => h_w_id:52
│ │ │ └── column4:16 => h_d_id:53
│ │ ├── cardinality: [1 - 1]
│ │ ├── stats: [rows=1, distinct(52)=1, null(52)=0, distinct(53)=1, null(53)=0]
│ │ ├── cost: 0.01
│ │ ├── key: ()
│ │ ├── fd: ()-->(52,53)
│ │ └── cte-uses
│ │ └── &1: count=1 used-columns=(16,17)
│ └── filters (true)
└── filters (true) And with that plus the patch from earlier, we get entirely region-local TPC-C transactions! (concurrently) ➜ rp ssh nathan-1618341380-01-n12cpu4-geo:4 -- './cockroach workload run tpcc --warehouses=5000 --active-warehouses=300 --partitions=3 --partition-affinity=0 --duration=10m {pgurl:1-3}'
...
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 1274 2.1 106.0 79.7 226.5 704.6 973.1 delivery
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 12489 20.8 102.6 39.8 151.0 1946.2 2550.1 newOrder
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 1251 2.1 17.7 8.4 35.7 318.8 385.9 orderStatus
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 12786 21.3 49.6 21.0 92.3 973.1 1946.2 payment
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 1277 2.1 21.1 13.6 35.7 302.0 352.3 stockLevel
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
600.0s 0 29077 48.5 72.2 30.4 125.8 1610.6 2550.1
_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
600.0s 1248.9 32.4% 102.6 39.8 67.1 151.0 1946.2 2550.1 ➜ rp ssh nathan-1618341380-01-n12cpu4-geo:8 -- './cockroach workload run tpcc --warehouses=5000 --active-warehouses=300 --partitions=3 --partition-affinity=1 --duration=10m {pgurl:5-7}'
...
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 1279 2.1 131.7 83.9 201.3 1610.6 2818.6 delivery
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 12556 20.9 117.2 37.7 109.1 2684.4 3489.7 newOrder
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 1267 2.1 23.3 9.4 31.5 436.2 604.0 orderStatus
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 12642 21.1 74.1 19.9 65.0 1811.9 2080.4 payment
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 1282 2.1 48.2 16.3 71.3 973.1 1140.9 stockLevel
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
600.0s 0 29026 48.4 91.9 31.5 104.9 2281.7 3489.7
_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
600.0s 1255.6 32.5% 117.2 37.7 56.6 109.1 2684.4 3489.7 ➜ rp ssh nathan-1618341380-01-n12cpu4-geo:12 -- './cockroach workload run tpcc --warehouses=5000 --active-warehouses=300 --partitions=3 --partition-affinity=2 --duration=10m {pgurl:9-11}'
...
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 1276 2.1 132.2 71.3 201.3 1744.8 3087.0 delivery
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 12484 20.8 127.4 33.6 113.2 2818.6 4026.5 newOrder
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 1290 2.1 26.9 7.3 33.6 570.4 738.2 orderStatus
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 12713 21.2 61.1 16.8 65.0 1610.6 2818.6 payment
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
600.0s 0 1275 2.1 48.4 21.0 71.3 872.4 1140.9 stockLevel
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
600.0s 0 29038 48.4 90.6 29.4 104.9 2684.4 4026.5
_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
600.0s 1248.4 32.4% 127.4 33.6 60.8 113.2 2818.6 4026.5 For a combined efficiency of And for reference, here is what one of those |
Thanks for all your work on this, @nvanbenschoten! I'll be working on this tomorrow so I'll be able to respond to all of these ideas then. |
Awesome sleuthing, @nvanbenschoten!! I think your diff above to fix The second issue with the foreign key checks seems to be because we haven't costed locality optimized anti joins aggressively enough. I'm glad that recollecting stats seemed to fix the issue, but I think we want to ensure that we always choose locality optimized anti joins when possible (at least until the optimizer is more distribution-aware). I will submit a PR to fix that issue. One question (unrelated to this issue): I noticed that in the bundle for the first query, there were no stats for the |
This commit reduces the cost of locality optimized anti join to to ensure that we always choose it over a non-locality-optimized anti join when possible. Informs cockroachdb#63735 Release note (performance improvement): Adjusted the estimated cost of locality optimized anti joins in the optimizer so that they are always chosen over non-locality-optimized anti joins when possible. This makes it more likely that queries involving anti joins (such as inserts with foreign key checks) can avoid visting remote regions. This results in lower latency.
+1
It may not have been immediately obvious, but this was a brilliant debugging step which highlighted the issue. Perhaps confusing because the issue was less about the type of column and more about the fact that ENUMs come with an implicit CHECK constraint. |
Thanks for taking a look @rytaft and @mgartner!
Sure, I'll send out the patch, along with some testing that demonstrates the impact.
Sounds great. #63830 looks like a good change. What's interesting about this situation that I hadn't noticed before is that locality-optimized search is saving us here. I had been assuming that with fresh stats we were suddenly able to infer the single I'm not quite sure how to think about this. On one hand, it would be nice to derive the Maybe we just open another issue to track the longer-term item of deriving the value of the computed column in these foreign key lookup queries? What do you think?
I'm not totally sure. This dataset was generated using IMPORT, so it should have used the pre-calculated stats in Also, as a follow-up to all of this, I switched the database over to
We do see p50 latency increase by more than the replication latency (~65ms). I believe this is explained by a few replication pipeline stalls during foreign key checks on recently written rows in some of these transactions. The use of column families, which were added in #30624, is known to avoid most of these. Unfortunately, it's not easy to add column families after the fact to confirm that. |
Yea, let's open an issue. It's a bit tricky since the foreign key check scans the buffer from the mutation output rather than using the constant values directly, so we'd need to somehow detect cases where we can "inline" the constant values rather than scanning from the buffer. I opened #63882 -- feel free to add any additional color there.
They shouldn't ever be discarded. I did a bit of poking around, and it seems like we only inject stats with |
63830: opt: reduce cost of locality optimized anti join r=rytaft a=rytaft This commit reduces the cost of locality optimized anti join to to ensure that we always choose it over a non-locality-optimized anti join when possible. Informs #63735 Release note (performance improvement): Adjusted the estimated cost of locality optimized anti joins in the optimizer so that they are always chosen over non-locality-optimized anti joins when possible. This makes it more likely that queries involving anti joins (such as inserts with foreign key checks) can avoid visting remote regions. This results in lower latency. Co-authored-by: Rebecca Taft <[email protected]>
Hey @nvanbenschoten, I'm assigning this issue back to you to open the PR with your diff from above. Also let me know your thoughts about my TPC-C stats question above whenever you get a chance. |
Fixes cockroachdb#63735. This commit addresses cockroachdb#63735 by reworking how constant filters are selected when being applied to lookup joins. findJoinFilterConstants is adjusted to return the filter that minimizes the number of returned values, instead of returning the first matching filter that it finds. This affects lookup joins which have constant filters due to both CHECK constraints and computed column constraints, as we saw in cockroachdb#63735. This has an effect on both standard and inverted lookup joins, as the new test cases demonstrate. For standard lookup joins, a plan which used to look like: ```sql SELECT * FROM small INNER LOOKUP JOIN t63735 ON n = y WHERE m = 5 AND n = 15 ---- inner-join (lookup t63735) ├── flags: force lookup join (into right side) ├── lookup columns are key ├── inner-join (cross) │ ├── values │ │ ├── (10,) │ │ ├── (20,) │ │ └── (30,) │ ├── select │ │ ├── scan small │ │ └── filters │ │ ├── n = 15 │ │ └── m = 5 │ └── filters (true) └── filters └── y = 15 ``` now looks like: ```sql inner-join (lookup t63735) ├── flags: force lookup join (into right side) ├── lookup columns are key ├── project │ ├── select │ │ ├── scan small │ │ └── filters │ │ ├── n = 15 │ │ └── m = 5 │ └── projections │ └── 30 └── filters └── y = 15 ``` For inverted lookup joins, a plan which used to look like: ```sql SELECT * FROM t63735_a AS a INNER INVERTED JOIN t63735_b AS b ON a.k = 15 AND b.j @> a.j AND b.y = a.k ---- inner-join (lookup t63735_b [as=b]) ├── lookup columns are key ├── inner-join (inverted t63735_b@secondary [as=b]) │ ├── flags: force inverted join (into right side) │ ├── inverted-expr │ │ └── b.j @> a.j │ ├── inner-join (cross) │ │ ├── values │ │ │ ├── (10,) │ │ │ ├── (20,) │ │ │ └── (30,) │ │ ├── scan t63735_a [as=a] │ │ │ └── constraint: /1: [/15 - /15] │ │ └── filters (true) │ └── filters │ └── y = 15 └── filters └── b.j @> a.j ``` now looks like: ```sql inner-join (lookup t63735_b [as=b]) ├── lookup columns are key ├── inner-join (inverted t63735_b@secondary [as=b]) │ ├── flags: force inverted join (into right side) │ ├── inverted-expr │ │ └── b.j @> a.j │ ├── project │ │ ├── scan t63735_a [as=a] │ │ │ └── constraint: /1: [/15 - /15] │ │ └── projections │ │ └── 30 │ └── filters │ └── y = 15 └── filters └── b.j @> a.j ``` Release note (sql change): Lookup joins on indexes with computed columns which are also either constrained by CHECK constraints or use an ENUM data type may now choose a more optimal plan.
Good question. I think we still run |
Done: #64580. Thanks for the info! |
64509: sql/opt: use maximally selective constant filter for lookup joins r=rytaft a=nvanbenschoten Fixes #63735. This commit addresses #63735 by reworking how constant filters are selected when being applied to lookup joins. `findJoinFilterConstants` is adjusted to return the filter that minimizes the number of returned values, instead of returning the first matching filter that it finds. This affects lookup joins which have constant filters due to both CHECK constraints and computed column constraints, as we saw in #63735. This has an effect on both standard and inverted lookup joins, as the new test cases demonstrate. For standard lookup joins, a plan which used to look like: ```sql SELECT * FROM small INNER LOOKUP JOIN t63735 ON n = y WHERE m = 5 AND n = 15 ---- inner-join (lookup t63735) ├── flags: force lookup join (into right side) ├── lookup columns are key ├── inner-join (cross) │ ├── values │ │ ├── (10,) │ │ ├── (20,) │ │ └── (30,) │ ├── select │ │ ├── scan small │ │ └── filters │ │ ├── n = 15 │ │ └── m = 5 │ └── filters (true) └── filters └── y = 15 ``` now looks like: ```sql inner-join (lookup t63735) ├── flags: force lookup join (into right side) ├── lookup columns are key ├── project │ ├── select │ │ ├── scan small │ │ └── filters │ │ ├── n = 15 │ │ └── m = 5 │ └── projections │ └── 30 └── filters └── y = 15 ``` For inverted lookup joins, a plan which used to look like: ```sql SELECT * FROM t63735_a AS a INNER INVERTED JOIN t63735_b AS b ON a.k = 15 AND b.j @> a.j AND b.y = a.k ---- inner-join (lookup t63735_b [as=b]) ├── lookup columns are key ├── inner-join (inverted t63735_b@secondary [as=b]) │ ├── flags: force inverted join (into right side) │ ├── inverted-expr │ │ └── b.j @> a.j │ ├── inner-join (cross) │ │ ├── values │ │ │ ├── (10,) │ │ │ ├── (20,) │ │ │ └── (30,) │ │ ├── scan t63735_a [as=a] │ │ │ └── constraint: /1: [/15 - /15] │ │ └── filters (true) │ └── filters │ └── y = 15 └── filters └── b.j @> a.j ``` now looks like: ```sql inner-join (lookup t63735_b [as=b]) ├── lookup columns are key ├── inner-join (inverted t63735_b@secondary [as=b]) │ ├── flags: force inverted join (into right side) │ ├── inverted-expr │ │ └── b.j @> a.j │ ├── project │ │ ├── scan t63735_a [as=a] │ │ │ └── constraint: /1: [/15 - /15] │ │ └── projections │ │ └── 30 │ └── filters │ └── y = 15 └── filters └── b.j @> a.j ``` Release note (sql change): Lookup joins on indexes with computed columns which are also either constrained by CHECK constraints or use an ENUM data type may now choose a more optimal plan. Co-authored-by: Nathan VanBenschoten <[email protected]>
This commit reduces the cost of locality optimized anti join to to ensure that we always choose it over a non-locality-optimized anti join when possible. Informs cockroachdb#63735 Release note (performance improvement): Adjusted the estimated cost of locality optimized anti joins in the optimizer so that they are always chosen over non-locality-optimized anti joins when possible. This makes it more likely that queries involving anti joins (such as inserts with foreign key checks) can avoid visting remote regions. This results in lower latency.
Fixes cockroachdb#63735. This commit addresses cockroachdb#63735 by reworking how constant filters are selected when being applied to lookup joins. findJoinFilterConstants is adjusted to return the filter that minimizes the number of returned values, instead of returning the first matching filter that it finds. This affects lookup joins which have constant filters due to both CHECK constraints and computed column constraints, as we saw in cockroachdb#63735. This has an effect on both standard and inverted lookup joins, as the new test cases demonstrate. For standard lookup joins, a plan which used to look like: ```sql SELECT * FROM small INNER LOOKUP JOIN t63735 ON n = y WHERE m = 5 AND n = 15 ---- inner-join (lookup t63735) ├── flags: force lookup join (into right side) ├── lookup columns are key ├── inner-join (cross) │ ├── values │ │ ├── (10,) │ │ ├── (20,) │ │ └── (30,) │ ├── select │ │ ├── scan small │ │ └── filters │ │ ├── n = 15 │ │ └── m = 5 │ └── filters (true) └── filters └── y = 15 ``` now looks like: ```sql inner-join (lookup t63735) ├── flags: force lookup join (into right side) ├── lookup columns are key ├── project │ ├── select │ │ ├── scan small │ │ └── filters │ │ ├── n = 15 │ │ └── m = 5 │ └── projections │ └── 30 └── filters └── y = 15 ``` For inverted lookup joins, a plan which used to look like: ```sql SELECT * FROM t63735_a AS a INNER INVERTED JOIN t63735_b AS b ON a.k = 15 AND b.j @> a.j AND b.y = a.k ---- inner-join (lookup t63735_b [as=b]) ├── lookup columns are key ├── inner-join (inverted t63735_b@secondary [as=b]) │ ├── flags: force inverted join (into right side) │ ├── inverted-expr │ │ └── b.j @> a.j │ ├── inner-join (cross) │ │ ├── values │ │ │ ├── (10,) │ │ │ ├── (20,) │ │ │ └── (30,) │ │ ├── scan t63735_a [as=a] │ │ │ └── constraint: /1: [/15 - /15] │ │ └── filters (true) │ └── filters │ └── y = 15 └── filters └── b.j @> a.j ``` now looks like: ```sql inner-join (lookup t63735_b [as=b]) ├── lookup columns are key ├── inner-join (inverted t63735_b@secondary [as=b]) │ ├── flags: force inverted join (into right side) │ ├── inverted-expr │ │ └── b.j @> a.j │ ├── project │ │ ├── scan t63735_a [as=a] │ │ │ └── constraint: /1: [/15 - /15] │ │ └── projections │ │ └── 30 │ └── filters │ └── y = 15 └── filters └── b.j @> a.j ``` Release note (sql change): Lookup joins on indexes with computed columns which are also either constrained by CHECK constraints or use an ENUM data type may now choose a more optimal plan.
note: the name on this might be very off, I was just guessing
Statement diagnostics of relevant query: stmt-bundle-650217270527393794.zip
In multi-region TPC-C, we have a schema that looks like the following:
Notice that each table has a
crdb_region
column which is computed from its warehouse column (ol_w_id
fororder_line
,s_w_id
forstock
).One of the "Stock Level" transaction statements runs a join against these two tables. The statement constraints the warehouse sufficiently so that we should be able to understand that we can search locally, but we don't, so the query is executed with a remote scan that bumps it from about 10ms to about 100ms:
If we manually push the
s_w_id
through the join, things still don't improve:However, if we also manually derive the
stock
table'scrdb_region
column, things remain local to the gateway region. We can see this in both a change in query plan and a large speedup:cc. @rytaft for triage.
The text was updated successfully, but these errors were encountered: