-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove redundant cast to varchar in join clause #21050
Conversation
4d20653
to
a9ba850
Compare
@@ -1253,7 +1257,7 @@ public void testPayloadJoins() | |||
"SELECT l.* FROM (select orderkey+1 as ok1, * from lineitem_map) l left join orders o on (l.ok1 = o.orderkey+1) left join part p on (l.partkey=p.partkey)", | |||
"SELECT l.* FROM (select *, cast(orderkey as int) as ok from lineitem_map) l left join (select *, cast(orderkey as int) as ok from orders) o on (l.ok = o.ok) left join (select *, cast(partkey as int) as pk from part) p on (l.ok=p.pk)", | |||
"with lm as (select quantity + 1 as q1, * from lineitem_map) SELECT l.* FROM (select * from lm) l left join orders o on (l.orderkey = o.orderkey) left join part p on (l.partkey=p.partkey)", | |||
|
|||
// This is the query which fail when REMOVE_REDUNDANT_CAST_TO_VARCHAR_IN_JOIN set to true | |||
"SELECT l.* FROM lineitem_map l left join orders o on (cast(l.orderkey as varchar) = cast(o.orderkey as varchar)) left join part p on (l.partkey=p.partkey)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mlyublena The optimization added surfaced a bug in payload join optimizer, i.e. when payload join optimization is not enabled, this query runs successfully with the optimization in this PR, but when payload join optimization is enabled, this query will fail with the optimization in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an outstanding PR that fixes a lot of bugs in payload join optimizer: #19904
My guess is that the case you found is already fixed there
.build(); | ||
assertUpdate("create table lineitem_map as select *, map(ARRAY[1,3], ARRAY[2,4]) as m1, map(ARRAY[1,3], ARRAY[2,4]) as m2 from lineitem", 60175); | ||
assertUpdate("create table part_map as select *, map(ARRAY[1,3], ARRAY[2,4]) as m3, map(ARRAY[1,3], ARRAY[2,4]) as m4 from part", 2000); | ||
assertQueryWithSameQueryRunner(session, "SELECT l.* FROM lineitem_map l left join orders o on (cast(l.orderkey as varchar) = cast(o.orderkey as varchar)) left join part p on (l.partkey=p.partkey)", disabled); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the query found in payload join test here, make sure that it succeeds with this optimization, with payload join disabled.
...com/facebook/presto/sql/planner/iterative/rule/RemoveRedundantCastToVarcharInJoinClause.java
Outdated
Show resolved
Hide resolved
* </pre> | ||
* We will rely on optimizations later to remove unnecessary cast (if not used) and identity projection here. | ||
* | ||
* Notice that we do not apply similar optimizations to queries with similar join condition like `cast(bigint as varchar) = varchar`. In general it can be converted to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we allow this transformation for all non-lossy/deterministic casts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied for cast(bigint/integer as varchar) as varchar has high cost for hash operation. Other cases may not show as big improvement here.
@@ -282,6 +282,8 @@ public class FeaturesConfig | |||
|
|||
private boolean preProcessMetadataCalls; | |||
|
|||
private boolean removeRedundantCastToVarcharInJoin = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need a session property? isn't this always beneficial?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case we need to revert it if any bugs found later.
ebffe7d
to
a558373
Compare
...com/facebook/presto/sql/planner/iterative/rule/RemoveRedundantCastToVarcharInJoinClause.java
Outdated
Show resolved
Hide resolved
{ | ||
PlanNode leftInput = context.getLookup().resolve(node.getLeft()); | ||
PlanNode rightInput = context.getLookup().resolve(node.getRight()); | ||
if (!(leftInput instanceof ProjectNode) || !(rightInput instanceof ProjectNode)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so that transformation won't work if the projection happens some levels down the tree?
for example
(select * from (select *, cast(key as varchar) as newkey) where ...) join T on newkey=cast(... as varchar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, it will not work. I tried to do it, but it will complicated this rule as we not only need to extract this pattern traversing the tree, but also need to propagate the variables used in the cast expression (it not propagated already) to the join node. Since this rule already covered the query which motivated this optimization, I decided to do it in a simplified way for now.
a558373
to
113d023
Compare
presto-main/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
Outdated
Show resolved
Hide resolved
RowExpression rightAssignment = ((CallExpression) rightProjectAssignment).getArguments().get(0); | ||
|
||
if (!leftAssignment.getType().equals(rightAssignment.getType())) { | ||
leftAssignment = castToBigInt(functionAndTypeManager, leftAssignment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to do this if the expressions are already bigints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is checked in castToBigInt
function, with
if (rowExpression.getType().equals(BIGINT)) {
return rowExpression;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
113d023
to
fe5b0b6
Compare
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueries.java
Show resolved
Hide resolved
0f161cf
to
2643e89
Compare
Codenotify: Notifying subscribers in CODENOTIFY files for diff c4df72a...47c9b35. No notifications. |
2643e89
to
e180aeb
Compare
a84f670
to
ed0da1b
Compare
ed0da1b
to
47c9b35
Compare
Description
Rewrite query such as
select select * from orders o join customer c on cast(o.custkey as varchar) = cast(c.custkey as varchar)
toselect select * from orders o join customer c on o.custkey = c.custkey
Motivation and Context
Varchar is expensive to hash, and join with varchar keys can be expensive. For the cases above, removing the cast can reduce the cost of join operation.
Impact
Test in production queries show it decrease both CPU and latency by half.
Test Plan
Unit tests and verifier run
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.