Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-tuple expression for in-subquery to join #4725

Closed
wants to merge 3 commits into from

Conversation

ygf11
Copy link
Contributor

@ygf11 ygf11 commented Dec 24, 2022

Which issue does this PR close?

Closes #4724.

Rationale for this change

Currently datafusion support rewrite in-subquery to join when the expression is a column, like:

> select * from t1 where t1_id in (select t2_id from t2);

but following sql does not works:

> select * from t1 where t1_id + 11 in (select t2_id from t2);
NotImplemented("Physical plan does not support logical expression CAST(t1.t1_id AS Int64) + Int64(11) IN (<subquery>)")

We can rewrite these subquery to join also.

What changes are included in this PR?

In SubqueryFilterToJoin:

  1. Rewrite where x+1 in (...) to equi-join.
  2. Rewrite where 1 in (..) to non equi-join.

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules labels Dec 24, 2022
@ygf11 ygf11 changed the title Support non-tuple expression for subquery-filter to join optimizer Support non-tuple expression for subquery-filter to join rule Dec 24, 2022
@ygf11 ygf11 changed the title Support non-tuple expression for subquery-filter to join rule Support non-tuple expression for in-subquery to join Dec 24, 2022
@ygf11 ygf11 marked this pull request as ready for review December 24, 2022 11:58
Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A great improvement to me.👍 Thanks @ygf11


#[tokio::test]
async fn reduce_where_in_to_expr_equijoin() -> Result<()> {
let ctx = create_join_context("t1_id", "t2_id", false)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +111 to +118
let (on, filter) =
// When left is a constant expression, like 1,
// the join predicate will be `1 = right_key`, it is better to add it to filter.
if left_key.to_columns()?.is_empty() {
let equi_expr =
Expr::eq(*expr.clone(), Expr::Column(right_key));
(vec![], Some(equi_expr))
} else {
Copy link
Member

@jackwener jackwener Dec 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, I think it's a good improvement but maybe isn't suitable to put in this rule.🤔
It could be better to handle separate/handle on/joinfilter in a new rule(we also can add more improvement in it).

Just a thought, I think we also can add a TODO, and handle them as a future ticket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be better to handle separate/handle on/joinfilter in a new rule(we also can add more improvement in it).

Agree.

There is another pr #4711 which will separate equi and non-equi predicate from filter.
We can add the left-key = right-key to join-filter in this rule, and let the other rule do the separating works.

@Jefffrey
Copy link
Contributor

A great improvement to me.+1 Thanks @Jefffrey

Got the wrong person 😆

@liukun4515
Copy link
Contributor

why not implement this in the rule decorrelate_where_in.rs?

The two rule of decorrelate_where_in.rs and subquery_filter_to_join.rs implement the same function, and convert the in query to Join. Maybe we can reduce the duplicate code and reduce the rule. cc @jackwener

@jackwener
Copy link
Member

why not implement this in the rule decorrelate_where_in.rs?

The two rule of decorrelate_where_in.rs and subquery_filter_to_join.rs implement the same function, and convert the in query to Join. Maybe we can reduce the duplicate code and reduce the rule. cc @jackwener

Yes, I add it in new PR #4731

@ygf11
Copy link
Contributor Author

ygf11 commented Dec 26, 2022

Close this pr, and I will submit a new pr based on DecorrelateWhereIn rule.

@ygf11 ygf11 closed this Dec 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support non-tuple expression for in-subquery to join
4 participants