-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect results for join condition against current master branch #4844
Comments
Wow, let's make sure to add some tests so regressions like this do not stealthily go through in the future 🤔 |
Looks to be regression introduced by fddb3d3 (#4562) On the commit prior to it (2792113), I get this explain plan: +------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| initial_logical_plan | Sort: s.mark DESC NULLS FIRST |
| | Projection: s.name, s.mark, g.grade |
| | Filter: g.grade > Int64(2) |
| | Filter: s.mark BETWEEN g.min AND g.max |
| | CrossJoin: |
| | SubqueryAlias: s |
| | TableScan: students |
| | SubqueryAlias: g |
| | TableScan: grades |
| logical_plan after inline_table_scan | SAME TEXT AS ABOVE |
| logical_plan after type_coercion | SAME TEXT AS ABOVE |
| logical_plan after simplify_expressions | Sort: s.mark DESC NULLS FIRST |
| | Projection: s.name, s.mark, g.grade |
| | Filter: g.grade > Int64(2) |
| | Filter: s.mark >= g.min AND s.mark <= g.max |
| | CrossJoin: |
| | SubqueryAlias: s |
| | TableScan: students |
| | SubqueryAlias: g |
| | TableScan: grades |
...
| logical_plan after eliminate_cross_join | SAME TEXT AS ABOVE |
...
| logical_plan after push_down_filter | Sort: s.mark DESC NULLS FIRST |
| | Projection: s.name, s.mark, g.grade |
| | Filter: s.mark >= g.min AND s.mark <= g.max |
| | CrossJoin: |
| | SubqueryAlias: s |
| | TableScan: students |
| | SubqueryAlias: g |
| | Filter: grades.grade > Int64(2) |
| | TableScan: grades, partial_filters=[grades.grade > Int64(2)]
... And on commit fddb3d3 I get this plan instead: +------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| initial_logical_plan | Sort: s.mark DESC NULLS FIRST |
| | Projection: s.name, s.mark, g.grade |
| | Filter: g.grade > Int64(2) |
| | Inner Join: Filter: s.mark BETWEEN g.min AND g.max |
| | SubqueryAlias: s |
| | TableScan: students |
| | SubqueryAlias: g |
| | TableScan: grades |
| logical_plan after inline_table_scan | SAME TEXT AS ABOVE |
| logical_plan after type_coercion | SAME TEXT AS ABOVE |
| logical_plan after simplify_expressions | Sort: s.mark DESC NULLS FIRST |
| | Projection: s.name, s.mark, g.grade |
| | Filter: g.grade > Int64(2) |
| | Inner Join: Filter: s.mark >= g.min AND s.mark <= g.max AS s.mark BETWEEN g.min AND g.max |
| | SubqueryAlias: s |
| | TableScan: students |
| | SubqueryAlias: g |
| | TableScan: grades |
...
| logical_plan after eliminate_cross_join | Sort: s.mark DESC NULLS FIRST |
| | Projection: s.name, s.mark, g.grade |
| | Filter: g.grade > Int64(2) |
| | CrossJoin: |
| | SubqueryAlias: s |
| | TableScan: students |
| | SubqueryAlias: g |
| | TableScan: grades |
...
| logical_plan after push_down_filter | Sort: s.mark DESC NULLS FIRST |
| | Projection: s.name, s.mark, g.grade |
| | CrossJoin: |
| | SubqueryAlias: s |
| | TableScan: students |
| | SubqueryAlias: g |
| | Filter: grades.grade > Int64(2) |
| | TableScan: grades, partial_filters=[grades.grade > Int64(2)] The actual regression seems to be caused by the SQL planner generating the initial logical plan with an Inner Join instead of a Cross Join, and this propagates down to cause the bug. However it seems to highlight the actual flaw which is in Sort: s.mark DESC NULLS FIRST
Projection: s.name, s.mark, g.grade
Filter: g.grade > Int64(2)
Inner Join: Filter: s.mark >= g.min AND s.mark <= g.max AS s.mark BETWEEN g.min AND g.max
SubqueryAlias: s
TableScan: students
SubqueryAlias: g
TableScan: grades to Sort: s.mark DESC NULLS FIRST
Projection: s.name, s.mark, g.grade
Filter: g.grade > Int64(2)
CrossJoin:
SubqueryAlias: s
TableScan: students
SubqueryAlias: g
TableScan: grades Where it completely discards the Filter on the Inner Join when converting it to a Cross Join, causing the buggy behaviour |
It seems the I will submit a pr later if others do not fix it. |
cc @liukun4515 |
Definitely it would be great to get some test coverage for this case. |
IMO it would be a good idea to fix this before releasing, this seems like a regression on fundamental functionality. Happy to help with reviewing the fix PR. |
I agree -- I have mentioned it on the mailing list vote thread. @ygf11 can you work on a PR for this issue soon? |
I think the plan should be the |
Yes, we can just fix this bug in the |
I debug it, it is because find |
I meet a strange problem when I do the pr #4866. After I fix the test, the tcph-q11 will panic when run sqllogictests(the plan of q11 is modified), the direct panic message:
I have no idea about it now, need to investigate more. To not block the release, I think maybe we can skip With the patch, the sql of this issue will get the correct result. |
I create #4869 to show the patch. |
This problem remind me the importance of #4615. |
Should this issue be closed now that #4869 is merged? |
Filed #4877 to track the work to properly support this case. Thank you all for your work to fix this issue |
Describe the bug
Join condition(
on
withbetween
) works incorrectly.Looks like ignored and returned cartesian product.
It used to work for latest stable release (
15.0.0
fromcrates.io
)But I tested it against current master branch, hash
3cc607de4ce6e9e1fd537091e471858c62f58653
.To Reproduce
Steps to reproduce the behavior:
students.csv
:grades.csv
:MRE:
It will return:
Expected behavior
It should be the same as for
datafusion = "15.0.0"
:The text was updated successfully, but these errors were encountered: