-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-6836] Add Rule to convert INTERSECT to EXISTS #4209
base: main
Are you sure you want to change the base?
Conversation
Hi, @NobiGo please review thanks ! |
core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java
Outdated
Show resolved
Hide resolved
047f96e
to
ed98960
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the description provided by Jira, we need to add a Distinct at the top level of the Project, and perhaps we can also add test cases to JdbcTest to ensure that our transformation implementation is correct.
The SQL example:
select * from (values (1, 2), (1, 2), (3, 4), (5, 7)) as t(a, b)
intersect
select * from (values (1, 2), (3, 4), (5, 6)) as t(a, b) ;
You will know why we need the Distinct.
@NobiGo I know the cause of the problem. I considered distinct in the rule, but the field I chose was too special. It was the primary key of the table, so agg was automatically removed. I modified the test case again, making all the current test cases use non-primary keys, and then adding a test case using the primary key field. |
ed98960
to
119438d
Compare
@@ -9585,4 +9585,46 @@ private void checkJoinAssociateRuleWithTopAlwaysTrueCondition(boolean allowAlway | |||
.withRule(CoreRules.MULTI_JOIN_OPTIMIZE) | |||
.check(); | |||
} | |||
|
|||
@Test void testIntersectToExistsRuleOneField() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since plans are hard to read, I would encourage you to also add some quidem tests, these are much simpler to read and write
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since plans are hard to read, I would encourage you to also add some quidem tests, these are much simpler to read and write
@mihaibudiu Could you give me a description of quidem tests? I don't quite understand how to write them now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at the files with iq suffix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at the files with iq suffix
@mihaibudiu Thanks, I have seen the file with this suffix. Do I need to write a SQL test to output the final result through the optimization rules? Is there a similar example for reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as @mihaibudiu said, currently this quidem file cannot specify specific optimizations. Before Quidem's test, we wrote such test cases in JdbcTest. Please feel free to check it.
Actually I think I am wrong, there is no way right now to use a specific optimization in a quidem file. That ticket is still open. So you can for now forget about my request. |
It doesn't matter, I can just learn this testing method, which may be covered later. |
</Resource> | ||
<Resource name="planAfter"> | ||
<![CDATA[ | ||
LogicalAggregate(group=[{0, 1}]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not read this paper for the time being. Is it necessary to consider the case of null values here. For example, if the emp table has only one null row. in intersect it should return null, in exsits it will return 0 rows of data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suibianwanwank This is indeed a good question. Many database implementations are as you said, so do we need to modify the logic?
SELECT a.CustID FROM tbl1 AS a INTERSECT SELECT b.CustID FROM tbl2 AS b
The Exists SQL change to
SELECT DISTINCT a.CustID FROM tbl1 AS a WHERE EXISTS (SELECT b.CustID FROM tbl2 b WHERE (a.CustID=b.CustID) or (a.CustID is NULL and b.CustID is NULL))
@NobiGo Could you help me determine whether this logic is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't read the paper either, but the specific rewrite you propose LGTM, even in the case @suibianwanwank proposed.
You can even opt for a more concise (equivalent) rewrite:
SELECT DISTINCT a.CustID FROM tbl1 AS a WHERE EXISTS (SELECT b.CustID FROM tbl2 b WHERE a.CustID IS NOT DISTINCT FROM b.CustID)
I have played a bit with it here to double check my understanding: https://www.db-fiddle.com/#&togetherjs=NSeyJDgqqy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IS NOT DISTINCT FROM
@asolimando Great! I will chang my rule to fit your suggestion.
119438d
to
694301d
Compare
@asolimando @suibianwanwank @NobiGo I have fixed, please review it. |
core/src/main/java/org/apache/calcite/rel/rules/IntersectToExistsRule.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/rel/rules/IntersectToExistsRule.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue raised by @suibianwanwank is a very good test case. We should test the accuracy of this logic in the JdbcTest.
694301d
to
6d9b9de
Compare
@NobiGo Sorry, I tried many ways and couldn't find a good jdbc test method. I found that when executing the test, the plan should not include subquery, which will cause this error
|
6d9b9de
to
c471930
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Test void testIntersect() {
final String sql = ""
+ "select \"empid\", \"name\" from \"hr\".\"emps\" where \"deptno\"=10\n"
+ "intersect\n"
+ "select \"empid\", \"name\" from \"hr\".\"emps\" where \"empid\">=150";
CalciteAssert.hr()
.query(sql)
.withHook(Hook.PLANNER, (Consumer<RelOptPlanner>)
planner -> {
planner.removeRule(CoreRules.INTERSECT_TO_DISTINCT);
planner.removeRule(EnumerableRules.ENUMERABLE_INTERSECT_RULE);
planner.addRule(CoreRules.INTERSECT_TO_EXISTS);
})
.explainContains("")
.returnsUnordered("empid=150; name=Sebastian");
}
You can try this unit test in JdbcTest.
Now it throws exception.
@NobiGo I have debugged the code and modified some of the logic, but there are still errors. as
the stack is
it seems not support exectue subquery, as I mentioned before. Are there any other solutions? |
c471930
to
e526431
Compare
e526431
to
436e890
Compare
@NobiGo I have added a test case for SQL to Rel and Rel to SQL. Can this verify the conclusion? As for the execution result of this SQL, it depends on various execution engines. |
I need more time to review this PR. |
@NobiGo @suibianwanwank @mihaibudiu Hi all, Do we still need this PR? |
@xiedeyantu This PR cannot be added with test cases in JdbcTest due to hidden issues. I have been dealing with them and we can merge this PR in the next version. |
@NobiGo Got it, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the revisions. There were a few minor issues I left comments on.
I'll try to test this pr locally this weekend, it would be great to find a way to test the results.
Also, you don't have to overwrite previous commits every time, and keeping a record of changes will allow other reviewers to better understand pr changes.
sql(sql).withRule(CoreRules.INTERSECT_TO_EXISTS) | ||
.check(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all remaining tests, also add the jira case
RelDataType originalType = | ||
rowType.getFieldList().get(projects.size()).getType(); | ||
RexNode expr; | ||
if (!originalType.equals(rexNode.getType())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What situation is the type adjustment for here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What situation is the type adjustment for here?
Here we can get java type and sql type, when we using setop.
* Will not eliminate HepRelVertex in subqueries | ||
* during the optimization process. | ||
* We need to eliminate it here. */ | ||
private RelNode removeHepRelVertex(RelNode input) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may need to stipped().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may need to stipped().
Yes, thanks.
* <a href="https://issues.apache.org/jira/browse/CALCITE-6836">[CALCITE-6836] | ||
* Add Rule to convert INTERSECT to EXISTS</a>. */ | ||
@Test void testIntersectToExistsRule() { | ||
String query = "SELECT \"product_name\"\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I don't think it's necessary to test in RelToSqlConverter because the semantics of relational algebra is already expressed in RelNode. What is tested here is the correctness of RelNode -> sqlnode -> sqlText.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I don't think it's necessary to test in RelToSqlConverter because the semantics of relational algebra is already expressed in RelNode. What is tested here is the correctness of RelNode -> sqlnode -> sqlText.
@suibianwanwank Thanks, I agree with you, but currently Calcite may not have the ability to execute this plan, and I have no other good options.
|
issue: https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-6836