Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6836] Add Rule to convert INTERSECT to EXISTS #4209

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

xiedeyantu
Copy link
Contributor

@xiedeyantu xiedeyantu commented Feb 23, 2025

@xiedeyantu
Copy link
Contributor Author

Hi, @NobiGo please review thanks !

@xiedeyantu xiedeyantu force-pushed the support-intersecttoexists branch from 047f96e to ed98960 Compare February 24, 2025 14:50
@xiedeyantu xiedeyantu requested a review from NobiGo February 24, 2025 14:54
Copy link
Contributor

@NobiGo NobiGo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the description provided by Jira, we need to add a Distinct at the top level of the Project, and perhaps we can also add test cases to JdbcTest to ensure that our transformation implementation is correct.
The SQL example:

select * from (values (1, 2), (1, 2), (3, 4), (5, 7)) as t(a, b)
intersect 
select * from (values (1, 2), (3, 4), (5, 6)) as t(a, b) ;

You will know why we need the Distinct.

@xiedeyantu
Copy link
Contributor Author

According to the description provided by Jira, we need to add a Distinct at the top level of the Project, and perhaps we can also add test cases to JdbcTest to ensure that our transformation implementation is correct. The SQL example:

select * from (values (1, 2), (1, 2), (3, 4), (5, 7)) as t(a, b)
intersect 
select * from (values (1, 2), (3, 4), (5, 6)) as t(a, b) ;

You will know why we need the Distinct.

@NobiGo I know the cause of the problem. I considered distinct in the rule, but the field I chose was too special. It was the primary key of the table, so agg was automatically removed. I modified the test case again, making all the current test cases use non-primary keys, and then adding a test case using the primary key field.

@xiedeyantu xiedeyantu force-pushed the support-intersecttoexists branch from ed98960 to 119438d Compare February 25, 2025 13:59
@xiedeyantu xiedeyantu requested a review from NobiGo February 25, 2025 14:00
@@ -9585,4 +9585,46 @@ private void checkJoinAssociateRuleWithTopAlwaysTrueCondition(boolean allowAlway
.withRule(CoreRules.MULTI_JOIN_OPTIMIZE)
.check();
}

@Test void testIntersectToExistsRuleOneField() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since plans are hard to read, I would encourage you to also add some quidem tests, these are much simpler to read and write

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since plans are hard to read, I would encourage you to also add some quidem tests, these are much simpler to read and write

@mihaibudiu Could you give me a description of quidem tests? I don't quite understand how to write them now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the files with iq suffix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the files with iq suffix

@mihaibudiu Thanks, I have seen the file with this suffix. Do I need to write a SQL test to output the final result through the optimization rules? Is there a similar example for reference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as @mihaibudiu said, currently this quidem file cannot specify specific optimizations. Before Quidem's test, we wrote such test cases in JdbcTest. Please feel free to check it.

@mihaibudiu
Copy link
Contributor

Actually I think I am wrong, there is no way right now to use a specific optimization in a quidem file. That ticket is still open. So you can for now forget about my request.

@xiedeyantu
Copy link
Contributor Author

Actually I think I am wrong, there is no way right now to use a specific optimization in a quidem file. That ticket is still open. So you can for now forget about my request.

It doesn't matter, I can just learn this testing method, which may be covered later.

</Resource>
<Resource name="planAfter">
<![CDATA[
LogicalAggregate(group=[{0, 1}])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not read this paper for the time being. Is it necessary to consider the case of null values here. For example, if the emp table has only one null row. in intersect it should return null, in exsits it will return 0 rows of data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suibianwanwank This is indeed a good question. Many database implementations are as you said, so do we need to modify the logic?
SELECT a.CustID FROM tbl1 AS a INTERSECT SELECT b.CustID FROM tbl2 AS b
The Exists SQL change to
SELECT DISTINCT a.CustID FROM tbl1 AS a WHERE EXISTS (SELECT b.CustID FROM tbl2 b WHERE (a.CustID=b.CustID) or (a.CustID is NULL and b.CustID is NULL))

@NobiGo Could you help me determine whether this logic is correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't read the paper either, but the specific rewrite you propose LGTM, even in the case @suibianwanwank proposed.

You can even opt for a more concise (equivalent) rewrite:

SELECT DISTINCT a.CustID FROM tbl1 AS a WHERE EXISTS (SELECT b.CustID FROM tbl2 b WHERE a.CustID IS NOT DISTINCT FROM b.CustID)

I have played a bit with it here to double check my understanding: https://www.db-fiddle.com/#&togetherjs=NSeyJDgqqy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks great!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IS NOT DISTINCT FROM

@asolimando Great! I will chang my rule to fit your suggestion.

@xiedeyantu xiedeyantu force-pushed the support-intersecttoexists branch from 119438d to 694301d Compare February 26, 2025 14:39
@xiedeyantu
Copy link
Contributor Author

@asolimando @suibianwanwank @NobiGo I have fixed, please review it.

Copy link
Contributor

@NobiGo NobiGo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue raised by @suibianwanwank is a very good test case. We should test the accuracy of this logic in the JdbcTest.

@xiedeyantu xiedeyantu force-pushed the support-intersecttoexists branch from 694301d to 6d9b9de Compare February 27, 2025 09:24
@xiedeyantu
Copy link
Contributor Author

The issue raised by @suibianwanwank is a very good test case. We should test the accuracy of this logic in the JdbcTest.

@NobiGo Sorry, I tried many ways and couldn't find a good jdbc test method. I found that when executing the test, the plan should not include subquery, which will cause this error

  public RexLocalRef registerInput(RexNode expr) {
    final RexShuttle shuttle = new RegisterInputShuttle(true);
    final RexNode ref = expr.accept(shuttle);
    return (RexLocalRef) ref;  <------ ref is Subquery, can not cast to RexLocalRef
  }

@xiedeyantu xiedeyantu force-pushed the support-intersecttoexists branch from 6d9b9de to c471930 Compare February 27, 2025 10:46
Copy link
Contributor

@NobiGo NobiGo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 @Test void testIntersect() {
    final String sql = ""
        + "select \"empid\", \"name\" from \"hr\".\"emps\" where \"deptno\"=10\n"
        + "intersect\n"
        + "select \"empid\", \"name\" from \"hr\".\"emps\" where \"empid\">=150";
    CalciteAssert.hr()
        .query(sql)
        .withHook(Hook.PLANNER, (Consumer<RelOptPlanner>)
            planner -> {
          planner.removeRule(CoreRules.INTERSECT_TO_DISTINCT);
          planner.removeRule(EnumerableRules.ENUMERABLE_INTERSECT_RULE);
          planner.addRule(CoreRules.INTERSECT_TO_EXISTS);
            })
        .explainContains("")
        .returnsUnordered("empid=150; name=Sebastian");
  }

You can try this unit test in JdbcTest.
Now it throws exception.

@xiedeyantu
Copy link
Contributor Author

 @Test void testIntersect() {
    final String sql = ""
        + "select \"empid\", \"name\" from \"hr\".\"emps\" where \"deptno\"=10\n"
        + "intersect\n"
        + "select \"empid\", \"name\" from \"hr\".\"emps\" where \"empid\">=150";
    CalciteAssert.hr()
        .query(sql)
        .withHook(Hook.PLANNER, (Consumer<RelOptPlanner>)
            planner -> {
          planner.removeRule(CoreRules.INTERSECT_TO_DISTINCT);
          planner.removeRule(EnumerableRules.ENUMERABLE_INTERSECT_RULE);
          planner.addRule(CoreRules.INTERSECT_TO_EXISTS);
            })
        .explainContains("")
        .returnsUnordered("empid=150; name=Sebastian");
  }

You can try this unit test in JdbcTest. Now it throws exception.

@NobiGo I have debugged the code and modified some of the logic, but there are still errors. as

cannot translate expression EXISTS({
LogicalFilter(condition=[AND(IS NOT DISTINCT FROM($0, $cor0.empid), IS NOT DISTINCT FROM($1, $cor0.name))])
  LogicalFilter(subset=[rel#282:RelSubset#5.NONE.[]], condition=[>=($0, 150)])
    LogicalProject(subset=[rel#280:RelSubset#4.NONE.[]], empid=[$0], name=[$2])
      LogicalTableScan(subset=[rel#272:RelSubset#0.NONE.[]], table=[[hr, emps]])
})

the stack is

0 = {StackTraceElement@7965} "org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitSubQuery(RexToLixTranslator.java:1701)"
1 = {StackTraceElement@7966} "org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitSubQuery(RexToLixTranslator.java:108)"
2 = {StackTraceElement@7967} "org.apache.calcite.rex.RexSubQuery.accept(RexSubQuery.java:162)"
3 = {StackTraceElement@7968} "org.apache.calcite.adapter.enumerable.RexToLixTranslator.implementCallOperand(RexToLixTranslator.java:1439)"
4 = {StackTraceElement@7969} "org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitCall(RexToLixTranslator.java:1426)"
5 = {StackTraceElement@7970} "org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitCall(RexToLixTranslator.java:108)"
6 = {StackTraceElement@7971} "org.apache.calcite.rex.RexCall.accept(RexCall.java:208)"
7 = {StackTraceElement@7972} "org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitLocalRef(RexToLixTranslator.java:1306)"
8 = {StackTraceElement@7973} "org.apache.calcite.adapter.enumerable.RexToLixTranslator.visitLocalRef(RexToLixTranslator.java:108)"
9 = {StackTraceElement@7974} "org.apache.calcite.rex.RexLocalRef.accept(RexLocalRef.java:78)"
10 = {StackTraceElement@7975} "org.apache.calcite.adapter.enumerable.RexToLixTranslator.translate(RexToLixTranslator.java:260)"

it seems not support exectue subquery, as I mentioned before. Are there any other solutions?

@xiedeyantu xiedeyantu force-pushed the support-intersecttoexists branch from c471930 to e526431 Compare February 27, 2025 15:50
@xiedeyantu xiedeyantu force-pushed the support-intersecttoexists branch from e526431 to 436e890 Compare March 1, 2025 15:05
@xiedeyantu
Copy link
Contributor Author

@NobiGo I have added a test case for SQL to Rel and Rel to SQL. Can this verify the conclusion? As for the execution result of this SQL, it depends on various execution engines.

@NobiGo
Copy link
Contributor

NobiGo commented Mar 3, 2025

@NobiGo I have added a test case for SQL to Rel and Rel to SQL. Can this verify the conclusion? As for the execution result of this SQL, it depends on various execution engines.

I need more time to review this PR.

@xiedeyantu
Copy link
Contributor Author

@NobiGo @suibianwanwank @mihaibudiu Hi all, Do we still need this PR?

@NobiGo
Copy link
Contributor

NobiGo commented Mar 5, 2025

@xiedeyantu This PR cannot be added with test cases in JdbcTest due to hidden issues. I have been dealing with them and we can merge this PR in the next version.

@xiedeyantu
Copy link
Contributor Author

@xiedeyantu This PR cannot be added with test cases in JdbcTest due to hidden issues. I have been dealing with them and we can merge this PR in the next version.

@NobiGo Got it, thanks!

Copy link
Contributor

@suibianwanwank suibianwanwank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the revisions. There were a few minor issues I left comments on.
I'll try to test this pr locally this weekend, it would be great to find a way to test the results.
Also, you don't have to overwrite previous commits every time, and keeping a record of changes will allow other reviewers to better understand pr changes.

sql(sql).withRule(CoreRules.INTERSECT_TO_EXISTS)
.check();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all remaining tests, also add the jira case

RelDataType originalType =
rowType.getFieldList().get(projects.size()).getType();
RexNode expr;
if (!originalType.equals(rexNode.getType())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What situation is the type adjustment for here?

Copy link
Contributor Author

@xiedeyantu xiedeyantu Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What situation is the type adjustment for here?

Here we can get java type and sql type, when we using setop.

* Will not eliminate HepRelVertex in subqueries
* during the optimization process.
* We need to eliminate it here. */
private RelNode removeHepRelVertex(RelNode input) {
Copy link
Contributor

@suibianwanwank suibianwanwank Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may need to stipped().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may need to stipped().

Yes, thanks.

* <a href="https://issues.apache.org/jira/browse/CALCITE-6836">[CALCITE-6836]
* Add Rule to convert INTERSECT to EXISTS</a>. */
@Test void testIntersectToExistsRule() {
String query = "SELECT \"product_name\"\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I don't think it's necessary to test in RelToSqlConverter because the semantics of relational algebra is already expressed in RelNode. What is tested here is the correctness of RelNode -> sqlnode -> sqlText.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I don't think it's necessary to test in RelToSqlConverter because the semantics of relational algebra is already expressed in RelNode. What is tested here is the correctness of RelNode -> sqlnode -> sqlText.

@suibianwanwank Thanks, I agree with you, but currently Calcite may not have the ability to execute this plan, and I have no other good options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants