Convert Presto SQL to Calcite SqlNode #171

wenruimeng · 2021-10-10T04:05:26Z

This is the first task of issue #170
This PR introduces presto-parser dependency to parse the Presto SQL string to Presto Statement and implements the Presto AST visitor to convert the Presto Statement to SqlNode. DDL and Lambda are not supported in this PR.

There are 300+ test cases, most of them adapted from the Presto repo test cases in presto-parser and presto-product-tests.

ljfgem

Awesome work. Thanks a lot, Wenrui!

ljfgem · 2021-10-11T23:41:48Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/PrestoParserDriver.java

+
+
+public class PrestoParserDriver {
+  private final static ParsingOptions parsingOptions = new ParsingOptions(AS_DOUBLE /* anything */);


Looks like new ParsingOptions is deprecated, maybe change it to:

ParsingOptions.builder().setDecimalLiteralTreatment(AS_DOUBLE).build();

?

And why do we choose AS_DOUBLE?

There are three options here. REJECT makes the parser fail. The other two treat the literal as either a double or a decimal, and I think either one should give the same result.
It failed to parse 0.00 or some other cases.

ljfgem · 2021-10-11T23:45:20Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/CalciteSqlFormatter.java

+ * Licensed under the BSD-2 Clause license.
+ * See LICENSE in the project root for license information.
+ */
+package com.linkedin.coral.presto.parser;


I think it is a little confusing to have both a coral-trino module and a coral-presto module (coral-trino was actually called coral-presto before).
Would it be better if we rename this module to something like coral-from-presto or coral-from-trino?

I see. I would suggest coral-from-presto in this case, since presto is a public name. What do you think?

Are you able to use Trino dependencies and classes? I also suggest we reuse the coral-trino module and add a package called com.linkedin.coral.trino.trino2rel along the lines of the existing com.linkedin.coral.trino.rel2trino.

What are the differences between the trino dependency and the public presto dependency? Is it just the name? I'm not sure of the original context for choosing trino over presto. If this is intended to be open source, I think the public dependency might be more suitable.

Although most of their grammars are still the same, the differences between them may become larger and larger.
FYI:

Trino-346:Add support for window frames based on GROUPS. (#5713)

Prestodb-0.232:Add support for ALTER FUNCTION.(#13799)

Discussed offline. Will move this change under the coral-trino/trino2rel

ljfgem · 2021-10-12T00:25:05Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/calcite/CalciteUtil.java

+  }
+
+  public static SqlIdentifier createSqlIdentifier(SqlParserPos pos, String... path) {
+    return new SqlIdentifier(Arrays.asList(path), ZERO);


Should be

return new SqlIdentifier(Arrays.asList(path), pos);

?

ljfgem · 2021-10-12T01:13:38Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+    }
+    SqlParserPos pos = getParserPos(node);
+    List<SqlNode> operands =
+        node.getArguments().stream().map(arg -> visitNode(arg, context)).collect(Collectors.toList());


It seems that if the inner operand is *, then node.getArguments() is empty. Therefore, count(*) would be converted to count().
We might need some special handling for this kind of corner case.

I noticed that difference in the test. I think count() and count(*) are semantically equivalent in Calcite. If they are not the same, I can add some special handling here.

To be aligned with what coral-hive does for star:

coral/coral-hive/src/main/java/com/linkedin/coral/hive/hive2rel/parsetree/ParseTreeBuilder.java

Line 578 in f023597

protected SqlNode visitFunctionStar(ASTNode node, ParseContext ctx) {

I think it is better to handle the corner cases here.

Add special handling for count(*)
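For reference, the corner case can be sketched without any parser dependency. This is only an illustration of the idea, not the code added in the PR: `CountStarSketch`, `operandsFor`, `render`, and the `STAR` placeholder are hypothetical names, and operands are modeled as plain strings rather than SqlNode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the visitor's operand handling; not the PR's code.
final class CountStarSketch {
  // Placeholder for whatever node represents the star operand.
  static final String STAR = "*";

  // Returns the operands to pass on, restoring the star when count has no arguments.
  static List<String> operandsFor(String functionName, List<String> parsedArguments) {
    List<String> operands = new ArrayList<>(parsedArguments);
    if (operands.isEmpty() && "count".equalsIgnoreCase(functionName)) {
      operands.add(STAR);
    }
    return operands;
  }

  // Renders the call as text so the effect is easy to see.
  static String render(String functionName, List<String> parsedArguments) {
    return functionName + "(" + String.join(", ", operandsFor(functionName, parsedArguments)) + ")";
  }

  public static void main(String[] args) {
    System.out.println(render("count", Collections.<String>emptyList()));
    System.out.println(render("count", Arrays.asList("id")));
    System.out.println(render("now", Collections.<String>emptyList()));
  }
}
```

Only a zero-argument count is rewritten here; other zero-argument functions are left untouched.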

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

ljfgem · 2021-10-12T15:15:20Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+
+  @Override
+  protected SqlNode visitExtract(Extract node, ParserVisitorContext context) {
+    SqlParserPos pos = getParserPos(node);


pos is not used?

ljfgem · 2021-10-12T15:31:12Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+    SqlLiteral functionQualifier = node.isDistinct() ? SqlLiteral.createSymbol(SqlSelectKeyword.DISTINCT, ZERO) : null;
+    SqlCall call = createCall(unresolvedFunction, operands, functionQualifier);
+    if (node.getWindow().isPresent()) {
+      return OVER.createCall(pos, call, visitWindow(node.getWindow().get(), context));


Looks like it doesn't consider ignore/respect nulls.

lag(salary, 1) ignore nulls over (partition by depname) and lag(salary, 1) respect nulls over (partition by depname) are returning same results.

Add the following code to take the ignoreNull into account.
if (NULL_CARED_OPERATOR.contains(call.getKind())) {
  if (node.isIgnoreNulls()) {
    call = IGNORE_NULLS.createCall(pos, call);
  } else {
    call = RESPECT_NULLS.createCall(pos, call);
  }
}

ljfgem · 2021-10-12T19:45:01Z

coral-presto/src/test/java/parser/ParseTreeBuilderTest.java

+import static org.testng.AssertJUnit.assertEquals;
+
+
+public class ParseTreeBuilderTest {


For tests, I think it is better to be aligned with other coral modules, i.e. create separate unit tests for different functions and use assertions to check that the result is correct in each unit test, which would be cleaner and easier to debug. Maybe add a TODO item if we cannot handle it in this PR?

Unit tests for each function are a good suggestion. I can add them later. The test cases here are more like brute-force testing to make sure most Presto queries can be handled. I do check their equivalence with the Calcite-parsed SqlNode. Most of them are equal except for some corner cases such as count() vs. count(*), identifier upper or lower case, etc.

ljfgem · 2021-10-12T19:59:44Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+    SqlNode percentageNode = visitNode(node.getSamplePercentage(), context);
+    float percentage = 0;
+    if (percentageNode instanceof SqlNumericLiteral) {
+      percentage = (float) (((SqlNumericLiteral) percentageNode).longValue(true) / 100.0);


Looks like there is a precision issue with the float type, e.g.:
select * from foo tablesample system (10) join bar tablesample bernoulli (30) on a.id = b.id
would be converted to
SELECT * FROM "FOO" TABLESAMPLE SYSTEM(10.000000149011612) INNER JOIN "BAR" TABLESAMPLE BERNOULLI(30.000001192092896) ON "A"."ID" = "B"."ID"
There seems to be no way to avoid it since a float is required. Any ideas?

@ljfgem, I think you meant 0.10000000149011612. Regardless, we can try to maintain just two decimal places (by formatting to a string and back to float).

@wmoustafa It's stored as 0.1 but converted to 10.000000149011612 by Calcite's toString function. That's probably unavoidable.

public String toString() {
  StringBuilder b = new StringBuilder();
  b.append(this.isBernoulli ? "BERNOULLI" : "SYSTEM");
  b.append('(');
  b.append((double)this.samplePercentage * 100.0D);
  b.append(')');
  if (this.isRepeatable) {
    b.append(" REPEATABLE(");
    b.append(this.repeatableSeed);
    b.append(')');
  }
  return b.toString();
}
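The effect can be reproduced with plain JDK arithmetic. The sketch below mirrors the two expressions quoted in this thread (the `/ 100.0` cast to float, and the widening multiply from the toString above); the class and method names are made up for illustration. It also tries the suggested two-decimal round trip, which appears to yield the same float and therefore the same printed value.

```java
import java.util.Locale;

// Hypothetical reproduction of the precision issue; names are illustrative only.
final class SamplePercentageSketch {
  // Mirrors the PR's conversion: percentage literal -> fraction stored as a float.
  static float toFraction(long percentage) {
    return (float) (percentage / 100.0);
  }

  // Mirrors the expression in the quoted toString(): widen to double, then scale by 100.
  static String renderPercentage(float fraction) {
    return String.valueOf((double) fraction * 100.0D);
  }

  // The suggested round trip: format to two decimals, then parse back to float.
  static float roundTripTwoDecimals(float fraction) {
    return Float.parseFloat(String.format(Locale.ROOT, "%.2f", fraction));
  }

  public static void main(String[] args) {
    float ten = toFraction(10);
    float thirty = toFraction(30);
    System.out.println(renderPercentage(ten));    // value reported in the thread for SYSTEM(10)
    System.out.println(renderPercentage(thirty)); // value reported in the thread for BERNOULLI(30)
    // The round trip lands on the same float, so the rendering does not change.
    System.out.println(roundTripTwoDecimals(ten) == ten);
  }
}
```

Since the nearest float to 0.1 is the same before and after the round trip, the extra digits come from widening that float to double, not from how the float was produced.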

ljfgem · 2021-10-12T20:10:11Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/calcite/CalciteUtil.java

+import static org.apache.calcite.util.Litmus.IGNORE;
+
+
+public class CalciteUtil {


Looks like many methods in this class are not used, are they for future use?

Yes. I added these util functions because they are very common when processing Calcite SqlNodes. I left them in for potential use cases. I can delete them if that's not a good decision.

I would delete them if possible. We can introduce them when we need them.

Sure. Will do it

wmoustafa

Great work @wenruimeng! I am done with ParseTreeBuilder. Will get back to the rest soon.

wmoustafa · 2021-10-14T19:09:52Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/CalciteSqlFormatter.java

+
+
+/**
+ * This is a reimplementation of vertical_blank's StandardSqlFormatter.


Could you add a link here?

wmoustafa · 2021-10-14T23:57:11Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+
+
+public class ParseTreeBuilder extends AstVisitor<SqlNode, ParserVisitorContext> {
+  private static final String UNSUPPORTED_EXCEPTION_MSG = "%s is not supported in the visit.";


Should we do something along the lines of

coral/coral-hive/src/main/java/com/linkedin/coral/hive/hive2rel/parsetree/UnhandledASTTokenException.java

Line 11 in 579ce52

public class UnhandledASTTokenException extends RuntimeException {

?

Add line and column number in the error message

I meant using the same exception class name and structure.

wmoustafa · 2021-10-15T00:44:00Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+  private final SqlTypeFactoryImpl sqlTypeFactory = new SqlTypeFactoryImpl(new HiveTypeSystem());
+  private final HiveFunctionResolver functionResolver =
+      new HiveFunctionResolver(new StaticHiveFunctionRegistry(), new ConcurrentHashMap<>());


Are we going to replace those with non-Hive versions? Something along those lines is in #151.

We could create a trino-specific type system, but I haven't found any incompatibility so far. Maybe I missed some scenarios. If we find any issue, we can fix it later. What do you think?

Could you clarify what you meant by incompatibilities? For example, the semantics of strpos in Trino is carried by instr of Coral-IR. There should be a number of other instances.

wmoustafa · 2021-10-15T00:45:18Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+      new HiveFunctionResolver(new StaticHiveFunctionRegistry(), new ConcurrentHashMap<>());
+
+  // convert the Presto node parse location to the Calcite SqlParserPos
+  private SqlParserPos getParserPos(Node node) {


Rename this to pos to make the code less verbose? (it has 65 usages).

wmoustafa · 2021-10-15T00:46:59Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+  private UnsupportedOperationException getUnsupportedException(Node node) {
+    return new UnsupportedOperationException(format(UNSUPPORTED_EXCEPTION_MSG, node.toString()));
+  }
+
+  private UnsupportedOperationException getDDLException(Node node) {
+    return new UnsupportedOperationException(format(DDL_NOT_SUPPORT_MSG, node.toString()));
+  }
+
+  private UnsupportedOperationException getLambdaException(Node node) {
+    return new UnsupportedOperationException(format(LAMBDA_NOT_SUPPORT_MSG, node.toString()));
+  }


Combine to one method with string argument? Also please see comment on UnhandledASTTokenException.
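One possible shape for the combined helper, as a sketch only: a single factory that takes the description as a string argument and includes the line and column in the message, throwing a dedicated exception type in the spirit of coral-hive's UnhandledASTTokenException. The names `UnsupportedNodeSketch`, `UnhandledNodeException`, and `unsupported`, as well as the message format, are assumptions and not the PR's final code; the node is reduced to a description plus a location so the example runs without the parser on the classpath.

```java
import java.util.Locale;

// Hypothetical sketch of a single factory replacing the three getXxxException helpers.
final class UnsupportedNodeSketch {
  // One message template; the caller says what is unsupported (e.g. "DDL", "Lambda").
  private static final String MSG = "%s is not supported (line %d, column %d): %s";

  static UnhandledNodeException unsupported(String what, String nodeText, int line, int column) {
    return new UnhandledNodeException(String.format(Locale.ROOT, MSG, what, line, column, nodeText));
  }

  public static void main(String[] args) {
    try {
      throw unsupported("Lambda", "x -> x + 1", 3, 17);
    } catch (UnhandledNodeException e) {
      System.out.println(e.getMessage());
    }
  }
}

// Dedicated unchecked exception, analogous in structure to a RuntimeException subclass.
final class UnhandledNodeException extends RuntimeException {
  UnhandledNodeException(String message) {
    super(message);
  }
}
```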

wmoustafa · 2021-10-17T20:47:13Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+    SqlNode percentageNode = visitNode(node.getSamplePercentage(), context);
+    float percentage = 0;
+    if (percentageNode instanceof SqlNumericLiteral) {
+      percentage = (float) (((SqlNumericLiteral) percentageNode).longValue(true) / 100.0);


@ljfgem, I think you meant 0.10000000149011612. Regardless, we can try to maintain just two decimal places (by formatting to a string and back to float).

wmoustafa · 2021-10-17T21:00:54Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+  }
+
+  private RelDataType getDecimalType(String type) {
+    String value = type.substring(8, type.length() - 1);


Is there a more reliable way to get those values from the Trino/Presto Node?

The trino parser has a better way to do the conversion. Will update it here.

wmoustafa · 2021-10-17T21:07:22Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+  private RelDataType convertType(String type) {
+    if (type.toUpperCase().startsWith("DECIMAL(")) {
+      return getDecimalType(type);
+    } else if (type.toUpperCase().startsWith("VARCHAR(")) {
+      return getVarcharType(type);
+    } else if (type.toUpperCase().startsWith("CHAR(")) {
+      return getCharType(type);
+    }
+    return sqlTypeFactory.createSqlType(SqlTypeName.valueOf(type.toUpperCase()));
+  }


Can we leverage Decimal#getPrecision and Decimal#getScale, and comparable VarcharType methods, etc?

In the trino parser, all these types are GenericDataType objects, which have a name (an Identifier) and a list of arguments. We can do the name matching and the argument handling separately. In the previous Presto parser, the type is just a string. That's why it was handled this way.
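To illustrate splitting the name match from the argument handling, here is a dependency-free sketch for the string-typed case. It is not how the trino parser's GenericDataType works; `TypeSignatureSketch` and its regex are hypothetical, and only types with integer arguments (e.g. DECIMAL, VARCHAR, CHAR) or no arguments are covered. Multi-word types would need more work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for type strings such as "decimal(10, 2)" or "varchar(32)".
final class TypeSignatureSketch {
  // A name, optionally followed by a parenthesized, comma-separated argument list.
  private static final Pattern SIGNATURE = Pattern.compile("\\s*(\\w+)\\s*(?:\\(([^)]*)\\))?\\s*");

  final String name;
  final List<Integer> arguments;

  private TypeSignatureSketch(String name, List<Integer> arguments) {
    this.name = name;
    this.arguments = Collections.unmodifiableList(arguments);
  }

  static TypeSignatureSketch parse(String type) {
    Matcher m = SIGNATURE.matcher(type);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unrecognized type: " + type);
    }
    // Name handling: normalize case once instead of repeating toUpperCase().startsWith(...).
    String name = m.group(1).toUpperCase(Locale.ROOT);
    // Argument handling: independent of which type name was matched.
    List<Integer> arguments = new ArrayList<>();
    if (m.group(2) != null) {
      for (String part : m.group(2).split(",")) {
        arguments.add(Integer.parseInt(part.trim()));
      }
    }
    return new TypeSignatureSketch(name, arguments);
  }

  public static void main(String[] args) {
    TypeSignatureSketch decimal = parse("decimal(10, 2)");
    System.out.println(decimal.name + " " + decimal.arguments);
    TypeSignatureSketch bigint = parse("bigint");
    System.out.println(bigint.name + " " + bigint.arguments);
  }
}
```

Compared with `type.substring(8, type.length() - 1)`, this does not depend on the length of the type name or on exact spacing.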

wmoustafa · 2021-10-17T21:16:21Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+    return new SqlNodeList(getChildren(node, context), getParserPos(node));
+  }
+
+  private List<SqlNode> getListSqlNode(List<? extends Node> nodes, ParserVisitorContext context) {


toListOfSqlNode?

wmoustafa · 2021-10-17T21:19:53Z

coral-presto/src/main/java/com/linkedin/coral/presto/parser/ParseTreeBuilder.java

+  @Override
+  protected SqlNode visitWindow(Window node, ParserVisitorContext context) {
+    SqlParserPos pos = getParserPos(node);
+    SqlNodeList partitionList = createSqlNodeList(getListSqlNode(node.getPartitionBy(), context), pos);


Maybe worth introducing a new method that combines createSqlNodeList(getListSqlNode(x)).

We can rename getListSqlNode to toListOfSqlNode, and implement createSqlNodeList(getListSqlNode(x)) in toSqlNodeList.

wmoustafa · 2021-11-13T16:02:10Z

Thank you so much @wenruimeng for this great work!

antumbde · 2021-11-14T05:21:52Z

Very nice work! Thank you for working on this.

antumbde · 2021-11-14T05:33:08Z

I am out of touch with slack conversation so trying to understand the plan:

Current impl seems to be first phase to convert Trino parse tree to calcite parse tree. Is there WIP to convert that to rel ?
It's using HiveFunctionResolver and static function registry. We should change that to TrinoResolver and TrinoFunctionRegistry. (Worth discussing how to organize code to support to/from conversions). Maybe this is already planned.

wmoustafa · 2021-11-15T05:37:32Z

Current impl seems to be first phase to convert Trino parse tree to calcite parse tree. Is there WIP to convert that to rel ?

It has not started, but yes, conversion to RelNode should be addressed in the second phase.

It's using HiveFunctionResolver and static function registry. We should change that to TrinoResolver and TrinoFunctionRegistry. (Worth discussing how to organize code to support to/from conversions). Maybe this is already planned.

We had a couple of discussions around this, and HiveFunctionResolver is just a placeholder in this patch. It should go away in the step that converts to RelNode.

ljfgem reviewed Oct 12, 2021

wmoustafa reviewed Oct 17, 2021

Wenrui Meng added 9 commits October 29, 2021 18:09

Convert Presto SQL to Calcite SqlNode

cc7ed59

Update the queryVisit to make it generate SqlOrder whenever it has sq…

4c2205d

…lOrder so it's consistent to the Calcite parser. Fix some minor edge cases by comparing the Calcite parsed SqlNode and Translated SqlNode

Adjust some comments

f4c79fb

Remove unused functions in the util

6e4755a

Move to coral-trino module and adjust comments

cac7024

Remove coral-presto module

74b69b3

Add assertion in the test

0b8b81e

Add the test fixture

9be680c

spotlessApply

2d06db8

wenruimeng force-pushed the wenrui/presto_to_sqlnode branch from c99f66d to 2d06db8 Compare October 30, 2021 01:11

Remove unused code and update the notice with trino license

92b0d4e

wmoustafa approved these changes Nov 13, 2021

wmoustafa merged commit 5dc967b into linkedin:master Nov 13, 2021


