
Handling planning with alias for time for group by and order by #12418

Merged: 6 commits into apache:master, Apr 15, 2022

Conversation

@somu-imply (Contributor) commented Apr 10, 2022:

The query
select __time as t, val from table group by 1,2 order by 1 fails to plan.

The reason is that this query during the explain phase translates into a scan query which uses the alias t, but the check in the scan query allows group by only on the __time column. This causes a mismatch, as the planner cannot understand that t and __time mean the same thing.

This is solved by setting the row type from the output type of the subquery in such a case, with a fallback for the other cases.
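
To make the mismatch concrete, below is a small self-contained Java sketch; the class, the signature map, and both check methods are invented for illustration and are not Druid's actual planner code. It models why a check keyed on the literal column name rejects the alias t, and how resolving the alias through the subquery's output signature makes the lookup succeed.

import java.util.Map;

public final class AliasResolutionSketch
{
  // Output signature of: select __time as t, val from table group by 1, 2
  // It maps each output name of the subquery to the underlying column.
  private static final Map<String, String> SUBQUERY_OUTPUT =
      Map.of("t", "__time", "val", "val");

  // The naive check: only the literal name "__time" is accepted.
  static boolean orderByAllowedNaive(String orderByColumn)
  {
    return "__time".equals(orderByColumn);
  }

  // The alias-aware check: resolve the output name first, then compare.
  static boolean orderByAllowedResolved(String orderByColumn)
  {
    final String underlying = SUBQUERY_OUTPUT.getOrDefault(orderByColumn, orderByColumn);
    return "__time".equals(underlying);
  }

  public static void main(String[] args)
  {
    System.out.println(orderByAllowedNaive("t"));    // false -> the plan fails
    System.out.println(orderByAllowedResolved("t")); // true  -> "t" is __time
  }
}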

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@abhishekagarwal87 (Contributor) commented Apr 10, 2022:

the check in the scan query allows group by only on the __time column

Did you mean to say that the scan query allows order by only on the __time column? Also, what if we ignore erroneous plans during planning? We could change the DruidRules class to something like the below:

         if (outerQueryRel.isValidDruidQuery()) {
-          call.transformTo(outerQueryRel);
+          try {
+            call.transformTo(outerQueryRel);
+          } catch (Exception ex) {
+            if (Throwables.getCauseOfType(ex, CannotBuildQueryException.class) == null) {
+              throw ex;
+            }
+            // Do nothing
+          }
         }
       }

I tested the query with the above change and it plans successfully.
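
For reference, the Throwables.getCauseOfType call in the diff above comes from the codebase's own utilities (Guava's Throwables has no such method). Below is a hand-rolled stand-in written to the semantics the diff assumes (walk the cause chain and return the first cause of the requested type, or null); this is only a sketch, not the real implementation.

public final class CauseFinder
{
  /**
   * Returns the first throwable of the given type in the cause chain of ex,
   * or null if none is found. A depth limit guards against cyclic causes.
   */
  @SuppressWarnings("unchecked")
  public static <T extends Throwable> T getCauseOfType(Throwable ex, Class<T> type)
  {
    int depth = 0;
    for (Throwable t = ex; t != null && depth < 100; t = t.getCause(), depth++) {
      if (type.isInstance(t)) {
        return (T) t;
      }
    }
    return null;
  }

  public static void main(String[] args)
  {
    final Exception wrapped = new RuntimeException(new IllegalStateException("cannot build query"));
    System.out.println(getCauseOfType(wrapped, IllegalStateException.class)); // the nested cause
    System.out.println(getCauseOfType(wrapped, ArithmeticException.class));   // null
  }
}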

@somu-imply (Contributor, Author) commented:

@abhishekagarwal87 thanks for your comments. The scan query allows order by only on the __time column, and the column name also needs to be __time, as in:

.anyMatch(orderBy -> !orderBy.getColumnName().equals(ColumnHolder.TIME_COLUMN_NAME))))

Your suggestion works great. It is unfortunate that the planner reaches a scan query (maybe due to its exploratory nature) where it should have been contained to an outer sort and an inner group by query. The difference between the two approaches is ignoring a faulty plan vs. correcting a faulty plan (as the faulty plan uses the alias name instead of __time). I am not that familiar with Calcite; can we guarantee that the faulty plan, if corrected, will never generate a plan with a better cost?

@abhishekagarwal87 (Contributor) commented:

Hi Somu, in this case the transformation can be deemed invalid, and the assumption is that Calcite will find another plan as it continues exploring. I am assuming that ignoring an exception during transformation is fine, but it can be confirmed by asking on the Calcite dev mailing list (they are very responsive).

The reason for suggesting this approach is that it can cover more failure scenarios, such as the one you put in the description. Otherwise, we might end up with a lot of scenario-specific handling in different parts of the code.

Also, as this change is being tested, it might be worth adding test cases such as the ones below:

  • join between two scan queries with a join on alias of time column
    (select __time as dim1, dim2 from table1 group by 1, 2 order by 1) as T1 INNER JOIN (select __time as a1, a2 from table2 group by 1, 2 order by 1) as T2 ON T1.dim1 = T2.a1
  • Verify that the output field names returned are correct in the header.

@clintropolis (Member) commented:

I think changing DruidRules maybe isn't the best fix here, because what is going on is a mismatch of expectations about a method that checks whether a query is valid. This code in DruidRel assumes that if a query can be explained without exploding, it is a 'valid' Druid query. However, DruidOuterQueryRel doesn't actually explain the subquery, so it cannot be treated as valid on that basis: full validation is deferred until the subquery is actually transformed, which is where the failure occurs. But because the query appears "valid", the transforms happen and then explode, and the failure is not caught by any of the rules that call this method. If we don't fix the contract of the method itself, I think we would have to wrap every call to isValidDruidQuery in a similar try/catch (the method already has its own).

The change in this PR just makes DruidOuterQueryRel eagerly 'validate' the subquery by explaining it, so that the exception happens early, the 'is valid' check catches it and returns false, and the planner can proceed to other plans. An alternative fix would be to change the validity check to transform to an actual Druid query instead of using the explain method to validate, which is perhaps better if we think there are other rels that cannot be considered valid by explaining alone, at least with their current explain implementations.
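
As a toy model of the two validation strategies being contrasted here, the sketch below uses invented stand-in classes (not Druid's real DruidRel or DruidOuterQueryRel): the lazy explain-based check reports a bad plan as valid, while an eager build-based check surfaces the failure early so the planner can move on.

public final class ValidationSketch
{
  static final class CannotBuildQueryException extends RuntimeException
  {
    CannotBuildQueryException(String message)
    {
      super(message);
    }
  }

  /** Stand-in for a rel whose explain() skips the subquery but toDruidQuery() does not. */
  static final class OuterQueryRel
  {
    private final boolean subqueryOrdersByAlias;

    OuterQueryRel(boolean subqueryOrdersByAlias)
    {
      this.subqueryOrdersByAlias = subqueryOrdersByAlias;
    }

    String explain()
    {
      return "DruidOuterQueryRel(...)"; // never looks inside the subquery
    }

    void toDruidQuery()
    {
      if (subqueryOrdersByAlias) {
        throw new CannotBuildQueryException("cannot order by alias of __time");
      }
    }
  }

  /** Lazy check: "explains without exploding" is taken to mean valid. */
  static boolean isValidByExplain(OuterQueryRel rel)
  {
    try {
      rel.explain();
      return true;
    } catch (RuntimeException e) {
      return false;
    }
  }

  /** Eager check: actually build the query, so the failure surfaces here. */
  static boolean isValidByBuild(OuterQueryRel rel)
  {
    try {
      rel.toDruidQuery();
      return true;
    } catch (CannotBuildQueryException e) {
      return false;
    }
  }

  public static void main(String[] args)
  {
    final OuterQueryRel bad = new OuterQueryRel(true);
    System.out.println(isValidByExplain(bad)); // true  -> planner proceeds, explodes later
    System.out.println(isValidByBuild(bad));   // false -> planner moves on to other plans
  }
}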

@abhishekagarwal87 (Contributor) commented:

Hmm. I believe the part of the PR that fixes the bug is using the row signature of the subquery. If you remove that and just explain the subquery, the overall query will still fail with the same exception.

@clintropolis (Member) commented:

Hmm. I believe the part of the PR that fixes the bug is using the row signature of the subquery. If you remove that and just explain the subquery, the overall query will still fail with the same exception.

Sorry, yes, I confused myself: it is the outer scan query that is invalid, not the subquery, because the outer query cannot order by the aliased time column of the subquery; I mixed them up. But it still reports that the query is valid because of the way this explain works, while calling toDruidQuery on the same rel fails with the CannotBuildQueryException we see when we try to transform the query after the valid check. So to me it seems it is not in fact a valid query.

The worry, I think, is that changing explain to also explain the subquery might result in extra work, which could increase planning time for nested queries; that is reasonable to be concerned about. Though I also wonder whether, by not actually validating here, we are allowing other work to proceed that should not have gotten so far: in this case, if explain had failed, the planner would never have proceeded to the rule transformation only to fail again. I'm not really sure what is best, though, or how to know.

@clintropolis (Member) left a review comment:

I think this seems fine. I leave it up to you, @abhishekagarwal87, if you'd prefer it done another way, since I don't feel that strongly about it.

@abhishekagarwal87 (Contributor) commented Apr 13, 2022:

I am definitely concerned about increased planning time. The DruidRel would have been validated already, but we would still explain it repeatedly; this could be a problem for queries that have many levels of nesting. Again, the problem is not in the explain itself. It's just that isValidDruidQuery and transformTo call explain on the query with different inputs, and those inputs differ because of the transformTo function.
I will still recommend the other approach, so that we can just ignore bad in-flight transformations and move on to other rules.

Also, the scan query should not allow ordering even on the time column when the underlying data source is a query data source. That is bound to give the wrong ordering, because the underlying data itself might not be sorted by time.
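
Putting the two constraints from this thread together (order by only on the literal __time column, and only when the datasource is a concrete table), a minimal sketch of the combined rule might look like the following; the method and the datasource-type strings are hypothetical stand-ins, not Druid's real planning code.

import java.util.List;

public final class ScanOrderBySketch
{
  static boolean isOrderByAllowed(String dataSourceType, List<String> orderByColumns)
  {
    if (orderByColumns.isEmpty()) {
      return true; // no ordering requested, nothing to validate
    }
    // Ordering is only trustworthy on a concrete table, and only on __time,
    // since a query datasource gives no guarantee its rows are time-sorted.
    return "table".equals(dataSourceType)
        && orderByColumns.stream().allMatch("__time"::equals);
  }

  public static void main(String[] args)
  {
    System.out.println(isOrderByAllowed("table", List.of("__time"))); // true
    System.out.println(isOrderByAllowed("query", List.of("__time"))); // false
    System.out.println(isOrderByAllowed("table", List.of("t")));      // false
  }
}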

@somu-imply (Contributor, Author) commented:

@clintropolis and @abhishekagarwal87, I have updated the code and added a test case with an alias for inner joins, as asked in an earlier review comment.

Commit: "…g to be cool, so scan should be invalid for query datasource if it has order by"
@abhishekagarwal87 (Contributor) left a review comment:

Thank you @somu-imply. The changes LGTM. I still have some questions on one of the test cases.

@@ -7002,7 +7008,7 @@ public void testExplainExactCountDistinctOfSemiJoinResult() throws Exception
+ " )\n"
+ ")";
final String legacyExplanation =
"DruidOuterQueryRel(query=[{\"queryType\":\"timeseries\",\"dataSource\":{\"type\":\"table\",\"name\":\"__subquery__\"},\"intervals\":{\"type\":\"intervals\",\"intervals\":[\"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z\"]},\"descending\":false,\"virtualColumns\":[],\"filter\":null,\"granularity\":{\"type\":\"all\"},\"aggregations\":[{\"type\":\"count\",\"name\":\"a0\"}],\"postAggregations\":[],\"limit\":2147483647,\"context\":{\"defaultTimeout\":300000,\"maxScatterGatherBytes\":9223372036854775807,\"sqlCurrentTimestamp\":\"2000-01-01T00:00:00Z\",\"sqlQueryId\":\"dummy\",\"vectorize\":\"false\",\"vectorizeVirtualColumns\":\"false\"}}], signature=[{a0:LONG}])\n"
Contributor:

Not sure why this changed, but the query still runs as before since the final native query is still the same.

Contributor (Author):

I am not sure why the timeseries changed to a groupBy. It may be due to DUMMY_DATA_SOURCE moving from a table data source to a query data source, which led the planner to do something else.

Contributor:

Ah right. That makes sense.

@@ -6719,6 +6719,12 @@ public void testMinMaxAvgDailyCountWithLimit() throws Exception
)
)
.setInterval(querySegmentSpec(Filtration.eternity()))
.setLimitSpec(
Contributor:

I am surprised that the explanation didn't need to change here. The object should also have the same limit spec as this native plan.

Contributor (Author):

Yes, the previous value would have been limitSpec=NoopLimitSpec, but now it changed to limitSpec=DefaultLimitSpec{columns='[]', offset=0, limit=1}. Considering the query has a LIMIT 1 at the end, I thought this was acceptable. I updated the limit to a different value and it shows up in the DefaultLimitSpec.

@abhishekagarwal87 merged commit cd6fba2 into apache:master on Apr 15, 2022
Comment on lines +1215 to +1217
plannerContext.setPlanningError(
"SQL query requires order by non-time column on a datasource[%s], which is not supported.",
dataSource
Contributor:

I realized that this message is incorrect. You already checked that the column being ordered on is a time column.

Suggested change:

-    plannerContext.setPlanningError(
-        "SQL query requires order by non-time column on a datasource[%s], which is not supported.",
-        dataSource
+    plannerContext.setPlanningError(
+        "SQL query is a scan and requires order by on a datasource of type[%s], which is not supported.",
+        dataSource.getType()
