Handling planning with alias for time for group by and order by #12418
Conversation
Did you mean to say that the scan query allows order by only on the __time column? Also, what if we ignore erroneous plans during planning? So we could change the transformation to ignore such plans. I tested the query with the above change and it plans successfully.
@abhishekagarwal87 thanks for your comments. The scan query allows order by only on the __time column, and the column name also needs to be exactly __time rather than an alias.
Your suggestion works great. It is unfortunate that the planner reaches a scan query (maybe due to the exploratory nature) where it should have been contained to an outer sort and an inner group by query. The difference between the two approaches is ignoring a faulty plan vs correcting a faulty plan (as the faulty plan uses the alias name instead of __time). I am not that familiar with Calcite; can we guarantee that the faulty plan, if corrected, will never generate a plan with a better cost?
Hi Somu, In this case, the transformation can be deemed invalid and the assumption is that Calcite will find another plan as it continues exploring. I am assuming that ignoring an exception during transformation is fine, but it can be confirmed by asking on the Calcite dev mailing list (they are very responsive). The reason for suggesting this approach is that it can cover more failure scenarios such as the one you put out in the description. Otherwise, we might end up with a lot of scenario-specific handling in different parts of the code. Also, as this change is being tested, it might be worth adding test cases such as the ones below.
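The "deem the transformation invalid and let Calcite keep exploring" idea can be sketched roughly as below. This is a hypothetical illustration, not real Druid or Calcite code: `toNativeScan` and `ruleFires` are made-up names standing in for the translation step inside a planner rule.

```java
public class SkipInvalidTransform {
    // Stand-in for translating a relational plan into a native scan query.
    // Returns null instead of throwing when the plan orders a scan on
    // anything other than __time.
    static String toNativeScan(String orderColumn) {
        if (!"__time".equals(orderColumn)) {
            return null; // invalid for a scan: decline quietly
        }
        return "scan ordered by __time";
    }

    // The rule registers a transformation only when translation succeeds;
    // otherwise it is a no-op and the planner keeps exploring other plans.
    public static boolean ruleFires(String orderColumn) {
        return toNativeScan(orderColumn) != null;
    }
}
```

The key design point is that declining quietly (returning nothing) leaves the planner free to pick a different plan, whereas throwing aborts planning outright.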
I think changing … The change in this PR just makes …
hmm. I believe the part of the PR that fixes the bug is using the row signature of the subquery. If you remove that and just explain the subquery, the overall query will still fail with the same exception.
Sorry, yes, I confused myself: it is the outer scan query that is invalid, not the subquery, because the outer query cannot order by the aliased time of the subquery; I mixed them up. But it still reports the query as valid because of the way this explain works. The worry, I think, is that changing explain to also explain the subquery might result in extra work being done, which could increase planning time for nested queries, and that is a reasonable concern. Though I also wonder whether, by not actually validating here, we are allowing other work to proceed that should not have gotten so far; in this case, if explain had failed, it would never proceed to the rule transformation only to fail again. I'm not really sure what is best though, or how to know.
I think this seems fine; I leave it up to you @abhishekagarwal87 if you'd prefer it done another way, since I don't feel that strongly about it.
I am definitely concerned about increased planning time. The druidRel would have been validated already, but we would still explain it repeatedly. This could be a problem for queries that have many levels of nesting. Again, the problem is not in the explain itself. Also, the scan query should not require ordering on the time column when the underlying datasource is a query datasource. That is actually bound to give wrong ordering, because the underlying data itself might not be sorted by time.
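The point about query datasources can be made concrete with a small sketch (the names below are assumptions for illustration, not Druid's actual classes): an ordering declared on a scan is only trustworthy when the underlying datasource is a table stored in time order.

```java
public class ScanOrderingCheck {
    enum DataSourceType { TABLE, QUERY }

    // A scan's ordering is only supported for __time over a plain table;
    // a query datasource's output is not guaranteed to be time-sorted,
    // so declaring an ordering there would silently return wrong results.
    public static boolean orderingSupported(DataSourceType type, String column) {
        return type == DataSourceType.TABLE && "__time".equals(column);
    }
}
```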
@clintropolis and @abhishekagarwal87 I have updated the code and have added a test case with an alias for inner joins, as asked in an earlier review comment.
Force-pushed "…g to be cool, so scan should be invalid for query datasource if it has order by" from fcea1e0 to 1c56ec6.
Thank you @somu-imply. The changes LGTM. There are still some questions I have on one of the test cases.
@@ -7002,7 +7008,7 @@ public void testExplainExactCountDistinctOfSemiJoinResult() throws Exception
+ " )\n"
+ ")";
final String legacyExplanation =
"DruidOuterQueryRel(query=[{\"queryType\":\"timeseries\",\"dataSource\":{\"type\":\"table\",\"name\":\"__subquery__\"},\"intervals\":{\"type\":\"intervals\",\"intervals\":[\"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z\"]},\"descending\":false,\"virtualColumns\":[],\"filter\":null,\"granularity\":{\"type\":\"all\"},\"aggregations\":[{\"type\":\"count\",\"name\":\"a0\"}],\"postAggregations\":[],\"limit\":2147483647,\"context\":{\"defaultTimeout\":300000,\"maxScatterGatherBytes\":9223372036854775807,\"sqlCurrentTimestamp\":\"2000-01-01T00:00:00Z\",\"sqlQueryId\":\"dummy\",\"vectorize\":\"false\",\"vectorizeVirtualColumns\":\"false\"}}], signature=[{a0:LONG}])\n"
Not sure why this changed, but the query still runs as before since the final native query is still the same.
I am not sure why the timeseries changed to a groupBy. It may be due to DUMMY_DATA_SOURCE moving from a table datasource to a query datasource, which led the planner to do something else.
Ah right. That makes sense.
@@ -6719,6 +6719,12 @@ public void testMinMaxAvgDailyCountWithLimit() throws Exception
)
)
.setInterval(querySegmentSpec(Filtration.eternity()))
.setLimitSpec(
I am surprised that the explanation didn't need to change here. The explained object should also have the same limit spec as this native plan.
Yes, the previous explanation would have had limitSpec=NoopLimitSpec, but now it changed to limitSpec=DefaultLimitSpec{columns='[]', offset=0, limit=1}. Considering the query has a LIMIT 1 at the end, I thought this was acceptable. I updated the limit to a different value and it shows up in the DefaultLimitSpec.
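For context, the two specs being compared can be illustrated with a toy sketch. The real NoopLimitSpec and DefaultLimitSpec are Druid classes; the strings below merely mimic their toString output as quoted in this thread, and limitSpecFor is a made-up helper.

```java
public class LimitSpecSketch {
    // A SQL query without a LIMIT keeps a no-op spec; LIMIT n becomes a
    // concrete DefaultLimitSpec carrying that n.
    public static String limitSpecFor(Integer sqlLimit) {
        if (sqlLimit == null) {
            return "NoopLimitSpec";
        }
        return "DefaultLimitSpec{columns='[]', offset=0, limit=" + sqlLimit + "}";
    }
}
```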
Force-pushed e9fd3db to 30c2d4b.
plannerContext.setPlanningError(
    "SQL query requires order by non-time column on a datasource[%s], which is not supported.",
    dataSource
I realized that this message is incorrect. You already checked that the column being ordered on is a time column.
Suggested change:
plannerContext.setPlanningError(
-    "SQL query requires order by non-time column on a datasource[%s], which is not supported.",
-    dataSource
+    "SQL query is a scan and requires order by on a datasource of type[%s], which is not supported.",
+    dataSource.getType()
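Assuming setPlanningError formats its arguments String.format-style (an assumption for this sketch, which is why a plain String.format and a stand-in string for dataSource.getType() are used below), the suggested message would render like this:

```java
public class PlanningErrorSketch {
    // Renders the suggested message; dataSourceType stands in for the
    // value returned by dataSource.getType() in the real code.
    public static String planningError(String dataSourceType) {
        return String.format(
            "SQL query is a scan and requires order by on a datasource of type[%s], which is not supported.",
            dataSourceType);
    }
}
```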
The query
select __time as t, val from table group by 1,2 order by 1
fails to plan. The reason is that during the explain phase this query translates into a scan query which uses the alias t, but the check in the scan query allows order by only on the __time column. This causes a mismatch, as the planner cannot understand that t and __time mean the same thing. This is solved by setting the row type from the output type of the subquery in such a case. There is a fallback for the other cases.
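A minimal sketch of the alias-resolution idea (hypothetical names throughout; the actual fix sets the row type from the subquery's output type): look the ordered column up in the subquery's output mapping before applying the __time-only check.

```java
import java.util.Map;

public class AliasResolutionSketch {
    // Maps an outer-query column back to the underlying subquery column,
    // e.g. {"t" -> "__time"}; unknown columns pass through unchanged.
    public static String resolve(String column, Map<String, String> subqueryAliases) {
        return subqueryAliases.getOrDefault(column, column);
    }

    // Convenience for the example: the subquery exports __time as t.
    public static String resolveTimeAlias(String column) {
        return resolve(column, Map.of("t", "__time"));
    }
}
```

With this mapping in hand, the __time-only ordering check is applied to the resolved name, so ordering by t no longer fails.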