-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-12789] [SQL] Support Order By Ordinal in SQL #11815
Conversation
Thanks - the description is great! If the order by list is empty (all or constant foldable), then I'd say we should get rid of the order by. That's what postgres does too. Update: no it has nothing to do with postgres. But I still think when that flag is off, we should accept ints and they are just noops. |
object IntegerIndex { | ||
def unapply(a: Any): Option[Int] = a match { | ||
case Literal(a: Int, IntegerType) => Some(a) | ||
case UnaryMinus(IntegerLiteral(v)) => Some(-v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i still don't get it - why are we matching unary minus ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As shown in the following link, our parser converts -3
to UnaryMinus(Literal(3))
. Thus, we have to use this to get negative values.
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/CatalystQl.scala
Line 641 in 637a78f
case Token("-", child :: Nil) => UnaryMinus(nodeToExpr(child)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why are we getting negative values? Ordinals can only be positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When users input negative values, we just detect and then issue an exception message, like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ic so this is used to detect errors. i'd add some comment here explaining why we are having this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will do. Thanks!
Test build #53516 has finished for PR 11815 at commit
|
Will submit a separate PR for cleaning/eliminating no-op |
Test build #53542 has finished for PR 11815 at commit
|
cc @cloud-fan for review |
// Replace the index with the related attribute for ORDER BY | ||
// which is a 1-base position of the projection list. | ||
case s @ Sort(orders, global, child) | ||
if conf.orderByOrdinal && child.resolved && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can add a case above: case s: Sort if !s.child.resolved => s
, then we don't need to check child.resolved
in following cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do. Thanks!
LGTM except 2 minor comments |
Test build #53604 has finished for PR 11815 at commit
|
@@ -435,6 +435,11 @@ object SQLConf { | |||
defaultValue = Some(true), | |||
doc = "When false, we will treat bucketed table as normal table") | |||
|
|||
val ORDER_BY_ORDINAL = booleanConf("spark.sql.orderByOrdinal", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a similar configuration value in hive since 0.11 and later, which is hive.groupby.orderby.position.alias
(default as false
).
See: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy
I think we also need to read from ConfVar for this config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind just diverging from Hive over time and not support any of the config options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrian-wang Thank you for your input! Glad to learn it.
SortOrder(child.output(index - 1), direction) | ||
} else { | ||
throw new UnresolvedException(s, | ||
"Order/sort By position: $index does not exist " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems you miss a s
at head? Should be s"... $index..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Let me correct it. Thanks!
Test build #53656 has finished for PR 11815 at commit
|
LGTM, pending tests. |
Test build #53662 has finished for PR 11815 at commit
|
#### What changes were proposed in this pull request? This PR is to support order by position in SQL, e.g. ```SQL select c1, c2, c3 from tbl order by 1 desc, 3 ``` should be equivalent to ```SQL select c1, c2, c3 from tbl order by c1 desc, c3 asc ``` This is controlled by config option `spark.sql.orderByOrdinal`. - When true, the ordinal numbers are treated as the position in the select list. - When false, the ordinal number in order/sort By clause are ignored. - Only convert integer literals (not foldable expressions). If found foldable expressions, ignore them - This also works with select *. **Question**: Do we still need sort by columns that contain zero reference? In this case, it will have no impact on the sorting results. IMO, we should not allow users do it. rxin cloud-fan marmbrus yhuai hvanhovell -- Update: In these cases, they are ignored in this case. **Note**: This PR is taken from apache#10731. When merging this PR, please give the credit to zhichao-li Also cc all the people who are involved in the previous discussion: adrian-wang chenghao-intel tejasapatil #### How was this patch tested? Added a few test cases for both positive and negative test cases. Author: gatorsmile <[email protected]> Closes apache#11815 from gatorsmile/orderByPosition.
What changes were proposed in this pull request?
This PR is to support order by position in SQL, e.g.
should be equivalent to
This is controlled by config option
spark.sql.orderByOrdinal
.Question: Do we still need sort by columns that contain zero reference? In this case, it will have no impact on the sorting results. IMO, we should not allow users do it. @rxin @cloud-fan @marmbrus @yhuai @hvanhovell
-- Update: In these cases, they are ignored in this case.
Note: This PR is taken from #10731. When merging this PR, please give the credit to @zhichao-li
Also cc all the people who are involved in the previous discussion: @adrian-wang @chenghao-intel @tejasapatil
How was this patch tested?
Added a few test cases for both positive and negative test cases.