Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12789] [SQL] Support Order By Ordinal in SQL #11815

Closed
wants to merge 9 commits into from

Conversation

gatorsmile
Copy link
Member

What changes were proposed in this pull request?

This PR is to support order by position in SQL, e.g.

select c1, c2, c3 from tbl order by 1 desc, 3

should be equivalent to

select c1, c2, c3 from tbl order by c1 desc, c3 asc

This is controlled by config option spark.sql.orderByOrdinal.

  • When true, the ordinal numbers are treated as the position in the select list.
  • When false, the ordinal number in order/sort By clause are ignored.
  • Only convert integer literals (not foldable expressions). If found foldable expressions, ignore them
  • This also works with select *.

Question: Do we still need sort by columns that contain zero reference? In this case, it will have no impact on the sorting results. IMO, we should not allow users do it. @rxin @cloud-fan @marmbrus @yhuai @hvanhovell
-- Update: In these cases, they are ignored in this case.

Note: This PR is taken from #10731. When merging this PR, please give the credit to @zhichao-li

Also cc all the people who are involved in the previous discussion: @adrian-wang @chenghao-intel @tejasapatil

How was this patch tested?

Added a few test cases for both positive and negative test cases.

@rxin
Copy link
Contributor

rxin commented Mar 18, 2016

Thanks - the description is great!

If the order by list is empty (all or constant foldable), then I'd say we should get rid of the order by. That's what postgres does too.

Update: no it has nothing to do with postgres. But I still think when that flag is off, we should accept ints and they are just noops.

object IntegerIndex {
def unapply(a: Any): Option[Int] = a match {
case Literal(a: Int, IntegerType) => Some(a)
case UnaryMinus(IntegerLiteral(v)) => Some(-v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still don't get it - why are we matching unary minus ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As shown in the following link, our parser converts -3 to UnaryMinus(Literal(3)). Thus, we have to use this to get negative values.

case Token("-", child :: Nil) => UnaryMinus(nodeToExpr(child))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why are we getting negative values? Ordinals can only be positive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ic so this is used to detect errors. i'd add some comment here explaining why we are having this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, will do. Thanks!

@SparkQA
Copy link

SparkQA commented Mar 18, 2016

Test build #53516 has finished for PR 11815 at commit e0dce39.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

Will submit a separate PR for cleaning/eliminating no-op SortOrder in Optimizer. Thanks!

@SparkQA
Copy link

SparkQA commented Mar 18, 2016

Test build #53542 has finished for PR 11815 at commit 256814b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Mar 18, 2016

cc @cloud-fan for review

// Replace the index with the related attribute for ORDER BY
// which is a 1-base position of the projection list.
case s @ Sort(orders, global, child)
if conf.orderByOrdinal && child.resolved &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can add a case above: case s: Sort if !s.child.resolved => s, then we don't need to check child.resolved in following cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do. Thanks!

@cloud-fan
Copy link
Contributor

LGTM except 2 minor comments

@SparkQA
Copy link

SparkQA commented Mar 19, 2016

Test build #53604 has finished for PR 11815 at commit b04529b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -435,6 +435,11 @@ object SQLConf {
defaultValue = Some(true),
doc = "When false, we will treat bucketed table as normal table")

val ORDER_BY_ORDINAL = booleanConf("spark.sql.orderByOrdinal",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a similar configuration value in hive since 0.11 and later, which is hive.groupby.orderby.position.alias(default as false).
See: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy

I think we also need to read from ConfVar for this config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind just diverging from Hive over time and not support any of the config options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrian-wang Thank you for your input! Glad to learn it.

SortOrder(child.output(index - 1), direction)
} else {
throw new UnresolvedException(s,
"Order/sort By position: $index does not exist " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems you miss a s at head? Should be s"... $index..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Let me correct it. Thanks!

@SparkQA
Copy link

SparkQA commented Mar 21, 2016

Test build #53656 has finished for PR 11815 at commit d52e1c7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM, pending tests.

@SparkQA
Copy link

SparkQA commented Mar 21, 2016

Test build #53662 has finished for PR 11815 at commit 46e3f69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 2c5b18f Mar 21, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
#### What changes were proposed in this pull request?
This PR is to support order by position in SQL, e.g.
```SQL
select c1, c2, c3 from tbl order by 1 desc, 3
```
should be equivalent to
```SQL
select c1, c2, c3 from tbl order by c1 desc, c3 asc
```

This is controlled by config option `spark.sql.orderByOrdinal`.
- When true, the ordinal numbers are treated as the position in the select list.
- When false, the ordinal number in order/sort By clause are ignored.

- Only convert integer literals (not foldable expressions). If found foldable expressions, ignore them
- This also works with select *.

**Question**: Do we still need sort by columns that contain zero reference? In this case, it will have no impact on the sorting results. IMO, we should not allow users do it. rxin cloud-fan marmbrus yhuai hvanhovell
-- Update: In these cases, they are ignored in this case.

**Note**: This PR is taken from apache#10731. When merging this PR, please give the credit to zhichao-li

Also cc all the people who are involved in the previous discussion: adrian-wang chenghao-intel tejasapatil

#### How was this patch tested?
Added a few test cases for both positive and negative test cases.

Author: gatorsmile <[email protected]>

Closes apache#11815 from gatorsmile/orderByPosition.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants