Skip to content

Commit

Permalink
[SPARK-7303] [SQL] push down project if possible when the child is sort
Browse files Browse the repository at this point in the history
Optimize the case of `project(_, sort)` , a example is:

`select key from (select * from testData order by key) t`

before this PR:
```
== Parsed Logical Plan ==
'Project ['key]
 'Subquery t
  'Sort ['key ASC], true
   'Project [*]
    'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Project [key#0]
 Subquery t
  Sort [key#0 ASC], true
   Project [key#0,value#1]
    Subquery testData
     LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Project [key#0]
 Sort [key#0 ASC], true
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
Project [key#0]
 Sort [key#0 ASC], true
  Exchange (RangePartitioning [key#0 ASC], 5), []
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```

after this PR
```
== Parsed Logical Plan ==
'Project ['key]
 'Subquery t
  'Sort ['key ASC], true
   'Project [*]
    'UnresolvedRelation [testData], None

== Analyzed Logical Plan ==
Project [key#0]
 Subquery t
  Sort [key#0 ASC], true
   Project [key#0,value#1]
    Subquery testData
     LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Optimized Logical Plan ==
Sort [key#0 ASC], true
 Project [key#0]
  LogicalRDD [key#0,value#1], MapPartitionsRDD[1]

== Physical Plan ==
Sort [key#0 ASC], true
 Exchange (RangePartitioning [key#0 ASC], 5), []
  Project [key#0]
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```

with this rule we will first do column pruning on the table and then do sorting.

Author: scwf <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Michael Armbrust <[email protected]>

Closes apache#5838 from scwf/pruning and squashes the following commits:

b00d833 [scwf] address michael's comment
e230155 [scwf] fix tests failure
b09b895 [scwf] improve column pruning
  • Loading branch information
scwf authored and marmbrus committed May 13, 2015
1 parent df2fb13 commit 59250fe
Show file tree
Hide file tree
Showing 2 changed files with 40 additions and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,11 @@ object ColumnPruning extends Rule[LogicalPlan] {
case Project(projectList, Limit(exp, child)) =>
Limit(exp, Project(projectList, child))

// push down project if possible when the child is sort
case p @ Project(projectList, s @ Sort(_, _, grandChild))
if s.references.subsetOf(p.outputSet) =>
s.copy(child = Project(projectList, grandChild))

// Eliminate no-op Projects
case Project(projectList, child) if child.output == projectList => child
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ package org.apache.spark.sql.catalyst.optimizer

import org.apache.spark.sql.catalyst.analysis
import org.apache.spark.sql.catalyst.analysis.EliminateSubQueries
import org.apache.spark.sql.catalyst.expressions.{Count, Explode}
import org.apache.spark.sql.catalyst.expressions.{SortOrder, Ascending, Count, Explode}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.catalyst.plans.{LeftSemi, PlanTest, LeftOuter, RightOuter}
import org.apache.spark.sql.catalyst.rules._
Expand Down Expand Up @@ -542,4 +542,38 @@ class FilterPushdownSuite extends PlanTest {

comparePlans(optimized, originalQuery)
}

test("push down project past sort") {
val x = testRelation.subquery('x)

// push down valid
val originalQuery = {
x.select('a, 'b)
.sortBy(SortOrder('a, Ascending))
.select('a)
}

val optimized = Optimize.execute(originalQuery.analyze)
val correctAnswer =
x.select('a)
.sortBy(SortOrder('a, Ascending)).analyze

comparePlans(optimized, analysis.EliminateSubQueries(correctAnswer))

// push down invalid
val originalQuery1 = {
x.select('a, 'b)
.sortBy(SortOrder('a, Ascending))
.select('b)
}

val optimized1 = Optimize.execute(originalQuery1.analyze)
val correctAnswer1 =
x.select('a, 'b)
.sortBy(SortOrder('a, Ascending))
.select('b).analyze

comparePlans(optimized1, analysis.EliminateSubQueries(correctAnswer1))

}
}

0 comments on commit 59250fe

Please sign in to comment.