[SPARK-1913][SQL] Bug fix: column pruning error in Parquet support #863

Closed
wants to merge 3 commits into from

liancheng
Contributor

JIRA issue: SPARK-1913

When scanning Parquet tables, attributes that are referenced only in pushed-down predicates are not passed to the ParquetTableScan operator, which causes an exception.
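
To illustrate the description above, here is a minimal sketch of the pruning problem. It uses simplified stand-ins rather than Spark's actual Catalyst classes: `Attr`, `Pred`, `scanColumnsBuggy`, and `scanColumnsFixed` are hypothetical names introduced only for this example, and the real fix in this PR may be structured differently.

```scala
// Simplified stand-ins for Catalyst attributes and predicates (hypothetical, not Spark classes).
final case class Attr(name: String)
final case class Pred(description: String, references: Set[Attr])

object ColumnPruningSketch {
  // Buggy shape: only projected attributes are requested from the scan,
  // so a column used solely by a predicate is missing from the scan output.
  // The `filters` argument is deliberately ignored here.
  def scanColumnsBuggy(project: Seq[Attr], filters: Seq[Pred]): Set[Attr] =
    project.toSet

  // Fixed shape: attributes referenced by any predicate are kept as well,
  // whether or not that predicate is later pushed down.
  def scanColumnsFixed(project: Seq[Attr], filters: Seq[Pred]): Set[Attr] =
    project.toSet ++ filters.flatMap(_.references)

  def main(args: Array[String]): Unit = {
    val a = Attr("a")
    val b = Attr("b")
    // Roughly: SELECT a FROM t WHERE b > 1
    val project = Seq(a)
    val filters = Seq(Pred("b > 1", Set(b)))
    println(scanColumnsBuggy(project, filters)) // does not contain b
    println(scanColumnsFixed(project, filters)) // contains both a and b
  }
}
```

In the buggy variant the scan never reads column `b`, so evaluating `b > 1` against the scan output would fail, which matches the kind of exception described above.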

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15166/

@liancheng
Contributor Author

@marmbrus @rxin This bug should block the Spark 1.0 release. Please help review, thanks.

(Sorry, forgot that Parquet filter pushdown is not a Spark 1.0 feature.)

    scanBuilder: Seq[Attribute] => SparkPlan): SparkPlan = {

  val projectSet = projectList.flatMap(_.references).toSet
  val filterSet = filterPredicates.flatMap(_.references).toSet
- val filterCondition = filterPredicates.reduceLeftOption(And)
+ val filterCondition = prunePushedDownFilter
+   .map(filterPredicates.filter)
Contributor

I find this slightly hard to understand because prunePushedDownFilter is an Option. Can we write this out as a full closure?

Contributor Author

Removed the Option, now it should be clear :)
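
For readers following the exchange above, the readability difference can be sketched as follows. This is an illustration under assumptions, not the code from this PR: predicates are modeled as plain strings, and `remainingWithOption`, `remainingWithFunction`, and the `getOrElse` fallback are hypothetical choices made for the example.

```scala
object PruneFilterSketch {
  // Option-based shape, similar to the quoted diff: an optional per-predicate test.
  // The reader has to work out that `map` here runs over the Option, not over the predicates.
  def remainingWithOption(
      predicates: Seq[String],
      keep: Option[String => Boolean]): Seq[String] =
    keep.map(predicates.filter).getOrElse(predicates) // fallback is an assumption

  // Plain-function shape: the caller always supplies a function over the whole
  // predicate list; passing `identity` means "nothing was pushed down".
  def remainingWithFunction(
      predicates: Seq[String],
      prune: Seq[String] => Seq[String]): Seq[String] =
    prune(predicates)

  def main(args: Array[String]): Unit = {
    val predicates = Seq("a > 1", "b = 2")
    // Pretend "a > 1" was pushed down to the scan and can be dropped here.
    println(remainingWithOption(predicates, Some((p: String) => p != "a > 1")))
    println(remainingWithFunction(predicates, ps => ps.filterNot(_ == "a > 1")))
    println(remainingWithFunction(predicates, identity))
  }
}
```

With the plain-function shape, the call site reads as a single function application, and the "no pushdown" case is expressed by the argument rather than by a `None` branch.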

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15179/

@rxin
Contributor

rxin commented May 25, 2014

Thanks. I've merged this.

@asfgit asfgit closed this in 5afe6af May 25, 2014
@liancheng liancheng deleted the spark-1913 branch May 25, 2014 06:15
@liancheng liancheng restored the spark-1913 branch June 13, 2014 17:29
asfgit pushed a commit that referenced this pull request Jun 13, 2014
#511 and #863 got left out of branch-1.0 since we were really close to the release.  Now that they have been tested a little I see no reason to leave them out.

Author: Michael Armbrust <[email protected]>
Author: witgo <[email protected]>

Closes #1078 from marmbrus/branch-1.0 and squashes the following commits:

22be674 [witgo]  [SPARK-1841]: update scalatest to version 2.1.5
fc8fc79 [Michael Armbrust] Include #1071 as well.
c5d0adf [Michael Armbrust] Update SparkSQL in branch-1.0 to match master.
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
JIRA issue: [SPARK-1913](https://issues.apache.org/jira/browse/SPARK-1913)

When scanning Parquet tables, attributes that are referenced only in pushed-down predicates are not passed to the `ParquetTableScan` operator, which causes an exception.

Author: Cheng Lian <[email protected]>

Closes apache#863 from liancheng/spark-1913 and squashes the following commits:

f976b73 [Cheng Lian] Addressed the readability issue commented by @rxin
f5b257d [Cheng Lian] Added back comments deleted by mistake
ae60ab3 [Cheng Lian] [SPARK-1913] Attributes referenced only in predicates pushed down should remain in ParquetTableScan operator
@liancheng liancheng deleted the spark-1913 branch July 3, 2014 21:30