
[SPARK-6016][SQL] Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true #4775

Closed · wants to merge 2 commits

Conversation

@yhuai (Contributor) commented Feb 25, 2015

Please see JIRA (https://issues.apache.org/jira/browse/SPARK-6016) for details of the bug.
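
For context, a minimal repro sketch of the reported behavior (not from the PR; it uses the modern SparkSession API, which postdates this change, and the path is assumed — spark.sql.parquet.cacheMetadata belonged to the 1.x Parquet code path):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spark-6016-repro")
  .master("local[*]")
  .config("spark.sql.parquet.cacheMetadata", "true") // the default at the time
  .getOrCreate()
import spark.implicits._

val path = "/tmp/spark-6016-table" // hypothetical location

// Write and read once, so footer metadata gets cached.
Seq(1, 2, 3).toDF("i").write.mode("overwrite").parquet(path)
spark.read.parquet(path).count() // 3

// Overwrite the table; with a stale footer cache, the second
// read previously failed instead of returning the new data.
Seq(4, 5, 6, 7).toDF("i").write.mode("overwrite").parquet(path)
spark.read.parquet(path).count() // expected: 4
```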

@yhuai (Contributor, Author) commented Feb 25, 2015

@liancheng Can we remove FilteringParquetRowInputFormat now that task-side split computation is in Parquet? If you think that's a good idea, we can do it in another PR.
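
For readers following along: "task-side split" refers to parquet-mr's task-side metadata mode. A hedged sketch of the switch involved, assuming a SparkContext `sc` and the standard parquet-mr configuration key:

```scala
// Hadoop configuration knob read by parquet-mr's ParquetInputFormat.
// With task-side metadata enabled, splits are computed from file
// lengths alone and footers are read inside the tasks, which makes
// a driver-side footer cache unnecessary.
sc.hadoopConfiguration.set("parquet.task.side.metadata", "true")
```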

@SparkQA commented Feb 26, 2015

Test build #27966 has finished for PR 4775 at commit 1541554.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng (Contributor) commented:
@yhuai As long as we decide to completely deprecate client-side metadata reading, it's OK to remove FilteringParquetRowInputFormat.

@liancheng (Contributor) commented:
@yhuai Actually, as we discussed offline, FilteringParquetRowInputFormat is still necessary, as we have to do schema merging in getSplits to prevent the exception thrown in SPARK-6010.
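
As background on why schema merging matters here, a hedged sketch using today's user-facing DataFrameReader option (assuming a SparkSession `spark` and a hypothetical table whose part-files have evolving schemas); the PR-era merging happened inside getSplits rather than through this option:

```scala
// Merge the schemas of all Parquet part-files at read time instead
// of trusting a single footer, so files written with different but
// compatible schemas can be read as one table.
val merged = spark.read
  .option("mergeSchema", "true")
  .parquet("/tmp/evolving-table") // assumed path
merged.printSchema()
```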

@liancheng (Contributor) commented:
This LGTM. Please rebase it, then I can merge it.

@SparkQA commented Feb 26, 2015

Test build #28006 has finished for PR 4775 at commit 78787b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng (Contributor) commented:
Merging into master and branch-1.3, thanks!

@asfgit closed this in 192e42a on Feb 26, 2015
asfgit pushed a commit that referenced this pull request on Feb 26, 2015:

[SPARK-6016][SQL] Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true

Please see JIRA (https://issues.apache.org/jira/browse/SPARK-6016) for details of the bug.

Author: Yin Huai <[email protected]>

Closes #4775 from yhuai/parquetFooterCache and squashes the following commits:

78787b1 [Yin Huai] Remove footerCache in FilteringParquetRowInputFormat.
dff6fba [Yin Huai] Failed unit test.

(cherry picked from commit 192e42a)
Signed-off-by: Cheng Lian <[email protected]>
@karthikgolagani commented:
@liancheng
Hi Lian, if you are using a SparkContext (sc), you can set "parquet.enable.summary-metadata" to "false" on its Hadoop configuration, like below:

sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")

This fixed my issue instantly. I did it in my Spark Streaming application, where I was seeing:

WARN ParquetOutputCommitter: could not write summary file for hdfs://localhost/user/hive/warehouse
java.lang.NullPointerException
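
Expanded into a self-contained sketch (app name assumed; setting the property on sc.hadoopConfiguration is the standard way to pass a Hadoop/parquet-mr property from Spark):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("parquet-streaming-writer")
val sc   = new SparkContext(conf)

// Disable the _metadata/_common_metadata summary files so that
// ParquetOutputCommitter never attempts (and fails) to write them.
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
```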
