
[SPARK-6016][SQL] Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true #4775

Closed · wants to merge 2 commits

Conversation

@yhuai (Contributor) commented Feb 25, 2015

Please see JIRA (https://issues.apache.org/jira/browse/SPARK-6016) for details of the bug.
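
For context, a minimal repro sketch of the reported behavior (not from the PR; it uses the modern SparkSession API, which postdates this change, and the path is assumed — spark.sql.parquet.cacheMetadata belonged to the 1.x Parquet code path):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spark-6016-repro")
  .master("local[*]")
  .config("spark.sql.parquet.cacheMetadata", "true") // the default at the time
  .getOrCreate()
import spark.implicits._

val path = "/tmp/spark-6016-table" // hypothetical location

// Write and read once, so footer metadata gets cached.
Seq(1, 2, 3).toDF("i").write.mode("overwrite").parquet(path)
spark.read.parquet(path).count() // 3

// Overwrite the table; with a stale footer cache, the second
// read previously failed instead of returning the new data.
Seq(4, 5, 6, 7).toDF("i").write.mode("overwrite").parquet(path)
spark.read.parquet(path).count() // expected: 4
```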

@yhuai (Contributor, Author) commented Feb 25, 2015

@liancheng Can we remove FilteringParquetRowInputFormat now that task-side split computation is in Parquet? If you think that's a good idea, we can do it in another PR.
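
For readers following along: "task-side split" refers to parquet-mr's task-side metadata mode. A hedged sketch of the switch involved, assuming a SparkContext `sc` and the standard parquet-mr configuration key:

```scala
// Hadoop configuration knob read by parquet-mr's ParquetInputFormat.
// With task-side metadata enabled, splits are computed from file
// lengths alone and footers are read inside the tasks, which makes
// a driver-side footer cache unnecessary.
sc.hadoopConfiguration.set("parquet.task.side.metadata", "true")
```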

@SparkQA commented Feb 26, 2015

Test build #27966 has finished for PR 4775 at commit 1541554.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng (Contributor) commented:
@yhuai As long as we decide to completely deprecate client-side metadata reading, it's OK to remove FilteringParquetRowInputFormat.

@liancheng (Contributor) commented:
@yhuai Actually, as we discussed offline, FilteringParquetRowInputFormat is still necessary, as we have to do schema merging in getSplits to prevent the exception thrown in SPARK-6010.
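
As background on why schema merging matters here, a hedged sketch using today's user-facing DataFrameReader option (assuming a SparkSession `spark` and a hypothetical table whose part-files have evolving schemas); the PR-era merging happened inside getSplits rather than through this option:

```scala
// Merge the schemas of all Parquet part-files at read time instead
// of trusting a single footer, so files written with different but
// compatible schemas can be read as one table.
val merged = spark.read
  .option("mergeSchema", "true")
  .parquet("/tmp/evolving-table") // assumed path
merged.printSchema()
```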

@liancheng (Contributor) commented:
This LGTM. Please rebase it, then I can merge it.

@SparkQA commented Feb 26, 2015

Test build #28006 has finished for PR 4775 at commit 78787b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng (Contributor) commented:
Merging into master and branch-1.3, thanks!

@asfgit closed this in 192e42a on Feb 26, 2015
asfgit pushed a commit that referenced this pull request on Feb 26, 2015:

[SPARK-6016][SQL] Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true

Please see JIRA (https://issues.apache.org/jira/browse/SPARK-6016) for details of the bug.

Author: Yin Huai <[email protected]>

Closes #4775 from yhuai/parquetFooterCache and squashes the following commits:

78787b1 [Yin Huai] Remove footerCache in FilteringParquetRowInputFormat.
dff6fba [Yin Huai] Failed unit test.

(cherry picked from commit 192e42a)
Signed-off-by: Cheng Lian <[email protected]>
@karthikgolagani commented:
@liancheng
Hi Lian, if you are using a SparkContext (sc), you can set "parquet.enable.summary-metadata" to "false" on its Hadoop configuration, like below:

sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")

This fixed my issue instantly. I did it in my Spark Streaming application, where I was seeing:

WARN ParquetOutputCommitter: could not write summary file for hdfs://localhost/user/hive/warehouse
java.lang.NullPointerException
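
Expanded into a self-contained sketch (app name assumed; setting the property on sc.hadoopConfiguration is the standard way to pass a Hadoop/parquet-mr property from Spark):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("parquet-streaming-writer")
val sc   = new SparkContext(conf)

// Disable the _metadata/_common_metadata summary files so that
// ParquetOutputCommitter never attempts (and fails) to write them.
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
```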
