
[SPARK-23230][SQL]When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error #20406

Closed
wants to merge 1 commit

Conversation

cxzl25
Contributor

@cxzl25 cxzl25 commented Jan 26, 2018

When hive.default.fileformat is set to another file format, creating a table stored as textfile causes a serde error.
We should use org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe as the default serde for both textfile and sequencefile.

```
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
```
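The intended resolution can be sketched as a small pure function. This is an illustration of the patch's intent, not the actual Spark source; `SerdeResolution` and `serdeFor` are hypothetical names:

```scala
// Hypothetical sketch of the serde resolution this PR argues for; the names
// SerdeResolution/serdeFor are illustrative, not Spark's real API.
object SerdeResolution {
  val LazySimple = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

  // STORED AS textfile/sequencefile should map to LazySimpleSerDe regardless
  // of what hive.default.fileformat is set to.
  def serdeFor(storedAs: String): Option[String] = storedAs.toLowerCase match {
    case "textfile" | "sequencefile" => Some(LazySimple)
    case "orc"     => Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde")
    case "parquet" => Some("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe")
    case _         => None // other formats not modeled in this sketch
  }
}
```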

@gatorsmile
Member

ok to test

@gatorsmile
Member

Also cc @dongjoon-hyun

@SparkQA

SparkQA commented Jan 27, 2018

Test build #86736 has finished for PR 20406 at commit f370dd6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cxzl25
Contributor Author

cxzl25 commented Feb 11, 2018

ping @gatorsmile @dongjoon-hyun

```
set hive.default.fileformat=orc;
create table tbl stored as textfile
as
select 1;
```

It failed because it used the wrong serde:

```
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow cannot be cast to org.apache.hadoop.io.BytesWritable
	at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:91)
	at org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:327)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
	... 16 more
```
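The cast failure can be modeled in a few lines. These case classes are stand-ins for the real Hadoop/Hive types, not the actual API; the point is that HiveIgnoreKeyTextOutputFormat unconditionally casts the serialized row to BytesWritable, so a row produced by OrcSerde can never be written through it:

```scala
// Minimal model of the ClassCastException above; these case classes are
// stand-ins, not the real org.apache.hadoop.io / Hive classes.
sealed trait HiveWritable
final case class BytesWritable(bytes: Array[Byte]) extends HiveWritable
final case class OrcSerdeRow(row: Any) extends HiveWritable

// Mirrors the text output format: it casts every row to BytesWritable,
// so a row serialized by an ORC serde blows up at write time.
def textOutputWrite(value: HiveWritable): Int =
  value.asInstanceOf[BytesWritable].bytes.length

val failed =
  try { textOutputWrite(OrcSerdeRow("row")); false }
  catch { case _: ClassCastException => true }
```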

@dongjoon-hyun
Member

Thank you for the contribution, @cxzl25. Could you update the title based on your statement?

When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error.

```scala
@@ -100,6 +100,25 @@ class HiveSerDeSuite extends HiveComparisonTest with PlanTest with BeforeAndAfte
    assert(output == Some("org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"))
    assert(serde == Some("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"))
  }

  withSQLConf("hive.default.fileformat" -> "orc") {
```
Member

Please test with all possible values which are supported by Spark.

Member

Actually, this PR does not need to improve the test coverage. What we really need is to confirm whether Hive's default serdes are the ones added by this PR. Can anybody run it and post the results here?

Contributor Author

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java#L102

hive.default.serde
Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Description: The default SerDe Hive will use for storage formats that do not specify a SerDe.

https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-RegistrationofNativeSerDes

Hive CLI:

```
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

SerDe Library:       org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:         org.apache.hadoop.mapred.TextInputFormat
OutputFormat:        org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
```
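The coverage asked for above can be illustrated with a self-contained sketch. `resolvedSerde` is a hypothetical pure function, and the serde class names come from the Hive docs linked earlier; the invariant is that STORED AS textfile resolves to LazySimpleSerDe no matter what the session default format is:

```scala
val lazySimpleSerDe = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

// Hypothetical resolution: an explicit STORED AS wins over the session
// default; None means "fall back to hive.default.fileformat".
def resolvedSerde(storedAs: Option[String], defaultFormat: String): String =
  storedAs.getOrElse(defaultFormat).toLowerCase match {
    case "textfile" | "sequencefile" => lazySimpleSerDe
    case "orc"     => "org.apache.hadoop.hive.ql.io.orc.OrcSerde"
    case "parquet" => "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    case other     => sys.error(s"format $other not modeled in this sketch")
  }

// STORED AS textfile is independent of the configured default format.
val defaults  = Seq("textfile", "sequencefile", "orc", "parquet")
val allAgree  = defaults.forall(d => resolvedSerde(Some("textfile"), d) == lazySimpleSerDe)
```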

@cxzl25 cxzl25 changed the title [SPARK-23230][SQL]Error by creating a data table when using hive.default.fileformat=orc [SPARK-23230][SQL]When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error Feb 12, 2018
@dongjoon-hyun
Member

Thank you for updating the title, @cxzl25 .
Actually, this was never logically related to ORC in the first place.

@gatorsmile
Member

gatorsmile commented Feb 13, 2018

LGTM

Thanks! Merged to master/2.3

asfgit pushed a commit that referenced this pull request Feb 13, 2018
…e types, create textfile table cause a serde error

When hive.default.fileformat is set to another file format, creating a table stored as textfile causes a serde error.
We should use org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe as the default serde for both textfile and sequencefile.

```
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
```

Author: sychen <[email protected]>

Closes #20406 from cxzl25/default_serde.

(cherry picked from commit 4104b68)
Signed-off-by: gatorsmile <[email protected]>
@gatorsmile
Member

Could you please submit a separate PR to 2.2? Thanks!

@asfgit asfgit closed this in 4104b68 Feb 13, 2018
@cxzl25
Contributor Author

cxzl25 commented Feb 13, 2018

Thanks for your help, @dongjoon-hyun @gasparms.
I submitted a separate PR for 2.2:
#20593

@gatorsmile
Member

gatorsmile commented Feb 14, 2018

This is a trivial bug fix. I am fine if anybody wants to revert it from Spark 2.3.0 and merge it back in Spark 2.3.1 later.
