
[SPARK-23230][SQL]When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error #20406

Closed
wants to merge 1 commit

Conversation

cxzl25
Contributor

@cxzl25 cxzl25 commented Jan 26, 2018

When hive.default.fileformat is set to another file format, creating a table stored as textfile causes a serde error.
We should use org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe as the default serde for both textfile and sequencefile.

```
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
```
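The intended resolution can be sketched as a small pure function. This is an illustration of the patch's intent, not the actual Spark source; `SerdeResolution` and `serdeFor` are hypothetical names:

```scala
// Hypothetical sketch of the serde resolution this PR argues for; the names
// SerdeResolution/serdeFor are illustrative, not Spark's real API.
object SerdeResolution {
  val LazySimple = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

  // STORED AS textfile/sequencefile should map to LazySimpleSerDe regardless
  // of what hive.default.fileformat is set to.
  def serdeFor(storedAs: String): Option[String] = storedAs.toLowerCase match {
    case "textfile" | "sequencefile" => Some(LazySimple)
    case "orc"     => Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde")
    case "parquet" => Some("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe")
    case _         => None // other formats not modeled in this sketch
  }
}
```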

@gatorsmile
Member

ok to test

@gatorsmile
Member

Also cc @dongjoon-hyun

@SparkQA

SparkQA commented Jan 27, 2018

Test build #86736 has finished for PR 20406 at commit f370dd6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cxzl25
Contributor Author

cxzl25 commented Feb 11, 2018

ping @gatorsmile @dongjoon-hyun

```
set hive.default.fileformat=orc;
create table tbl stored as textfile
as
select 1;
```

It failed because it used the wrong serde:

```
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow cannot be cast to org.apache.hadoop.io.BytesWritable
	at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:91)
	at org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:327)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
	... 16 more
```
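The cast failure can be modeled in a few lines. These case classes are stand-ins for the real Hadoop/Hive types, not the actual API; the point is that HiveIgnoreKeyTextOutputFormat unconditionally casts the serialized row to BytesWritable, so a row produced by OrcSerde can never be written through it:

```scala
// Minimal model of the ClassCastException above; these case classes are
// stand-ins, not the real org.apache.hadoop.io / Hive classes.
sealed trait HiveWritable
final case class BytesWritable(bytes: Array[Byte]) extends HiveWritable
final case class OrcSerdeRow(row: Any) extends HiveWritable

// Mirrors the text output format: it casts every row to BytesWritable,
// so a row serialized by an ORC serde blows up at write time.
def textOutputWrite(value: HiveWritable): Int =
  value.asInstanceOf[BytesWritable].bytes.length

val failed =
  try { textOutputWrite(OrcSerdeRow("row")); false }
  catch { case _: ClassCastException => true }
```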

@dongjoon-hyun
Member

Thank you for the contribution, @cxzl25. Could you update the title based on your statement?

When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error.

```scala
@@ -100,6 +100,25 @@ class HiveSerDeSuite extends HiveComparisonTest with PlanTest with BeforeAndAfte
    assert(output == Some("org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"))
    assert(serde == Some("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"))
  }

  withSQLConf("hive.default.fileformat" -> "orc") {
```
Member

Please test with all possible values which are supported by Spark.

Member

Actually, this PR does not need to improve the test coverage. What we really need is to confirm whether Hive's default serdes are the ones added by this PR. Can anybody run it and post the results here?

Contributor Author

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/StorageFormat.java#L102

hive.default.serde
Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Description: The default SerDe Hive will use for storage formats that do not specify a SerDe.

https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-RegistrationofNativeSerDes

Hive CLI:

```
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

SerDe Library:       org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:         org.apache.hadoop.mapred.TextInputFormat
OutputFormat:        org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
```
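The coverage asked for above can be illustrated with a self-contained sketch. `resolvedSerde` is a hypothetical pure function, and the serde class names come from the Hive docs linked earlier; the invariant is that STORED AS textfile resolves to LazySimpleSerDe no matter what the session default format is:

```scala
val lazySimpleSerDe = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"

// Hypothetical resolution: an explicit STORED AS wins over the session
// default; None means "fall back to hive.default.fileformat".
def resolvedSerde(storedAs: Option[String], defaultFormat: String): String =
  storedAs.getOrElse(defaultFormat).toLowerCase match {
    case "textfile" | "sequencefile" => lazySimpleSerDe
    case "orc"     => "org.apache.hadoop.hive.ql.io.orc.OrcSerde"
    case "parquet" => "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    case other     => sys.error(s"format $other not modeled in this sketch")
  }

// STORED AS textfile is independent of the configured default format.
val defaults  = Seq("textfile", "sequencefile", "orc", "parquet")
val allAgree  = defaults.forall(d => resolvedSerde(Some("textfile"), d) == lazySimpleSerDe)
```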

@cxzl25 cxzl25 changed the title [SPARK-23230][SQL]Error by creating a data table when using hive.default.fileformat=orc [SPARK-23230][SQL]When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error Feb 12, 2018
@dongjoon-hyun
Member

Thank you for updating the title, @cxzl25 .
Actually, this was never logically related to ORC in the first place.

@gatorsmile
Member

gatorsmile commented Feb 13, 2018

LGTM

Thanks! Merged to master/2.3

asfgit pushed a commit that referenced this pull request Feb 13, 2018
…e types, create textfile table cause a serde error

When hive.default.fileformat is set to another file format, creating a table stored as textfile causes a serde error.
We should use org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe as the default serde for both textfile and sequencefile.

```
set hive.default.fileformat=orc;
create table tbl( i string ) stored as textfile;
desc formatted tbl;

Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
```

Author: sychen <[email protected]>

Closes #20406 from cxzl25/default_serde.

(cherry picked from commit 4104b68)
Signed-off-by: gatorsmile <[email protected]>
@gatorsmile
Member

Could you please submit a separate PR to 2.2? Thanks!

@asfgit asfgit closed this in 4104b68 Feb 13, 2018
@cxzl25
Contributor Author

cxzl25 commented Feb 13, 2018

Thanks for your help, @dongjoon-hyun @gasparms.
I submitted a separate PR for 2.2:
#20593

@gatorsmile
Member

gatorsmile commented Feb 14, 2018

This is a trivial bug fix. I am fine if anybody wants to revert it from Spark 2.3.0 and merge it back in Spark 2.3.1 later.
