[SPARK-21396][SQL] Fixes MatchError when UDTs are passed through Hive Thriftserver #20385

atallahhezbor · 2018-01-24T16:08:38Z

Signed-off-by: Atallah Hezbor <[email protected]>

What changes were proposed in this pull request?

This PR proposes modifying the match statement that gets the columns of a row in HiveThriftServer. There was previously no case for UserDefinedType, so querying a table that contained them would throw a match error. The changes catch that case and return the string representation.
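To illustrate the failure mode, here is a minimal, self-contained sketch. It does not use Spark: `ColType`, `StringCol`, `IntCol`, `UdtCol`, `renderBefore`, and `renderAfter` are hypothetical names invented for this example, standing in for the real data types and the real match statement.

```scala
object MatchErrorSketch {
  // Hypothetical stand-ins for Spark's data types; these are not Spark classes.
  trait ColType
  case object StringCol extends ColType
  case object IntCol extends ColType
  final case class UdtCol(name: String) extends ColType

  // "Before": no case covers UdtCol, so such a column falls through
  // the match and raises scala.MatchError at runtime.
  def renderBefore(value: Any, tpe: ColType): Any = tpe match {
    case StringCol => value.asInstanceOf[String]
    case IntCol    => value.asInstanceOf[Int]
  }

  // "After": the added case returns the value's string representation.
  def renderAfter(value: Any, tpe: ColType): Any = tpe match {
    case StringCol => value.asInstanceOf[String]
    case IntCol    => value.asInstanceOf[Int]
    case _: UdtCol => value.toString
  }
}
```

Calling `renderBefore` with a `UdtCol` raises `scala.MatchError`, while `renderAfter` returns the value's string form.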

How was this patch tested?

While I would have liked to add a unit test, I couldn't easily incorporate UDTs into the HiveThriftServer2Suites pipeline. With some guidance I would be happy to push a commit with tests.

Instead, I tested manually: I loaded a DataFrame with a Point UDT column in a Spark shell running a HiveThriftServer, then connected to the server from beeline and queried that table.

Here is the result before the change:

0: jdbc:hive2://localhost:10000> select * from chicago;
Error: scala.MatchError: org.apache.spark.sql.PointUDT@2d980dc3 (of class org.apache.spark.sql.PointUDT) (state=,code=0)

And after the change:

0: jdbc:hive2://localhost:10000> select * from chicago;
+---------------------------------------+--------------+------------------------+---------------------+--+
|                __fid__                | case_number  |          dtg           |        geom         |
+---------------------------------------+--------------+------------------------+---------------------+--+
| 109602f9-54f8-414b-8c6f-42b1a337643e  | 2            | 2016-01-01 19:00:00.0  | POINT (-77 38)      |
| 709602f9-fcff-4429-8027-55649b6fd7ed  | 1            | 2015-12-31 19:00:00.0  | POINT (-76.5 38.5)  |
| 009602f9-fcb5-45b1-a867-eb8ba10cab40  | 3            | 2016-01-02 19:00:00.0  | POINT (-78 39)      |
+---------------------------------------+--------------+------------------------+---------------------+--+

gatorsmile · 2018-01-24T16:44:05Z

cc @liufengdb

felixcheung · 2018-01-24T18:28:02Z

Jenkins test this please

SparkQA · 2018-01-24T18:54:50Z

Test build #86597 has finished for PR 20385 at commit c8fb436.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

liufengdb · 2018-01-24T19:15:41Z

...r/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala

@@ -102,6 +102,8 @@ private[hive] class SparkExecuteStatementOperation(
        to += from.getAs[Timestamp](ordinal)
      case BinaryType =>
        to += from.getAs[Array[Byte]](ordinal)
+      case udt: UserDefinedType[_] =>
+        to += from.get(ordinal).toString


from.get(ordinal) may return null, in which case calling toString on it throws a NullPointerException. I think a better way to add this case is through the method HiveUtils.toHiveString, which can potentially be reused and tested.
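The null hazard can be shown with a small standalone sketch. This is not the Thriftserver code: `Point`, `unsafeRender`, and `safeRender` are hypothetical names, and passing a null cell through unchanged is just one possible policy, assumed here for illustration.

```scala
object NullSafeCellSketch {
  // Hypothetical stand-in for a UDT value; not a Spark class.
  final case class Point(x: Double, y: Double) {
    override def toString: String = s"POINT ($x $y)"
  }

  // Mirrors the shape of `from.get(ordinal).toString`:
  // throws NullPointerException when the cell is null.
  def unsafeRender(cell: Any): String = cell.toString

  // Null-safe variant: a null cell stays null instead of throwing.
  def safeRender(cell: Any): String = Option(cell).map(_.toString).orNull
}
```

Keeping the conversion in a small pure function like this is also what makes it straightforward to unit test, which is the argument for moving the case into a shared helper.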

liufengdb

Looks like a test is necessary.

gatorsmile · 2018-01-24T21:07:47Z

The unit test cases are needed for HiveUtils.toHiveString.

atallahhezbor · 2018-01-26T20:57:26Z

@liufengdb @gatorsmile I'm happy to write a unit test if you require it, though, as I mentioned before, I did not see a clear way of testing the functionality in SparkExecuteStatementOperation. It looks to me like many of those functions are not tested directly.

Also @gatorsmile, I'm not sure I understand you. If I move the functionality to HiveUtils.toHiveString, are you saying I should write a unit test of that whole function?

gatorsmile · 2018-01-27T19:59:14Z

@atallahhezbor Yeah! Please help us improve the test coverage. We do not have a clear way to test the functionality in SparkExecuteStatementOperation.

Adding unit test cases for HiveUtils.toHiveString is enough if we move the code changes to HiveUtils.toHiveString.

gatorsmile · 2018-01-31T18:48:48Z

ok to test

liufengdb · 2018-01-31T18:51:58Z

Actually, one more thing: do you need to consider a UDT that appears as an attribute of a struct type? https://github.com/apache/spark/pull/20385/files#diff-842e3447fc453de26c706db1cac8f2c4L467
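A toy model of the concern, assuming the string conversion uses one match for top-level values and a separate match for values nested inside a struct. All names here (`ColType`, `StructCol`, `UdtCol`, `Point`, `render`, `renderNested`) are hypothetical and the output format is invented for the example; this is not the HiveUtils code.

```scala
object NestedUdtSketch {
  // Hypothetical stand-ins; none of these are Spark classes.
  trait ColType
  case object StringCol extends ColType
  final case class UdtCol(name: String) extends ColType
  final case class StructCol(fields: Seq[(String, ColType)]) extends ColType

  final case class Point(x: Double, y: Double) {
    override def toString: String = s"POINT ($x $y)"
  }

  // Top-level rendering: handles a UDT directly and delegates struct fields.
  def render(value: Any, tpe: ColType): String = (value, tpe) match {
    case (null, _) => "NULL"
    case (values: Seq[_], StructCol(defs)) =>
      values.zip(defs)
        .map { case (v, (name, t)) => "\"" + name + "\":" + renderNested(v, t) }
        .mkString("{", ",", "}")
    case (s: String, StringCol) => s
    case (other, _: UdtCol)     => other.toString
  }

  // Nested rendering is a separate match (one level deep in this sketch),
  // so it needs its own UDT case; without the last case, a struct holding
  // a UDT field would raise MatchError.
  def renderNested(value: Any, tpe: ColType): String = (value, tpe) match {
    case (null, _)              => "null"
    case (s: String, StringCol) => "\"" + s + "\""
    case (other, _: UdtCol)     => other.toString
  }
}
```

The point of the sketch is only that a fix applied to the top-level match does not automatically cover the nested one.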

SparkQA · 2018-01-31T21:00:19Z

Test build #86887 has finished for PR 20385 at commit e05041f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-02-01T04:44:23Z

LGTM

@atallahhezbor Could you submit another PR to address the comment from @liufengdb ?

This fix is nice to have in Spark 2.3. Let's merge this now.

Thanks! Merged to master/2.3

Author: Atallah Hezbor <[email protected]>
Closes #20385 from atallahhezbor/udts_over_hive.
(cherry picked from commit b2e7677)
Signed-off-by: gatorsmile <[email protected]>

Fixes MatchError when UDTs are passed through Hive Thriftserver

c8fb436

Signed-off-by: Atallah Hezbor <[email protected]>

liufengdb reviewed Jan 24, 2018

liufengdb suggested changes Jan 24, 2018

Moves UDT handling to HiveUtils. Adds unit test

e05041f

Signed-off-by: Atallah Hezbor <[email protected]>

asfgit closed this in b2e7677 Feb 1, 2018

sasoria mentioned this pull request Mar 16, 2018

scala.MatchError using GeoSparkSql on ThriftServer2 apache/sedona#198

Closed
