[SPARK-21396][SQL] Fixes MatchError when UDTs are passed through Hive Thriftserver #20385
Conversation
Signed-off-by: Atallah Hezbor <[email protected]>
cc @liufengdb
Jenkins test this please
Test build #86597 has finished for PR 20385 at commit
@@ -102,6 +102,8 @@ private[hive] class SparkExecuteStatementOperation(
         to += from.getAs[Timestamp](ordinal)
       case BinaryType =>
         to += from.getAs[Array[Byte]](ordinal)
+      case udt: UserDefinedType[_] =>
+        to += from.get(ordinal).toString
It is possible that `from.get(ordinal)` returns null, which would then cause a null pointer exception. I think a better way to add this case is via the method `HiveUtils.toHiveString`, which can potentially be reused and tested.
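For reference, a minimal sketch of the null-safe, testable shape being suggested here. The helper name `udtColumnToString` is made up for illustration, and the exact `HiveUtils.toHiveString` signature is not reproduced; this is not the PR's code.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DataType, UserDefinedType}

// Illustrative helper only: convert a UDT column value to its string form with an
// explicit null guard, so the conversion can be unit tested in isolation rather
// than inline in SparkExecuteStatementOperation.
def udtColumnToString(row: Row, ordinal: Int, dataType: DataType): String =
  dataType match {
    case _: UserDefinedType[_] =>
      val raw = row.get(ordinal)
      if (raw == null) null else raw.toString
    case other =>
      throw new IllegalArgumentException(s"Not a UDT column: $other")
  }
```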
Looks like a test is necessary.
The unit test cases are needed for this change.
@liufengdb @gatorsmile I'm happy to write a unit test if you require it. Though as I mentioned before, I did not see a clear way of testing the functionality in `HiveThriftServer2Suites`. Also @gatorsmile, I'm not sure I understand you. If I move the functionality to `HiveUtils.toHiveString`, should the unit tests target that method instead?
@atallahhezbor Yeah! Please help us improve the test coverage. We do not have a clear way to test the functionality in `HiveThriftServer2Suites`. Adding unit test cases for `HiveUtils.toHiveString` would be a good start.
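As a concrete illustration of the end-to-end check being discussed, a hypothetical JDBC-level assertion against a locally running Thrift server. It assumes the Hive JDBC driver is on the classpath and reuses the `chicago` table with its `geom` Point column from the manual test described in this PR; it is not written against the suite's actual helpers.

```scala
import java.sql.DriverManager

// Sketch only: connect to a Thrift server assumed to be running on localhost:10000
// and check that a UDT column comes back as a string instead of raising an error.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "user", "")
try {
  val rs = conn.createStatement().executeQuery("SELECT geom FROM chicago LIMIT 1")
  assert(rs.next(), "expected at least one row")
  // Before this fix the statement failed with scala.MatchError; afterwards the
  // UDT value arrives as its string form, e.g. "POINT (-77 38)".
  assert(rs.getString(1).startsWith("POINT"), "expected the UDT's string form")
} finally {
  conn.close()
}
```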
Signed-off-by: Atallah Hezbor <[email protected]>
ok to test
Actually, one more thing: do you need to consider the UDT as one attribute of a structured type? https://github.com/apache/spark/pull/20385/files#diff-842e3447fc453de26c706db1cac8f2c4L467
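To make the nested case concrete, a hypothetical query shape that puts a UDT inside a struct rather than at the top level. The `points` table, its `geom` column, and the `spark` session (as in spark-shell) are assumptions for illustration only.

```scala
// Hypothetical only: wrap a UDT-backed column in a struct and expose it as a view,
// so that querying it through the Thrift server exercises the nested path rather
// than the top-level case handled by the diff above.
val nested = spark.sql(
  "SELECT named_struct('id', 1, 'geometry', geom) AS feature FROM points")
nested.createOrReplaceTempView("points_nested")
// Then from beeline: SELECT feature FROM points_nested;
```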
Test build #86887 has finished for PR 20385 at commit
LGTM @atallahhezbor Could you submit another PR to address the comment from @liufengdb? This fix is nice to have in Spark 2.3. Let's merge this now. Thanks! Merged to master/2.3
[SPARK-21396][SQL] Fixes MatchError when UDTs are passed through Hive Thriftserver

Signed-off-by: Atallah Hezbor <[email protected]>

## What changes were proposed in this pull request?

This PR proposes modifying the match statement that gets the columns of a row in HiveThriftServer. There was previously no case for `UserDefinedType`, so querying a table that contained them would throw a match error. The changes catch that case and return the string representation.

## How was this patch tested?

While I would have liked to add a unit test, I couldn't easily incorporate UDTs into the `HiveThriftServer2Suites` pipeline. With some guidance I would be happy to push a commit with tests. Instead I did a manual test by loading a `DataFrame` with a Point UDT in a spark shell with a HiveThriftServer, then connecting to the server in beeline and querying that table.

Here is the result before the change:

```
0: jdbc:hive2://localhost:10000> select * from chicago;
Error: scala.MatchError: org.apache.spark.sql.PointUDT@2d980dc3 (of class org.apache.spark.sql.PointUDT) (state=,code=0)
```

And after the change:

```
0: jdbc:hive2://localhost:10000> select * from chicago;
+---------------------------------------+--------------+------------------------+---------------------+--+
|                __fid__                | case_number  |          dtg           |        geom         |
+---------------------------------------+--------------+------------------------+---------------------+--+
| 109602f9-54f8-414b-8c6f-42b1a337643e  | 2            | 2016-01-01 19:00:00.0  | POINT (-77 38)      |
| 709602f9-fcff-4429-8027-55649b6fd7ed  | 1            | 2015-12-31 19:00:00.0  | POINT (-76.5 38.5)  |
| 009602f9-fcb5-45b1-a867-eb8ba10cab40  | 3            | 2016-01-02 19:00:00.0  | POINT (-78 39)      |
+---------------------------------------+--------------+------------------------+---------------------+--+
```

Author: Atallah Hezbor <[email protected]>

Closes #20385 from atallahhezbor/udts_over_hive.

(cherry picked from commit b2e7677)
Signed-off-by: gatorsmile <[email protected]>
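For anyone trying to reproduce the manual test above, a rough spark-shell sketch. The parquet path and the assumption that the dataset carries a Point UDT `geom` column (e.g. via a geospatial library) are placeholders; the author's exact setup is not shown in the PR.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Sketch of the manual reproduction: build a Hive-enabled session, register a view
// whose schema contains a UDT column, and start a Thrift server on top of it so
// beeline can query the view.
val spark = SparkSession.builder()
  .appName("udt-over-thriftserver-repro")
  .enableHiveSupport()
  .getOrCreate()

// Assumed: this dataset has a Point-UDT "geom" column; the path is a placeholder.
val chicago = spark.read.parquet("/path/to/chicago")
chicago.createOrReplaceTempView("chicago")

// Start the Thrift server against this session, then from beeline:
//   !connect jdbc:hive2://localhost:10000
//   select * from chicago;
HiveThriftServer2.startWithContext(spark.sqlContext)
```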