-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16024] [SQL] [TEST] Verify Column Comment for Data Source Tables #13764
Conversation
cc @cloud-fan Tried to find the hole, but it sounds like all the cases can pass. Please let me know if anything is missing. Thanks! |
Test build #60786 has finished for PR 13764 at commit
|
Test build #60789 has finished for PR 13764 at commit
|
oh sorry looks like l made a false alarm, and thanks for adding these tests! LGTM, can you resolve the conflict? thanks! |
Thank you! : ) @cloud-fan |
Test build #60936 has finished for PR 13764 at commit
|
test("column nullability and comment - write and then read") { | ||
val schema = StructType( | ||
StructField("cl1", IntegerType, nullable = false, | ||
new MetadataBuilder().putString("comment", "test").build()) :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, is this the official way to add column comment when create table using DataFrameWriter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: ) It is a little bit hacky. Maybe we should add a new API for users to add comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I am afraid it is. I just grepped through the code base and there are a few places where we do this, for example:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L1434-L1438
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L397-L404
+1 for adding a convenience method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want me to submit a new PR for this? Or add it into this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can open a new PR and move this test there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me remove it and submit a new PR soon. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvanhovell @cloud-fan The new methods are added in the PR: #13860 Could you please review it? Thanks!
@@ -223,6 +223,31 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be | |||
} | |||
} | |||
|
|||
test("column nullability and comment - write and then read") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also remove this test, let's focus on SQL CREATE TABLE in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me change it. Thanks!
Test build #61008 has finished for PR 13764 at commit
|
Test build #61010 has finished for PR 13764 at commit
|
#### What changes were proposed in this pull request? This PR is to improve test coverage. It verifies whether `Comment` of `Column` can be appropriate handled. The test cases verify the related parts in Parser, both SQL and DataFrameWriter interface, and both Hive Metastore catalog and In-memory catalog. #### How was this patch tested? N/A Author: gatorsmile <[email protected]> Closes #13764 from gatorsmile/dataSourceComment. (cherry picked from commit 9f990fa) Signed-off-by: Wenchen Fan <[email protected]>
thanks, merging to master and 2.0! |
…ructType #### What changes were proposed in this pull request? Based on the previous discussion with cloud-fan hvanhovell in another related PR #13764 (comment), it looks reasonable to add convenience methods for users to add `comment` when defining `StructField`. Currently, the column-related `comment` attribute is stored in `Metadata` of `StructField`. For example, users can add the `comment` attribute using the following way: ```Scala StructType( StructField( "cl1", IntegerType, nullable = false, new MetadataBuilder().putString("comment", "test").build()) :: Nil) ``` This PR is to add more user friendly methods for the `comment` attribute when defining a `StructField`. After the changes, users are provided three different ways to do it: ```Scala val struct = (new StructType) .add("a", "int", true, "test1") val struct = (new StructType) .add("c", StringType, true, "test3") val struct = (new StructType) .add(StructField("d", StringType).withComment("test4")) ``` #### How was this patch tested? Added test cases: - `DataTypeSuite` is for testing three types of API changes, - `DataFrameReaderWriterSuite` is for parquet, json and csv formats - using in-memory catalog - `OrcQuerySuite.scala` is for orc format using Hive-metastore Author: gatorsmile <[email protected]> Closes #13860 from gatorsmile/newMethodForComment.
What changes were proposed in this pull request?
This PR is to improve test coverage. It verifies whether
Comment
ofColumn
can be appropriate handled.The test cases verify the related parts in Parser, both SQL and DataFrameWriter interface, and both Hive Metastore catalog and In-memory catalog.
How was this patch tested?
N/A