[SPARK-16024] [SQL] [TEST] Verify Column Comment for Data Source Tables #13764

gatorsmile · 2016-06-19T00:06:24Z

What changes were proposed in this pull request?

This PR improves test coverage. It verifies that column comments are handled appropriately.

The test cases verify the related parts in Parser, both SQL and DataFrameWriter interface, and both Hive Metastore catalog and In-memory catalog.
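As a rough illustration of the SQL path these tests exercise, the sketch below creates a data source table with a column comment and reads the comment back from the table schema. It is not the test code from this PR: the object name `ColumnCommentCheck`, the method `createAndReadComment`, and the table name `comment_demo` are made up for the example, and it assumes a Spark version where `StructField.getComment()` is available and a local `SparkSession` using the in-memory catalog.

```scala
import org.apache.spark.sql.SparkSession

object ColumnCommentCheck {
  // Hypothetical helper for illustration only: creates a parquet-backed
  // data source table with a commented column, then returns the comment
  // that the catalog reports for that column.
  def createAndReadComment(spark: SparkSession): Option[String] = {
    spark.sql("DROP TABLE IF EXISTS comment_demo")
    try {
      // Column comment supplied through the SQL CREATE TABLE syntax.
      spark.sql("CREATE TABLE comment_demo (col1 INT COMMENT 'test') USING parquet")
      // Look the field up by name in the schema the catalog hands back.
      spark.table("comment_demo").schema("col1").getComment()
    } finally {
      // Clean up so the sketch can be re-run.
      spark.sql("DROP TABLE IF EXISTS comment_demo")
    }
  }

  def main(args: Array[String]): Unit = {
    // Local session without Hive support, so the in-memory catalog is used.
    // Table data goes to the default warehouse directory.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("column-comment-check")
      .getOrCreate()
    try {
      println(createAndReadComment(spark))
    } finally {
      spark.stop()
    }
  }
}
```

Running the same statements against a session built with Hive support would exercise the Hive Metastore catalog instead.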

How was this patch tested?

N/A

gatorsmile · 2016-06-19T00:08:16Z

cc @cloud-fan Tried to find the hole, but it sounds like all the cases can pass. Please let me know if anything is missing. Thanks!

SparkQA · 2016-06-19T01:47:15Z

Test build #60786 has finished for PR 13764 at commit 6d8fd50.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-06-19T02:30:52Z

Test build #60789 has finished for PR 13764 at commit 23c98f0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-06-21T11:57:13Z

Oh sorry, looks like I raised a false alarm, and thanks for adding these tests!

LGTM, can you resolve the conflict? thanks!

gatorsmile · 2016-06-21T13:58:44Z

Thank you! : ) @cloud-fan

SparkQA · 2016-06-21T15:44:36Z

Test build #60936 has finished for PR 13764 at commit 9940185.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- final class Binarizer @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- final class ChiSqSelector @Since("1.6.0") (@Since("1.6.0") override val uid: String)
- class CountVectorizer @Since("1.5.0") (@Since("1.5.0") override val uid: String)
- class CountVectorizerModel(
- class DCT @Since("1.5.0") (@Since("1.5.0") override val uid: String)
- class ElementwiseProduct @Since("2.0.0") (@Since("2.0.0") override val uid: String)
- class HashingTF @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- final class IDF @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- class Interaction @Since("1.6.0") (@Since("1.6.0") override val uid: String) extends Transformer
- class MaxAbsScaler @Since("2.0.0") (@Since("2.0.0") override val uid: String)
- class MinMaxScaler @Since("1.5.0") (@Since("1.5.0") override val uid: String)
- class NGram @Since("1.5.0") (@Since("1.5.0") override val uid: String)
- class Normalizer @Since("2.0.0") (@Since("2.0.0") override val uid: String)
- class OneHotEncoder @Since("1.4.0") (@Since("1.4.0") override val uid: String) extends Transformer
- class PCA @Since("1.5.0") (
- class PolynomialExpansion @Since("2.0.0") (@Since("2.0.0") override val uid: String)
- final class QuantileDiscretizer @Since("1.6.0") (@Since("1.6.0") override val uid: String)
- class RFormula @Since("1.5.0") (@Since("1.5.0") override val uid: String)
- class SQLTransformer @Since("1.6.0") (@Since("1.6.0") override val uid: String) extends Transformer
- class StandardScaler @Since("1.4.0") (
- class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String)
- class StringIndexer @Since("1.4.0") (
- class Tokenizer @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- class RegexTokenizer @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- class VectorAssembler @Since("1.4.0") (@Since("1.4.0") override val uid: String)
- class VectorIndexer @Since("1.4.0") (
- final class VectorSlicer @Since("1.5.0") (@Since("1.5.0") override val uid: String)
- final class Word2Vec @Since("1.4.0") (
- case class CachedBatch(numRows: Int, buffers: Array[Array[Byte]], stats: InternalRow)
- class TextSocketSource(host: String, port: Int, sqlContext: SQLContext)
- class TextSocketSourceProvider extends StreamSourceProvider with DataSourceRegister with Logging

cloud-fan · 2016-06-22T03:56:08Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala

+  test("column nullability and comment - write and then read") {
+    val schema = StructType(
+      StructField("cl1", IntegerType, nullable = false,
+        new MetadataBuilder().putString("comment", "test").build()) ::


Hmmm, is this the official way to add a column comment when creating a table using DataFrameWriter?

cc @yhuai @hvanhovell

: ) It is a little bit hacky. Maybe we should add a new API for users to add comments?

Yeah I am afraid it is. I just grepped through the code base and there are a few places where we do this, for example:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L1434-L1438
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L397-L404

+1 for adding a convenience method.

Do you want me to submit a new PR for this? Or add it into this?

you can open a new PR and move this test there.

Sure, let me remove it and submit a new PR soon. Thanks!

@hvanhovell @cloud-fan The new methods are added in the PR: #13860 Could you please review it? Thanks!
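To make the thread above concrete: the comment lives in the `StructField` metadata under the key `"comment"`, which is why attaching it by hand feels hacky. The sketch below wraps that pattern in two small helpers. `CommentViaMetadata`, `withColumnComment`, and `columnComment` are hypothetical names invented for this example, not Spark APIs and not the methods proposed in #13860; only `StructField`, `StructType`, `MetadataBuilder`, and `Metadata` accessors come from Spark.

```scala
import org.apache.spark.sql.types.{IntegerType, MetadataBuilder, StructField, StructType}

object CommentViaMetadata {
  // Hypothetical helper (not part of Spark): returns a copy of `field`
  // whose metadata carries `comment` under the "comment" key, keeping
  // any metadata entries the field already had.
  def withColumnComment(field: StructField, comment: String): StructField = {
    val metadata = new MetadataBuilder()
      .withMetadata(field.metadata)
      .putString("comment", comment)
      .build()
    field.copy(metadata = metadata)
  }

  // Hypothetical helper: reads the comment back, if one was stored.
  def columnComment(field: StructField): Option[String] =
    if (field.metadata.contains("comment")) Some(field.metadata.getString("comment"))
    else None

  // Same schema as in the reviewed test, built through the helper.
  val schema: StructType = StructType(
    withColumnComment(StructField("cl1", IntegerType, nullable = false), "test") :: Nil)
}
```

For instance, `CommentViaMetadata.columnComment(CommentViaMetadata.schema("cl1"))` yields the stored comment, while a field created without metadata yields `None`.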

cloud-fan · 2016-06-22T05:20:58Z

sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala

@@ -223,6 +223,31 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
    }
  }

+  test("column nullability and comment - write and then read") {


also remove this test, let's focus on SQL CREATE TABLE in this PR.

Sure, let me change it. Thanks!

SparkQA · 2016-06-22T07:13:06Z

Test build #61008 has finished for PR 13764 at commit 94b7264.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-06-22T07:32:24Z

Test build #61010 has finished for PR 13764 at commit 87d32d7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

#### What changes were proposed in this pull request?

This PR is to improve test coverage. It verifies whether `Comment` of `Column` can be appropriately handled.

The test cases verify the related parts in Parser, both SQL and DataFrameWriter interface, and both Hive Metastore catalog and In-memory catalog.

#### How was this patch tested?

N/A

Author: gatorsmile <[email protected]>

Closes #13764 from gatorsmile/dataSourceComment.

(cherry picked from commit 9f990fa)
Signed-off-by: Wenchen Fan <[email protected]>

cloud-fan · 2016-06-23T01:14:44Z

thanks, merging to master and 2.0!

…ructType

#### What changes were proposed in this pull request?

Based on the previous discussion with cloud-fan hvanhovell in another related PR #13764 (comment), it looks reasonable to add convenience methods for users to add `comment` when defining `StructField`.

Currently, the column-related `comment` attribute is stored in `Metadata` of `StructField`. For example, users can add the `comment` attribute using the following way:

```Scala
StructType(
  StructField(
    "cl1",
    IntegerType,
    nullable = false,
    new MetadataBuilder().putString("comment", "test").build()) :: Nil)
```

This PR is to add more user friendly methods for the `comment` attribute when defining a `StructField`. After the changes, users are provided three different ways to do it:

```Scala
val struct = (new StructType)
  .add("a", "int", true, "test1")

val struct = (new StructType)
  .add("c", StringType, true, "test3")

val struct = (new StructType)
  .add(StructField("d", StringType).withComment("test4"))
```

#### How was this patch tested?

Added test cases:
- `DataTypeSuite` is for testing three types of API changes,
- `DataFrameReaderWriterSuite` is for parquet, json and csv formats - using in-memory catalog
- `OrcQuerySuite.scala` is for orc format using Hive-metastore

Author: gatorsmile <[email protected]>

Closes #13860 from gatorsmile/newMethodForComment.

test cases

6d8fd50

simplify the testcase

23c98f0

Merge remote-tracking branch 'upstream/master' into dataSourceComment

9940185

cloud-fan reviewed Jun 22, 2016

revert it back

94b7264

cloud-fan reviewed Jun 22, 2016

revert it back

87d32d7

gatorsmile mentioned this pull request Jun 22, 2016

[SPARK-16157] [SQL] Add New Methods for comments in StructField and StructType #13860

Closed

asfgit closed this in 9f990fa Jun 23, 2016
