Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16024] [SQL] [TEST] Verify Column Comment for Data Source Tables #13764

Closed
wants to merge 5 commits into from

Conversation

gatorsmile
Copy link
Member

What changes were proposed in this pull request?

This PR is to improve test coverage. It verifies whether Comment of Column can be appropriate handled.

The test cases verify the related parts in Parser, both SQL and DataFrameWriter interface, and both Hive Metastore catalog and In-memory catalog.

How was this patch tested?

N/A

@gatorsmile
Copy link
Member Author

cc @cloud-fan Tried to find the hole, but it sounds like all the cases can pass. Please let me know if anything is missing. Thanks!

@SparkQA
Copy link

SparkQA commented Jun 19, 2016

Test build #60786 has finished for PR 13764 at commit 6d8fd50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 19, 2016

Test build #60789 has finished for PR 13764 at commit 23c98f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

oh sorry looks like l made a false alarm, and thanks for adding these tests!

LGTM, can you resolve the conflict? thanks!

@gatorsmile
Copy link
Member Author

Thank you! : ) @cloud-fan

@SparkQA
Copy link

SparkQA commented Jun 21, 2016

Test build #60936 has finished for PR 13764 at commit 9940185.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • final class Binarizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • final class Bucketizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • final class ChiSqSelector @Since(\"1.6.0\") (@Since(\"1.6.0\") override val uid: String)
    • class CountVectorizer @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)
    • class CountVectorizerModel(
    • class DCT @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)
    • class ElementwiseProduct @Since(\"2.0.0\") (@Since(\"2.0.0\") override val uid: String)
    • class HashingTF @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • final class IDF @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class Interaction @Since(\"1.6.0\") (@Since(\"1.6.0\") override val uid: String) extends Transformer
    • class MaxAbsScaler @Since(\"2.0.0\") (@Since(\"2.0.0\") override val uid: String)
    • class MinMaxScaler @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)
    • class NGram @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)
    • class Normalizer @Since(\"2.0.0\") (@Since(\"2.0.0\") override val uid: String)
    • class OneHotEncoder @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String) extends Transformer
    • class PCA @Since(\"1.5.0\") (
    • class PolynomialExpansion @Since(\"2.0.0\") (@Since(\"2.0.0\") override val uid: String)
    • final class QuantileDiscretizer @Since(\"1.6.0\") (@Since(\"1.6.0\") override val uid: String)
    • class RFormula @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)
    • class SQLTransformer @Since(\"1.6.0\") (@Since(\"1.6.0\") override val uid: String) extends Transformer
    • class StandardScaler @Since(\"1.4.0\") (
    • class StopWordsRemover @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)
    • class StringIndexer @Since(\"1.4.0\") (
    • class Tokenizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class RegexTokenizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class VectorAssembler @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)
    • class VectorIndexer @Since(\"1.4.0\") (
    • final class VectorSlicer @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)
    • final class Word2Vec @Since(\"1.4.0\") (
    • case class CachedBatch(numRows: Int, buffers: Array[Array[Byte]], stats: InternalRow)
    • class TextSocketSource(host: String, port: Int, sqlContext: SQLContext)
    • class TextSocketSourceProvider extends StreamSourceProvider with DataSourceRegister with Logging

test("column nullability and comment - write and then read") {
val schema = StructType(
StructField("cl1", IntegerType, nullable = false,
new MetadataBuilder().putString("comment", "test").build()) ::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, is this the official way to add column comment when create table using DataFrameWriter?

cc @yhuai @hvanhovell

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

: ) It is a little bit hacky. Maybe we should add a new API for users to add comments?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want me to submit a new PR for this? Or add it into this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can open a new PR and move this test there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me remove it and submit a new PR soon. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvanhovell @cloud-fan The new methods are added in the PR: #13860 Could you please review it? Thanks!

@@ -223,6 +223,31 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
}
}

test("column nullability and comment - write and then read") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also remove this test, let's focus on SQL CREATE TABLE in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me change it. Thanks!

@SparkQA
Copy link

SparkQA commented Jun 22, 2016

Test build #61008 has finished for PR 13764 at commit 94b7264.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 22, 2016

Test build #61010 has finished for PR 13764 at commit 87d32d7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Jun 23, 2016
#### What changes were proposed in this pull request?
This PR is to improve test coverage. It verifies whether `Comment` of `Column` can be appropriate handled.

The test cases verify the related parts in Parser, both SQL and DataFrameWriter interface, and both Hive Metastore catalog and In-memory catalog.

#### How was this patch tested?
N/A

Author: gatorsmile <[email protected]>

Closes #13764 from gatorsmile/dataSourceComment.

(cherry picked from commit 9f990fa)
Signed-off-by: Wenchen Fan <[email protected]>
@asfgit asfgit closed this in 9f990fa Jun 23, 2016
@cloud-fan
Copy link
Contributor

thanks, merging to master and 2.0!

asfgit pushed a commit that referenced this pull request Jun 29, 2016
…ructType

#### What changes were proposed in this pull request?
Based on the previous discussion with cloud-fan hvanhovell in another related PR #13764 (comment), it looks reasonable to add convenience methods for users to add `comment` when defining `StructField`.

Currently, the column-related `comment` attribute is stored in `Metadata` of `StructField`. For example, users can add the `comment` attribute using the following way:
```Scala
StructType(
  StructField(
    "cl1",
    IntegerType,
    nullable = false,
    new MetadataBuilder().putString("comment", "test").build()) :: Nil)
```
This PR is to add more user friendly methods for the `comment` attribute when defining a `StructField`. After the changes, users are provided three different ways to do it:
```Scala
val struct = (new StructType)
  .add("a", "int", true, "test1")

val struct = (new StructType)
  .add("c", StringType, true, "test3")

val struct = (new StructType)
  .add(StructField("d", StringType).withComment("test4"))
```

#### How was this patch tested?
Added test cases:
- `DataTypeSuite` is for testing three types of API changes,
- `DataFrameReaderWriterSuite` is for parquet, json and csv formats - using in-memory catalog
- `OrcQuerySuite.scala` is for orc format using Hive-metastore

Author: gatorsmile <[email protected]>

Closes #13860 from gatorsmile/newMethodForComment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants