[SPARK-4856] [SQL] NullType instead of StringType when sampling against empty string or nul... #3708

Closed
wants to merge 2 commits into from


chenghao-intel
Contributor

TestSQLContext.sparkContext.parallelize(
  """{"ip":"27.31.100.29","headers":{"Host":"1.abc.com","Charset":"UTF-8"}}""" ::
  """{"ip":"27.31.100.29","headers":{}}""" ::
  """{"ip":"27.31.100.29","headers":""}""" :: Nil)

Because the empty string (the "headers" value) is treated as a String from the start (in lines 2 and 3), schema inference ignores the real nested data type (the struct-typed "headers" in line 1) and also treats "headers" in line 1 as StringType, which is not what we expect.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24479/

@chenghao-intel
Contributor Author

retest this please

@SparkQA

SparkQA commented Dec 16, 2014

Test build #24481 has started for PR 3708 at commit 853de51.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 16, 2014

Test build #24481 has finished for PR 3708 at commit 853de51.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Boolean)

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24481/

@@ -263,6 +263,8 @@ private[sql] object JsonRDD extends Logging {
val elementType = typeOfArray(array)
buildKeyPathForInnerStructs(array, elementType) :+ (key, elementType)
}
// we couldn't tell what the type is if the value is null or empty string
case (key: String, value) if value == "" || value == null => (key, NullType) :: Nil
Contributor

The null case makes sense to me, but why "" as well? That seems to be unequivocally a String.

Contributor Author

I've updated the unit test, which should make this clearer.

In some cases (as shown in the unit test), "" is equivalent to null for a struct type, so we'd better not say "it MUST be StringType whenever we meet an empty string".

Meanwhile, NullType is the minimal data type: it can be promoted to any other data type in JsonRDD (e.g. to StructType), whereas a StringType cannot be promoted to StructType.

It's safe to use NullType here, since any NullType that remains is promoted to StringType by the final promotion rules; see https://github.com/chenghao-intel/spark/blob/json/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L231
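
The promotion argument can be illustrated with a small standalone sketch. This is not the JsonRDD code: the `DataType` hierarchy below is a simplified stand-in, the object name `InferenceSketch` and the helper `resolveNulls` are hypothetical, and the rule that conflicting types fall back to StringType is an assumption of the sketch. It only shows why inferring "" as NullType lets the struct type survive the merge, while inferring it as StringType does not.

```scala
// Simplified stand-ins for Spark SQL data types (assumption: not the real classes).
object InferenceSketch {
  sealed trait DataType
  case object NullType extends DataType
  case object StringType extends DataType
  final case class StructType(fields: Map[String, DataType]) extends DataType

  // Merge the types observed for the same key in two records.
  def compatibleType(a: DataType, b: DataType): DataType = (a, b) match {
    case (x, y) if x == y  => x
    case (NullType, other) => other // NullType can be promoted to anything
    case (other, NullType) => other
    case (StructType(f1), StructType(f2)) =>
      // Merge structs field by field; a missing field counts as NullType.
      val keys = f1.keySet ++ f2.keySet
      StructType(keys.map { k =>
        k -> compatibleType(f1.getOrElse(k, NullType), f2.getOrElse(k, NullType))
      }.toMap)
    case _ => StringType // assumed fallback for conflicting types
  }

  // Final pass: any NullType that was never promoted becomes StringType.
  def resolveNulls(t: DataType): DataType = t match {
    case NullType           => StringType
    case StructType(fields) => StructType(fields.map { case (k, v) => k -> resolveNulls(v) })
    case other              => other
  }

  // Types of the "headers" key in the three records from the description.
  val headersStruct: DataType = StructType(Map("Host" -> StringType, "Charset" -> StringType))
  val emptyStruct: DataType = StructType(Map.empty)

  // "" inferred as StringType: the struct collapses to StringType.
  val before: DataType =
    Seq[DataType](headersStruct, emptyStruct, StringType).reduce(compatibleType)

  // "" inferred as NullType: the struct type is preserved.
  val after: DataType =
    resolveNulls(Seq[DataType](headersStruct, emptyStruct, NullType).reduce(compatibleType))
}
```

In this sketch `InferenceSketch.before` is `StringType`, while `InferenceSketch.after` equals `headersStruct`; a key whose only observed values are null or "" still ends up as `StringType` through `resolveNulls`.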

@SparkQA

SparkQA commented Dec 17, 2014

Test build #24519 has started for PR 3708 at commit e7a72e9.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 17, 2014

Test build #24519 has finished for PR 3708 at commit e7a72e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Boolean)

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24519/

@marmbrus
Contributor

Thanks! Merged to master.

@asfgit asfgit closed this in 8d0d2a6 Dec 17, 2014