Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18260] Make from_json null safe #15771

Closed
wants to merge 5 commits into from
Closed

Conversation

brkyvz
Copy link
Contributor

@brkyvz brkyvz commented Nov 4, 2016

What changes were proposed in this pull request?

from_json is currently not safe against null rows. This PR adds a fix and a regression test for it.

How was this patch tested?

Regression test

Seq("""{"a": 1}""").toDF("value").write.parquet(new File(dir, "p=x").toString)
Seq("""{"a": 2}""").toDF("record").write.parquet(new File(dir, "p=y").toString)
val schema = new StructType().add("a", IntegerType)
val df = spark.read.parquet(dir.toString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why parquet? Seems this is testing more than it needs to (and thus runs slower).

@@ -499,6 +499,7 @@ case class JsonToStruct(schema: StructType, options: Map[String, String], child:

override def eval(input: InternalRow): Any = {
try parser.parse(child.eval(input).toString).head catch {
case _: NullPointerException => null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider doing this with an if. I think catching and throwing an exception can be expensive (though maybe JVM optimizations save us?). Since its easy I'd rather not rely on that.

@brkyvz
Copy link
Contributor Author

brkyvz commented Nov 4, 2016

@marmbrus Addressed

@SparkQA
Copy link

SparkQA commented Nov 4, 2016

Test build #68151 has finished for PR 15771 at commit 2867366.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 4, 2016

Test build #68156 has finished for PR 15771 at commit 6558bfe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 4, 2016

Test build #68155 has finished for PR 15771 at commit 882915c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -498,7 +498,7 @@ case class JsonToStruct(schema: StructType, options: Map[String, String], child:
override def children: Seq[Expression] = child :: Nil

override def eval(input: InternalRow): Any = {
try parser.parse(child.eval(input).toString).head catch {
try Option(child.eval(input)).map(json => parser.parse(json.toString).head).orNull catch {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to keep iterating... I think you want to avoid an extra allocation in the execution path. I would use an if statement.

@marmbrus
Copy link
Contributor

marmbrus commented Nov 4, 2016

LGTM, pending tests

@SparkQA
Copy link

SparkQA commented Nov 5, 2016

Test build #68165 has finished for PR 15771 at commit 86ee98b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Nov 5, 2016

Merging in master/branch-2.1.

@asfgit asfgit closed this in 6e27018 Nov 5, 2016
asfgit pushed a commit that referenced this pull request Nov 5, 2016
## What changes were proposed in this pull request?

`from_json` is currently not safe against `null` rows. This PR adds a fix and a regression test for it.

## How was this patch tested?

Regression test

Author: Burak Yavuz <[email protected]>

Closes #15771 from brkyvz/json_fix.

(cherry picked from commit 6e27018)
Signed-off-by: Reynold Xin <[email protected]>
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

`from_json` is currently not safe against `null` rows. This PR adds a fix and a regression test for it.

## How was this patch tested?

Regression test

Author: Burak Yavuz <[email protected]>

Closes apache#15771 from brkyvz/json_fix.
@brkyvz brkyvz deleted the json_fix branch February 3, 2019 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants