Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4250] [SQL] Fix bug of constant null value mapping to ConstantObjectInspector #3114

Closed
wants to merge 3 commits into from

Conversation

chenghao-intel
Copy link
Contributor

No description provided.

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22941 has started for PR 3114 at commit f54f369.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22941 has finished for PR 3114 at commit f54f369.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22941/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #22978 has started for PR 3114 at commit 50a13ba.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #22978 has finished for PR 3114 at commit 50a13ba.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22978/
Test PASSed.

@chenghao-intel
Copy link
Contributor Author

@marmbrus @rxin @liancheng Couple of PRs are (#3109, #2802) blocked by this, can you review this?

case Literal(_, NullType) =>
HiveShim.getPrimitiveNullWritableConstantObjectInspector
case Literal(value: Seq[_], ArrayType(dt, _)) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: why remove this to only have to add a more verbose .asInstanceOf below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't bind a type here, just to keep consistent with pattern match above, and it is also helpful in debugging, you know, sometimes we just confused what type for catalyst array (Array or Seq), the .asInstanceOf will be a strong check.

@marmbrus
Copy link
Contributor

marmbrus commented Nov 7, 2014

Hey, thanks for working on this. I'm a little confused about how we are handling nulls though. Am I missing something?

@SparkQA
Copy link

SparkQA commented Nov 8, 2014

Test build #23101 has started for PR 3114 at commit e603bda.

  • This patch merges cleanly.

@chenghao-intel
Copy link
Contributor Author

Thank you @marmbrus for reviewing this.
All of primitive constant null values (Literal(null, _) actually will not find a match currently, and it will resort to a normal Primitive ObjectInspector, this won't bring the incorrect result in most of the cases. But some Hive UDFs require the its arguments are ConstantObjectInspector (e.g. #3109, named_struct), which will cause error.
On the other hand the ConstantObjectInspector is definitely a way for optimization, we can avoid casting a constant catalyst object to Hive one(wrap) when calling the Hive UDFs.
I am planning to do this with couple of follow up PRs for the improvement.

@SparkQA
Copy link

SparkQA commented Nov 8, 2014

Test build #23101 has finished for PR 3114 at commit e603bda.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23101/
Test PASSed.

@liancheng
Copy link
Contributor

@marmbrus The problem @chenghao-intel pointed out can be illustrated by this Scala shell snippet:

scala> def foo(v: Any) = v match {
     |   case _: String => "string"
     |   case _ => "other"
     | }
foo: (v: Any)String

scala> val s: String = null
s: String = null

scala> foo(s)
res0: String = other

That's why type annotations in toInspector are removed.

@marmbrus
Copy link
Contributor

Ah, thanks for the explanation!

@marmbrus
Copy link
Contributor

Thanks! Merged to master and 1.2.

@chenghao-intel
Copy link
Contributor Author

@marmbrus , but this PR seems still open, can you double-check if it's merged?

@asfgit asfgit closed this in fa77783 Nov 11, 2014
asfgit pushed a commit that referenced this pull request Nov 11, 2014
…ObjectInspector

Author: Cheng Hao <[email protected]>

Closes #3114 from chenghao-intel/constant_null_oi and squashes the following commits:

e603bda [Cheng Hao] fix the bug of null value for primitive types
50a13ba [Cheng Hao] fix the timezone issue
f54f369 [Cheng Hao] fix bug of constant null value for ObjectInspector

(cherry picked from commit fa77783)
Signed-off-by: Michael Armbrust <[email protected]>
@marmbrus
Copy link
Contributor

Github was lagging you can always check here: https://git-wip-us.apache.org/repos/asf?p=spark.git;a=summary

@chenghao-intel chenghao-intel deleted the constant_null_oi branch July 2, 2015 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants