Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8280][SPARK-8281][SQL]Handle NaN, null and Infinity in math #6835

Closed
wants to merge 9 commits into from

Conversation

yjshen
Copy link
Member

@yjshen yjshen commented Jun 16, 2015

Just as their counterparts in Hive:

UDFAsin & UDFAcos would just return NaN if its argument is NaN or its absolute value is greater than 1, therefore I think it's enough to just remove NaN checking before return.

Log* would first check argument <= 0.0 and return null for 0.0 instead of -Infinity

@rxin
Copy link
Contributor

rxin commented Jun 16, 2015

Jenkins, ok to test.

@SparkQA
Copy link

SparkQA commented Jun 16, 2015

Test build #34973 has finished for PR 6835 at commit 2d1dfc1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class UnaryMathLogExpression(f: Double => Double, name: String)
    • case class Log(child: Expression) extends UnaryMathLogExpression(math.log, "LOG")
    • case class Log10(child: Expression) extends UnaryMathLogExpression(math.log10, "LOG10")
    • case class Log1p(child: Expression) extends UnaryMathLogExpression(math.log1p, "LOG1P")

@SparkQA
Copy link

SparkQA commented Jun 16, 2015

Test build #34977 has finished for PR 6835 at commit ef8c28d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class UnaryMathLogExpression(f: Double => Double, name: String)
    • case class Log(child: Expression) extends UnaryMathLogExpression(math.log, "LOG")
    • case class Log10(child: Expression) extends UnaryMathLogExpression(math.log10, "LOG10")
    • case class Log1p(child: Expression) extends UnaryMathLogExpression(math.log1p, "LOG1P")

override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = {
val eval = child.gen(ctx)
eval.code + s"""
boolean ${ev.isNull} = ${eval.isNull} || Double.valueOf(${eval.primitive}) <= 0.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this just

${eval.primitive} <= 0.0

instead of calling Double.valueOf?

@rxin
Copy link
Contributor

rxin commented Jun 18, 2015

@YijieSHEN it'd make sense to merge this after #6725

It might change your code here.

@rxin
Copy link
Contributor

rxin commented Jun 18, 2015

btw I was thinking maybe Log2 and the other log functions can just be child class of the Logarithm class added in #6725?

@yjshen
Copy link
Member Author

yjshen commented Jun 18, 2015

I've merged #6725 in the latest commit.

Also tried to refactor all the log functions to subclass the Logarithm class added there, but find wried and hard to understand then, so just discard that version.

@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35139 has finished for PR 6835 at commit 4be400a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class UnaryLogarithmExpression(f: Double => Double, name: String, yAsymptote: Double)
    • case class Log(child: Expression) extends UnaryLogarithmExpression(math.log, "LOG", 0.0)
    • case class Log10(child: Expression) extends UnaryLogarithmExpression(math.log10, "LOG10", 0.0)
    • case class Log1p(child: Expression) extends UnaryLogarithmExpression(math.log1p, "LOG1P", -1.0)

@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35141 has finished for PR 6835 at commit a150de5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class UnaryLogarithmExpression(f: Double => Double, name: String, yAsymptote: Double)
    • case class Log(child: Expression) extends UnaryLogarithmExpression(math.log, "LOG", 0.0)
    • case class Log10(child: Expression) extends UnaryLogarithmExpression(math.log10, "LOG10", 0.0)
    • case class Log1p(child: Expression) extends UnaryLogarithmExpression(math.log1p, "LOG1P", -1.0)

// math.log(evalE2.asInstanceOf[Double]) / math.log(evalE1.asInstanceOf[Double])
// }
// }
// }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hive's UDFLog doesn't support base in range (0.0, 1.0], it would just eval to null in this case, just act as the commented out code above.

I'm not sure it we also want to behave like this, any ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should support these. just remove the commented code, and add to inline comment for log that we support (0.0, 1.0], unlike hive.

@SparkQA
Copy link

SparkQA commented Jun 18, 2015

Test build #35150 has finished for PR 6835 at commit f19f651.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class UnaryLogarithmExpression(f: Double => Double, name: String, yAsymptote: Double)
    • case class Log(child: Expression) extends UnaryLogarithmExpression(math.log, "LOG", 0.0)
    • case class Log10(child: Expression) extends UnaryLogarithmExpression(math.log10, "LOG10", 0.0)
    • case class Log1p(child: Expression) extends UnaryLogarithmExpression(math.log1p, "LOG1P", -1.0)

@SparkQA
Copy link

SparkQA commented Jun 24, 2015

Test build #35629 has finished for PR 6835 at commit 0c96d86.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class UnaryLogarithmExpression(f: Double => Double, name: String, yAsymptote: Double)
    • case class Log(child: Expression) extends UnaryLogarithmExpression(math.log, "LOG", 0.0)
    • case class Log10(child: Expression) extends UnaryLogarithmExpression(math.log10, "LOG10", 0.0)
    • case class Log1p(child: Expression) extends UnaryLogarithmExpression(math.log1p, "LOG1P", -1.0)

@yjshen
Copy link
Member Author

yjshen commented Jun 24, 2015

Seems build failure is not related to this PR

@yjshen
Copy link
Member Author

yjshen commented Jun 24, 2015

retest this please.

@JoshRosen
Copy link
Contributor

The MiMa test failure is due to an artifact missing from a Maven staging repository; I have opened #6974 to hotfix.

@JoshRosen
Copy link
Contributor

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Jun 24, 2015

Test build #35645 has finished for PR 6835 at commit 0c96d86.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -259,19 +285,14 @@ case class Atan2(left: Expression, right: Expression)
null
} else {
// With codegen, the values returned by -0.0 and 0.0 are different. Handled with +0.0
val result = math.atan2(evalE1.asInstanceOf[Double] + 0.0,
math.atan2(evalE1.asInstanceOf[Double] + 0.0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think u need to wrap here.

@rxin
Copy link
Contributor

rxin commented Jul 16, 2015

@YijieSHEN can you bring this up to date?

@yjshen
Copy link
Member Author

yjshen commented Jul 16, 2015

@rxin, do you mind my closing this one and creating a new pr for this? The base code is quite outdated.

@rxin
Copy link
Contributor

rxin commented Jul 16, 2015

Not at all. Feel free to close this one and submit a new one.
.

@yjshen
Copy link
Member Author

yjshen commented Jul 16, 2015

ok, will do soon

@yjshen
Copy link
Member Author

yjshen commented Jul 16, 2015

close this and continue in #7451

@yjshen yjshen closed this Jul 16, 2015
@yjshen yjshen deleted the nan_infinity_null branch July 20, 2015 02:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants