
[SPARK-9303] Decimal should use java.math.BigDecimal directly instead of using the Scala wrapper #8018

Closed
wants to merge 8 commits

Conversation

davies (Contributor) commented Aug 7, 2015

This PR is based on #7635; thanks to @JoshRosen.

Because java.math.BigDecimal already has the same optimization for precision < 18 (it uses a long for the unscaled value), we don't need to duplicate it in Decimal. This patch also removes _scale, since java.math.BigDecimal already tracks the scale.

In order to keep a unified hashCode, we still use scala.math.BigDecimal.hashCode().

After this patch, we see roughly a 2x (about 100%) end-to-end performance improvement on a test that sums the product of a short and a decimal column (SUM(short * decimal(5,2))): from 19 seconds down to 9.6 seconds.
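
A minimal sketch of the idea, assuming OpenJDK's java.math.BigDecimal (which stores the unscaled value in a long field, intCompact, whenever it fits in a long); the object and method names below are illustrative, not this PR's actual code:

import java.math.{BigDecimal => JavaBigDecimal}

object CompactDecimalSketch {
  // Because the JDK already keeps a long-backed compact form for small values,
  // a wrapper that stores a JavaBigDecimal directly does not need its own
  // longVal/_scale fields to get the fast path.
  def fromUnscaled(unscaled: Long, scale: Int): JavaBigDecimal =
    JavaBigDecimal.valueOf(unscaled, scale) // stays in the compact representation

  // Reusing scala.math.BigDecimal's hashCode keeps hash codes consistent with
  // code that still goes through the Scala wrapper.
  def unifiedHashCode(d: JavaBigDecimal): Int = scala.math.BigDecimal(d).hashCode()
}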

SparkQA commented Aug 7, 2015

Test build #40137 has finished for PR 8018 at commit 8eee859.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 7, 2015

Test build #40139 has finished for PR 8018 at commit 7b70c28.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 7, 2015

Test build #40147 has finished for PR 8018 at commit c861001.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

davies (Contributor, Author) commented Aug 7, 2015

cc @JoshRosen for review

SparkQA commented Aug 8, 2015

Test build #40228 has finished for PR 8018 at commit 56190ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 8, 2015

Test build #1414 has finished for PR 8018 at commit 56190ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

JoshRosen (Contributor) commented:

> After this patch, we see roughly a 2x (about 100%) end-to-end performance improvement on a test that sums the product of a short and a decimal column (SUM(short * decimal(5,2))): from 19 seconds down to 9.6 seconds.

Cool! I wasn't actually expecting this to lead to such a big performance improvement.

  def set(unscaled: Long, precision: Int, scale: Int): Decimal = {
    if (setOrNull(unscaled, precision, scale) == null) {
Contributor:

Do you think that we should replace this old check with an assert, just to be safe / guard against mistakes made by the caller?

davies (Contributor, Author):

Will do.
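
A minimal sketch of the suggested assert, assuming the setOrNull helper from the snippet above (the message text is illustrative):

  def set(unscaled: Long, precision: Int, scale: Int): Decimal = {
    // Treat an out-of-range unscaled value as a caller bug rather than an
    // expected runtime condition.
    assert(setOrNull(unscaled, precision, scale) != null,
      s"Unscaled value $unscaled cannot be represented as decimal($precision, $scale)")
    this
  }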

davies (Contributor, Author) commented Aug 8, 2015

Most of the performance improvement came from two parts:

  1. Calling scala.math.BigDecimal.apply(MathContext) is heavy; we now call java.math.BigDecimal.multiply(b, MathContext) directly.
  2. Currently, when casting a numeric to decimal we turn int/short/byte into a double and then cast the double to decimal, which is expensive; we now call Decimal(i.toLong) instead. (Both changes are sketched below.)

The other cleanups are not necessary for the performance improvement; if they are too risky for the release, we can postpone them to 1.6.
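
A rough sketch of these two changes (java.math.BigDecimal.multiply and valueOf are real JDK methods; the object and method names, and the precision of 38 standing in for DecimalType.MAX_PRECISION, are illustrative):

import java.math.{BigDecimal => JavaBigDecimal, MathContext, RoundingMode}

object DecimalFastPathSketch {
  // Mirrors the MATH_CONTEXT constant quoted below.
  private val mathContext = new MathContext(38, RoundingMode.HALF_UP)

  // (1) Multiply the underlying Java values directly instead of going through
  //     the scala.math.BigDecimal wrapper, which allocates a new wrapper and
  //     re-applies the MathContext on every operation.
  def times(a: JavaBigDecimal, b: JavaBigDecimal): JavaBigDecimal =
    a.multiply(b, mathContext)

  // (2) Cast integral types straight to a long-backed decimal instead of the
  //     old int/short/byte -> double -> decimal route.
  def fromShort(s: Short): JavaBigDecimal = JavaBigDecimal.valueOf(s.toLong)
}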

private[sql] val ONE = Decimal(1)
private val ROUNDING_MODE = RoundingMode.HALF_UP
private val MATH_CONTEXT = new MathContext(DecimalType.MAX_PRECISION, ROUNDING_MODE)
private val POW_10 = Array.tabulate[Long](MAX_LONG_DIGITS + 1)(i => math.pow(10, i).toLong)
Contributor:

It looks like this is now unused?

JoshRosen (Contributor) commented:

One naive question, since I'm not as familiar with the old long optimization: it looks like we used to use a long for storing small values, whereas JavaBigDecimal appears to store them as an int. Does this imply that fewer values can now be stored using the compact representation?

I'm probably not the best judge of the riskiness of these changes, since I don't know as much about our decimal internals compared to other reviewers / contributors. Is there someone else that we should loop in for another pair of eyes?

davies (Contributor, Author) commented Aug 8, 2015

@JoshRosen Good question. The intCompact field is actually a long; it confused me at first too, so I double-checked. It therefore has the same range as our long-based representation, and JavaBigDecimal likely handles the conversion between the compact and full representations better than we did.

@mateiz Do you have some cycles to review this? If it's too risky for the release, I'd like to pull the optimizations out into a separate PR.

davies (Contributor, Author) commented Aug 8, 2015

After this work, I realized that _precision is also unnecessary in Decimal, because JavaBigDecimal already tracks the precision of its value. Decimal._precision comes from DecimalType; it is the maximum precision the JavaBigDecimal could have, and is only used when we create a Decimal or call changePrecision().

Also, HiveTypeCoercion needs some cleanup; it introduces lots of unnecessary casts.
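
For illustration, this is plain JDK behavior rather than code from this PR:

import java.math.{BigDecimal => JavaBigDecimal}

// JavaBigDecimal already reports the precision and scale of the stored value,
// so a separate _precision field is only needed to remember the maximum
// precision allowed by the column's DecimalType.
val d = new JavaBigDecimal("123.45")
println(d.precision()) // 5
println(d.scale())     // 2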

JoshRosen (Contributor) commented:

> Good question. The intCompact field is actually a long; it confused me at first too, so I double-checked.

Yep, it's a long.

davies (Contributor, Author) commented Aug 8, 2015

Created #8052

…gdecimal

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
mateiz (Contributor) commented Aug 10, 2015

I didn't realize that Java's BigDecimal already has a shortcut for things that fit in a Long. That definitely simplifies it. In terms of this change, the biggest thing I'd look for is whether some of the math operations change from the Numeric type associated with this class. Hopefully the Hive tests catch those.

SparkQA commented Aug 10, 2015

Test build #40322 has finished for PR 8018 at commit 4c2752e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

davies (Contributor, Author) commented Aug 11, 2015

This PR may be too late for the 1.5 release, so I'd like to close it now. For 1.6, we may want to use Decimal128 from Hive.

davies closed this Aug 11, 2015