
[SPARK-9303] Decimal should use java.math.BigDecimal directly instead of using the Scala wrapper #8018

Closed
wants to merge 8 commits

Conversation

davies (Contributor) commented Aug 7, 2015

This PR is based on #7635; thanks to @JoshRosen.

Because java.math.BigDecimal already has the same optimization for precision < 18 (it uses a long for the unscaled value), we don't need to duplicate it in Decimal. This patch also removes _scale, since java.math.BigDecimal already tracks the scale.

In order to keep a unified hashCode, we still use scala.math.BigDecimal.hashCode().

After this patch, we see roughly a 2x (about 100%) end-to-end performance improvement on a test that sums the product of a short and a decimal column (SUM(short * decimal(5,2))): from 19 seconds down to 9.6 seconds.
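
A minimal sketch of the idea, assuming OpenJDK's java.math.BigDecimal (which stores the unscaled value in a long field, intCompact, whenever it fits in a long); the object and method names below are illustrative, not this PR's actual code:

import java.math.{BigDecimal => JavaBigDecimal}

object CompactDecimalSketch {
  // Because the JDK already keeps a long-backed compact form for small values,
  // a wrapper that stores a JavaBigDecimal directly does not need its own
  // longVal/_scale fields to get the fast path.
  def fromUnscaled(unscaled: Long, scale: Int): JavaBigDecimal =
    JavaBigDecimal.valueOf(unscaled, scale) // stays in the compact representation

  // Reusing scala.math.BigDecimal's hashCode keeps hash codes consistent with
  // code that still goes through the Scala wrapper.
  def unifiedHashCode(d: JavaBigDecimal): Int = scala.math.BigDecimal(d).hashCode()
}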

SparkQA commented Aug 7, 2015

Test build #40137 has finished for PR 8018 at commit 8eee859.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 7, 2015

Test build #40139 has finished for PR 8018 at commit 7b70c28.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 7, 2015

Test build #40147 has finished for PR 8018 at commit c861001.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

davies (Contributor, Author) commented Aug 7, 2015

cc @JoshRosen for review

SparkQA commented Aug 8, 2015

Test build #40228 has finished for PR 8018 at commit 56190ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 8, 2015

Test build #1414 has finished for PR 8018 at commit 56190ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

JoshRosen (Contributor) commented:

> After this patch, we see roughly a 2x (about 100%) end-to-end performance improvement on a test that sums the product of a short and a decimal column (SUM(short * decimal(5,2))): from 19 seconds down to 9.6 seconds.

Cool! I wasn't actually expecting this to lead to such a big performance improvement.

  def set(unscaled: Long, precision: Int, scale: Int): Decimal = {
    if (setOrNull(unscaled, precision, scale) == null) {
Contributor:

Do you think that we should replace this old check with an assert, just to be safe / guard against mistakes made by the caller?

davies (Contributor, Author):

Will do.
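
A minimal sketch of the suggested assert, assuming the setOrNull helper from the snippet above (the message text is illustrative):

  def set(unscaled: Long, precision: Int, scale: Int): Decimal = {
    // Treat an out-of-range unscaled value as a caller bug rather than an
    // expected runtime condition.
    assert(setOrNull(unscaled, precision, scale) != null,
      s"Unscaled value $unscaled cannot be represented as decimal($precision, $scale)")
    this
  }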

davies (Contributor, Author) commented Aug 8, 2015

Most of the performance improvement came from two parts:

  1. Calling scala.math.BigDecimal.apply(MathContext) is heavy; we now call java.math.BigDecimal.multiply(b, MathContext) directly.
  2. Currently, when casting a numeric to decimal we turn int/short/byte into a double and then cast the double to decimal, which is expensive; we now call Decimal(i.toLong) instead. (Both changes are sketched below.)

The other cleanups are not necessary for the performance improvement; if they are too risky for the release, we can postpone them to 1.6.
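
A rough sketch of these two changes (java.math.BigDecimal.multiply and valueOf are real JDK methods; the object and method names, and the precision of 38 standing in for DecimalType.MAX_PRECISION, are illustrative):

import java.math.{BigDecimal => JavaBigDecimal, MathContext, RoundingMode}

object DecimalFastPathSketch {
  // Mirrors the MATH_CONTEXT constant quoted below.
  private val mathContext = new MathContext(38, RoundingMode.HALF_UP)

  // (1) Multiply the underlying Java values directly instead of going through
  //     the scala.math.BigDecimal wrapper, which allocates a new wrapper and
  //     re-applies the MathContext on every operation.
  def times(a: JavaBigDecimal, b: JavaBigDecimal): JavaBigDecimal =
    a.multiply(b, mathContext)

  // (2) Cast integral types straight to a long-backed decimal instead of the
  //     old int/short/byte -> double -> decimal route.
  def fromShort(s: Short): JavaBigDecimal = JavaBigDecimal.valueOf(s.toLong)
}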

private[sql] val ONE = Decimal(1)
private val ROUNDING_MODE = RoundingMode.HALF_UP
private val MATH_CONTEXT = new MathContext(DecimalType.MAX_PRECISION, ROUNDING_MODE)
private val POW_10 = Array.tabulate[Long](MAX_LONG_DIGITS + 1)(i => math.pow(10, i).toLong)
Contributor:

It looks like this is now unused?

JoshRosen (Contributor) commented:

One naive question, since I'm not as familiar with the old long optimization: it looks like we used to use a long for storing small values, whereas JavaBigDecimal appears to store them as an int. Does this imply that fewer values can now be stored using the compact representation?

I'm probably not the best judge of the riskiness of these changes, since I don't know as much about our decimal internals compared to other reviewers / contributors. Is there someone else that we should loop in for another pair of eyes?

davies (Contributor, Author) commented Aug 8, 2015

@JoshRosen Good question. The intCompact field is actually a long; it confused me at first too, so I double-checked. It therefore has the same range as our long-based representation, and JavaBigDecimal likely handles the conversion between the compact and full representations better than we did.

@mateiz Do you have some cycles to review this? If it's too risky for the release, I'd like to pull the optimizations out into a separate PR.

davies (Contributor, Author) commented Aug 8, 2015

After this work, I realized that _precision is also unnecessary in Decimal, because JavaBigDecimal already tracks the precision of its value. Decimal._precision comes from DecimalType; it is the maximum precision the JavaBigDecimal could have, and is only used when we create a Decimal or call changePrecision().

Also, HiveTypeCoercion needs some cleanup; it introduces lots of unnecessary casts.
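
For illustration, this is plain JDK behavior rather than code from this PR:

import java.math.{BigDecimal => JavaBigDecimal}

// JavaBigDecimal already reports the precision and scale of the stored value,
// so a separate _precision field is only needed to remember the maximum
// precision allowed by the column's DecimalType.
val d = new JavaBigDecimal("123.45")
println(d.precision()) // 5
println(d.scale())     // 2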

JoshRosen (Contributor) commented:

> Good question. The intCompact field is actually a long; it confused me at first too, so I double-checked.

Yep, it's a long.

davies (Contributor, Author) commented Aug 8, 2015

Created #8052

…gdecimal

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
mateiz (Contributor) commented Aug 10, 2015

I didn't realize that Java's BigDecimal already has a shortcut for things that fit in a Long. That definitely simplifies it. In terms of this change, the biggest thing I'd look for is whether some of the math operations change from the Numeric type associated with this class. Hopefully the Hive tests catch those.

SparkQA commented Aug 10, 2015

Test build #40322 has finished for PR 8018 at commit 4c2752e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

davies (Contributor, Author) commented Aug 11, 2015

This PR may be too late for the 1.5 release, so I'd like to close it now. For 1.6, we may want to use Decimal128 from Hive.

davies closed this Aug 11, 2015