[SPARK-9303] Decimal should use java.math.Decimal directly instead of using Scala wrapper #8018
Conversation
Test build #40137 has finished for PR 8018 at commit
Test build #40139 has finished for PR 8018 at commit
Test build #40147 has finished for PR 8018 at commit
cc @JoshRosen for review
Test build #40228 has finished for PR 8018 at commit
Test build #1414 has finished for PR 8018 at commit
Cool! I wasn't actually expecting this to lead to such a big performance improvement.
 */
def set(unscaled: Long, precision: Int, scale: Int): Decimal = {
  if (setOrNull(unscaled, precision, scale) == null) {
Do you think that we should replace this old check with an assert, just to be safe / guard against mistakes made by the caller?
Will do.
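For readers following along, here is a minimal standalone sketch of what the assert-based variant discussed above could look like. The class and error message are hypothetical simplifications, not the actual Spark `Decimal` code:

```scala
// Hypothetical simplified class, only to illustrate the assert-based check.
class SimpleDecimal {
  private var longVal: Long = 0L
  private var _scale: Int = 0

  // Returns null when `unscaled` needs more digits than `precision` allows.
  def setOrNull(unscaled: Long, precision: Int, scale: Int): SimpleDecimal = {
    val bound = math.pow(10, precision).toLong
    if (unscaled <= -bound || unscaled >= bound) null
    else { longVal = unscaled; _scale = scale; this }
  }

  // Assert-based variant: fail fast on caller mistakes instead of silently
  // propagating a null, as suggested in the review comment above.
  def set(unscaled: Long, precision: Int, scale: Int): SimpleDecimal = {
    val result = setOrNull(unscaled, precision, scale)
    assert(result != null,
      s"Unscaled value $unscaled cannot be represented with precision $precision")
    this
  }
}
```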
Most of the performance improvement came from two parts:
The other cleanups are not strictly necessary for the performance improvement; if they are risky for the release, we can delay them to 1.6.
private[sql] val ONE = Decimal(1)
private val ROUNDING_MODE = RoundingMode.HALF_UP
private val MATH_CONTEXT = new MathContext(DecimalType.MAX_PRECISION, ROUNDING_MODE)
private val POW_10 = Array.tabulate[Long](MAX_LONG_DIGITS + 1)(i => math.pow(10, i).toLong)
It looks like this is now unused?
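For readers unsure what `POW_10` was for: it is a lookup table of powers of ten used to bounds-check compact `Long` values without constructing a `BigDecimal`. A rough sketch of that kind of usage (simplified, not the exact Spark code):

```scala
// 18 decimal digits is the largest digit count that always fits in a signed Long.
val MAX_LONG_DIGITS = 18
val POW_10: Array[Long] = Array.tabulate[Long](MAX_LONG_DIGITS + 1)(i => math.pow(10, i).toLong)

// True when `unscaled` can be represented with at most `precision` decimal digits.
def fitsInPrecision(unscaled: Long, precision: Int): Boolean =
  precision <= MAX_LONG_DIGITS &&
    unscaled > -POW_10(precision) && unscaled < POW_10(precision)
```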
One naive question, since I'm not as familiar with the old code: I'm probably not the best judge of the riskiness of these changes, because I don't know as much about our decimal internals compared to other reviewers / contributors. Is there someone else that we should loop in for another pair of eyes?
@JoshRosen Good question. @mateiz Do you have some cycles to review this? If it's risky for the release, I'd like to pull out some of the optimizations as a separate PR.
After this work, I also realized that HiveTypeCoercion needs some cleanup; there is a lot of unnecessary casting.
Yep, it's a
Created #8052
…gdecimal

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
I didn't realize that Java's BigDecimal already has a shortcut for things that fit in a Long. That definitely simplifies it. In terms of this change, the biggest thing I'd look for is whether some of the math operations change from the Numeric type associated with this class. Hopefully the Hive tests catch those.
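To make the shortcut concrete: `java.math.BigDecimal.valueOf(unscaled, scale)` builds a decimal directly from a compact `long` unscaled value (the JDK keeps such values in a compact internal field rather than a `BigInteger`, which is the shortcut referenced above). A small illustration, reusing the `HALF_UP` math context shown in the diff; the precision value 38 is assumed here to stand in for `DecimalType.MAX_PRECISION`:

```scala
import java.math.{BigDecimal => JavaBigDecimal, MathContext, RoundingMode}

// 123.45 and 2.00, built straight from (unscaledValue, scale) pairs.
val a = JavaBigDecimal.valueOf(12345L, 2)
val b = JavaBigDecimal.valueOf(200L, 2)

// Multiply under a bounded precision with HALF_UP rounding, like MATH_CONTEXT above.
val mc = new MathContext(38, RoundingMode.HALF_UP)
val product = a.multiply(b, mc)

println(product)                 // 246.9000
println(product.unscaledValue()) // 2469000
println(product.scale())         // 4
```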
Test build #40322 has finished for PR 8018 at commit
This PR may be too late for the 1.5 release, so I'd like to close it now. For 1.6, we may want to use Decimal128 from Hive.
This PR is based on #7635, thanks to @JoshRosen.
Because java.math.BigDecimal already has the same optimization for precision < 18 (it uses a long for unscaledValue), we don't need to duplicate this in Decimal. The _scale field is also removed, since java.math.BigDecimal already tracks the scale. In order to have a unified hashCode, we still use scala.math.BigDecimal.hashCode().
After this patch, we see about a 100% end-to-end performance improvement on a test that sums a multiplication of a short and a decimal (SUM(short * decimal(5,2))), from 19 seconds to 9.6 seconds.
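As a rough, hypothetical sketch of the approach described above (illustrative names, not the actual Spark `Decimal` class): values whose precision fits in a `Long` are stored as a compact unscaled value plus scale, everything else goes straight to `java.math.BigDecimal`, and `hashCode` delegates to `scala.math.BigDecimal` so hashes stay consistent with the previous Scala-wrapper representation:

```scala
import java.math.{BigDecimal => JavaBigDecimal}

// Illustrative only: a compact decimal with a Long fast path and a BigDecimal fallback.
final class CompactDecimal private (longVal: Long, scale: Int, big: JavaBigDecimal) {

  // Materialize a java.math.BigDecimal on demand; valueOf reuses the compact long internally.
  def toJavaBigDecimal: JavaBigDecimal =
    if (big ne null) big else JavaBigDecimal.valueOf(longVal, scale)

  // Delegate to scala.math.BigDecimal for a hashCode that matches the old wrapper-based code.
  override def hashCode(): Int = scala.math.BigDecimal(toJavaBigDecimal).hashCode()

  override def equals(other: Any): Boolean = other match {
    case that: CompactDecimal => toJavaBigDecimal.compareTo(that.toJavaBigDecimal) == 0
    case _ => false
  }

  override def toString: String = toJavaBigDecimal.toString
}

object CompactDecimal {
  // 18 decimal digits always fit in a signed 64-bit Long.
  val MAX_LONG_DIGITS = 18

  def apply(unscaled: Long, precision: Int, scale: Int): CompactDecimal =
    if (precision <= MAX_LONG_DIGITS) new CompactDecimal(unscaled, scale, null)
    else new CompactDecimal(0L, 0, JavaBigDecimal.valueOf(unscaled, scale))
}
```

One note on the hashing choice: `java.math.BigDecimal.hashCode` is scale-sensitive (2.0 and 2.00 hash differently), whereas `scala.math.BigDecimal` hashes by value, which is presumably why the PR keeps the Scala hashCode even after dropping the wrapper elsewhere.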