Add in support for Decimal Multiply and DIV #1564
Conversation
Signed-off-by: Ryan Lee <[email protected]>
Minor doc edits and a question about documenting the problem in an issue, but otherwise lgtm.
-      lBase.binaryOp(binaryOp, rBase, outType)
+      val tmp = lBase.binaryOp(binaryOp, rBase, outType)
+      // In some cases the output type is ignored
+      if (!outType.equals(tmp.getType) && castOutputAtEnd) {
Have we created a bug against cudf to track this?
This is expected behavior right now. If we want to push on it we can, but it is likely to make the cudf binary bigger without much of a real benefit.
-      lhs.binaryOp(binaryOp, rBase, outType)
+      val tmp = lhs.binaryOp(binaryOp, rBase, outType)
+      // In some cases the output type is ignored
+      if (!outType.equals(tmp.getType) && castOutputAtEnd) {
same as above
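For context, here is a minimal sketch of what the change in the hunks above amounts to (illustrative only, not the plugin's exact code; it assumes cudf's Java `BinaryOperable.binaryOp` and `castTo` calls, and the `castOutputAtEnd` flag from the diff): run the binary op, and if cudf ignored the requested output type, cast the intermediate result to the type Spark expects before returning it.

```scala
import ai.rapids.cudf.{BinaryOp, BinaryOperable, ColumnVector, DType}

// Sketch only: perform the op, then cast if cudf did not honor the requested type.
def binaryOpWithCast(
    lhs: BinaryOperable,
    rhs: BinaryOperable,
    op: BinaryOp,
    outType: DType,
    castOutputAtEnd: Boolean): ColumnVector = {
  val tmp = lhs.binaryOp(op, rhs, outType)
  if (!outType.equals(tmp.getType) && castOutputAtEnd) {
    try {
      tmp.castTo(outType) // new column with the type Spark expects
    } finally {
      tmp.close() // release the intermediate result
    }
  } else {
    tmp
  }
}
```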
@@ -199,6 +220,9 @@ case class GpuIntegralDivide(left: Expression, right: Expression) extends GpuDiv

   override def dataType: DataType = LongType
+  override def outputTypeOverride: DType = DType.INT64
+  // CUDF does not support casting output implicitly for decimal binary ops, so we work around
+  // it here were we want to force the output to be a Long.
nit: :s/were/where
…into decimal_mult
Signed-off-by: Robert (Bobby) Evans <[email protected]>
build
Signed-off-by: Robert (Bobby) Evans <[email protected]>
Co-authored-by: Ryan Lee <[email protected]>
Multiply probably needs a follow-on issue filed to be able to fully support what cudf currently can support for multiply. The issue is around `PromotePrecision` and `CheckOverflow`. In Spark, during the logical planning phase (https://github.com/apache/spark/blob/116f4cab6b05f4286a08b9f03b8ddfb48ec464cf/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala#L125-L135), it will look at the two inputs to the multiply and cast them so that the scale for both is whichever of the two is greater, and the part on the other side of the decimal point (what they call the range) is whichever is larger. This means that if I multiply `1.01` (precision 3, scale 2) by `10.0` (precision 3, scale 1), both of them are widened to precision 4, scale 2: `01.01 * 10.00`. Then they will do the multiply. At this step Spark lies about what the output type actually is. Multiply says that the output type is the same as the type of the left-hand operand, but in reality Spark is using Java's BigDecimal for the multiplication, so the precision is dynamic based off of what is needed to store the result and the scale is the sum of the two input scales, or `10.1000` (precision 6, scale 4). In the final step `CheckOverflow` will force the scale and precision to be what Spark wants them to be, which is the sum of the precisions of the inputs + 1, and the sum of the scales of the inputs. This means it will adjust the scale of `10.1000` down to 3 (scale 2 + scale 1), giving `10.100`, and then verify that the precision does not exceed 7 (precision 3 + precision 3 + 1).

In an ideal world we would just take the original inputs and do the multiply, which would produce the correct result. I am not 100% sure why Spark goes through all of these extra steps and lies about the result. I do know that if the resulting precision or scale would go above what 128 bits could support, they use this to return a null on the overflow or throw an exception, and to truncate the scale if necessary.
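To make the widening, multiply, and `CheckOverflow` steps described above concrete, here is a small worked example using plain `java.math.BigDecimal` (illustrative only; it mirrors the description above rather than Spark's actual code path, and assumes HALF_UP rounding for the final scale adjustment):

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// 1.01 is decimal(3, 2) and 10.0 is decimal(3, 1)
val lhs = new JBigDecimal("1.01")
val rhs = new JBigDecimal("10.0")

// Step 1: widen both sides to decimal(4, 2)
// (scale = max(2, 1) = 2, integral digits = max(1, 2) = 2)
val lhsWide = lhs.setScale(2) // 01.01
val rhsWide = rhs.setScale(2) // 10.00

// Step 2: the multiply follows BigDecimal semantics, so the result scale is the sum
// of the operand scales: 10.1000, i.e. precision 6, scale 4
val raw = lhsWide.multiply(rhsWide)
assert(raw.scale == 4 && raw.precision == 6)

// Step 3: CheckOverflow forces the declared result type decimal(7, 3)
// (precision 3 + 3 + 1, scale 2 + 1): adjust the scale and verify the precision
val adjusted = raw.setScale(3, RoundingMode.HALF_UP) // 10.100
assert(adjusted.precision <= 7)
```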
My original plan for translating this to cudf was to look at the plan and throw out all of the `PromotePrecision`, `CheckOverflow`, and `Cast` nodes, because the cudf multiply will just do the right thing so long as we verify that the precision will fit in the output we expect. Sadly, because of other optimizations in the logical plan before it becomes a part of the physical plan, I have no good way to do this. If we are multiplying by a literal value, the literal will have been turned into a single decimal literal value, and in some cases casts can disappear if they are a NOOP. So if I multiply a literal byte (1.0) by a column of decimal(5, 2) that I happen to cast to decimal(7, 2) first, the byte gets automatically cast to a decimal(3, 0) and then cast again as a part of widening to a decimal(7, 2), and in the final plan all of that on the literal side has been folded down to a single decimal value.

So I have no good way of knowing that the literal was actually a Decimal(3, 0). I could do pattern matching and see the resulting precision and scale. I could also see the cast to a Decimal(7, 2) for "foo", but I cannot tell if that cast was automatically inserted by Spark to make the Decimal(5, 2) match whatever the literal was, or if it was something inserted by the user on purpose. This is just one example, but there are lots more.
Instead I opted to just follow what Spark does. I let `GpuCast` happen as Spark wants to widen the inputs, and then I let cudf do the multiplication, but I have extra checks in place to be sure that we do not overflow. Finally `GpuCheckOverflow` adjusts the scale if needed to get back to what Spark wants.

The impact of this is that there are some multiplication problems that cudf can support but that we will have to fall back to the CPU for, because we don't know for sure that we could support them. CUDF can support any multiplication where the precision of both inputs + 1 is <= 18, but with the widening it is now a lot more complicated than that. If we want more documentation for this in the official release docs I can do that too. I suspect that for the remaining binary ops we will run into similar problems, so I might hold off on the documentation until I have finished looking at them.
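As a rough sketch of the fallback rule being described (a hypothetical helper, not the plugin's actual API), the ideal check would just be the bound from the sentence above:

```scala
// Hypothetical helper for illustration only.
// cudf's 64-bit decimal type holds at most 18 digits, so per the comment above a
// multiply is supportable when the precision of both inputs + 1 is <= 18.
def couldCudfMultiply(lhsPrecision: Int, rhsPrecision: Int): Boolean = {
  val cudfDecimal64MaxPrecision = 18
  lhsPrecision + rhsPrecision + 1 <= cudfDecimal64MaxPrecision
}
```

In practice the inputs the plugin sees have already been widened by Spark, so the check it actually has to make is more conservative than this.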
I might also file an issue for us to look at spark/cudf and see if there is something that we can do to avoid some of these issues.
I just added in support for `DIV`. It was a lot simpler because the output of Spark for a DIV is always a long, so some of the complexity of multiply is not there. The main problem is that, because of the widening of the children, it still could result in some operations that we cannot do on the GPU even though we could without that step. If it becomes a customer issue we can look into it later.
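For illustration (a made-up query, assuming a normal Spark 3.x `spark` session), the reason DIV is simpler is that Spark's integral divide always reports a `LongType` result, regardless of the decimal inputs' precision and scale:

```scala
// The result column is a long even though both inputs are decimals.
val df = spark.sql(
  "SELECT CAST(7.5 AS DECIMAL(4,1)) div CAST(2.5 AS DECIMAL(4,1)) AS q")
df.printSchema()
// root
//  |-- q: long (nullable = true)
```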