[SPARK-37324][SQL] Adds support for decimal rounding mode up, down, half_down #34593
Conversation
Seq(Row(BigDecimal("5.9"), BigDecimal("6")))
)
checkAnswer(
  df.withColumn("value_brounded", round('value, 0, "down")),
qq: is there any reference to DBMSes that support `down`, `up`, etc.?
SQL Server does something similar:
https://docs.microsoft.com/en-us/sql/t-sql/functions/round-transact-sql?view=sql-server-ver15
Can we match the argument and usage with other references? I think we don't have to bother coming up with Spark's own argument style.
Could you please give me an example?
I currently don't; I haven't looked at other DBMSes for that.
SAP HANA's SQL ROUND function takes the rounding mode as an argument, as we do: https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/20e6a27575191014bd54a07fd86c585d.html
ROUND(<number> [, <position> [, <rounding_mode>]])
Hmm, there seems to be no standard (or de facto standard) implementation of `round` that takes a third argument. As you mention, SAP HANA and MS SQL Server allow a third argument for `round`, but their forms are still not uniform.
yes
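For reference, the modes being discussed correspond to the standard rounding modes of `java.math.RoundingMode`. Their semantics can be sketched with Python's `decimal` module (illustrative only; the Spark implementation operates on its own `Decimal` type):

```python
from decimal import (Decimal, ROUND_UP, ROUND_DOWN, ROUND_HALF_UP,
                     ROUND_HALF_DOWN, ROUND_HALF_EVEN)

# Round 2.5 to zero decimal places under each mode to show where they differ.
value = Decimal("2.5")
one = Decimal("1")
results = {
    "up":        value.quantize(one, rounding=ROUND_UP),        # away from zero -> 3
    "down":      value.quantize(one, rounding=ROUND_DOWN),      # toward zero -> 2
    "half_up":   value.quantize(one, rounding=ROUND_HALF_UP),   # ties away from zero -> 3
    "half_down": value.quantize(one, rounding=ROUND_HALF_DOWN), # ties toward zero -> 2
    "half_even": value.quantize(one, rounding=ROUND_HALF_EVEN), # ties to even -> 2
}
for mode, r in results.items():
    print(mode, r)
```

Only the tie-breaking cases (exact halves) separate the `half_*` modes; `up` and `down` differ from them on every non-integral input.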
@@ -413,6 +413,18 @@ final class Decimal extends Ordered[Decimal] with Serializable {
if (doubled > pow10diff || doubled == pow10diff && longVal % 2 != 0) {
  longVal += (if (droppedDigits < 0) -1L else 1L)
}
case ROUND_UP =>
Not sure about the changePrecision method for ROUND_UP, ROUND_DOWN and ROUND_HALF_DOWN. If anyone could help me on this, that would be great.
@HyukjinKwon Also, do you have any input on this, please? I need to integrate it into DecimalSuite.scala.
In the current implementation, `changePrecision` should not take `ROUND_UP`, `ROUND_DOWN` or `ROUND_HALF_DOWN`. If we hit such a case, it's a bug, so how about adding an assertion before this match expression, or simply falling through to the default `_` branch?
Hi @sarutak, thanks for your review. Out of curiosity, may I know why it would be a bug if `changePrecision` takes `ROUND_UP`, `ROUND_DOWN` or `ROUND_HALF_DOWN`?
So we could do something like this :
case ROUND_UP | ROUND_DOWN | ROUND_HALF_DOWN | _ =>
throw QueryExecutionErrors.unsupportedRoundingMode(roundMode)
Out of curiosity, may I know why it would be a bug if changePrecision takes ROUND_UP, ROUND_DOWN or ROUND_HALF_DOWN?

`ROUND_UP`, `ROUND_DOWN` and `ROUND_HALF_DOWN` are not expected to be passed to `changePrecision`, and it's private.
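The guard being suggested can be sketched as follows. This is a hypothetical illustration in Python's `decimal` terms, not Spark's internal API; `change_precision` and `SUPPORTED_MODES` are made-up names.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR

# Modes this internal helper is allowed to see (an illustrative subset).
SUPPORTED_MODES = {ROUND_HALF_UP, ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR}

def change_precision(value: Decimal, scale: int, mode: str) -> Decimal:
    if mode not in SUPPORTED_MODES:
        # Fail fast in the default branch instead of silently handling a mode
        # this private helper was never meant to receive.
        raise ValueError(f"Unsupported rounding mode: {mode}")
    return value.quantize(Decimal(1).scaleb(-scale), rounding=mode)

print(change_precision(Decimal("2.5"), 0, ROUND_HALF_UP))  # 3
```

Failing loudly here matches the reviewer's point: reaching this branch with an unsupported mode indicates a bug in the caller, not a user error.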
OK, I see.
The test fails on the binary compatibility check against spark-sql_2.12:3.2.0, but that's expected, no? Any idea how we can fix it? `[error] spark-sql: Failed binary compatibility check against org.apache.spark:spark-sql_2.12:3.2.0! Found 2 potential problems (filtered 203)`
@gengliangwang Could you please kindly take a look at this? Thank you very much.
ok to test
* if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0.
*
* @group math_funcs
* @since 1.5.0
*/
def round(e: Column, scale: Int): Column = withExpr { Round(e.expr, Literal(scale)) }
def round(e: Column, scale: Int, mode: String = "half_up"): Column = withExpr {
This breaks binary compatibility in Java because Scala default arguments don't work there. We should overload it instead.
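A Scala default argument does not produce a separate JVM method for the old arity, so Java callers compiled against the two-argument `round` no longer link. The overload pattern keeps the original signature as a real method that delegates to the new one. A rough Python sketch of that delegation (the names and the simplified rounding logic are illustrative, not Spark's API):

```python
import math

def round_with_mode(value: float, scale: int, mode: str) -> float:
    # Toy stand-in for the new three-argument entry point; only "down" and
    # "half_up" are sketched here.
    factor = 10 ** scale
    scaled = value * factor
    if mode == "down":
        result = math.trunc(scaled)  # toward zero
    else:  # "half_up": ties round away from zero
        result = math.floor(scaled + 0.5) if scaled >= 0 else math.ceil(scaled - 0.5)
    return result / factor

def round_legacy(value: float, scale: int) -> float:
    # The pre-existing two-argument signature survives as a real function that
    # delegates, so callers compiled against it keep working unchanged.
    return round_with_mode(value, scale, "half_up")

print(round_legacy(2.5, 0))              # 3.0
print(round_with_mode(2.9, 0, "down"))   # 2.0
```

In Scala the same idea is two `def round` overloads rather than one `def` with a default parameter value.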
Thanks, that fixed it. I'm currently fixing the schema mismatch failures. But I'm not sure why the Kubernetes integration test is failing; do you have any input on that, please?
Test build #145261 has finished for PR 34593 at commit
def round(e: Column, scale: Int): Column = round(e, scale, "half_up")

/**
* Round the value of `e` to `scale` decimal places with given round mode, default: HALF_UP
* if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0.
*
* @group math_funcs
* @since 1.5.0
would have to be 3.3.0
@@ -242,7 +242,7 @@
| org.apache.spark.sql.catalyst.expressions.Reverse | reverse | SELECT reverse('Spark SQL') | struct<reverse(Spark SQL):string> |
| org.apache.spark.sql.catalyst.expressions.Right | right | SELECT right('Spark SQL', 3) | struct<right(Spark SQL, 3):string> |
| org.apache.spark.sql.catalyst.expressions.Rint | rint | SELECT rint(12.3456) | struct<rint(12.3456):double> |
| org.apache.spark.sql.catalyst.expressions.Round | round | SELECT round(2.5, 0) | struct<round(2.5, 0):decimal(2,0)> |
| org.apache.spark.sql.catalyst.expressions.Round | round | SELECT round(2.5, 0) | struct<round(2.5, 0, half_up):decimal(2,0)> |
Not every DBMS supports the third parameter, and some external connectors rely on the SQL representation string of expressions. So, shall we hide the third parameter when it is the default "half_up"?
Yes, we can do that. I'll wait for the opinions of other committers too.
This doesn't seem related to the proposal in this PR.
I investigated whether the major DBMSes and query processing systems (PostgreSQL, BigQuery, Snowflake, MySQL, SAP HANA, MS SQL Server, Firebird, DB2, Teradata, Hive) support the third parameter of round. @sathiyapk @HyukjinKwon @gengliangwang What do you think?
@sarutak Yeah, I did some investigation too. This feature doesn't seem to be commonly used.
I think people can just use floor and ceil.
floor and ceil can't control the position of rounding directly.
@sarutak Currently we are relying on a UDF for rounding. When I googled how to do it via Spark native methods, I found a lot of people have the same issue, and it's not complex to support via Spark native methods, so I opened this PR.
@cloud-fan @gengliangwang I confirm …
Then can we add a scale parameter to floor and ceil?
@cloud-fan I like the idea of adding the scale parameter to floor and ceil.
@sathiyapk You are right, but I can also argue the other way. We can collect more data points; I found one example showing that Snowflake supports a scale parameter in floor/ceil: https://docs.snowflake.com/en/sql-reference/functions/floor.html
@cloud-fan In our case, the first function we looked into was … If everyone agrees, I will open a new PR for adding the scale parameter to floor and ceil.
+1 for adding the scale parameter to floor and ceil.
@gengliangwang @HyukjinKwon What do you think?
+1 for adding the scale parameter to floor and ceil.
### What changes were proposed in this pull request?
Adds a `scale` parameter to the `floor`/`ceil` functions in order to allow users to control the rounding position. This feature was proposed in PR #34593.

### Why are the changes needed?
Currently we support the decimal rounding modes HALF_UP (round) and HALF_EVEN (bround), but we have use cases that need RoundingMode.UP and RoundingMode.DOWN. The floor and ceil functions help with this, but they don't support choosing the position of the rounding. Adding a scale parameter to these functions lets us control the rounding position.

Snowflake supports a `scale` parameter to `floor`/`ceil`: `FLOOR( <input_expr> [, <scale_expr> ] )`
REF: https://docs.snowflake.com/en/sql-reference/functions/floor.html

### Does this PR introduce _any_ user-facing change?
Now users can pass a `scale` parameter to the `floor` and `ceil` functions.
```
> SELECT floor(-0.1);
-1.0
> SELECT floor(5);
5
> SELECT floor(3.1411, 3);
3.141
> SELECT floor(3.1411, -3);
0
> SELECT ceil(-0.1);
0.0
> SELECT ceil(5);
5
> SELECT ceil(3.1411, 3);
3.142
> SELECT ceil(3.1411, -3);
1000.0
```

### How was this patch tested?
This patch was tested locally using unit tests and the Git workflow.

Closes #34729 from sathiyapk/SPARK-37475-floor-ceil-scale.

Authored-by: Sathiya KUMAR <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
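The floor/ceil-with-scale semantics described above can be mimicked with Python's `decimal` module. This is a sketch; `floor_scale` and `ceil_scale` are illustrative names, not Spark functions.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_CEILING

def floor_scale(x: str, scale: int) -> Decimal:
    # Quantize to 10^(-scale): scale=3 keeps three decimal places,
    # scale=-3 rounds at the thousands position.
    return Decimal(x).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_FLOOR)

def ceil_scale(x: str, scale: int) -> Decimal:
    return Decimal(x).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_CEILING)

print(floor_scale("3.1411", 3))   # 3.141
print(ceil_scale("3.1411", 3))    # 3.142
print(ceil_scale("3.1411", -3))   # rounds up to the nearest thousand
```

A negative scale moves the rounding position into the integral part, matching the `floor(3.1411, -3)` / `ceil(3.1411, -3)` examples in the PR description.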
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way; it's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Why are the changes needed?
Currently there's no easy and straightforward way to round decimals with the modes UP, DOWN and HALF_DOWN; people need to use UDFs or complex operations to achieve the same result.
Opening up support for the other rounding modes could benefit a lot of use cases.
SAP HANA's SQL ROUND function does this:
ROUND(<number> [, <position> [, <rounding_mode>]])
REF: https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/20e6a27575191014bd54a07fd86c585d.html
SQL Server does something similar:
ROUND ( numeric_expression , length [ ,function ] )
REF: https://docs.microsoft.com/en-us/sql/t-sql/functions/round-transact-sql?view=sql-server-ver15
Does this PR introduce any user-facing change?
Now users can specify the rounding mode when calling the round function, for the modes up, down and half_down. Calling round without a rounding mode defaults to half_up.
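A sketch of the proposed user-facing behavior, using Python's `decimal` module to stand in for Spark's decimal rounding. The `round_mode` helper and the mode-string mapping are illustrative assumptions, not the Spark API.

```python
from decimal import Decimal, ROUND_UP, ROUND_DOWN, ROUND_HALF_UP, ROUND_HALF_DOWN

# Map the PR's mode strings onto decimal's rounding constants.
MODES = {
    "up": ROUND_UP,
    "down": ROUND_DOWN,
    "half_up": ROUND_HALF_UP,
    "half_down": ROUND_HALF_DOWN,
}

def round_mode(x: str, scale: int, mode: str = "half_up") -> Decimal:
    # Omitting the mode defaults to half_up, mirroring the proposal.
    return Decimal(x).quantize(Decimal(1).scaleb(-scale), rounding=MODES[mode])

print(round_mode("5.9", 0, "down"))       # 5
print(round_mode("5.1", 0, "up"))         # 6
print(round_mode("5.5", 0))               # 6 (half_up default)
print(round_mode("5.5", 0, "half_down"))  # 5
```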
How was this patch tested?
This patch was tested locally using unit tests and the Git workflow.