Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37324][SQL] Adds support for decimal rounding mode up, down, half_down #34593

Closed

Conversation

sathiyapk
Copy link
Contributor

@sathiyapk sathiyapk commented Nov 14, 2021

What changes were proposed in this pull request?

  1. Adds support for Decimal RoundingMode.UP, DOWN and HALF_DOWN by letting users to pass the rounding mode as argument to round function.
  2. bround function calls round function with the argument "half_even"

Why are the changes needed?

Currently there's no easier and straight forward way to round decimals with the mode UP, DOWN and HALF_DOWN. People need to use UDF or complex operations to do the same.

Opening support for the other rounding modes might interest a lot of use cases.

SAP Hana Sql ROUND function does it : 

ROUND(<number> [, <position> [, <rounding_mode>]])

REF : https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/20e6a27575191014bd54a07fd86c585d.html

Sql Server does something similar to this :

ROUND ( numeric_expression , length [ ,function ] )

REF : https://docs.microsoft.com/en-us/sql/t-sql/functions/round-transact-sql?view=sql-server-ver15 

Does this PR introduce any user-facing change?

Now, users can specify the rounding mode while calling round function for round modes up, down, half_down. Calling round function without rounding mode will default to half_up.

> SELECT round(3.145, 2)
3.15

>SELECT round(3.145, 2, 'down')
3.14
df.withColumn("value_rounded", round('value, 0)

df.withColumn("value_rounded", round('value, 0, "down")

How was this patch tested?

This patch was tested locally using unit test and git workflow.

@github-actions github-actions bot added the SQL label Nov 14, 2021
@HyukjinKwon HyukjinKwon changed the title SPARK-37324 Adds support for decimal rounding mode up, down, half_down [SPARK-37324][SQL] Adds support for decimal rounding mode up, down, half_down Nov 14, 2021
Seq(Row(BigDecimal("5.9"), BigDecimal("6")))
)
checkAnswer(
df.withColumn("value_brounded", round('value, 0, "down")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: is there any reference of DBMSes that supports down, up, etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we match the argument and usage with other references? I think we don't have to bother to come up with spark's own argument style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please give me an example ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I currently don't - I haven't taken a look around other DBMSes on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@sathiyapk sathiyapk Nov 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ROUND(<number> [, <position> [, <rounding_mode>]])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, there seems to be no standard / de-fact standard implementation of round which has the third argument.
As you mention, SAP Hana and MS SQL Server allow the third argument for round but they are still ununiform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@@ -413,6 +413,18 @@ final class Decimal extends Ordered[Decimal] with Serializable {
if (doubled > pow10diff || doubled == pow10diff && longVal % 2 != 0) {
longVal += (if (droppedDigits < 0) -1L else 1L)
}
case ROUND_UP =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the changePrecision method for ROUND_UP, ROUND_DOWN and ROUND_HALF_DOWN. If anyone could help me on this that would be great.

Copy link
Contributor Author

@sathiyapk sathiyapk Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon Also do you have any input on this please ? I need to integrate that in DecimalSuite.scala.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implementation, changePrecision should not take ROUND_UP, ROUND_DOWN and ROUND_HALF_DOWN. If we have such case, it's a bug so how about having an assertion before this match expression or simply falling to the default _ branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sarutak thanks for your review. Out of curiosity, may i know why it will be a bug if changePrecision takes ROUND_UP, ROUND_DOWN or ROUND_HALF_DOWN ?

So we could do something like this :

case ROUND_UP | ROUND_DOWN | ROUND_HALF_DOWN | _ =>
            throw QueryExecutionErrors.unsupportedRoundingMode(roundMode)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, may i know why it will be a bug if changePrecision takes ROUND_UP, ROUND_DOWN or ROUND_HALF_DOWN ?

It's not expected that ROUND_UP, ROUND_DOWN and ROUND_HALF_DOWN are not expected to be passed to changePrecision and it's private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i see

@sathiyapk
Copy link
Contributor Author

Test Fails on compatibility check against spark-sql_2.12:3.2.0! but it is normal, no ? any idea how can we fix it ?

error] spark-sql: Failed binary compatibility check against org.apache.spark:spark-sql_2.12:3.2.0! Found 2 potential problems (filtered 203)
[error] * static method round(org.apache.spark.sql.Column,Int)org.apache.spark.sql.Column in class org.apache.spark.sql.functions does not have a correspondent in current version
[error] filter with: ProblemFilters.excludeDirectMissingMethodProblem
[error] * method round(org.apache.spark.sql.Column,Int)org.apache.spark.sql.Column in object org.apache.spark.sql.functions does not have a correspondent in current version

@sathiyapk
Copy link
Contributor Author

@gengliangwang Could you please kindly take a look at this ? thank you very much.

@HyukjinKwon
Copy link
Member

ok to test

* if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0.
*
* @group math_funcs
* @since 1.5.0
*/
def round(e: Column, scale: Int): Column = withExpr { Round(e.expr, Literal(scale)) }
def round(e: Column, scale: Int, mode: String = "half_up"): Column = withExpr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks binary compatibility in Java because the default argument doesn't work. we should overload it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that fixed it. I'm currently fixing the schema mismatch fails. But i'm not sure why the kubernetes integration test is failing, do you have any input on that, please ?

@SparkQA
Copy link

SparkQA commented Nov 16, 2021

Test build #145261 has finished for PR 34593 at commit 3b1357b.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49730/

@SparkQA
Copy link

SparkQA commented Nov 16, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49730/

@SparkQA
Copy link

SparkQA commented Nov 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49752/

@SparkQA
Copy link

SparkQA commented Nov 16, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49752/

@SparkQA
Copy link

SparkQA commented Nov 16, 2021

Test build #145282 has finished for PR 34593 at commit 0366617.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

def round(e: Column, scale: Int): Column = round(e, scale, "half_up")

/**
* Round the value of `e` to `scale` decimal places with given round mode, default: HALF_UP
* if `scale` is greater than or equal to 0 or at integral part when `scale` is less than 0.
*
* @group math_funcs
* @since 1.5.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would have to be 3.3.0

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49810/

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49810/

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49812/

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49812/

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Test build #145350 has finished for PR 34593 at commit 5dc12af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49821/

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Test build #145339 has finished for PR 34593 at commit 36538d1.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Test build #145341 has finished for PR 34593 at commit 97220a2.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49821/

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50023/

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50024/

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50023/

@SparkQA
Copy link

SparkQA commented Nov 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50024/

@SparkQA
Copy link

SparkQA commented Nov 24, 2021

Test build #145552 has finished for PR 34593 at commit ed83e8b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 24, 2021

Test build #145551 has finished for PR 34593 at commit b7c2cc6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 25, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50069/

@SparkQA
Copy link

SparkQA commented Nov 25, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50069/

@SparkQA
Copy link

SparkQA commented Nov 25, 2021

Test build #145597 has finished for PR 34593 at commit e7f6261.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -242,7 +242,7 @@
| org.apache.spark.sql.catalyst.expressions.Reverse | reverse | SELECT reverse('Spark SQL') | struct<reverse(Spark SQL):string> |
| org.apache.spark.sql.catalyst.expressions.Right | right | SELECT right('Spark SQL', 3) | struct<right(Spark SQL, 3):string> |
| org.apache.spark.sql.catalyst.expressions.Rint | rint | SELECT rint(12.3456) | struct<rint(12.3456):double> |
| org.apache.spark.sql.catalyst.expressions.Round | round | SELECT round(2.5, 0) | struct<round(2.5, 0):decimal(2,0)> |
| org.apache.spark.sql.catalyst.expressions.Round | round | SELECT round(2.5, 0) | struct<round(2.5, 0, half_up):decimal(2,0)> |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not every DBMS systems supports the third parameter. Some of the external connectors are relying on the SQL representation string of expressions.
So, shall we hide the 3rd parameter if it is the default "half_up" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can do that. I'll wait for the opinions of other committers too.

@gengliangwang
Copy link
Member

Sql Server does something similar to this :

ROUND ( numeric_expression , length [ ,function ] )

REF : https://docs.microsoft.com/en-us/sql/t-sql/functions/round-transact-sql?view=sql-server-ver15 

This doesn't seem not related to the proposal in this PR.
Is there any other DBMS supporting the rounding mode as 3rd parameter?

@sarutak
Copy link
Member

sarutak commented Nov 25, 2021

I investigated whether the major DBMS and query processing systems (PostgreSQL, BigQuery, Snowflake, MySQL, SAP HANA, MS SQL Server, Firebird, DB2, Teradata, Hive) support the third parameter of round but I found only SAP HANA and MS SQL Server do.

@sathiyapk
How many users need to change the rounding modes and how often the additional rounding modes are used?
If users really need up, down and half_down, I think it's not difficult to implement a UDF by using BigDecimal.setScale.

@HyukjinKwon @gengliangwang What do you think?

@gengliangwang
Copy link
Member

@sarutak yeah I did some investigations too. This feature seems not commonly used.

@cloud-fan
Copy link
Contributor

I think people can just use floor and ceil functions if they need a different rounding mode?

@gengliangwang
Copy link
Member

I think people can just use floor and ceil functions if they need a different rounding mode?

floor and ceil can't control the position of rounding directly

@sathiyapk
Copy link
Contributor Author

sathiyapk commented Nov 25, 2021

@sarutak Currently, we are relying on UDF for rounding, when i googled to do it via spark native methods, i found a lot of people have the same issue and it's not complex to support it via spark native methods, so i opened this PR.

@sathiyapk
Copy link
Contributor Author

@cloud-fan @gengliangwang I confirm floor and ceil don't match our use case.

@cloud-fan
Copy link
Contributor

Then can we add a scale parameter to floor and ceil? This leads to better API design IMO: we have 3 functions for different rounding modes, and each function takes a scale parameter.

@sathiyapk
Copy link
Contributor Author

@cloud-fan I like the idea of adding scale parameter to floor and ceil. This would simulate the rounding mode UP and down for positive integers, but for negative integers it will work differently i believe ?

@cloud-fan
Copy link
Contributor

@sathiyapk you are right, but I can also argue that the round function with a rounding mode can not do what I want by using the floor/ceil function with a scale parameter. The problem is which behavior is more common/reasonable and can be useful to more users?

We can collect more data points, I found one example that snowflake supports scale parameter in floor/ceil: https://docs.snowflake.com/en/sql-reference/functions/floor.html

@sathiyapk
Copy link
Contributor Author

@cloud-fan In our case, the first function we looked into was floor/ceil but since it didn't take the scale parameter we had to look for other ways. So i believe adding a scale parameter to floor/ceil is a good idea and also I believe the rounding of negative numbers are not so common.

If everyone agree I will open a new PR for the scale parameter to floor/ceil function.

@sarutak
Copy link
Member

sarutak commented Nov 25, 2021

+1 for adding the scale parameter to floor and ceil.

@sathiyapk
Copy link
Contributor Author

@gengliangwang @HyukjinKwon what do you think ?

@gengliangwang
Copy link
Member

+1 for adding the scale parameter to floor and ceil.

cloud-fan pushed a commit that referenced this pull request Feb 21, 2022
### What changes were proposed in this pull request?

Adds `scale` parameter to `floor`/`ceil` functions in order to allow users to control the rounding position. This feature is proposed in the PR: #34593

### Why are the changes needed?

Currently we support Decimal RoundingModes : HALF_UP (round) and HALF_EVEN (bround). But we have use cases that needs RoundingMode.UP and RoundingMode.DOWN.

Floor and Ceil functions helps to do this but it doesn't support the position of the rounding. Adding scale parameter to the functions would help us control the rounding positions.

Snowflake supports `scale` parameter to `floor`/`ceil` :
` FLOOR( <input_expr> [, <scale_expr> ] )`

REF:
https://docs.snowflake.com/en/sql-reference/functions/floor.html

### Does this PR introduce _any_ user-facing change?

Now users can pass `scale` parameter to the `floor` and `ceil` functions.
 ```
     > SELECT floor(-0.1);
       -1.0
      > SELECT floor(5);
       5
      > SELECT floor(3.1411, 3);
       3.141
      > SELECT floor(3.1411, -3);
       1000.0

      > SELECT ceil(-0.1);
       0.0
      > SELECT ceil(5);
       5
      > SELECT ceil(3.1411, 3);
       3.142
      > SELECT ceil(3.1411, -3);
       1000.0

```
### How was this patch tested?

This patch was tested locally using unit test and git workflow.

Closes #34729 from sathiyapk/SPARK-37475-floor-ceil-scale.

Authored-by: Sathiya KUMAR <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@github-actions
Copy link

github-actions bot commented Mar 7, 2022

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 7, 2022
@github-actions github-actions bot closed this Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants