Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rank and dense_rank Spark window function #6289

Closed
wants to merge 1 commit into from

Conversation

JkSelf
Copy link
Collaborator

@JkSelf JkSelf commented Aug 28, 2023

Spark Rank function computes the rank of a value in a group of values.
The result is one plus the number of rows preceding or equal to the current row in
the ordering of the partition. The values will produce gaps in the sequence.
Spark DenseRank function computes the rank of a value in a group of values.
The result is one plus the previously assigned rank value. Unlike Rank function,
DenseRank will not produce gaps in the ranking sequence. The difference between sparksql and
prestosql is the return type, where the sparksql's return type is integer and the prestosql's is bigint.
This PR refer the nth_value() function and move the Rank.cpp file from
velox/functions/prestosql/window into the velox/functions/window.
And also provide registerRankBigint and registerRankInteger for prestosql and sparksql.

@netlify
Copy link

netlify bot commented Aug 28, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 029f25d
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/64f1529a17e58f00070c2d2c

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 28, 2023
@JkSelf JkSelf force-pushed the rank-window-function branch 2 times, most recently from 88aea96 to cdd649d Compare August 28, 2023 02:42
@JkSelf
Copy link
Collaborator Author

JkSelf commented Aug 28, 2023

@mbasmanova @rui-mo Can you help to review?

@JkSelf JkSelf force-pushed the rank-window-function branch from cdd649d to 4a16f23 Compare August 28, 2023 05:19
Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aditi-pandit @rui-mo Aditi, Rui, would you help review this PR?

Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit : Minor comment

@JkSelf JkSelf force-pushed the rank-window-function branch from 4a16f23 to 3ccdf4f Compare August 30, 2023 03:12
@JkSelf
Copy link
Collaborator Author

JkSelf commented Aug 30, 2023

@aditi-pandit @rui-mo Do you have any further comment?

Copy link
Collaborator

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. Thanks !

@JkSelf
Copy link
Collaborator Author

JkSelf commented Aug 30, 2023

Thanks @aditi-pandit. @mbasmanova @rui-mo Do you have any comment?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JkSelf Looks good. Please, update PR description to wrap each line at 80 characters.

@mbasmanova
Copy link
Contributor

There are CI failures. Please, rebase to the latest to see if these go away.

@JkSelf JkSelf force-pushed the rank-window-function branch 2 times, most recently from 733d8d2 to ac258a4 Compare August 30, 2023 10:24
@JkSelf
Copy link
Collaborator Author

JkSelf commented Aug 30, 2023

@mbasmanova updated PR description. And the CI is passed. Please help to merge. Thanks.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@kevinwilfong merged this pull request in e85f942.

@conbench-facebook
Copy link

Conbench analyzed the 1 benchmark run on commit e85f942b.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Spark [Rank function](https://github.com/apache/spark/blob/f824d058b14e3c58b1c90f64fefc45fac105c7dd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L1006) computes the rank of a value in a group of values.
 The result is one plus the number of rows preceding or equal to the current row in
 the ordering of the partition. The values will produce gaps in the sequence.
 Spark [DenseRank function](https://github.com/apache/spark/blob/f824d058b14e3c58b1c90f64fefc45fac105c7dd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L1044) computes the rank of a value in a group of values.
The result is one plus the previously assigned rank value. Unlike Rank function,
 DenseRank will not produce gaps in the ranking sequence. The difference between sparksql and
 prestosql is the return type, where the sparksql's return type is integer and the prestosql's is bigint.
 This PR refer the [nth_value()](https://github.com/facebookincubator/velox/blob/b9be1718a70f3f81d184cd1dc57134552a2ed96a/velox/functions/lib/window/NthValue.h#L20) function and move the Rank.cpp file from
 velox/functions/prestosql/window into the velox/functions/window.
And also provide registerRankBigint and registerRankInteger for prestosql and sparksql.

Pull Request resolved: facebookincubator#6289

Reviewed By: laithsakka

Differential Revision: D48827183

Pulled By: kevinwilfong

fbshipit-source-id: cbf2e5a8da2bef0593304bf630fa32b1fc071559
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Spark [Rank function](https://github.com/apache/spark/blob/f824d058b14e3c58b1c90f64fefc45fac105c7dd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L1006) computes the rank of a value in a group of values.
 The result is one plus the number of rows preceding or equal to the current row in
 the ordering of the partition. The values will produce gaps in the sequence.
 Spark [DenseRank function](https://github.com/apache/spark/blob/f824d058b14e3c58b1c90f64fefc45fac105c7dd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L1044) computes the rank of a value in a group of values.
The result is one plus the previously assigned rank value. Unlike Rank function,
 DenseRank will not produce gaps in the ranking sequence. The difference between sparksql and
 prestosql is the return type, where the sparksql's return type is integer and the prestosql's is bigint.
 This PR refer the [nth_value()](https://github.com/facebookincubator/velox/blob/b9be1718a70f3f81d184cd1dc57134552a2ed96a/velox/functions/lib/window/NthValue.h#L20) function and move the Rank.cpp file from
 velox/functions/prestosql/window into the velox/functions/window.
And also provide registerRankBigint and registerRankInteger for prestosql and sparksql.

Pull Request resolved: facebookincubator#6289

Reviewed By: laithsakka

Differential Revision: D48827183

Pulled By: kevinwilfong

fbshipit-source-id: cbf2e5a8da2bef0593304bf630fa32b1fc071559
codyschierbeck pushed a commit to codyschierbeck/velox that referenced this pull request Sep 27, 2023
Summary:
Spark [Rank function](https://github.com/apache/spark/blob/f824d058b14e3c58b1c90f64fefc45fac105c7dd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L1006) computes the rank of a value in a group of values.
 The result is one plus the number of rows preceding or equal to the current row in
 the ordering of the partition. The values will produce gaps in the sequence.
 Spark [DenseRank function](https://github.com/apache/spark/blob/f824d058b14e3c58b1c90f64fefc45fac105c7dd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L1044) computes the rank of a value in a group of values.
The result is one plus the previously assigned rank value. Unlike Rank function,
 DenseRank will not produce gaps in the ranking sequence. The difference between sparksql and
 prestosql is the return type, where the sparksql's return type is integer and the prestosql's is bigint.
 This PR refer the [nth_value()](https://github.com/facebookincubator/velox/blob/b9be1718a70f3f81d184cd1dc57134552a2ed96a/velox/functions/lib/window/NthValue.h#L20) function and move the Rank.cpp file from
 velox/functions/prestosql/window into the velox/functions/window.
And also provide registerRankBigint and registerRankInteger for prestosql and sparksql.

Pull Request resolved: facebookincubator#6289

Reviewed By: laithsakka

Differential Revision: D48827183

Pulled By: kevinwilfong

fbshipit-source-id: cbf2e5a8da2bef0593304bf630fa32b1fc071559
ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
Summary:
Spark [Rank function](https://github.com/apache/spark/blob/f824d058b14e3c58b1c90f64fefc45fac105c7dd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L1006) computes the rank of a value in a group of values.
 The result is one plus the number of rows preceding or equal to the current row in
 the ordering of the partition. The values will produce gaps in the sequence.
 Spark [DenseRank function](https://github.com/apache/spark/blob/f824d058b14e3c58b1c90f64fefc45fac105c7dd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L1044) computes the rank of a value in a group of values.
The result is one plus the previously assigned rank value. Unlike Rank function,
 DenseRank will not produce gaps in the ranking sequence. The difference between sparksql and
 prestosql is the return type, where the sparksql's return type is integer and the prestosql's is bigint.
 This PR refer the [nth_value()](https://github.com/facebookincubator/velox/blob/b9be1718a70f3f81d184cd1dc57134552a2ed96a/velox/functions/lib/window/NthValue.h#L20) function and move the Rank.cpp file from
 velox/functions/prestosql/window into the velox/functions/window.
And also provide registerRankBigint and registerRankInteger for prestosql and sparksql.

Pull Request resolved: facebookincubator#6289

Reviewed By: laithsakka

Differential Revision: D48827183

Pulled By: kevinwilfong

fbshipit-source-id: cbf2e5a8da2bef0593304bf630fa32b1fc071559
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants