Add rank and dense_rank Spark window function #6289
88aea96 to cdd649d
@mbasmanova @rui-mo Could you help review this PR?
cdd649d to 4a16f23
@aditi-pandit @rui-mo Aditi, Rui, would you help review this PR?
Nit: minor comment.
4a16f23 to 3ccdf4f
@aditi-pandit @rui-mo Do you have any further comments?
Changes look good. Thanks!
Thanks @aditi-pandit. @mbasmanova @rui-mo Do you have any comments?
@JkSelf Looks good. Please update the PR description to wrap each line at 80 characters.
There are CI failures. Please rebase to the latest to see if these go away.
733d8d2 to ac258a4
@mbasmanova I updated the PR description, and CI has passed. Please help merge. Thanks.
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
ac258a4 to 029f25d
@kevinwilfong merged this pull request in e85f942.
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary: The Spark [Rank function](https://github.com/apache/spark/blob/f824d058b14e3c58b1c90f64fefc45fac105c7dd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L1006) computes the rank of a value in a group of values. The result is one plus the number of rows that sort strictly before the current row in the ordering of the partition, so tied rows share a rank and ties produce gaps in the sequence. The Spark [DenseRank function](https://github.com/apache/spark/blob/f824d058b14e3c58b1c90f64fefc45fac105c7dd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L1044) also computes the rank of a value in a group of values, but the result is one plus the previously assigned rank value, so unlike Rank, DenseRank does not produce gaps in the ranking sequence. The difference between Spark SQL and Presto SQL is the return type: integer in Spark SQL and bigint in Presto SQL. This PR follows the approach used for the [nth_value()](https://github.com/facebookincubator/velox/blob/b9be1718a70f3f81d184cd1dc57134552a2ed96a/velox/functions/lib/window/NthValue.h#L20) function and moves the Rank.cpp file from velox/functions/prestosql/window into velox/functions/window. It also provides registerRankBigint and registerRankInteger for Presto SQL and Spark SQL, respectively. Pull Request resolved: facebookincubator#6289 Reviewed By: laithsakka Differential Revision: D48827183 Pulled By: kevinwilfong fbshipit-source-id: cbf2e5a8da2bef0593304bf630fa32b1fc071559
The Spark Rank function computes the rank of a value in a group of values.
The result is one plus the number of rows that sort strictly before the current
row in the ordering of the partition, so tied rows share a rank and ties produce
gaps in the sequence.
The Spark DenseRank function also computes the rank of a value in a group of
values, but the result is one plus the previously assigned rank value. Unlike
Rank, DenseRank does not produce gaps in the ranking sequence.
The difference between Spark SQL and Presto SQL is the return type: integer in
Spark SQL and bigint in Presto SQL.
This PR follows the approach used for the nth_value() function and moves the
Rank.cpp file from velox/functions/prestosql/window into velox/functions/window.
It also provides registerRankBigint and registerRankInteger for Presto SQL and
Spark SQL, respectively.
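
To make the difference between the two functions concrete, here is a minimal
standalone sketch. It is not the Velox implementation: `rankOf` and
`denseRankOf` are hypothetical helpers that take the already sorted ORDER BY
keys of a single partition, and the template parameter `T` only illustrates how
one body of logic can produce either an integer (Spark SQL) or a bigint
(Presto SQL) result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not Velox code: rank() over one partition whose
// ORDER BY keys are already sorted. T is the result type, for example
// std::int32_t (Spark SQL integer) or std::int64_t (Presto SQL bigint).
template <typename T>
std::vector<T> rankOf(const std::vector<int>& sortedKeys) {
  assert(std::is_sorted(sortedKeys.begin(), sortedKeys.end()));
  std::vector<T> ranks;
  ranks.reserve(sortedKeys.size());
  T rank = 1;
  for (std::size_t i = 0; i < sortedKeys.size(); ++i) {
    // When a new peer group starts, its rank is the 1-based row position,
    // which leaves a gap after a group of tied rows.
    if (i > 0 && sortedKeys[i] != sortedKeys[i - 1]) {
      rank = static_cast<T>(i + 1);
    }
    ranks.push_back(rank);
  }
  return ranks;
}

// Hypothetical helper, not Velox code: dense_rank() over the same input.
template <typename T>
std::vector<T> denseRankOf(const std::vector<int>& sortedKeys) {
  assert(std::is_sorted(sortedKeys.begin(), sortedKeys.end()));
  std::vector<T> ranks;
  ranks.reserve(sortedKeys.size());
  T rank = 1;
  for (std::size_t i = 0; i < sortedKeys.size(); ++i) {
    // When a new peer group starts, the rank is one plus the previously
    // assigned rank, so the sequence has no gaps.
    if (i > 0 && sortedKeys[i] != sortedKeys[i - 1]) {
      ++rank;
    }
    ranks.push_back(rank);
  }
  return ranks;
}
```

For the sorted keys 10, 20, 20, 30, `rankOf<std::int32_t>` yields 1, 2, 2, 4,
while `denseRankOf<std::int32_t>` yields 1, 2, 2, 3.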