Refactor `collect_set` to use `cudf::distinct` and `cudf::lists::distinct` (#11228)

The current groupby/reduction `collect_set` aggregations use `lists::drop_list_duplicates` to generate set(s) of distinct elements. This PR changes them to use `cudf::distinct` and `cudf::lists::distinct` instead, which have some advantages, including:
* Fully supporting nested types, and
* Achieving better performance (`O(n)` instead of `O(nlogn)`) by using a hash table internally instead of a segmented sort.

This also enables nested types support for `collect_set` in spark-rapids (issue NVIDIA/spark-rapids#5508).

The changes in Java code here are only to fix unit tests. The tests were previously written with the assumption that the `collect_set` results are sorted, so they fail now that the results are no longer sorted.

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Jason Lowe (https://github.com/jlowe)
- David Wendt (https://github.com/davidwendt)
- MithunR (https://github.com/mythrocks)

URL: #11228
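
To illustrate why the unit tests had to change, here is a minimal host-side sketch in standard C++. It is not the libcudf implementation: `distinct_by_sort`, `distinct_by_hash`, and `same_elements` are hypothetical helper names, and `std::unordered_set` merely stands in for the hash table mentioned above. It contrasts a sort-based deduplication, whose output is sorted as a side effect, with a hash-based one, whose output order is unspecified.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Sort-based deduplication (illustrative only): O(n log n).
// The result comes out sorted as a side effect of the algorithm.
std::vector<int> distinct_by_sort(std::vector<int> values) {
  std::sort(values.begin(), values.end());
  values.erase(std::unique(values.begin(), values.end()), values.end());
  return values;
}

// Hash-based deduplication (illustrative only): O(n) expected time.
// The order of the returned elements is unspecified.
std::vector<int> distinct_by_hash(const std::vector<int>& values) {
  std::unordered_set<int> seen(values.begin(), values.end());
  return std::vector<int>(seen.begin(), seen.end());
}

// Order-insensitive comparison: a test that checks set contents
// should not depend on the order in which elements are returned.
bool same_elements(std::vector<int> a, std::vector<int> b) {
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return a == b;
}
```

A test that compares `distinct_by_hash(input)` directly against a sorted expected vector may fail even though the set contents are correct. Comparing through `same_elements`, or sorting the result before comparing, removes that dependence on output order.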
Showing 9 changed files with 701 additions and 431 deletions.