[FEA] add nested struct support for collect_set and merge_set aggregations #8972

Closed
revans2 opened this issue Aug 5, 2021 · 0 comments · Fixed by #9202
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS

revans2 commented Aug 5, 2021

Is your feature request related to a problem? Please describe.
In the Spark plugin we are pushing to support structs more fully. collect_list and merge_list already support structs, but collect_set and merge_set do not, because the deduplication code cannot tell whether nested values are the same. cuDF already supports sorting structs of structs, so it should be able to use a similar approach to tell whether two such values are equal and deduplicate them. To be 100% clear, this is only for structs of basic types and structs of structs. We don't need list support for this, because that appears to be a lot more difficult.
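
To illustrate the idea of reusing struct ordering for deduplication, here is a minimal sketch in plain Python. It is not libcudf code and not necessarily how the feature was implemented. It assumes structs are modelled as nested tuples that share one schema, contain no nulls, and have no list children. The names `flatten` and `collect_set_sketch` are hypothetical.

```python
from typing import Any, List, Tuple


def flatten(value: Any) -> Tuple[Any, ...]:
    """Flatten a nested struct (modelled as nested tuples) into its leaf values."""
    if isinstance(value, tuple):
        leaves: List[Any] = []
        for child in value:
            leaves.extend(flatten(child))
        return tuple(leaves)
    return (value,)


def collect_set_sketch(rows: List[tuple]) -> List[tuple]:
    """Deduplicate struct rows by sorting on their leaves and dropping equal neighbours."""
    # Assumption: every row has the same schema, so the flattened leaves
    # line up position by position and can be compared lexicographically.
    ordered = sorted(rows, key=flatten)
    unique: List[tuple] = []
    for row in ordered:
        # After sorting, equal rows are adjacent, so one comparison is enough.
        if not unique or flatten(unique[-1]) != flatten(row):
            unique.append(row)
    return unique


rows = [
    (1, ("a", 2.0)),
    (1, ("a", 2.0)),  # duplicate of the first row, including the nested struct
    (0, ("b", 1.5)),
    (1, ("a", 3.0)),  # differs only in the innermost field
]
print(collect_set_sketch(rows))
```

The point of the sketch is that once nested structs can be ordered field by field, equality follows from the same comparison, which is why sorting support suggests that deduplication is feasible.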

revans2 added the feature request, Needs Triage, and Spark labels on Aug 5, 2021
ttnghia self-assigned this on Aug 10, 2021
beckernick added the libcudf label and removed the Needs Triage label on Aug 23, 2021
rapids-bot bot pushed a commit that referenced this issue Sep 21, 2021
This PR adds support for struct types to the existing `drop_list_duplicates` API. This is the first time a nested type is supported in this function. Some additional code cleanup has also been done.

To be clear: Only structs of basic types and structs of structs are supported. Structs of lists are not, due to their complex nature.

Closes #8972.
Blocked by #9218 (it is merged).

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - https://github.com/nvdbaranec
  - Mark Harris (https://github.com/harrism)

URL: #9202
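
The behaviour described in the commit message above can be sketched as follows. This is a plain Python illustration under stated assumptions, not the libcudf `drop_list_duplicates` signature or implementation: a lists column is modelled as a list of Python lists, struct entries as nested tuples, and the helper names `contains_list` and `drop_list_duplicates_sketch` are hypothetical. The sketch keeps the first occurrence of each entry, which is an assumption about ordering and not something the source states.

```python
from typing import Any, List


def contains_list(value: Any) -> bool:
    """Return True if a struct entry (nested tuples) has a list anywhere inside it."""
    if isinstance(value, list):
        return True
    if isinstance(value, tuple):
        return any(contains_list(child) for child in value)
    return False


def drop_list_duplicates_sketch(lists_column: List[List[Any]]) -> List[List[Any]]:
    """Remove duplicate entries within each list row of a lists column."""
    result: List[List[Any]] = []
    for entries in lists_column:
        seen = set()
        unique: List[Any] = []
        for entry in entries:
            # Mirrors the stated limitation: structs of lists are rejected.
            if contains_list(entry):
                raise TypeError("structs of lists are not supported")
            # Nested tuples hash and compare field by field, so equal
            # structs of structs are detected as duplicates.
            if entry not in seen:
                seen.add(entry)
                unique.append(entry)
        result.append(unique)
    return result


column = [
    [(1, ("a", 2)), (1, ("a", 2)), (2, ("b", 3))],
    [],
    [(5, ("x", 1))],
]
print(drop_list_duplicates_sketch(column))
```

Deduplication happens independently per row, so equal structs that sit in different list rows are both kept.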