[FEA] add nested struct support for collect_set and merge_set aggregations #8972
Labels: feature request, libcudf, Spark
Is your feature request related to a problem? Please describe.
In the Spark plugin we are pushing to support structs more fully. collect_list and merge_list already support structs, but collect_set and merge_set do not, because the deduplication code cannot tell whether nested values are the same. cuDF already supports sorting structs of structs, so it should be possible to use a similar approach to decide whether two nested values are equal and deduplicate them. To be 100% clear, this request only covers structs of basic types and structs of structs. We don't need list support for this, because that appears to be a lot more difficult.
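To make the requested semantics concrete, here is a minimal sketch in plain Python, not libcudf code. It models a struct as a tuple (so a struct of structs is a nested tuple), flattens each value to its leaf fields, sorts the rows lexicographically, and drops adjacent duplicates. The helper names `flatten`, `dedup_structs`, `collect_set`, and `merge_set` are hypothetical, and the sketch assumes every row shares the same schema and contains no nulls.

```python
def flatten(value):
    """Flatten a nested struct (modeled as a tuple) into a flat tuple of leaves."""
    if isinstance(value, tuple):
        leaves = ()
        for field in value:
            leaves += flatten(field)
        return leaves
    return (value,)


def dedup_structs(rows):
    """Sort rows by their flattened leaves, then keep one row per run of equals."""
    ordered = sorted(rows, key=flatten)
    unique = []
    for row in ordered:
        if not unique or flatten(unique[-1]) != flatten(row):
            unique.append(row)
    return unique


def collect_set(keys, values):
    """Group values by key and deduplicate each group (assumed semantics)."""
    groups = {}
    for key, value in zip(keys, values):
        groups.setdefault(key, []).append(value)
    return {key: dedup_structs(rows) for key, rows in groups.items()}


def merge_set(left, right):
    """Merge two partial collect_set results by concatenating and deduplicating."""
    merged = {}
    for key in set(left) | set(right):
        merged[key] = dedup_structs(left.get(key, []) + right.get(key, []))
    return merged


keys = [1, 1, 1, 2, 2]
values = [
    (1, ("a", 2.0)),
    (1, ("a", 2.0)),  # duplicate of the previous struct of structs
    (0, ("b", 1.0)),
    (3, ("c", 0.5)),
    (3, ("c", 0.5)),  # duplicate within key 2
]
partial = collect_set(keys, values)
# partial == {1: [(0, ("b", 1.0)), (1, ("a", 2.0))], 2: [(3, ("c", 0.5))]}
```

Comparing flattened leaves is only valid here because all rows share one schema. Null handling and list children are deliberately left out, matching the scope of this request.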