[FEA] Simplify code for NaN handling in `lists/drop_list_duplicates` #9257

ttnghia · 2021-09-20T20:25:30Z

Currently, `drop_list_duplicates` takes an input parameter specifying whether NaN values should be considered equal. This parameter supports the different behaviors expected by Pandas and Spark. Inside `drop_list_duplicates`, the implementation has to pass that parameter down through multiple levels, which increases the complexity of the code and makes it burdensome to maintain.

We should simplify the code somehow, either by reducing the number of code paths or at least by no longer passing the parameter down. Another potential approach is the one recommended in #9202 (comment), which is worth exploring.
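For readers unfamiliar with the two behaviors, here is a minimal host-side sketch in standard C++. It is not the libcudf implementation: `nan_policy`, `element_equal`, and `drop_duplicates` are hypothetical names invented for this illustration. It shows one possible way to avoid the passing-down parameter, assuming the NaN policy can be captured once in a comparator that the lower levels simply call.

```cpp
#include <cmath>
#include <vector>

// Hypothetical policy enum for this sketch (not the libcudf type).
enum class nan_policy { all_equal, unequal };

// Comparator that captures the NaN policy once, so the code that uses it
// never needs to receive the policy as a separate argument.
struct element_equal {
  nan_policy policy;

  bool operator()(double a, double b) const
  {
    if (std::isnan(a) && std::isnan(b)) { return policy == nan_policy::all_equal; }
    return a == b;  // ordinary comparison; NaN never equals a non-NaN value
  }
};

// Keep the first occurrence of each element of a single list.
// Quadratic on purpose: the sketch is about semantics, not performance.
template <typename Equal>
std::vector<double> drop_duplicates(std::vector<double> const& in, Equal eq)
{
  std::vector<double> out;
  for (double v : in) {
    bool seen = false;
    for (double kept : out) {
      if (eq(kept, v)) {
        seen = true;
        break;
      }
    }
    if (!seen) { out.push_back(v); }
  }
  return out;
}
```

With the list `{1.0, NaN, 2.0, NaN, 1.0}`, the `all_equal` policy keeps a single NaN while the `unequal` policy keeps both. `drop_duplicates` itself has no NaN-specific branch; only the comparator does.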

github-actions · 2021-11-15T21:03:28Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions · 2022-02-13T22:03:14Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

This PR completely removes `cudf::lists::drop_list_duplicates`. It is replaced by the new API `cudf::lists::distinct`, which has a simpler implementation and better performance. All internal cudf usages had already been migrated in earlier PRs, so this work has no side effects on, and does not break, the existing APIs.

Closes #11114, #11093, #11053, #11034, and closes #9257.

Depends on:
* #11228
* #11149
* #11234
* #11233

Authors:
- Nghia Truong (https://github.com/ttnghia)

Approvers:
- Jordan Jacobelli (https://github.com/Ethyling)
- Robert Maynard (https://github.com/robertmaynard)
- Vukasin Milovanovic (https://github.com/vuule)
- Bradley Dice (https://github.com/bdice)

URL: #11236

ttnghia added the feature request, libcudf, improvement, and non-breaking labels Sep 20, 2021

ttnghia mentioned this issue Sep 20, 2021

Add struct type support for drop_list_duplicates #9202

Merged

github-actions bot added the inactive-30d label Nov 15, 2021

github-actions bot added the inactive-90d label Feb 13, 2022

ttnghia self-assigned this Jun 30, 2022

ttnghia mentioned this issue Jun 30, 2022

[FEA] Deprecate lists::drop_list_duplicates #11114

Closed

This was referenced Jul 8, 2022

Fully support nested types in lists::drop_list_duplicates #11224

Closed

Remove lists::drop_list_duplicates #11236

Merged

rapids-bot bot closed this as completed in #11236 Jul 22, 2022
