[BUG] Stream compaction `drop_duplicates` does not use stable sort when removing duplicates
#9356
Labels: bug, libcudf, non-breaking
Currently, the stream compaction API `drop_duplicates` has an option to keep either the first or the last element of each group of duplicates. For example, if the input keys are `[1, 1, 2, 2]` and the values are `[1, 2, 3, 4]`, then removing duplicates (by keys) with the `KEEP_FIRST` option should result in the values `[1, 3]`.

Internally, `drop_duplicates` sorts the key elements and then uses `unique_copy`. With the `KEEP_FIRST` and `KEEP_LAST` options, a stable sort should be used to guarantee the expected result. However, the current implementation uses the default unstable sort.

Since an unstable sort may produce the same result as a stable sort, the current unit tests for `drop_duplicates` all still pass. But we should switch to a stable sort ASAP.
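
To illustrate why stability matters here, below is a minimal host-side sketch using only the C++ standard library. It is an analogy for the sort-then-`unique_copy` approach described above, not the libcudf implementation; the function name `drop_duplicates_keep_first` and the use of `std::vector<int>` are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of libcudf): for each distinct key, keep the
// value belonging to the first occurrence of that key in the input.
std::vector<int> drop_duplicates_keep_first(const std::vector<int>& keys,
                                            const std::vector<int>& values)
{
  assert(keys.size() == values.size());

  // Row indices 0..n-1. Sorting indices instead of rows is similar in spirit
  // to building a gather map.
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});

  // Stable sort by key only: rows with equal keys keep their input order,
  // so the first occurrence of each key stays at the front of its run.
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

  // unique_copy keeps the first element of every run of equal keys.
  std::vector<std::size_t> kept;
  std::unique_copy(order.begin(), order.end(), std::back_inserter(kept),
                   [&](std::size_t a, std::size_t b) { return keys[a] == keys[b]; });

  // Gather the values of the surviving rows.
  std::vector<int> result;
  result.reserve(kept.size());
  for (std::size_t row : kept) { result.push_back(values[row]); }
  return result;
}
```

With keys `[1, 1, 2, 2]` and values `[1, 2, 3, 4]`, this returns `[1, 3]`. If `std::stable_sort` were replaced by `std::sort`, the relative order of rows with equal keys would no longer be guaranteed, so the row that `unique_copy` keeps for each key could be any of its duplicates, even though small inputs may happen to produce the same answer.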