-
Notifications
You must be signed in to change notification settings - Fork 932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Support various outcomes for NaN equality comparison in equality comparator #10781
Comments
See also #4760 I think this change to the row comparators is fine so long as it is proven that it does not impact performance for all existing cases. However, I still maintain that it is a bad idea to make any promises that we will support the full extent of NaN nonsense that Spark does. |
I'm thinking about a way that allows having different NaN outcomes while that has minimal impact on cudf build and its downstream usage. One way for this is to have template internal APIs (similar to #10870) but we only instantiate one NaN config in cudf (which is the default, current config). The remaining config (which Spark wants) will be instantiated somewhere else (rapids-spark-jni). |
That is the solution I described here: #4760 (comment) It's possible for some things, but it would be a good bit of work to make it robust. |
Further splitting up #9452 -- split off at the suggestion of @bdice Related to #10781 and #4760 -- issues and discussions related to NaN comparison behavior. Authors: - Ryan Lee (https://github.com/rwlee) - Bradley Dice (https://github.com/bdice) Approvers: - Jake Hemstad (https://github.com/jrhemstad) - Bradley Dice (https://github.com/bdice) - Nghia Truong (https://github.com/ttnghia) URL: #10870
This issue has been labeled |
Now NaN handling should be good with the current experimental row comparator. Close this. |
Currently,
cudf::equality_compare
specialized for floating point numbers considersNaNs
as equal. That result may not be desirable in some cases. Depending on situations, sometimes we want to haveNaNs
considered as equal, but some other time we want the opposite.We should reimplement
cudf::equality_compare
and the corresponding comparators chain (element comparator, row comparator etc), adding an optional parameter:Currently, there are many APIs that support this parameter (such as
lists::drop_list_duplicates
) but these APIs implement such support locally. As more and more APIs adopt it,NaN
handling should be supported in a more structural way.The text was updated successfully, but these errors were encountered: