[FEA] Support various outcomes for NaN equality comparison in equality comparator #10781

ttnghia · 2022-05-03T23:01:33Z

Currently, the cudf::equality_compare specialization for floating-point numbers treats NaNs as equal. That is not always desirable: in some situations we want NaNs to compare equal, and in others we want them to compare unequal.

We should reimplement cudf::equality_compare and the corresponding comparator chain (element comparator, row comparator, etc.), adding an optional parameter:

enum class nan_equality /*unspecified*/ {
  ALL_EQUAL,  ///< All NaNs compare equal, regardless of sign
  UNEQUAL     ///< All NaNs compare unequal (IEEE754 behavior)
};
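A minimal sketch of how an element comparator could take this parameter. The `sketch` namespace, the `is_nan` helper, and this `equality_compare` signature are hypothetical and are not the actual cudf API; the default is assumed to be `ALL_EQUAL` to match the current behavior described above.

```cpp
#include <limits>

namespace sketch {

// Mirrors the enum proposed in this issue.
enum class nan_equality {
  ALL_EQUAL,  // all NaNs compare equal, regardless of sign
  UNEQUAL     // NaNs never compare equal (IEEE 754 behavior)
};

// Hypothetical helper: a value is NaN exactly when it does not equal itself.
// For non-floating-point types this is always false.
template <typename T>
constexpr bool is_nan(T x)
{
  return x != x;
}

// Hypothetical element comparator, NOT the real cudf::equality_compare.
// With ALL_EQUAL, two NaNs are reported as equal; otherwise the result is
// whatever operator== gives, which is false for NaN operands.
template <typename T>
constexpr bool equality_compare(T lhs, T rhs, nan_equality nans = nan_equality::ALL_EQUAL)
{
  return (nans == nan_equality::ALL_EQUAL && is_nan(lhs) && is_nan(rhs)) || lhs == rhs;
}

}  // namespace sketch
```

With this shape, a caller that wants IEEE 754 semantics would pass `sketch::nan_equality::UNEQUAL` explicitly, and existing callers would keep the current result without changes.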

Many APIs already support this parameter (such as lists::drop_list_duplicates), but each implements that support locally. As more APIs adopt it, NaN handling should be supported in a more structural way.


jrhemstad · 2022-05-04T01:49:39Z

See also #4760

I think this change to the row comparators is fine so long as it is proven that it does not impact performance for all existing cases.

However, I still maintain that it is a bad idea to make any promises that we will support the full extent of NaN nonsense that Spark does.

ttnghia · 2022-05-20T19:17:45Z

I'm thinking about a way to allow different NaN outcomes with minimal impact on the cudf build and its downstream usage. One option is to have templated internal APIs (similar to #10870) but instantiate only one NaN config in cudf (the default, current config). The remaining config (which Spark wants) would be instantiated somewhere else (rapids-spark-jni).

jrhemstad · 2022-05-21T03:45:52Z

> I'm thinking about a way that allows having different NaN outcomes while that has minimal impact on cudf build and its downstream usage. One way for this is to have template internal APIs (similar to #10870) but we only instantiate one NaN config in cudf (which is the default, current config). The remaining config (which Spark wants) will be instantiated somewhere else (rapids-spark-jni).

That is the solution I described here: #4760 (comment)

It's possible for some things, but it would be a good bit of work to make it robust.

@bdice

Further splitting up #9452, split off at the suggestion of @bdice. Related to #10781 and #4760, the issues and discussions about NaN comparison behavior.

Authors:
- Ryan Lee (https://github.com/rwlee)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Jake Hemstad (https://github.com/jrhemstad)
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)

URL: #10870

github-actions · 2022-06-20T04:07:33Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

ttnghia · 2022-06-20T04:43:20Z

NaN handling should now be covered by the current experimental row comparator. Closing this.

ttnghia added the feature request (New feature or request) and Needs Triage (Need team to review and classify) labels May 3, 2022

ttnghia mentioned this issue May 3, 2022: [FEA] cudf::lists::contains to support NaN == NaN #10741 (Closed)

ttnghia added the Spark (Functionality that helps Spark RAPIDS) label May 3, 2022

rwlee mentioned this issue May 17, 2022: Configurable NaN handling in device_row_comparators #10870 (Merged)

github-actions bot added the inactive-30d label Jun 20, 2022

ttnghia closed this as completed Jun 20, 2022

bdice removed the Needs Triage (Need team to review and classify) label Mar 4, 2024
