Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-43475: Fix worst case collision behavior for NaN instances #25493

Merged
merged 6 commits into from
Apr 22, 2021

Conversation

rhettinger
Copy link
Contributor

@rhettinger rhettinger commented Apr 21, 2021

@rhettinger rhettinger removed the request for review from tiran April 21, 2021 03:38
@rhettinger rhettinger added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 21, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @rhettinger for commit 056a4f7 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 21, 2021
@rhettinger rhettinger added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 21, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @rhettinger for commit 9f3f9e9 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 21, 2021
@rhettinger rhettinger merged commit a07da09 into python:master Apr 22, 2021
@realead
Copy link

realead commented Jun 10, 2021

Is my understanding right, that this PR would break the following code:

import math

class A:
    def __init__(self, a):
        self.a=a
    def __hash__(self):
        return hash(self.a)
    def __eq__(self, other):
        if(math.isnan(self.a) and math.isnan(other.a)):
            return True
        return self.a == other.a
    def __repr__(self):
        return str(self.a)
        
set([A(float("nan")), A(float("nan"))])  # result: {nan}

I.e. when somebody tries to wrap Float and change __eq__ in such a way, that all float-nans will be equivalent?

With this PR, the chances are high, that the result will be {nan, nan}, as hashes from both objects will be different.

Until now, it was clear - don't put nans into set/dict because the default "="-relation for floats isn't an equivalence relation. People worked around this by redefining the "="-relation and didn't so for hash function because until now "a,b - nans => hash(a)=hash(b)" was given.


I think the intuitive behavior for set([float("nan"), float("nan")] is {nan} and not {nan, nan}. Given how Py_EQ is defined for floats, this is not possible. Maybe there is need for a new Py_EQ_FOR_HASH comparator, which would be used in hashset/hashdict and be more or less the same as Py_EQ but would yield true for nan==nan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants