improve hash efficiency by directly using str/unicode hash #746

Merged: 1 commit, May 26, 2017
rdflib/term.py: 12 changes (7 additions, 5 deletions)
```diff
@@ -200,10 +200,11 @@ def __ge__(self, other):
             return True
         return self == other
 
-    def __hash__(self):
-        t = type(self)
-        fqn = t.__module__ + '.' + t.__name__
-        return hash(fqn) ^ hash(text_type(self))
+    # use parent's hash for efficiency reasons
+    # clashes of 'foo', URIRef('foo') and Literal('foo') are typically so rare
+    # that they don't justify additional overhead. Notice that even in case of
+    # clash __eq__ is still the fallback and very quick in those cases.
+    __hash__ = text_type.__hash__
```
Member

Is this just to give you something to attach the comment to? Otherwise, doesn't this line do nothing, since the class already inherits from text_type?

Member Author

In py2 this is irrelevant; in py3 (from https://docs.python.org/3/reference/datamodel.html#object.__hash__):

> A class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None. When the __hash__() method of a class is None, instances of the class will raise an appropriate TypeError when a program attempts to retrieve their hash value, and will also be correctly identified as unhashable when checking isinstance(obj, collections.Hashable).
>
> If a class that overrides __eq__() needs to retain the implementation of __hash__() from a parent class, the interpreter must be told this explicitly by setting __hash__ = <ParentClass>.__hash__.

We override __eq__ in two places, Identifier and Literal. Both also define __hash__ explicitly, because in py3 they would otherwise not be hashable.

Member

I see! You learn something new every day!
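The Python 3 rule quoted in the thread can be checked with a small sketch. `OnlyEq`, `EqAndHash` and `is_hashable` are hypothetical names, not rdflib code; the two `str` subclasses only mimic `Identifier` overriding `__eq__`, once without and once with the explicit `__hash__ = str.__hash__` line (`str` standing in for `text_type` on py3).

```python
class OnlyEq(str):
    """Hypothetical: overrides __eq__ but not __hash__, so py3 sets __hash__ = None."""

    def __eq__(self, other):
        return type(self) is type(other) and str.__eq__(self, other)


class EqAndHash(str):
    """Hypothetical: overrides __eq__ and explicitly keeps the parent's hash."""

    def __eq__(self, other):
        return type(self) is type(other) and str.__eq__(self, other)

    # Without this line, EqAndHash.__hash__ would be None as well.
    __hash__ = str.__hash__


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(OnlyEq.__hash__)                        # None
print(is_hashable(OnlyEq("foo")))             # False
print(is_hashable(EqAndHash("foo")))          # True
print(hash(EqAndHash("foo")) == hash("foo"))  # True
```

On py2 the first class would silently keep `str`'s hash, which is why the line only matters on py3.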
```diff
 
 
 class URIRef(Identifier):
```
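A minimal sketch of the trade-off described in the comment on `__hash__ = text_type.__hash__`, assuming Python 3 (`str` plays the role of `text_type`). `Node`, `URIRefLike`, `LiteralLike` and `old_hash` are hypothetical stand-ins for `Identifier`, `URIRef`, `Literal` and the removed method body. It shows that the same lexical form now hashes identically across types, while the type-checking `__eq__` still keeps dictionary keys apart.

```python
import timeit


class Node(str):
    """Hypothetical stand-in for Identifier: equality also requires the same type."""

    def __eq__(self, other):
        return type(self) is type(other) and str.__eq__(self, other)

    def __ne__(self, other):
        return not self.__eq__(other)

    # New scheme: reuse str's hash directly (required on py3 because __eq__ is overridden).
    __hash__ = str.__hash__


class URIRefLike(Node):
    """Hypothetical stand-in for URIRef."""


class LiteralLike(Node):
    """Hypothetical stand-in for Literal (no language or datatype here)."""


def old_hash(node):
    """Old scheme: mix the fully qualified class name into the hash."""
    t = type(node)
    fqn = t.__module__ + '.' + t.__name__
    return hash(fqn) ^ hash(str(node))


# Under the new scheme all three share one hash value...
assert hash("foo") == hash(URIRefLike("foo")) == hash(LiteralLike("foo"))

# ...but __eq__ resolves the clash, so the dict still holds three distinct keys.
d = {"foo": "plain", URIRefLike("foo"): "uri", LiteralLike("foo"): "literal"}
print(len(d))                # 3
print(d[URIRefLike("foo")])  # uri

# Rough cost comparison; absolute numbers depend on machine and interpreter.
n = URIRefLike("http://example.org/foo")
print("new:", timeit.timeit(lambda: hash(n), number=100_000))
print("old:", timeit.timeit(lambda: old_hash(n), number=100_000))
```

The old scheme builds the qualified class name and a plain `str` copy of the node on every call and hashes both, which is the overhead the PR removes; the price is that equal lexical forms of different types land in the same hash bucket and fall back to `__eq__`.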
```diff
@@ -924,7 +925,8 @@ def __hash__(self):
         -- 6.5.1 Literal Equality (RDF: Concepts and Abstract Syntax)
 
         """
-        res = super(Literal, self).__hash__()
+        # don't use super()... for efficiency reasons, see Identifier.__hash__
+        res = text_type.__hash__(self)
         if self.language:
             res ^= hash(self.language.lower())
         if self.datatype:
```
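The patched `Literal.__hash__` starts from `text_type.__hash__(self)` and XORs in the lowercased language tag. The sketch below illustrates that pattern under stated assumptions: `LiteralSketch` is a hypothetical class, not rdflib's `Literal`, and because the hunk is cut off after `if self.datatype:`, mixing in the datatype the same way is our assumption rather than something shown in the diff. The property that matters is that literals which compare equal also hash equal.

```python
class LiteralSketch(str):
    """Hypothetical Literal stand-in: lexical form plus optional language and datatype."""

    def __new__(cls, value, lang=None, datatype=None):
        obj = super().__new__(cls, value)
        obj.language = lang
        obj.datatype = datatype
        return obj

    def __eq__(self, other):
        # Language tags compare case-insensitively, matching the lowercased hash below.
        return (type(self) is type(other)
                and str.__eq__(self, other)
                and (self.language or "").lower() == (other.language or "").lower()
                and self.datatype == other.datatype)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Call str.__hash__ directly instead of super(...).__hash__():
        # the value is the same, but the super() call is skipped.
        res = str.__hash__(self)
        if self.language:
            res ^= hash(self.language.lower())
        if self.datatype:
            # Assumption: the datatype is mixed in the same way (the diff is cut off here).
            res ^= hash(self.datatype)
        return res


a = LiteralSketch("chat", lang="en")
b = LiteralSketch("chat", lang="EN")
c = LiteralSketch("chat", lang="fr")

print(a == b, hash(a) == hash(b))  # True True
print(a == c)                      # False

labels = {a: "English", c: "French"}
print(labels[b])                   # English
```

A literal with neither language nor datatype hashes exactly like its plain string, mirroring the `Identifier` change; as there, `__eq__` is what tells them apart.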