Lexemes are unhashable (v0.101.0) #371

Closed
bwegge opened this issue May 12, 2016 · 6 comments

@bwegge

bwegge commented May 12, 2016

When I try to add Lexemes to a set or dict, it fails since Lexemes are unhashable:

cat = nlp.vocab['cat']
dog = nlp.vocab['dog']
my_animals = {cat, dog}

Traceback (most recent call last):

  File "<ipython-input-30-8ffec97fae23>", line 1, in <module>
    my_animals = {cat, dog}

TypeError: unhashable type: 'spacy.lexeme.Lexeme'

Maybe lexeme.orth can be used (together with lexeme.lang) as hash value?

Another funny observation is that looking up the same word multiple times through nlp.vocab[word] produces Lexemes at different addresses (although comparison works thanks to the newly implemented rich comparison):

nlp.vocab['cat']
Out[17]: <spacy.lexeme.Lexeme at 0xe865401e10>

nlp.vocab['cat']
Out[18]: <spacy.lexeme.Lexeme at 0xe865401d80>
@honnibal
Member

To save memory, the Lexeme class is a wrapper around the LexemeC struct. So the Python objects are indeed created afresh each time. You can see the implementation here: https://github.com/spacy-io/spaCy/blob/master/spacy/lexeme.pyx#L31

Adding a __hash__ method is a good idea though. Will do.
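
For readers following along, here is a minimal pure-Python sketch of the idea discussed above: a wrapper object that is created afresh on every lookup can still behave as one value in sets and dicts if equality and hashing are both derived from the same stable key. The class name `LexemeSketch`, the integer IDs, and the choice of `(orth, lang)` as the key are illustrative assumptions taken from the suggestion in this thread; this is not spaCy's actual implementation.

```python
class LexemeSketch:
    """Hypothetical stand-in for spacy.lexeme.Lexeme (not spaCy's real code)."""

    def __init__(self, orth, lang="en"):
        self.orth = orth  # assumed: integer ID of the word's verbatim text
        self.lang = lang  # assumed: language identifier

    def __eq__(self, other):
        if not isinstance(other, LexemeSketch):
            return NotImplemented
        return (self.orth, self.lang) == (other.orth, other.lang)

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so equal objects
        # always produce equal hashes.
        return hash((self.orth, self.lang))


# Two separately created wrappers for the same (made-up) word ID.
cat_a = LexemeSketch(orth=1234)
cat_b = LexemeSketch(orth=1234)
dog = LexemeSketch(orth=5678)

# Distinct Python objects, but the set treats cat_a and cat_b as one entry.
animals = {cat_a, cat_b, dog}
```

The point of hashing on the underlying ID rather than on object identity is that the two `cat` wrappers live at different addresses yet collapse to a single set member.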

@bwegge
Author

bwegge commented May 12, 2016

Sounds reasonable, thanks for the explanation!

@lylebrown

Is there a workaround for this in the meantime? I'm new to NLP and trying to follow this guide, specifically the part where it mentions word vector representations.

@jr-pe

jr-pe commented Jul 12, 2016

@lylebrown
Replace the curly braces ({ }) with square brackets ([ ]) in the following line:

allWords = list({w for w in parser.vocab if w.has_vector and w.orth_.islower() and w.lower_ != "nasa"})
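
If deduplication is still wanted while the objects are unhashable, one possible workaround is to key a dict on a hashable attribute instead of on the object itself. The sketch below assumes each object exposes an integer `orth` attribute, as suggested earlier in this thread; `UnhashableLexeme` and `unique_by_orth` are hypothetical names used only for illustration, and the IDs are made up.

```python
class UnhashableLexeme:
    """Hypothetical stand-in that, like the Lexeme in this report, cannot be hashed."""

    def __init__(self, orth, text):
        self.orth = orth  # assumed: integer ID, hashable
        self.text = text

    def __eq__(self, other):
        return isinstance(other, UnhashableLexeme) and self.orth == other.orth

    # Defining __eq__ without __hash__ makes instances unhashable in Python 3.


def unique_by_orth(lexemes):
    """Return one object per distinct orth value, keeping the first seen."""
    by_orth = {}
    for lex in lexemes:
        by_orth.setdefault(lex.orth, lex)
    return list(by_orth.values())


words = [
    UnhashableLexeme(1, "cat"),
    UnhashableLexeme(2, "dog"),
    UnhashableLexeme(1, "cat"),
]
unique_words = unique_by_orth(words)
```

Because dicts preserve insertion order, the result keeps the words in the order they were first encountered.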

@syllog1sm
Contributor

Btw the line should probably be:

allWords = [w for w in parser.vocab if w.has_vector and w.is_lower and w.lower_ != "nasa"]

The old .repvec property is now named .vector, too.

The __hash__ method will be there in the next release.

@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018