Lexemes are unhashable (v0.101.0) #371
Comments
To save memory, the Lexeme class is a wrapper around the LexemeC struct. So the Python objects are indeed created afresh each time. You can see the implementation here: https://github.com/spacy-io/spaCy/blob/master/spacy/lexeme.pyx#L31 Adding a
Sounds reasonable, thanks for the explanation!
Is there a workaround for this in the meantime? I'm new to NLP and trying to follow this guide, specifically the part where it mentions word vector representations.
@lylebrown allWords = list({w for w in parser.vocab if w.has_vector and w.orth_.islower() and w.lower_ != "nasa"})
Btw the line should probably be: allWords = [w for w in parser.vocab if w.has_vector and w.is_lower and w.lower_ != "nasa"] The old The
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
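For anyone who still needs deduplication on top of the list comprehension above, here is a minimal sketch that keys on the integer orth ID instead of hashing the Lexeme itself. It does not import spaCy: FakeLexeme is a hypothetical stand-in that is deliberately unhashable, and unique_lexemes is a helper name made up for this example. Only the attribute names (orth, orth_, has_vector, is_lower, lower_) are taken from the thread.

```python
from dataclasses import dataclass


@dataclass  # eq=True without frozen=True sets __hash__ to None, so instances are unhashable
class FakeLexeme:
    """Hypothetical stand-in for an unhashable Lexeme; not spaCy's class."""
    orth: int        # integer ID of the word form
    orth_: str       # the word form as text
    has_vector: bool = True

    @property
    def is_lower(self):
        return self.orth_.islower()

    @property
    def lower_(self):
        return self.orth_.lower()


def unique_lexemes(lexemes):
    """Deduplicate by orth ID, keeping the first object seen for each ID."""
    seen = {}
    for lex in lexemes:
        seen.setdefault(lex.orth, lex)  # the int key is hashable, the lexeme is only a value
    return list(seen.values())


# Toy data standing in for iterating over parser.vocab
vocab = [
    FakeLexeme(1, "moon"),
    FakeLexeme(2, "NASA"),
    FakeLexeme(1, "moon"),   # duplicate entry for the same word
    FakeLexeme(3, "nasa"),
]

all_words = [
    w for w in unique_lexemes(vocab)
    if w.has_vector and w.is_lower and w.lower_ != "nasa"
]
print([w.orth_ for w in all_words])
```

The dict keeps insertion order, so the result preserves the order in which words were first seen.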
When I try to add Lexemes to a set or dict, it fails since Lexemes are unhashable:
Maybe lexeme.orth can be used (together with lexeme.lang) as hash value?
Another funny observation is that looking up the same word multiple times through nlp.vocab[word] produces Lexemes at different addresses (although comparison works thanks to the newly implemented rich comparison).
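To make both points concrete without depending on a spaCy install, here is a rough sketch of how a thin wrapper that is created afresh on every lookup could still be hashable if it hashed on (orth, lang). LexemeView and ToyVocab are hypothetical names invented for this illustration, and lang is a plain string here for simplicity. This is not how spaCy implements Lexeme, only the idea proposed above.

```python
class LexemeView:
    """Hypothetical thin wrapper; a new object is built on every lookup."""
    __slots__ = ("orth", "lang")

    def __init__(self, orth, lang):
        self.orth = orth
        self.lang = lang

    def __eq__(self, other):
        # Rich comparison on the underlying identifiers, not on object identity
        if not isinstance(other, LexemeView):
            return NotImplemented
        return (self.orth, self.lang) == (other.orth, other.lang)

    def __hash__(self):
        # Must agree with __eq__: equal views hash to the same value
        return hash((self.orth, self.lang))


class ToyVocab:
    """Hypothetical vocabulary that stores only IDs and wraps them on demand."""

    def __init__(self, lang):
        self.lang = lang
        self._ids = {}

    def __getitem__(self, word):
        # Assign a new integer ID the first time a word is seen
        orth = self._ids.setdefault(word, len(self._ids) + 1)
        return LexemeView(orth, self.lang)


vocab = ToyVocab("en")
a = vocab["apple"]
b = vocab["apple"]

print(a is b)        # two distinct Python objects
print(a == b)        # but they compare equal
print(len({a, b}))   # and collapse to one entry in a set
```

Defining __eq__ without __hash__ is what makes a Python class unhashable in the first place, so adding a __hash__ that agrees with the comparison is the smallest change that would let such wrappers go into a set or dict.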