Contractions do not have the correct lemma #717
Comments
I looked it up in the code, it does seem like the lemma mentioned is there: https://github.com/explosion/spaCy/blob/master/spacy/en/tokenizer_exceptions.py#L896
Strange, seems like a bug?
Thanks for the report. The data definitely looks correct, so this seems like a bug. I'm travelling today and can't easily check, so just to confirm: are you on the most recent version (1.5)?
Yes (same session :-)):
Strangely, it is not a problem with all contractions.
I wonder whether the new exception data is being loaded for English... I think it might be preferring to load the exceptions in the model, using the (deprecated) text file. If so, at least we know the tokenizer is in sync with the existing trained weights.
On a slightly unrelated note, I just realised that both
@ines Feel free to have a look at https://github.com/kootenpv/contractions/blob/master/contractions/__init__.py and see if there is anything else you'd like to add.
@kootenpv Ah, this is perfect, thanks! 👍 There are definitely a few that we haven't covered.
I made a pip package called contractions to expand contractions, but it is rather slow (even though I tried to optimise for speed). I did that before working with spaCy :)
I'm mostly wondering why you handle it like this:
Why not replace `'re` with `are` so that the lemma would be correct?
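To make the trade-off in that question concrete, here is a minimal sketch of the alternative to rewriting the text: the surface form of each piece is kept as written, and the lemma is attached separately. Everything in it is an assumption for illustration. The `EXCEPTIONS` table, its `orth`/`lemma` keys, the `tokenize` helper, and the choice of `be` as the lemma for `'re` are hypothetical stand-ins, not spaCy's actual data or API.

```python
# Hypothetical, simplified exception table: each contraction maps to the
# pieces it splits into, with a lemma per piece. Not spaCy's real data.
EXCEPTIONS = {
    "we're": [{"orth": "we", "lemma": "we"}, {"orth": "'re", "lemma": "be"}],
    "don't": [{"orth": "do", "lemma": "do"}, {"orth": "n't", "lemma": "not"}],
}


def tokenize(text):
    """Split on whitespace, then split known contractions into pieces.

    The surface form of each piece is sliced from the original word, so
    the input text is never rewritten; only the lemma carries the
    normalised form.
    """
    tokens = []
    for word in text.split():
        entry = EXCEPTIONS.get(word.lower())
        if entry is None:
            # Fallback for unknown words: lowercase form as a naive lemma.
            tokens.append({"orth": word, "lemma": word.lower()})
            continue
        pos = 0
        for piece in entry:
            n = len(piece["orth"])
            tokens.append({"orth": word[pos:pos + n], "lemma": piece["lemma"]})
            pos += n
    return tokens


tokens = tokenize("We're sure they don't mind")
orths = [t["orth"] for t in tokens]
lemmas = [t["lemma"] for t in tokens]
```

With this layout the pieces of a contraction still concatenate back to the original word, which would not hold if `'re` were replaced by `are` in the text itself.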