Pronoun lemmas inconsistent with docs #686

Closed
nickdavidhaynes opened this issue Dec 14, 2016 · 6 comments

@nickdavidhaynes

It's unclear whether this is a bug or a problem with the docs, but when I lemmatize a pronoun in spaCy 1.3.0, I don't get the -PRON- lemma described at https://spacy.io/docs/api/annotation#lemmatization:

>>> import spacy
>>> nlp = spacy.load('en')  # assuming the default English model
>>> sentence = nlp('He is the man.')
>>> sentence[0].orth_
'He'
>>> sentence[0].lemma_
'he'
>>> sentence[0].pos_
'PRON'

My Environment

  • El Capitan
  • Python 3.5.2
  • spaCy 1.3.0
honnibal added the "bug" label (Bugs and behaviour differing from documentation) on Dec 15, 2016
@honnibal
Member

Thanks! Looks like I broke something in the morphological analysis.

@honnibal
Member

Turns out this has been the behaviour for a long time. Interesting.

The good news is that this means the model is correct — I was worried that I broke the calculation of these lemmas, and that meant the weights in the model we've all been using were wrong. That's not the case.

We'll fix this bug in version 2.0 rather than now, because switching the behaviour in the code would invalidate the current model.

honnibal added the "🌙 nightly" (Discussion and contributions related to nightly builds) and "performance" labels and removed the "bug" label on Dec 18, 2016
@nickdavidhaynes
Author

Thanks! Definitely glad to hear the model wasn't wrong.

ines added this to the "Update lemmatizer and morphology" milestone on Feb 18, 2017
ines added a commit that referenced this issue Mar 13, 2017
@honnibal
Member

Fixed in v1.7

ines removed the "🌙 nightly" label on May 7, 2017
@drorata

drorata commented Jan 4, 2018

Should I be surprised that sentence[0].lemma_ yields 'Er' given:

import spacy

nlp = spacy.load('de')
sentence = nlp('Er is wieder da.')

I was expecting '-PRON-' as well.
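
For anyone who needs the documented `-PRON-` lemma regardless of spaCy version or language model, one possible workaround is to normalise pronoun lemmas after the fact. This is only a minimal sketch: `normalize_pronoun_lemma` is a hypothetical helper, not part of spaCy, and it assumes the tagger assigns the coarse tag `PRON` as in the first example above. Apart from `('he', 'PRON')`, the pairs below are illustrative values rather than output copied from a model.

```python
# Placeholder lemma that the linked annotation docs describe for pronouns.
PRON_LEMMA = "-PRON-"


def normalize_pronoun_lemma(lemma, pos):
    """Return '-PRON-' when the coarse POS tag is 'PRON', else the lemma unchanged.

    Hypothetical helper for illustration; not a spaCy API.
    """
    return PRON_LEMMA if pos == "PRON" else lemma


# (lemma_, pos_) pairs; only the first pair is taken from this thread,
# the rest are illustrative assumptions.
pairs = [("he", "PRON"), ("be", "VERB"), ("the", "DET"), ("man", "NOUN")]
normalized = [normalize_pronoun_lemma(lemma, pos) for lemma, pos in pairs]
```

With a loaded pipeline, the same helper could be applied as `[normalize_pronoun_lemma(t.lemma_, t.pos_) for t in sentence]`.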

@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked this as resolved and limited conversation to collaborators on May 8, 2018