Pronoun lemmas inconsistent with docs #686

nickdavidhaynes · 2016-12-14T15:24:51Z

Unclear whether this is a bug or an issue with the docs, but when I lemmatize a pronoun in 1.3.0, I'm not seeing the -PRON- token a la https://spacy.io/docs/api/annotation#lemmatization:

>>> import spacy
>>> nlp = spacy.load('en')
>>> sentence = nlp('He is the man.')
>>> sentence[0].orth_
'He'
>>> sentence[0].lemma_
'he'
>>> sentence[0].pos_
'PRON'

My Environment

El Capitan
Python 3.5.2
spaCy 1.3.0

honnibal · 2016-12-15T12:48:14Z

Thanks! Looks like I broke something in the morphological analysis.

honnibal · 2016-12-18T22:49:45Z

Turns out this has been the behaviour for a long time. Interesting.

The good news is that this means the model is correct — I was worried that I broke the calculation of these lemmas, and that meant the weights in the model we've all been using were wrong. That's not the case.

We'll fix this bug in version 2.0, because we don't want to switch the behaviour in the code, and invalidate the model.

nickdavidhaynes · 2016-12-18T23:29:26Z

Thanks! Definitely glad to hear the model wasn't wrong.

honnibal · 2017-03-16T22:58:49Z

Fixed in v1.7

drorata · 2018-01-04T14:28:15Z

Should I be surprised that sentence[0].lemma_ yields 'Er' given:

import spacy
nlp = spacy.load('de')
sentence = nlp('Er is wieder da.')

I was expecting '-PRON-' as well.

lock · 2018-05-08T03:55:49Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal added the bug (Bugs and behaviour differing from documentation) label Dec 15, 2016

honnibal added the 🌙 nightly (Discussion and contributions related to nightly builds) and performance labels and removed the bug (Bugs and behaviour differing from documentation) label Dec 18, 2016

ines added this to the Update lemmatizer and morphology milestone Feb 18, 2017

ines added a commit that referenced this issue Mar 13, 2017

Add regression test for #686

444d665

honnibal closed this as completed Mar 16, 2017

ines added a commit that referenced this issue Mar 18, 2017

Add title-case version of morph rules (resolves #686)

aefb898
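
The commit title above suggests that the pronoun rules were matched on the exact surface form, so a sentence-initial 'He' missed a rule keyed on 'he'. The sketch below illustrates that reading in plain Python. It is not spaCy's code: MORPH_RULES, add_title_case and lemmatize are hypothetical names, the rule table is a toy, and the lowercase fallback only stands in for a real lemmatizer.

```python
PRON_LEMMA = "-PRON-"

# Hypothetical rule table: tag -> {surface form -> attributes}.
# "PRP" is the Penn Treebank tag for personal pronouns.
MORPH_RULES = {
    "PRP": {
        "he": {"lemma": PRON_LEMMA},
        "she": {"lemma": PRON_LEMMA},
        "they": {"lemma": PRON_LEMMA},
    }
}


def add_title_case(rules):
    """Return a copy of the rules with a title-case key for every form."""
    expanded = {}
    for tag, forms in rules.items():
        expanded[tag] = dict(forms)
        for orth, attrs in forms.items():
            # Keep an existing title-case entry if one is already defined.
            expanded[tag].setdefault(orth.title(), attrs)
    return expanded


def lemmatize(orth, tag, rules):
    """Look up an exact-match rule; otherwise fall back to lowercasing.

    The fallback is an assumption made for this sketch only.
    """
    attrs = rules.get(tag, {}).get(orth)
    if attrs is not None:
        return attrs["lemma"]
    return orth.lower()


if __name__ == "__main__":
    print(lemmatize("He", "PRP", MORPH_RULES))
    print(lemmatize("He", "PRP", add_title_case(MORPH_RULES)))
```

With the original table, 'He' falls through to the fallback and comes back as 'he', which matches the report at the top of the thread. With the expanded table, the same lookup returns '-PRON-'.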

ines removed the 🌙 nightly (Discussion and contributions related to nightly builds) label May 7, 2017

lock bot locked as resolved and limited conversation to collaborators May 8, 2018
