SpaCy NER training example from version 1.5.0 doesn't work in 1.6.0 #773
TL;DR

I made a bug fix to Thinc (explosion/thinc@09b030b, explained below) that accounts for the change in behaviour. The best fix on your side is probably not to average the weights when you're resuming training on so little data; see the discussion below.

What's going on

spaCy 1.x uses the Averaged Perceptron algorithm for all its machine learning. You can read about the algorithm in the POS tagger blog post, where you can also find a straightforward Python implementation: https://explosion.ai/blog/part-of-speech-pos-tagger-in-python

AP uses the Averaged Parameter Trick for SGD. There are two copies of the weights: the current weights, and a running average. During training, predictions are made with the current weights, and the averaged weights are updated in the background. At the end of training, we swap the current weights for the averages. This makes a huge difference for most training scenarios. However, when I wrote the code, I didn't pay much attention to the current use-case of "resuming" training in order to add another class.

I recently fixed a long-standing error in the averaged perceptron code: after loading a model, Thinc was not initialising the averages to the newly loaded weights. This saved memory, because the averages require another copy of the weights, and also additional book-keeping. The consequence of this bug was that when you updated a feature after resuming training, you wiped the weights that were previously associated with it. This is really bad: it means that as you train on new examples, you delete all the information previously associated with those features. I finally fixed this bug in this commit: explosion/thinc@09b030b

The consequence is that the correction makes the model behave differently on these small-data example cases. What's still unclear is how we should compute an average between the old weights and the new ones. The old weights were trained with about 20 passes over about 80,000 sentences of annotation, so 5 new passes over 5 examples shouldn't change the weights at all if we take an unbiased average. That seems undesirable. If you have so little data, it's probably not a good idea to average.

About NER and training more generally (making this the megathread)

See also #762, #612, #701, #665. Attn: @savvopoulos, @viksit

People are having a lot of pain with training the NER system. Some of the problems are easy to fix: the current workflow around saving and loading data is pretty bad, and it's made worse by some Python 2/3 unicode save/load bugs in the example scripts. What's hard to solve is that people seem to want to train the NER system on, like, 5 examples. The current algorithm expects more like 5,000. I realise I never wrote this anywhere, and the examples all show five examples. I guess I've been doing this stuff too long, and it's no longer obvious to me what is and isn't obvious. I think this has been the root cause of a lot of confusion.

Things will improve a little with spaCy 2.0. You might be able to get a useful model with as little as 500 or 1,000 sentences annotated with a new NER class. Maybe. We're working on ways to make all of this more efficient: we're working on making annotation projects less expensive and more consistent, and on algorithms that require fewer annotated examples. But there will always be limits.

The thing is, I think most teams should be annotating literally 10,000x as much data as they're currently trying to get away with. You should have at least 1,000 sentences just of evaluation data, data that your machine learning model never sees. Otherwise, how will you know that your system is working? By typing stuff into it manually? You wouldn't test your other code like that, would you? :)
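For readers unfamiliar with the averaged-parameter trick, here is a toy sketch in the style of the blog post linked above. It is illustrative only: the class and attribute names are made up, and this is not spaCy's or Thinc's actual code. The current weights are used for prediction, running totals are kept per parameter, and average_weights() swaps in the averages at the end of training.

from collections import defaultdict

class ToyAveragedPerceptron(object):
    # Toy averaged perceptron: current weights plus bookkeeping that lets us
    # recover the averaged weights at the end of training.

    def __init__(self):
        self.weights = defaultdict(lambda: defaultdict(float))  # current weights, used for prediction
        self._totals = defaultdict(float)   # accumulated weight mass per (feature, class)
        self._tstamps = defaultdict(int)    # update step at which each parameter last changed
        self.i = 0                          # number of updates so far

    def predict(self, features):
        scores = defaultdict(float)
        for feat in features:
            for clas, weight in self.weights[feat].items():
                scores[clas] += weight
        return max(scores, key=scores.get) if scores else None

    def update(self, truth, guess, features):
        # Standard perceptron update, but keep running totals so the averaged
        # weights can be recovered later.
        self.i += 1
        if truth == guess:
            return
        for feat in features:
            for clas, delta in ((truth, 1.0), (guess, -1.0)):
                param = (feat, clas)
                weight = self.weights[feat][clas]
                self._totals[param] += (self.i - self._tstamps[param]) * weight
                self._tstamps[param] = self.i
                self.weights[feat][clas] = weight + delta

    def average_weights(self):
        # The swap described above: replace each current weight with its
        # average over all update steps.
        for feat, clas_weights in self.weights.items():
            for clas, weight in clas_weights.items():
                param = (feat, clas)
                total = self._totals[param] + (self.i - self._tstamps[param]) * weight
                clas_weights[clas] = total / max(self.i, 1)

The two copies of the weights mentioned above correspond here to self.weights and the _totals/_tstamps bookkeeping that average_weights() uses to recover the averages.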
Are there alternative models that are more robust with respect to smaller datasets? Playing with luis.ai and wit.ai, their NERs seem to handle smaller datasets, but I'm not sure what they're using behind the scenes. Their models retrain pretty quickly, so they're likely not complex.
@etchen99: Neural network models will do better at this, because we'll be able to use transfer learning: we can import knowledge from other tasks, about the language in general. That helps a lot when you don't have much data. But, again, "not much data" here means "a few thousand sentences". I get that people want to train on a few dozen sentences. I think people shouldn't want that. Annotated data will never not be a part of this type of machine learning, no matter what algorithm you're using, because you're always going to need evaluation data. That won't ever change. And if you're making a few thousand sentences of evaluation data, you may as well make a few thousand more for training.
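To make the evaluation point above concrete, here is a minimal, hand-rolled scorer that compares predicted entity spans against held-out gold spans. The function name, the eval_data structure, and the scoring scheme are made up for illustration; the only spaCy assumption is that nlp(text).ents yields spans with start_char, end_char and label_, as in the examples in this thread.

def score_ner(nlp, eval_data):
    # eval_data: list of (text, gold_spans) pairs, where gold_spans is a set
    # of (start_char, end_char, label) tuples the model never saw in training.
    tp = fp = fn = 0
    for text, gold_spans in eval_data:
        pred_spans = set()
        for ent in nlp(text).ents:
            pred_spans.add((ent.start_char, ent.end_char, ent.label_))
        tp += len(pred_spans & gold_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / float(tp + fp) if (tp + fp) else 0.0
    recall = tp / float(tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

With, say, 1,000+ held-out sentences in eval_data, as suggested above, you get precision/recall/F1 numbers instead of eyeballing individual queries.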
@honnibal Currently, the example code for training and updating the NER in the documentation only uses 2 sentences, which is obviously not enough (I realized this after reading your comment). It would be better if you put your explanation in the documentation: everyone tries to read the docs first, and only goes to the issues if they can't find what they need there. More problems with the example code:

>>> # after running the example code, it does not work
>>> nlp(u'Who is Chaka Khan?').ents
()
According to this repo, I did find a way to update the original NER model. However, it does not support training new entities. An example of training it to extract degrees:

import spacy
from spacy.gold import GoldParse

nlp = spacy.load('en')
ner = nlp.entity
text, tags = (u'B.S. in Mathematics', [(0, 4, 'DEGREE')])
doc = nlp.make_doc(text)
gold = GoldParse(doc, entities=tags)
ner.update(doc, gold)
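If you retrain this way, the discussion above suggests looping over many annotated examples for several passes rather than doing a single update. Here is a rough, hedged sketch that reuses nlp, ner and GoldParse from the snippet above; train_data and the pass count are placeholders, and the structure loosely follows the 1.x train_ner.py example linked in this thread.

import random

# train_data: a list of (text, [(start_char, end_char, label), ...]) pairs;
# per the discussion above, ideally thousands of sentences, not a handful.
for itn in range(20):              # several passes over the data
    random.shuffle(train_data)
    for text, offsets in train_data:
        doc = nlp.make_doc(text)
        nlp.tagger(doc)            # the 1.x example assigns POS tags before the NER update
        gold = GoldParse(doc, entities=offsets)
        ner.update(doc, gold)

Whether to average the weights at the end of such a run is exactly the open question discussed earlier in this thread.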
The bugs around this should now be resolved as of 1.7.3. See further discussion in #910. Usability around the retrained NER still isn't great, but the situation is improving, and this will be fully resolved once the remaining pieces are in place. All of those are underway in other threads, so I'll close this one.
I tried to use the training example here:
https://github.com/explosion/spaCy/blob/master/examples/training/train_ner.py
with spaCy 1.6.0, and the results are odd: Khan is recognized as a LOC and Berlin as a PERSON. If I go back to version 1.5.0, the result is as expected.
Could this be an issue with the off-the-shelf English model that spacy.en.download fetched for 1.6.0?
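One way to narrow down whether the library version or the downloaded model data is at fault is to pin the spaCy version explicitly and confirm what is actually loaded at runtime. A small, generic check (pkg_resources comes with setuptools; the comparison setup is only a suggestion):

# e.g. install spacy==1.5.0 and spacy==1.6.0 into separate virtualenvs,
# run spacy.en.download in each, and compare the output of train_ner.py.
import pkg_resources
print(pkg_resources.get_distribution('spacy').version)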