thinc.extra.search.Beam.advance has assertion error with custom entity labels #3047

sshegheva · 2018-12-12T19:09:08Z

I was using the code snippet from #881 to compute confidence scores for the extracted entities. It works as expected until I add a new pipeline component with a custom entity type, which I use to add new dictionary terms via spacy-lookup.

    from spacy_lookup import Entity

    # add new keywords under the ML label
    entity = Entity(keywords_list=["gradient", "neural network"], label="ML")
    nlp.add_pipe(entity, last=True)

Now I run the extraction with the beam_parse method (so that I can get the confidence):

    from collections import defaultdict

    # run the pipeline without the NER step, then beam-parse the docs
    docs = list(nlp.pipe([text], disable=["ner"]))
    beams, _ = nlp.entity.beam_parse(docs, beam_width=3, beam_density=0.001)

    # sum the score of every beam parse that contains a given entity
    entity_scores = defaultdict(float)
    for doc, beam in zip(docs, beams):
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for start, end, label in ents:
                ent = doc[start:end]
                if ent.text:  # do not write an empty entity
                    entity_scores[(ent.text.lower(), label)] += score

This throws an error:

    File "nn_parser.pyx", line 537, in spacy.syntax.nn_parser.Parser.beam_parse
    File "search.pyx", line 145, in thinc.extra.search.Beam.advance

Info about spaCy

spaCy version: 2.0.11
Platform: Darwin-18.0.0-x86_64-i386-64bit
Python version: 3.6.5
Models: en_core_web_md, en_coref_md, en_core_web_lg, en_core_web_sm

Note: If I "re-use" an existing label, for example by passing "PERSON" instead of "ML", no exception is thrown. However, I need to be able to distinguish between different types and add custom dictionaries.

The main reason I am using beam_parse here is that it lets me get confidence scores. If there is a different way to do that, I would be fine with calling the standard NER instead of the beam search.
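
The score aggregation used above does not depend on spaCy and can be checked in isolation. The following is a minimal sketch, not spaCy's API: `aggregate_entity_scores`, `tokens`, and `parses` are hypothetical names, and the parse data is hand-written to mimic the `(score, ents)` pairs that the loop iterates over, on the assumption that the scores of the parses in one beam sum to 1.

```python
from collections import defaultdict


def aggregate_entity_scores(tokens, parses):
    """Sum the score of every parse in which each (text, label) span occurs.

    tokens: list of token strings for one document.
    parses: list of (score, ents) pairs, where ents is a list of
            (start, end, label) token-index triples. This is plain
            hand-written data, not output from a real model.
    """
    entity_scores = defaultdict(float)
    for score, ents in parses:
        for start, end, label in ents:
            text = " ".join(tokens[start:end]).lower()
            if text:  # skip empty spans
                entity_scores[(text, label)] += score
    return dict(entity_scores)


# Hypothetical stand-in data: three candidate parses of one sentence.
tokens = ["Gradient", "descent", "trains", "a", "neural", "network"]
parses = [
    (0.5, [(0, 1, "ML"), (4, 6, "ML")]),  # both spans found
    (0.25, [(4, 6, "ML")]),               # only "neural network" found
    (0.25, []),                           # no entities found
]
scores = aggregate_entity_scores(tokens, parses)
```

With this stand-in data, "neural network" occurs in two of the three parses and accumulates their combined score (0.75), while "gradient" only receives the score of the single parse that contains it (0.5).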


honnibal · 2019-02-25T21:15:49Z

Sorry for not getting to this sooner. This should be fixed in v2.1, which you can get from spacy-nightly. If it's still occurring, please reopen (or open a new issue).

lock · 2019-03-27T21:32:04Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ines added the "🔮 thinc" and "feat / ner" labels Dec 14, 2018

honnibal closed this as completed Feb 25, 2019

lock bot locked as resolved and limited conversation to collaborators Mar 27, 2019
