thinc.extra.search.Beam.advance has assertion error with custom entity labels #3047

Closed
sshegheva opened this issue Dec 12, 2018 · 2 comments
Labels
feat / ner Feature: Named Entity Recognizer 🔮 thinc spaCy's machine learning library Thinc

@sshegheva

I was using the code snippet from #881 to compute confidence scores for the extracted entities. It works as expected until I add a new pipeline component with a custom entity type, which adds new dictionary terms via spacy-lookup.

    from spacy_lookup import Entity

    # add new keywords under the ML label
    entity = Entity(keywords_list=["gradient", "neural network"], label="ML")
    nlp.add_pipe(entity, last=True)

Now we do the extraction with the beam_parse method (so we can get the confidence):

    from collections import defaultdict

    # run the pipeline with the statistical NER disabled, then beam-parse the docs
    docs = list(nlp.pipe([text], disable=["ner"]))
    beams, _ = nlp.entity.beam_parse(docs, beam_width=3, beam_density=0.001)

    # sum the scores of every beam parse that contains a given (text, label) pair
    entity_scores = defaultdict(float)
    for doc, beam in zip(docs, beams):
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for start, end, label in ents:
                ent = doc[start:end]
                if ent.text:  # do not write an empty entity
                    entity_scores[(ent.text.lower(), label)] += score

This throws an error:

    File "nn_parser.pyx", line 537, in spacy.syntax.nn_parser.Parser.beam_parse
    File "search.pyx", line 145, in thinc.extra.search.Beam.advance

Info about spaCy

  • spaCy version: 2.0.11
  • Platform: Darwin-18.0.0-x86_64-i386-64bit
  • Python version: 3.6.5
  • Models: en_core_web_md, en_coref_md, en_core_web_lg, en_core_web_sm

Note: If I reuse an existing label, for example passing "PERSON" instead of "ML", no exception is thrown. However, I need to be able to distinguish between different types and add custom dictionaries.

The main reason I am using beam search here is to get confidence scores. If there is a different way to do that, I would be fine with calling the standard NER instead of beam_parse.
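For anyone who wants to check the confidence computation separately from the failing beam_parse call, here is a minimal sketch of just the aggregation step in plain Python. The function name `aggregate_entity_scores`, the token list, and the beam scores are all made up for illustration; the only thing taken from the snippet above is the assumed shape of each beam parse, a `(score, ents)` pair where `ents` holds `(start, end, label)` triples.

```python
from collections import defaultdict


def aggregate_entity_scores(tokens, beam_parses):
    """Sum the score of every parse that contains a given (text, label) pair.

    tokens: list of token strings (a stand-in for a spaCy Doc).
    beam_parses: iterable of (score, ents), ents being (start, end, label) triples.
    """
    entity_scores = defaultdict(float)
    for score, ents in beam_parses:
        for start, end, label in ents:
            text = " ".join(tokens[start:end]).lower()
            if text:  # skip empty spans
                entity_scores[(text, label)] += score
    return dict(entity_scores)


# Hypothetical data: three candidate parses whose scores sum to 1.0.
tokens = ["Geoffrey", "Hinton", "studies", "neural", "network", "training"]
beam_parses = [
    (0.5, [(0, 2, "PERSON")]),
    (0.25, [(0, 2, "PERSON"), (3, 5, "ML")]),
    (0.25, [(0, 1, "PERSON")]),
]

scores = aggregate_entity_scores(tokens, beam_parses)
for (text, label), confidence in sorted(scores.items()):
    print(text, label, confidence)
```

An entity that appears in several parses accumulates their scores, so a span the beam agrees on ends up with a confidence close to 1, while a span found in only one low-scoring parse stays low.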

@ines ines added 🔮 thinc spaCy's machine learning library Thinc feat / ner Feature: Named Entity Recognizer labels Dec 14, 2018
@honnibal
Member

Sorry for not getting to this sooner. This should be fixed in v2.1, which you can get from spacy-nightly. If it's still occurring, please reopen (or open a new issue).

@lock

lock bot commented Mar 27, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Mar 27, 2019