
Add noun_chunks to Span #658

Merged (2 commits into explosion:master) on Nov 24, 2016

Conversation

pokey
Contributor

@pokey pokey commented Nov 24, 2016

Add noun_chunks to Span

Description

Support iterating noun_chunks of a Span. Also add a doc attribute to the Doc class for uniformity. I wasn't sure of the best way to remove the code duplication between noun_chunks in Doc and noun_chunks in Span.

Motivation and Context

Useful to be able to find noun_chunks in a span rather than in the whole doc, e.g. all noun_chunks in a sentence.
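
For context, a quick sketch of the kind of usage this enables (illustrative only; the spacy.load('en') call assumes the English model of the time is installed under that name):

    import spacy

    nlp = spacy.load('en')
    doc = nlp(u"Employees are recruiting talented staffers from overseas.")

    for sent in doc.sents:             # each sentence is a Span
        print(list(sent.noun_chunks))  # noun chunks restricted to that sentence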

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change fixing an issue)
  • New feature (non-breaking change adding functionality to spaCy)
  • Breaking change (fix or feature causing change to spaCy's existing functionality)
  • Documentation (Addition to documentation of spaCy)

Checklist:

  • My code follows spaCy's code style.
  • My change requires a change to spaCy's documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Member

@honnibal honnibal left a comment

Great patch! Very elegant implementation.

I think this patch is further evidence that we should add an AbstractBaseClass over Doc and Span. Then we could deduplicate these things and declare the interface better. I've avoided this so far because I usually favour a "you ain't gonna need it" approach to introducing abstractions. I think it's time to make the switch, though.

I think TokenSequence might be a good name for this.
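
For illustration only, one rough shape such a base class might take; the name TokenSequence comes from the comment above, but the method list is a guess rather than anything that exists in spaCy:

    from abc import ABCMeta, abstractmethod, abstractproperty

    class TokenSequence(object):
        # Python 2-style ABC declaration, matching the codebase of the era.
        __metaclass__ = ABCMeta

        @abstractmethod
        def __iter__(self):
            """Iterate over the tokens in the sequence."""

        @abstractmethod
        def __getitem__(self, i):
            """Index or slice into the token sequence."""

        @abstractproperty
        def doc(self):
            """The underlying Doc (a Doc would return itself, for uniformity)."""

        @abstractproperty
        def noun_chunks(self):
            """Yield base noun phrases as Span objects."""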

Update the German noun chunks iterator, so that it also works on Span objects.
@honnibal
Member

honnibal commented Nov 24, 2016

I noticed that the German iterator needed the same trick. It's unfortunately difficult to avoid some of this per-language duplication — I think the cure is likely to be worse than the disease, because there's no way to say for sure which parts of the logic different languages will always share.
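
Roughly, the trick is for each language's chunker to accept either a Doc or a Span and reach the Doc through the shared doc attribute. A simplified sketch of that pattern (not the actual spaCy source, and it leaves out the conjunction handling discussed below):

    from spacy.symbols import NOUN, PROPN, PRON

    def noun_chunks_for(obj):
        # obj may be a Doc or a Span; both expose .doc once the attribute
        # added in this PR is in place.
        doc = obj.doc
        np_deps = [doc.vocab.strings[label] for label in
                   ('nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj', 'attr', 'ROOT')]
        np_label = doc.vocab.strings['NP']
        for word in obj:
            if word.pos in (NOUN, PROPN, PRON) and word.dep in np_deps:
                yield word.left_edge.i, word.i + 1, np_label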

@honnibal honnibal merged commit 1f24795 into explosion:master Nov 24, 2016
@honnibal
Member

Btw @pokey — I'd like to add you to the contributors.md list, if that's okay? We usually list contributors as "Full name, username", but an alias instead of the full name would be fine, too. Let me know what you prefer :)

@pokey
Contributor Author

pokey commented Nov 24, 2016

Thanks :-)

Re TokenSequence: agreed. It would be nice to formalize the extent to which these two classes can be treated the same.

Re attribution: Pokey Rule, pokey

@pokey
Contributor Author

pokey commented Nov 24, 2016

Also, why do we use doc.vocab.strings at all? Why not just import the symbols from spacy.symbols? Then we wouldn't need to access the doc object.
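
For reference, the two approaches being contrasted (illustrative fragment; it assumes doc is an existing Doc, and whether the two routes give interchangeable IDs depends on the StringStore internals of the version at hand):

    # Fixed module-level constants; no Doc required.
    from spacy.symbols import NOUN, PROPN, PRON, nsubj, dobj

    # Lookup through the shared StringStore; requires a handle on the Doc
    # (or at least its Vocab).
    np_label = doc.vocab.strings['NP']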

@pokey
Contributor Author

pokey commented Nov 24, 2016

Finally, why don't we allow PROPN and PRON after conjunctions?

@tokestermw
Contributor

Hi, I've tested .noun_chunks on a Span and it doesn't seem to be working (on spaCy 1.8.2).

    doc = nlp(u"Employees are recruiting talented staffers from overseas.")
    list(doc.noun_chunks)  # [Employees, talented staffers]

    span = doc[0:4]
    list(span.noun_chunks)  # ERROR

    for sent in doc.sents:
        list(sent.noun_chunks)  # ERROR

And I get the following error.

/lib/python2.7/site-packages/spacy/tokens/span.pyx in __get__ (spacy/tokens/span.cpp:7229)()
    231             spans = []
    232             for start, end, label in self.doc.noun_chunks_iterator(self):
--> 233                 spans.append(Span(self, start, end, label=label))
    234             for span in spans:
    235                 yield span

TypeError: Argument 'doc' has incorrect type (expected spacy.tokens.doc.Doc, got spacy.tokens.span.Span)

@tokestermw
Contributor

Small hack, but I was able to use noun_chunks_iterator from the doc.

    [sent[left_i:right_i] for left_i, right_i, _ in sent.doc.noun_chunks_iterator(sent)]
    # [Employees, talented staffers]
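
The same workaround can be wrapped in a small helper (hypothetical name; it slices the parent Doc on the assumption that the iterator yields doc-level token offsets, as the English chunker does):

    def span_noun_chunks(span):
        doc = span.doc
        for start, end, _ in doc.noun_chunks_iterator(span):
            yield doc[start:end]

    for sent in doc.sents:
        print(list(span_noun_chunks(sent)))  # [Employees, talented staffers] for the example above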
