Add noun_chunks to Span #658
Great patch! Very elegant implementation.
I think this patch is further evidence that we should add an AbstractBaseClass over Doc and Span. Then we could deduplicate these things and declare the interface better. I've avoided this so far because I usually favour a "you ain't gonna need it" approach to introducing abstractions. I think it's time to make the switch, though.
I think TokenSequence might be a good name for this.
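To make the suggestion concrete, here is a minimal sketch of how a shared base class could hold noun_chunks once for both containers. Only the name TokenSequence comes from the comment above; the toy Doc and Span classes, the (text, tag) token tuples, and the naive chunking rule are all assumptions made for illustration and are not spaCy's actual implementation.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple

# Hypothetical toy token: (text, coarse part-of-speech tag). Not spaCy's Token.
Token = Tuple[str, str]


class TokenSequence(ABC):
    """Assumed shared interface for anything that is a run of tokens."""

    @property
    @abstractmethod
    def tokens(self) -> List[Token]:
        """Return the tokens covered by this sequence."""

    @property
    def noun_chunks(self) -> Iterator[str]:
        # Naive stand-in for a real chunker: yield each maximal run of
        # DET/ADJ/NOUN tokens, provided the run ends in a NOUN.
        run: List[Token] = []
        for text, pos in self.tokens + [("", "EOS")]:  # sentinel flushes the last run
            if pos in {"DET", "ADJ", "NOUN"}:
                run.append((text, pos))
                continue
            if run and run[-1][1] == "NOUN":
                yield " ".join(t for t, _ in run)
            run = []


class Doc(TokenSequence):
    def __init__(self, tokens: List[Token]) -> None:
        self._tokens = list(tokens)

    @property
    def tokens(self) -> List[Token]:
        return self._tokens

    @property
    def doc(self) -> "Doc":
        # A Doc is its own doc, so callers can use `.doc` on either type.
        return self


class Span(TokenSequence):
    def __init__(self, doc: Doc, start: int, end: int) -> None:
        self.doc, self.start, self.end = doc, start, end

    @property
    def tokens(self) -> List[Token]:
        # Half-open slice [start, end) of the parent document.
        return self.doc.tokens[self.start:self.end]


doc = Doc([("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"), ("saw", "VERB"),
           ("a", "DET"), ("dog", "NOUN"), (".", "PUNCT")])
span = Span(doc, 3, 7)  # "saw a dog ."
print(list(doc.noun_chunks))
print(list(span.noun_chunks))
```

Because noun_chunks is written once against the abstract tokens property, neither subclass has to repeat it. The trade-off raised in the comment still applies: the abstraction only pays off if Doc and Span really do share this logic.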
Update the German noun chunks iterator, so that it also works on Span objects.
I noticed that the German iterator needed the same trick. It's unfortunately difficult to avoid some of this per-language duplication. I think the cure is likely to be worse than the disease, because there's no way to say for sure which parts of the logic different languages will always share.
Btw @pokey, I'd like to add you to the contributors.md list, if that's okay? We usually list contributors as "Full name, username", but an alias instead of the full name would be fine, too. Let me know what you prefer :)
Thanks :-) Re attribution: Pokey Rule, pokey
Also, why do we use
Finally, why don't we allow
Hi, I've tested
And I get the following error.
Small hack but I was able to use
Add noun_chunks to Span
Description
Support iterating noun_chunks of a Span. Also add doc attribute to Doc class for uniformity. Wasn't sure the best way to remove code duplication between noun_chunks in Doc and noun_chunks in Span.
Motivation and Context
Useful to be able to find noun_chunks in a span, rather than the whole doc. Eg all noun_chunks in a sentence.
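As a rough illustration of the per-sentence use case, the sketch below keeps only the chunks whose token range falls inside a given span. It does not use spaCy: the helper chunks_in_span, the hand-written offsets, and the example text are hypothetical, and the actual patch may compute chunks for a span differently.

```python
from typing import Iterable, Iterator, List, Tuple

# Half-open [start, end) token offsets. Hypothetical representation.
Offsets = Tuple[int, int]


def chunks_in_span(chunks: Iterable[Offsets], span: Offsets) -> Iterator[Offsets]:
    """Yield the chunks that lie entirely inside `span`."""
    span_start, span_end = span
    for start, end in chunks:
        if span_start <= start and end <= span_end:
            yield (start, end)


words: List[str] = "The quick fox saw a dog . It chased the dog .".split()
# Hand-annotated for this example; a real pipeline would derive these.
doc_chunks: List[Offsets] = [(0, 3), (4, 6), (7, 8), (9, 11)]
sentences: List[Offsets] = [(0, 7), (7, 12)]

per_sentence = [
    [" ".join(words[s:e]) for s, e in chunks_in_span(doc_chunks, sent)]
    for sent in sentences
]
print(per_sentence)
```

With noun_chunks available directly on a span, the caller would not need this kind of filtering by hand.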