doc.noun_chunks Sentence Length Bug #693
It looks like the noun chunk detection rules could be improved here. The issue comes from the combination of coordination and proper nouns:

    def english_noun_chunks(obj):
        '''Detect base noun phrases from a dependency parse.
        Works on both Doc and Span.'''
        labels = ['nsubj', 'dobj', 'nsubjpass', 'pcomp', 'pobj',
                  'attr', 'ROOT', 'root']
        doc = obj.doc # Ensure works on both Doc and Span.
        np_deps = [doc.vocab.strings[label] for label in labels]
        conj = doc.vocab.strings['conj']
        np_label = doc.vocab.strings['NP']
        for i, word in enumerate(obj):
            if word.pos in (NOUN, PROPN, PRON) and word.dep in np_deps:
                yield word.left_edge.i, word.i+1, np_label
            elif word.pos == NOUN and word.dep == conj:
                head = word.head
                while head.dep == conj and head.head.i < head.i:
                    head = head.head
                # If the head is an NP, and we're coordinated to it, we're an NP
                if head.dep in np_deps:
                    yield word.left_edge.i, word.i+1, np_label

I think the correction should be:
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
doc.noun_chunks doesn't parse the complete sentence.
Test1:
Produces the output:
But Test2:
Produces the output:
Both are identified properly, but only when they come early in a sentence; they are ignored when they appear near the end.
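The behaviour described here is consistent with the rule quoted in the maintainer's comment, where the coordination branch only accepts `NOUN` while the first branch also accepts `PROPN` and `PRON`. The sketch below is not spaCy code and not the maintainer's correction. It re-implements the quoted rule over a hand-written parse of a made-up sentence ("She visited Paris and Berlin"), using a hypothetical `Tok` record and a hypothetical `conj_pos` parameter, to show how a coordinated proper noun at the end of a sentence could be skipped and how widening the part-of-speech check (an assumption) would pick it up.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple

# Dependency labels treated as noun-phrase heads in the quoted rule.
NP_DEPS = {"nsubj", "dobj", "nsubjpass", "pcomp", "pobj", "attr", "ROOT", "root"}


@dataclass
class Tok:
    """Hypothetical stand-in for a parsed token (not the spaCy Token class)."""
    i: int          # position in the sentence
    text: str
    pos: str        # coarse part-of-speech tag
    dep: str        # dependency label
    head: int       # index of the syntactic head
    left_edge: int  # index of the leftmost token in this token's subtree


def noun_chunks(tokens: List[Tok], conj_pos: Sequence[str]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) spans following the logic of the quoted rule.

    conj_pos is a hypothetical knob: the tags accepted in the coordination branch.
    """
    for word in tokens:
        if word.pos in ("NOUN", "PROPN", "PRON") and word.dep in NP_DEPS:
            yield word.left_edge, word.i + 1
        elif word.pos in conj_pos and word.dep == "conj":
            head = tokens[word.head]
            # Walk leftwards through a chain of conjuncts to the first one.
            while head.dep == "conj" and tokens[head.head].i < head.i:
                head = tokens[head.head]
            # If that first conjunct is an NP head, this conjunct is an NP too.
            if head.dep in NP_DEPS:
                yield word.left_edge, word.i + 1


def chunk_texts(tokens: List[Tok], conj_pos: Sequence[str]) -> List[str]:
    """Render each span as text for easy comparison."""
    return [
        " ".join(t.text for t in tokens[start:end])
        for start, end in noun_chunks(tokens, conj_pos)
    ]


# Hand-written parse (an assumption, not parser output).
SENTENCE = [
    Tok(0, "She", "PRON", "nsubj", head=1, left_edge=0),
    Tok(1, "visited", "VERB", "ROOT", head=1, left_edge=0),
    Tok(2, "Paris", "PROPN", "dobj", head=1, left_edge=2),
    Tok(3, "and", "CCONJ", "cc", head=2, left_edge=3),
    Tok(4, "Berlin", "PROPN", "conj", head=2, left_edge=4),
]

quoted_rule = chunk_texts(SENTENCE, conj_pos=("NOUN",))
widened_rule = chunk_texts(SENTENCE, conj_pos=("NOUN", "PROPN", "PRON"))

print(quoted_rule)
print(widened_rule)
```

With the quoted rule, "Berlin" is tagged `PROPN` and attached as `conj`, so neither branch matches it. Whether the widened check is the right fix for real parses depends on how the parser labels coordination, which this toy example does not model.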