Subject Object Extraction within Spacy #523
Try
If the entity recogniser is picking up … An alternative solution is to use … You can get a feel for this using the displaCy visualizer:
https://demos.explosion.ai/displacy/?text=Bloomberg%20announced%20today%20that%20Gordian%20Capital%2C%20a%20Singapore-based%20institutional%20fund%20management%20platform%2C%20will%20implement%20the%20Bloomberg%20Entity%20Exchange%20solution%20to%20help%20its%20clients%20pursue%20new%20fund%20opportunities%20faster.&model=en&cpu=1&cph=1
Toggle the option "collapse phrases" to see how the retokenization works.
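The merging idea can be sketched without depending on any particular spaCy version. The snippet below is a minimal, library-free illustration, not spaCy's implementation. It assumes a hand-written dependency parse (the `words`, `heads`, and `deps` lists are illustrative, not real model output) and collapses each entity span into one token that inherits the head and label of the span's root, which is roughly what spaCy's retokenizer does by default. `merge_spans` and `simple_svo` are hypothetical helpers.

```python
from typing import List, Tuple


def merge_spans(words: List[str], heads: List[int], deps: List[str],
                spans: List[Tuple[int, int]]):
    """Collapse each non-overlapping [start, end) span into one token.

    heads[i] is the index of token i's syntactic head (the root points
    to itself). A merged token keeps the head and dep label of the span
    root: the token whose head lies outside the span.
    """
    span_end = {start: end for start, end in spans}
    old_to_new = [0] * len(words)
    groups = []  # (start, end) of the old tokens behind each new token
    i = 0
    while i < len(words):
        end = span_end.get(i, i + 1)
        for j in range(i, end):
            old_to_new[j] = len(groups)
        groups.append((i, end))
        i = end

    new_words, new_heads, new_deps = [], [], []
    for start, end in groups:
        root = next(j for j in range(start, end)
                    if heads[j] == j or not start <= heads[j] < end)
        new_words.append(" ".join(words[start:end]))
        new_heads.append(old_to_new[heads[root]])  # re-point head to its new index
        new_deps.append(deps[root])
    return new_words, new_heads, new_deps


def simple_svo(words: List[str], heads: List[int], deps: List[str]):
    """Naive triples: an nsubj and a dobj that share the same head."""
    triples = []
    for v in range(len(words)):
        subjects = [i for i in range(len(words)) if heads[i] == v and deps[i] == "nsubj"]
        objects = [i for i in range(len(words)) if heads[i] == v and deps[i] == "dobj"]
        triples += [(words[s], words[v], words[o]) for s in subjects for o in objects]
    return triples


# Hand-annotated parse of a shortened sentence (illustrative only).
words = ["Gordian", "Capital", "will", "implement", "the", "solution"]
heads = [1, 3, 3, 3, 5, 3]
deps = ["compound", "nsubj", "aux", "ROOT", "det", "dobj"]

print(simple_svo(words, heads, deps))
# [('Capital', 'implement', 'solution')]
merged = merge_spans(words, heads, deps, [(0, 2)])  # (0, 2) covers "Gordian Capital"
print(simple_svo(*merged))
# [('Gordian Capital', 'implement', 'solution')]
```

On a real `Doc` in spaCy 2.1 or later, the equivalent step is `doc.retokenize()` with `retokenizer.merge(span)`, or the built-in `merge_entities` pipeline component.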
Where is the script for this code? Thank you, though; the solution looks quite ideal.
@honnibal One last thing: what I would like to do is for the token to extend further right and further left, with Morgan Stanley as the subject. Perhaps something like a while loop that keeps going while the subject token list matches one or more entities. Thoughts? Much appreciated.
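The while-loop idea can be sketched as follows. `expand_to_entity` is a hypothetical helper, not part of spaCy; it assumes one IOB tag per token in the same format as spaCy's `Token.ent_iob_` (`"B"`, `"I"`, `"O"`), and the tags below are hand-labelled for illustration rather than taken from a model.

```python
from typing import List, Tuple


def expand_to_entity(i: int, ent_iobs: List[str]) -> Tuple[int, int]:
    """Return the [start, end) token range of the entity containing token i.

    ent_iobs has one tag per token: "B" begins an entity, "I" continues it,
    "O" is outside any entity. A token outside an entity maps to (i, i + 1).
    """
    if ent_iobs[i] == "O":
        return i, i + 1
    start = i
    # Walk left until we reach the "B" token that opens this entity.
    while start > 0 and ent_iobs[start] == "I":
        start -= 1
    end = i + 1
    # Walk right while the next token continues the same entity.
    while end < len(ent_iobs) and ent_iobs[end] == "I":
        end += 1
    return start, end


words = ["Bloomberg", "announced", "that", "Gordian", "Capital",
         "will", "implement", "the", "solution"]
ent_iobs = ["B", "O", "O", "B", "I", "O", "O", "O", "O"]

subject = 4  # assume the parser picked "Capital" as the nsubj
start, end = expand_to_entity(subject, ent_iobs)
print(" ".join(words[start:end]))  # Gordian Capital
```

Adjacent entities stay separate because a new entity always starts with `"B"`, which stops the rightward loop.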
The problem here seems to me to be that …

```python
import spacy

# spaCy 1.x API: Matcher.add_entity/add_pattern and Span.merge are not
# available in current spaCy releases.


def merge_phrase(matcher, doc, i, matches):
    '''
    Merge a phrase. We have to be careful here because we'll change the token indices.
    To avoid problems, merge all the phrases once we're called on the last match.
    '''
    # The callback runs once per match; i is the index of the current match.
    if i != len(matches)-1:
        return None
    # Get Span objects
    spans = [(ent_id, label, doc[start : end]) for ent_id, label, start, end in matches]
    for ent_id, label, span in spans:
        # Collapse the span into one token; tag it NNP if it has an entity label.
        span.merge(label=label, tag='NNP' if label else span.root.tag_)


nlp = spacy.load('en')
nlp.matcher.add_entity('MorganStanley', on_match=merge_phrase)
nlp.matcher.add_pattern('MorganStanley', [{'orth': 'Morgan'}, {'orth': 'Stanley'}], label='ORG')
# Run the matcher before the parser, so the parser sees "Morgan Stanley" as one token.
nlp.pipeline = [nlp.tagger, nlp.entity, nlp.matcher, nlp.parser]
# Okay, now we've got our pipeline set up...
doc = nlp(u'Morgan Stanley fires Vice President')
for word in doc:
    print(word.text, word.tag_, word.dep_, word.head.text, word.ent_type_)
```
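The docstring's warning about shifting token indices can be illustrated with plain Python lists. Both functions below are hypothetical and library-free, and they assume non-overlapping `(start, end)` matches. Merging left to right leaves later offsets stale, while merging right to left only shifts tokens that have already been handled.

```python
from typing import List, Tuple


def merge_left_to_right(words: List[str], matches: List[Tuple[int, int]]) -> List[str]:
    """Buggy order: each merge shortens the list, so later offsets go stale."""
    words = list(words)
    for start, end in sorted(matches):
        words[start:end] = [" ".join(words[start:end])]
    return words


def merge_right_to_left(words: List[str], matches: List[Tuple[int, int]]) -> List[str]:
    """Safe order: a merge only shifts tokens to its right, which are already done."""
    words = list(words)
    for start, end in sorted(matches, reverse=True):
        words[start:end] = [" ".join(words[start:end])]
    return words


words = "Morgan Stanley hired Goldman Sachs staff".split()
matches = [(0, 2), (3, 5)]  # "Morgan Stanley", "Goldman Sachs"
print(merge_left_to_right(words, matches))
# ['Morgan Stanley', 'hired', 'Goldman', 'Sachs staff']
print(merge_right_to_left(words, matches))
# ['Morgan Stanley', 'hired', 'Goldman Sachs', 'staff']
```

In spaCy 2.1 and later, `doc.retokenize()` removes the need for this bookkeeping: merges requested inside the `with` block are stored and applied all at once when the block exits.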
@honnibal Hope all is well. Shouldn't Morgan Stanley be defined as an ORG, as in the bank Morgan Stanley?
Gah. Short on sleep :p. Edited, thanks.
Oh, I hope you get some rest soon. Alright, I will replace `doc = nlp(u'Morgan Stanley fires Vice President')` with … and run it now to test this.
@honnibal
What version are you running?
I believe it's the latest one, on Python 3. The big picture is to use these entities as replacements for subjects and objects when we output the SVO triples.
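One way to use the entities as replacements in the SVO output is to look up which entity span, if any, covers each subject and object token. The sketch below is hypothetical: it represents entities as `(start, end, label)` token offsets (the same information spaCy exposes as `ent.start`, `ent.end`, and `ent.label_` on each span in `doc.ents`), and it assumes the triples are already available as token indices from some parse.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]  # (start, end, label), end exclusive


def entity_text(i: int, words: List[str], ents: List[Entity]) -> str:
    """Text of the entity covering token i, or the token itself if none does."""
    for start, end, _label in ents:
        if start <= i < end:
            return " ".join(words[start:end])
    return words[i]


def svo_with_entities(triples: List[Tuple[int, int, int]], words: List[str],
                      ents: List[Entity]) -> List[Tuple[str, str, str]]:
    """Swap subject and object tokens for their entity text; keep the verb as-is."""
    return [(entity_text(s, words, ents), words[v], entity_text(o, words, ents))
            for s, v, o in triples]


words = ["Morgan", "Stanley", "fires", "Vice", "President"]
ents = [(0, 2, "ORG")]
triples = [(1, 2, 4)]  # assumed parse: subject "Stanley", verb "fires", object "President"
print(svo_with_entities(triples, words, ents))
# [('Morgan Stanley', 'fires', 'President')]
```

A linear scan is fine for a handful of entities per sentence; for long documents, building a token-to-entity index once would avoid repeated scans.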
Well, if you just want to go left one, you might want to look at the …
@honnibal Thank you!
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hi
I'm using the code written by nicschrading for Subject Verb Object extraction.
I'm wondering why the subject doesn't reflect the entities recognized by spaCy.
For example, for the sentence "Bloomberg announced today that Gordian Capital, a Singapore-based institutional fund management platform, will implement the Bloomberg Entity Exchange solution to help its clients pursue new fund opportunities faster." the output is:
SVO = "('capital', 'implement', 'solution'), ('clients', 'pursue', 'opportunities')"
Is there a way to make the subject Gordian Capital instead of just capital?
Thank you
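When the entity recognizer does not tag a name at all, a common fallback (a general heuristic, not something proposed in this thread) is to extend the subject or object leftward over its `compound` modifiers. The sketch below is hypothetical and library-free; the parse is hand-written in the style of spaCy's English dependency labels and may not match real model output.

```python
from typing import List


def expand_compounds(i: int, words: List[str], heads: List[int], deps: List[str]) -> str:
    """Extend token i leftward over contiguous 'compound' modifiers.

    A preceding token is absorbed only if it is labelled 'compound' and its
    head already lies inside the span, so chains such as
    "Bloomberg Entity Exchange solution" are captured.
    """
    start = i
    while (start > 0 and deps[start - 1] == "compound"
           and start <= heads[start - 1] <= i):
        start -= 1
    return " ".join(words[start:i + 1])


words = ["Gordian", "Capital", "will", "implement", "the",
         "Bloomberg", "Entity", "Exchange", "solution"]
heads = [1, 3, 3, 3, 8, 7, 7, 8, 3]
deps = ["compound", "nsubj", "aux", "ROOT", "det",
        "compound", "compound", "compound", "dobj"]

print(expand_compounds(1, words, heads, deps))  # Gordian Capital
print(expand_compounds(8, words, heads, deps))  # Bloomberg Entity Exchange solution
```

Requiring the absorbed token's head to sit inside the current span keeps the expansion from swallowing a compound that modifies some other word.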