Subject Object Extraction within Spacy #523

Mustyy · 2016-10-13T20:39:11Z

Hi

I'm using the code written by nicschrading for Subject Verb Object extraction.
I'm wondering why the subject doesn't reflect the entities recognized by spaCy.
For example, take the sentence "Bloomberg announced today that Gordian Capital, a Singapore-based institutional fund management platform, will implement the Bloomberg Entity Exchange solution to help its clients pursue new fund opportunities faster."

SVO = "('capital', 'implement', 'solution'), ('clients', 'pursue', 'opportunities')"

Is there a way to make the subject Gordian Capital instead of just capital?

Thank you

honnibal · 2016-10-13T23:34:03Z

Try

import spacy

nlp = spacy.load('en')
doc = nlp(u'Bloomberg announced today that Gordian Capital, a Singapore-based institutional fund management platform, will implement the Bloomberg Entity Exchange solution to help its clients pursue new fund opportunities faster.')

for ent in list(doc.ents):
    # A Span has no tag_ of its own, so take the tag from its root token.
    ent.merge(ent.root.tag_, ent.text, ent.label_)

If the entity recogniser is picking up Gordian Capital as a named entity, then this should retokenize it, so that you get one token. This makes the subsequent logic much easier to write.

An alternative solution is to use word.subtree, or word.left_edge and word.right_edge. This allows you to get the section of the dependency tree, instead of just the subject word.
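
To make the left_edge / right_edge idea concrete, here is a library-free sketch of what those attributes compute: the leftmost and rightmost tokens of a word's subtree. The parse below is hand-written for illustration (an assumption, not real spaCy output), and the helper names is_descendant and subtree_span are hypothetical. In spaCy itself the equivalent slice would be doc[word.left_edge.i : word.right_edge.i + 1].

```python
# Toy dependency parse: heads[i] is the index of token i's head.
# The root ("implement") points to itself. Hand-written, not parser output.
words = ["Gordian", "Capital", "will", "implement", "the", "solution"]
heads = [1, 3, 3, 3, 5, 3]


def is_descendant(heads, j, i):
    """Return True if token j is token i or sits somewhere below it in the tree."""
    while True:
        if j == i:
            return True
        if heads[j] == j:  # reached the root without meeting i
            return False
        j = heads[j]


def subtree_span(heads, i):
    """Return inclusive (left, right) indices covering token i and its descendants."""
    members = [j for j in range(len(heads)) if is_descendant(heads, j, i)]
    return min(members), max(members)


left, right = subtree_span(heads, 1)  # 1 is "Capital", the bare subject
subject_phrase = " ".join(words[left:right + 1])
print(subject_phrase)
```

With this toy parse the bare subject "Capital" expands to the whole phrase, which is the behaviour the SVO script is missing.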

You can get a feel for this using the displaCy visualizer: https://demos.explosion.ai/displacy/?text=Bloomberg%20announced%20today%20that%20Gordian%20Capital%2C%20a%20Singapore-based%20institutional%20fund%20management%20platform%2C%20will%20implement%20the%20Bloomberg%20Entity%20Exchange%20solution%20to%20help%20its%20clients%20pursue%20new%20fund%20opportunities%20faster.&model=en&cpu=1&cph=1

Toggle the option "collapse phrases" to see how the retokenization works.

Mustyy · 2016-10-17T18:58:26Z

Where is the script for this code?
Or rather can I insert this code into the svo script written by nicschrading?

Thank you though, the solution looks quite ideal.

honnibal · 2016-10-17T19:26:41Z

https://github.com/explosion/spacy-services/blob/master/displacy/displacy_service/parse.py

Mustyy · 2016-10-20T17:07:44Z

@honnibal
Hey
Thanks so much for the insights

One last thing
Is there a way to find the index of entities and tokens, so as to extract better subjects and objects?
For instance, take the sentence "Today Morgan Stanley fires Vice President due to allegations of corruption".
The SVO = Stanley fires Vice

What I would like to do is for the token to go further right and further left
So that we end up with "Morgan Stanley fires Vice President"

Morgan Stanley being the Subject
fires being the Verb
Vice President or VP being the object

Perhaps something like a while loop: while the subject token list matches one or more entities,
add one more token to the list

Thoughts?

Much Appreciated
Thank you

honnibal · 2016-10-20T17:29:05Z

The problem here seems to me to be that Morgan Stanley isn't found as a named entity. How about this:

import spacy


def merge_phrase(matcher, doc, i, matches):
    '''
    Merge a phrase. We have to be careful here because we'll change the token indices.
    To avoid problems, merge all the phrases once we're called on the last match.
    '''
    if i != len(matches)-1:
        return None
    # Get Span objects
    spans = [(ent_id, label, doc[start : end]) for ent_id, label, start, end in matches]
    for ent_id, label, span in spans:
        span.merge(label=label, tag='NNP' if label else span.root.tag_)

nlp = spacy.load('en')
nlp.matcher.add_entity('MorganStanley', on_match=merge_phrase)
nlp.matcher.add_pattern('MorganStanley', [{'orth': 'Morgan'}, {'orth': 'Stanley'}], label='ORG')
nlp.pipeline = [nlp.tagger, nlp.entity, nlp.matcher, nlp.parser]

# Okay, now we've got our pipeline set up...
doc = nlp(u'Morgan Stanley fires Vice President')
for word in doc:
    print(word.text, word.tag_, word.dep_, word.head.text, word.ent_type_)
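
The docstring's warning about changing token indices can be illustrated without spaCy. The sketch below uses a different and simpler way to sidestep the same problem: it merges spans in a plain list of strings, working from the last span backwards so that earlier offsets stay valid. The function name merge_spans and the span offsets are made up for illustration and are not part of the callback above.

```python
def merge_spans(tokens, spans):
    """Merge each (start, end) span (end exclusive) into a single token.

    Spans are processed from right to left, so collapsing one span does not
    shift the offsets of the spans that still need merging.
    Assumes the spans do not overlap.
    """
    merged = list(tokens)
    for start, end in sorted(spans, reverse=True):
        merged[start:end] = [" ".join(merged[start:end])]
    return merged


tokens = ["Today", "Morgan", "Stanley", "fires", "Vice", "President"]
spans = [(1, 3), (4, 6)]  # hand-picked offsets for the two phrases
print(merge_spans(tokens, spans))
```

Merging left to right instead would collapse "Morgan Stanley" first, after which the offsets (4, 6) would no longer point at "Vice President".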

Mustyy · 2016-10-20T18:09:05Z

@honnibal
Hi

Hope all is well

Shouldn't Morgan Stanley be defined as an ORG, as in the bank Morgan Stanley?
And can I replace doc = nlp(u'Morgan Stanley fires Vice President') with
doc = nlp(u'Today Morgan Stanley fires Vice President due to allegations of corruption')?

honnibal · 2016-10-20T18:09:44Z

Gah. Short on sleep :p. Edited, thanks.

Mustyy · 2016-10-20T18:11:38Z

Oh I hope you get some rest soon

Alright, I will replace doc = nlp(u'Morgan Stanley fires Vice President') with
doc = nlp(u'Today Morgan Stanley fires Vice President due to allegations of corruption')

and I will run it now to test this.

Mustyy · 2016-10-20T18:28:54Z

@honnibal
Quick note
When I run it I get
Traceback (most recent call last):
  File "spacypipe1.py", line 20, in <module>
    nlp.matcher.add_entity('MorganStanley', on_match=merge_phrase)
AttributeError: 'spacy.matcher.Matcher' object has no attribute 'add_entity'

honnibal · 2016-10-20T18:35:17Z

What version are you running?

Mustyy · 2016-10-20T18:36:25Z

@honnibal

I believe it's the latest one, and I'm on Python 3.
I will do an upgrade install now

The big picture is to use these entities as replacements for Subjects and Objects when we are outputting the SVO.
So the token would refer to the index of the entity to find out what completes "Stanley"
so going left once would result in a Subject = "Morgan Stanley"
Does that make sense?

honnibal · 2016-10-20T18:40:33Z

Well, if you just want to go left one, you might want to look at the token.nbor() method and the token.i attribute.
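
As a rough sketch of the "go left, go right" idea: start from the index of the bare subject (what token.i gives you in spaCy) and keep stepping to neighbouring tokens while they carry the same entity label. The token list and entity labels below are hand-written assumptions rather than model output, and expand_to_entity is a hypothetical helper, not part of spaCy.

```python
# Each token is (text, entity_label); "" means the token is outside any entity.
# The ORG labels are assumed here, as if the matcher had tagged the phrase.
tokens = [
    ("Today", ""),
    ("Morgan", "ORG"),
    ("Stanley", "ORG"),
    ("fires", ""),
    ("Vice", ""),
    ("President", ""),
]


def expand_to_entity(tokens, i):
    """Grow left and right from index i while neighbours share i's entity label.

    Returns (start, end) with end exclusive.
    """
    label = tokens[i][1]
    if not label:
        return i, i + 1  # not inside an entity: keep the single token
    start = i
    while start > 0 and tokens[start - 1][1] == label:
        start -= 1
    end = i + 1
    while end < len(tokens) and tokens[end][1] == label:
        end += 1
    return start, end


start, end = expand_to_entity(tokens, 2)  # 2 is "Stanley"
subject = " ".join(text for text, _ in tokens[start:end])
print(subject)
```

One limitation of this sketch is that two adjacent entities with the same label would be glued together; a real implementation would also need to check where each entity begins.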

Mustyy · 2016-10-20T18:53:37Z

@honnibal Thank you
Well both options together would be ideal as well
But I have to get the first part working, it's still throwing the error

lock · 2018-05-09T08:12:03Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Mustyy closed this as completed Oct 20, 2016

Mustyy reopened this Oct 20, 2016

honnibal added the enhancement Feature requests and improvements label Oct 20, 2016

chssch added a commit to chssch/spaCy that referenced this issue Oct 22, 2016

Add merge phrases from explosion#523 (comment)

281745c

chssch added a commit to chssch/spaCy that referenced this issue Oct 22, 2016

Add merge phrases from explosion#523 (comment)

cf7b6f7

ines mentioned this issue Oct 22, 2016

💫 Document workflow: Using the dependency parse / dependency parsing #555

Closed

ines closed this as completed Oct 22, 2016

acowlikeobject mentioned this issue Feb 27, 2017

label keyword argument in Span.merge has no effect #862

Closed

lock bot locked as resolved and limited conversation to collaborators May 9, 2018
