
Difference between spacy and Stanford Parser in results #259

Closed
rbhood opened this issue Feb 11, 2016 · 4 comments

Comments

@rbhood

rbhood commented Feb 11, 2016

I am working on sentiment analysis, for which I need dependency relations between words to extract each aspect and its corresponding sentiment word. I have tried both spaCy and Stanford. The relations Stanford gives are more accurate and relevant for my use, but spaCy is much, much faster and I would like to use it exclusively.

Below are some examples where spaCy's output is a problem:

  1. Sentence: Alice is happy.
    Stanford: Gives a direct relation between Alice and happy, so I can use Alice as the aspect and happy as its sentiment word.
    spaCy: Gives the relations (Alice, is) and (is, happy) instead.

Note: For a sentence like "Alice likes apples.", Stanford and spaCy give the same relations: (Alice, likes) and (likes, apples). But with verbs like "is" and "are", Stanford gives a direct relation between the subject and the predicate.

  2. Sentence: There is plenty of leg room.
    Stanford: Relates plenty with room, which makes sense since plenty describes the leg room.
    spaCy: Does not provide any such relation.

  3. Sentence: In September, upon return to Toronto, my suitcase was damaged with the zipper mechanism and lock literally torn off.
    Stanford: Gives the relations (mechanism, torn) and (lock, torn).
    spaCy: Does not provide any relation between these words, even though they are clearly directly related.

All Stanford outputs are from the Stanford NLP parser site (http://nlp.stanford.edu:8080/parser/index.jsp) as well as from the Stanford packages, but those packages are very, very slow compared to spaCy.

So, is there any way to get spaCy to give exactly the same parsing output as Stanford? It would be a great help.

rbhood changed the title from "Difference between spacy and Stanford Parser" to "Difference between spacy and Stanford Parser in results" on Feb 11, 2016
@honnibal
Member

Hey,

This is a question of the annotation scheme. It's true that the relations spaCy is returning are a bit more low-level. We could post-process the relations to get a similar result to the Stanford ones, and for some purposes this would be better.
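
To illustrate the kind of post-processing meant here, below is a minimal sketch that derives Stanford-style (subject, predicate) pairs for copular sentences such as "Alice is happy." It works on plain (text, dep, head_index) tuples, which could be built from a spaCy Doc with `[(t.text, t.dep_, t.head.i) for t in doc]`. It assumes the parser attaches the subject as `nsubj` and the predicate as `acomp` or `attr` of the verb, which matches the (Alice, is) and (is, happy) relations reported above; `copula_pairs` and the `COPULAS` list are hypothetical, not part of spaCy.

```python
# Verb forms treated as copulas; an illustrative list, not exhaustive.
COPULAS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def copula_pairs(rows):
    """Return Stanford-style (subject, predicate) pairs that skip the copula.

    rows: list of (text, dep, head_index) tuples, one per token.
    """
    pairs = []
    for i, (text, _dep, _head) in enumerate(rows):
        if text.lower() not in COPULAS:
            continue
        # Children of the copula, selected by (assumed) relation label.
        subjects = [t for t, d, h in rows if h == i and d == "nsubj"]
        predicates = [t for t, d, h in rows if h == i and d in ("acomp", "attr")]
        pairs.extend((s, p) for s in subjects for p in predicates)
    return pairs


# Hand-written parse mirroring the relations reported in the question:
# (Alice, is) and (is, happy), with "is" as the root (its own head).
rows = [("Alice", "nsubj", 1), ("is", "ROOT", 1), ("happy", "acomp", 1), (".", "punct", 1)]
print(copula_pairs(rows))  # [('Alice', 'happy')]
```

With a real spaCy Doc, checking `t.lemma_ == "be"` would be a more robust copula test than a fixed word list.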

Btw, there's a whole can of worms around this sort of topic. Like, if you have "I eat plenty of apples", you probably want a relationship between "eat" and "apples", right? Instead in both our scheme and Stanford's you'll get a relationship between "eat" and "plenty".
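
The same kind of post-processing could recover the eat/apples relation by looking through quantity nouns. This is a minimal sketch on (text, dep, head_index) tuples, as from `[(t.text, t.dep_, t.head.i) for t in doc]`; it assumes "of" hangs off the quantity noun as `prep` and the real object is the `pobj` of "of". The helper names are hypothetical, and the quantifier list is borrowed from the merge example in this thread.

```python
# Quantity nouns to look through (the same list as in the merge example in this thread).
QUANTIFIERS = {"plenty", "lots", "heaps", "all"}


def quantified_object(rows, i):
    """If token i heads a 'plenty of X' phrase, return X's text, else None."""
    if rows[i][0].lower() not in QUANTIFIERS:
        return None
    for j, (text, dep, head) in enumerate(rows):
        if head == i and dep == "prep" and text.lower() == "of":
            # The object of 'of' is the word we actually care about.
            for obj_text, obj_dep, obj_head in rows:
                if obj_head == j and obj_dep == "pobj":
                    return obj_text
    return None


def triples_through_quantifiers(rows):
    """Return (head, dep, child) triples, redirecting the relation that
    enters 'plenty' in 'plenty of X' so that it points at X instead.

    rows: list of (text, dep, head_index) tuples; the root is its own head.
    """
    triples = []
    for i, (text, dep, head) in enumerate(rows):
        if head == i:  # skip the root, which has no incoming relation
            continue
        obj = quantified_object(rows, i)
        triples.append((rows[head][0], dep, obj if obj is not None else text))
    return triples


# Hand-written parse of "I eat plenty of apples." (illustrative, not parser output).
rows = [
    ("I", "nsubj", 1),
    ("eat", "ROOT", 1),
    ("plenty", "dobj", 1),
    ("of", "prep", 2),
    ("apples", "pobj", 3),
    (".", "punct", 1),
]
print(triples_through_quantifiers(rows))
```

Here the `dobj` triple becomes ('eat', 'dobj', 'apples'), while the internal relations from 'plenty' to 'of' and from 'of' to 'apples' are kept unchanged.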

There's really a need for a more abstract semantic representation on top of the syntactic parse. I'm not sure Stanford's solution of making the parse more semantic is what I like best. I think there's a need for the syntactic representation. It's just that currently, we don't have semantic role labelling. So it's true that the lower-level nature of spaCy's parse makes it difficult to work with in places.

For cases like 'plenty of leg room', you can improve things by merging the phrase:

```python
>>> from spacy.en import English  # spaCy 1.x API, as used in this thread
>>> nlp = English()
>>> doc = nlp(u'I like plenty of leg room.')
>>> spans = []
>>> for word in doc:
...   # Look for 'plenty of ...'-style phrases (guarding against the last token)
...   if (word.text in ('plenty', 'lots', 'heaps', 'all') and word.i + 1 < len(doc)
...       and word.nbor(1).text == 'of' and len(list(word.subtree)) >= 3):
...     # The phrase covers the quantifier's whole subtree
...     span = doc[word.left_edge.i : word.right_edge.i + 1]
...     spans.append(span)
...
>>> spans
[plenty of leg room]
>>> span = spans[0]
>>> # Merge using the root's tag, the lemma of 'leg room' ('room'), and the root's entity type
>>> span.merge(span.root.tag_, span[2:].root.lemma_, span.root.ent_type_)
>>> for word in doc:
...   print(word.text, word.lemma_, word.dep_, word.head.text)
...
(u'I', u'i', u'nsubj', u'like')
(u'like', u'like', u'ROOT', u'like')
(u'plenty of leg room', u'room', u'dobj', u'like')
(u'.', u'.', u'punct', u'like')
```

What we're doing here is retokenizing the sentence so that you can get the relationships you need. We set the "lemma" of our new token 'plenty of leg room' to be 'room', and spaCy knows how to forward all the dependencies, so that the new token is attached correctly to 'like'.
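
To make the dependency forwarding concrete, here is a minimal pure-Python sketch of what such a merge does to the parse, using (text, lemma, dep, head_index) tuples. It illustrates the idea only and is not spaCy's implementation; `merge_span` is a hypothetical helper, and it assumes the merged span is a contiguous subtree with a single root, as the `left_edge`/`right_edge` span above is. In later spaCy versions the corresponding operation is `doc.retokenize()` with `retokenizer.merge(span, attrs=...)`, which replaced `Span.merge`.

```python
def merge_span(rows, start, end, lemma):
    """Collapse rows[start:end] into a single token, forwarding dependencies.

    rows: list of (text, lemma, dep, head_index) tuples; the root is its own head.
    Assumes rows[start:end] is a contiguous subtree with exactly one root.
    """
    removed = end - start - 1  # number of tokens that disappear

    def new_index(i):
        # Map an old token index to its index after the merge.
        if i < start:
            return i
        if i < end:
            return start  # anything inside the span becomes the merged token
        return i - removed

    # The span's root is the token whose head lies outside the span
    # (or the sentence root, which is its own head).
    root = next(
        i for i in range(start, end)
        if not start <= rows[i][3] < end or rows[i][3] == i
    )
    _, _, root_dep, root_head = rows[root]
    merged_text = " ".join(row[0] for row in rows[start:end])
    merged = (merged_text, lemma, root_dep, new_index(root_head))

    out = []
    for i, (text, lem, dep, head) in enumerate(rows):
        if i == start:
            out.append(merged)
        elif not start <= i < end:
            # Tokens outside the span keep their relation; heads are re-indexed.
            out.append((text, lem, dep, new_index(head)))
    return out


# Hand-written parse of "I like plenty of leg room." (illustrative, not parser output).
rows = [
    ("I", "i", "nsubj", 1),
    ("like", "like", "ROOT", 1),
    ("plenty", "plenty", "dobj", 1),
    ("of", "of", "prep", 2),
    ("leg", "leg", "compound", 5),
    ("room", "room", "pobj", 3),
    (".", ".", "punct", 1),
]
result = merge_span(rows, 2, 6, lemma="room")
for text, lem, dep, head in result:
    print(text, lem, dep, result[head][0])
```

The merged token inherits the `dobj` relation and the head 'like' from the span's root, which is the same structure as the output in the transcript.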

@rachit221195

rachit221195 commented Aug 23, 2017

@honnibal Is there a similar function in spaCy that gives results like those from Stanford NLP?
Is there any function or method that can help me produce this:
[((u'shot', u'VBD'), u'nsubj', (u'I', u'PRP')), ((u'shot', u'VBD'), u'dobj', (u'elephant', u'NN')), ((u'elephant', u'NN'), u'det', (u'an', u'DT')), ((u'shot', u'VBD'), u'prep', (u'in', u'IN')), ((u'in', u'IN'), u'pobj', (u'sleep', u'NN')), ((u'sleep', u'NN'), u'poss', (u'my', u'PRP$'))]

The code that produces this with Stanford NLP (through NLTK) is:

```python
>>> from nltk.parse.stanford import StanfordDependencyParser
>>> path_to_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser.jar'
>>> path_to_models_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser-3.4.1-models.jar'
>>> dependency_parser = StanfordDependencyParser(path_to_jar=path_to_jar, path_to_models_jar=path_to_models_jar)
>>> result = dependency_parser.raw_parse('I shot an elephant in my sleep')
>>> dep = next(result)  # works on Python 2 and 3; result.next() is Python 2 only
>>> list(dep.triples())
```

I have been searching all over to get something similar but I cannot seem to find it.
I would really appreciate any help on this matter.

@RushiLuhar

@rachit221195 - yes, there is. You can combine .subtree and .head on the Token object to build up a tree representation, as with the NLTK method you describe above.
Iterate through each Token in your span; if it has a subtree, you can build a relation between the token and each of its direct dependents (the tokens in its subtree whose .head is that token). Hope this makes sense.
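
Following that suggestion, here is a minimal sketch that builds NLTK-style triples from each token's head alone. It walks from the root depth-first, visiting dependents in sentence order, which yields the same ordering as the triple list in the question. Rows are (text, tag, dep, head_index) tuples; from a spaCy Doc they could be built with `[(t.text, t.tag_, t.dep_, t.head.i) for t in doc]`. `dependency_triples` is a hypothetical helper, not a spaCy or NLTK function, and the rows below are hand-written to mirror the question's output rather than taken from a parser.

```python
def dependency_triples(rows):
    """Yield ((head, head_tag), dep, (child, child_tag)) triples, depth-first.

    rows: list of (text, tag, dep, head_index) tuples; a root is its own head.
    """
    children = {i: [] for i in range(len(rows))}
    roots = []
    for i, (_text, _tag, _dep, head) in enumerate(rows):
        if head == i:
            roots.append(i)
        else:
            children[head].append(i)  # children stay in sentence order

    def visit(i):
        text, tag = rows[i][0], rows[i][1]
        for c in children[i]:
            c_text, c_tag, c_dep, _ = rows[c]
            yield ((text, tag), c_dep, (c_text, c_tag))
            yield from visit(c)  # descend before moving to the next sibling

    for r in roots:
        yield from visit(r)


# Hand-written rows mirroring the triples in the question (not real parser output).
rows = [
    ("I", "PRP", "nsubj", 1),
    ("shot", "VBD", "ROOT", 1),
    ("an", "DT", "det", 3),
    ("elephant", "NN", "dobj", 1),
    ("in", "IN", "prep", 1),
    ("my", "PRP$", "poss", 6),
    ("sleep", "NN", "pobj", 4),
]
for triple in dependency_triples(rows):
    print(triple)
```

Because every sentence root is its own head, the same function also handles multi-sentence documents by emitting one traversal per root.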

@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 8, 2018