Difference between spaCy and Stanford Parser results #259
Hey, this is a question of the annotation scheme. It's true that the relations spaCy returns are a bit more low-level. We could post-process the relations to get results similar to the Stanford ones, and for some purposes this would be better.

By the way, there's a whole can of worms around this sort of topic. If you have "I eat plenty of apples", you probably want a relationship between "eat" and "apples", right? Instead, in both our scheme and Stanford's, you'll get a relationship between "eat" and "plenty". There's really a need for a more abstract semantic representation on top of the syntactic parse. I'm not sure Stanford's solution of making the parse itself more semantic is what I like best; I think there's a need for the syntactic representation. It's just that, currently, we don't have semantic role labelling. So it's true that the lower-level nature of spaCy's parse makes it difficult to work with in places.

For cases like "plenty of leg room", you can improve things by merging the phrase:

```python
>>> from spacy.en import English
>>> nlp = English()
>>> doc = nlp(u'I like plenty of leg room.')
>>> spans = []
>>> for word in doc:
...     # "plenty of X", "lots of X", ...: collect the quantifier's whole subtree
...     if (word.text in ('plenty', 'lots', 'heaps', 'all') and word.i + 1 < len(doc)
...             and word.nbor(1).text == 'of' and len(list(word.subtree)) >= 3):
...         spans.append(doc[word.left_edge.i : word.right_edge.i + 1])
...
>>> spans
[plenty of leg room]
>>> span = spans[0]
>>> # One token, tagged like the span root, lemmatised to the head noun of "leg room"
>>> merged = span.merge(span.root.tag_, span[2:].root.lemma_, span.root.ent_type_)
>>> for word in doc:
...     print(word.text, word.lemma_, word.dep_, word.head.text)
...
(u'I', u'i', u'nsubj', u'like')
(u'like', u'like', u'ROOT', u'like')
(u'plenty of leg room', u'room', u'dobj', u'like')
(u'.', u'.', u'punct', u'like')
```

What we're doing here is retokenizing the sentence so that you can get the relationships you need. We set the "lemma" of our new token 'plenty of leg room' to be 'room', and spaCy knows how to forward all the dependencies, so that the new token is attached correctly to 'like'.
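The `spacy.en` module and `Span.merge()` used above come from spaCy 1.x. Later releases moved the English language class to `spacy.lang.en` and replaced `Span.merge()` with the `Doc.retokenize()` context manager (`Span.merge()` was removed in v3). The following is a minimal pure-Python sketch that spells out what the merge does, independent of any spaCy version or model. The `Tok` class, the helper functions and the hand-written parse (assumed to match the parse spaCy produced above) are hypothetical illustrations, not spaCy API.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass
class Tok:
    # Hypothetical stand-in for a parsed token; this is NOT spaCy's Token class.
    text: str
    lemma: str
    dep: str
    head: int  # absolute index of the head; the sentence root points to itself


def children(toks: List[Tok], i: int) -> List[int]:
    # Direct dependents of token i, in sentence order.
    return [j for j, t in enumerate(toks) if t.head == i and j != i]


def subtree_bounds(toks: List[Tok], i: int) -> Tuple[int, int]:
    # Leftmost and rightmost indices in token i's subtree (cf. left_edge/right_edge).
    stack, seen = [i], []
    while stack:
        k = stack.pop()
        seen.append(k)
        stack.extend(children(toks, k))
    return min(seen), max(seen)


def span_root(toks: List[Tok], start: int, end: int) -> int:
    # First token in toks[start:end] whose head lies outside the span (or is itself).
    for i in range(start, end):
        h = toks[i].head
        if h == i or not start <= h < end:
            return i
    raise ValueError("span has no root")


def merge(toks: List[Tok], start: int, end: int, lemma: str) -> List[Tok]:
    # Collapse toks[start:end] into one token; arcs into or out of the span
    # are forwarded to the merged token.
    shift = end - start - 1

    def remap(i: int) -> int:
        # Old index -> new index after the span shrinks to position `start`.
        if i < start:
            return i
        if i < end:
            return start
        return i - shift

    root = toks[span_root(toks, start, end)]
    merged = Tok(" ".join(t.text for t in toks[start:end]), lemma, root.dep, remap(root.head))
    before = [replace(t, head=remap(t.head)) for t in toks[:start]]
    after = [replace(t, head=remap(t.head)) for t in toks[end:]]
    return before + [merged] + after


def merge_quantifier_phrases(toks: List[Tok]) -> List[Tok]:
    # Same test as the REPL loop; assumes a projective parse, so the subtree is contiguous.
    for i, t in enumerate(toks):
        if t.text in ("plenty", "lots", "heaps", "all") and i + 1 < len(toks) and toks[i + 1].text == "of":
            left, right = subtree_bounds(toks, i)
            if right - left + 1 >= 3:
                # Like span[2:].root.lemma_: lemma of the root of what follows "X of".
                lemma = toks[span_root(toks, left + 2, right + 1)].lemma
                return merge_quantifier_phrases(merge(toks, left, right + 1, lemma))
    return toks


# Hand-written parse of "I like plenty of leg room." (assumed; spaCy's may differ).
doc = merge_quantifier_phrases([
    Tok("I", "i", "nsubj", 1),
    Tok("like", "like", "ROOT", 1),
    Tok("plenty", "plenty", "dobj", 1),
    Tok("of", "of", "prep", 2),
    Tok("leg", "leg", "compound", 5),
    Tok("room", "room", "pobj", 3),
    Tok(".", ".", "punct", 1),
])
for t in doc:
    print(t.text, t.lemma, t.dep, doc[t.head].text)
```

The recursion restarts after each merge because every index after the merged span shifts. It terminates because each merge makes the sentence shorter.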
@honnibal Is there any function in spaCy that would give me results similar to Stanford NLP's? The code that does this in Stanford NLP is this:
I have searched everywhere for something similar, but I can't seem to find it.
@rachit221195 Yes, there is. You can combine the `.subtree` and `.head` attributes of the `Token` object to build up a tree representation, as you can with the nltk method you describe.
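Here is a minimal sketch of that idea. It walks down from the root through each token's children (spaCy's `token.children`, which is derived from the `.head` pointers) and prints a bracketed tree in the style nltk's `Tree` uses. To stay self-contained, it uses a hypothetical list of `(text, head_index)` pairs instead of a real spaCy `Doc`. With spaCy you would read `token.text` and `token.head.i` instead.

```python
from typing import List, Tuple

# Hypothetical parse of "I like plenty of leg room." as (text, head_index) pairs;
# the root points to itself, as token.head does for the root in spaCy.
PARSE: List[Tuple[str, int]] = [
    ("I", 1), ("like", 1), ("plenty", 1), ("of", 2),
    ("leg", 5), ("room", 3), (".", 1),
]


def children(parse: List[Tuple[str, int]], i: int) -> List[int]:
    # Dependents of token i in sentence order, like spaCy's token.children.
    return [j for j, (_, h) in enumerate(parse) if h == i and j != i]


def to_tree(parse: List[Tuple[str, int]], i: int) -> str:
    # Leaves are bare words; inner nodes are "(head child child ...)".
    kids = children(parse, i)
    if not kids:
        return parse[i][0]
    return "(" + " ".join([parse[i][0]] + [to_tree(parse, k) for k in kids]) + ")"


root = next(i for i, (_, h) in enumerate(PARSE) if h == i)
print(to_tree(PARSE, root))  # (like I (plenty (of (room leg))) .)
```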
I am working on sentiment analysis, for which I need to find dependency relations between words so I can extract the aspect and its corresponding sentiment word. I have tried both spaCy and Stanford. The relations given by Stanford are more accurate and relevant for my use, but spaCy is much, much faster, and I would like to use only spaCy.
Below are some examples where spaCy has a problem:
Stanford: It provides a direct relationship between Alice and happy, so I can use Alice as my aspect and happy as its sentiment word.
spaCy: Instead, spaCy gives relationships between (Alice, is) and (is, happy).
Note: If the sentence is something like "Alice likes apples.", Stanford and spaCy both give the same relationships, (Alice, likes) and (likes, apples). But with verbs like "is" and "are", Stanford provides a direct relationship.
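One way to get the direct (Alice, happy) link from spaCy-style output is to post-process the parse the way Stanford's dependencies treat copulas. When a form of "be" has both a subject and a predicate complement, the complement becomes the head, and both the subject and the verb attach to it. The sketch below assumes the complement carries spaCy's English `acomp` or `attr` label. The tuple format and the parse of the illustrative sentence "Alice is happy." are hypothetical and written by hand rather than produced by spaCy.

```python
from typing import List, Tuple

# Hypothetical token format: (text, lemma, dep, head index); the root points to itself.
Token = Tuple[str, str, str, int]


def copula_triples(toks: List[Token]) -> List[Tuple[str, str, str]]:
    # For each form of "be" with a subject and an acomp/attr complement, re-hang the
    # clause on the complement: (complement, nsubj, subject) and (complement, cop, verb).
    triples = []
    for i, (text, lemma, _dep, _head) in enumerate(toks):
        if lemma != "be":
            continue
        kids = [t for j, t in enumerate(toks) if t[3] == i and j != i]
        subjects = [t[0] for t in kids if t[2] == "nsubj"]
        complements = [t[0] for t in kids if t[2] in ("acomp", "attr")]
        for comp in complements:
            triples.extend((comp, "nsubj", subj) for subj in subjects)
            triples.append((comp, "cop", text))
    return triples


# Hand-written, spaCy-style parse of the illustrative sentence "Alice is happy."
parse = [
    ("Alice", "Alice", "nsubj", 1),
    ("is", "be", "ROOT", 1),
    ("happy", "happy", "acomp", 1),
    (".", ".", "punct", 1),
]
print(copula_triples(parse))
```

The `(happy, nsubj, Alice)` triple is the aspect/sentiment pair. A sentence like "Alice likes apples." has no form of "be", so this step leaves it alone.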
Sentence: There is plenty of leg room.
Stanford: Relates plenty with room, which is expected, since "plenty" describes the leg room.
spaCy: Does not provide any such relationship.
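Stanford's collapsed representation folds a preposition and its object into a single edge such as `prep_of(plenty, room)`. The sketch below shows a similar post-processing step over spaCy-style `prep`/`pobj` labels. The hand-written parse of "There is plenty of leg room." is an assumption about what spaCy returns, and actual output may differ.

```python
from typing import List, Tuple

# Hypothetical token format: (text, dep, head index); the root points to itself.
Token = Tuple[str, str, int]


def collapse_preps(toks: List[Token]) -> List[Tuple[str, str, str]]:
    # Replace each prep -> pobj chain with one (governor, "prep_<word>", object) edge.
    edges = []
    for i, (text, dep, head) in enumerate(toks):
        if dep != "prep":
            continue
        for obj_text, obj_dep, obj_head in toks:
            if obj_head == i and obj_dep == "pobj":
                edges.append((toks[head][0], "prep_" + text.lower(), obj_text))
    return edges


parse = [
    ("There", "expl", 1), ("is", "ROOT", 1), ("plenty", "attr", 1),
    ("of", "prep", 2), ("leg", "compound", 5), ("room", "pobj", 3), (".", "punct", 1),
]
print(collapse_preps(parse))  # [('plenty', 'prep_of', 'room')]
```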
Sentence: In September, upon return to Toronto, my suitcase was damaged with the zipper mechanism and lock literally torn off.
Stanford: Gives the relationships (mechanism, torn) and (lock, torn).
spaCy: Doesn't provide any relationship between these words, even though they are clearly directly related.
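The (lock, torn) edge comes from Stanford's collapsed dependencies with propagation of conjunct dependencies. That step copies the first conjunct's relation onto each word conjoined with it. The sketch below applies a similar propagation to spaCy-style `conj` labels. It uses a simplified, hand-written parse of just the fragment "the zipper mechanism and lock literally torn off". This parse is hypothetical, and spaCy's actual parse of the full sentence may attach these words differently.

```python
from typing import List, Tuple

# Hypothetical token format: (text, dep, head index); the root points to itself.
Token = Tuple[str, str, int]


def propagate_conjuncts(toks: List[Token]) -> List[Tuple[str, str, str]]:
    # Return (governor, relation, dependent) edges. Each conj token keeps its conj edge
    # and also inherits the relation of the first conjunct in its chain.
    edges = []
    for i, (text, dep, head) in enumerate(toks):
        if dep == "ROOT":
            continue
        if dep == "conj":
            edges.append((toks[head][0], "conj", text))
            first = head
            while toks[first][1] == "conj":  # walk up "A, B and C" chains
                first = toks[first][2]
            dep, head = toks[first][1], toks[first][2]
        edges.append((toks[head][0], dep, text))
    return edges


parse = [
    ("the", "det", 2), ("zipper", "compound", 2), ("mechanism", "nsubj", 6),
    ("and", "cc", 2), ("lock", "conj", 2), ("literally", "advmod", 6),
    ("torn", "ROOT", 6), ("off", "prt", 6),
]
print([e for e in propagate_conjuncts(parse) if e[1] == "nsubj"])
```

With this parse, both `mechanism` and `lock` end up as subjects of `torn`, which matches the pair of relations Stanford reports.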
All Stanford outputs are from the Stanford parser site, http://nlp.stanford.edu:8080/parser/index.jsp, as well as from the Stanford packages, but those packages are very, very slow compared to spaCy.
So, is there any way to make spaCy give exactly the same parsing output as Stanford? It would be a great help.