Span crossing two sentences? #1333
Comments
Ah, this was me being an idiot. I put an error check in the coref model to make sure the spans were all in the same sentence (the original code masks for that AFAIK), but the error check itself was buggy.
If you use the I was thinking that perhaps waiting for a bigger feature to be finished would be good for a new release, but seeing as how we've fixed a couple bugs in the last couple months, it might be worth doing an interim release.
Thank you very much! Just curious, can I still use it in Google Colab if the fix is only on the dev branch? Another thing: why does the model need to make sure that the spans are all in the same sentence?
I don't know how you've installed Stanza, but you should be able to pip install from a branch, if that's what you did: https://stackoverflow.com/questions/20101834/pip-install-from-git-repo-branch
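For a notebook environment such as Colab, the branch install can be sketched in Python as follows. This is a minimal sketch, not an official install procedure: the repository URL and the `dev` branch name are assumptions based on Stanza's public GitHub repository, and `pip_branch_args` and `install_from_branch` are hypothetical helper names.

```python
import subprocess
import sys

# Assumption: Stanza's public GitHub repository; the fix is on its "dev" branch.
STANZA_REPO = "https://github.com/stanfordnlp/stanza.git"


def pip_branch_args(repo_url, branch):
    """Build the argument list for `python -m pip install git+<repo>@<branch>`."""
    return [sys.executable, "-m", "pip", "install", f"git+{repo_url}@{branch}"]


def install_from_branch(repo_url=STANZA_REPO, branch="dev"):
    """Run pip for the current interpreter (requires git and network access)."""
    subprocess.check_call(pip_branch_args(repo_url, branch))


# Usage (not run here): install_from_branch()
```

If Stanza was already imported in the session, the runtime probably needs a restart before the newly installed version is picked up.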
Technically it doesn't, but the model was trained to only have spans which are contained in a single sentence, and I used that assumption downstream when turning the spans into human-readable output. I had put in an assertion to test that, but the assertion itself was buggy in the event that a span ended exactly at the end of a sentence. Since sentence endings are usually punctuation, that hadn't come up until you hit one of the sentences which the tokenizer splits incorrectly.
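To illustrate how a check like this can misfire at a sentence boundary, here is a minimal sketch. It is not Stanza's actual code: the function names, the half-open `[start, end)` span convention, and the token offsets are all assumptions made for the example. It shows one plausible off-by-one, where the exclusive end index of a span is looked up as if it were a token inside the span.

```python
def sentence_of(token_idx, sentence_bounds):
    """Return the index of the sentence whose half-open [start, end) range holds token_idx."""
    for i, (start, end) in enumerate(sentence_bounds):
        if start <= token_idx < end:
            return i
    raise ValueError(f"token {token_idx} is outside every sentence")


def in_one_sentence_buggy(span_start, span_end, sentence_bounds):
    # Hypothetical bug: span_end is exclusive, so when the span ends exactly at
    # a sentence boundary, span_end is the first token of the NEXT sentence.
    return sentence_of(span_start, sentence_bounds) == sentence_of(span_end, sentence_bounds)


def in_one_sentence(span_start, span_end, sentence_bounds):
    # Compare the first token with the last token actually inside the span.
    return sentence_of(span_start, sentence_bounds) == sentence_of(span_end - 1, sentence_bounds)


# Two sentences: tokens 0..4 and tokens 5..8 (made-up offsets).
bounds = [(0, 5), (5, 9)]

print(in_one_sentence_buggy(3, 5, bounds))  # span ends at the end of sentence 0
print(in_one_sentence(3, 5, bounds))
print(in_one_sentence(4, 6, bounds))        # span that really crosses the boundary
```

With these offsets, the span `(3, 5)` lies entirely in the first sentence, yet the buggy check rejects it, while a span that ends before the last token of a sentence passes either check. That matches the description above of a failure that only shows up when a mention is the final token of a sentence.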
I installed Stanza from the dev branch and now it works! Thank you very much @AngledLuffa, you have been very helpful! I'm closing this issue.
Hello
I got an error saying that the model predicted a span that crosses two sentences and asking me to send the example to GitHub. Here is my code (pretty simple):
`import stanza
pipe = stanza.Pipeline("en", processors="tokenize, coref")
out = pipe("""If an electrical machine or equipment generates mechanical vibrations when in service, e.g. because it is out of balance, the vibration amplitude measured on the machine or the equipment on board shall not lie outside area A. For this evaluation, reference is made only to the self-generated vibration components. Area A may only be utilized if the loading of all components, with due allowance for local excess vibration, does not impair reliable long-term operation""")
print(out)`
My guess is that the term "Area A" is the cause. Is the model currently unable to process coreference spans that cross two sentences? What can I do about this sentence?
Thank you