Describe the bug
We've encountered a sentence pattern where Stanza fails to split apart two sentences. It appears when certain names are used (e.g. Max, Anna) but not with others (e.g. Ann).
It is definitely on our radar to improve the tokenizer in general. I would say that in this particular instance it is treating "No." as an abbreviation for "Number", even though it should be conditioned not to do that when a name (or rather, a capital letter) comes after the "No.". I wonder if there's room to add some examples to the training data to discourage this behavior.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The parse returns "No." as a separate sentence.
Environment (please complete the following information):
Additional context
This issue also appears in Stanza 1.8.1. Have not tested it with Stanza 1.7.x. Screenshot is from Stanza 1.6.1.
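For anyone wanting to check other Stanza versions, here is a minimal sketch of how the expected behavior could be tested. It is not the original reproduction snippet: the helper names (`sentence_texts`, `splits_off_no`, `check_names`, `build_pipeline`) and the test sentence template are hypothetical, and the English tokenizer model is assumed to have been fetched beforehand with `stanza.download("en")`.

```python
from typing import Callable, Dict, Iterable, List


def sentence_texts(nlp: Callable, text: str) -> List[str]:
    """Run a Stanza-style pipeline and return the text of each sentence it found."""
    doc = nlp(text)
    return [sentence.text for sentence in doc.sentences]


def splits_off_no(nlp: Callable, text: str) -> bool:
    """Return True if the first sentence is exactly "No.", the expected behavior."""
    sentences = sentence_texts(nlp, text)
    return len(sentences) > 0 and sentences[0].strip() == "No."


def check_names(nlp: Callable, names: Iterable[str]) -> Dict[str, bool]:
    """Check the split for each name using a HYPOTHETICAL sentence template.

    The template is an assumption for illustration only; it is not the
    sentence from the original report.
    """
    template = "No. {name} is not here."
    return {name: splits_off_no(nlp, template.format(name=name)) for name in names}


def build_pipeline():
    """Build an English tokenize-only pipeline.

    Requires stanza to be installed and the English model to be downloaded.
    The import is done lazily so the helpers above can be used without stanza.
    """
    import stanza

    return stanza.Pipeline(lang="en", processors="tokenize")
```

With Stanza installed, `check_names(build_pipeline(), ["Max", "Anna", "Ann"])` returns a dictionary mapping each name to whether "No." was split off as its own sentence, which makes it easy to compare results across versions such as 1.6.1, 1.7.x and 1.8.1.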