When there are newline characters in a text, the `idx` of a token in a spaCy `Doc` sometimes doesn't point to the correct position in the original text.
Here's an example:
It seems the newline character is counted twice: once as the last character of the first token, and then again as a token of its own.

Thanks for the report! It looks like this is related to #859. I just added a regression test, and it works for me using the version on master (which already includes the fix for that issue).
We're just finishing off the last fixes for v1.7 and training the models; the new update will be available very soon.
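A quick way to check for the misalignment described above is to verify, for every token, that slicing the original text at `idx` gives back the token's text. The sketch below is a plain-Python helper working on `(token_text, idx)` pairs; the function name `misaligned_tokens` and the sample offsets are hypothetical, chosen only to illustrate the check, and the "shifted" offsets are an assumed example of an off-by-one drift, not output captured from spaCy.

```python
from typing import Iterable, List, Tuple


def misaligned_tokens(text: str, tokens: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return the (token_text, idx) pairs whose offset does not point at token_text in text.

    Hypothetical helper: a correct offset satisfies
    text[idx : idx + len(token_text)] == token_text.
    """
    bad = []
    for token_text, idx in tokens:
        # Slice the original string at the reported offset and compare.
        if text[idx : idx + len(token_text)] != token_text:
            bad.append((token_text, idx))
    return bad


if __name__ == "__main__":
    text = "Hello\nworld"
    # Correct offsets: "\n" sits at index 5 and "world" starts at index 6.
    good = [("Hello", 0), ("\n", 5), ("world", 6)]
    # Assumed offsets drifting by one after the newline, as if it were counted twice.
    shifted = [("Hello", 0), ("\n", 6), ("world", 7)]
    print(misaligned_tokens(text, good))
    print(misaligned_tokens(text, shifted))
```

With a spaCy `Doc`, the pairs can be built from the token attributes, e.g. `misaligned_tokens(doc.text, [(t.text, t.idx) for t in doc])`; an empty result means every `idx` lines up with the original text.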