-
Notifications
You must be signed in to change notification settings - Fork 27.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making transformers
work on 0.12
.
#16537
Conversation
The documentation is not available anymore as the PR was closed or merged. |
Thank you very much for the proposed fix. Before reviewing this PR, I would like to see if it is possible to keep the new feature in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If both @SaulLu and @Narsil agree that this is the correct fix, then I'm ok to merge it like this to make master
green.
Note: from transformers
perspective it would obviously be better to have a 0.12.1 revert the breaking change, as all versions of transformers that preceded this fix will continue to be broken.
Summarizing an oral discussion around our options: TL;DR Ultimately the balance between 1/ The change in 2/ The A potential less clunky way would be to add The promoted way 3/ Forward compatibility. The main caveat to this proposed change, is that earlier versions of 4/ Use of It's a legacy thing, and is not changed to not break BC, but causes issues on its own: #15785 (comment) Using 5/ Tokenizer tests |
Would it be possible to consider a deprecation cycle for the tokenizer change, for example with an opt-in flag for the new behavior? Doing this would both keep all previous versions working with 0.12.0, while providing support for the new behavior. This would allow us to prepare for the breaking change and have at least a few versions that support this before dropping support of the current behavior. |
Ok Will revert in |
What does this PR do?
tokenizers
0.12
changed the waydecoder.decode(
works.Instead of returning directly a string, it returns a list of strings (the "decoded" parts), which enables the decoders to be chained (and hence customized more easily).
The fix works by simply joining those parts for versions >= 0.12.
Fixes ##16520
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.