Releases: huggingface/tokenizers
Node v0.13.1
[0.13.1]
- [#1072] Fixing Roberta type ids.
Python v0.13.0
[0.13.0]
- [#956] PyO3 version upgrade
- [#1055] M1 automated builds
- [#1008] `Decoder` is now a composable trait, without breaking backward compatibility
- [#1047, #1051, #1052] `Processor` is now a composable trait, without breaking backward compatibility

Both trait changes warrant a "major" version number since, despite best efforts not to break backward compatibility, the code is different enough that we cannot be entirely sure. See the sketch after these notes for what "composable" means here.
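These trait changes live in the Rust core that the bindings wrap. As a rough illustration of what "composable" means here, a minimal Rust sketch follows; the trait name matches the release notes, but the method names and signatures are simplified assumptions, not the exact tokenizers API:

```rust
/// Illustrative sketch only -- simplified from the idea described in these
/// release notes, not the actual `tokenizers::Decoder` trait.
trait Decoder {
    /// A composable decoder transforms a chain of token strings instead of
    /// producing a final string, so several decoders can run in sequence.
    fn decode_chain(&self, tokens: Vec<String>) -> Vec<String>;

    /// Backward-compatible entry point: callers who only want the final
    /// string still get one, which is why the change is not breaking.
    fn decode(&self, tokens: Vec<String>) -> String {
        self.decode_chain(tokens).join("")
    }
}

/// Hypothetical example: strip a word-continuation marker, as a
/// WordPiece-style decoder might.
struct StripPrefix;

impl Decoder for StripPrefix {
    fn decode_chain(&self, tokens: Vec<String>) -> Vec<String> {
        tokens
            .into_iter()
            .map(|t| t.trim_start_matches("##").to_string())
            .collect()
    }
}

fn main() {
    let d = StripPrefix;
    let out = d.decode(vec!["Hello".into(), "##World".into()]);
    assert_eq!(out, "HelloWorld");
    println!("{out}");
}
```

The point of the design is that each decoder step passes a token chain along rather than immediately producing the final output, while the old single-string `decode` keeps working on top of it.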
Rust v0.13.0
[0.13.0]
- [#1009] `unstable_wasm` feature to support building on Wasm (it's unstable!); see the Cargo.toml sketch after these notes
- [#1008] `Decoder` is now a composable trait, without breaking backward compatibility
- [#1047, #1051, #1052] `Processor` is now a composable trait, without breaking backward compatibility

Both trait changes warrant a "major" version number since, despite best efforts not to break backward compatibility, the code is different enough that we cannot be entirely sure.
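To try the `unstable_wasm` feature from a downstream crate, something like the following Cargo.toml sketch should be the shape of it; treat the exact details as assumptions, since some of the crate's default features pull in native dependencies that may not build for Wasm targets:

```toml
# Sketch of a downstream Cargo.toml (assumed details: the version pin and
# the need to disable default features when targeting Wasm).
[dependencies]
tokenizers = { version = "0.13.0", default-features = false, features = ["unstable_wasm"] }
```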
Node v0.13.0
Python v0.12.1
[0.12.1]
- [#938] Reverted the breaking change: huggingface/transformers#16520
[YANKED] Rust v0.12.0
[0.12.0]
Bump minor version because of a breaking change.
The breaking change was causing more issues upstream in transformers than anticipated: huggingface/transformers#16537 (comment). The decision was to roll back that breaking change and figure out a different way to make this modification later.
- [#938] Breaking change. The `Decoder` trait is modified to be composable. This is only breaking if you are using decoders on their own; tokenizers should be error free.
- [#939] Making the regex in the `ByteLevel` pre_tokenizer optional (necessary for BigScience); see the sketch after this list
- [#952] Fixed the vocabulary size of UnigramTrainer output (to respect added tokens)
- [#954] Fixed not being able to save vocabularies with holes in vocab (ConvBert). Yell warnings instead, but stop panicking.
- [#961] Added link for Ruby port of `tokenizers`
- [#960] Feature gate for `cli` and its `clap` dependency
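For [#939], the idea is to let callers skip `ByteLevel`'s built-in splitting regex and only apply the byte-level mapping, which is what the BigScience use case needed. A minimal Rust sketch, assuming a builder-style `use_regex` setter in line with `ByteLevel`'s existing options (an assumption, not a verified signature):

```rust
use tokenizers::pre_tokenizers::byte_level::ByteLevel;

fn main() {
    // Default behavior: the pre-tokenizer first splits the input with its
    // built-in regex, then byte-level encodes each piece.
    let with_regex = ByteLevel::default();

    // Assumed setter from #939: skip the regex split entirely, so the
    // input is byte-level encoded without pre-splitting.
    let without_regex = ByteLevel::default().use_regex(false);

    let _ = (with_regex, without_regex);
}
```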
[YANKED] Python v0.12.0
[0.12.0]
Bump minor version because of a breaking change.
The breaking change was causing more issues upstream in transformers than anticipated: huggingface/transformers#16537 (comment). The decision was to roll back that breaking change and figure out a different way to make this modification later.
- [#938] Breaking change. The `Decoder` trait is modified to be composable. This is only breaking if you are using decoders on their own; tokenizers should be error free.
- [#939] Making the regex in the `ByteLevel` pre_tokenizer optional (necessary for BigScience)
- [#952] Fixed the vocabulary size of UnigramTrainer output (to respect added tokens)
- [#954] Fixed not being able to save vocabularies with holes in vocab (ConvBert). Yell warnings instead, but stop panicking.
- [#962] Fix tests for Python 3.10
- [#961] Added link for Ruby port of `tokenizers`
[YANKED] Node v0.12.0
[0.12.0]
Bump minor version because of a breaking change. Using 0.12 to match the other bindings.
The breaking change was causing more issues upstream in transformers than anticipated: huggingface/transformers#16537 (comment). The decision was to roll back that breaking change and figure out a different way to make this modification later.
- [#938] Breaking change. The `Decoder` trait is modified to be composable. This is only breaking if you are using decoders on their own; tokenizers should be error free.
- [#939] Making the regex in the `ByteLevel` pre_tokenizer optional (necessary for BigScience)
- [#952] Fixed the vocabulary size of UnigramTrainer output (to respect added tokens)
- [#954] Fixed not being able to save vocabularies with holes in vocab (ConvBert). Yell warnings instead, but stop panicking.
- [#961] Added link for Ruby port of `tokenizers`