
Releases: huggingface/tokenizers

Node 0.13.1

06 Oct 13:10

[0.13.1]

  • [#1072] Fixing Roberta type ids.

Python v0.13.0

21 Sep 10:20
63082c4

[0.13.0]

  • [#956] PyO3 version upgrade
  • [#1055] M1 automated builds
  • [#1008] Decoder is now a composable trait, without breaking backward compatibility
  • [#1047, #1051, #1052] Processor is now a composable trait, without breaking backward compatibility

Both trait changes warrant a "major" version bump: despite best efforts to avoid breaking backward
compatibility, the code is different enough that we cannot be entirely sure nothing breaks.
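To illustrate what the new composability means in practice, here is a minimal sketch using the Python bindings, assuming they expose a `decoders.Sequence` combinator (the specific combination below is illustrative, not the canonical usage):

```python
from tokenizers import decoders

# Compose decoders: Sequence applies each decoder in order over the
# token stream; ByteLevel undoes the byte-level encoding (e.g. "Ġ" -> " ").
decoder = decoders.Sequence([decoders.ByteLevel()])

# Decoders can still be used standalone on a list of tokens.
print(decoder.decode(["Hello", "Ġworld"]))
```

The same composition pattern applies to post-processors via `processors.Sequence`, which is what #1047/#1051/#1052 enable on the processor side.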

Rust v0.13.0

19 Sep 08:13
7c146d9

[0.13.0]

  • [#1009] unstable_wasm feature to support building on Wasm (it's unstable!)
  • [#1008] Decoder is now a composable trait, without breaking backward compatibility
  • [#1047, #1051, #1052] Processor is now a composable trait, without breaking backward compatibility

Both trait changes warrant a "major" version bump: despite best efforts to avoid breaking backward
compatibility, the code is different enough that we cannot be entirely sure nothing breaks.

Node v0.13.0

19 Sep 09:13
7c146d9

[0.13.0]

  • [#1008] Decoder is now a composable trait, without breaking backward compatibility
  • [#1047, #1051, #1052] Processor is now a composable trait, without breaking backward compatibility

Python v0.12.1

13 Apr 10:02
8a9bb28
Pre-release

[0.12.1]

[YANKED] Rust v0.12.0

31 Mar 09:10
0eb7455

[0.12.0]

Bump minor version because of a breaking change.

The breaking change was causing more issues upstream in transformers than anticipated:
huggingface/transformers#16537 (comment)

The decision was to roll back that breaking change and figure out a different way to make this modification later.

  • [#938] Breaking change. The Decoder trait is modified to be composable. This is only breaking if you use decoders on their own; going through the Tokenizer API should be error-free.

  • [#939] Making the regex in ByteLevel pre_tokenizer optional (necessary for BigScience)

  • [#952] Fixed the vocabulary size of UnigramTrainer output (to respect added tokens)

  • [#954] Fixed not being able to save vocabularies with holes in the vocab (ConvBert): emit warnings instead of panicking.

  • [#961] Added link for Ruby port of tokenizers

  • [#960] Feature gate for cli and its clap dependency

[YANKED] Python v0.12.0

31 Mar 09:18
0eb7455

[0.12.0]

Bump minor version because of a breaking change.

The breaking change was causing more issues upstream in transformers than anticipated:
huggingface/transformers#16537 (comment)

The decision was to roll back that breaking change and figure out a different way to make this modification later.

  • [#938] Breaking change. The Decoder trait is modified to be composable. This is only breaking if you use decoders on their own; going through the Tokenizer API should be error-free.

  • [#939] Making the regex in ByteLevel pre_tokenizer optional (necessary for BigScience)

  • [#952] Fixed the vocabulary size of UnigramTrainer output (to respect added tokens)

  • [#954] Fixed not being able to save vocabularies with holes in the vocab (ConvBert): emit warnings instead of panicking.

  • [#962] Fix tests for python 3.10

  • [#961] Added link for Ruby port of tokenizers
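For #939, disabling the splitting regex from the Python bindings looks roughly like the sketch below, assuming the `use_regex` keyword is exposed there as in recent versions:

```python
from tokenizers import pre_tokenizers

# With use_regex=False, the GPT-2 style splitting regex is skipped and
# the whole input is byte-mapped as a single piece.
no_split = pre_tokenizers.ByteLevel(use_regex=False)
pieces = no_split.pre_tokenize_str("hello world")
print(len(pieces))  # one piece: no regex splitting happened

# With the default (use_regex=True), the same input is split by the regex.
split = pre_tokenizers.ByteLevel()
print(len(split.pre_tokenize_str("hello world")))
```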

[YANKED] Node v0.12.0

31 Mar 13:07
23a22da

[0.12.0]

Bump minor version because of a breaking change.
Using 0.12 to match the other bindings.

The breaking change was causing more issues upstream in transformers than anticipated:
huggingface/transformers#16537 (comment)

The decision was to roll back that breaking change and figure out a different way to make this modification later.

  • [#938] Breaking change. The Decoder trait is modified to be composable. This is only breaking if you use decoders on their own; going through the Tokenizer API should be error-free.

  • [#939] Making the regex in ByteLevel pre_tokenizer optional (necessary for BigScience)

  • [#952] Fixed the vocabulary size of UnigramTrainer output (to respect added tokens)

  • [#954] Fixed not being able to save vocabularies with holes in the vocab (ConvBert): emit warnings instead of panicking.

  • [#961] Added link for Ruby port of tokenizers

Rust v0.11.2

28 Feb 10:53
  • [#919] Fixing single_word AddedToken (regression from 0.11.2).
  • [#916] Faster deserialization of added_tokens by loading them in batch.

Python v0.11.6

28 Feb 09:22
  • [#919] Fixing single_word AddedToken (regression from 0.11.2).
  • [#916] Faster deserialization of added_tokens by loading them in batch.
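For context on #919: `single_word` controls whether an added token may match inside a larger word. A minimal sketch with the Python bindings (the toy vocab below is purely illustrative):

```python
from tokenizers import Tokenizer, AddedToken
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Toy tokenizer whose only known token is [UNK].
tok = Tokenizer(WordLevel({"[UNK]": 0}, unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()

# single_word=True: "hug" only matches as a whole word, so the "hug"
# inside "hugging" is left to the model (which maps the word to [UNK]).
tok.add_tokens([AddedToken("hug", single_word=True)])
print(tok.encode("hug hugging").tokens)
```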