
Releases: huggingface/tokenizers

Node 0.13.1

06 Oct 13:10

[0.13.1]

  • [#1072] Fixing Roberta type ids.

Python v0.13.0

21 Sep 10:20
63082c4

[0.13.0]

  • [#956] PyO3 version upgrade
  • [#1055] M1 automated builds
  • [#1008] Decoder is now a composable trait, without breaking backward compatibility
  • [#1047, #1051, #1052] Processor is now a composable trait, without breaking backward compatibility

Both trait changes warrant a "major" version bump: despite best efforts to avoid breaking backward
compatibility, the code is different enough that we cannot be entirely sure nothing breaks.
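To illustrate what the new composability means in practice, here is a minimal sketch using the Python bindings, assuming they expose a `decoders.Sequence` combinator (the specific combination below is illustrative, not the canonical usage):

```python
from tokenizers import decoders

# Compose decoders: Sequence applies each decoder in order over the
# token stream; ByteLevel undoes the byte-level encoding (e.g. "Ġ" -> " ").
decoder = decoders.Sequence([decoders.ByteLevel()])

# Decoders can still be used standalone on a list of tokens.
print(decoder.decode(["Hello", "Ġworld"]))
```

The same composition pattern applies to post-processors via `processors.Sequence`, which is what #1047/#1051/#1052 enable on the processor side.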

Rust v0.13.0

19 Sep 08:13
7c146d9

[0.13.0]

  • [#1009] unstable_wasm feature to support building on Wasm (it's unstable!)
  • [#1008] Decoder is now a composable trait, without breaking backward compatibility
  • [#1047, #1051, #1052] Processor is now a composable trait, without breaking backward compatibility

Both trait changes warrant a "major" version bump: despite best efforts to avoid breaking backward
compatibility, the code is different enough that we cannot be entirely sure nothing breaks.

Node v0.13.0

19 Sep 09:13
7c146d9

[0.13.0]

  • [#1008] Decoder is now a composable trait, without breaking backward compatibility
  • [#1047, #1051, #1052] Processor is now a composable trait, without breaking backward compatibility

Python v0.12.1

13 Apr 10:02
8a9bb28
Pre-release

[0.12.1]

[YANKED] Rust v0.12.0

31 Mar 09:10
0eb7455

[0.12.0]

Bump minor version because of a breaking change.

The breaking change was causing more issues upstream in transformers than anticipated:
huggingface/transformers#16537 (comment)

The decision was to roll back that breaking change and figure out a different way to make this modification later.

  • [#938] Breaking change. The Decoder trait is modified to be composable. This is only breaking if you use decoders on their own; going through the Tokenizer API should be error-free.

  • [#939] Making the regex in ByteLevel pre_tokenizer optional (necessary for BigScience)

  • [#952] Fixed the vocabulary size of UnigramTrainer output (to respect added tokens)

  • [#954] Fixed not being able to save vocabularies with holes in the vocab (ConvBert): emit warnings instead of panicking.

  • [#961] Added link for Ruby port of tokenizers

  • [#960] Feature gate for cli and its clap dependency

[YANKED] Python v0.12.0

31 Mar 09:18
0eb7455

[0.12.0]

Bump minor version because of a breaking change.

The breaking change was causing more issues upstream in transformers than anticipated:
huggingface/transformers#16537 (comment)

The decision was to roll back that breaking change and figure out a different way to make this modification later.

  • [#938] Breaking change. The Decoder trait is modified to be composable. This is only breaking if you use decoders on their own; going through the Tokenizer API should be error-free.

  • [#939] Making the regex in ByteLevel pre_tokenizer optional (necessary for BigScience)

  • [#952] Fixed the vocabulary size of UnigramTrainer output (to respect added tokens)

  • [#954] Fixed not being able to save vocabularies with holes in the vocab (ConvBert): emit warnings instead of panicking.

  • [#962] Fix tests for python 3.10

  • [#961] Added link for Ruby port of tokenizers
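For #939, disabling the splitting regex from the Python bindings looks roughly like the sketch below, assuming the `use_regex` keyword is exposed there as in recent versions:

```python
from tokenizers import pre_tokenizers

# With use_regex=False, the GPT-2 style splitting regex is skipped and
# the whole input is byte-mapped as a single piece.
no_split = pre_tokenizers.ByteLevel(use_regex=False)
pieces = no_split.pre_tokenize_str("hello world")
print(len(pieces))  # one piece: no regex splitting happened

# With the default (use_regex=True), the same input is split by the regex.
split = pre_tokenizers.ByteLevel()
print(len(split.pre_tokenize_str("hello world")))
```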

[YANKED] Node v0.12.0

31 Mar 13:07
23a22da

[0.12.0]

Bump minor version because of a breaking change.
Using 0.12 to match the other bindings.

The breaking change was causing more issues upstream in transformers than anticipated:
huggingface/transformers#16537 (comment)

The decision was to roll back that breaking change and figure out a different way to make this modification later.

  • [#938] Breaking change. The Decoder trait is modified to be composable. This is only breaking if you use decoders on their own; going through the Tokenizer API should be error-free.

  • [#939] Making the regex in ByteLevel pre_tokenizer optional (necessary for BigScience)

  • [#952] Fixed the vocabulary size of UnigramTrainer output (to respect added tokens)

  • [#954] Fixed not being able to save vocabularies with holes in the vocab (ConvBert): emit warnings instead of panicking.

  • [#961] Added link for Ruby port of tokenizers

Rust v0.11.2

28 Feb 10:53
  • [#919] Fixing single_word AddedToken (regression from 0.11.2).
  • [#916] Faster deserialization of added_tokens by loading them in batch.

Python v0.11.6

28 Feb 09:22
  • [#919] Fixing single_word AddedToken (regression from 0.11.2).
  • [#916] Faster deserialization of added_tokens by loading them in batch.
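For context on #919: `single_word` controls whether an added token may match inside a larger word. A minimal sketch with the Python bindings (the toy vocab below is purely illustrative):

```python
from tokenizers import Tokenizer, AddedToken
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Toy tokenizer whose only known token is [UNK].
tok = Tokenizer(WordLevel({"[UNK]": 0}, unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()

# single_word=True: "hug" only matches as a whole word, so the "hug"
# inside "hugging" is left to the model (which maps the word to [UNK]).
tok.add_tokens([AddedToken("hug", single_word=True)])
print(tok.encode("hug hugging").tokens)
```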