Preparing 0.12 release. (#967)
* Preparing `0.12` release.

* Fix click version: psf/black#2964
Narsil authored Mar 31, 2022
1 parent 28cd3dc commit 0eb7455
Showing 8 changed files with 64 additions and 5 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python.yml
@@ -107,7 +107,7 @@ jobs:
working-directory: ./bindings/python
run: |
source .env/bin/activate
-pip install black==20.8b1
+pip install black==20.8b1 click==8.0.4
make check-style
- name: Run tests
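The `click` pin above works around psf/black#2964: `black==20.8b1` imports a private `click` module that `click` 8.1.0 removed, so unpinned CI runs crashed before `make check-style` could start. A minimal sketch of the failure mode, assuming that import path (this snippet is illustrative, not part of the commit):

```python
# Assumption: black 20.8b1 does `from click import _unicodefun`,
# a private module that click 8.1.0 deleted.
try:
    from click import _unicodefun  # noqa: F401  (gone in click >= 8.1)
except ImportError as err:
    raise SystemExit(
        "click >= 8.1 dropped `_unicodefun`; pin click==8.0.4 "
        "alongside black==20.8b1, as this workflow now does."
    ) from err
```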
21 changes: 21 additions & 0 deletions bindings/node/CHANGELOG.md
@@ -1,3 +1,15 @@
## [0.12.0]

Bump minor version because of a breaking change.
Using `0.12` to match other bindings.

- [#938] **Breaking change**. The `Decoder` trait was modified to be composable. This only breaks code that uses decoders on their own; usage through a `Tokenizer` should be unaffected.
- [#939] Made the regex in the `ByteLevel` pre_tokenizer optional (necessary for BigScience).

- [#952] Fixed the vocabulary size of `UnigramTrainer` output (to respect added tokens).
- [#954] Fixed being unable to save vocabularies with holes in the vocab (ConvBert); warnings are now emitted instead of panicking.
- [#961] Added a link to the Ruby port of `tokenizers`.

# [0.8.0](https://github.com/huggingface/tokenizers/compare/node-v0.7.0...node-v0.8.0) (2021-09-02)

### BREAKING CHANGES
@@ -142,3 +154,12 @@ The files must now be provided first when calling `tokenizer.train(files, trainer)`
- Fix default special tokens in `BertWordPieceTokenizer` ([10e2d28](https://github.com/huggingface/tokenizers/commit/10e2d286caf517f0977c04cf8e1924aed90403c9))
- Fix return type of `getSpecialTokensMask` on `Encoding` ([9770be5](https://github.com/huggingface/tokenizers/commit/9770be566175dc9c44dd7dcaa00a57d0e4ca632b))
- Actually add special tokens in tokenizers implementations ([acef252](https://github.com/huggingface/tokenizers/commit/acef252dacc43adc414175cfc325668ad1488753))


[#938]: https://github.com/huggingface/tokenizers/pull/938
[#939]: https://github.com/huggingface/tokenizers/pull/939
[#952]: https://github.com/huggingface/tokenizers/pull/952
[#954]: https://github.com/huggingface/tokenizers/pull/954
[#962]: https://github.com/huggingface/tokenizers/pull/962
[#961]: https://github.com/huggingface/tokenizers/pull/961
[#960]: https://github.com/huggingface/tokenizers/pull/960
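For context on the [#938] breaking change listed above: it only affects code that drives a decoder directly instead of through a `Tokenizer`. A hedged sketch of that standalone pattern, using the Python bindings for brevity (parameter names and output are my assumptions, not taken from this diff):

```python
from tokenizers import decoders

# Standalone decoder use -- the only pattern #938 can break.
# WordPiece joins tokens and strips the "##" continuation prefix.
decoder = decoders.WordPiece(prefix="##", cleanup=True)
print(decoder.decode(["hello", "##world"]))  # expected: "helloworld"
```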
2 changes: 1 addition & 1 deletion bindings/node/package.json
@@ -1,6 +1,6 @@
{
"name": "tokenizers",
"version": "0.8.3",
"version": "0.12.0",
"description": "",
"main": "./dist/index.js",
"types": "./dist/index.d.ts",
19 changes: 19 additions & 0 deletions bindings/python/CHANGELOG.md
@@ -4,6 +4,18 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.12.0]

Bump minor version because of a breaking change.

- [#938] **Breaking change**. The `Decoder` trait was modified to be composable. This only breaks code that uses decoders on their own; usage through a `Tokenizer` should be unaffected.
- [#939] Made the regex in the `ByteLevel` pre_tokenizer optional (necessary for BigScience).

- [#952] Fixed the vocabulary size of `UnigramTrainer` output (to respect added tokens).
- [#954] Fixed being unable to save vocabularies with holes in the vocab (ConvBert); warnings are now emitted instead of panicking.
- [#962] Fixed tests for Python 3.10.
- [#961] Added a link to the Ruby port of `tokenizers`.

## [0.11.6]

- [#919] Fixing single_word AddedToken. (regression from 0.11.2)
@@ -360,6 +372,13 @@ delimiter (Works like `.split(delimiter)`)
- Fix a bug that was causing crashes in Python 3.5


[#938]: https://github.com/huggingface/tokenizers/pull/938
[#939]: https://github.com/huggingface/tokenizers/pull/939
[#952]: https://github.com/huggingface/tokenizers/pull/952
[#954]: https://github.com/huggingface/tokenizers/pull/954
[#962]: https://github.com/huggingface/tokenizers/pull/962
[#961]: https://github.com/huggingface/tokenizers/pull/961
[#960]: https://github.com/huggingface/tokenizers/pull/960
[#919]: https://github.com/huggingface/tokenizers/pull/919
[#916]: https://github.com/huggingface/tokenizers/pull/916
[#895]: https://github.com/huggingface/tokenizers/pull/895
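Regarding the [#939] entry above, which makes the `ByteLevel` regex optional: a minimal sketch of what this likely looks like from the Python API, assuming the switch is exposed as a `use_regex` keyword (my reading of the PR, not confirmed by this diff):

```python
from tokenizers import pre_tokenizers

# With use_regex=False, ByteLevel would skip the GPT-2 splitting regex
# and only apply the byte-to-unicode mapping (the BigScience use case).
pre_tok = pre_tokenizers.ByteLevel(add_prefix_space=False, use_regex=False)
print(pre_tok.pre_tokenize_str("Hello there"))
```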
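Similarly, [#952] makes `UnigramTrainer` count added/special tokens toward the requested `vocab_size`. A hedged end-to-end sketch of the behavior it fixes (the toy corpus and sizes are illustrative only):

```python
from tokenizers import Tokenizer, models, trainers

with open("toy.txt", "w", encoding="utf-8") as f:
    f.write("hello world hello tokenizers\n")

tokenizer = Tokenizer(models.Unigram())
trainer = trainers.UnigramTrainer(vocab_size=30, special_tokens=["<unk>", "<pad>"])
tokenizer.train(["toy.txt"], trainer)

# After #952, special tokens count toward vocab_size, so the final
# vocabulary should not exceed the requested 30 entries.
print(tokenizer.get_vocab_size())
```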
2 changes: 1 addition & 1 deletion bindings/python/py_src/tokenizers/__init__.py
@@ -1,4 +1,4 @@
__version__ = "0.11.6"
__version__ = "0.12.0"

from typing import Tuple, Union, List
from enum import Enum
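Once the 0.12.0 wheel built from this commit is installed, the bump is visible at runtime; a quick sanity check:

```python
import tokenizers

# The diff above changes __version__ from "0.11.6" to "0.12.0".
assert tokenizers.__version__ == "0.12.0", tokenizers.__version__
```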
2 changes: 1 addition & 1 deletion bindings/python/setup.py
@@ -7,7 +7,7 @@

setup(
name="tokenizers",
version="0.11.6",
version="0.12.0",
description="Fast and Customizable Tokenizers",
long_description=open("README.md", "r", encoding="utf-8").read(),
long_description_content_type="text/markdown",
19 changes: 19 additions & 0 deletions tokenizers/CHANGELOG.md
@@ -4,6 +4,18 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.12.0]

Bump minor version because of a breaking change.

- [#938] **Breaking change**. The `Decoder` trait was modified to be composable. This only breaks code that uses decoders on their own; usage through a `Tokenizer` should be unaffected.
- [#939] Made the regex in the `ByteLevel` pre_tokenizer optional (necessary for BigScience).

- [#952] Fixed the vocabulary size of `UnigramTrainer` output (to respect added tokens).
- [#954] Fixed being unable to save vocabularies with holes in the vocab (ConvBert); warnings are now emitted instead of panicking.
- [#961] Added a link to the Ruby port of `tokenizers`.
- [#960] Added a feature gate for `cli` and its `clap` dependency.

## [0.11.3]

- [#919] Fixing single_word AddedToken. (regression from 0.11.2)
@@ -140,6 +152,13 @@ advised, but that's not the question)
split up in multiple bytes
- [#174]: The `LongestFirst` truncation strategy had a bug


[#938]: https://github.com/huggingface/tokenizers/pull/938
[#939]: https://github.com/huggingface/tokenizers/pull/939
[#952]: https://github.com/huggingface/tokenizers/pull/952
[#954]: https://github.com/huggingface/tokenizers/pull/954
[#961]: https://github.com/huggingface/tokenizers/pull/961
[#960]: https://github.com/huggingface/tokenizers/pull/960
[#919]: https://github.com/huggingface/tokenizers/pull/919
[#916]: https://github.com/huggingface/tokenizers/pull/916
[#884]: https://github.com/huggingface/tokenizers/pull/884
2 changes: 1 addition & 1 deletion tokenizers/Cargo.toml
@@ -2,7 +2,7 @@
authors = ["Anthony MOI <[email protected]>"]
edition = "2018"
name = "tokenizers"
version = "0.11.3"
version = "0.12.0"
homepage = "https://github.com/huggingface/tokenizers"
repository = "https://github.com/huggingface/tokenizers"
documentation = "https://docs.rs/tokenizers/"
Expand Down
