Commit
Major features:

* Added WMT21 datasets (closes #166)
* Restructured internal storage, expose metadata via `--echo`
* Add a Korean tokenizer (`--tok ko-mecab` #194)
* Allow empty references (fixes #161)

Bugfixes and minor additions:

* Set API tokenizer to None by default (closes #181)
* Added SPM to list of CJK tokenizer recommendations
* Pulled out NoneTokenizer, allow None references in args check (addresses #195)
* Remove colon from filename of parsed files (not permitted on Windows)
* Fix: update the url of mtnt2019 data to latest
* Added a few missing md5 hashes
* Changed filename of downloaded file
* Use global filename for downloaded tarballs
* In tests, subsample datasets to download instead of going through them all

Co-authored-by: NoUnique <[email protected]>
Co-authored-by: hanbing <[email protected]>
Co-authored-by: Jannis Vamvas <[email protected]>
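The Windows filename fix in the list above can be illustrated with a small sketch. This is not the code from this commit: the function name `safe_filename`, the replacement character, and the example filename are all hypothetical, chosen only to show why a colon in a generated filename is a problem and one way it might be handled.

```python
import re

# Characters that Windows does not permit in file names.
# (Assumption: we sanitize a bare file name, not a full path.)
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def safe_filename(name: str, replacement: str = "_") -> str:
    """Return `name` with characters forbidden on Windows replaced.

    Hypothetical helper for illustration; the commit itself only states
    that the colon was removed from the filenames of parsed files.
    """
    return _WINDOWS_FORBIDDEN.sub(replacement, name)


# Hypothetical example: a parsed-file name that used ':' as a separator.
print(safe_filename("wmt21.en-de:src"))
```

Replacing the character rather than dropping it keeps the two name parts visually separate, which is one reasonable design choice among several.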
1 parent 8e7abf5, commit a73315b

Showing 25 changed files with 2,834 additions and 1,084 deletions.