Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

[v0.2.1] - 2022-12-21

Fixed

  • Some banned words were not being banned correctly; they are now correctly removed.

[v0.2.0] - 2022-12-21

Added

  • Added postprocessing of the corpora, including removal of duplicates, bot comments, and comments from inappropriate subreddits.
  • Added the --hub-repo-id option to the CLI, which can be used to upload the resulting dataset to the Hugging Face Hub.

[v0.1.0] - 2022-12-20

Added

  • Initial release, which includes the CLI command build for building the Scandinavian Reddit corpus. Run build --help for more information.