From 1e6abe660e8252323c80009fa920fef421adfa81 Mon Sep 17 00:00:00 2001 From: David Nicholson Date: Sat, 4 May 2024 23:26:32 -0400 Subject: [PATCH] DOC: Update CHANGELOG after merging #750 [skip ci] --- doc/CHANGELOG.md | 547 ++++++++++++++++++++++++----------------------- 1 file changed, 276 insertions(+), 271 deletions(-) diff --git a/doc/CHANGELOG.md b/doc/CHANGELOG.md index d4d66e8fe..6af0b3306 100644 --- a/doc/CHANGELOG.md +++ b/doc/CHANGELOG.md @@ -17,7 +17,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Add ways to define models and families of models [#605](https://github.com/vocalpy/vak/pull/605). Fixes [#406](https://github.com/vocalpy/vak/issues/406), - [#536](https://github.com/vocalpy/vak/issues/536), and + [#536](https://github.com/vocalpy/vak/issues/536), and [#603](https://github.com/vocalpy/vak/issues/603). - Add built-in TweetyNet model [#605](https://github.com/vocalpy/vak/pull/605). @@ -25,21 +25,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Add logging of training time [#628](https://github.com/vocalpy/vak/pull/628). Fixes [#2](https://github.com/vocalpy/vak/issues/2). -- Prepare datasets as directories, so they are portable +- Prepare datasets as directories, so they are portable and have a more standardized format [#658](https://github.com/vocalpy/vak/pull/658). - Fixes [#649](https://github.com/vocalpy/vak/issues/649) and + Fixes [#649](https://github.com/vocalpy/vak/issues/649) and [#650](https://github.com/vocalpy/vak/issues/650). -- Add concept of "dataset type" and "input type", +- Add concept of "dataset type" and "input type", where a dataset type maps to the task that a family of models is used for, - and the input type denotes the domain of the data that becomes the input + and the input type denotes the domain of the data that becomes the input for the neural network model. 
E.g., a frame classification model requires a frame classification dataset, - and its input type can be either audio or spectrograms + and its input type can be either audio or spectrograms [#670](https://github.com/vocalpy/vak/pull/670). Fixes [#667](https://github.com/vocalpy/vak/issues/667) and [#652](https://github.com/vocalpy/vak/issues/652). -- Add decorators to register models and model families +- Add decorators to register models and model families [#676](https://github.com/vocalpy/vak/pull/676). Fixes [#623](https://github.com/vocalpy/vak/issues/623). - Add initial parametric UMAP implementation @@ -47,23 +47,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 Fixes [#631](https://github.com/vocalpy/vak/issues/631). ### Changed -- Rename config file option `csv_path` to `dataset_path`, - since it is more specific and allows for the possibility +- Rename config file option `csv_path` to `dataset_path`, + since it is more specific and allows for the possibility that a dataset is not always a csv file [#632](https://github.com/vocalpy/vak/pull/632). Fixes [#549](https://github.com/vocalpy/vak/issues/549). -- Use [crowsetta 5.0](https://github.com/vocalpy/crowsetta/releases/tag/5.0.0), version released after +- Use [crowsetta 5.0](https://github.com/vocalpy/crowsetta/releases/tag/5.0.0), version released after [pyOpenSci review](https://github.com/pyOpenSci/software-submission/issues/68). [#628](https://github.com/vocalpy/vak/pull/628). Fixes [#522](https://github.com/vocalpy/vak/issues/526). -- Splits for learning curves are now generated by prep +- Splits for learning curves are now generated by prep and stored in the directory that represents the dataset, - instead of being generated by learncurve and saved in + instead of being generated by learncurve and saved in the results directory [#658](https://github.com/vocalpy/vak/pull/658). Fixes [#651](https://github.com/vocalpy/vak/issues/651). 
-- Refactor API so it's more clear what the top-level "public" API should be, - and to clean up spaghetti code that slows down adding new functionality. +- Refactor API so it's more clear what the top-level "public" API should be, + and to clean up spaghetti code that slows down adding new functionality. Move functions from core up to top-level: eval, learncurve, predict, prep, train. Move `vak.io` into prep, rename `prep_spectrogram_dataset`. Also move `vak.spect` in there. Move `vak.split` into `vak.prep` since that's the only place it's used. @@ -73,11 +73,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Have distance metrics return tensors [#702](https://github.com/vocalpy/vak/pull/702). Fixes [#701](https://github.com/vocalpy/vak/issues/701). -- Rename metric `SegmentErrorRate` to `CharacterErrorRate` - to make it clearer that this is an edit distance computed on +- Rename metric `SegmentErrorRate` to `CharacterErrorRate` + to make it clearer that this is an edit distance computed on the segment *labels* [#723](https://github.com/vocalpy/vak/pull/723). Fixes [#721](https://github.com/vocalpy/vak/issues/721). +- Change to version 1.0 of config file format + [#750](https://github.com/vocalpy/vak/pull/750). + Fixes [#685](https://github.com/vocalpy/vak/issues/685), + [#345](https://github.com/vocalpy/vak/issues/345), and + [#748](https://github.com/vocalpy/vak/issues/748). ### Removed - Remove entry points since they are not being used @@ -89,10 +94,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 Fixes [#538](https://github.com/vocalpy/vak/issues/538). - Remove `engine` with `Model` class [#627](https://github.com/vocalpy/vak/pull/627). - No longer used after switching to Lightning as backend in + No longer used after switching to Lightning as backend in [#598](https://github.com/vocalpy/vak/pull/598). - Remove config option 'previous_run_path' for learning curves. 
- This is no longer needed now that `vak.prep` generates splits + This is no longer needed now that `vak.prep` generates splits for learning curves and saves them in the dataset directory; to re-run a learning curve experiment, use the same `dataset_path` as the previous experiment @@ -101,17 +106,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Fixed - Fix functionality to evaluate model with and without - post-processing transform that was added in + post-processing transform that was added in [#621](https://github.com/vocalpy/vak/pull/621). Fixed in [#626](https://github.com/vocalpy/vak/pull/626). - Change label mapping to use single characters for the validation step of WindowedFrameClassificationModel *only*, - to avoid affecting the edit distance metric, + to avoid affecting the edit distance metric, instead of modifying the mapping inside the top-level eval function, - which can cause a crash because the mapping is used in other places + which can cause a crash because the mapping is used in other places [#665](https://github.com/vocalpy/vak/pull/665). Fixed in [#664](https://github.com/vocalpy/vak/pull/664). -- Fix bug that caused crash on Apple M1 / MPS accelerator +- Fix bug that caused crash on Apple M1 / MPS accelerator [#700](https://github.com/vocalpy/vak/issues/700). Fixed in [#702](https://github.com/vocalpy/vak/pull/702). - Fix models so they log training loss on each step @@ -121,8 +126,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## 0.8.1 -- 2023-03-02 ### Fixed - Fix transform that converts labeled timebins to segments - so that it returns all `None`s when there are no segments - in the vector, either before or after applying any + so that it returns all `None`s when there are no segments + in the vector, either before or after applying any post-processing transforms [#636](https://github.com/vocalpy/vak/pull/636). 
Bug introduced in @@ -131,31 +136,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## 0.8.0 -- 2023-02-09 ### Added -- Add options for how `audio.to_spect` calls `dask.bag`, +- Add options for how `audio.to_spect` calls `dask.bag`, to help with memory issues when processing large files [#611](https://github.com/vocalpy/vak/pull/611). Fixes [#580](https://github.com/vocalpy/vak/issues/580). -- Add ability to run evaluation of models with and without post-processing - transforms. This is done by specifying an option `post_tfm_kwargs` in the +- Add ability to run evaluation of models with and without post-processing + transforms. This is done by specifying an option `post_tfm_kwargs` in the `[EVAL]` or `[LEARNCURVE]` sections of a .toml configuration file. If the option is not specified, then models are evaluated as they were - previously, by converting the predicted label for each time bin + previously, by converting the predicted label for each time bin to a label for each continuous segment, represented as a string. - If the option *is* specified, then the post-processing is applied + If the option *is* specified, then the post-processing is applied to the model predictions before converting to strings. Metrics are computed for outputs with *and* without post-processing, to be able to compare the two. [#621](https://github.com/vocalpy/vak/pull/621). Fixes [#472](https://github.com/vocalpy/vak/issues/472). -- `vak.core.eval` now logs computed evaluation metrics so they can be +- `vak.core.eval` now logs computed evaluation metrics so they can be quickly inspected in the terminal or log files before full analysis [#621](https://github.com/vocalpy/vak/pull/621). Fixes [#471](https://github.com/vocalpy/vak/issues/471). 
### Changed -- Rewrite post-processing transforms applied to network outputs +- Rewrite post-processing transforms applied to network outputs as transforms, with functional and class implementations, - to make it possible to compose these transforms, and more easily + to make it possible to compose these transforms, and more easily evaluate model performance with and without them [#621](https://github.com/vocalpy/vak/pull/621). Fixes [#537](https://github.com/vocalpy/vak/issues/537). @@ -169,24 +174,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 [#542](https://github.com/vocalpy/vak/pull/542). Fixes [#337](https://github.com/vocalpy/vak/issues/337). - Add validation of `labels` argument to `vak.split.algorithms.brute_force`, - to prevent conditions where algorithm can fail to converge - because of bad input + to prevent conditions where algorithm can fail to converge + because of bad input [#562](https://github.com/vocalpy/vak/pull/562). Fixes [#288](https://github.com/vocalpy/vak/issues/288). -- Add a "Frequently Asked Questions" page to the documentation, +- Add a "Frequently Asked Questions" page to the documentation, and a page to the "Reference" section on file naming conventions [#564](https://github.com/vocalpy/vak/pull/564). Fixes [#524](https://github.com/vocalpy/vak/issues/524) and [#424](https://github.com/vocalpy/vak/issues/424). -- Add a new way for vak to map annotation files to annotated files - when preparing datasets, e.g. for training models. - For annotation formats that have one annotation file per +- Add a new way for vak to map annotation files to annotated files + when preparing datasets, e.g. for training models. + For annotation formats that have one annotation file per annotated file, vak can now recognize when - the annotation files are named by removing the - annotated file extension (e.g., .wav or .npz) - and replacing it with the annotation format extension, - e.g. .txt or .csv. 
(Other ways of relating annotations - and annotated files are still valid, e.g. by including + the annotation files are named by removing the + annotated file extension (e.g., .wav or .npz) + and replacing it with the annotation format extension, + e.g. .txt or .csv. (Other ways of relating annotations + and annotated files are still valid, e.g. by including the original source audio file in both filenames.) [#572](https://github.com/vocalpy/vak/pull/572). Fixes [#563](https://github.com/vocalpy/vak/issues/563). @@ -198,117 +203,117 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Rewrite unit tests in `tests/test_cli/` to use mocks for `vak.core` functions [#544](https://github.com/vocalpy/vak/pull/544). Fixes [#543](https://github.com/vocalpy/vak/issues/543). -- It is now possible to load configuration files - and work with them programmatically even if the paths +- It is now possible to load configuration files + and work with them programmatically even if the paths they point to do not exist. The `core` functions handle validation instead. - E.g., the `PrepConfig` class does not check whether + E.g., the `PrepConfig` class does not check whether `output_dir` exist is a directory, but `vak.core.prep` does. [#550](https://github.com/vocalpy/vak/pull/550). Fixes [#459](https://github.com/vocalpy/vak/issues/459). -- Refactor and speed up logic for determining whether a - dataset with sequence annotations has unlabeled segments +- Refactor and speed up logic for determining whether a + dataset with sequence annotations has unlabeled segments that should be assigned a "background" label [#559](https://github.com/vocalpy/vak/pull/559). Fixes [#243](https://github.com/vocalpy/vak/issues/243). - Adds a new sub-sub-package, `datasets.seq` - with a `validators` module, which is where the - re-written `has_unlabeled` function now lives. + with a `validators` module, which is where the + re-written `has_unlabeled` function now lives. 
Replaces the `vak.csv` module which was not well named. - - Also adds a `has_unlabeled` function to `vak.annotation` - that is used by `vak.datasets.seq.validators.has_unlabeled`; + - Also adds a `has_unlabeled` function to `vak.annotation` + that is used by `vak.datasets.seq.validators.has_unlabeled`; this function handles edge cases outlined in [#243](https://github.com/vocalpy/vak/issues/243). -- Rename and refactor functions in `vak.annotation` - that map annotations to the files that they annotate, - so that the purpose of the functions is clearer, - and add clearer error messages with links to documentation - about file naming conventions +- Rename and refactor functions in `vak.annotation` + that map annotations to the files that they annotate, + so that the purpose of the functions is clearer, + and add clearer error messages with links to documentation + about file naming conventions [#566](https://github.com/vocalpy/vak/pull/566). Fixes [#525](https://github.com/vocalpy/vak/issues/525). -- Revise "autoannotate" tutorial to use .wav audio and .csv - annotation files from new release of Bengalese Finch Song - Repository, and to suggest that Windows users unpack +- Revise "autoannotate" tutorial to use .wav audio and .csv + annotation files from new release of Bengalese Finch Song + Repository, and to suggest that Windows users unpack archives with tar, not other programs such as WinZip [#578](https://github.com/vocalpy/vak/pull/578). Fixes [#560](https://github.com/vocalpy/vak/issues/560) and [#576](https://github.com/vocalpy/vak/issues/576). -- Change `vak.files.find_fname` and `vak.files.spect.find_audio_fname` - so they work when spaces are in filename and/or path +- Change `vak.files.find_fname` and `vak.files.spect.find_audio_fname` + so they work when spaces are in filename and/or path [#594](https://github.com/vocalpy/vak/pull/594). Fixes [#589](https://github.com/vocalpy/vak/issues/589). 
### Fixed - Fix how `vak.core.prep` handles `labelset` parameter. Add pre-condition that raises a ValueError - when `labelset` is `None` but the .toml config is one of + when `labelset` is `None` but the .toml config is one of {'train', 'learncurve', 'eval'} [#545](https://github.com/vocalpy/vak/pull/545). - Avoids running computationally expensive step of generating - and validating spectrograms *before* crashing when trying to - split the dataset using `labelset`. Also avoids silent - failures for datasets that do not require splitting, - e.g., an 'eval' set that could contain labels not in the + Avoids running computationally expensive step of generating + and validating spectrograms *before* crashing when trying to + split the dataset using `labelset`. Also avoids silent + failures for datasets that do not require splitting, + e.g., an 'eval' set that could contain labels not in the training set. Fixes [#468](https://github.com/vocalpy/vak/issues/468). - Fix how `cli` and `core` functions that have the `csv_path` parameter handles it. The parameter points to a dataset .csv generated by `vak prep` that other `core`/`cli` function use: `train`, `learncurve`, `eval`, `predict`. - They now validate that it exists, and if it doesn't, the `cli` functions - politely suggest running `vak prep` first; the `core` functions + They now validate that it exists, and if it doesn't, the `cli` functions + politely suggest running `vak prep` first; the `core` functions raise a FileNotFoundError. [#546](https://github.com/vocalpy/vak/pull/546). Fixes [#469](https://github.com/vocalpy/vak/issues/469). - Fix bug where `labelmap_path` parameter was ignored by `core.train`. - Change function so that either `labelmap_path` or `labelset` must + Change function so that either `labelmap_path` or `labelset` must be passed in, both passing in both will raise an error. 
- Also change `cli.train` to only pass in one of those and set the other + Also change `cli.train` to only pass in one of those and set the other to `None`. [#552](https://github.com/vocalpy/vak/pull/552). Fixes [#547](https://github.com/vocalpy/vak/issues/547). -- Fix `vak.annotation.has_unlabeled` to handle the edge case where an +- Fix `vak.annotation.has_unlabeled` to handle the edge case where an annotation file has no annotated segments [#583](https://github.com/vocalpy/vak/pull/583). Fixes [#378](https://github.com/vocalpy/vak/issues/378). - Fix `StandardizeSpect` method `fit_df` so that it computes parameters for standardization from a specific - split of the dataset--the training split, by default--instead - of using the entire dataset, which could technically give rise + split of the dataset--the training split, by default--instead + of using the entire dataset, which could technically give rise to data leakage [#584](https://github.com/vocalpy/vak/pull/583). Fixes [#575](https://github.com/vocalpy/vak/issues/575). - Fix error message in `vak.core.eval` [#589](https://github.com/vocalpy/vak/pull/589). Fixes [#588](https://github.com/vocalpy/vak/issues/588). - + ## 0.6.0 -- 2022-07-07 ### Added -- better document `conda` install +- better document `conda` install [#528](https://github.com/vocalpy/vak/pull/528). Fixes [#527](https://github.com/vocalpy/vak/issues/527). -- Add tests for console script, i.e., the command-line interface +- Add tests for console script, i.e., the command-line interface [#533](https://github.com/vocalpy/vak/pull/533). Fixes [#369](https://github.com/vocalpy/vak/issues/369). ### Changed -- switch from using `make` to `nox` for running tasks +- switch from using `make` to `nox` for running tasks [#532](https://github.com/vocalpy/vak/pull/532). Fixes [#440](https://github.com/vocalpy/vak/issues/440). 
- Refactor logging so that it can be configured by `cli` functions - when running `vak` through command-line interface, and by users + when running `vak` through command-line interface, and by users that are working with the API directly [#535](https://github.com/vocalpy/vak/pull/535). ### Fixed - Fix bug that prevented creating spectrogram files with non-default keys - (e.g. 'spect' instead of the default 's'). Needed to pass keys from `spect_params` - into `spect.to_dataframe` inside `vak.io.dataframe.from_files`. + (e.g. 'spect' instead of the default 's'). Needed to pass keys from `spect_params` + into `spect.to_dataframe` inside `vak.io.dataframe.from_files`. [#531](https://github.com/vocalpy/vak/pull/531). Fixes [#412](https://github.com/vocalpy/vak/issues/412). -- Fix logging so a single message is not logged multiple times. +- Fix logging so a single message is not logged multiple times. [#535](https://github.com/vocalpy/vak/pull/535). Fixes [#258](https://github.com/vocalpy/vak/issues/258). -- Fix section of contributing docs on setting up a development environment. +- Fix section of contributing docs on setting up a development environment. [#592](https://github.com/vocalpy/vak/pull/592). Fixes [#591](https://github.com/vocalpy/vak/issues/591). @@ -319,13 +324,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## 0.5.0 -- 2022-06-25 ### Added -- add ability to continue training from an existing checkpoint +- add ability to continue training from an existing checkpoint [#505](https://github.com/vocalpy/vak/pull/505). Fixes [#5](https://github.com/vocalpy/vak/issues/5). ## Changed -- change minimum required Python to 3.8, - to adhere to [NEP-29](https://numpy.org/neps/nep-0029-deprecation_policy.html), in +- change minimum required Python to 3.8, + to adhere to [NEP-29](https://numpy.org/neps/nep-0029-deprecation_policy.html), in [#513](https://github.com/vocalpy/vak/pull/513). 
Fixes [#512](https://github.com/vocalpy/vak/issues/512). @@ -339,24 +344,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [0.4.2](https://github.com/vocalpy/vak/releases/tag/0.4.2) -- 2022-03-29 ### Added -- add a [Code of Conduct](https://github.com/vocalpy/vak/blob/main/CODE_OF_CONDUCT.md), - a [contributing guide on GitHub](https://github.com/vocalpy/vak/blob/main/.github/CONTRIBUTING.md), - and a - [Development section of the documentation](https://vak.readthedocs.io/en/latest/development/index.html) +- add a [Code of Conduct](https://github.com/vocalpy/vak/blob/main/CODE_OF_CONDUCT.md), + a [contributing guide on GitHub](https://github.com/vocalpy/vak/blob/main/.github/CONTRIBUTING.md), + and a + [Development section of the documentation](https://vak.readthedocs.io/en/latest/development/index.html) [#448](https://github.com/vocalpy/vak/pull/448). Fixes [#8](https://github.com/vocalpy/vak/issues/8) and [#56](https://github.com/vocalpy/vak/issues/56). -- add pull request templates on GitHub +- add pull request templates on GitHub [#445](https://github.com/vocalpy/vak/pull/448). Fixes [#85](https://github.com/vocalpy/vak/issues/85). -- add links to page describing format for array files - containing spectrograms, on the reference index, and on - the how-to page on using your own spectrograms. - Also add a link to a small example dataset of - spectrogram files +- add links to page describing format for array files + containing spectrograms, on the reference index, and on + the how-to page on using your own spectrograms. + Also add a link to a small example dataset of + spectrogram files [#494](https://github.com/vocalpy/vak/pull/494). Fixes [#492](https://github.com/vocalpy/vak/issues/492). -- add more detail to explanation of how to use `'csv'` format +- add more detail to explanation of how to use `'csv'` format for annotation [#495](https://github.com/vocalpy/vak/pull/495). 
Fixes [#491](https://github.com/vocalpy/vak/issues/491). @@ -365,28 +370,28 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - make minor revisions to docs [#443](https://github.com/vocalpy/vak/pull/443). Fixes [#439](https://github.com/vocalpy/vak/issues/439). -- rewrite docs in Markdown / `MyST` wherever possible; +- rewrite docs in Markdown / `MyST` wherever possible; install MyST parser for Sphinx [#463](https://github.com/vocalpy/vak/pull/463). Fixes [#384](https://github.com/vocalpy/vak/issues/384). -- require `crowsetta` version 3.4.0 or greater; - in this version, annotation format `'csv'` is now named `'generic-seq'` +- require `crowsetta` version 3.4.0 or greater; + in this version, annotation format `'csv'` is now named `'generic-seq'` (and the name `'csv'` will stop working in the next version); format `'simple-csv'` renamed to `'simple-seq'` [#496](https://github.com/vocalpy/vak/pull/496). Fixes [#497](https://github.com/vocalpy/vak/issues/497). -- revise how-to page on annotation formats, - to include vignettes for the `'simple-seq'` and +- revise how-to page on annotation formats, + to include vignettes for the `'simple-seq'` and `'generic-seq'` formats. [#498](https://github.com/vocalpy/vak/pull/498). Fixes [#429](https://github.com/vocalpy/vak/issues/429). ### Fixed -- fix bug that caused `vak prep` to crash +- fix bug that caused `vak prep` to crash when there was only one file in a data directory [#483](https://github.com/vocalpy/vak/pull/483). Fixes [#467](https://github.com/vocalpy/vak/issues/467). -- fix bug that caused `vak prep` to crash +- fix bug that caused `vak prep` to crash when a `.not.mat` annotation file only had a single annotated segment [#488](https://github.com/vocalpy/vak/pull/488). Fixes [#466](https://github.com/vocalpy/vak/issues/466). 
@@ -396,10 +401,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - switch to using `flit` to build/publish, drop `poetry` [#434](https://github.com/vocalpy/vak/pull/434). Fixes [#433](https://github.com/vocalpy/vak/issues/433). -- raise minimum required `pytorch` version to 1.7.1 and +- raise minimum required `pytorch` version to 1.7.1 and minimum `crowsetta` version to 3.2.0' [#437](https://github.com/vocalpy/vak/pull/437). -- do various clean-up steps to development / CI workflows, +- do various clean-up steps to development / CI workflows, in the process of getting ready to publish `vak` on `conda-forge` [#437](https://github.com/vocalpy/vak/pull/437). - resolve various minor docs issues @@ -409,9 +414,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - add a [CITATION.cff](https://citation-file-format.github.io/) file [#407](https://github.com/vocalpy/vak/pull/407). -- add an [all-contributors](https://allcontributors.org/) table to README, +- add an [all-contributors](https://allcontributors.org/) table to README, using their bot to adopt the spec. - E.g., [#395](https://github.com/vocalpy/vak/pull/395). + E.g., [#395](https://github.com/vocalpy/vak/pull/395). Fixes [#387](https://github.com/vocalpy/vak/issues/387). - add description of command-line interface to reference section of documentation. [#417](https://github.com/vocalpy/vak/pull/417). @@ -419,16 +424,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - add how-to on using an annotation format that's not built in [#421](https://github.com/vocalpy/vak/pull/421). Fixes [#397](https://github.com/vocalpy/vak/issues/397). -- add how-to on using custom spectrograms +- add how-to on using custom spectrograms [#421](https://github.com/vocalpy/vak/pull/421). Fixes [#413](https://github.com/vocalpy/vak/issues/413). 
### Changed -- updated the .toml configuration files in the tutorial +- updated the .toml configuration files in the tutorial to match what was used for [TweetyNet paper](https://github.com/yardencsGitHub/tweetynet). [#416](https://github.com/vocalpy/vak/pull/416). Fixes [#414](https://github.com/vocalpy/vak/issues/414). -- move tutorial into "getting started" section of docs, +- move tutorial into "getting started" section of docs, and revise landing page of docs [#419](https://github.com/vocalpy/vak/pull/419). - revise the documentation for the configuration file format. @@ -438,10 +443,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 Fixes [#271](https://github.com/vocalpy/vak/issues/271). ### Fixed -- make further fixes + add unit tests for handling predictions where all timebins +- make further fixes + add unit tests for handling predictions where all timebins are the background "unlabeled" class [#409](https://github.com/vocalpy/vak/pull/409). Fixes bug in `remove_short_segments` [#403](https://github.com/vocalpy/vak/issues/403). - Related to [#393](https://github.com/vocalpy/vak/issues/393) + Related to [#393](https://github.com/vocalpy/vak/issues/393) and [#386](https://github.com/vocalpy/vak/issues/386). - fix docs so entries appear in navbar [#427](https://github.com/vocalpy/vak/pull/427). @@ -454,10 +459,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 Fixes [#380](https://github.com/vocalpy/vak/issues/380). ### Fixed -- fix how `predict` handles annotations that are predicted to have no labeled segments, +- fix how `predict` handles annotations that are predicted to have no labeled segments, i.e. where all time bins are predicted to have "background" / "unlabeled" class [#394](https://github.com/vocalpy/vak/pull/394). 
- For details, see [#393](https://github.com/vocalpy/vak/issues/393) and + For details, see [#393](https://github.com/vocalpy/vak/issues/393) and [#386](https://github.com/vocalpy/vak/issues/386). ## [0.4.0b5](https://github.com/vocalpy/vak/releases/tag/0.4.0b5) -- 2021-10-08 @@ -468,26 +473,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Fixed - fix typo in `doc/reference/reference.rst` that broke a link [#363](https://github.com/vocalpy/vak/issues/363) -- fix bug in function `lbl_tb2labels` that affect calculation of segment error rate +- fix bug in function `lbl_tb2labels` that affect calculation of segment error rate for annotations with digits that had multiple characters (e.g. '21', '22'). [#377](https://github.com/vocalpy/vak/pull/377). For details see [#373](https://github.com/vocalpy/vak/issues/373) ## [0.4.0b4](https://github.com/vocalpy/vak/releases/tag/0.4.0b4) -- 2021-04-25 ### Added -- add `events2df` function to `tensorboard` module that converts an "events" file - (log) created during training into a `pandas.DataFrame`, to make it easier to - work directly with logged scalar values, e.g. plot training history showing loss +- add `events2df` function to `tensorboard` module that converts an "events" file + (log) created during training into a `pandas.DataFrame`, to make it easier to + work directly with logged scalar values, e.g. plot training history showing loss [#346](https://github.com/vocalpy/vak/pull/346). -- add Dice loss, commonly used for segmentation problems, adapted from `kornia` library +- add Dice loss, commonly used for segmentation problems, adapted from `kornia` library for use with 1-D sequences [#357](https://github.com/vocalpy/vak/pull/357). 
### Changed -- change name of `summary_writer` module to `tensorboard` to reflect that it contains +- change name of `summary_writer` module to `tensorboard` to reflect that it contains any function related to `tensorboard` [#346](https://github.com/vocalpy/vak/pull/346). ### Fixed -- fix bug in Levenshtein distance implementation +- fix bug in Levenshtein distance implementation [#356](https://github.com/vocalpy/vak/pull/342). For details see issue [#355](https://github.com/vocalpy/vak/issues/355). Also added unit tests for Levenshtein distance and segment error rate @@ -506,64 +511,64 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [0.4.0b2] -- 2021-03-21 ### Added -- add built-in model `TeenyTweetyNet` - [#329](https://github.com/vocalpy/vak/pull/329) +- add built-in model `TeenyTweetyNet` + [#329](https://github.com/vocalpy/vak/pull/329) that will be used to speed up `vak` test suite. For details see issue [#308](https://github.com/vocalpy/vak/issues/308). -- make it so config does not validate other sections when running `vak prep`, - to avoid annoying errors due to options that are going to change anyway +- make it so config does not validate other sections when running `vak prep`, + to avoid annoying errors due to options that are going to change anyway [#335](https://github.com/vocalpy/vak/pull/335). - For details see [#314](https://github.com/vocalpy/vak/issues/314) and + For details see [#314](https://github.com/vocalpy/vak/issues/314) and [#334](https://github.com/vocalpy/vak/issues/334). -- raise clear error message when running `vak prep` and the section that a - dataset is being `prep`ared for already has a `csv_path`. Ask user to - remove it if they really want to generate a new one +- raise clear error message when running `vak prep` and the section that a + dataset is being `prep`ared for already has a `csv_path`. 
Ask user to + remove it if they really want to generate a new one [#335](https://github.com/vocalpy/vak/pull/335). - For details see [#314](https://github.com/vocalpy/vak/issues/314) and + For details see [#314](https://github.com/vocalpy/vak/issues/314) and [#333](https://github.com/vocalpy/vak/issues/333). -- add `split` parameter to `WindowDataset.spect_vectors_from_df` +- add `split` parameter to `WindowDataset.spect_vectors_from_df` [#336](https://github.com/vocalpy/vak/pull/336). For details see issue [#328](https://github.com/vocalpy/vak/issues/328). ### Changed - refactor `config` sub-package [#335](https://github.com/vocalpy/vak/pull/335). - For details see [#331](https://github.com/vocalpy/vak/issues/331), + For details see [#331](https://github.com/vocalpy/vak/issues/331), [#332](https://github.com/vocalpy/vak/issues/332). ### Fixed - change `model.load` method, so that `torch.load` uses `map_location` parameter [#324](https://github.com/vocalpy/vak/pull/324). - This way, loading a model trained on a GPU won't + This way, loading a model trained on a GPU won't cause a RuntimeError if only a CPU is available. For details see issue [#323](https://github.com/vocalpy/vak/issues/323). -- fix `train_dur_csv_paths.from_dir` so it uses correct dataset splits to - generate `spect_vector`s for `WindowDataset` +- fix `train_dur_csv_paths.from_dir` so it uses correct dataset splits to + generate `spect_vector`s for `WindowDataset` [#336](https://github.com/vocalpy/vak/pull/336). For details see issue [#328](https://github.com/vocalpy/vak/issues/328). ## [0.4.0b1] -- 2021-03-06 ### Added -- add ability to save "raw outputs" of network, e.g. the "logits", +- add ability to save "raw outputs" of network, e.g. the "logits", when running `vak predict` command [#320](https://github.com/vocalpy/vak/pull/320). For details see issue [#90](https://github.com/vocalpy/vak/issues/90). 
### Changed -- change `split.algorithms.validate.validate_split_durations_and_convert_nonnegative` - so that it no longer converts all durations to non-negative numbers, because the - functions that call it need to "see" when a target split duration is specified as - -1 (meaning "use any remaining vocalizations in this split") so they can determine +- change `split.algorithms.validate.validate_split_durations_and_convert_nonnegative` + so that it no longer converts all durations to non-negative numbers, because the + functions that call it need to "see" when a target split duration is specified as + -1 (meaning "use any remaining vocalizations in this split") so they can determine properly when they've finished dividing the dataset into splits. Accordingly, rename to `split.algorithms.validate.validate_split_durations`. [#300](https://github.com/vocalpy/vak/pull/300) -- refactor code that programmatically builds `results_path` used in `core` and `cli` +- refactor code that programmatically builds `results_path` used in `core` and `cli` functions that run `train` and `learncurve` [#304](https://github.com/vocalpy/vak/pull/304). - For details see - [comment on pull request](https://github.com/vocalpy/vak/pull/304#issue-576981330). -- refactor `vak.config.parse.from_toml` function into two others, - the original and a new `parse.from_toml_path` + For details see + [comment on pull request](https://github.com/vocalpy/vak/pull/304#issue-576981330). +- refactor `vak.config.parse.from_toml` function into two others, + the original and a new `parse.from_toml_path` [#306](https://github.com/vocalpy/vak/pull/306). For details see issue [#305](https://github.com/vocalpy/vak/issues/305) - switch to using `pytest` to run test suite [#309](https://github.com/vocalpy/vak/pull/309). @@ -571,26 +576,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 [#312](https://github.com/vocalpy/vak/pull/312). 
- parametrize `device` fixture so tests run on CPU and, when present, GPU [#313](https://github.com/vocalpy/vak/pull/313) -- refactor `cli.learncurve` module into a sub-package with separate module for +- refactor `cli.learncurve` module into a sub-package with separate module for `train_csv_paths` helper functions used by `learning_curve` [#319](https://github.com/vocalpy/vak/pull/319) -- lower the lower bounds on dependencies, so that users can install with earlier +- lower the lower bounds on dependencies, so that users can install with earlier versions of `torch`, `torchvision`, etc. [9c6ed46](https://github.com/vocalpy/vak/commit/9c6ed46822c53aaa25f66b050b6490657ec5005b) ### Fixed -- fix `split.algorithms.bruteforce` so that it always returns either a list of +- fix `split.algorithms.bruteforce` so that it always returns either a list of indices or `None` for each split, instead of sometimes returning an empty list - instead of a `None`. Also rewrite this function for clarity and to obey DRY + instead of a `None`. Also rewrite this function for clarity and to obey DRY principle. [#300](https://github.com/vocalpy/vak/pull/300) - fix unit tests [#309](https://github.com/vocalpy/vak/pull/309). -- fix how runs of `learncurve` that use `previous_run_path` get the - "spect vectors" that determine valid windows that can grabbed from +- fix how runs of `learncurve` that use `previous_run_path` get the + "spect vectors" that determine valid windows that can grabbed from the `WindowDataset` - [#319](https://github.com/vocalpy/vak/pull/319). + [#319](https://github.com/vocalpy/vak/pull/319). For details see [#316](https://github.com/vocalpy/vak/issues/316). - There was a bug with the first attempt to fix this, that was resolved by + There was a bug with the first attempt to fix this, that was resolved by [#322](https://github.com/vocalpy/vak/pull/322). For details see issue [#321](https://github.com/vocalpy/vak/issues/321). 
@@ -599,26 +604,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - automate generation of test data. [#274](https://github.com/vocalpy/vak/pull/274) - This pull request also adds concept of 'source' and 'generated' test data, - and decouples them from the source code in other ways, e.g. adding - a Makefile command that downloads them as .tar.gz files from an + This pull request also adds concept of 'source' and 'generated' test data, + and decouples them from the source code in other ways, e.g. adding + a Makefile command that downloads them as .tar.gz files from an Open Science Framework project. - See details in comment on pull request: + See details in comment on pull request: https://github.com/vocalpy/vak/pull/274#issue-538992350 -- make it possible to specify `spect_output_dir` when `prep`ing datasets, +- make it possible to specify `spect_output_dir` when `prep`ing datasets, the directory where array files containing spectrograms are saved [#290](https://github.com/vocalpy/vak/pull/290). Addresses issue [#289](https://github.com/vocalpy/vak/issues/289). -- add ability to specify `previous_run_path` when running `learncurve`, - so that training data subsets generated by a previous run are used - instead of generating new subsets. Controls for any effect of +- add ability to specify `previous_run_path` when running `learncurve`, + so that training data subsets generated by a previous run are used + instead of generating new subsets. 
Controls for any effect of changing training data across experiments, and makes things faster [#291](https://github.com/vocalpy/vak/pull/291) ### Changed - make it possible for labels in `labelset` to be multiple characters [##278](https://github.com/vocalpy/vak/pull/278) -- switch to `crowsetta` version 3.0.0, making it possible to specify +- switch to `crowsetta` version 3.0.0, making it possible to specify `csv` as an annotation format [#279](https://github.com/vocalpy/vak/pull/279) - switch to using `soundfile` to load audio files @@ -627,7 +632,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 [#283](https://github.com/vocalpy/vak/pull/283) - move `converters` module out of `config` sub-package up to top level [4ad9b93](https://github.com/vocalpy/vak/commit/4ad9b9390be6ac97b3dbe2b459e94d12d35ff051) -- rename `converters.labelset_from_toml_value` to `labelset_to_set` +- rename `converters.labelset_from_toml_value` to `labelset_to_set` since it will be used throughout package (not just with .toml config files) [4ad9b93](https://github.com/vocalpy/vak/commit/4ad9b9390be6ac97b3dbe2b459e94d12d35ff051) - make other functions use `converter.labelset_to_set` for `labelset` argument @@ -639,14 +644,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 [9df32e2](https://github.com/vocalpy/vak/commit/9df32e24c650057fc34dd7e53c159bae24192f25) - raise minimum versions for `crowsetta`, at least 3.1.0, and `tweetynet`, at least 0.5.0 [e1a6fbb](https://github.com/vocalpy/vak/commit/e1a6fbb9d3ccdb63167446684a8aecb3e667fd8a) -- make `vak.io.audio.to_spect` use `vak.logging.log_or_print` function +- make `vak.io.audio.to_spect` use `vak.logging.log_or_print` function so that logger messages actually appear in terminal and in log files - [af719b3](https://github.com/vocalpy/vak/commit/af719b30faa4484f2f27a0e0a236310576e8ecb0) - + 
[af719b3](https://github.com/vocalpy/vak/commit/af719b30faa4484f2f27a0e0a236310576e8ecb0) + ### Fixed -- add missing import of `eval` module to `vak.cli.__init__` and organize import statements +- add missing import of `eval` module to `vak.cli.__init__` and organize import statements [6341c8d](https://github.com/vocalpy/vak/commit/6341c8d4991a4e51565953f8e15d40f13419e6d5) -- fix `vak.files.from_dir` function, that returns list of all files +- fix `vak.files.from_dir` function, that returns list of all files from a directory with specified extension, so that it is case-insensitive [#276](https://github.com/vocalpy/vak/pull/276) - fix `vak.annotation.recursive_stem` function so it is case-insensitive @@ -655,11 +660,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 [cbd08f6](https://github.com/vocalpy/vak/commit/cbd08f6deab7a26fbbb1814fbe6349c578dae20f) - fix `find_audio_fname` to work with str and Path [1480b01](https://github.com/vocalpy/vak/commit/1480b01ebc623a64a5c077c26ffdcaa242f29f3e) -- fix how `labelset_to_set` handles set, and add type-checking as pre-condition, +- fix how `labelset_to_set` handles set, and add type-checking as pre-condition, sp that the function doesn't just return `None` [6c454cd](https://github.com/vocalpy/vak/commit/6c454cda3aded7c0cf7ac19a6eef6f6831220033) -- use `poetry` in Makefile to run scripts that generate test data, - so that development version of `vak` is used, +- use `poetry` in Makefile to run scripts that generate test data, + so that development version of `vak` is used, not some other version that might be installed into an environment (e.g. 
a `conda` environment the developer had activated) [090c205](https://github.com/vocalpy/vak/commit/090c205e227824eda7c1b156f5320129a4809b6b) @@ -667,9 +672,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 [d1cbe82](https://github.com/vocalpy/vak/commit/d1cbe82132f46f5cc400524dfefdc94de55c430b) ### Removed -- remove `tweetynet` as a core dependency, since this creates a - circular dependency (`tweetynet` definitely depends on `vak`) - that prevents using `conda-forge`. Instead declare `tweetynet` as +- remove `tweetynet` as a core dependency, since this creates a + circular dependency (`tweetynet` definitely depends on `vak`) + that prevents using `conda-forge`. Instead declare `tweetynet` as a test dependency. [74350a7](https://github.com/vocalpy/vak/commit/c26ad08bfd4057324ba55a1902f7dc2845bc6e40) - remove `output_dir` parameter from `dataframe.from_files` -- not used @@ -679,42 +684,42 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [0.3.3] ### Fixed -- remove out-of-date install instructions that were confusing people +- remove out-of-date install instructions that were confusing people [#268](https://github.com/vocalpy/vak/pull/268) ## [0.3.2] ### Fixed - fix wrong argument value in call to imshow in `plot.spect_annot` function [648b675](https://github.com/vocalpy/vak/commit/648b675221472f6bcd2750262c57dd8a761099e0) -- fix bug that caused `vak.config.parse` to silently fail when parsing the +- fix bug that caused `vak.config.parse` to silently fail when parsing the `[SPECT_PARAMS]` section of config.toml files [#266](https://github.com/vocalpy/vak/pull/266) ## [0.3.1] ### Fixed -- fix `RuntimeError` under torch 1.6 caused by +- fix `RuntimeError` under torch 1.6 caused by dividing a tensor by an integer in `Model._eval()` method [#250](https://github.com/vocalpy/vak/pull/250). - Fixes [#249](https://github.com/vocalpy/vak/issues/249). 
+ Fixes [#249](https://github.com/vocalpy/vak/issues/249). ## [0.3.0] ### Added -- add functionality to `WindowDataset` that enables training with datasets +- add functionality to `WindowDataset` that enables training with datasets of specified durations [#188](https://github.com/vocalpy/vak/pull/186) -- add transforms for post-hoc clean up of predicted labels for time bins, +- add transforms for post-hoc clean up of predicted labels for time bins, that are applied before converting into segments with labels, onsets, and offsets - + `majority_vote_transform` that find the most frequently occurring label in a segment + + `majority_vote_transform` that find the most frequently occurring label in a segment and assigns it to the entire segment [#227](https://github.com/vocalpy/vak/pull/227) + `remove_short_segments` that removes any segments shorter than a specified duration [#229](https://github.com/vocalpy/vak/pull/229) -- add logic to `WindowDataset.crop_spect_vectors_keep_classes` method so that it tries - to crop a third way, by removing unlabeled segments within vocalizations, if cropping +- add logic to `WindowDataset.crop_spect_vectors_keep_classes` method so that it tries + to crop a third way, by removing unlabeled segments within vocalizations, if cropping the specified duration from the end or beginning fails [#224](https://github.com/vocalpy/vak/pull/224) -- add ability to specify name of .csv file containing annotations produced by +- add ability to specify name of .csv file containing annotations produced by `vak.core.predict` [#232](https://github.com/vocalpy/vak/pull/232) -- make it so that ItemTransforms (optionally) return path to array files - containing spectrograms, so user can easily link train/test/predict data +- make it so that ItemTransforms (optionally) return path to array files + containing spectrograms, so user can easily link train/test/predict data returned by `DataLoader` to the source file 
[#236](https://github.com/vocalpy/vak/pull/236) - add functions for plotting spectrograms and annotation to `plot` sub-package @@ -724,18 +729,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - refactor to remove `util`s modules [#196](https://github.com/vocalpy/vak/pull/196) - add `core.predict` module and rewrite `cli.predict` to use it [#210](https://github.com/vocalpy/vak/pull/210) -- modify `vak.split.algorithms.brute_force` so that it - starts by seeding each split with one instance of each - label in the label set. Quick tests found that this - improves success rate of splits on one dataset +- modify `vak.split.algorithms.brute_force` so that it + starts by seeding each split with one instance of each + label in the label set. Quick tests found that this + improves success rate of splits on one dataset with many (30) classes. - [#218](https://github.com/vocalpy/vak/pull/218) -- change `core.predict` so that it always saves - predicted annotations as a .csv file + [#218](https://github.com/vocalpy/vak/pull/218) +- change `core.predict` so that it always saves + predicted annotations as a .csv file [#222](https://github.com/vocalpy/vak/pull/222). Removed functionality for converting to other formats. 
See discussion in [#212](https://github.com/vocalpy/vak/issues/211) -- change warning issued by `split.train_test_dur_split_inds` to a log +- change warning issued by `split.train_test_dur_split_inds` to a log statement [#231](https://github.com/vocalpy/vak/pull/231) - use `VocalDataset` in `core.predict`, see discussion in issue [#206](https://github.com/vocalpy/vak/issues/206) @@ -760,8 +765,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 [#215](https://github.com/vocalpy/vak/pull/215) - fix bug in `WindowDataset.crop_spect_vectors_keep_classes` [#217](https://github.com/vocalpy/vak/issues/217) - that caused `x_inds` to have invalid values when the - `WindowDataset.crop_spect_vectors_keep_classes` function + that caused `x_inds` to have invalid values when the + `WindowDataset.crop_spect_vectors_keep_classes` function cropped the vectors to a specified duration "from the front" [#219](https://github.com/vocalpy/vak/pull/219) - remove line that caused `vak predict` to crash @@ -769,14 +774,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 when model was trained without a `SpectStandardizer` transform [#221](https://github.com/vocalpy/vak/pull/221) - fix bugs that prevented `vak eval` cli command from working - [#238](https://github.com/vocalpy/vak/pull/238) -- fix bug in `labels.lbl_tb2labels` (https://github.com/vocalpy/vak/issues/239) + [#238](https://github.com/vocalpy/vak/pull/238) +- fix bug in `labels.lbl_tb2labels` (https://github.com/vocalpy/vak/issues/239) that resulted from lack of input validation and an indentation error - [#240](https://github.com/vocalpy/vak/pull/240) -- fix how segment onsets and offsets are converted from time bin "units" + [#240](https://github.com/vocalpy/vak/pull/240) +- fix how segment onsets and offsets are converted from time bin "units" back to seconds [#246](https://github.com/vocalpy/vak/pull/246). 
Fixes [#237](https://github.com/vocalpy/vak/issues/237). -- fix .toml config file used with "autoannotate" tutorial, +- fix .toml config file used with "autoannotate" tutorial, and revise related section of tutorial on prediction [#247](https://github.com/vocalpy/vak/pull/247). Fixes [#223](https://github.com/vocalpy/vak/issues/223). @@ -790,18 +795,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [0.3.0a5] ### Added - add functions `format_from_df` and `from_df` to `vak.util.annotation` - [#107](https://github.com/vocalpy/vak/pull/107) - + `vak.util.annotation.from_from_df` returns annotation format associated with a + [#107](https://github.com/vocalpy/vak/pull/107) + + `vak.util.annotation.from_from_df` returns annotation format associated with a dataset. Raises an error if more than one annotation format or if format is none. - + `vak.util.annotation.from_df` function returns list of annotations + + `vak.util.annotation.from_df` function returns list of annotations (i.e. `crowsetta.Annotation` instances), one corresponding to each row in the dataframe `df`. 
- - encapsulates control flow logic for getting all labels from a dataset of + - encapsulates control flow logic for getting all labels from a dataset of annotated vocalizations represented as a Pandas DataFrame - + handles case where each vocalization has a separate annotation file + + handles case where each vocalization has a separate annotation file + and the case where all vocalizations have annotations in a single file - `vak.util.labels.from_df` function [#103](https://github.com/vocalpy/vak/pull/103) + checks for single annotation type, load all annotations, and then get just labels from those - + modified to use `util.annotation.from_df` and `vak.util.annotation.format_from_df` + + modified to use `util.annotation.from_df` and `vak.util.annotation.format_from_df` in [#107](https://github.com/vocalpy/vak/pull/107) - logic in `vak.cli.prep` that raises an informative error message when config.toml file specifies a duration for training set, but durations for validation and test sets are zero or None @@ -810,51 +815,51 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - 3 transform classes, and `vak.transforms.util` module [#112](https://github.com/vocalpy/vak/pull/112) + with `get_defaults` function - encapsulates logic for building transforms, to make `train`, `predict` etc. less verbose - + obeys DRY, avoid declaring the same utility transforms like to_floattensor and add_channel in - multiple functions + + obeys DRY, avoid declaring the same utility transforms like to_floattensor and add_channel in + multiple functions - add `labelset_from_toml_value` to converters [#115](https://github.com/vocalpy/vak/pull/115) + casts any value for the `labelset` option in a .toml config file to a set of characters [#127](https://github.com/vocalpy/vak/pull/127) - + uses `vak.util.general.range_str` so that user can specify + + uses `vak.util.general.range_str` so that user can specify set of labels with a "range string", e.g. 
`range: 1-27, 29` [#115](https://github.com/vocalpy/vak/pull/115) - add logging module in `vak.util` [#132](https://github.com/vocalpy/vak/pull/132) - add converters and validators for dataset split durations [#143](https://github.com/vocalpy/vak/pull/143) - add `logger` parameters to `io` sub-package functions, so they can use logger created by `cli` functions [#145](https://github.com/vocalpy/vak/pull/145) -- add `log_or_print` function to `util.logging` that either writes message to logger, +- add `log_or_print` function to `util.logging` that either writes message to logger, or simply prints the message if there is no logger [#147](https://github.com/vocalpy/vak/pull/147) -- add `logger` attribute to `vak.Model` class, used to log if not None +- add `logger` attribute to `vak.Model` class, used to log if not None [#148](https://github.com/vocalpy/vak/pull/148) -- add Tensorboard `SummaryWriter` to `vak.Model` class so there is an `events` file recording each +- add Tensorboard `SummaryWriter` to `vak.Model` class so there is an `events` file recording each model's training history [#149](https://github.com/vocalpy/vak/pull/149) + and add Tensorboard as a dependency in [#162](https://github.com/vocalpy/vak/pull/162) - add additional logging to `Model` class [#153](https://github.com/vocalpy/vak/pull/153) -- add initial tutorial on using `vak` for automated annotation of vocalizations +- add initial tutorial on using `vak` for automated annotation of vocalizations [#156](https://github.com/vocalpy/vak/pull/156) -- add `VocalDataset`, more generalized form of a dataset where the input to a network is contained in a source - file, e.g. a .npz array file with a spectrogram, and the optional target is the annotation +- add `VocalDataset`, more generalized form of a dataset where the input to a network is contained in a source + file, e.g. 
a .npz array file with a spectrogram, and the optional target is the annotation [#165](https://github.com/vocalpy/vak/pull/165) -- add `transforms.defaults` with `ItemTransforms` that return dictionaries. Decouples logic for +- add `transforms.defaults` with `ItemTransforms` that return dictionaries. Decouples logic for what will be in returned "items" from the different dataset classes [#165](https://github.com/vocalpy/vak/pull/165) - add `eval` command to command-line interface [#179](https://github.com/vocalpy/vak/pull/179) -- add `vak.core` sub-package with "core" functions that are called by corresponding functions in - `vak.cli`, e.g. `vak.cli.train` calls `vak.core.train`; de-couples high-level functionality from - command-line interface, and makes it possible for one high-level function to call another, i.e., +- add `vak.core` sub-package with "core" functions that are called by corresponding functions in + `vak.cli`, e.g. `vak.cli.train` calls `vak.core.train`; de-couples high-level functionality from + command-line interface, and makes it possible for one high-level function to call another, i.e., `vak.core.learncurve` calls `vak.core.train` and `vak.core.eval` [#183](https://github.com/vocalpy/vak/pull/183) - add computation of distance metrics to `Model._eval` method [#185](https://github.com/vocalpy/vak/pull/185) ### Changed -- rewrite `vak.util.dataset.has_unlabeled` to use `annotation.from_df` +- rewrite `vak.util.dataset.has_unlabeled` to use `annotation.from_df` [#107](https://github.com/vocalpy/vak/pull/107) - bump minimum version of `TweetyNet` to 0.3.1 in [#120](https://github.com/vocalpy/vak/pull/120) + so that `yarden2annot` function from `TweetyNet` will return annotation labels as string, not int -- rewrite `vak.util.annotation.source_annot_map` so that it maps annotations *to* source files, not +- rewrite `vak.util.annotation.source_annot_map` so that it maps annotations *to* source files, not vice versa 
[#130](https://github.com/vocalpy/vak/pull/130) + more specifically, it no longer crashes if it can't map every annotation to a source file + instead it crashes if it can't map every source file to an annotation -- change `vak.annotation.from_df` to better handle single annotation files +- change `vak.annotation.from_df` to better handle single annotation files [#131](https://github.com/vocalpy/vak/pull/131) + no longer crashes if the number of annotations from the file does not exactly match the number of source files + instead only requires there at least as many annotations as there are source files @@ -864,27 +869,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - change default value for util.general.timebin_dur_from_vec parameter n_decimals_trunc from 3 to 5 [#136](https://github.com/vocalpy/vak/pull/136) - rewrite + rename `splitalgos.validate.durs` [#143](https://github.com/vocalpy/vak/pull/143) -- parallelize validation of spectrogram files, so it's faster on large datasets +- parallelize validation of spectrogram files, so it's faster on large datasets [#144](https://github.com/vocalpy/vak/pull/144) - bump minimum version of `TweetyNet` to 0.4.0 in [#155](https://github.com/vocalpy/vak/pull/155) + so `TweetyNetModel.from_class` method accepts `logger` argument - change checkpointing and validation so that they occur on specific steps, not epochs. [#161](https://github.com/vocalpy/vak/pull/161) - This way models with very large training sets that may run for only 1-2 epochs still intermittently save + This way models with very large training sets that may run for only 1-2 epochs still intermittently save checkpoints as backups and measure performance on the validation set. -- change names of `TrainConfig` attributes `val_error_step` and `checkpoint_step` to `val_step` and `ckpt_step` - for brevity + clarity. 
[#161](https://github.com/vocalpy/vak/pull/161) Also changed the names of the +- change names of `TrainConfig` attributes `val_error_step` and `checkpoint_step` to `val_step` and `ckpt_step` + for brevity + clarity. [#161](https://github.com/vocalpy/vak/pull/161) Also changed the names of the corresponding `vak.Model.fit` method parameters to match. -- change `vak.Model._eval` method to work like `vak.cli.predict` does, feeding models non-overlapping +- change `vak.Model._eval` method to work like `vak.cli.predict` does, feeding models non-overlapping windows from spectrograms [#165](https://github.com/vocalpy/vak/pull/165) -- change `reshape_to_window` transform to `view_as_window_batch` because it was not working as intended +- change `reshape_to_window` transform to `view_as_window_batch` because it was not working as intended [#165](https://github.com/vocalpy/vak/pull/165) - bump minimum version of `TweetyNet` to 0.4.1 in [#172](https://github.com/vocalpy/vak/pull/172) + version that changes optimizer back to `Adam` - raise lower bound on `crowsetta` version to 2.2.0, to get fixes for `koumura2annot` and avoid errors when `annot_file` is provided as a `pathlib.Path` instead of a `str` [#175](https://github.com/vocalpy/vak/pull/175) -- change `Model._eval` method so it returns metrics average across batches, in addition to +- change `Model._eval` method so it returns metrics average across batches, in addition to the value for each batch [#185](https://github.com/vocalpy/vak/pull/185) - raise minimum version of `TweetyNet` to 0.4.2, adds distance metrics to `TweetyNetModel` @@ -893,37 +898,37 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Fixed - add missing `shuffle` option to [TRAIN] and [LEARNCURVE] sections in `valid.toml` [#109](https://github.com/vocalpy/vak/pull/109) -- bug that prevented filtering out vocalizations from a dataset when labels are present +- bug that prevented filtering out vocalizations from a 
dataset when labels are present in that vocalization that are not in the specified labelset [#118](https://github.com/vocalpy/vak/pull/118)
 - fix logging for `vak.prep` command [#132](https://github.com/vocalpy/vak/pull/132)
-- fix how dataset duration splits are validated [#143](https://github.com/vocalpy/vak/pull/143),
+- fix how dataset duration splits are validated [#143](https://github.com/vocalpy/vak/pull/143),
   see issue [#140](https://github.com/vocalpy/vak/issues/140) for details.
 - fix error due to calling a Path attribute on a string [#144](https://github.com/vocalpy/vak/pull/144) as identified in issue [#123](https://github.com/vocalpy/vak/issues/123)
-- fix indent error in `Model.fit` method (see issue [#151](https://github.com/vocalpy/vak/issues/151))
-  that stopped training early [#153](https://github.com/vocalpy/vak/pull/153)
-- fix bug [#166](https://github.com/vocalpy/vak/issues/166)
-  that let training continue even after `patience` number of validation steps had elapsed
-  without an increase in accuracy [#168](https://github.com/vocalpy/vak/pull/168)
-- fix `learncurve` functionality so it will work in version `0.3.0`
+- fix indent error in `Model.fit` method (see issue [#151](https://github.com/vocalpy/vak/issues/151))
+  that stopped training early [#153](https://github.com/vocalpy/vak/pull/153)
+- fix bug [#166](https://github.com/vocalpy/vak/issues/166)
+  that let training continue even after `patience` number of validation steps had elapsed
+  without an increase in accuracy [#168](https://github.com/vocalpy/vak/pull/168)
+- fix `learncurve` functionality so it will work in version `0.3.0`
   [#183](https://github.com/vocalpy/vak/pull/183)

 ### Removed
-- remove `vak.util.general.safe_truncate` function, no longer used
+- remove `vak.util.general.safe_truncate` function, no longer used
   [#137](https://github.com/vocalpy/vak/issues/137)
-- remove redundant validation of split durations in `util.split`
+- remove redundant validation of split durations in `util.split`
   [#143](https://github.com/vocalpy/vak/pull/143)
 - removed `save_only_single_checkpoint_file` option and functionality
-  [#161](https://github.com/vocalpy/vak/pull/161).
+  [#161](https://github.com/vocalpy/vak/pull/161).
   Now save only one checkpoint as backup, and another for best performance on validation set if provided.
   See discussion in pull request and the issues it fixes for more detail.

 ## [0.3.0a4]
 ### Added
-- warning when user runs `vak prep` with config.toml file that has a `[PREDICT]`
+- warning when user runs `vak prep` with config.toml file that has a `[PREDICT]`
   section *and* a `labelset` option in the `[PREP]` section.
 - better error handling when parsing a config.toml file fails
-  + traceback now ends with clear message about error parsing .toml file, but still
+  + traceback now ends with clear message about error parsing .toml file, but still
     includes information from `toml` exception

 ### Fixed
@@ -931,9 +936,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [0.3.0a3]
 ### Fixed
-- add missing sections and options to .toml file that is used to validate
-  user config.toml files, so that those options don't cause
-  invalid section / option errors
+- add missing sections and options to .toml file that is used to validate
+  user config.toml files, so that those options don't cause
+  invalid section / option errors

 ## [0.3.0a2]
 ### Fixed
@@ -942,9 +947,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - [PREDICT] section now has `annot_format` option -- user can specify whatever format they want, doesn't have to be same as training data
-- [PREDICT] section of config now has `to_format_kwargs` option,
-  that lets user specify keyword arguments to `crowsetta.Transcriber.to_format`
-  method for the annotation format of files made from predictions
+- [PREDICT] section of config now has `to_format_kwargs` option,
+  that lets user specify keyword arguments to `crowsetta.Transcriber.to_format`
+  method for the annotation format of files made from predictions

 ## [0.3.0a1]
 ### Fixed
@@ -955,41 +960,41 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `Dataset` class and related classes that were in `vak.dataset` sub-package
   + see `dataframe` module added below that replaces this abstraction
 - dependency on `Tensorflow`
-  + switch to `torch` because of consistent API, need to work with "mid-level"
+  + switch to `torch` because of consistent API, need to work with "mid-level"
     abstractions, and preference for Python-first framework
 - `core` sub-package
   + the idea is that the `cli` package should just implement all the logic that lets people who don't want to program use the main functionality
-  + and if you do want to program, the rest of the library should facilitate that
+  + and if you do want to program, the rest of the library should facilitate that
     *instead of* trying to do all the work for you
     - e.g. give someone w/basic coding skills friendly Python classes to work with
-      when writing a torch-vernacular training script, instead of
+      when writing a torch-vernacular training script, instead of
       giving them a giant `train` function with 3k arguments that no one will ever use
- `AbstractVakModel` class -- gets replaced with `vak.Model` in `engine` sub-package,
+ - `AbstractVakModel` class -- gets replaced with `vak.Model` in `engine` sub-package,
     see below
-
+
 ### Changed
 - `dataset` sub-package becomes `io` sub-package ("input-output", like in `astropy`)
-- use `torch` and `torchvision` in place of `tensorflow`
+- use `torch` and `torchvision` in place of `tensorflow`
 - use `crowsetta` version 2.0
 - switch to `toml` format for config files
-  + more flexible than `ini` files, less code to maintain for parsing things that
+  + more flexible than `ini` files, less code to maintain for parsing things that
     don't fit into the `ini` format very well / not at all
-- clean up `vak` package structure wherever possible: move many modules into
+- clean up `vak` package structure wherever possible: move many modules into
   `util` sub-package

 ### Added
 - `dataframe` module in `vak.io`
-  + essentially, data path is audio --> spect --> dataframe --> .csv file that represents
+  + essentially, data path is audio --> spect --> dataframe --> .csv file that represents
     a dataset
-  + choose to use external libraries that are already well-maintained and established to
+  + choose to use external libraries that are already well-maintained and established to
     handle as much of the data processing as possible, i.e. `pandas` + `dask`, instead of trying to maintain a `Dataset` class that does all this work and deals with its own filetype
 - `datasets` sub-package
   + uses `torch` and `torchvision` abstractions to represent datasets + dataloaders
 - `transforms` sub-package
-  + uses `torchvision` transform abstraction to deal with things like "normalizing"
+  + uses `torchvision` transform abstraction to deal with things like "normalizing"
     spectrograms
 - `engine` sub-package
   + with `Model` class that models should sub-class; helps encourage consistent API for models
@@ -1015,7 +1020,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - `vak.core.learncurve.test_one_model` function that makes it easier to measure frame and syllable error, etc., on a single trained model
-- add `move_spects` method to `Dataset` so an instance of a `Dataset` is not locked to a
+- add `move_spects` method to `Dataset` so an instance of a `Dataset` is not locked to a
   particular location

 ### Changed
@@ -1034,11 +1039,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [0.1.0]
 ### Added
-- add helper function to TestLearncurve that multiple unit tests can use to assert all outputs
+- add helper function to TestLearncurve that multiple unit tests can use to assert all outputs
   were generated. Now being used to make sure bug fixed in 0.1.0a8 stays fixed.
 - error checking in cli that raises ValueError when cli command is `learncurve` and the option
   'results_dir_made_by_main_script' is already defined in [OUTPUT] section, since running
-  'learncurve' would overwrite it.
+  'learncurve' would overwrite it.
 - `dataset` subpackage that houses `Dataset` and related classes that facilitate creating data sets for training neural networks from heterogeneous data: audio files, files of arrays containing spectrograms, different annotation types, etc.
   - also includes modules for handling each data source
     + e.g. `audio.to_spect` creates spectrograms from audio files
@@ -1067,7 +1072,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [0.1.0a8]
 ### Fixed
-- Fix how main loop in `learncurve` re-loads indices for grabbing subsets of training data after
+- Fix how main loop in `learncurve` re-loads indices for grabbing subsets of training data after
   generating them, and do so in a way that still allows for re-using subsets from previous runs

 ## [0.1.0a7]
@@ -1091,16 +1096,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [0.1.0a5]
 ### Added
-- Use `attrs`-based classes to represent sections of config.ini files
-
+- Use `attrs`-based classes to represent sections of config.ini files
+
 ### Changed
 - rewrite `vak.cli` so it can deal with state of config.ini files
-  + e.g. doesn't throw an error if `train_data_path` not declared as an option in [TRAIN] when running `vak prep`
+  + e.g. doesn't throw an error if `train_data_path` not declared as an option in [TRAIN] when running `vak prep`
   (since training data won't exist yet, doesn't make sense to throw an error).

 ### Removed
-- remove code about `freq_bins` in a couple of places, since the number of frequency bins
-  in spectrograms is now just determined programmatically
+- remove code about `freq_bins` in a couple of places, since the number of frequency bins
+  in spectrograms is now just determined programmatically
   + `vak.config.data` no longer has `freq_bins` field in DataConfig namedtuple
   + `make_data` no longer adds `freq_bins` option to [DATA] section after making data sets