refactor(all): remove useless models and code

1 parent f056d87 · commit b2b9845
Showing 135 changed files with 227 additions and 16,484 deletions.
.gitignore
@@ -1,5 +1,5 @@
checkpoints/
datasets/
data/
__pycache__
.ipynb_checkpoints/
build

README.md
@@ -1,58 +1,39 @@
# ssl-for-slr

Framework to train a speech encoder in a **self-supervised** way for **speaker and language recognition** tasks.
Collection of **self-supervised** models for **speaker and language recognition** tasks.

The aim is to train a speech encoder by using multiple self-supervised modules, as shown in the figure below.
## Models

## Features

- Configurable speech encoders (1D conv layers, GRU, skip connections, [SincNet](https://arxiv.org/abs/1808.00158))
- Self-supervised models:
    - [Contrastive Predictive Coding](https://arxiv.org/pdf/1807.03748.pdf) *(unidirectional or bidirectional)*
    - [vq-wav2vec](https://arxiv.org/pdf/1910.05453.pdf)
    - [Wav2Vec 2.0](https://arxiv.org/pdf/2006.11477.pdf)
    - [Local Info Max (LIM)](https://arxiv.org/pdf/1812.00271.pdf) and Global Info Max (GIM)
    - [PASE](https://arxiv.org/pdf/1904.03416.pdf) and [PASE+](https://arxiv.org/pdf/2001.09239.pdf) with the following workers: *Waveform*, *LPS*, *MFCC*, *CPC*, *LIM* and *GIM*
- Evaluation on speaker recognition, speaker verification, language recognition and data efficiency
- Support for the *LibriSpeech* and *VoxLingua107* datasets
- Speech augmentation module *(reverberation, noise, frequency and temporal masks, clipping, ...)*
- Modular configuration files
- **CPC**: [Representation Learning with Contrastive Predictive Coding](https://arxiv.org/pdf/1807.03748.pdf)
- **LIM/GIM**: [Learning Speaker Representations with Mutual Information](https://arxiv.org/pdf/1812.00271.pdf)
- **SimCLR**: [Contrastive Self-Supervised Learning for Text-Independent Speaker Verification](https://sci-hub.mksa.top/10.1109/icassp39728.2021.9413351)
- **MoCo**: [Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning](https://arxiv.org/pdf/2012.07178.pdf)

## Usage

### Install dependencies (inside a virtual env)

1. `virtualenv ~/ssl-for-slr-env && source ~/ssl-for-slr-env/bin/activate`
2. `pip install -r requirements.txt`

*Type `deactivate` to exit the virtual env after use.*

### Train model on pretext task

```
python train.py configs/cpc-v1.json
```
Start self-supervised training with `python train.py configs/cpc-base.json`.

*Multiple config files are located in the `configs/` folder.*

### Evaluate model on downstream task *(speaker or language recognition)*

1. Train a classifier on top of the previously trained encoder: `python train_evaluate.py configs/cpc-v1.json`.
2. Use the notebook `evaluate.ipynb` to evaluate the metrics obtained on the downstream task.
Then, you can evaluate the model on speaker verification (EER, minDCF) with `python evaluate.py configs/cpc-base.json`.
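
As a rough illustration of the EER metric reported here (a generic sketch, not the repository's evaluation code), the equal error rate can be computed from trial scores and labels as follows:

```
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(scores, labels):
    # labels: 1 for target (same-speaker) trials, 0 for non-target trials
    # scores: higher means the trial is more likely a target trial
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    # EER is the operating point where false acceptance equals false rejection
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

# Dummy usage (illustrative values only):
# eer = compute_eer(np.array([0.9, 0.2, 0.7, 0.1]), np.array([1, 0, 1, 0]))
```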

## To-Do

- [ ] Create config for different models (5) -> train -> evaluate -> experiment
- [ ] Data augmentation / MFCC pipeline (cache features with create_features.py?)
- [ ] Refactor project
- [ ] Data: check similar (padding) [30min]
- [ ] Evaluate: check it works [30min]
- [ ] Model: clamp W, init -5 10, check similar encoder, mfcc [1h]
- [ ] Start SimCLR training [30min]

---
- [ ] Reproduce results of SimCLR
- [ ] If not working => use the voxceleb_trainer implementation
- [ ] Add data augmentation
- [ ] Evaluate: add minDCF
- [ ] Experiment with VICReg

- [ ] Dataset: is the cache useful? do not store the audio cache in checkpoints/model/
- [ ] Refactor evaluation (choose the type of classifier: random, supervised)
- [ ] Use dataclass and YAML for all configs
---

- [ ] Explain data preparation / reproduction and cite articles in README
- [ ] Use dataclass and YAML for model configs
- [ ] CPC/LIM: @tf.function warning when doing tensor[1, :]
- [ ] Fix error at the end of training when saving history.npy
- [ ] Fix warning about loaded weights not being used
- [ ] Create a custom training loop (https://stackoverflow.com/questions/57971007/tensorflow-2-0-display-progress-bar-in-custom-training-loop)
- [ ] Allow restoring the optimizer

configs/cpc-base.json
@@ -0,0 +1,33 @@
{
    "name": "cpc-base",
    "seed": 1717,
    "encoder": {
        "type": "CPC",
        "encoded_dim": 512,
        "weight_regularizer": 1e-4
    },
    "model": {
        "type": "CPC",
        "nb_timesteps_to_predict": 12,
        "context_network": {
            "type": "GRU",
            "dim": 256,
            "nb_layers": 1
        },
        "bidirectional": false,
        "weight_regularizer": 1e-4
    },
    "training": {
        "epochs": 50,
        "batch_size": 64,
        "learning_rate": 0.0001
    },
    "dataset": {
        "sample_frequency": 16000,
        "frame_length": 20480,
        "max_samples": 1000,
        "train": "./data/debug.scp",
        "test": "./data/debug.scp",
        "trials": "./data/voxceleb1_test/trials"
    }
}
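
A minimal sketch of reading a config like this one (standard library only; not the repository's actual loading code, and the dataclass here is just illustrative):

```
import json
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    epochs: int
    batch_size: int
    learning_rate: float

def load_config(path):
    # Read a JSON config such as configs/cpc-base.json
    with open(path) as f:
        raw = json.load(f)
    training = TrainingConfig(
        epochs=raw["training"]["epochs"],
        batch_size=raw["training"]["batch_size"],
        learning_rate=raw["training"]["learning_rate"],
    )
    return raw, training

# config, training = load_config("configs/cpc-base.json")
# print(training.epochs, config["model"]["nb_timesteps_to_predict"])
```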

configs/simclr-base.json
@@ -0,0 +1,29 @@
{
    "name": "simclr-base",
    "seed": 1717,
    "encoder": {
        "type": "ThinResNet34",
        "encoded_dim": 512,
        "weight_regularizer": 1e-4
    },
    "model": {
        "type": "SimCLR",
        "channel_loss_factor": 0.1,
        "weight_regularizer": 1e-4
    },
    "training": {
        "epochs": 100,
        "optimizer": {
            "type": "Adam"
        },
        "batch_size": 256,
        "learning_rate": 0.001
    },
    "dataset": {
        "sample_frequency": 16000,
        "frame_length": 20480,
        "train": "./data/debug.scp",
        "test": "./data/debug.scp",
        "trials": "./data/voxceleb1_test/trials"
    }
}
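
For context on what a SimCLR-style model optimizes, here is a generic numpy sketch of the NT-Xent contrastive loss (variable names are illustrative; this is not the repository's implementation):

```
import numpy as np
from scipy.special import logsumexp

def nt_xent_loss(z1, z2, temperature=0.5):
    # z1, z2: (N, D) embeddings of two augmented views of the same N utterances
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm -> dot product = cosine similarity
    sim = (z @ z.T) / temperature                     # (2N, 2N) similarity matrix
    np.fill_diagonal(sim, -np.inf)                    # a view is never its own candidate
    n = z1.shape[0]
    # the positive for each view is the other view of the same utterance
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - logsumexp(sim, axis=1, keepdims=True)
    return -log_prob[np.arange(2 * n), pos].mean()

# loss = nt_xent_loss(np.random.randn(8, 512), np.random.randn(8, 512))
```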