refactor(all): remove useless models and code
theolepage committed Oct 20, 2021
1 parent f056d87 commit b2b9845
Showing 135 changed files with 227 additions and 16,484 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,5 +1,5 @@
checkpoints/
datasets/
data/
__pycache__
.ipynb_checkpoints/
build
61 changes: 21 additions & 40 deletions README.md
@@ -1,58 +1,39 @@
# ssl-for-slr

Framework to train a speech encoder in a **self-supervised** way for **speaker and language recognition** tasks.
Collection of **self-supervised** models for **speaker and language recognition** tasks.

The aim is to train a speech encoder by using multiple self-supervised modules as shown on figure below.
## Models

## Features

- Configurable speech encoders (1D conv layers, GRU, skip connections, [SincNet](https://arxiv.org/abs/1808.00158))
- Self-supervised models:
- [Contrastive Predictive Coding](https://arxiv.org/pdf/1807.03748.pdf) *(unidirectional or bidirectional)*
- [vq-wav2vec](https://arxiv.org/pdf/1910.05453.pdf)
- [Wav2Vec 2.0](https://arxiv.org/pdf/2006.11477.pdf)
- [Local Info Max (LIM)](https://arxiv.org/pdf/1812.00271.pdf) and Global Info Max (GIM)
- [PASE](https://arxiv.org/pdf/1904.03416.pdf) and [PASE+](https://arxiv.org/pdf/2001.09239.pdf) with the following workers: *Waveform*, *LPS*, *MFCC*, *CPC*, *LIM* and *GIM*
- Evaluation on speaker recognition, speaker verification, language recognition and data-efficiency
- Handle *LibriSpeech* and *VoxLingua107* datasets
- Speech augmentation module *(reverberation, noise, frequency and temporal masks, clipping, ...)*
- Modular configuration files
- **CPC**: [Representation Learning with Contrastive Predictive Coding](https://arxiv.org/pdf/1807.03748.pdf)
- **LIM/GIM**: [Learning Speaker Representations with Mutual Information](https://arxiv.org/pdf/1812.00271.pdf)
- **SimCLR**: [Contrastive Self-Supervised Learning for Text-Independent Speaker Verification](https://sci-hub.mksa.top/10.1109/icassp39728.2021.9413351)
- **MoCo**: [Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning](https://arxiv.org/pdf/2012.07178.pdf)

## Usage

### Install dependencies (inside a virtual env)

1. `virtualenv ~/ssl-for-slr-env && source ~/ssl-for-slr-env/bin/activate`
2. `pip install -r requirements.txt`

*Type `deactivate` to exit the virtual env after use.*

### Train model on pretext task

```
python train.py configs/cpc-v1.json
```
Start self-supervised training with `python train.py configs/cpc-base.json`.

*Multiple config files are located in the `config/` folder.*

### Evaluate model on downstream task *(speaker or language recognition)*

1. Train a classifier on top of the previously trained encoder: `python train_evaluate.py configs/cpc-v1.json`.
2. Use notebook `evaluate.ipnyb` to evaluate metrics obtained on the downstream task.
Then, you can evaluate model on speaker verification (EER, minDCF) with `python evaluate.py configs/cpc-base.json`.
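
For readers unfamiliar with the EER metric mentioned above, here is a minimal sketch of how an equal error rate can be approximated from trial scores. It is not the code in `evaluate.py` (which this diff does not show); the function name `compute_eer` and the convention that higher scores mean "same speaker" are assumptions made for illustration.

```python
def compute_eer(scores, labels):
    """Approximate the equal error rate (EER) of a verification system.

    scores: similarity scores, higher meaning "more likely the same speaker"
            (an assumed convention).
    labels: 1 for target trials (same speaker), 0 for non-target trials.
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]

    best_gap, eer = float("inf"), None
    # Sweep every observed score as a candidate decision threshold.
    for threshold in sorted(set(scores)):
        # False rejection rate: target trials scored below the threshold.
        frr = sum(s < threshold for s in targets) / len(targets)
        # False acceptance rate: non-target trials at or above the threshold.
        far = sum(s >= threshold for s in nontargets) / len(nontargets)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where both error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical trial scores, not results from this repository.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 1, 0, 0, 0]
example_eer = compute_eer(example_scores, example_labels)
```

The threshold sweep is quadratic in the number of trials, which is acceptable for a sketch but would likely be replaced by a sorted, cumulative computation for a full trial list.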

## To-Do

- [ ] Create config for different models (5) -> train -> evaluate -> experiment
- [ ] Data augmentation / MFCC pipeline (cache features with create_features.py?)
- [ ] Refactor project
- [ ] Data: check similar (padding) [30min]
- [ ] Evaluate: check works [30min]
- [ ] Model: clamp W, init -5 10, check similar encoder, mfcc [1h]
- [ ] Start SimCLR training [30min]

---
- [ ] Reproduce results of SimCLR
- [ ] If not working => use voxceleb_trainer implem
- [ ] Add data augmentation
- [ ] Evaluate: add minDCF
- [ ] Experiment with VICReg

- [ ] Dataset: cache useful? do not store audio cache in checkpoints/model/
- [ ] Refactor evaluation (choose type of classifier: random, supervised)
- [ ] Use dataclass and YAML for all configs
---

- [ ] Explain data preparation / reproduction + cite articles in README
- [ ] Use dataclass and YAML for model configs
- [ ] CPC/LIM: @tf.function warning when doing tensor[1, :]
- [ ] Fix error end training saving history.npy
- [ ] Fix warning loading weights not used
- [ ] Create custom training loop (https://stackoverflow.com/questions/57971007/tensorflow-2-0-display-progress-bar-in-custom-training-loop)
- [ ] Allow restore optimizer
24 changes: 0 additions & 24 deletions cache_features.py

This file was deleted.

50 changes: 0 additions & 50 deletions configs/cpc-base-kaldi.json

This file was deleted.

51 changes: 0 additions & 51 deletions configs/cpc-base-kaldi_boosted.json

This file was deleted.

33 changes: 33 additions & 0 deletions configs/cpc-base.json
@@ -0,0 +1,33 @@
{
"name": "cpc-base",
"seed": 1717,
"encoder": {
"type": "CPC",
"encoded_dim": 512,
"weight_regularizer": 1e-4
},
"model": {
"type": "CPC",
"nb_timesteps_to_predict": 12,
"context_network": {
"type": "GRU",
"dim": 256,
"nb_layers": 1
},
"bidirectional": false,
"weight_regularizer": 1e-4
},
"training": {
"epochs": 50,
"batch_size": 64,
"learning_rate": 0.0001
},
"dataset": {
"sample_frequency": 16000,
"frame_length": 20480,
"max_samples": 1000,
"train": "./data/debug.scp",
"test": "./data/debug.scp",
"trials": "./data/voxceleb1_test/trials"
}
}
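
As a reading aid for the config above, the sketch below parses a trimmed copy of it and derives the duration of one training segment from `frame_length` and `sample_frequency`. How `train.py` actually consumes these fields is not visible in this diff; the helper `frame_duration_seconds` is a hypothetical name, and the interpretation of `frame_length` as a number of samples is an assumption.

```python
import json

# Trimmed copy of configs/cpc-base.json, keeping only the fields used here.
CONFIG_TEXT = """
{
  "name": "cpc-base",
  "encoder": {"type": "CPC", "encoded_dim": 512, "weight_regularizer": 1e-4},
  "model": {"type": "CPC", "nb_timesteps_to_predict": 12},
  "dataset": {"sample_frequency": 16000, "frame_length": 20480}
}
"""


def frame_duration_seconds(config):
    """Duration of one audio segment, assuming frame_length counts samples."""
    dataset = config["dataset"]
    return dataset["frame_length"] / dataset["sample_frequency"]


config = json.loads(CONFIG_TEXT)
duration = frame_duration_seconds(config)  # 20480 / 16000 = 1.28 seconds
```

Under that assumption, each training example covers 1.28 s of 16 kHz audio.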
40 changes: 0 additions & 40 deletions configs/debug.json

This file was deleted.

43 changes: 0 additions & 43 deletions configs/moco-base-kaldi.json

This file was deleted.

29 changes: 29 additions & 0 deletions configs/simclr-base.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
{
"name": "simclr-base",
"seed": 1717,
"encoder": {
"type": "ThinResNet34",
"encoded_dim": 512,
"weight_regularizer": 1e-4
},
"model": {
"type": "SimCLR",
"channel_loss_factor": 0.1,
"weight_regularizer": 1e-4
},
"training": {
"epochs": 100,
"optimizer": {
"type": "Adam"
},
"batch_size": 256,
"learning_rate": 0.001
},
"dataset": {
"sample_frequency": 16000,
"frame_length": 20480,
"train": "./data/debug.scp",
"test": "./data/debug.scp",
"trials": "./data/voxceleb1_test/trials"
}
}
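
The `"type": "SimCLR"` entry above refers to a contrastive objective. The sketch below shows a generic NT-Xent loss of the kind SimCLR uses, written in plain Python for readability. It is not the repository's implementation (which this diff does not show): the `temperature` parameter and the function names are assumptions, and the config's `channel_loss_factor` term is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(views_a, views_b, temperature=0.1):
    """Mean NT-Xent loss over a batch of paired embeddings.

    views_a[i] and views_b[i] are embeddings of two augmented views of the
    same utterance; every other embedding in the batch acts as a negative.
    """
    embeddings = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of sample i
        # Similarities to every embedding except the anchor itself.
        logits = [
            cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(2 * n)
            if j != i
        ]
        pos_logit = cosine(embeddings[i], embeddings[positive]) / temperature
        # Cross-entropy with the positive pair as the correct class.
        total += math.log(sum(math.exp(l) for l in logits)) - pos_logit
    return total / (2 * n)


# Toy 2-D embeddings: matching views should give a much lower loss.
anchors = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = nt_xent(anchors, [[1.0, 0.0], [0.0, 1.0]])
loss_mismatched = nt_xent(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

With a `batch_size` of 256, as in the config above, each anchor would be contrasted against 510 negatives under this formulation.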