refactor(all): remove useless models and code

1 parent f056d87 · commit b2b9845
Showing 135 changed files with 227 additions and 16,484 deletions.
.gitignore
@@ -1,5 +1,5 @@
checkpoints/
datasets/
data/
__pycache__
.ipynb_checkpoints/
build

README.md
@@ -1,58 +1,39 @@
# ssl-for-slr

Framework to train a speech encoder in a **self-supervised** way for **speaker and language recognition** tasks.
Collection of **self-supervised** models for **speaker and language recognition** tasks.

The aim is to train a speech encoder by using multiple self-supervised modules, as shown in the figure below.
## Models

## Features

- Configurable speech encoders (1D conv layers, GRU, skip connections, [SincNet](https://arxiv.org/abs/1808.00158))
- Self-supervised models:
    - [Contrastive Predictive Coding](https://arxiv.org/pdf/1807.03748.pdf) *(unidirectional or bidirectional)*
    - [vq-wav2vec](https://arxiv.org/pdf/1910.05453.pdf)
    - [Wav2Vec 2.0](https://arxiv.org/pdf/2006.11477.pdf)
    - [Local Info Max (LIM)](https://arxiv.org/pdf/1812.00271.pdf) and Global Info Max (GIM)
    - [PASE](https://arxiv.org/pdf/1904.03416.pdf) and [PASE+](https://arxiv.org/pdf/2001.09239.pdf) with the following workers: *Waveform*, *LPS*, *MFCC*, *CPC*, *LIM* and *GIM*
- Evaluation on speaker recognition, speaker verification, language recognition and data efficiency
- Support for the *LibriSpeech* and *VoxLingua107* datasets
- Speech augmentation module *(reverberation, noise, frequency and temporal masks, clipping, ...)*
- Modular configuration files
- **CPC**: [Representation Learning with Contrastive Predictive Coding](https://arxiv.org/pdf/1807.03748.pdf)
- **LIM/GIM**: [Learning Speaker Representations with Mutual Information](https://arxiv.org/pdf/1812.00271.pdf)
- **SimCLR**: [Contrastive Self-Supervised Learning for Text-Independent Speaker Verification](https://sci-hub.mksa.top/10.1109/icassp39728.2021.9413351)
- **MoCo**: [Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning](https://arxiv.org/pdf/2012.07178.pdf)

## Usage

### Install dependencies (inside a virtual env)

1. `virtualenv ~/ssl-for-slr-env && source ~/ssl-for-slr-env/bin/activate`
2. `pip install -r requirements.txt`

*Type `deactivate` to exit the virtual env after use.*

### Train model on pretext task

```
python train.py configs/cpc-v1.json
```
Start self-supervised training with `python train.py configs/cpc-base.json`.

*Multiple config files are located in the `configs/` folder.*

### Evaluate model on downstream task *(speaker or language recognition)*

1. Train a classifier on top of the previously trained encoder: `python train_evaluate.py configs/cpc-v1.json`.
2. Use the notebook `evaluate.ipynb` to evaluate the metrics obtained on the downstream task.
Then, you can evaluate the model on speaker verification (EER, minDCF) with `python evaluate.py configs/cpc-base.json`.
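
As a rough illustration of the EER metric reported here (a generic sketch, not the repository's evaluation code), the equal error rate can be computed from trial scores and labels as follows:

```
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(scores, labels):
    # labels: 1 for target (same-speaker) trials, 0 for non-target trials
    # scores: higher means the trial is more likely a target trial
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    # EER is the operating point where false acceptance equals false rejection
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

# Dummy usage (illustrative values only):
# eer = compute_eer(np.array([0.9, 0.2, 0.7, 0.1]), np.array([1, 0, 1, 0]))
```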

## To-Do

- [ ] Create config for different models (5) -> train -> evaluate -> experiment
- [ ] Data augmentation / MFCC pipeline (cache features with create_features.py?)
- [ ] Refactor project
- [ ] Data: check similar (padding) [30min]
- [ ] Evaluate: check it works [30min]
- [ ] Model: clamp W, init -5 10, check similar encoder, mfcc [1h]
- [ ] Start SimCLR training [30min]

---
- [ ] Reproduce results of SimCLR
- [ ] If not working => use the voxceleb_trainer implementation
- [ ] Add data augmentation
- [ ] Evaluate: add minDCF
- [ ] Experiment with VICReg

- [ ] Dataset: is the cache useful? do not store the audio cache in checkpoints/model/
- [ ] Refactor evaluation (choose the type of classifier: random, supervised)
- [ ] Use dataclass and YAML for all configs
---

- [ ] Explain data preparation / reproduction and cite articles in README
- [ ] Use dataclass and YAML for model configs
- [ ] CPC/LIM: @tf.function warning when doing tensor[1, :]
- [ ] Fix error at the end of training when saving history.npy
- [ ] Fix warning about loaded weights not being used
- [ ] Create a custom training loop (https://stackoverflow.com/questions/57971007/tensorflow-2-0-display-progress-bar-in-custom-training-loop)
- [ ] Allow restoring the optimizer

configs/cpc-base.json
@@ -0,0 +1,33 @@
{
    "name": "cpc-base",
    "seed": 1717,
    "encoder": {
        "type": "CPC",
        "encoded_dim": 512,
        "weight_regularizer": 1e-4
    },
    "model": {
        "type": "CPC",
        "nb_timesteps_to_predict": 12,
        "context_network": {
            "type": "GRU",
            "dim": 256,
            "nb_layers": 1
        },
        "bidirectional": false,
        "weight_regularizer": 1e-4
    },
    "training": {
        "epochs": 50,
        "batch_size": 64,
        "learning_rate": 0.0001
    },
    "dataset": {
        "sample_frequency": 16000,
        "frame_length": 20480,
        "max_samples": 1000,
        "train": "./data/debug.scp",
        "test": "./data/debug.scp",
        "trials": "./data/voxceleb1_test/trials"
    }
}
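
A minimal sketch of reading a config like this one (standard library only; not the repository's actual loading code, and the dataclass here is just illustrative):

```
import json
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    epochs: int
    batch_size: int
    learning_rate: float

def load_config(path):
    # Read a JSON config such as configs/cpc-base.json
    with open(path) as f:
        raw = json.load(f)
    training = TrainingConfig(
        epochs=raw["training"]["epochs"],
        batch_size=raw["training"]["batch_size"],
        learning_rate=raw["training"]["learning_rate"],
    )
    return raw, training

# config, training = load_config("configs/cpc-base.json")
# print(training.epochs, config["model"]["nb_timesteps_to_predict"])
```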

configs/simclr-base.json
@@ -0,0 +1,29 @@
{
    "name": "simclr-base",
    "seed": 1717,
    "encoder": {
        "type": "ThinResNet34",
        "encoded_dim": 512,
        "weight_regularizer": 1e-4
    },
    "model": {
        "type": "SimCLR",
        "channel_loss_factor": 0.1,
        "weight_regularizer": 1e-4
    },
    "training": {
        "epochs": 100,
        "optimizer": {
            "type": "Adam"
        },
        "batch_size": 256,
        "learning_rate": 0.001
    },
    "dataset": {
        "sample_frequency": 16000,
        "frame_length": 20480,
        "train": "./data/debug.scp",
        "test": "./data/debug.scp",
        "trials": "./data/voxceleb1_test/trials"
    }
}
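
For context on what a SimCLR-style model optimizes, here is a generic numpy sketch of the NT-Xent contrastive loss (variable names are illustrative; this is not the repository's implementation):

```
import numpy as np
from scipy.special import logsumexp

def nt_xent_loss(z1, z2, temperature=0.5):
    # z1, z2: (N, D) embeddings of two augmented views of the same N utterances
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit norm -> dot product = cosine similarity
    sim = (z @ z.T) / temperature                     # (2N, 2N) similarity matrix
    np.fill_diagonal(sim, -np.inf)                    # a view is never its own candidate
    n = z1.shape[0]
    # the positive for each view is the other view of the same utterance
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - logsumexp(sim, axis=1, keepdims=True)
    return -log_prob[np.arange(2 * n), pos].mean()

# loss = nt_xent_loss(np.random.randn(8, 512), np.random.randn(8, 512))
```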