This model is built upon the Conformer architecture and trained using the CTC (Connectionist Temporal Classification) approach. The training dataset consists of 1 hour of Indonesian speech data, randomly selected from the 20-hour Indonesian dataset sourced from the publicly available Common Voice 11.0.
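The random selection of a 1-hour subset can be illustrated with a minimal sketch. It is not the script used for this model: the function `select_subset`, the `(utt_id, duration)` input format, the fixed seed, and the toy 5-second clips are all assumptions made for illustration.

```python
import random


def select_subset(utterances, target_seconds, seed=0):
    """Randomly pick utterances until their total duration reaches target_seconds.

    `utterances` is a list of (utt_id, duration_in_seconds) pairs
    (a hypothetical format, not the one used by data_prep.sh).
    """
    rng = random.Random(seed)      # fixed seed so the subset is reproducible
    shuffled = list(utterances)    # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)

    selected, total = [], 0.0
    for utt_id, duration in shuffled:
        if total >= target_seconds:
            break
        selected.append(utt_id)
        total += duration
    return selected, total


# Toy pool standing in for the 20-hour set: 14400 clips of 5 s each.
pool = [(f"utt{i:05d}", 5.0) for i in range(14400)]
subset, seconds = select_subset(pool, target_seconds=3600)
print(f"{len(subset)} utterances, {seconds / 3600:.2f} hours")
```

With real data the clip durations vary, so the selected total will slightly overshoot the target rather than hit it exactly.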
The script `run.sh` contains the overall model training process.
- Follow the steps in `data_prep.md` and run `data_prep.sh` to prepare the dataset and word list for a given language. The second and fourth stages of `data_prep.sh` involve language-specific special processing, which is detailed in `lang_process.md`.
- The model parameters are detailed in `config.json` and `hyper-p.json`. Dataset paths should be added to `metainfo.json` for efficient management of datasets.
- The training of this model utilized 1 NVIDIA GeForce RTX 3090 GPU and took 10 hours.
  - \# of parameters (million): 89.98
  - GPU info
    - NVIDIA GeForce RTX 3090
    - \# of GPUs: 1
- To train the model:

  `bash run.sh id exp/Monolingual/id/Mono._subword_1h --sta 1 --sto 3`
- To plot the training curves:

  `python utils/plot_tb.py exp/Monolingual/id/Mono._subword_1h/log/tensorboard/file -o exp/Monolingual/id/Mono._subword_1h/monitor.png`

  Monitor figure: `monitor.png`
- To decode with CTC and calculate the %WER:

  `bash run.sh id exp/Monolingual/id/Mono._subword_1h --sta 4 --sto 4`

  `test_id %SER 100.00 | %WER 96.62 [ 20952 / 21685, 0 ins, 18067 del, 2885 sub ]`
- For FST decoding, `config.json` and `hyper-p.json` are needed to train the language model. Note the distinction between the configuration files for training the ASR model and those for training the language model: they have the same names but are in different directories.
- To decode with FST and calculate the %WER:

  `bash run.sh id exp/Monolingual/id/Mono._subword_1h --mode subword --sta 5`

  `test_id_ac1.0_lm0.5_wip0.0.hyp %SER 100.00 | %WER 96.42 [ 20908 / 21685, 0 ins, 18067 del, 2841 sub ]`
- The files used to train this model and the trained model are available in the following table.

  | Word list | Checkpoint model | Language model | Tensorboard log |
  | --- | --- | --- | --- |
  | wordlist_id | Mono._subword_1h_best-3.pt | lm_id_4gram.arpa | tb_Mono._subword_1h_id |
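
The %WER values reported above follow from the bracketed counts: errors = insertions + deletions + substitutions, and %WER = 100 × errors / reference words. The sketch below recomputes both figures from the two result lines. The regular expression and the helper `score` are illustrative assumptions, not part of `run.sh` or its scoring tools.

```python
import re

# Matches the tail of a result line such as:
# "%WER 96.62 [ 20952 / 21685, 0 ins, 18067 del, 2885 sub ]"
LINE_RE = re.compile(
    r"%WER\s+(?P<wer>[\d.]+)\s+\[\s*(?P<err>\d+)\s*/\s*(?P<ref>\d+),\s*"
    r"(?P<ins>\d+)\s+ins,\s*(?P<dele>\d+)\s+del,\s*(?P<sub>\d+)\s+sub\s*\]"
)


def score(line):
    """Recompute (errors, reference_words, %WER) from one result line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    ins, dele, sub = int(m["ins"]), int(m["dele"]), int(m["sub"])
    ref = int(m["ref"])
    errors = ins + dele + sub
    # The reported error total should equal the sum of the three error types.
    if errors != int(m["err"]):
        raise ValueError("error counts do not add up")
    return errors, ref, round(100.0 * errors / ref, 2)


CTC_LINE = "test_id %SER 100.00 | %WER 96.62 [ 20952 / 21685, 0 ins, 18067 del, 2885 sub ]"
FST_LINE = ("test_id_ac1.0_lm0.5_wip0.0.hyp %SER 100.00 | %WER 96.42 "
            "[ 20908 / 21685, 0 ins, 18067 del, 2841 sub ]")

for name, line in (("CTC", CTC_LINE), ("FST", FST_LINE)):
    errors, ref, wer = score(line)
    print(f"{name}: {errors} / {ref} = {wer:.2f} %WER")
```

Both recomputed values agree with the reported ones (96.62 and 96.42). In both runs deletions account for 18067 of the 21685 reference words, roughly 83%, so the two decoding modes differ only in the number of substitutions.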