Monolingual ASR model

Monolingual ASR model with full data

We trained a phoneme-based ASR model for each language of cv-lang10. All models share the same architecture, a Conformer network with 14 encoder blocks. The number of phonemes and the train, dev, and test hours for each language are listed in the following table.

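| Language | Language ID | # of phonemes | Train hours | Dev hours | Test hours |
| --- | --- | --- | --- | --- | --- |
| English | en | 39 | 2227.3 | 27.2 | 27.0 |
| Spanish | es | 32 | 382.3 | 26.0 | 26.5 |
| French | fr | 33 | 823.4 | 25.0 | 25.4 |
| Italian | it | 30 | 271.5 | 24.7 | 26.0 |
| Kirghiz | ky | 32 | 32.7 | 2.1 | 2.2 |
| Dutch | nl | 39 | 70.2 | 13.8 | 13.9 |
| Russian | ru | 32 | 149.8 | 14.6 | 15.0 |
| Swedish | sv-SE | 33 | 29.8 | 5.5 | 6.2 |
| Turkish | tr | 41 | 61.5 | 10.1 | 11.4 |
| Tatar | tt | 31 | 20.8 | 3.0 | 5.7 |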

Monolingual ASR model with low-resource data

For the ablation study, the training data is divided into three scales to simulate different resource scenarios: 1 hour, 10 hours, and full data. Both phoneme-based and subword-based models are trained at each of these three scales.

| Language | Language ID | # of phonemes | # of subwords | Train hours | Dev hours | Test hours |
| --- | --- | --- | --- | --- | --- | --- |
| Indonesian | id | 35 | 500 | 20.8 | 3.7 | 4.1 |
| Polish | pl | 35 | 500 | 129.9 | 11.4 | 11.5 |
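
The 1-hour and 10-hour scales can be simulated by drawing a random subset of training utterances under a duration budget. This README does not describe how the subsets were actually selected, so the following is only a minimal sketch under assumptions: the function name `subsample_by_duration`, the `(utt_id, duration_seconds)` input format, and the fixed random seed are all hypothetical.

```python
import random


def subsample_by_duration(utterances, budget_hours, seed=0):
    """Randomly pick utterances whose total duration fits within budget_hours.

    utterances: iterable of (utt_id, duration_seconds) pairs (assumed format).
    Returns the selected pairs and their total duration in seconds.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    shuffled = list(utterances)
    rng.shuffle(shuffled)

    budget_seconds = budget_hours * 3600.0
    selected, total = [], 0.0
    for utt_id, duration in shuffled:
        # Skip any utterance that would push the subset over the budget.
        if total + duration > budget_seconds:
            continue
        selected.append((utt_id, duration))
        total += duration
    return selected, total


if __name__ == "__main__":
    # Hypothetical corpus: 10,000 utterances of 5 seconds each.
    corpus = [(f"utt{i:05d}", 5.0) for i in range(10000)]
    for hours in (1, 10):
        subset, seconds = subsample_by_duration(corpus, hours)
        print(hours, len(subset), seconds / 3600.0)
```

A budget larger than the corpus returns every utterance, so the full-data scale needs no special handling in this sketch.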

Phoneme-based

Subword-based