Monolingual ASR model

Monolingual ASR model with full data

We trained a phoneme-based ASR model for each language in cv-lang10. All models share the same architecture, a Conformer network with 14 encoder blocks. The number of phonemes and the hours of training, development, and test data for each language are listed in the following table.

Language	Language ID	# of phonemes	Train hours	Dev hours	Test hours
`English`	`en`	39	2227.3	27.2	27.0
`Spanish`	`es`	32	382.3	26.0	26.5
`French`	`fr`	33	823.4	25.0	25.4
`Italian`	`it`	30	271.5	24.7	26.0
`Kirghiz`	`ky`	32	32.7	2.1	2.2
`Dutch`	`nl`	39	70.2	13.8	13.9
`Russian`	`ru`	32	149.8	14.6	15.0
`Swedish`	`sv-SE`	33	29.8	5.5	6.2
`Turkish`	`tr`	41	61.5	10.1	11.4
`Tatar`	`tt`	31	20.8	3.0	5.7

%PER

Model	Model size	en	es	fr	it	ky	nl	ru	sv-SE	tr	tt	Avg.
Mono. phoneme	90 MB	7.39	2.47	4.93	2.87	2.23	4.60	2.72	18.69	6.00	10.54	6.11

%WER with 4-gram LM

Model	Model size	en	es	fr	it	ky	nl	ru	sv-SE	tr	tt	Avg.
Mono. phoneme	90 MB	10.59	7.91	15.58	9.26	1.03	8.84	1.62	8.37	8.46	9.75	8.14
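
For readers unfamiliar with the metrics, the sketch below shows the standard definition of PER and WER: the edit distance between reference and hypothesis token sequences, divided by the number of reference tokens. It is a minimal illustration, not the scoring script used for the numbers above. The function names `edit_distance` and `error_rate` are hypothetical, and corpus-level aggregation (total edits over total reference tokens) is an assumption based on common practice.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (lists of phonemes or words)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps):
    """Corpus-level error rate in percent (assumed aggregation: total edits / total reference tokens)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_tokens = sum(len(r) for r in refs)
    return 100.0 * total_edits / total_tokens


# Toy example with word tokens; phoneme tokens work the same way for PER.
refs = [["the", "cat", "sat"], ["hello", "world"]]
hyps = [["the", "cat", "sit"], ["hello"]]
print(error_rate(refs, hyps))  # one substitution + one deletion over 5 reference words
```

The same function yields PER when the sequences hold phoneme symbols and WER when they hold words, which is why the two metrics in the tables above are directly comparable in form but not in value.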

Monolingual ASR model with low-resource data

For the ablation study, the training data is divided into three scales to simulate different resource scenarios: 1 hour, 10 hours, and full data. Both phoneme-based and subword-based models are trained at each of these three scales.

Language	Language ID	# of phonemes	# of subwords	Train hours	Dev hours	Test hours
`Indonesian`	`id`	35	500	20.8	3.7	4.1
`Polish`	`pl`	35	500	129.9	11.4	11.5
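
One way the 1-hour and 10-hour training scales described above could be drawn from the full training set is sketched below. This README does not say how the subsets were actually selected, so the random sampling, the `(utt_id, duration_seconds)` input format, and the function name `sample_subset` are all assumptions made for illustration.

```python
import random


def sample_subset(utterances, target_hours, seed=0):
    """Pick a random subset whose total duration does not exceed target_hours.

    utterances: list of (utt_id, duration_seconds) pairs (hypothetical format).
    Returns the chosen utterance IDs and their total duration in hours.
    """
    budget = target_hours * 3600.0
    pool = list(utterances)
    random.Random(seed).shuffle(pool)  # fixed seed keeps the subset reproducible
    chosen, total = [], 0.0
    for utt_id, dur in pool:
        if total + dur > budget:
            continue  # skip utterances that would push the subset over budget
        chosen.append(utt_id)
        total += dur
    return chosen, total / 3600.0


# Toy corpus: 2000 utterances of 5 seconds each (about 2.8 hours in total).
corpus = [(f"utt{i}", 5.0) for i in range(2000)]
ids, hours = sample_subset(corpus, target_hours=1.0)
print(len(ids), hours)
```

Fixing the seed means the same 1-hour subset can be reused for both the phoneme-based and the subword-based model, so the two are compared on identical training data.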

Phoneme-based

%PER

Language	1 hour	10 hours	Full data
Indonesian	96.52	26.34	5.74
Polish	86.01	30.38	2.82

%WER with LM

Language	1 hour	10 hours	Full data
Indonesian	100	9.54	3.28
Polish	99.98	13.86	4.97

Subword-based

%WER without LM

Language	1 hour	10 hours	Full data
Indonesian	96.62	69.57	31.96
Polish	98.41	90.98	19.38

%WER with LM

Language	1 hour	10 hours	Full data
Indonesian	96.42	49.67	10.85
Polish	98.38	59.43	7.12
