Swedish ASR Model
This is a DeepSpeech model for Swedish ASR, based on DeepSpeech v0.6.1. It has been fine-tuned on the NST Acoustic Database for Swedish (roughly 350 hours of training data), using the official English DeepSpeech model as a foundation. In our thesis, this model is referred to as the general Swedish model.
Using the model
In order to use the model (not train it), start by installing the DeepSpeech client package, deepspeech:
$ pip3 install deepspeech
Then download the model graph output_graph.pb, as well as the language model trie and binary (see the Language model section below).
You can then transcribe an audio file (.wav format, mono channel, 16 kHz, 16-bit PCM) using the following command:
$ deepspeech --model output_graph.pb --lm lm.binary --trie lm.trie --audio some-audio-file.wav
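If your recordings are not already in that format, a general-purpose audio tool can convert them first. As a rough sketch, assuming you have sox installed (it is not part of this release) and a hypothetical input file some-recording.wav:
$ sox some-recording.wav -r 16000 -c 1 -b 16 some-audio-file.wav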
For better performance in production use cases, consider converting output_graph.pb into a memory-mapped format or to a TFLite format. See the official docs for instructions.
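As a sketch, the memory-mapped conversion looks roughly like the following, assuming you have built the convert_graphdef_memmapped_format tool from the TensorFlow source tree matching your DeepSpeech version (the official docs describe this in detail):
$ convert_graphdef_memmapped_format --in_graph=output_graph.pb --out_graph=output_graph.pbmm
The resulting output_graph.pbmm can then be passed to the --model flag in place of output_graph.pb.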
Please note that the model performs rather poorly on low-quality audio and audio with a lot of background noise.
Hyperparameters used to train the model:
train_batch_size: 64
dev_batch_size: 64
test_batch_size: 64
n_hidden: 2048
learning_rate: 0.0001
dropout_rate: 0.30
lm_alpha: 0.75
lm_beta: 1.85
If you would like to do transfer learning, simply download the checkpoint, named model-checkpoint.tar.gz. A matching alphabet is also required; the one used to train this model is found in alphabet.txt.
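As a minimal sketch, a fine-tuning run on top of this checkpoint could look something like the following, assuming the DeepSpeech v0.6.1 training code is checked out, the checkpoint directory points at the extracted contents of model-checkpoint.tar.gz, and the CSV files are hypothetical placeholders for your own data (the flag values mirror the hyperparameters listed above; n_hidden must stay at 2048 to match the checkpoint):
$ python3 DeepSpeech.py \
    --checkpoint_dir path/to/extracted/checkpoint \
    --alphabet_config_path alphabet.txt \
    --train_files train.csv --dev_files dev.csv --test_files test.csv \
    --n_hidden 2048 --learning_rate 0.0001 --dropout_rate 0.30 \
    --train_batch_size 64 --dev_batch_size 64 --test_batch_size 64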
Language model
A 5-gram KenLM language model is attached to this release. The trie is located in lm.trie.tar.gz, and the binary is split across the parts lm.binary.tar.gz.part0*. To merge the parts of the language model binary, simply use cat:
$ cat lm.binary.tar.gz.part0* > lm.binary.tar.gz
Pro tip: use pigz rather than gzip when decompressing; this will make sure all the cores of your CPU get some exercise 😄
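For example, assuming pigz is installed, GNU tar can be told to use it when extracting the archives:
$ tar --use-compress-program=pigz -xf lm.binary.tar.gz
$ tar --use-compress-program=pigz -xf lm.trie.tar.gz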