Tiro Speech Core is a speech recognition server that implements a gRPC API. REST support is provided through a gRPC REST gateway.
Tiro Speech Core uses the Bazel build system and fetches most of its dependencies over the network. Either Intel MKL or OpenBLAS can be used as the linear algebra library. MKL has to be downloaded from Intel and installed in /opt/intel/mkl, whereas OpenBLAS is downloaded and compiled automatically.
The following command builds the server along with all its dependencies and runs it with a pre-downloaded model (using MKL):
bazel run -c opt //:tiro_speech_server -- --kaldi-models=path/to/model/dir --listen-address=0.0.0.0:50051
To use OpenBLAS instead:
bazel run -c opt --@kaldi//:mathlib=openblas //:tiro_speech_server -- --kaldi-models=path/to/model/dir --listen-address=0.0.0.0:50051
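Since MKL is the default but has to be installed by hand, a wrapper script might fall back to OpenBLAS when MKL is absent. This is a minimal sketch, assuming that the existence of /opt/intel/mkl is a sufficient sign that MKL is installed; the function name mathlib_flags is hypothetical and not part of the repository:

```shell
# Hypothetical helper (not part of Tiro Speech Core): print the extra Bazel flag
# that selects the math library. Assumption: MKL counts as installed when its
# directory (by default /opt/intel/mkl) exists.
mathlib_flags() {
  local mkl_dir="${1:-/opt/intel/mkl}"
  if [ ! -d "$mkl_dir" ]; then
    # No MKL found: ask the Kaldi build for OpenBLAS instead.
    echo "--@kaldi//:mathlib=openblas"
  fi
  # With MKL present nothing is printed, since MKL is the default.
}

# Usage (the substitution is deliberately unquoted so an empty result adds no argument):
#   bazel run -c opt $(mathlib_flags) //:tiro_speech_server -- \
#     --kaldi-models=path/to/model/dir --listen-address=0.0.0.0:50051
```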
To only build the server without running it (-c opt selects the optimized compilation mode):
bazel build -c opt //:tiro_speech_server
The output binary should now be in bazel-bin/tiro_speech_server.
Build the example client and test it against the running server:
bazel run -c opt //:tiro_speech_client -- $PWD/examples/is_is-mbl_01-2011-12-02T14:22:29.744483.wav $PWD/examples/config.pbtxt 0.0.0.0:50051
The example client also supports streaming recognition, either from a long audio file or from stdin. For example, if sox is installed, we can capture audio from the default microphone and transcribe it:
rec -q -r16k -c1 -esigned -traw - \
| bazel-bin/tiro_speech_client --streaming - $PWD/examples/config.pbtxt 0.0.0.0:50051 2>/dev/null
Build and run the REST gateway:
bazel run -c opt //rest-gateway/cmd:rest_gateway_server -- --endpoint=localhost:50051
The REST gateway server should now be running on port 8080.
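In a setup script it can be useful to wait until the gRPC server (port 50051) or the gateway (port 8080) accepts connections before sending requests. This is a minimal sketch of such a readiness check, assuming bash (it relies on bash's /dev/tcp redirection); wait_for_port is a hypothetical helper and the usage in the comments is illustrative:

```shell
# Hypothetical helper (not part of Tiro Speech Core): wait up to TIMEOUT seconds
# for HOST:PORT to accept TCP connections. Requires bash, because it uses bash's
# /dev/tcp/HOST/PORT redirection to attempt the connection.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}" i=0
  while [ "$i" -lt "$timeout" ]; do
    # The subshell opens (and, on exit, closes) a connection on fd 3;
    # connection errors are silenced.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Example (illustrative):
#   bazel run -c opt //rest-gateway/cmd:rest_gateway_server -- --endpoint=localhost:50051 &
#   wait_for_port localhost 8080 120 && echo "gateway is up"
```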
Currently, Tiro Speech Core only supports Kaldi chain models. To prepare a model for use with the server, use the script tools/models/prepare_chain_dist.sh, which has the following usage:
Usage: tools/models/prepare_chain_dist.sh <output-dir>
--lang-dir <str|data/lang>
--ivector-extractor-dir <str|exp/nnet3/extractor>
--nnet-dir <str|exp/chain/tdnn_sp_bi>
--graph-dir <str|exp/chain/tdnn_sp_bi/graph>
--lang-code <str|> # BCP-47 language code for model
--description <str|> # short description of the model
--mfcc-config <str|conf/mfcc_hires.conf>
--fbank-config <str|> # Set either fbank-config or mfcc-config
--const-arpa <str|> # ConstArpa model for rescoring, empty string == no rescoring
--model-name <str|> # descriptive model name, used as a key
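The defaults above are paths relative to a standard Kaldi recipe directory (such as egs/<corpus>/s5), so the script is presumably meant to be run from inside one. This is a minimal pre-flight sketch that reports which default inputs are missing; missing_chain_inputs is a hypothetical helper, and since it only checks the MFCC default it does not apply as-is when --fbank-config is used:

```shell
# Hypothetical pre-flight check (not part of the repository): list which of
# prepare_chain_dist.sh's default inputs are missing under a Kaldi recipe dir.
# Returns non-zero if anything is missing, so it can guard the real call.
missing_chain_inputs() {
  local root="${1:-.}" status=0 p
  for p in data/lang exp/nnet3/extractor exp/chain/tdnn_sp_bi \
           exp/chain/tdnn_sp_bi/graph conf/mfcc_hires.conf; do
    if [ ! -e "$root/$p" ]; then
      echo "missing: $p"
      status=1
    fi
  done
  return "$status"
}

# Example (run from the recipe directory):
#   missing_chain_inputs . || echo "pass the matching --*-dir options explicitly"
```

Any input reported missing needs its option (--lang-dir, --ivector-extractor-dir, --nnet-dir, --graph-dir or --mfcc-config) passed explicitly.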
You do not need to run your own server: the service is available at speech.tiro.is:443. The example client can connect to it with the --use-ssl option:
bazel run -c opt //:tiro_speech_client -- --use-ssl $PWD/examples/is_is-mbl_01-2011-12-02T14:22:29.744483.wav $PWD/examples/config.pbtxt speech.tiro.is:443
Tiro Speech Core uses OpenGrm Thrax grammars for formatting. The rules are located in src/itn/.
The abbreviate target compiles the grammar rules along with the mappings, producing a finite-state archive (.far) in bazel-bin/src/itn/.
bazel build -c opt :abbreviate
Next, extract the FST from the archive. Create the directories models/graph and models/norm if needed. The following command writes ABBREVIATE.fst to models/norm/; this file should be stored along with the model.
bazel run -c opt @openfst//:farextract -- --filename_prefix=$PWD/models/norm/ --filename_suffix=.fst --keys=ABBREVIATE $PWD/bazel-bin/src/itn/abbreviate.far
Finally, add the following flag, with the appropriate path, to the model's main.conf file:
--formatter.rewrite-fst=norm/ABBREVIATE.fst
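Editing main.conf by hand is fine, but in a provisioning script it helps to make this step idempotent. This is a minimal sketch, assuming main.conf holds one option per line as the flag above suggests; add_rewrite_fst_flag is a hypothetical helper and its default FST path simply mirrors the flag shown above:

```shell
# Hypothetical helper: append the rewrite-FST flag to a model's main.conf,
# unless an identical line is already present (safe to run repeatedly).
add_rewrite_fst_flag() {
  local conf="$1" fst="${2:-norm/ABBREVIATE.fst}"
  local flag="--formatter.rewrite-fst=$fst"
  # -q: quiet, -x: whole-line match, -F: fixed string; "--" ends grep's options
  # because the flag itself starts with "--".
  if grep -qxF -- "$flag" "$conf" 2>/dev/null; then
    return 0
  fi
  # Make sure the flag starts on its own line if the file lacks a final newline.
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    printf '\n' >> "$conf"
  fi
  printf '%s\n' "$flag" >> "$conf"
}

# Example:
#   add_rewrite_fst_flag models/main.conf
```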
Using curl on Linux:
cat <<EOF | curl -XPOST https://speech.tiro.is/v1alpha/speech:recognize -d@-
{
"config": {
"languageCode": "is-IS",
"sampleRateHertz": "16000",
"encoding": "LINEAR16",
"maxAlternatives": 2,
"enableWordTimeOffsets": true
},
"audio": {
"content": "$(base64 -w0 < examples/is_is-mbl_01-2011-12-02T14:22:29.744483.wav)"
}
}
EOF
This returns the following response:
{
"results": [
{
"alternatives": [
{
"transcript": "gera lítið úr meintri spennu",
"confidence": 0,
"words": [
{
"startTime": "1.020s",
"endTime": "1.289s",
"word": "gera",
"confidence": 0
},
{
"startTime": "1.289s",
"endTime": "1.649s",
"word": "lítið",
"confidence": 0
},
{
"startTime": "1.650s",
"endTime": "1.799s",
"word": "úr",
"confidence": 0
},
{
"startTime": "1.799s",
"endTime": "2.129s",
"word": "meintri",
"confidence": 0
},
{
"startTime": "2.130s",
"endTime": "2.610s",
"word": "spennu",
"confidence": 0
}
]
},
{
"transcript": "gerir lítið úr meintri spennu",
"confidence": 0,
"words": []
}
]
}
]
}
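The startTime and endTime values appear to follow the protobuf JSON mapping for google.protobuf.Duration, i.e. seconds as a decimal string with an s suffix. This is a minimal sketch for turning them into milliseconds in a shell script; duration_to_ms and word_length_ms are hypothetical helpers:

```shell
# Hypothetical helper: convert a protobuf JSON Duration string such as "1.020s"
# to whole milliseconds (rounded to nearest), using awk for the arithmetic.
duration_to_ms() {
  awk -v d="${1%s}" 'BEGIN { printf "%d\n", d * 1000 + 0.5 }'
}

# Hypothetical helper: length of a word in ms, given its startTime and endTime.
word_length_ms() {
  echo $(( $(duration_to_ms "$2") - $(duration_to_ms "$1") ))
}
```

For example, word_length_ms 1.020s 1.289s gives the length of "gera" in the response above: 269 ms.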
See examples/python/README.md for a Python example.
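The curl example above depends on base64 -w0, an option of GNU coreutils that the base64 shipped with macOS may not support. A more portable sketch strips the line breaks with tr instead and writes the same request body to stdout; make_recognize_request is a hypothetical helper, and the configuration values are copied from the example:

```shell
# Hypothetical helper: print a speech:recognize JSON request for a WAV file.
# "base64 | tr -d '\n'" removes any line wrapping without GNU-only flags.
make_recognize_request() {
  local wav="$1" lang="${2:-is-IS}" content
  content=$(base64 < "$wav" | tr -d '\n') || return 1
  cat <<EOF
{
  "config": {
    "languageCode": "$lang",
    "sampleRateHertz": "16000",
    "encoding": "LINEAR16",
    "maxAlternatives": 2,
    "enableWordTimeOffsets": true
  },
  "audio": {
    "content": "$content"
  }
}
EOF
}

# Usage (hosted service):
#   make_recognize_request examples/is_is-mbl_01-2011-12-02T14:22:29.744483.wav \
#     | curl -XPOST https://speech.tiro.is/v1alpha/speech:recognize -d@-
```

For a local REST gateway, the same body could presumably be posted to http://localhost:8080/v1alpha/speech:recognize, assuming the gateway exposes the same path as the hosted service.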
To build a Docker image bundle containing two images, tiro-speech-server and tiro-speech-client, run:
bazel build -c opt :tiro_speech_images.tar
This can be loaded into Docker with:
docker load -i bazel-bin/tiro_speech_images.tar
The server can then be run with Docker:
docker run -v $PATH_TO_MODEL:/model tiro-speech-server:latest --kaldi-models=/model [ARGS]
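The docker run line above needs an absolute $PATH_TO_MODEL, since a relative path may be misread (a bare name such as models is taken as a named volume), and the server port has to be published for clients outside the container to reach it. This is a minimal wrapper sketch; run_tiro_server is hypothetical, and the port 50051 and the --listen-address value are assumptions carried over from the bazel run example rather than documented defaults of the image:

```shell
# Hypothetical wrapper around the docker run command above.
# Assumptions: the server should listen on 0.0.0.0:PORT inside the container,
# and -p publishes that port on the host. Set DOCKER=echo for a dry run.
run_tiro_server() {
  local port="${2:-50051}" model_dir
  # Resolve the model directory to an absolute path for the bind mount.
  model_dir=$(cd "$1" && pwd) || return 1
  ${DOCKER:-docker} run --rm -p "$port:$port" -v "$model_dir:/model" \
    tiro-speech-server:latest --kaldi-models=/model --listen-address="0.0.0.0:$port"
}

# Example:
#   run_tiro_server path/to/model/dir
```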
Enable the git hooks to automatically format source code:
git config core.hooksPath hooks
Tiro Speech Core is licensed under the Apache License, Version 2.0. See LICENSE for more details.
This project was funded by the Language Technology Programme for Icelandic 2019-2023. The programme, which is managed and coordinated by Almannarómur, is funded by the Icelandic Ministry of Education, Science and Culture.