Skip to content
This repository has been archived by the owner on May 18, 2021. It is now read-only.

Latest commit

 

History

History
99 lines (75 loc) · 3.11 KB

README.md

File metadata and controls

99 lines (75 loc) · 3.11 KB

Speech-to-Text Service Comparison

The scripts in this repository are used for testing online speech recognition services for the Japanese language.

Each service comes with API credentials and some services require installation of SDK or library. Refer to the individual services documentation for this.

Note that there exists a Python package to do some of this https://pypi.org/project/SpeechRecognition/, but I've decided to go through the process of figuring it out myself to better understand the inner-workings of each service.

Install

$ git clone https://github.com/ray-hrst/stt.git
$ cd stt/
$ git submodule init
$ git submodule update

Setup

Each script uses it's own Python virtual environment. Python package requirements for each enviroment can be found in the requirements folder. Auto-generate the virtual environment with:

$ ./setup.sh XXX

where XXX can be:

  • google
  • ibm
  • wit
  • fuetrek

For example:

$ ./setup.sh google

Usage

First start the appropriate Python virtual environment:

$ . venv/XXX/bin/activate

Then run the script. Note that all scripts are located in the scripts directory.

$ python stt_XXX.py /path/to/audio/sample.wav

where XXX can be:

  • google
  • ibm
  • wit
  • fuetrek

Resources

The resources folder contains voice samples and transcripts from various online sources. They can be used as input for each of the scripts.

Online Services

The following services were tested.

Notes

  • Easy to setup only after you figure out how to turn on the service; Google's Cloud management system (which is complicated).
  • Provides the most fastest and reliable service

Notes

  • Straightforward to use with good documentation and out of the box working examples

Notes

  • Free: https://wit.ai/faq
  • Straightforward to use with good documentation and out of the box working examples
  • Unreliable service

Notes

  • TBD

Notes

  • Documentation is outdated
  • I can't get it to work; examples don't work out of the box; difficult to setup
  • Nobody from Nuance seems to be supporting the forum
  • Code only works with Python 2

Voice-Sample Resources

TODO

  • Use metadata
  • Compare against text