Add docs build
Michael Hansen committed Jun 1, 2021
1 parent 1a7681e commit c5ada76
Showing 45 changed files with 20,224 additions and 24 deletions.
2 changes: 0 additions & 2 deletions .gitignore
@@ -7,8 +7,6 @@ __pycache__/
dist/
/etc/

docs/build/

coverage.xml
.coverage

33 changes: 11 additions & 22 deletions README.md
@@ -35,8 +35,7 @@ read ɹ ˈi d

Note that "wound" and "read" have different pronunciations when used in different contexts.

Gruut includes a pre-trained U.S. English model with part-of-speech/tense aware pronunciations.
[Pre-trained models](https://github.com/rhasspy/gruut/releases/tag/v1.0.0) are also available for the [supported languages](#support-languages).
See [the documentation](https://rhasspy.github.io/gruut) for more details.

## Intended Audience

@@ -57,18 +56,16 @@ Some languages also include:

gruut currently supports:

* Czech (`cs-cz`)
* German (`de-de`)
* U.S. English (`en-us`)
* Supports part-of-speech aware pronunciations
* U.K. English (`en-gb`)
* Spanish (`es-es`)
* Czech (`cs`)
* German (`de`)
* English (`en`)
* Spanish (`es`)
* Farsi/Persian (`fa`)
* French (`fr-fr`)
* Italian (`it-it`)
* French (`fr`)
* Italian (`it`)
* Dutch (`nl`)
* Russian (`ru-ru`)
* Swedish (`sv-se`)
* Russian (`ru`)
* Swedish (`sv`)

The goal is to support all of [voice2json's languages](https://github.com/synesthesiam/voice2json-profiles#supported-languages)

@@ -90,20 +87,12 @@ The goal is to support all of [voice2json's languages](https://github.com/synest
$ pip install gruut
```

For Raspberry Pi (ARM), you will first need to [manually install phonetisaurus](https://github.com/rhasspy/phonetisaurus-pypi/releases).

## Language Download

[Pre-trained models](https://github.com/rhasspy/gruut/releases/tag/v0.8.0) for gruut can be downloaded with:
Additional languages can be added during installation. For example, with French and Italian support:

```sh
$ python3 -m gruut <LANGUAGE> download
$ pip install gruut[fr,it]
```

A U.S. English model is included in the distribution.

By default, models are stored in `$HOME/.config/gruut` (technically `$XDG_CONFIG_HOME/.gruut`). This can be overridden by passing a `--lang-dir` argument to all `gruut` commands.

## Command-Line Usage

The `gruut` module can be executed with `python3 -m gruut <LANGUAGE> <COMMAND> <ARGS>`
4 changes: 4 additions & 0 deletions docs/build/.buildinfo
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 90a8d147d7bb8b949280ed46e74eb2cb
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file added docs/build/.doctrees/environment.pickle
Binary file not shown.
Binary file added docs/build/.doctrees/gruut.doctree
Binary file not shown.
Binary file added docs/build/.doctrees/index.doctree
Binary file not shown.
Binary file added docs/build/.doctrees/modules.doctree
Binary file not shown.
85 changes: 85 additions & 0 deletions docs/build/_sources/gruut.rst.txt
@@ -0,0 +1,85 @@
gruut package
=============

Submodules
----------

gruut.commands module
---------------------

.. automodule:: gruut.commands
   :members:
   :undoc-members:
   :show-inheritance:

gruut.const module
------------------

.. automodule:: gruut.const
   :members:
   :undoc-members:
   :show-inheritance:

gruut.g2p module
----------------

.. automodule:: gruut.g2p
   :members:
   :undoc-members:
   :show-inheritance:

gruut.lang module
-----------------

.. automodule:: gruut.lang
   :members:
   :undoc-members:
   :show-inheritance:

gruut.lexicon2db module
-----------------------

.. automodule:: gruut.lexicon2db
   :members:
   :undoc-members:
   :show-inheritance:

gruut.phonemize module
----------------------

.. automodule:: gruut.phonemize
   :members:
   :undoc-members:
   :show-inheritance:

gruut.pos module
----------------

.. automodule:: gruut.pos
   :members:
   :undoc-members:
   :show-inheritance:

gruut.toksen module
-------------------

.. automodule:: gruut.toksen
   :members:
   :undoc-members:
   :show-inheritance:

gruut.utils module
------------------

.. automodule:: gruut.utils
   :members:
   :undoc-members:
   :show-inheritance:

Module contents
---------------

.. automodule:: gruut
   :members:
   :undoc-members:
   :show-inheritance:
119 changes: 119 additions & 0 deletions docs/build/_sources/index.rst.txt
@@ -0,0 +1,119 @@
.. gruut documentation master file

gruut
=====

A tokenizer and `IPA <https://en.wikipedia.org/wiki/International_Phonetic_Alphabet>`_ phonemizer for multiple human languages.

.. code-block:: python

    from gruut import text_to_phonemes

    text = 'He wound it around the wound, saying "I read it was $10 to read."'
    for sent_idx, word, word_phonemes in text_to_phonemes(text, lang="en-us"):
        print(word, *word_phonemes)

Output::

    he h ˈi
    wound w ˈaʊ n d
    it ˈɪ t
    around ɚ ˈaʊ n d
    the ð ə
    wound w ˈu n d
    , |
    saying s ˈeɪ ɪ ŋ
    i ˈaɪ
    read ɹ ˈɛ d
    it ˈɪ t
    was w ə z
    ten t ˈɛ n
    dollars d ˈɑ l ɚ z
    to t ə
    read ɹ ˈi d
    . ‖


Installation
------------

To install gruut with U.S. English support only::

pip install gruut


Additional languages can be added during installation. For example, with French and Italian support::

pip install gruut[fr,it]


Supported Languages
^^^^^^^^^^^^^^^^^^^

* Czech (``cs``)
* German (``de``)
* English (``en``)
* Spanish (``es``)
* Farsi/Persian (``fa``)
* French (``fr``)
* Italian (``it``)
* Dutch (``nl``)
* Russian (``ru``)
* Swedish (``sv``)


Usage
-----

gruut performs two main functions: tokenization and phonemization.
The :py:func:`gruut.text_to_phonemes` function performs both steps for you. See the :py:class:`~gruut.TextToPhonemesReturn` enum for ways to adjust the ``return_format``.

If you need more control, see the language-specific classes in :py:mod:`gruut.lang` as well as :py:class:`~gruut.toksen.RegexTokenizer` and :py:class:`~gruut.lang.SqlitePhonemizer`.

Tokenization operates on text and does the following:

* Splits text into words by whitespace
* Expands user-defined abbreviations
* Breaks apart words and sentences further by punctuation (periods, commas, etc.)
* Drops empty/non-word tokens
* Expands numbers into words (100 -> one hundred)
* Applies an upper/lower case filter
* Predicts part of speech tags (see :py:mod:`gruut.pos`)
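
The tokenization steps above can be approximated in a few lines of plain Python. This is a hypothetical sketch, not the implementation of :py:class:`~gruut.toksen.RegexTokenizer`: the abbreviation table and the ``number_to_words`` and ``tokenize`` helpers are invented for illustration, numbers are only spelled out below 1000, and part-of-speech tagging is left out.

.. code-block:: python

    import re
    from typing import Dict, List

    # Hypothetical abbreviation table; real tables are language-specific.
    ABBREVIATIONS: Dict[str, str] = {"dr.": "doctor", "mr.": "mister"}

    ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
            "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
            "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
    TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
            "eighty", "ninety"]
    SENTENCE_END = {".", "?", "!"}


    def number_to_words(n: int) -> List[str]:
        """Spell out 0 <= n < 1000 (toy stand-in for a real number expander)."""
        if n < 20:
            return [ONES[n]]
        if n < 100:
            tens, ones = divmod(n, 10)
            return [TENS[tens]] + ([ONES[ones]] if ones else [])
        hundreds, rest = divmod(n, 100)
        return [ONES[hundreds], "hundred"] + (number_to_words(rest) if rest else [])


    def tokenize(text: str) -> List[List[str]]:
        """Return sentences as lists of lowercase word tokens."""
        sentences: List[List[str]] = []
        current: List[str] = []
        for raw in text.split():                           # split on whitespace
            raw = ABBREVIATIONS.get(raw.lower(), raw)      # expand abbreviations
            for piece in re.findall(r"\w+|[^\w\s]", raw):  # split off punctuation
                if piece in SENTENCE_END:                  # close the current sentence
                    if current:
                        sentences.append(current)
                    current = []
                elif not re.match(r"\w", piece):           # drop non-word tokens
                    continue
                elif piece.isdecimal() and int(piece) < 1000:
                    current.extend(number_to_words(int(piece)))  # expand numbers
                else:
                    current.append(piece.lower())          # lower-case filter
        if current:
            sentences.append(current)
        return sentences

For example, ``tokenize("Dr. Jones paid 42 dollars, then left! Wow.")`` returns ``[["doctor", "jones", "paid", "forty", "two", "dollars", "then", "left"], ["wow"]]``. Because abbreviations are expanded before punctuation is split off, a sentence-final abbreviation would swallow its period; a real tokenizer needs extra rules for that case.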

Once tokenized, phonemization predicts the phonetic pronunciation for each word by:

* Looking up each word in an SQLite database
* Guessing the pronunciation of words missing from the database with a pre-trained model (see :py:mod:`gruut.g2p`)
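
A minimal sketch of this lookup-then-guess flow using Python's built-in ``sqlite3`` module. The ``word_phonemes`` table schema and the ``naive_guess`` fallback are hypothetical stand-ins: gruut's actual database layout may differ, and a real phonemizer would call a trained grapheme-to-phoneme model (see :py:mod:`gruut.g2p`) instead of emitting one symbol per letter.

.. code-block:: python

    import sqlite3
    from typing import Callable, List


    def make_lexicon() -> sqlite3.Connection:
        """Build a tiny in-memory lexicon (hypothetical schema)."""
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE word_phonemes (word TEXT, phonemes TEXT)")
        conn.executemany(
            "INSERT INTO word_phonemes (word, phonemes) VALUES (?, ?)",
            [("read", "ɹ ˈi d"), ("read", "ɹ ˈɛ d"), ("it", "ˈɪ t")],
        )
        return conn


    def naive_guess(word: str) -> List[str]:
        """Hypothetical placeholder for a trained G2P model: one symbol per letter."""
        return list(word)


    def phonemize_word(
        conn: sqlite3.Connection,
        word: str,
        guess: Callable[[str], List[str]] = naive_guess,
    ) -> List[List[str]]:
        """Return every known pronunciation of a word, or a single guessed one."""
        rows = conn.execute(
            "SELECT phonemes FROM word_phonemes WHERE word = ? ORDER BY rowid",
            (word,),
        ).fetchall()
        if rows:
            # Lexicon hit: phonemes are stored space-separated, in insertion order.
            return [row[0].split() for row in rows]
        # Lexicon miss: fall back to the guesser.
        return [guess(word)]

With this lexicon, ``phonemize_word(make_lexicon(), "read")`` returns both stored pronunciations, leaving the choice between them to a later step, while an unknown word such as ``"gruut"`` falls through to ``naive_guess``.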

In cases where more than one pronunciation is possible for a word, the "best" pronunciation is chosen in this order of precedence:

* The pronunciation specified by the user: with word indexes enabled, a word of the form ``word_N`` selects pronunciation ``N`` (1-based)
* Otherwise, the pronunciation with the most compatible :ref:`features`
* Otherwise, the first pronunciation
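
The precedence rules above can be sketched as follows. The candidate representation (phonemes paired with a dictionary of preferred features), the ``pos`` feature name and its tag values, and the scoring rule (count of matching features, ties going to the earlier candidate) are assumptions made for illustration; gruut's own notion of "compatible" features may differ.

.. code-block:: python

    import re
    from typing import Dict, List, Optional, Tuple

    # A candidate pronunciation: phonemes plus the features it prefers (hypothetical).
    Candidate = Tuple[List[str], Dict[str, str]]

    WORD_INDEX = re.compile(r"^(.+)_(\d+)$")


    def split_word_index(token: str) -> Tuple[str, Optional[int]]:
        """Split "word_N" into ("word", N); other tokens get no index."""
        match = WORD_INDEX.match(token)
        if match:
            return match.group(1), int(match.group(2))
        return token, None


    def choose_pronunciation(
        candidates: List[Candidate],
        token_features: Dict[str, str],
        word_index: Optional[int] = None,
    ) -> List[str]:
        # 1. An explicit, in-range 1-based index from the user wins.
        if word_index is not None and 1 <= word_index <= len(candidates):
            return candidates[word_index - 1][0]

        # 2. Otherwise prefer the candidate matching the most token features.
        best: Optional[List[str]] = None
        best_score = 0
        for phonemes, preferred in candidates:
            score = sum(1 for name, value in preferred.items()
                        if token_features.get(name) == value)
            if score > best_score:
                best, best_score = phonemes, score
        if best is not None:
            return best

        # 3. Fall back to the first candidate.
        return candidates[0][0]

For example, if the two candidates for "wind" prefer ``{"pos": "VB"}`` and ``{"pos": "NN"}``, a token with features ``{"pos": "NN"}`` gets the second one, while ``split_word_index("wind_1")`` returns ``("wind", 1)``, which forces the first.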


.. _features:

Features
^^^^^^^^

gruut tokens can contain arbitrary features. For now, only part-of-speech tags are implemented, for English and French.

When determining the "best" pronunciation for a word, a phonemizer may consult these features. In English, for example, some pronunciations in the lexicon are marked with a "preferred" part of speech. A word like "wind" is pronounced differently depending on whether it is used as a verb or a noun: if the token "wind" is predicted to be a noun during tokenization, the pronunciation "w ˈɪ n d" is selected instead of "w ˈaɪ n d".

French uses part-of-speech tags differently: during the post-processing phase of phonemization, they are used to add liaisons between words. For example, in the sentence "J’ai des petites oreilles.", "petites" is pronounced "p ə t i t z" instead of "p ə t i t".
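
The liaison step can be pictured as a small post-processing pass over tagged words. This sketch is deliberately simplified and hypothetical: the ``ADJ`` tag, the plural-spelling test (word ends in "s" or "x"), and the vowel set are assumptions, and real French liaison depends on many more conditions than shown here.

.. code-block:: python

    from typing import List, Tuple

    # Hypothetical set of vowel phonemes that can trigger liaison.
    VOWEL_PHONEMES = {"a", "e", "i", "o", "u", "y", "ə", "ɛ", "ɔ", "ø", "œ",
                      "ɑ̃", "ɛ̃", "ɔ̃", "œ̃"}

    # A tagged word: (text, part-of-speech tag, phonemes).
    TaggedWord = Tuple[str, str, List[str]]


    def add_liaisons(words: List[TaggedWord]) -> List[List[str]]:
        """Append /z/ to adjectives spelled with a final s/x before a vowel phoneme."""
        result: List[List[str]] = []
        for i, (text, pos, phonemes) in enumerate(words):
            phonemes = list(phonemes)  # copy so the input is not modified
            if i + 1 < len(words):
                next_phonemes = words[i + 1][2]
                if (pos == "ADJ"
                        and text.endswith(("s", "x"))
                        and next_phonemes
                        and next_phonemes[0] in VOWEL_PHONEMES):
                    phonemes.append("z")
            result.append(phonemes)
        return result

For "des petites oreilles", tagged ``DET``, ``ADJ`` and ``NOUN`` with phonemes "d e", "p ə t i t" and "ɔ ʁ ɛ j", only "petites" gains the trailing ``z``, matching the example above.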

.. toctree::
   :maxdepth: 2
   :caption: Contents:



Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
7 changes: 7 additions & 0 deletions docs/build/_sources/modules.rst.txt
@@ -0,0 +1,7 @@
gruut
=====

.. toctree::
   :maxdepth: 4

   gruut