
API ‐ Standard TTS Generation API

erew123 edited this page Sep 30, 2024 · 5 revisions

This endpoint allows you to generate Text-to-Speech (TTS) audio based on text input. It supports both character and narrator speech generation.

Endpoint Details

  • URL: http://{ipaddress}:{port}/api/tts-generate
  • Method: POST
  • Content-Type: application/x-www-form-urlencoded
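The endpoint details above can be turned into a request with Python's standard library. This is a minimal sketch that builds the request without sending it. The helper name `build_tts_request` is hypothetical. Only the URL path, the POST method and the form-urlencoded content type come from this page.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tts_request(ipaddress: str, port: int, fields: dict) -> Request:
    """Build a POST request for /api/tts-generate (hypothetical helper)."""
    url = f"http://{ipaddress}:{port}/api/tts-generate"
    # Form-encode the fields. Pass booleans as the strings "true"/"false",
    # as the curl examples do; urlencode would turn Python True into "True".
    body = urlencode(fields).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_tts_request("127.0.0.1", 7851, {"text_input": "Hello there.", "language": "en"})

# Sending it requires a running AllTalk server, for example:
#   import json, urllib.request
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```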

Request Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| text_input | string | The text you want the TTS engine to speak. |
| text_filtering | string | Text filtering mode. Options: none, standard, html |
| character_voice_gen | string | The name of the character's voice file (WAV format). |
| rvccharacter_voice_gen | string | The name of the RVC voice file for the character. Format: folder\file.pth or Disabled |
| rvccharacter_pitch | integer | The pitch of the character's RVC voice. Range: -24 to 24 |
| narrator_enabled | boolean | Enable or disable the narrator. |
| narrator_voice_gen | string | The name of the narrator's voice file (WAV format). |
| rvcnarrator_voice_gen | string | The name of the RVC voice file for the narrator. Format: folder\file.pth or Disabled |
| rvcnarrator_pitch | integer | The pitch of the narrator's RVC voice. Range: -24 to 24 |
| text_not_inside | string | How to voice text that is not inside quotes or asterisks. Options: character, narrator, silent |
| language | string | The language for TTS generation (see Supported Languages below). |
| output_file_name | string | The name of the output file, without the .wav extension. |
| output_file_timestamp | boolean | Add a timestamp to the output file name. |
| autoplay | boolean | Whether to play the generated audio through your default sound output device. |
| autoplay_volume | float | The autoplay volume. Range: 0.1 to 1.0 |
| speed | float | The speed of the generated audio. Range: 0.25 to 2.0 |
| pitch | integer | The pitch of the generated audio. Range: -10 to 10 |
| temperature | float | The temperature for the TTS engine. Range: 0.1 to 1.0 |
| repetition_penalty | float | The repetition penalty for the TTS engine. Range: 1.0 to 20.0 |

Supported Languages

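| Code | Language |
| --- | --- |
| ar | Arabic |
| zh-cn | Chinese (Simplified) |
| cs | Czech |
| nl | Dutch |
| en | English |
| fr | French |
| de | German |
| hi | Hindi (limited support) |
| hu | Hungarian |
| it | Italian |
| ja | Japanese |
| ko | Korean |
| pl | Polish |
| pt | Portuguese |
| ru | Russian |
| es | Spanish |
| tr | Turkish |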
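The ranges, options and language codes in the parameter and language tables above can be checked on the client before a request is sent. This is a minimal Python sketch. The names `validate_tts_params`, `NUMERIC_RANGES`, `INTEGER_FIELDS` and `CHOICES` are hypothetical. How the server itself validates input is not described on this page, and the loaded engine may still ignore settings it does not support.

```python
# Inclusive ranges copied from the parameter table on this page.
NUMERIC_RANGES = {
    "rvccharacter_pitch": (-24, 24),
    "rvcnarrator_pitch": (-24, 24),
    "autoplay_volume": (0.1, 1.0),
    "speed": (0.25, 2.0),
    "pitch": (-10, 10),
    "temperature": (0.1, 1.0),
    "repetition_penalty": (1.0, 20.0),
}

# Fields documented as integers.
INTEGER_FIELDS = {"rvccharacter_pitch", "rvcnarrator_pitch", "pitch"}

# Fields with a fixed set of documented options (language codes from the table).
CHOICES = {
    "text_filtering": {"none", "standard", "html"},
    "text_not_inside": {"character", "narrator", "silent"},
    "language": {"ar", "zh-cn", "cs", "nl", "en", "fr", "de", "hi", "hu",
                 "it", "ja", "ko", "pl", "pt", "ru", "es", "tr"},
}


def validate_tts_params(params: dict) -> list:
    """Return readable problems; an empty list means no documented limit is broken."""
    problems = []
    for name, value in params.items():
        if name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append(f"{name}: {value!r} is not a number")
                continue
            if not low <= number <= high:
                problems.append(f"{name}: {value!r} is outside {low} to {high}")
            if name in INTEGER_FIELDS and not number.is_integer():
                problems.append(f"{name}: {value!r} is not a whole number")
        elif name in CHOICES and value not in CHOICES[name]:
            problems.append(f"{name}: {value!r} is not one of {sorted(CHOICES[name])}")
        # Any other field (text_input, voice names, ...) is passed through unchecked.
    return problems


print(validate_tts_params({"speed": "3", "language": "en"}))
```

Values are checked as strings or numbers, since form fields arrive as text either way.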

Example Requests

Standard TTS Speech Example

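Run the following at a command prompt/terminal to generate a time-stamped file from standard text. Because autoplay=false, the audio is not played; set autoplay=true to play it through your sound output device: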

curl -X POST "http://127.0.0.1:7851/api/tts-generate" \
     -d "text_input=All of this is text spoken by the character. This is text not inside quotes, though that doesn't matter in the slightest" \
     -d "text_filtering=standard" \
     -d "character_voice_gen=female_01.wav" \
     -d "narrator_enabled=false" \
     -d "narrator_voice_gen=male_01.wav" \
     -d "text_not_inside=character" \
     -d "language=en" \
     -d "output_file_name=myoutputfile" \
     -d "output_file_timestamp=true" \
     -d "autoplay=false" \
     -d "autoplay_volume=0.8"

Narrator Example

Run the following at a command prompt/terminal to generate a time-stamped file from text containing both narrator and character speech. Because autoplay=false, the audio is not played:

curl -X POST "http://127.0.0.1:7851/api/tts-generate" \
     -d "text_input=*This is text spoken by the narrator* \"This is text spoken by the character\". This is text not inside quotes." \
     -d "text_filtering=standard" \
     -d "character_voice_gen=female_01.wav" \
     -d "narrator_enabled=true" \
     -d "narrator_voice_gen=male_01.wav" \
     -d "text_not_inside=character" \
     -d "language=en" \
     -d "output_file_name=myoutputfile" \
     -d "output_file_timestamp=true" \
     -d "autoplay=false" \
     -d "autoplay_volume=0.8"

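Note: If your text contains double quotes, escape them with \" inside the double-quoted curl argument (see the narrator example); in a POSIX shell such as bash, the shell removes the backslash before the text is sent. curl's -d option sends the value without URL-encoding it, so text containing & or + should be passed with --data-urlencode instead.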
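When the endpoint is called from code rather than a shell, the \" escaping used in the curl narrator example is not needed: a form encoder escapes quotes, asterisks, & and + itself. This is a minimal Python sketch using the narrator example text. The helper name `encode_tts_form` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def encode_tts_form(fields: dict) -> str:
    """Form-encode request fields (hypothetical helper for illustration)."""
    # urlencode percent-escapes '"', '*', '&' and '+', and encodes spaces as '+',
    # so the text needs no manual escaping.
    return urlencode(fields)


# The same text as the narrator curl example, written without shell escaping.
narrator_text = '*This is text spoken by the narrator* "This is text spoken by the character". This is text not inside quotes.'
body = encode_tts_form({"text_input": narrator_text, "narrator_enabled": "true"})

# Decoding the body gives back the original text unchanged.
print(parse_qs(body)["text_input"][0] == narrator_text)  # True
```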

Minimal Request Example

You can send any combination of settings. Fields you omit are filled in from the API global defaults and the loaded TTS engine's default settings. The smallest request sends only text_input:

curl -X POST "http://127.0.0.1:7851/api/tts-generate" \
     -d "text_input=All of this is text spoken by the character. This is text not inside quotes, though that doesn't matter in the slightest"

Response

The API returns a JSON object with the following properties:

| Property | Description |
| --- | --- |
| status | Whether generation succeeded (generate-success) or failed (generate-failure). |
| output_file_path | The on-disk location of the generated WAV file. |
| output_file_url | The HTTP path for playing the generated WAV file in a browser. |
| output_cache_url | The HTTP path for fetching the generated WAV file as a pushed download. |

Example response:

{
    "status": "generate-success",
    "output_file_path": "C:\\text-generation-webui\\extensions\\alltalk_tts\\outputs\\myoutputfile_1704141936.wav",
    "output_file_url": "/audio/myoutputfile_1704141936.wav",
    "output_cache_url": "/audiocache/myoutputfile_1704141936.wav"
}

Note: The response no longer includes the IP address and port number. output_file_url and output_cache_url are relative paths, so your own software/extension must prepend http://{ipaddress}:{port} to them.
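Because the URLs in the response are relative, a client has to add the server address itself. This is a minimal Python sketch, assuming the JSON shape shown in the example response. The helper name `resolve_tts_response` and the choice to raise an exception on anything other than generate-success are illustrative, not part of the API.

```python
import json
from urllib.parse import urljoin


def resolve_tts_response(raw: str, ipaddress: str, port: int) -> dict:
    """Parse a /api/tts-generate response and make its URLs absolute (hypothetical helper)."""
    data = json.loads(raw)
    if data.get("status") != "generate-success":
        raise RuntimeError(f"TTS generation failed: {data.get('status')!r}")
    base = f"http://{ipaddress}:{port}"
    # The response URLs are server-relative paths such as "/audio/<file>.wav".
    for key in ("output_file_url", "output_cache_url"):
        data[key] = urljoin(base, data[key])
    return data


example = json.dumps({
    "status": "generate-success",
    "output_file_url": "/audio/myoutputfile_1704141936.wav",
    "output_cache_url": "/audiocache/myoutputfile_1704141936.wav",
})
result = resolve_tts_response(example, "127.0.0.1", 7851)
print(result["output_file_url"])  # http://127.0.0.1:7851/audio/myoutputfile_1704141936.wav
```

urljoin is used rather than string concatenation so that a path starting with "/" is joined cleanly onto the host and port.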

Additional Notes

  • All global settings for the API endpoint can be configured within the AllTalk interface under Global Settings > AllTalk API Defaults.
  • TTS engine-specific settings, such as voices to use or engine parameters, can be set on an engine-by-engine basis in TTS Engine Settings > TTS Engine of your choice.
  • You can send any of these settings, but the loaded TTS engine only applies the ones it supports. For example, you can request a TTS generation in Russian, but if the loaded TTS model only supports English, it will only generate English-sounding text-to-speech.
  • Voices named in the request must match voices available in the loaded TTS engine. If they don't, nothing is generated and an error message may be returned.