Releases: DrewThomasson/ebook2audiobook
V2.0: Tons of improvements and support for 1,107+ languages! 🤯
- New, improved v2.0 GUI
- Easy access to fine-tuned models
- Loading bar actually works 🤯
- Support for 1,107+ languages! 🤯
- Single run/installer script for running locally on Mac, Windows, and Linux
- THANK YOU @ROBERT-MCDOWELL
What's Changed
- Major update version 2.0.0 by @ROBERT-MCDOWELL in #35
- pull attempty into v2.0 by @DrewThomasson in #43
- swapped download_xttsv2_model with existing download_and_extract by @DrewThomasson in #44
- PR#1 by @ROBERT-MCDOWELL in #45
- V2.0 base model downloader patch by @DrewThomasson in #46
- Update README.md by @ROBERT-MCDOWELL in #48
- Update colab_ebook2audiobookxtts.ipynb by @pafend in #49
- renamed split_long_sentence to get_sentences, added punctuation for each language into language_mapping, unused code removed by @ROBERT-MCDOWELL in #50
- Added 1162 languages, removed unused code by @ROBERT-MCDOWELL in #51
- Last PR before merge to main by @ROBERT-MCDOWELL in #57
- Added around 57 more test_ebooks as well as an ebook generator by @DrewThomasson in #58
- Add disclaimer to README about DRM and legal use by @DrewThomasson in #61
- delete .DS_Store files by @DrewThomasson in #62
- added Fairseq supported language list to english readme by @DrewThomasson in #63
- Major commit by @ROBERT-MCDOWELL in #64
- more fixes by @ROBERT-MCDOWELL in #65
- rebuild test files, new tools folder, various typo fixes by @ROBERT-MCDOWELL in #66
- Rebuild voices folder tree by @ROBERT-MCDOWELL in #67
- regenerate test files, various bug fixes, new resume process implementation by @ROBERT-MCDOWELL in #70
- double quotes to simple quotes normalization by @ROBERT-MCDOWELL in #71
- various fixes by @ROBERT-MCDOWELL in #73
- more fixes by @ROBERT-MCDOWELL in #74
- Multiprocessing, multithread, multiuser ready, various changes and fixes by @ROBERT-MCDOWELL in #75
- fix model loading, removed unused conf model options by @ROBERT-MCDOWELL in #77
- Minor spelling and punctuation corrections by @Bynanaa in #79
- fix chapter audio and sentences order, device gpu to cuda by @ROBERT-MCDOWELL in #81
- added voice actor voices by @DrewThomasson in #80
- implementation of fine-tuned and various fixes by @ROBERT-MCDOWELL in #82
- fixed fine-tuned dropdown by @ROBERT-MCDOWELL in #83
- fixed chapter combine audio bug. fine-tuned model cache still to solve by @ROBERT-MCDOWELL in #84
- Added discord server link to readme by @DrewThomasson in #86
- fixed fine-tuned, custom model upload back and working(?) by @ROBERT-MCDOWELL in #87
- added ref.wav in custom model upload by @ROBERT-MCDOWELL in #88
- custom_model should work now by @ROBERT-MCDOWELL in #89
- v2.0 updated gui demo gif by @DrewThomasson in #90
- various fixes by @ROBERT-MCDOWELL in #91
- added convert to gif script to make it easier to create a new gif for readme by @DrewThomasson in #92
- V2.0 update readme improved and added assets folder by @DrewThomasson in #93
- added new options in conf.py, custom_model still in dev by @ROBERT-MCDOWELL in #94
- updated custom_model now managed by session, fixed various bugs by @ROBERT-MCDOWELL in #97
- fix conf inversion, added BobRoss in fine-tuned by @ROBERT-MCDOWELL in #98
- added fine tuned models in conf.py, renamed some by @ROBERT-MCDOWELL in #99
- optimize audio ffmpeg cmd by @ROBERT-MCDOWELL in #100
- Various fixes by @ROBERT-MCDOWELL in #101
- fixed crlf to lf unix on app.py, various other important fixes by @ROBERT-MCDOWELL in #104
- removed old code by @ROBERT-MCDOWELL in #105
- V2.0 update readme and Mac launcher by @DrewThomasson in #106
- fix convert_btn and more... by @ROBERT-MCDOWELL in #108
- fixed f-string errors by @ROBERT-MCDOWELL in #109
- optimizing audio presence by @ROBERT-MCDOWELL in #110
- various fixes by @ROBERT-MCDOWELL in #112
- V2.0: Tons of improvements and support for 1,107+ languages! 🤯 by @DrewThomasson in #111
Full Changelog: 1.2.1...2.0
V1.2.1
Fixed custom model loading issue.
What's Changed
- chinese readme by @WUYIN66 in #25
- Installation with pip in edit mode by @ROBERT-MCDOWELL in #26
- Revert "Installation with pip in edit mode" by @DrewThomasson in #27
- Merge new Kaggle additions by @DrewThomasson in #29
New Contributors
- @WUYIN66 made their first contribution in #25
- @ROBERT-MCDOWELL made their first contribution in #26
- @DrewThomasson made their first contribution in #27
Full Changelog: 1.2...1.2.1
V1.2
New and improved App
- Single app that runs in GUI or headless mode
- Fixed sentence splitting for all 16 languages
New and Improved Web GUI
Added these parameters for headless mode:
usage: app.py [-h] [--share SHARE] [--headless HEADLESS] [--ebook EBOOK] [--voice VOICE]
              [--language LANGUAGE] [--use_custom_model USE_CUSTOM_MODEL]
              [--custom_model CUSTOM_MODEL] [--custom_config CUSTOM_CONFIG]
              [--custom_vocab CUSTOM_VOCAB] [--custom_model_url CUSTOM_MODEL_URL]
              [--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY]
              [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K] [--top_p TOP_P]
              [--speed SPEED] [--enable_text_splitting ENABLE_TEXT_SPLITTING]
Convert eBooks to Audiobooks using a Text-to-Speech model. You can either launch the
Gradio interface or run the script in headless mode for direct conversion.
options:
  -h, --help            show this help message and exit
  --share SHARE         Set to True to enable a public shareable Gradio link. Defaults
                        to False.
  --headless HEADLESS   Set to True to run in headless mode without the Gradio
                        interface. Defaults to False.
  --ebook EBOOK         Path to the ebook file for conversion. Required in headless
                        mode.
  --voice VOICE         Path to the target voice file for TTS. Optional, uses a default
                        voice if not provided.
  --language LANGUAGE   Language for the audiobook conversion. Options: en, es, fr, de,
                        it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko. Defaults to
                        English (en).
  --use_custom_model USE_CUSTOM_MODEL
                        Set to True to use a custom TTS model. Defaults to False. Must
                        be True to use custom models, otherwise you'll get an error.
  --custom_model CUSTOM_MODEL
                        Path to the custom model file (.pth). Required if using a custom
                        model.
  --custom_config CUSTOM_CONFIG
                        Path to the custom config file (config.json). Required if using
                        a custom model.
  --custom_vocab CUSTOM_VOCAB
                        Path to the custom vocab file (vocab.json). Required if using a
                        custom model.
  --custom_model_url CUSTOM_MODEL_URL
                        URL to download the custom model as a zip file. Optional, but
                        will be used if provided. Examples include David Attenborough's
                        model:
                        'https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true'.
                        More XTTS fine-tunes can be found on my Hugging Face at
                        'https://huggingface.co/drewThomasson'.
  --temperature TEMPERATURE
                        Temperature for the model. Defaults to 0.65. Higher temperatures
                        lead to more creative output, i.e. more hallucinations. Lower
                        temperatures give more monotone output, i.e. fewer
                        hallucinations.
  --length_penalty LENGTH_PENALTY
                        A length penalty applied to the autoregressive decoder. Defaults
                        to 1.0.
  --repetition_penalty REPETITION_PENALTY
                        A penalty that prevents the autoregressive decoder from
                        repeating itself. Defaults to 2.0.
  --top_k TOP_K         Top-k sampling. Lower values mean more likely outputs and
                        increased audio generation speed. Defaults to 50.
  --top_p TOP_P         Top-p sampling. Lower values mean more likely outputs and
                        increased audio generation speed. Defaults to 0.8.
  --speed SPEED         Speed factor for the speech generation, i.e. how fast the
                        narrator will speak. Defaults to 1.0.
  --enable_text_splitting ENABLE_TEXT_SPLITTING
                        Enable splitting text into sentences. Defaults to True.
Example: python script.py --headless --ebook path_to_ebook --voice path_to_voice
--language en --use_custom_model True --custom_model model.pth --custom_config
config.json --custom_vocab vocab.json
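For scripting several conversions, the headless options above could be assembled from Python. This is only a rough sketch and not part of the project: `build_headless_command` is a hypothetical helper, and it assumes the script is named `app.py` (as in the usage line) and that `--headless` and `--use_custom_model` take an explicit `True` value (as the `--headless HEADLESS` metavar suggests). It also assumes a custom model is supplied as three local files and ignores `--custom_model_url`.

```python
import shlex
import sys

# Language codes copied from the --language help text above.
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko",
}


def build_headless_command(ebook, voice=None, language="en",
                           custom_model=None, custom_config=None,
                           custom_vocab=None, script="app.py"):
    """Return an argv list for a headless run (hypothetical helper).

    Only flags listed in the help text are used; the explicit "True"
    values are an assumption based on the usage line.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")

    cmd = [sys.executable, script,
           "--headless", "True",
           "--ebook", str(ebook),
           "--language", language]

    if voice is not None:
        cmd += ["--voice", str(voice)]

    custom_files = (custom_model, custom_config, custom_vocab)
    if any(path is not None for path in custom_files):
        # The help text marks all three files as required for a custom model.
        if not all(path is not None for path in custom_files):
            raise ValueError("custom model needs model, config and vocab files")
        cmd += ["--use_custom_model", "True",
                "--custom_model", str(custom_model),
                "--custom_config", str(custom_config),
                "--custom_vocab", str(custom_vocab)]
    return cmd


if __name__ == "__main__":
    command = build_headless_command("book.epub", language="fr")
    # Print the shell-quoted command; pass `command` to subprocess.run(...)
    # to actually start a conversion.
    print(shlex.join(command))
```

Returning a list rather than a single string avoids shell-quoting problems with ebook paths that contain spaces.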
Full Changelog: 1.1...1.2