Kokoro-based TTS Extension for oobabooga text-generation-webui

License

This project is licensed under the MIT License and is based on the original Kokoro 82M inference code.

The model weights are NOT covered by the MIT License; they are licensed under the Apache 2.0 License. The model weights are downloaded directly from Hugging Face.

Installation

You need to install espeak and ffmpeg.
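As a quick sanity check before installing the Python packages, a small script along the following lines can report whether both tools are reachable on your PATH. This is only a sketch: the helper name missing_tools is made up for illustration, and the executable names are assumptions (on some systems espeak is installed as espeak-ng).

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them if your system uses different ones.
    missing = missing_tools(["espeak", "ffmpeg"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("espeak and ffmpeg were both found.")
```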

You can install the required Python packages by running the commands for your platform from the text-generation-webui root directory:

Windows:

.\cmd_windows.bat
pip install -r extensions\KokoroTtsTextGernerationWebui\requirements.txt

Linux:

./cmd_linux.sh
pip install -r extensions/KokoroTtsTextGernerationWebui/requirements.txt

Features

Kokoro is limited to 510 tokens per input. This extension allows you to generate longer texts by splitting the input into multiple parts and concatenating the outputs.

Two splitting methods are available:

Split by Sentence - The input is split into chunks of whole sentences, each chunk under 510 tokens.
Split by Word - The input is split into chunks of words, each chunk under 510 tokens.

The first method is recommended because it keeps the context of the text and results in better output quality.
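The two splitting strategies can be sketched roughly as follows. This is an illustration under stated assumptions, not the extension's actual code: the function names are hypothetical, and count_tokens is a stand-in that counts characters, whereas Kokoro counts tokens with its own tokenizer. Falling back to word splitting for an over-long sentence is also an assumption of this sketch.

```python
import re
from typing import Callable, List

TOKEN_LIMIT = 510  # Kokoro's per-input limit, as stated above


def count_tokens(text: str) -> int:
    # Placeholder counter: one "token" per character.
    # The real token count comes from Kokoro's tokenizer and will differ.
    return len(text)


def split_by_word(text: str, limit: int = TOKEN_LIMIT,
                  counter: Callable[[str], int] = count_tokens) -> List[str]:
    """Greedily pack words into chunks that stay under `limit` tokens."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and counter(candidate) >= limit:
            chunks.append(current)
            current = word
        else:
            # Note: a single word at or over the limit is kept as its own chunk.
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def split_by_sentence(text: str, limit: int = TOKEN_LIMIT,
                      counter: Callable[[str], int] = count_tokens) -> List[str]:
    """Greedily pack whole sentences into chunks that stay under `limit` tokens."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if counter(sentence) >= limit:
            # The sentence alone is too long: flush, then split it by word.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_by_word(sentence, limit, counter))
            continue
        candidate = f"{current} {sentence}".strip()
        if current and counter(candidate) >= limit:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be synthesized separately and the resulting audio concatenated in order.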

Multiple GPU

If you have multiple GPUs, the first one will be used by default. You can change that in src/generate.py by setting the device variable to the desired GPU.
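As a rough illustration of what such a device value might look like, the helper below builds a PyTorch-style device string for a given GPU index. It is a sketch under assumptions: the name select_device is hypothetical, and the exact form of the device variable in src/generate.py is not shown here, so check the file before editing it.

```python
def select_device(gpu_index: int, gpu_count: int) -> str:
    """Return a PyTorch-style device string, falling back to the CPU."""
    if gpu_count <= 0:
        # No GPU available at all.
        return "cpu"
    if not 0 <= gpu_index < gpu_count:
        raise ValueError(
            f"GPU index {gpu_index} is out of range for {gpu_count} GPU(s)"
        )
    return f"cuda:{gpu_index}"
```

In a PyTorch environment, gpu_count would typically come from torch.cuda.device_count(); for example, select_device(1, 2) picks the second of two GPUs.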

Roadmap

Implement the extension
Support all OS
Voice selection

Contributing

If you want to contribute to this project, feel free to create a pull request. The code is not perfect and can be improved in many ways.
