Releases: lukaszliniewicz/Pandrator
0.3
Changes
This update focuses on training again. It adds several preprocessing options that improve the quality of the dataset, and thus of the trained models, especially for languages other than English (which generally suffer from more artifacts).
To use them, please install Pandrator again (or download the largest package again).
The new options for processing source audio include:
- trimming end silence,
- removal of breath sounds,
- fade-in and -out effect,
- discarding segments that still end abruptly, even after all preprocessing, to avoid "clicks" at the end of generated sentences.
If your source audio is professional (studio quality), don't use any preprocessing options except for trimming, fade and perhaps abrupt cut-off detection.
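To make the options above more concrete, here is a minimal sketch of what trimming end silence, applying fades and flagging abrupt endings could look like. It is not Pandrator's implementation: the function names, the thresholds and the representation of audio as a plain list of floats in the range -1.0 to 1.0 are all assumptions made for illustration.

```python
from typing import List


def trim_trailing_silence(samples: List[float], threshold: float = 0.01) -> List[float]:
    """Drop everything after the last sample whose magnitude reaches the threshold."""
    end = len(samples)
    while end > 0 and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[:end]


def apply_fades(samples: List[float], fade_len: int) -> List[float]:
    """Apply a linear fade-in and fade-out of fade_len samples each."""
    n = len(samples)
    fade_len = min(fade_len, n // 2)  # never let the two fades overlap
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len  # 0.0 at the edge, rising towards 1.0
        out[i] *= gain
        out[n - 1 - i] *= gain
    return out


def ends_abruptly(samples: List[float], tail_len: int, ratio: float = 0.5) -> bool:
    """Flag a segment whose last tail_len samples are still loud relative to its peak."""
    if not samples or tail_len < 1:
        return False
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return False
    tail_peak = max(abs(s) for s in samples[-tail_len:])
    return tail_peak >= ratio * peak
```

In a pipeline along these lines, a segment for which `ends_abruptly` returns `True` would be dropped from the dataset rather than repaired, which matches the "discarding" behaviour described above.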
Self-contained packages
I've prepared packages (archives) that you can simply unpack - everything is preinstalled in its own portable conda environment. You can download them from here.
You can use the launcher to start Pandrator, update it and install new features.
Package | Contents | Unpacked Size |
---|---|---|
1 | Pandrator and Silero | 4GB |
2 | Pandrator and XTTS (CPU only) | 7GB |
3 | Pandrator and XTTS (Nvidia GPU Support) | 14GB |
4 | Pandrator, XTTS, RVC, WhisperX (for dubbing and training) and XTTS fine-tuning | 36GB |
Installer
You may use the installer/launcher below, which was created from the `pandrator_installer_launcher.py` file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually.
0.295
To update to this version, please download the installer executable and replace the previous version with it, install Pandrator again or download one of the packages. And don't forget to update Pandrator from within the launcher - WhisperX and EasyXTTSTrainer have also been updated.
Changes
This update:
- further refines the training workflow, focusing on fine-tuning Silero VAD parameters,
- addresses some issues with the installer, especially the update process,
- fixes the loading of dubbing sessions: it should now be possible to load a session, select a previously transcribed/translated .srt file and the original video file, regenerate some or all of the dubbing, and then easily create a new synchronised version without having to delete files manually.
Self-contained packages
I've prepared packages (archives) that you can simply unpack - everything is preinstalled in its own portable conda environment. You can download them from here.
You can use the launcher to start Pandrator, update it and install new features.
Package | Contents | Unpacked Size |
---|---|---|
1 | Pandrator and Silero | 4GB |
2 | Pandrator and XTTS (CPU only) | 7GB |
3 | Pandrator and XTTS (Nvidia GPU Support) | 14GB |
4 | Pandrator, XTTS, RVC, WhisperX (for dubbing and training) and XTTS fine-tuning | 36GB |
Installer
You may use the installer/launcher below, which was created from the `pandrator_installer_launcher.py` file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually.
0.29
This is a very small update that addresses several dependency-related bugs and improves the training workflow (specifically the segmentation and segment refinement of the source audio). Because the launcher was updated as well, if you were affected, please replace the old launcher executable with this one and then update Pandrator.
Self-contained packages
I've prepared packages (archives) that you can simply unpack - everything is preinstalled in its own portable conda environment. You can download them from here.
You can use the launcher to start Pandrator, update it and install new features.
Package | Contents | Unpacked Size |
---|---|---|
1 | Pandrator and Silero | 4GB |
2 | Pandrator and XTTS (CPU only) | 7GB |
3 | Pandrator and XTTS (Nvidia GPU Support) | 14GB |
4 | Pandrator, XTTS, RVC, WhisperX (for dubbing and training) and XTTS fine-tuning | 36GB |
Installer
You may use the installer/launcher below, which was created from the `pandrator_installer_launcher.py` file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually.
0.28
This update includes several enhancements to the Easy XTTS Trainer, aimed at improving the quality of trained models and providing more control over the training process.
- Improved Audio Segmentation: The trainer now identifies optimal split points between segments by locating the quietest points in the audio. This method results in cleaner transitions between segments, reducing the likelihood of abrupt cutoffs or the inclusion of fragments of the previous or next segment, which in turn improves the overall quality and naturalness of the synthesized speech and helps eliminate artifacts.
- Integrated Audio Preprocessing: You can now apply the following audio processing steps directly within Pandrator as a part of the training workflow:
  - Normalization: Normalize audio to a target LUFS value (default -16.0). Use `--normalize <value>` to specify a different target.
  - De-essing: Reduce sibilance with the `--dess` flag.
  - Noise Reduction: Apply DeepFilterNet noise reduction with `--denoise`.
  - Dynamic Range Compression: Use the `--compress` option with profiles for `male`, `female`, or `neutral` voices.
  - Sample Rate Control: Use `--sample-rate` to explicitly set the sample rate (22050Hz or 44100Hz). 22050Hz is recommended.
- Training Options:
  - Training/Validation Split: The `--training-proportion` argument (e.g., `--training-proportion 8_2`) now controls the train/validation split ratio.
  - Segmentation Methods: The trainer supports three segmentation methods: `maximise-punctuation`, `punctuation-only`, and `mixed`. The `--method-proportion` argument controls the ratio for the `mixed` method.
- Pandrator Integration: Trained models and reference audio samples (two: a random one from the 10% longest segments and the fastest one from the 70% longest segments) are automatically made available in Pandrator for immediate generation, as in previous versions.
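The idea of splitting at the quietest point can be sketched as follows. This is only an illustration of the general technique, not the trainer's actual code: the function names, the fixed frame length and the use of RMS energy as the loudness measure are assumptions.

```python
import math
from typing import Sequence


def frame_rms(samples: Sequence[float], start: int, frame_len: int) -> float:
    """Root-mean-square level of one frame of audio."""
    frame = samples[start:start + frame_len]
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def quietest_split_point(samples: Sequence[float], search_start: int,
                         search_end: int, frame_len: int) -> int:
    """Return the sample index at the centre of the quietest frame
    inside the search window [search_start, search_end)."""
    last_start = min(search_end, len(samples)) - frame_len
    if frame_len < 1 or last_start < search_start:
        raise ValueError("search window is shorter than one frame")
    best_start = search_start
    best_rms = float("inf")
    # Step through the window frame by frame and remember the lowest level.
    for start in range(search_start, last_start + 1, frame_len):
        rms = frame_rms(samples, start, frame_len)
        if rms < best_rms:
            best_rms = rms
            best_start = start
    return best_start + frame_len // 2
```

Cutting in the middle of the quietest frame, rather than at a fixed time, is what keeps fragments of the neighbouring segment out of the result.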
These changes provide more precise control over the training process and should result in higher-quality custom XTTS voices.
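As an illustration of the train/validation split, the sketch below parses a value such as `8_2` and divides a list of segments accordingly. It assumes that `8_2` means 8 parts training to 2 parts validation; the helper names and the seeded shuffle are hypothetical and not taken from the trainer.

```python
import random
from typing import List, Sequence, Tuple


def parse_proportion(value: str) -> float:
    """Turn a string like '8_2' into the training fraction (0.8 here)."""
    train, val = (int(part) for part in value.split("_"))
    if train <= 0 or val < 0:
        raise ValueError(f"invalid proportion: {value!r}")
    return train / (train + val)


def split_segments(segments: Sequence[str], proportion: str,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle the segments reproducibly and split them into train and validation lists."""
    fraction = parse_proportion(proportion)
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]
```

For example, `split_segments(files, "8_2")` on ten files would put eight of them in the training list and two in the validation list.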
Self-contained packages
I've prepared packages (archives) that you can simply unpack - everything is preinstalled in its own portable conda environment. You can download them from here.
You can use the launcher to start Pandrator, update it and install new features.
Package | Contents | Unpacked Size |
---|---|---|
1 | Pandrator and Silero | 4GB |
2 | Pandrator and XTTS (CPU only) | 7GB |
3 | Pandrator and XTTS | 14GB |
4 | Pandrator, XTTS, RVC, WhisperX (for dubbing) and XTTS fine-tuning | 36GB |
Installer
You may use the installer/launcher below, which was created from the `pandrator_installer_launcher.py` file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually.
0.27
EDIT (28 Oct): There was a bug that prevented Pandrator from launching under certain circumstances. It has been fixed. If you were affected, please download the launcher from this release and use the update option.
This is a very small update. I added the ability to crop PDFs before text extraction (to remove headers and footers) and to remove pages that are not needed for TTS (like the title page or the table of contents) using PyCropPDF.
You can use the Update option in the Launcher.
Self-contained packages
I've prepared packages (archives) that you can simply unpack - everything is preinstalled in its own portable conda environment. You can download them from here.
You can use the launcher to start Pandrator, update it and install new features.
Package | Contents | Unpacked Size |
---|---|---|
1 | Pandrator and Silero | 4GB |
2 | Pandrator and XTTS | 14GB |
3 | Pandrator, XTTS, RVC, WhisperX (for dubbing) and XTTS fine-tuning | 36GB |
Installer
You may use the installer/launcher below, which was created from the `pandrator_installer_launcher.py` file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually.
0.26
This release focuses on the installer. Chocolatey is now used instead of winget because it installs the Build Tools more reliably, and the handling of starting the XTTS server has been improved. Hopefully this will solve the issue some people experienced with the server not coming online when started from the launcher.
Self-contained packages
I've prepared packages (archives) that you can simply unpack - everything is preinstalled in its own portable conda environment. You can download them from here.
You can use the launcher to start Pandrator, update it and install new features.
Package | Contents | Unpacked Size |
---|---|---|
1 | Pandrator and Silero | 4GB |
2 | Pandrator and XTTS | 14GB |
3 | Pandrator, XTTS, RVC, WhisperX (for dubbing) and XTTS fine-tuning | 36GB |
Installer
You may use the installer/launcher below, which was created from the `pandrator_installer_launcher.py` file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually.
0.25
Changes
- Introduced marking sentences for regeneration and saving them as a list, using a button, the "m" key or a right-click. This can be useful when generating a longer text: you can mark problematic sentences and work on them later. A right-click saves both the currently playing sentence and the previous one, while the "m" key saves just the current sentence (if you're not looking at the playlist while listening, it might otherwise be difficult to catch the right sentence in time).
- Added downloading videos from YouTube (and other web sources) using yt-dlp (for the dubbing/subtitle/translation workflow).
- Refined the metadata options and handling.
- Small bug fixes and improvements.
Self-contained packages
I've prepared packages (archives) that you can simply unpack - everything is preinstalled in its own portable conda environment. You can use the launcher to start Pandrator, update it and install new features, depending on the version of the package you downloaded.
Package | Contents | Unpacked Size | Link |
---|---|---|---|
1 | Pandrator and Silero | 4GB | Download |
2 | Pandrator and XTTS | 14GB | Download |
3 | Pandrator, XTTS, RVC, WhisperX (for dubbing) and XTTS fine-tuning | 36GB | Download |
Installer
You may use the installer/launcher below, which was created from the `pandrator_installer_launcher.py` file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually.
0.2
Changes
- The UI takes the whole width of the screen now and consists of two parts - the settings on the left and the generated sentences player/editor on the right.
- Preprocessing of long files has been significantly sped up through parallelisation. This led to a 3-4x time reduction.
- Introduced metadata: the ability to set the album title, the artist, the genre and upload a cover image.
- Added support for `.m4b`.
- Added support for chapter detection (at the moment only for epub files) and chapter markers in m4b files (if you want to have the smallest file possible, use opus - it performs very well for speech even at 16k!).
- Small improvements in the training workflow (a folder with reference samples is automatically created in the tts_voices folder when training finishes) and fixes for the RVC workflow.
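One common way to attach chapter markers to an audiobook file is an FFmpeg metadata file. The sketch below builds such a file from a list of chapters. Whether Pandrator writes its chapter markers this way is an assumption; the function names and the tuple layout are made up for illustration, and only the `;FFMETADATA1` / `[CHAPTER]` text format itself comes from FFmpeg.

```python
from typing import Iterable, Tuple


def escape_ffmetadata(value: str) -> str:
    """Backslash-escape the characters that are special in FFmpeg metadata files."""
    for ch in ("\\", "=", ";", "#", "\n"):  # backslash first to avoid double escaping
        value = value.replace(ch, "\\" + ch)
    return value


def build_ffmetadata(chapters: Iterable[Tuple[str, int, int]]) -> str:
    """Build FFmpeg metadata text from (title, start_ms, end_ms) tuples."""
    lines = [";FFMETADATA1"]
    for title, start_ms, end_ms in chapters:
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are given in milliseconds
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={escape_ffmetadata(title)}",
        ]
    return "\n".join(lines) + "\n"
```

A file produced this way can typically be given to ffmpeg as an additional input and applied with `-map_metadata`, but the exact command depends on the codec and container you choose.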
Pre-Installed Packages
You may download self-contained packages that only require unpacking from here. You don't have to install anything, all components are included in portable conda environments. You may install additional components at any time using the launcher. But please remember to update Pandrator from the launcher.
Installer
You may use the installer/launcher below, which was created from the `pandrator_installer_launcher.py` file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually.
0.15
Changes
Besides bug fixes and small UI improvements, I've added the ability to fine-tune (train) a custom XTTS model. It is very simple: just select a file or a folder with multiple audio files, give the model a name, and training will be performed fully automatically. The trained model will appear in the "XTTS Model" dropdown in the GUI after clicking on "Connect to server". An Nvidia GPU with at least 8 GB of VRAM is required. As little as 10 minutes of audio is enough to improve voice cloning results significantly compared with zero-shot, though I recommend at least 30 minutes. You may experiment with increasing the number of epochs and gradient accumulation steps. When using a custom model, you still have to provide a voice file. You may upload one of the segments produced from the source audio (they are located in `Pandrator/easy_xtts_trainer/<model_name>/audio_sources/processed`). Training models requires installing a tool through the launcher (if you have an existing installation, just download the newest launcher executable, put it in the same folder as the Pandrator folder, and use it to install the tool).
Pre-Installed Packages
You may download self-contained packages that only require unpacking from here. You don't have to install anything, all components are included in portable conda environments. You may install additional components at any time using the launcher.
Installer
You may use the installer/launcher below, which was created from the `pandrator_installer_launcher.py` file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually.
0.1
In this release, I've:
- fixed splitting of Chinese and Japanese sentences,
- added the option to regenerate all sentences,
- changed the RVC implementation to RVC Python and added it to the installer as an optional tool (RVC model files are now kept in the `rvc_models` folder inside the Pandrator folder, each in its own directory; when uploading RVC models through the UI, please make sure that the .pth and .index files have the same name),
- completely reworked the dubbing workflow by offloading most of it to a separate CLI app, Subdub, which I made for this purpose. It is installed together with Pandrator when using the installer script or executable. It is now possible to select a video file, transcribe it (using WhisperX), translate the subtitles (using LLMs, including proprietary ones, or the DeepL API, which is free up to 500,000 characters a month), generate speech using the standard Pandrator workflow, mix the dubbing audio with the original soundtrack and save it to the video; it's also possible to load an .srt file and a video if transcription is not necessary,
- added logging to a file and log preview in the UI,
- made Pandrator connect automatically to the chosen TTS engine if opened through the launcher,
- improved the UI a little.
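If you want to check the RVC naming requirement from the list above before uploading, a small helper like the following can report files without a matching partner. It is a hypothetical convenience function, not part of Pandrator, and it only compares file names.

```python
from pathlib import PurePath
from typing import Iterable, List


def unmatched_rvc_files(filenames: Iterable[str]) -> List[str]:
    """Return the base names that have a .pth without a .index, or the reverse."""
    names = list(filenames)
    pth = {PurePath(n).stem for n in names if PurePath(n).suffix.lower() == ".pth"}
    index = {PurePath(n).stem for n in names if PurePath(n).suffix.lower() == ".index"}
    # The symmetric difference holds every name present in only one of the two sets.
    return sorted(pth ^ index)
```

An empty result means every `.pth` file has an `.index` file with the same base name.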
You may use the installer/launcher below, which was created from the `pandrator_installer_launcher.py` file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually.