
0.28

@lukaszliniewicz lukaszliniewicz released this 02 Nov 05:08
· 7 commits to main since this release
3516f9d

This update includes several enhancements to the Easy XTTS Trainer, aimed at improving the quality of trained models and providing more control over the training process.


  • Improved Audio Segmentation: The trainer now chooses split points by locating the quietest point in the audio between segments. This gives cleaner transitions, with fewer abrupt cutoffs and fewer stray fragments of the previous or next segment, which in turn improves the naturalness of the synthesized speech and helps eliminate artifacts.

  • Integrated Audio Preprocessing: You can now apply the following audio processing steps directly within Pandrator as part of the training workflow:

    • Normalization: Normalize audio to a target LUFS value (default -16.0). Use --normalize <value> to specify a different target.
    • De-essing: Reduce sibilance with the --dess flag.
    • Noise Reduction: Apply DeepFilterNet noise reduction with --denoise.
    • Dynamic Range Compression: Use the --compress option with profiles for male, female, or neutral voices.
    • Sample Rate Control: Use --sample-rate to explicitly set the sample rate (22050Hz or 44100Hz). 22050Hz is recommended.
  • Training Options:

    • Training/Validation Split: The --training-proportion argument (e.g., --training-proportion 8_2) now controls the train/validation split ratio.
    • Segmentation Methods: The trainer supports three segmentation methods: maximise-punctuation, punctuation-only, and mixed. The --method-proportion argument controls the ratio for the mixed method.
  • Pandrator Integration: As in previous versions, trained models and two reference audio samples are automatically made available in Pandrator for immediate generation. The two samples are a random one from the 10% longest segments and the fastest one from the 70% longest segments.
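
To illustrate the quietest-point segmentation described in the first item above, here is a minimal sketch in plain Python. It is not the trainer's actual code: the function name `quietest_split`, the 400-sample window, and the half-window hop are all assumptions chosen for the example. It slides a window over a search range, measures mean-square energy, and returns the centre of the lowest-energy window as the split point.

```python
import math


def quietest_split(samples, start, end, window=400):
    """Return the index at the centre of the lowest-energy window
    inside samples[start:end]. Hypothetical helper, not the trainer's API."""
    if end - start < window:
        raise ValueError("search range is shorter than the window")
    best_index, best_energy = start, math.inf
    hop = max(1, window // 2)  # overlap windows by half (an assumption)
    for i in range(start, end - window + 1, hop):
        frame = samples[i:i + window]
        energy = sum(x * x for x in frame) / window  # mean-square energy
        if energy < best_energy:
            best_index, best_energy = i, energy
    return best_index + window // 2


# Synthetic signal: one second of tone, a short silent gap, another second of tone.
rate = 22050
tone = [0.5 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]
gap = [0.0] * 2000
signal = tone + gap + tone

split = quietest_split(signal, 0, len(signal))
print(split)  # an index that falls inside the silent gap
```

In practice the search range would probably be limited to a small neighbourhood around a candidate boundary (for example, around a punctuation mark's timestamp) rather than the whole file.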

These changes provide more precise control over the training process and should result in higher-quality custom XTTS voices.
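
As a rough illustration of the train/validation split mentioned above, the sketch below parses a value such as `8_2` and divides a list of segments accordingly. It assumes that `8_2` means eight parts training to two parts validation; the helper names `parse_proportion` and `split_segments`, the shuffling, and the fixed seed are hypothetical and not taken from the trainer.

```python
import random


def parse_proportion(value):
    """Turn a string like '8_2' into (train_share, validation_share)."""
    train, val = (int(part) for part in value.split("_"))
    if train <= 0 or val <= 0:
        raise ValueError("both parts must be positive integers")
    total = train + val
    return train / total, val / total


def split_segments(segments, proportion="8_2", seed=0):
    """Shuffle the segments and split them by the given proportion."""
    train_share, _ = parse_proportion(proportion)
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


segments = [f"seg_{i:03d}.wav" for i in range(10)]
train, val = split_segments(segments, "8_2")
print(len(train), len(val))
```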

Self-contained packages

I've prepared packages (archives) that you can simply unpack - everything is preinstalled in its own portable conda environment. You can download them from here.

You can use the launcher to start Pandrator, update it and install new features.

| Package | Contents | Unpacked Size |
|---|---|---|
| 1 | Pandrator and Silero | 4GB |
| 2 | Pandrator and XTTS (CPU only) | 7GB |
| 3 | Pandrator and XTTS | 14GB |
| 4 | Pandrator, XTTS, RVC, WhisperX (for dubbing) and XTTS fine-tuning | 36GB |

Installer

You may use the installer/launcher below, which was created from the pandrator_installer_launcher.py file in the repository, or use the source file directly. Please remember to run the executable as an administrator. It's possible that Windows or your antivirus software will flag it as a threat. You may whitelist it, or, if you're not comfortable doing that, review the code in the repository and install Pandrator manually.