update readmes
yxlllc committed May 13, 2023
1 parent e716efb commit e299ac2
Showing 2 changed files with 25 additions and 10 deletions.
README.md (17 changes: 12 additions & 5 deletions)
@@ -5,7 +5,7 @@ Language: **English** [简体中文](./cn_README.md) [한국어](./ko_README.md)
</div>
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing).

- ## (3.0 - Experimental) Shallow diffusion model (DDSP + Diff-SVC refactor version)
+ ## (3.0 - Update) Shallow diffusion model (DDSP + Diff-SVC refactor version)
![Diagram](diagram.png)

Data preparation and configuring the pre-trained encoder (hubert or contentvec) and vocoder (nsf-hifigan) are the same as for training a pure DDSP model.
@@ -32,7 +32,7 @@ python train_diff.py -c configs/diffusion.yaml
```bash
python train.py -c configs/combsub.yaml
```
- As mentioned above, re-preprocessing is not required, but please check whether the parameters of `combsub.yaml` and `diffusion.yaml` match. The number of speakers 'n_spk' can be inconsistent, but try to use the same number to represent the same speaker (this makes inference easier).
+ As mentioned above, re-preprocessing is not required, but please check whether the parameters of `combsub.yaml` and `diffusion.yaml` match. The number of speakers 'n_spk' can be inconsistent, but try to use the same id to represent the same speaker (this makes inference easier).
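The cross-check described above can be sketched in a few lines of Python. This is a hypothetical helper, not part of the repository; the key names (`sampling_rate`, `block_size`, `n_spk`) are illustrative assumptions, and the dicts stand in for configs loaded with `yaml.safe_load`.

```python
# Hypothetical sketch: flag shared parameters that differ between the DDSP
# config (combsub.yaml) and the diffusion config (diffusion.yaml).
# Key names are illustrative; load the real dicts with yaml.safe_load.

SHARED_KEYS = ("sampling_rate", "block_size", "n_spk")

def find_mismatches(ddsp_cfg, diff_cfg, keys=SHARED_KEYS):
    """Return (key, ddsp_value, diff_value) tuples for every differing key."""
    return [
        (k, ddsp_cfg.get(k), diff_cfg.get(k))
        for k in keys
        if ddsp_cfg.get(k) != diff_cfg.get(k)
    ]

# Example: n_spk may legitimately differ, but it is still worth a warning.
ddsp_cfg = {"sampling_rate": 44100, "block_size": 512, "n_spk": 2}
diff_cfg = {"sampling_rate": 44100, "block_size": 512, "n_spk": 1}
for k, a, b in find_mismatches(ddsp_cfg, diff_cfg):
    print(f"warning: {k} differs (combsub={a}, diffusion={b})")
```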

(4) Non-real-time inference:
@@ -41,6 +41,8 @@
```bash
python main_diff.py -i <input.wav> -ddsp <ddsp_ckpt.pt> -diff <diff_ckpt.pt> -o <output.wav> -k <keychange (semitones)> -id <speaker_id> -diffid <diffusion_speaker_id> -speedup <speedup> -method <method> -kstep <kstep>
```

'speedup' is the acceleration factor, 'method' is 'pndm' or 'dpm-solver', 'kstep' is the number of shallow diffusion steps, 'diffid' is the speaker id of the diffusion model, and the other parameters have the same meaning as in `main.py`.

A reasonable 'kstep' is about 100~300. Sound quality may perceptibly degrade when 'speedup' exceeds 20.
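As a rough illustration of those rules of thumb, a small sanity check could look like this (a hypothetical helper, not part of the repository):

```python
# Hypothetical helper: warn about shallow diffusion settings that fall
# outside the rules of thumb quoted above.

def check_diffusion_settings(kstep, speedup):
    """Return warning strings; an empty list means the settings look sane."""
    warnings = []
    if not 100 <= kstep <= 300:
        warnings.append(f"kstep={kstep}: a reasonable range is about 100-300")
    if speedup > 20:
        warnings.append(f"speedup={speedup}: quality loss may become audible above 20")
    return warnings

print(check_diffusion_settings(kstep=200, speedup=10))  # []
```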

If the same id has been used to represent the same speaker during training, '-diffid' can be empty; otherwise, the '-diffid' option needs to be specified.

If '-ddsp' is empty, the pure diffusion model is used: shallow diffusion is performed on the mel of the input source, and if '-kstep' is also empty, full-depth Gaussian diffusion is performed.
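The branching between these inference modes can be summarized as follows. This is a placeholder sketch; the function and its return strings are illustrative, not the actual control flow of `main_diff.py`.

```python
# Placeholder sketch of the mode selection implied by the -ddsp / -kstep
# options; not the real implementation in main_diff.py.

def select_inference_mode(ddsp_ckpt, kstep):
    """Map the -ddsp and -kstep options onto an inference mode."""
    if ddsp_ckpt is None:
        # Pure diffusion model: diffuse from the input source's mel.
        if kstep is None:
            return "full-depth Gaussian diffusion"
        return "shallow diffusion from the input source's mel"
    # The DDSP output provides the starting mel for shallow diffusion.
    return "shallow diffusion from the DDSP model's output"

print(select_inference_mode(None, None))  # full-depth Gaussian diffusion
```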
@@ -55,11 +57,13 @@ python gui_diff.py
## 0. Introduction
DDSP-SVC is a new open source singing voice conversion project dedicated to the development of free AI voice changer software that can be popularized on personal computers.

- Compared with the more famous [Diff-SVC](https://github.com/prophesier/diff-svc) and [SO-VITS-SVC](https://github.com/svc-develop-team/so-vits-svc), its training and synthesis have much lower requirements for computer hardware, and the training time can be shortened by orders of magnitude. In addition, when performing real-time voice changing, this project's hardware resource usage is significantly lower than SO-VITS-SVC, while Diff-SVC is too slow to perform voice changing in real time.
+ Compared with the famous [SO-VITS-SVC](https://github.com/svc-develop-team/so-vits-svc), its training and synthesis have much lower requirements for computer hardware, the training time can be shortened by orders of magnitude, and the training speed is close to that of [RVC](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI).

In addition, when performing real-time voice changing, the hardware resource consumption of this project is significantly lower than that of SO-VITS-SVC and RVC, and a lower delay can be achieved by tuning parameters on the same hardware configuration.

- Although the original synthesis quality of DDSP is not ideal (the original output can be heard in tensorboard while training), after using the pre-trained vocoder-based enhancer, the sound quality for some datasets can reach a level close to SO-VITS-SVC.
+ Although the original synthesis quality of DDSP is not ideal (the original output can be heard in tensorboard while training), after enhancing the sound quality with a pre-trained vocoder-based enhancer (old version) or with a shallow diffusion model (new version), for some datasets it can achieve synthesis quality no lower than that of SO-VITS-SVC and RVC. The demo outputs are in the `samples` folder, and the related model checkpoints can be downloaded from the release page.

- If the quality of the training data is very high, Diff-SVC will probably still have the best sound quality. The demo outputs are in the `samples` folder, and the related model checkpoints can be downloaded from the release page.
+ Models from the old version are still compatible; the following chapters are instructions for the old version. Some operations of the new version are the same; see the previous chapter.

Disclaimer: Please make sure to only train DDSP-SVC models with **legally obtained authorized data**, and do not use these models and any audio they synthesize for illegal purposes. The author of this repository is not responsible for any infringement, fraud and other illegal acts caused by the use of these model checkpoints and audio.

@@ -199,4 +203,7 @@ Update: A splicing algorithm based on a phase vocoder is now added, but in most
* [ddsp](https://github.com/magenta/ddsp)
* [pc-ddsp](https://github.com/yxlllc/pc-ddsp)
* [soft-vc](https://github.com/bshall/soft-vc)
* [ContentVec](https://github.com/auspicious3000/contentvec)
* [DiffSinger (OpenVPI version)](https://github.com/openvpi/DiffSinger)
* [Diff-SVC](https://github.com/prophesier/diff-svc)

cn_README.md (18 changes: 13 additions & 5 deletions)
@@ -5,7 +5,7 @@ Language: [English](./README.md) **简体中文**
</div>
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing).

- ## (3.0 - Experimental) Shallow diffusion model (DDSP + Diff-SVC refactor version)
+ ## (3.0 - Update) Shallow diffusion model (DDSP + Diff-SVC refactor version)
![Diagram](diagram.png)

Data preparation and configuring the encoder (hubert or contentvec) and vocoder (nsf-hifigan) are the same as for training a pure DDSP model.
@@ -38,7 +38,9 @@ python train.py -c configs/combsub.yaml
```bash
python main_diff.py -i <input.wav> -ddsp <ddsp_ckpt.pt> -diff <diff_ckpt.pt> -o <output.wav> -k <keychange (semitones)> -id <speaker_id> -diffid <diffusion_speaker_id> -speedup <speedup> -method <method> -kstep <kstep>
```
'speedup' is the acceleration factor, 'method' is 'pndm' or 'dpm-solver', 'kstep' is the number of shallow diffusion steps, 'diffid' is the speaker id of the diffusion model, and the other parameters have the same meaning as in main.py.

A reasonable 'kstep' is about 100~300. Sound quality may perceptibly degrade when 'speedup' exceeds 20.

If the same id has been used to represent the same speaker during training, '-diffid' can be empty; otherwise, the '-diffid' option needs to be specified.

@@ -54,18 +56,22 @@ python gui_diff.py
## 0. Introduction
DDSP-SVC is a new open-source singing voice conversion project dedicated to the development of free AI voice changer software that can be popularized on personal computers.

- Compared with the more famous [Diff-SVC](https://github.com/prophesier/diff-svc) and [SO-VITS-SVC](https://github.com/svc-develop-team/so-vits-svc), its training and synthesis have much lower requirements for computer hardware, and the training time can be shortened by orders of magnitude. In addition, when performing real-time voice changing, this project's hardware resource usage is significantly lower than SO-VITS-SVC, while Diff-SVC is too slow to perform voice changing in real time.
+ Compared with the famous [SO-VITS-SVC](https://github.com/svc-develop-team/so-vits-svc), its training and synthesis have much lower requirements for computer hardware, the training time can be shortened by orders of magnitude, and the training speed is close to that of [RVC](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI).

In addition, when performing real-time voice changing, the hardware resource consumption of this project is significantly lower than that of SO-VITS-SVC and RVC, and lower latency can be achieved by tuning parameters on the same hardware configuration.

- Although the original synthesis quality of DDSP is not ideal (the original output can be heard in tensorboard while training), after using the pre-trained vocoder-based enhancer, for some datasets it can reach synthesis quality close to SO-VITS-SVC.
+ Although the original synthesis quality of DDSP is not ideal (the original output can be heard in tensorboard while training), after enhancing the sound quality with a pre-trained vocoder-based enhancer (old version) or with a shallow diffusion model (new version), for some datasets it can achieve synthesis quality no lower than that of SO-VITS-SVC and RVC. A synthesis example is included in the `samples` folder, and the related model checkpoints can be downloaded from the repository's release page.

- If the quality of the training data is very high, Diff-SVC will probably still have the best synthesis quality. Synthesis examples are included in the `samples` folder, and the related model checkpoints can be downloaded from the repository's release page.
+ Models from the old version are still compatible; the following chapters are instructions for the old version. Some operations of the new version are the same; see the previous chapter.

Disclaimer: Please make sure to only train DDSP-SVC models with **legally obtained authorized data**, and do not use these models or any audio they synthesize for illegal purposes. The author of this repository is not responsible for any infringement, fraud, or other illegal acts caused by the use of these model checkpoints and audio.

1.1 update: support for multiple speakers and timbre mixing.

2.0 update: the real-time VST plugin is now supported, and the combsub model has been optimized, greatly improving training speed. The old combsub model is still compatible and can be trained with combsub-old.yaml; the sins model is unaffected, but since its training is much slower than combsub, it is no longer recommended in the current version.

3.0 update: support for the VST plugin has been dropped because its author deleted the repository; a standalone real-time voice changing front end is used instead. Multiple encoders are supported, with contentvec768l12 as the default; a shallow diffusion model has been introduced, greatly improving synthesis quality.

## 1. Installing dependencies
1. Install PyTorch: we recommend downloading PyTorch from the [**official PyTorch website**](https://pytorch.org/).

@@ -229,4 +235,6 @@ python gui.py
* [ddsp](https://github.com/magenta/ddsp)
* [pc-ddsp](https://github.com/yxlllc/pc-ddsp)
* [soft-vc](https://github.com/bshall/soft-vc)
* [ContentVec](https://github.com/auspicious3000/contentvec)
* [DiffSinger (OpenVPI version)](https://github.com/openvpi/DiffSinger)
* [Diff-SVC](https://github.com/prophesier/diff-svc)
