ONNX Inference Scripts Documentation #198
Comments
.ds files are just JSON in disguise, so you can open them with any text editor. The structure inside is intuitive, so I won't explain it here, but please follow up if you have further questions. To run inference from the CLI, you will most likely perform variance inference first and then acoustic inference. Variance inference adds new fields to each "sentence" in the .ds file, such as breath, voicing, or any other feature enabled in your checkpoint. You have to use the variance checkpoint to infer every parameter the acoustic model requires. The output of this step is a new .ds file. Then feed that .ds file to acoustic inference to get your .wav file out. Arguments like …
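Because a .ds file is plain JSON, a few lines of Python are enough to inspect it and to see exactly which fields a variance-inference run added. This is a minimal sketch, assuming a .ds file holds either one sentence object or a list of them; the helper names (`parse_ds`, `load_ds`, `added_fields`) are hypothetical and not part of DiffSinger.

```python
import json
from pathlib import Path


def parse_ds(text):
    """Parse .ds content (plain JSON) into a list of sentence dicts."""
    data = json.loads(text)
    # Assumption: a .ds file holds either one sentence object or a list of them.
    return data if isinstance(data, list) else [data]


def load_ds(path):
    """Read a .ds file from disk and parse it."""
    return parse_ds(Path(path).read_text(encoding="utf-8"))


def added_fields(before, after):
    """For each sentence pair, list keys present after an inference step but not before."""
    return [sorted(set(a) - set(b)) for b, a in zip(before, after)]
```

For example, `added_fields(load_ds("song.ds"), load_ds("song_variance.ds"))` (hypothetical file names) would list, per sentence, the keys written by the variance step.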
Hi there, thank you so much for the reply! I really appreciate the insight. My bad for not mentioning this earlier, but it would probably help if I elaborated on my use case. Given some musical and phonemic data, not necessarily in .ds format, describing sung audio over some time interval (e.g. MIDI or an f0 curve plus the set of phonemes I want the model to sing, with the start and end times of the vowel phonemes already set in stone), I want to be able to generate audio with my DiffSinger (composed of a duration-only variance model (…
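If the timing data arrives as (phoneme, start, end) segments, turning it into phoneme-sequence and duration fields is mostly bookkeeping. A minimal sketch, assuming the .ds convention of space-separated `ph_seq` and `ph_dur` strings (durations in seconds) and `SP` as the silence token; both are assumptions to check against a file exported from OpenUtau, and `timings_to_ph_fields` is a hypothetical helper.

```python
def timings_to_ph_fields(segments, rest="SP", eps=1e-6):
    """Convert (phoneme, start_s, end_s) segments into ph_seq / ph_dur strings.

    Assumption: ph_seq and ph_dur are space-separated, durations are in seconds,
    and "SP" marks silence. Gaps between segments are filled with the rest token.
    """
    phonemes, durations = [], []
    cursor = 0.0  # end time (s) of everything emitted so far
    for ph, start, end in sorted(segments, key=lambda seg: seg[1]):
        if end <= start:
            raise ValueError(f"segment {ph!r} has a non-positive duration")
        if start < cursor - eps:
            raise ValueError(f"segment {ph!r} overlaps the previous segment")
        if start > cursor + eps:
            # Silence before this phoneme becomes an explicit rest.
            phonemes.append(rest)
            durations.append(start - cursor)
        phonemes.append(ph)
        durations.append(end - start)
        cursor = end
    return {
        "ph_seq": " ".join(phonemes),
        "ph_dur": " ".join(f"{d:.6f}" for d in durations),
    }
```

Because gaps become explicit rests, the durations always add up to the end time of the last phoneme.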
I had already exported .ds data for several .wav files in my training dataset from OpenUtau and examined them in VS Code, and, as you said, I could understand what each field meant without any issues; I was asking for more granular details. Accomplishing my use case with CLI commands would likely mean generating a .ds file from scratch, or editing one procedurally after exporting it from OpenUtau, then generating specific fields with my two variance models (I'm not sure in which order to apply them), and finally running the .ds file through the acoustic model, possibly with some intermediate editing along the way. Because I didn't know exactly how I'd implement this process, which seemed fairly complicated and error-prone, I wanted a more thorough spec for how the .ds files and inference scripts work: early on, even changing .ds files only slightly from what I got out of OpenUtau produced a lot of difficult-to-understand errors. This is also my first time training a DiffSinger model, which didn't help either. However, I then looked at the scripts in this repo more closely and noticed that …

*I wouldn't use any copyrighted material; I'm just taking these songs as examples.
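Since slightly edited .ds files were producing hard-to-read errors, one cheap safeguard is to check each sentence for internal consistency before handing it to either model. This sketch is hypothetical: the checks (matching lengths of `ph_seq`/`ph_dur` and `note_seq`/`note_dur`, `ph_num` summing to the phoneme count, a positive `f0_timestep` alongside `f0_seq`) encode assumed invariants of the format, not a published spec.

```python
def check_sentence(sent):
    """Return a list of human-readable problems found in one .ds sentence dict.

    The invariants checked here are assumptions about the .ds format.
    """
    problems = []

    def tokens(key):
        # Assumed: sequence fields are space-separated strings.
        value = sent.get(key)
        return value.split() if isinstance(value, str) else None

    ph_seq, ph_dur = tokens("ph_seq"), tokens("ph_dur")
    if ph_seq is None or ph_dur is None:
        problems.append("missing ph_seq or ph_dur")
    else:
        if len(ph_seq) != len(ph_dur):
            problems.append(f"ph_seq has {len(ph_seq)} items but ph_dur has {len(ph_dur)}")
        if any(float(d) <= 0 for d in ph_dur):
            problems.append("ph_dur contains a non-positive duration")
        ph_num = tokens("ph_num")
        if ph_num is not None and sum(int(n) for n in ph_num) != len(ph_seq):
            problems.append("ph_num does not sum to the number of phonemes")

    note_seq, note_dur = tokens("note_seq"), tokens("note_dur")
    if note_seq is not None and note_dur is not None and len(note_seq) != len(note_dur):
        problems.append(f"note_seq has {len(note_seq)} items but note_dur has {len(note_dur)}")

    if "f0_seq" in sent and float(sent.get("f0_timestep", 0)) <= 0:
        problems.append("f0_seq present without a positive f0_timestep")
    return problems
```

Running it over every sentence after each procedural edit narrows a failure down to a specific field before inference is even attempted.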
Hello again, sorry to bother you, but is there any direction you can point me in to help solve this? I've exported my models to ONNX after you uploaded …
No output is printed, and the session terminates with the error message "your session restarted after a crash." Running this locally, I get a segmentation fault with no further explanation. Do you have any advice on how to fix this?
To reply to the first post: I am unsure how to answer the question about constructing a DS file, because the answer is to use an editor such as OpenUtau. I doubt anyone can write a DS file by hand, simply because, for example, it is hard to hand-code an f0 curve. Once you have the ONNX models, I am not quite sure what the problem is. If you exported them according to the guide (PyTorch 1.13.1 and the requirements file you mentioned), they should work fine. You can verify whether they were exported correctly with tools such as https://netron.app/, which also shows the input and output arguments of an ONNX model.
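Hand-coding a realistic f0 curve is indeed hard, but when MIDI notes are available, a flat, step-wise curve can serve as a starting point for later editing. The sketch uses the standard equal-temperament mapping f = 440 * 2^((m - 69) / 12) for MIDI note m. It assumes `f0_seq` is a space-separated list of Hz values sampled every `f0_timestep` seconds and that rests simply hold the neighbouring pitch; both are assumptions, and the helper names are hypothetical. Real singing has transitions and vibrato that a step curve lacks.

```python
def midi_to_hz(note):
    """Equal-tempered pitch of a MIDI note number (69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def notes_to_f0(notes, timestep=0.005):
    """Sample a step-wise f0 curve (Hz) from (midi_note, duration_s) pairs.

    A note of None is a rest; it holds the previous pitch (leading rests borrow
    the first pitched note), which is an assumption, not a DiffSinger rule.
    """
    pitched = [n for n, _ in notes if n is not None]
    if not pitched:
        raise ValueError("need at least one pitched note")
    current = midi_to_hz(pitched[0])
    f0, elapsed, emitted = [], 0.0, 0
    for note, dur in notes:
        if note is not None:
            current = midi_to_hz(note)
        elapsed += dur
        # Round the cumulative sample count so durations don't drift over many notes.
        target = round(elapsed / timestep)
        f0.extend([current] * (target - emitted))
        emitted = target
    return f0


def f0_fields(notes, timestep=0.005):
    """Format the curve as f0_seq / f0_timestep strings (assumed .ds format)."""
    f0 = notes_to_f0(notes, timestep)
    return {"f0_seq": " ".join(f"{v:.1f}" for v in f0), "f0_timestep": str(timestep)}
```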
Hi Peter, is the segmentation fault similar to this? If yes, were you able to fix it?
`(singer) C:\Users\Administrator\nnsvs-db-converter>python db_converter.py -L C:\Users\Administrator\nnsvs-db-converter\language-def.json -mD -c C:\Users\Administrator\nnsvs-db-converter\vlabeler_dataset_final`
Hello,
I'm interested in running command-line inference using the .ckpt files of the model I trained, but after reading the instructions under Inference in `docs/GettingStarted.md` and the `--help` output of the relevant inference scripts (specifically `python scripts/infer.py variance --help` and `python scripts/infer.py acoustic --help`), I don't fully understand how .ds files work and, less importantly, some details of the `infer.py` parameters. I largely understand what all of the parameters control, but I'd like to know how to configure `--num`, `--key`, `--expr`, and `--step` based on a more precise understanding of what they actually do, along with general best practices for them, since there is no thorough documentation here on either topic. The .ds docs may be out of scope for this repo (I looked briefly at the original OpenUtau repo and at the recommended OpenUtau fork with DiffSinger support, but didn't find anything), but do you know where I could find docs on both topics to reference for my project?

Thank you,
Peter
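On `--key`: assuming (not verified here; check `--help` for your version) that it transposes pitch by a whole number of semitones, the underlying arithmetic is that a shift of k semitones multiplies every frequency by 2^(k/12), so +12 doubles f0 and -12 halves it. A minimal sketch applying that to a space-separated `f0_seq` string; the function name is hypothetical.

```python
def transpose_f0_seq(f0_seq, semitones):
    """Shift a space-separated f0_seq string (Hz) by a number of semitones.

    Each semitone is a factor of 2 ** (1 / 12), so +12 doubles and -12 halves
    every value; 0 Hz entries stay at 0.
    """
    ratio = 2.0 ** (semitones / 12.0)
    return " ".join(f"{float(v) * ratio:.1f}" for v in f0_seq.split())
```

Unvoiced frames stored as 0 Hz remain 0 after scaling, so only voiced frames actually move.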