Developing a mobile version of VOICEVOX #10
Comments
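Regarding this, I found documentation on building ONNX Runtime for iOS, so I'm sharing it here. Also, the operations supported by CoreML are listed in the following document. |
Thank you! I hadn't looked at the list of operations yet. I can't tell at a glance whether all of VOICEVOX's inference can be expressed with them. Some might be missing. |
That said, if we develop this as OSS, onnxruntime, which can load a model from a binary file in local storage, seems more sensible to me than CoreML, which probably has no mechanism (?) for sharing encrypted model files. |
If the operations are the same as in the onnx files published in VOICEVOX/voicevox_core, we should be able to check from those whether they can be supported.
On this point, it does look possible to encrypt the model and ship it with CoreML. The CoreML format can also be bundled into the app, so the model could be distributed together with the app release. I think there is probably no performance difference between CoreML and ONNX. |
Quite a lot of the operations are unsupported, so using the CoreML execution provider with ONNX looks difficult.
For reference, I'll write down what converting to CoreML would involve. |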
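The check discussed above (which operations does a model use, and which of them does a given backend lack?) can be sketched in a few lines. This is only an illustration: the function names and the `supported` set are hypothetical, the real support list has to come from the backend's own documentation, and the optional `op_types_from_onnx` helper assumes the `onnx` Python package is installed. It only looks at top-level graph nodes, not at nested subgraphs.

```python
from collections import Counter
from typing import Iterable, List, Set


def unsupported_ops(op_types: Iterable[str], supported: Set[str]) -> Counter:
    """Count how often each operator that is not in `supported` appears."""
    return Counter(op for op in op_types if op not in supported)


def op_types_from_onnx(path: str) -> List[str]:
    """Read top-level operator types from an ONNX file (needs the `onnx` package)."""
    import onnx  # imported lazily so unsupported_ops() works without it

    model = onnx.load(path)
    return [node.op_type for node in model.graph.node]


if __name__ == "__main__":
    # Hypothetical support list, for illustration only.
    supported = {"Conv", "Relu", "MatMul", "Add"}
    # In practice: ops = op_types_from_onnx("some-model.onnx")
    ops = ["Conv", "Relu", "Gather", "Shape", "Gather", "Add"]
    for op, count in unsupported_ops(ops, supported).most_common():
        print(f"{op}: {count}")
```

Running it against each published onnx file would give a per-model list of operators to look up in the backend's support table.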
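These haven't changed, at least for now! Thank you for the compatibility table!!!! It's very helpful!! Thanks for the CoreML information too. For what it's worth, there are other options as well, such as using onnxruntime on the CPU without CoreML, or using the WebGL version of onnxruntime through a WebView. |
CPU inference became considerably faster as a result of the move to ONNX, so it may run comfortably enough on the CPU even on an iPhone or iPad. |
Oh, I see!! So this might be fairly easy to check!! |
To find out how fast wasm is, I ran inference on the CPU in a browser on a PC; generating about 5 seconds of audio took about 10 seconds. I'd also like to check how much faster it gets with WebGL. |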
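A common way to express a measurement like this is the real-time factor (processing time divided by audio duration), where values below 1.0 mean synthesis is faster than playback. A minimal sketch, using the approximate figures quoted above; the function name is illustrative and not part of any VOICEVOX API.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Roughly 10 s of processing for roughly 5 s of audio, as reported above.
    rtf = real_time_factor(10.0, 5.0)
    print(f"RTF = {rtf:.1f}")
```

Comparing this single number across the wasm, threaded wasm, and WebGL runs makes the results easier to line up than raw timings for clips of different lengths.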
I tested with onnxruntime-web threading enabled. (thx @yamachu !!! ) I've also started testing the WebGL route. |
NCNN, good one! |
Check this tutorial, ncnn supports stripping readable information. |
Great!!! |
Great, BTW if you're converting from pytorch, it's recommended to give ncnn's pnnx tool a try. It can directly convert the pytorch module to ncnn without generating redundant OPs like in ONNX. |
I tried to convert using
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Gather not supported yet!
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
ConstantOfShape not supported yet!
# value 4
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
ConstantOfShape not supported yet!
# value 4
Shape not supported yet!
Gather not supported yet!
# axis=0
Cast not supported yet!
# to=1
Range not supported yet!
Gather not supported yet!
# axis=2
Shape not supported yet!
Expand not supported yet!
Shape not supported yet!
Gather not supported yet!
# axis=0
Cast not supported yet!
# to=7
Range not supported yet!
Shape not supported yet!
Gather not supported yet!
# axis=0
Cast not supported yet!
# to=7
Range not supported yet!
Shape not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
Unknown data type 0
ScatterND not supported yet!
Gather not supported yet!
# axis=2
Shape not supported yet!
Expand not supported yet!
Shape not supported yet!
Gather not supported yet!
# axis=0
Cast not supported yet!
# to=7
Range not supported yet!
Shape not supported yet!
Gather not supported yet!
# axis=0
Cast not supported yet!
# to=7
Range not supported yet!
Shape not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
Unknown data type 0
ScatterND not supported yet!
Gather not supported yet!
# axis=2
Shape not supported yet!
Expand not supported yet!
Shape not supported yet!
Gather not supported yet!
# axis=0
Cast not supported yet!
# to=7
Range not supported yet!
Shape not supported yet!
Gather not supported yet!
# axis=0
Cast not supported yet!
# to=7
Range not supported yet!
Shape not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
Unknown data type 0
ScatterND not supported yet!
Gather not supported yet!
# axis=2
Shape not supported yet!
Expand not supported yet!
Shape not supported yet!
Gather not supported yet!
# axis=0
Cast not supported yet!
# to=7
Range not supported yet!
Shape not supported yet!
Gather not supported yet!
# axis=0
Cast not supported yet!
# to=7
Range not supported yet!
Shape not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
ConstantOfShape not supported yet!
# value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Shape not supported yet!
Unknown data type 0
ScatterND not supported yet!
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unknown data type 0
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unknown data type 0
Unsupported slice step !
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Shape not supported yet!
Gather not supported yet!
# axis=0
Cast not supported yet!
# to=7
Cast not supported yet!
# to=7
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Cast not supported yet!
# to=7
Cast not supported yet!
# to=7
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Unknown data type 0
Unknown data type 0
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unknown data type 0
Unsupported unsqueeze axes !
Unknown data type 0
Unsupported unsqueeze axes !
Unknown data type 0
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unknown data type 0
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
ConstantOfShape not supported yet!
# value 4
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Unknown data type 0
Shape not supported yet!
Unknown data type 0
Shape not supported yet!
Unsupported squeeze axes !
Cast not supported yet!
# to=7
Cast not supported yet!
# to=7
Unsupported unsqueeze axes !
Unknown data type 0
Shape not supported yet!
Gather not supported yet!
# axis=0
Equal not supported yet!
Cast not supported yet!
# to=9
Where not supported yet!
Cast not supported yet!
# to=9
Where not supported yet!
Unsupported unsqueeze axes !
Unknown data type 0
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unknown data type 0
Unsupported unsqueeze axes !
Unknown data type 0
Unsupported unsqueeze axes !
Unknown data type 0
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unknown data type 0
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
ConstantOfShape not supported yet!
# value 4
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Shape not supported yet!
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Unsupported unsqueeze axes !
Unknown data type 0
Shape not supported yet!
Unknown data type 0
Shape not supported yet!
Unsupported squeeze axes !
Cast not supported yet!
# to=7
Cast not supported yet!
# to=7
Unsupported unsqueeze axes !
Unknown data type 0
Shape not supported yet!
Gather not supported yet!
# axis=0
Equal not supported yet!
Cast not supported yet!
# to=9
Where not supported yet!
Cast not supported yet!
# to=9
Where not supported yet!
Unsupported unsqueeze axes !
Unknown data type 0
Gather not supported yet!
# axis=0
Unsupported unsqueeze axes !
Gather not supported yet!
# axis=0
Gather not supported yet!
# axis=0 |
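Because the converter output above is long and repetitive, a tally per operator is easier to act on. The following is a rough sketch that only assumes the line shapes visible in the log ("X not supported yet!", attribute detail lines starting with "#", and other free-form warnings); the function name and the sample text are illustrative.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches lines such as "Gather not supported yet!"
NOT_SUPPORTED = re.compile(r"^(\w+) not supported yet!$")


def summarize_log(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (unsupported operator counts, counts of all other warnings)."""
    ops: Counter = Counter()
    other: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and detail lines like "# axis=0"
        match = NOT_SUPPORTED.match(line)
        if match:
            ops[match.group(1)] += 1
        else:
            other[line] += 1
    return ops, other


if __name__ == "__main__":
    sample = """Unsupported unsqueeze axes !
Gather not supported yet!
  # axis=0
Shape not supported yet!
Gather not supported yet!
  # axis=0
Unknown data type 0
"""
    ops, other = summarize_log(sample.splitlines())
    print(ops.most_common())
    print(other.most_common())
```

Feeding the full log through this shows at a glance which operators dominate and therefore matter most for the conversion.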
I didn't know there was such a thing! I see that pnnx was in a separate repository. I will try to use the exe distributed in the releases here. |
Check the second line of its README:
Apparently they merged pnnx into ncnn's repo. |
Oh, I know that one! |
I tried pnnx:
$ ./pnnx/pnnx.exe hiho_decode_script_cpu.pt inputshape=[100,1],[100,45],[1]i64 inputshape2=[200,1],[200,45],[1]i64
pnnxparam = hiho_decode_script_cpu.pnnx.param
pnnxbin = hiho_decode_script_cpu.pnnx.bin
pnnxpy = hiho_decode_script_cpu_pnnx.py
ncnnparam = hiho_decode_script_cpu.ncnn.param
ncnnbin = hiho_decode_script_cpu.ncnn.bin
ncnnpy = hiho_decode_script_cpu_ncnn.py
optlevel = 2
device = cpu
inputshape = [100,1]f32,[100,45]f32,[1]i64
inputshape2 = [200,1]f32,[200,45]f32,[1]i64
customop =
moduleop =
############# pass_level0
inline function is_tracing
inline function pad_sequence
inline function pad_sequence
inline function make_pad_mask
inline function make_non_pad_mask
inline module = espnet_pytorch_library.conformer.convolution.ConvolutionModule
inline module = espnet_pytorch_library.conformer.encoder.Encoder
inline module = espnet_pytorch_library.conformer.encoder_layer.EncoderLayer
inline module = espnet_pytorch_library.conformer.swish.Swish
inline module = espnet_pytorch_library.transformer.attention.RelPositionMultiHeadedAttention
inline module = espnet_pytorch_library.transformer.embedding.RelPositionalEncoding
inline module = espnet_pytorch_library.transformer.layer_norm.LayerNorm
inline module = espnet_pytorch_library.transformer.multi_layer_conv.MultiLayeredConv1d
inline module = espnet_pytorch_library.transformer.repeat.MultiSequential
inline module = hifi_gan.models.Generator
inline module = hifi_gan.models.ResBlock1
inline module = yukarin_soso_connector.jit_forwarder.jit_yukarin_sosoa.JitPostnet
inline module = yukarin_soso_connector.jit_forwarder.jit_yukarin_sosoa.JitYukarinSosoa
inline function is_tracing
inline function pad_sequence
inline function pad_sequence
inline function make_pad_mask
inline function make_non_pad_mask
inline module = espnet_pytorch_library.conformer.convolution.ConvolutionModule
inline module = espnet_pytorch_library.conformer.encoder.Encoder
inline module = espnet_pytorch_library.conformer.encoder_layer.EncoderLayer
inline module = espnet_pytorch_library.conformer.swish.Swish
inline module = espnet_pytorch_library.transformer.attention.RelPositionMultiHeadedAttention
inline module = espnet_pytorch_library.transformer.embedding.RelPositionalEncoding
inline module = espnet_pytorch_library.transformer.layer_norm.LayerNorm
inline module = espnet_pytorch_library.transformer.multi_layer_conv.MultiLayeredConv1d
inline module = espnet_pytorch_library.transformer.repeat.MultiSequential
inline module = hifi_gan.models.Generator
inline module = hifi_gan.models.ResBlock1
inline module = yukarin_soso_connector.jit_forwarder.jit_yukarin_sosoa.JitPostnet
inline module = yukarin_soso_connector.jit_forwarder.jit_yukarin_sosoa.JitYukarinSosoa
51 52 length.1 f00.1 phoneme.1 h.1 h0 h2.1 maxlen.1 seq_range.1 105 seq_range_expand.1 seq_length_expand.1 mask.5 111 113 mask.3 120 122 123 124 x.8 131 132 134 135 136 1094 139 1096 140 142 143 1100 145 input.2 147 148 150 151 153 154 161 bias.3 weight.3 x.3 input.8 185 186 input0.29 188 input1.25 190 input2.27 192 193 input.10 bias.5 weight.5 query.2 202 204 205 pos_bias_v.2 pos_bias_u.2 n_batch.2 234 q.2 237 k.2 240 v.2 q0.2 k0.2 value.2 q1.2 n_batch_pos.2 250 p.2 p0.2 254 q_with_bias_u.2 256 q_with_bias_v.2 258 matrix_ac.2 260 x.5 263 266 269 zero_pad.2 x_padded.2 276 279 282 283 286 x_padded0.2 290 291 292 293 295 296 1160 297 299 300 301 matrix_bd.2 303 scores.2 n_batch0.2 308 mask.2 scores0.2 311 input.12 313 x0.2 315 316 input0.10 319 320 input0.12 bias.7 weight.7 x.7 input.14 336 input0.14 338 339 340 input.16 342 343 344 input1.10 bias.9 weight.9 x.9 input.18 359 360 input0.16 362 input1.12 364 input2.8 366 1198 367 input2.10 bias.11 weight.11 input0.18 377 bias.2 weight.2 x.2 input.31 401 402 input0.35 404 input1.31 406 input2.25 408 409 input.6 bias.4 weight.4 query.1 418 420 421 pos_bias_v.1 pos_bias_u.1 n_batch.1 450 q.1 453 k.1 456 v.1 q0.1 k0.1 value.1 q1.1 n_batch_pos.1 466 p.1 p0.1 470 q_with_bias_u.1 472 q_with_bias_v.1 474 matrix_ac.1 476 x.4 479 482 485 zero_pad.1 x_padded.1 492 495 498 499 502 x_padded0.1 506 507 508 509 511 512 1255 513 515 516 517 matrix_bd.1 519 scores.1 n_batch0.1 524 mask.1 scores0.1 527 input.27 529 x0.1 531 532 input0.37 535 536 input0.33 bias.6 weight.6 x.6 input.4 553 input0.25 555 556 557 input.29 559 560 561 input1.29 bias.8 weight.8 x.1 input.33 576 577 input0.27 579 input1.27 581 input2.31 583 1293 584 input2.29 bias.10 weight.10 input0.31 bias.1 weight.1 599 h3.1 602 output1.1 606 input0.2 input1.2 input2.2 xs0.2 input0.4 input1.4 input2.4 xs1.2 input0.6 input1.6 input2.6 xs2.2 input0.8 input1.8 input2.33 xs3.2 input0.39 input1.33 650 651 output2.1 spec.1 x.10 20 663 700 input.3 702 input.5 718 input8.1 720 
input9.1 input10.1 723 input11.1 725 input12.1 input13.1 728 input14.1 730 xs.5 input.7 747 input0.5 749 input1.5 input2.5 752 input3.5 754 input4.5 input5.5 757 input6.5 759 760 xs.3 input.9 777 input0.7 779 input1.7 input2.7 782 input3.7 784 input4.7 input5.7 787 input6.7 789 790 xs0.1 input0.3 input1.3 794 input.11 810 input0.9 812 input1.9 input2.9 815 input3.9 817 input4.9 input5.9 820 input6.9 822 xs.7 input.13 839 input0.11 841 input1.11 input2.11 844 input3.11 846 input4.11 input5.11 849 input6.11 851 852 xs1.1 input.15 869 input0.13 871 input1.13 input2.13 874 input3.13 876 input4.13 input5.13 879 input6.13 881 882 xs2.1 1351 input2.3 input3.3 886 input.17 902 input0.15 904 input1.15 input2.15 907 input3.15 909 input4.15 input5.15 912 input6.15 914 xs.9 input.19 931 input0.17 933 input1.17 input2.17 936 input3.17 938 input4.17 input5.17 941 input6.17 943 944 xs3.1 input.21 961 input0.19 963 input1.19 input2.19 966 input3.19 968 input4.19 input5.19 971 input6.19 973 974 xs4.1 1376 input4.3 input5.3 978 input.23 994 input0.21 996 input1.21 input2.21 999 input3.21 1001 input4.21 input5.21 1004 input6.21 1006 xs.1 input.25 1023 input0.23 1025 input1.23 input2.23 1028 input3.23 1030 input4.23 input5.23 1033 input6.23 1035 1036 xs5.1 input.1 1053 input0.1 1055 input1.1 input2.1 1058 input3.1 1060 input4.1 input5.1 1063 input6.1 1065 1066 xs6.1 1401 input6.3 input7.1 1070 1071 23
----------------
|
The error message is very useful.
terminate called after throwing an instance of 'c10::Error'
  what(): forward() Expected a value of type 'List[Tensor]' for argument 'f0_list' but instead found type 'Tensor'.
It says that you specified the
I also tried out the
RuntimeError: index out of range in self
at the forward call of self.speaker_embedder
I think this might be fixed by specifying an
I'd like to fix them myself but I don't have access to the original models so ¯\_(ツ)_/¯ |
It's true!
I see! The network structure of the model can be found here. The conversion to torch script can be done with the following code.
python run_jit.py \
--yukarin_s_model_dir "model/yukarin_s" \
--yukarin_sa_model_dir "model/yukarin_sa" \
--yukarin_sosoa_model_dir "model/yukarin_sosoa" \
--hifigan_model_dir "model/hifigan" \
--texts "hello" \
--speaker_ids 0 1 |
I've changed
I ran the above code to get a new
############# pass_level1
no attribute value
Segmentation fault
2022/06/24 I created the issue. |
Sorry recently I didn't have time to check it out 🙇
I guess it's better this way, the maintainer of ncnn is actively involved in the community and would give solutions way better than mine. Nevertheless, I'll keep tracking this issue whenever I have the time. |
The ncnn binary for decode is done! As a constraint of converting to ncnn via pnnx, |
Have you tried it out? Actually I didn't see any issues with tracing an auto regressive model, see this tutorial. |
Thanks for letting me know! |
Sorry I wasn't around for a while; I went off to try other frameworks, like
Anyway, I was able to convert the onnx model here to
Compile MNN Convert Tool
git clone https://github.com/alibaba/MNN.git
cd MNN
mkdir build
# run cmake from inside the build directory so that ".." points at the MNN source tree
cd build
cmake .. -DMNN_BUILD_CONVERTER=ON
make -j4
# convert
./MNNConvert -f ONNX --modelFile predict-duration.onnx --MNNModel predict-duration.mnn --bizCode biz
# test the inference result
python ../tools/script/fastTestOnnx.py ./onnx/predict-duration.onnx
Modify the decoder
import onnx
model = onnx.load("decode-0.onnx")
# look up the node by name and replace its first attribute with axes=[0]
node = next(n for n in model.graph.node if n.name == "Unsqueeze_481")
node.attribute.remove(node.attribute[0])
axes_attr = onnx.helper.make_attribute("axes", [0])
node.attribute.insert(0, axes_attr)
onnx.save(model, "./onnx/decode-0-modified.onnx")
I haven't written any deployment or inference code yet since I don't have
Edit: n.op_type -> n.name |
That's great !!!!!!!!!!!!! |
It works, if you go to the docs' about page you'll see
Originally it was made for mobile platforms, just like |
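I put together a task list based on the Discord conversations over the past few days and what I learned from trying things myself. (Moved to VOICEVOX/voicevox_mobile#28) |
It looked like using the newly designed API would reduce the JS-implemented parts of the engine, so I updated the task list to use it. |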
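I'd like to build a mobile version of VOICEVOX.
Purpose
It looks like this would both increase the number of users, which is VOICEVOX's value, and spread speech synthesis characters, which is its mission.
Background
To begin with, I think many of the people who make videos are high school and university students, because you can't make videos without free time.
Today's high school and university students basically do everything on their smartphones. Video production is no exception (hard as that is for me to imagine...).
Few speech synthesis apps run on smartphones, and free ones in particular should be quite rare. That is the gap to go after.
Companies should find this area especially hard to enter, because no matter how hard you try, it won't make money.
The intent of this project is to step into this almost unexplored area.
Goal
For now, I'd consider it OK if we can produce a demo app that can do TTS.
How to move toward a release is something I expect to think about later.
Details
Development is assumed to be OSS-based, because I'd like to get help from many people.
I think starting with iOS only is fine, because the main users of Japanese TTS are users in Japan and the devices have relatively strong compute resources.
I'm considering React Native as the UI framework, because VOICEVOX is written in js and I want to expand to multiple platforms.
Challenges
I think the biggest challenge is how to run inference for the speech synthesis machine learning models.
For now there seems to be a way to convert them to CoreML, so I'm looking into it.
From a bit of research, it also seems possible to build onnxruntime for smartphones, but I can hardly find any precedent, and I have a feeling it will be a rough road.
The second challenge is that openjtalk is required.
I think the efficient approach is to start on this once the C++ TTS library from this project is ready.
The third challenge is the UI. I'll do my best on the design.
I'm also thinking it may be enough for now if only accent adjustment is possible.
Other
I plan to start on this myself once I have time, but I have many other tasks and haven't been able to get to it.
If you're interested, I'd appreciate a comment!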