
Development of a smartphone version of VOICEVOX #10

Open
Hiroshiba opened this issue Feb 9, 2022 · 32 comments

Comments

@Hiroshiba
Member

Hiroshiba commented Feb 9, 2022

I want to build a smartphone version of VOICEVOX.

Purpose

It should advance VOICEVOX's value of growing the user base and its mission of spreading speech-synthesis characters.

Background

To begin with, most people who make videos are probably high school and university students, since you need free time to make them.
Today's high school and university students do basically everything on their smartphones, and video production is no exception (hard as that is to imagine...).
There are few speech-synthesis apps that run on smartphones, and free ones in particular should be quite rare. That is the niche to target.
This space should be especially hard for companies to enter, because no matter how hard they try it won't be profitable.
The intent of this project is to step into this largely untouched territory.

Goal

For a start, a working demo app that can do TTS would be good enough.
How to move toward an actual release is something to figure out later.

Plan

Development is expected to be OSS-based, since I'd like to borrow the strength of many contributors.
Starting with iOS alone seems fine: the main users of Japanese TTS are in Japan, and iOS devices have relatively strong compute resources.
For the UI framework I'm considering React Native, because VOICEVOX is built with JS and I'd like to expand to multiple platforms.

Challenges

The biggest challenge is how to run inference for the speech-synthesis machine learning models.
Converting them to CoreML looks possible, so I'm investigating that for now.
From a quick look, building onnxruntime for smartphones also seems feasible, but I can hardly find any prior examples, so I suspect a rocky road ahead.

The second challenge is that openjtalk is required.
It's probably most efficient to start on this once this project's C++ TTS library is ready.

The third challenge is the UI. I'll do my best on the design.
For a start, it may be enough if only accent adjustment is possible.

Other

I'm thinking of starting on this myself once my hands are free, but I have many other tasks and haven't been able to get to it.
If you're interested, please leave a comment!

@HyodaKazuaki

The biggest challenge is how to run inference for the speech-synthesis machine learning models.
Converting them to CoreML looks possible, so I'm investigating that for now.
From a quick look, building onnxruntime for smartphones also seems feasible, but I can hardly find any prior examples, so I suspect a rocky road ahead.

Regarding this, there is documentation on building ONNX Runtime for iOS, so I'm sharing it here.
It also describes the build options for using CoreML.
https://onnxruntime.ai/docs/build/ios.html

The operations supported by CoreML are listed in the following document:
https://onnxruntime.ai/docs/execution-providers/CoreML-ExecutionProvider
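For context, execution-provider selection in onnxruntime happens at session creation. A minimal sketch using the Python API as a stand-in for the mobile C API, assuming a build of onnxruntime that includes the CoreML execution provider; "decode.onnx" is a placeholder path:

import onnxruntime as ort

# Prefer CoreML where available and fall back to the CPU provider.
sess = ort.InferenceSession(
    "decode.onnx",
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # shows which providers were actually enabled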

@Hiroshiba
Member Author

Thank you!
I had been looking at the same page, but I couldn't find any blog posts reporting a successful build, so I suspect getting onnxruntime to build may be a thorny path.

I hadn't looked at the operation list yet. I can't tell at a glance whether all of VOICEVOX's inference graph can be expressed... some ops may be missing.

@Hiroshiba
Member Author

That said, if we develop this as OSS, onnxruntime, which can load a model from a binary file in local storage, seems like a sounder choice than CoreML, which (apparently?) has no mechanism for distributing encrypted model files.
I'd like to go with onnxruntime!!
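As a sketch of why that matters for encrypted distribution (Python API as a stand-in; decrypt is a hypothetical app-side function): InferenceSession also accepts the serialized model as a bytes object, so a model could be decrypted in memory without ever being written to disk in plain form.

import onnxruntime as ort

with open("decode.onnx.enc", "rb") as f:
    encrypted = f.read()

model_bytes = decrypt(encrypted)  # hypothetical decryption step
# onnxruntime can build a session from in-memory bytes, not just a file path
sess = ort.InferenceSession(model_bytes, providers=["CPUExecutionProvider"])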

@HyodaKazuaki

I hadn't looked at the operation list yet. I can't tell at a glance whether all of VOICEVOX's inference graph can be expressed... some ops may be missing.

If the operations are the same as in the onnx files published in VOICEVOX/voicevox_core, we can check compatibility from those.
Let me take a look.
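For what it's worth, the set of operators a model uses can be listed with a few lines of Python (a sketch; the file names assume the models published in VOICEVOX/voicevox_core):

import onnx

for path in ["yukarin_s.onnx", "yukarin_sa.onnx", "decode.onnx"]:
    model = onnx.load(path)
    ops = sorted({node.op_type for node in model.graph.node})
    print(path, ops)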

which (apparently?) has no mechanism for distributing encrypted model files

On this point, CoreML does appear to support shipping encrypted models:
https://developer.apple.com/documentation/coreml/encrypting_a_model_in_your_app
https://qiita.com/kazuhiro4949/items/becb1850172d2e96281f

CoreML-format models can also be bundled with the app, so models could be distributed together with app releases:
https://developer.apple.com/documentation/coreml/integrating_a_core_ml_model_into_your_app?changes=latest_minor

There is probably no performance difference between CoreML and ONNX.
So if the development policy prioritizes keeping the differences from other platforms as small as possible, it would be better to keep using the ONNX models directly.

@HyodaKazuaki

I checked the operations in the three ONNX models yukarin_s.onnx, yukarin_sa.onnx, and decode.onnx to see whether the CoreML execution provider can be used.
The table below lists the operations used by the three models and their support status.

Quite a few operations are unsupported, so using the CoreML execution provider with ONNX looks difficult.

Operator Supported?
Add Yes
Cast Yes
Concat Yes
ConcatFromSequence No
ConstantOfShape No
Conv Yes
ConvTranspose No
Cos No
Div No
Equal No
Expand No
Gather No
GRU No
LeakyRelu No
Loop No
MatMul Yes
Mul No
Pow No
Range No
ReduceMean No
Relu Yes
Reshape Yes
ScatterND No
Shape No
Sigmoid Yes
Sin No
Slice No
Softmax No
SplitToSequence No
Sqrt No
Sub No
Tanh Yes
Transpose Yes
Unsqueeze No
Where No

For reference, here are notes on converting to CoreML itself.
Conversion from ONNX to CoreML is provided by Core ML Tools, but ONNX conversion is apparently being dropped in the next version.
Direct conversion from PyTorch is provided.
https://developer.apple.com/jp/documentation/coreml/converting_trained_models_to_core_ml/
https://coremltools.readme.io/docs/onnx-conversion
https://coremltools.readme.io/docs/pytorch-conversion
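A minimal sketch of that PyTorch-direct route, assuming a hypothetical module MyModel and a made-up example shape (the real models would need their actual inputs):

import torch
import coremltools as ct

model = MyModel().eval()              # hypothetical PyTorch module
example = torch.zeros(1, 45)
traced = torch.jit.trace(model, example)

# convert the traced module directly, bypassing ONNX entirely
mlmodel = ct.convert(traced, inputs=[ct.TensorType(name="x", shape=example.shape)])
mlmodel.save("model.mlmodel")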

@Hiroshiba
Member Author

If the operations are the same as in the onnx files published in VOICEVOX/voicevox_core

At least for now, they haven't changed!


Thank you for the support table!!!! It's very helpful!!
There are more unsupported operations than I expected...
(I wondered where cos and sin were used; it's the positional encoding...)
I also came to think that using CoreML through onnxruntime would be (quite) difficult.


Thanks as well for the CoreML information.
So there is a way to load from a local file!
In that case it seems we could develop this as OSS either way.
So maybe CoreML after all... hmm...

A couple of other options come to mind: running onnxruntime on the CPU without CoreML, or using the WebGL build of onnxruntime via a WebView.
Going through a WebView sounds painful in its own way, so I'm lukewarm on it,
but since iPhones reportedly have strong CPUs, CPU inference might turn out to be surprisingly fast.
I'd like to quickly test whether CPU inference is usable; is there an easy way to do that? 👀

@HyodaKazuaki

HyodaKazuaki commented Feb 12, 2022

since iPhones reportedly have strong CPUs, CPU inference might turn out to be surprisingly fast.
I'd like to quickly test whether CPU inference is usable; is there an easy way to do that? 👀

The switch to ONNX sped up CPU inference considerably, so iPhones and iPads might actually run it comfortably enough on the CPU.
(That said, some of the currently supported iPhones and iPads are old, so not all of them will run it comfortably.)
CocoaPods (a library manager for iOS and similar platforms) now carries onnxruntime (onnxruntime-mobile-c).
With that, it should be possible to check whether the ONNX models run and how fast they are.
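Before wiring up the CocoaPods build, a rough desktop baseline can be measured with the onnxruntime Python package. A sketch that reads input names and shapes from the model rather than hard-coding them; the dummy inputs are not semantically meaningful, and padding dynamic dimensions to 100 is an assumption:

import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("yukarin_s.onnx", providers=["CPUExecutionProvider"])

feeds = {}
for inp in sess.get_inputs():
    shape = [d if isinstance(d, int) else 100 for d in inp.shape]  # guess dynamic dims
    dtype = np.int64 if "int64" in inp.type else np.float32
    feeds[inp.name] = np.zeros(shape, dtype=dtype)

start = time.perf_counter()
sess.run(None, feeds)
print(f"{time.perf_counter() - start:.3f} s")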

@Hiroshiba
Member Author

Ohh, I see!! So it might be fairly easy to verify!!

@Hiroshiba
Member Author

Hiroshiba commented Jun 6, 2022

To find out how fast wasm can be, I wrote some code that runs inference on the onnx models using onnxruntime-web.
https://github.com/Hiroshiba/vv_check_web/tree/6809d140e526eeaa109d64d3483329f63ee71a51

Running inference on the CPU in a browser on a PC, generating about 5 seconds of audio took about 10 seconds.
Native generation finishes in under 1 second even on CPU, so this is roughly 10x slower. Probably not usable as-is.

onnxruntime-web also has a WebGL mode, but some things are unsupported and inference failed.
The error was TypeError: int64 is not supported.
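One workaround that is sometimes tried for this (not guaranteed to be sufficient here, since graph inputs, outputs, and Cast nodes would need the same treatment) is down-casting the model's int64 tensors to int32, which the WebGL backend can handle. A partial sketch covering only the initializers:

import numpy as np
import onnx
from onnx import TensorProto, numpy_helper

model = onnx.load("decode.onnx")  # placeholder path
for init in model.graph.initializer:
    if init.data_type == TensorProto.INT64:
        arr = numpy_helper.to_array(init).astype(np.int32)
        init.CopyFrom(numpy_helper.from_array(arr, init.name))
onnx.save(model, "decode_int32.onnx")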

I'd still like to find out how much faster WebGL would make it.
The code that builds the onnx models is here.

@Hiroshiba
Member Author

I ran a test with onnxruntime-web's threading enabled. (thx @yamachu !!!)
https://github.com/Hiroshiba/vv_check_web/tree/9adb272b576e3c125432459ee32fe6119658ac0f
The time shrank substantially, but generating 5 seconds of audio still took 3.4 seconds on a Core i7-11700, which still feels a bit slow.

I've also started investigating the WebGL route.
What's becoming clear is that some processing inside the pytorch model needs to change.
If you're interested, let's investigate together...!!!

@Patchethium

Besides CoreML, I suggest considering NCNN or tract for mobile deployment; they run as native code. Even though onnxruntime-web can make use of WebGL, wasm can still be pretty slow.

@Hiroshiba
Member Author

NCNN, good one!
Because of encryption, I would like to load the model from memory (not from a file), but I couldn't find in the documentation whether that is possible. ;->

@Patchethium

Check this tutorial; ncnn supports stripping readable information.

@Hiroshiba
Member Author

Great!!!
I will try converting it to an NCNN model.

@Patchethium

Great. BTW, if you're converting from PyTorch, I recommend giving ncnn's pnnx tool a try. It can convert a PyTorch module directly to ncnn without generating redundant ops the way ONNX does.

@Hiroshiba
Member Author

Hiroshiba commented Jun 12, 2022

I tried converting from onnx to ncnn, but there seem to be a lot of errors! ;->

(The converter printed the messages below repeatedly, several hundred lines in total; one occurrence of each is shown.)

Unsupported unsqueeze axes !
Unsupported squeeze axes !
Unsupported slice step !
Shape not supported yet!
Gather not supported yet!
  # axis=0
Gather not supported yet!
  # axis=2
ConstantOfShape not supported yet!
  # value 4
Equal not supported yet!
Where not supported yet!
Expand not supported yet!
Cast not supported yet!
  # to=1
Cast not supported yet!
  # to=7
Cast not supported yet!
  # to=9
Range not supported yet!
ScatterND not supported yet!
Unknown data type 0

@Hiroshiba
Member Author

Hiroshiba commented Jun 12, 2022

Great. BTW, if you're converting from PyTorch, I recommend giving ncnn's pnnx tool a try.

I didn't know such a tool existed!
It takes a bit of effort since it requires TorchScript, but I'd like to give it a try.
(It looks like I can get ncnn param and bin files out of it, but it doesn't say whether they will actually work on ncnn...)
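For reference, a minimal TorchScript export sketch, assuming a hypothetical module MyDecoder and made-up example inputs; pnnx then consumes the saved .pt file:

import torch

model = MyDecoder().eval()  # hypothetical module
example = (
    torch.zeros(100, 1),
    torch.zeros(100, 45),
    torch.tensor([0], dtype=torch.int64),
)

# trace records the operations executed for these example inputs
traced = torch.jit.trace(model, example)
traced.save("decode_script_cpu.pt")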

I see that pnnx was in a separate repository. I will try to use the exe distributed in the releases here.

@Patchethium

Check the second line of its README:

Note: The current implementation is in https://github.com/Tencent/ncnn/tree/master/tools/pnnx

Apparently they merged pnnx into ncnn's repo.

@Hiroshiba
Member Author

Oh, I know that one!
I didn't find the executable binary in ncnn/tools/pnnx, but I did find it in pnnx/pnnx.
Thanks!

@Hiroshiba
Member Author

I tried pnnx!
Execution stopped without any useful error messages.

The .pt file can be found here. hiho_decode_script_cpu.pt is the target to convert.

The shapes I'm passing in look right to me, [-1,1],[-1,45],[1]i64... This seems difficult.
https://github.com/Hiroshiba/yukarin_soso_connector/blob/b875c25a1f2e331c3647a26a692316a9e38d634e/yukarin_soso_connector/jit_forwarder/jit_forwarder.py#L255-L259

$ ./pnnx/pnnx.exe hiho_decode_script_cpu.pt inputshape=[100,1],[100,45],[1]i64 inputshape2=[200,1],[200,45],[1]i64

pnnxparam = hiho_decode_script_cpu.pnnx.param
pnnxbin = hiho_decode_script_cpu.pnnx.bin
pnnxpy = hiho_decode_script_cpu_pnnx.py
ncnnparam = hiho_decode_script_cpu.ncnn.param
ncnnbin = hiho_decode_script_cpu.ncnn.bin
ncnnpy = hiho_decode_script_cpu_ncnn.py
optlevel = 2
device = cpu
inputshape = [100,1]f32,[100,45]f32,[1]i64
inputshape2 = [200,1]f32,[200,45]f32,[1]i64
customop =
moduleop =
############# pass_level0
inline function is_tracing
inline function pad_sequence
inline function pad_sequence
inline function make_pad_mask
inline function make_non_pad_mask
inline module = espnet_pytorch_library.conformer.convolution.ConvolutionModule
inline module = espnet_pytorch_library.conformer.encoder.Encoder
inline module = espnet_pytorch_library.conformer.encoder_layer.EncoderLayer
inline module = espnet_pytorch_library.conformer.swish.Swish
inline module = espnet_pytorch_library.transformer.attention.RelPositionMultiHeadedAttention
inline module = espnet_pytorch_library.transformer.embedding.RelPositionalEncoding
inline module = espnet_pytorch_library.transformer.layer_norm.LayerNorm
inline module = espnet_pytorch_library.transformer.multi_layer_conv.MultiLayeredConv1d
inline module = espnet_pytorch_library.transformer.repeat.MultiSequential
inline module = hifi_gan.models.Generator
inline module = hifi_gan.models.ResBlock1
inline module = yukarin_soso_connector.jit_forwarder.jit_yukarin_sosoa.JitPostnet
inline module = yukarin_soso_connector.jit_forwarder.jit_yukarin_sosoa.JitYukarinSosoa
inline function is_tracing
inline function pad_sequence
inline function pad_sequence
inline function make_pad_mask
inline function make_non_pad_mask
inline module = espnet_pytorch_library.conformer.convolution.ConvolutionModule
inline module = espnet_pytorch_library.conformer.encoder.Encoder
inline module = espnet_pytorch_library.conformer.encoder_layer.EncoderLayer
inline module = espnet_pytorch_library.conformer.swish.Swish
inline module = espnet_pytorch_library.transformer.attention.RelPositionMultiHeadedAttention
inline module = espnet_pytorch_library.transformer.embedding.RelPositionalEncoding
inline module = espnet_pytorch_library.transformer.layer_norm.LayerNorm
inline module = espnet_pytorch_library.transformer.multi_layer_conv.MultiLayeredConv1d
inline module = espnet_pytorch_library.transformer.repeat.MultiSequential
inline module = hifi_gan.models.Generator
inline module = hifi_gan.models.ResBlock1
inline module = yukarin_soso_connector.jit_forwarder.jit_yukarin_sosoa.JitPostnet
inline module = yukarin_soso_connector.jit_forwarder.jit_yukarin_sosoa.JitYukarinSosoa
(Several hundred intermediate tensor names printed here; omitted.)
----------------

@Patchethium

Patchethium commented Jun 14, 2022

The error message is actually quite useful.
For the decoder,

terminate called after throwing an instance of 'c10::Error'
  what():  forward() Expected a value of type 'List[Tensor]' for argument 'f0_list' but instead found type 'Tensor'.

it says that forward() declares f0_list as List[Tensor], but the [-1,1] you pass in pnnx means a 2-D Tensor.
I think you can fix it by stacking the list of f0 into one Tensor, as sketched below. Do the same for the phoneme list.
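A sketch of that fix, assuming all sequences in the list have the same length (otherwise something like torch.nn.utils.rnn.pad_sequence would be needed first); the lists here are made-up stand-ins:

import torch

# hypothetical per-utterance inputs of equal length
f0_list = [torch.zeros(100, 1), torch.zeros(100, 1)]
phoneme_list = [torch.zeros(100, 45), torch.zeros(100, 45)]

# List[Tensor] -> one batched Tensor, which trace/pnnx can describe with a shape
f0 = torch.stack(f0_list)            # shape (2, 100, 1)
phoneme = torch.stack(phoneme_list)  # shape (2, 100, 45)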

I also tried yukarin_s and yukarin_sa and got this error from both:

RuntimeError: index out of range in self

at the forward call of

self.speaker_embedder

I think this could be fixed by specifying an example_input in the jit export with a speaker id no larger than the embedding size.
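A small illustration of the failure mode with a hypothetical 10-speaker embedding; the example input used for export has to stay below num_embeddings:

import torch

embedder = torch.nn.Embedding(num_embeddings=10, embedding_dim=32)

speaker_id = torch.tensor([0], dtype=torch.int64)  # valid: 0 <= id < 10
vec = embedder(speaker_id)                         # works

bad_id = torch.tensor([10], dtype=torch.int64)
# embedder(bad_id) would raise "index out of range in self"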

I'd like to fix these myself, but I don't have access to the original models, so ¯\_(ツ)_/¯

@Hiroshiba
Member Author

Hiroshiba commented Jun 16, 2022

You're right!
I ran the Ubuntu version and got the error!!!

I'd like to fix these myself, but I don't have access to the original models, so ¯\_(ツ)_/¯

I see!
The binary data of the models can be found here.
https://github.com/Hiroshiba/vv_core_inference/releases/tag/0.0.1

The network structure of the model can be found here.
https://github.com/Hiroshiba/yukarin_soso_connector

The conversion to TorchScript can be done with the following command.

python run_jit.py \
    --yukarin_s_model_dir "model/yukarin_s" \
    --yukarin_sa_model_dir "model/yukarin_sa" \
    --yukarin_sosoa_model_dir "model/yukarin_sosoa" \
    --hifigan_model_dir "model/hifigan" \
    --texts "hello" \
    --speaker_ids 0 1

@Hiroshiba
Member Author

Hiroshiba commented Jun 16, 2022

I've changed List[Tensor] to Tensor! Working on this branch:
https://github.com/Hiroshiba/yukarin_soso_connector/tree/to-ncnn

I ran the code above to get a new .pt file, and the level0 optimization pass went through 🎉.
Then I got a wonderful error in the level1 optimization. ;->

############# pass_level1
no attribute value
Segmentation fault

2022/06/24: I created the issue.

@Patchethium

Sorry, I haven't had time to check it out recently 🙇

I created the issue

I guess it's better this way; the maintainer of ncnn is actively involved in the community and will give far better solutions than I could. Nevertheless, I'll keep tracking this issue whenever I have the time.

@Hiroshiba
Member Author

Hiroshiba commented Jun 27, 2022

The ncnn binary for decode is ready!
https://github.com/Hiroshiba/vv_core_inference/releases/tag/ncnn

One constraint of going through pnnx to ncnn is that torch.jit.trace must be used; because of that, yukarin_sa's autoregression doesn't work, and sa hasn't been converted to ncnn yet.

@Patchethium

Have you tried it? Actually, I didn't see any issues with tracing an autoregressive model; see this tutorial.

@Hiroshiba
Member Author

Thanks for letting me know!
In that example, the autoregression code lives in GreedySearchDecoder, where torch.jit.script is used instead of trace.
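A toy illustration of the difference, with a made-up autoregressive module: torch.jit.script keeps the loop as control flow, while torch.jit.trace would unroll it for the particular example inputs and freeze the number of steps:

import torch

class ARDecoder(torch.nn.Module):
    def forward(self, x: torch.Tensor, steps: int) -> torch.Tensor:
        ys = []
        h = x
        for _ in range(steps):  # loop length depends on the input
            h = torch.tanh(h)
            ys.append(h)
        return torch.stack(ys)

scripted = torch.jit.script(ARDecoder())  # the loop survives as control flow
print(scripted(torch.zeros(4), 3).shape)  # torch.Size([3, 4])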

@Patchethium

Patchethium commented Feb 15, 2023

Sorry I wasn't around for a while; I went off to try other frameworks (ncnn, tvm, openvino, TNN, tract...) and ended up with Alibaba's MNN.

Like NCNN, which is (kind of) from Tencent, MNN is made by another big Chinese tech company, Alibaba, the one running AliExpress. That could be either an advantage or a disadvantage; fortunately it has English docs for non-Chinese speakers.

Anyway, I was able to convert the onnx models here to MNN format with little tweaking. predict-duration and predict-intonation work out of the box, and for the decoder I only needed to change an axes attribute. That's impressive compared with NCNN, which can't even run Unsqueeze.

Compile MNN Convert Tool
git clone https://github.com/alibaba/MNN.git
cd MNN
mkdir build
cd build
cmake .. -DMNN_BUILD_CONVERTER=ON
make -j4

# convert
./MNNConvert -f ONNX --modelFile predict-duration.onnx --MNNModel predict-duration.mnn --bizCode biz

# test the inference result
python ../tools/script/fastTestOnnx.py ./onnx/predict-duration.onnx
Modify the decoder
import onnx

model = onnx.load("decode-0.onnx")

# find the offending Unsqueeze node by name
node = next(n for n in model.graph.node if n.name == "Unsqueeze_481")

# replace its first attribute with axes=[0]
node.attribute.remove(node.attribute[0])
axes_attr = onnx.helper.make_attribute("axes", [0])
node.attribute.insert(0, axes_attr)

onnx.save(model, "./onnx/decode-0-modified.onnx")

I haven't written any deployment or inference code yet, since I don't have Android Studio or Xcode on my laptop.

Edit: n.op_type -> n.name

@Hiroshiba
Member Author

That's great!!!!!!!!!!!!!
I'm very curious whether it will work on a smartphone or not!!!!!

@Patchethium

It works; if you go to the docs' About page you'll see:

● iOS platform: static library size for armv7+arm64 platforms is about 5MB, size increase of linked executables is about 620KB, and metallib file is about 600KB.
● Android platform: core so size is about 400KB, OpenCL so is about 400KB, Vulkan so is about 400KB.

Originally it was made for mobile platforms, just like NCNN.

@sevenc-nanashi
Member

sevenc-nanashi commented Apr 20, 2023

Based on the Discord conversations of the past few days and what I learned from trying things myself, I put together a task list.

(Moved to VOICEVOX/voicevox_mobile#28)

@sevenc-nanashi
Member

sevenc-nanashi commented Apr 27, 2023

The new-design API looked like it could reduce the amount of engine code implemented in JS, so I updated the task list to use it.
