Replies: 21 comments 7 replies
-
I think it should be possible to load PyTorch models in most languages, e.g. with DJL from Amazon for Java. Also, I am not sure, but isn't it possible to load pretrained PyTorch models and convert them to ONNX at any time? So anyone who needs an ONNX model could do the conversion themselves. I remember reading something like that, but I am not sure about the details or whether there were requirements. I have not used the ONNX models myself, though. Maybe someone who has used the ONNX models will respond and explain why it would be good for them to remain 😄
-
ONNX models are created from nn.Module or JIT models in torch. The key issue for me here is not really maintaining or creating the models: since we have been able to radically simplify everything, just leaving one VAD model for everything seems enticing (different sampling rates, variable chunk size, much less code for iterators, etc.). The problem is that ONNX does not support if statements, and the new JIT models have one if statement inside.
-
I thought we can export an ONNX model if we use JIT script models, even with an if statement? I prefer the ONNX model because I don't want to install torch in my prod env.
-
Can you provide a minimal working export example with an if?
-
Sure, https://colab.research.google.com/drive/1sl33VBqT8fy46zyrhU1gVjWJOKw0Ya1i?usp=sharing |
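For context, a minimal sketch of what such an export looks like (a hypothetical toy module, not the notebook's actual code): `torch.jit.script` preserves data-dependent control flow, and the TorchScript-based ONNX exporter can lower it to an ONNX `If` node (available since opset 9). The export call itself is guarded, since exporter defaults have shifted across torch versions.

```python
import io
import torch
import torch.nn as nn

class GatedScale(nn.Module):
    """Toy stand-in for a model with a data-dependent `if` inside."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if bool(x.sum() > 0):
            return x * 2.0
        return x * 0.5

# Scripting (unlike tracing) keeps both branches in the graph:
scripted = torch.jit.script(GatedScale())
assert "prim::If" in str(scripted.graph)

# Hand the scripted module to the ONNX exporter; treat this call as a
# sketch, since the exporter API differs across torch versions:
buf = io.BytesIO()
try:
    torch.onnx.export(scripted, (torch.ones(4),), buf)
except Exception as exc:
    print(f"export failed on this torch version: {exc}")
```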
-
In case you guys don't want to share ONNX anymore, can you share the JIT model without quantization?
-
Text enhancement has a lot of logic under the hood, so no ONNX is planned there by design.
-
Nice, many thanks! Tbh, I did not know that control flow is supported, but only in scripted models. Probably we may opt for the following solution.
-
Yeah, I saw a lot of functions in it like enhance text, unitoken_into_token, process_unicode, etc.
-
Cool, later when you guys share vad.jit I might try exporting it to ONNX.
-
Tucking this in was a design decision, because we wanted to package and support as little as possible |
-
Ohh ok got it |
-
There are some low-level cryptic errors for some reason, basically saying that some data type becomes … We want to radically simplify everything, and having more than one variety of model is a no-no for the future. We had to be inventive to get the best quality; you will see in the PR releases after the VAD release.
-
Ohhh, might be a bug in the ONNX graph code.
-
In the end we could solve all of the ONNX problems that we could identify (with our custom modules), but still ended up with cryptic export errors. It turns out that for "if-less" models we just used tracing, which produced horrible-looking but working models. For models with ifs (they need scripting) we could beat the ONNX limitations and polish our custom modules to be compatible, but still ended up with more cryptic errors. We will have a final module-by-module debugging session some time in Q1 next year. For now we decided to opt for simplicity and collect feedback.
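The tracing-vs-scripting distinction described above can be demonstrated with a toy module (an assumed illustration, not Silero's actual code): tracing records only the branch taken for the example input, while scripting compiles both branches.

```python
import torch
import torch.nn as nn

class Branchy(nn.Module):
    """Toy module whose output depends on a data-dependent branch."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if bool(x.sum() > 0):
            return x + 1.0
        return x - 1.0

pos = torch.ones(3)
neg = -torch.ones(3)

# Tracing bakes in the branch taken for the example input (and warns
# that the trace is data-dependent)...
traced = torch.jit.trace(Branchy(), pos)
# ...so a negative input is still routed through the `+ 1.0` branch:
print(traced(neg))    # tensor([0., 0., 0.]) -- wrong branch

# Scripting keeps the control flow, so both branches work:
scripted = torch.jit.script(Branchy())
print(scripted(neg))  # tensor([-2., -2., -2.])
```

This is why an "if-less" model can be exported from a trace, while a model with an `if` inside needs scripting first.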
-
Congratulations on the new release, looks really exciting! We were using the Silero VAD ONNX models via the onnxruntime JavaScript package. It looks like there are a few JavaScript PyTorch/TorchScript bindings that might work (though I haven't tried yet), but they don't look nearly as well-maintained or commonly-used as the ONNX runtime, so would love to have an ONNX model for this new version. The limitations you mention (e.g., 16k only) seem totally reasonable. Thanks! |
-
The above comments make sense. Most likely we will just publish a 16 kHz ONNX model and postpone a unified model for the time being (or just reserve it for commercial customers, idk; I do not want more than one VAD at the moment, or it will turn into chaos again). Also a … But who knows, some feedback or help on properly adding a Python conda package with GitHub Actions would be appreciated.
-
All of the ONNX runtime adopters, rejoice |
-
Silero VAD: Support For Sampling Rates Higher Than 16 kHz
You can also average over every 2 or 3 samples manually and just use 16 kHz if you want. |
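Averaging consecutive samples as suggested can be sketched like this (a hypothetical helper, not part of Silero's API); note that plain averaging is only a crude low-pass filter, so for higher rates a proper resampler may give better quality:

```python
import numpy as np

def average_downsample(audio: np.ndarray, factor: int) -> np.ndarray:
    """Average every `factor` consecutive samples,
    e.g. factor=2 turns 32 kHz audio into 16 kHz,
    factor=3 turns 48 kHz audio into 16 kHz."""
    n = len(audio) - len(audio) % factor  # drop the ragged tail
    return audio[:n].reshape(-1, factor).mean(axis=1)

# 32 kHz -> 16 kHz: every pair of samples collapses to its mean
x = np.array([0.0, 0.2, 0.4, 0.6], dtype=np.float32)
y = average_downsample(x, 2)  # -> [0.1, 0.5]
```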
-
Hm, as far as I can see the ONNX model is still alive, but reading this thread makes me worry about whether new projects should use ONNX or not.
-
In a few days we will be radically changing the models, i.e. we will be radically simplifying everything.
We have seen limited use of the ONNX models, which is why I am asking.