
WeNet 3.0 Roadmap #1192

Closed · robin1001 opened this issue Jun 2, 2022 · 20 comments

Labels: documentation (Improvements or additions to documentation), Stale

Comments

robin1001 (Collaborator) commented Jun 2, 2022

If you are interested in WeNet 3.0, please see our roadmap at https://github.com/wenet-e2e/wenet/blob/main/ROADMAP.md and discuss it here.

WeNet is a community-driven project and we love your feedback and proposals on where we should be heading.
Feel free to volunteer yourself if you are interested in trying out some items (they do not have to be on the list).

robin1001 pinned this issue Jun 2, 2022
Mddct (Collaborator) commented Jun 10, 2022

Do we have a plan to enhance post-processing, e.g. punctuation restoration?

robin1001 (Collaborator, Author) commented

Yes, ITN and punctuation restoration are in our plan, and the solution should be simple and elegant.
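As a rough illustration of what rule-based ITN (inverse text normalization) does — this is not WeNet's actual solution, which is not specified in this thread; production systems typically use WFST grammars — a toy digit-collapsing pass might look like this:

```python
import re

# Toy ITN pass: collapse runs of spoken digits into numerals.
# Real ITN (dates, money, ordinals, ...) is usually grammar/WFST based.
WORD2DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
ALT = "|".join(WORD2DIGIT)
RUN = re.compile(rf"\b(?:{ALT})(?:\s+(?:{ALT}))*\b")

def itn_digits(text: str) -> str:
    return RUN.sub(lambda m: "".join(WORD2DIGIT[w] for w in m.group().split()), text)

print(itn_digits("call nine one one now"))  # -> "call 911 now"
```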

icyda17 commented Jun 13, 2022

Do you have plans to implement text-to-speech models?

Mddct (Collaborator) commented Jun 13, 2022

> Do you have plans to implement text-to-speech models?

I found this: https://github.com/wenet-e2e/wetts

Mddct (Collaborator) commented Jun 18, 2022

For the bindings, there are three questions:

1. Getting a model by language: can we supply both a small and a large model for each language? (The small model could be trained with knowledge distillation.)

2. Shrinking libtorch: it is currently a little large, and since libtorch bundles many backends such as MKL and OpenMP, it is not easy to make it small just by passing compile arguments. It seems we need to open a separate repo for this?

3. For advanced usage, should we expose more APIs for developers in other languages, e.g. for ONNX models?

lubacien (Contributor) commented
Hello, any plan for a VAD in the future?

pehonnet (Contributor) commented
Hi, do you have plans to introduce some text-only domain adaptation methods? Or do you have any suggestions on the topic?

pengzhendong (Member) commented
> Hello, any plan for a VAD in the future?

Under testing...

fengshi-cherish commented
> Hello, any plan for a VAD in the future?
>
> Under testing...

A new architecture? Any paper for reference?

pengzhendong (Member) commented Jul 26, 2022

> A new architecture? Any paper for reference?

The server side uses the old architecture with a smaller acoustic unit, but we don't need forced alignment.
The idea is the same as the endpoint detection in the wenet runtime.

From @Mddct:
Please see: An End-to-End Architecture for Keyword Spotting and Voice Activity Detection

> The network outputs directly to characters in the alphabet including the blank and space characters.
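To make the blank-based idea concrete: a CTC model already tells you which frames carry speech, because non-speech frames are dominated by the blank symbol. A minimal sketch follows; the threshold, frame shift, and segment logic are illustrative assumptions, not WeNet's actual implementation:

```python
import torch

def ctc_vad(log_probs: torch.Tensor, blank_id: int = 0,
            threshold: float = 0.8, frame_shift_s: float = 0.04):
    """Mark a frame as speech when its blank posterior is low.

    log_probs: (T, vocab) per-frame CTC log-posteriors.
    frame_shift_s: output frame shift, e.g. 4x-subsampled 10 ms frames.
    Returns a list of (start_s, end_s) speech segments.
    """
    speech = log_probs[:, blank_id].exp() < threshold
    segments, start = [], None
    for t, is_speech in enumerate(speech.tolist()):
        if is_speech and start is None:
            start = t
        elif not is_speech and start is not None:
            segments.append((start * frame_shift_s, t * frame_shift_s))
            start = None
    if start is not None:
        segments.append((start * frame_shift_s, len(speech) * frame_shift_s))
    return segments
```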

lucasjinreal commented
@robin1001 Hi, it cannot be built on macOS M1.

Mddct (Collaborator) commented Aug 17, 2022

More data augmentation, like RIR (room impulse response):

pytorch/audio#2624

torchaudio will add multi-channel RIR simulation based on pyroomacoustics.
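For reference, RIR augmentation boils down to simulating an impulse response and convolving it with clean speech. A minimal single-channel sketch using pyroomacoustics (the room geometry, absorption, and positions are arbitrary illustration values, and the ShoeBox usage is an assumption about that library, not WeNet code):

```python
import numpy as np
import pyroomacoustics as pra

def add_rir(speech: np.ndarray, fs: int = 16000) -> np.ndarray:
    """Reverberate clean speech with a simulated room impulse response."""
    room = pra.ShoeBox([5.0, 4.0, 3.0], fs=fs,
                       materials=pra.Material(0.3), max_order=10)
    room.add_source([2.0, 3.0, 1.5], signal=speech)
    room.add_microphone([2.5, 1.5, 1.2])
    room.simulate()  # convolves the source with the simulated RIR
    return room.mic_array.signals[0][: len(speech)]  # trim to input length
```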

robin1001 (Collaborator, Author) commented

> More data augmentation, like RIR (room impulse response):
>
> pytorch/audio#2624
>
> torchaudio will add multi-channel RIR simulation based on pyroomacoustics.

Great!

BSen007 commented Oct 11, 2022

What about adding a timestamp per word? The current implementation does not seem accurate with its fixed 100 ms resolution.
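For context: word timestamps taken from CTC peaks are quantized to the model's output frame rate, i.e. frame shift × subsampling factor. A tiny sketch of the conversion, assuming the 10 ms / 4x values that are typical WeNet defaults (assumed here for illustration):

```python
# Convert CTC peak frame indices (in the subsampled output stream) to seconds.
# Resolution is bounded by frame_shift_ms * subsample, so some quantization
# error per word is unavoidable regardless of the decoder.
def peaks_to_times(peak_frames, frame_shift_ms=10, subsample=4):
    return [f * frame_shift_ms * subsample / 1000.0 for f in peak_frames]

print(peaks_to_times([3, 7, 12]))  # -> [0.12, 0.28, 0.48]
```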

StuartIanNaylor commented
It is really exciting to see the Raspberry Pi on the 3.0 roadmap, but there are now so many great AArch64 platforms, often with friendlier OpenCL-capable GPUs and NPUs, that maybe the focus should not be just on the Raspberry Pi; with its stock availability, things are not looking good anyway.

It would be great to see a loosely coupled system where you can mix and match your preferred vendor modules to make up a voice system.
I saw what you did with websockets and gRPC, and on the output of each module all that is required is a simple queue and a route to the next module, keeping any framework specifics between modules to an absolute minimum.

Linux needs standalone, standard-conf modules that are weakly linked by queues.

There is a natural serial queue to voice processing:

1. Mic/KWS input
2. Mic/KWS & speech-enhancement server
3. ASR
4. NLP skill router
5. TTS
6. Audio server
7. Skill server(s)

Stages 2-5 can work in a really simplistic manner: either audio & metadata or text & metadata are queued until the process in front is clear, and then sent.

That, in a nutshell, is how simple a native Linux voice system can be, as it is just a series of queues; keeping it simple with native Linux methods rather than embedded programming means it scales up to the complex.

Each mic/KWS is allocated to a zone (room) and channel, which should remain a standard /etc conf file that likely mirrors the zones & channels of the audio system outputs.
Distributed mic/KWS units connect to a mic/KWS & speech-enhancement server, and on a KW hit the best stream in that zone (the KWS argmax) is selected.
The mic/KWS & speech-enhancement server receives both audio and metadata; it has the audio transcribed but merely passes the metadata on to a skill router.
The skill router connects to skill servers to collect simple entity data, matching predicate and subject with basic NLP to route to a skill server, again purely forwarding metadata.
The skill router will also accept text back from skill servers with metadata, so the TTS forwards audio to the correct zone & channel; on completion, the calling skill server is added to the metadata and forwarded back to the mic/KWS & speech-enhancement server to initiate a non-KWS mic broadcast.
The chain then starts again, and because the initiating skill server's metadata is included, the skill server knows the destination of that transcription dialog.
That is it, and you can add multiple routes at any stage to multiple instances so that it scales.
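A minimal sketch of the queue-chained pipeline described above, using asyncio queues; the stage names, payloads, and routing metadata are placeholders, not any existing WeNet component:

```python
import asyncio

# Each stage reads (payload, metadata) from its inbox, does its work, and
# forwards the result; metadata passes through untouched apart from a route trace.
async def stage(name, inbox, outbox, work):
    while True:
        payload, meta = await inbox.get()
        payload = work(payload)
        meta.setdefault("route", []).append(name)
        await outbox.put((payload, meta))

async def main():
    q = [asyncio.Queue() for _ in range(4)]  # mic -> asr -> nlp -> tts
    tasks = [
        asyncio.create_task(stage("asr", q[0], q[1], lambda audio: "turn on the light")),
        asyncio.create_task(stage("nlp", q[1], q[2], lambda text: {"skill": "lights", "cmd": "on"})),
        asyncio.create_task(stage("tts", q[2], q[3], lambda intent: b"<audio bytes>")),
    ]
    await q[0].put((b"<mic audio>", {"zone": "kitchen"}))
    out, meta = await q[3].get()
    print(out, meta)  # the route trace shows the serial queue: asr -> nlp -> tts
    for t in tasks:
        t.cancel()

asyncio.run(main())
```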

robin1001 (Collaborator, Author) commented

Sorry, I don't get the point. wenet focuses on ASR. It should be easy to integrate wenet into your system if ASR is required.

StuartIanNaylor commented
> Sorry, I don't get the point. wenet focuses on ASR. It should be easy to integrate wenet into your system if ASR is required.

You have wekws & wetts as well?

robin1001 (Collaborator, Author) commented

> Sorry, I don't get the point. wenet focuses on ASR. It should be easy to integrate wenet into your system if ASR is required.
>
> You have wekws & wetts as well?

Yes.

robin1001 unpinned this issue Feb 8, 2023
xingchensong added the documentation label Feb 21, 2023
rookie0607 commented
> For the bindings, there are three questions:
>
> 1. Getting a model by language: can we supply both a small and a large model for each language? (The small model could be trained with knowledge distillation.)
>
> 2. Shrinking libtorch: it is currently a little large, and since libtorch bundles many backends such as MKL and OpenMP, it is not easy to make it small just by passing compile arguments. It seems we need to open a separate repo for this?
>
> 3. For advanced usage, should we expose more APIs for developers in other languages, e.g. for ONNX models?

Could you provide the code for knowledge distillation based on wenet?
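No distillation recipe is shared in this thread, but the usual starting point is matching softened teacher/student output distributions (Hinton et al.). A minimal sketch; `student_logits` and `teacher_logits` are placeholder per-frame logits of the same shape, not outputs of any specific WeNet model:

```python
import torch.nn.functional as F

# Soft-label distillation loss: the student mimics the teacher's softened
# per-frame output distribution; T is the softmax temperature.
def distill_loss(student_logits, teacher_logits, T: float = 2.0):
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

# In training (sketch), combine with the normal CTC/attention objective:
#   loss = asr_loss + alpha * distill_loss(student_out, teacher_out.detach())
```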

xingchensong (Member) commented

> For the bindings, there are three questions:
>
> 1. Getting a model by language: can we supply both a small and a large model for each language? (The small model could be trained with knowledge distillation.)
>
> 2. Shrinking libtorch: it is currently a little large, and since libtorch bundles many backends such as MKL and OpenMP, it is not easy to make it small just by passing compile arguments. It seems we need to open a separate repo for this?
>
> 3. For advanced usage, should we expose more APIs for developers in other languages, e.g. for ONNX models?

Update: for question 2, we now support the ONNX Runtime (ort) backend in wenetruntime:

#1708
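For anyone trying the ONNX path directly, inference on an exported model goes through plain ONNX Runtime; the model path and input names below are illustrative assumptions that depend on how the model was exported, not the wenetruntime API itself:

```python
import onnxruntime as ort

# Load an exported encoder and inspect its inputs; names vary per export.
sess = ort.InferenceSession("encoder.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in sess.get_inputs()])
# outputs = sess.run(None, {"speech": feats, "speech_lengths": feat_lens})
```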

github-actions bot added the Stale label Jan 21, 2024
Mddct reopened this Jan 29, 2024
github-actions bot removed the Stale label Jan 30, 2024
github-actions bot added the Stale label Mar 31, 2024
github-actions bot closed this as completed Apr 7, 2024
Mddct reopened this Apr 7, 2024
github-actions bot removed the Stale label Apr 8, 2024
github-actions bot added the Stale label Jun 7, 2024