Javascript Implementation of Snowboy #98

Open
mslinn opened this issue Dec 31, 2016 · 31 comments

Comments

@mslinn

mslinn commented Dec 31, 2016

A version of Snowboy that could run in most popular web browsers would be really great!

@Nixellion

I'll +1 to this issue. It would definitely be awesome to have a front-end, JavaScript-based hotword detection system. If I'm correct, Snowboy and Sonus both require Node.js and other server-side stuff? So basically you can build a standalone Alexa-like hardware tool running Snowboy to detect hotwords from one microphone input (well, I know you can connect many mics and mix them into one channel, but it'll still be a pain in the ass compared to a web-based UI).

I'm writing my own home assistant bot as well, using Python for command processing, and I only use the browser as a UI that recognizes speech and sends text commands to the Python Flask server.

I chose this approach because this way I can just put a few cheap Android or Windows tablets around the house, instead of dealing with and mixing a lot of microphones routed to one PC, or building multiple RPi 'assistants'. It also allows me to use my AI when I'm not at home. So it makes it more like a Cortana\OkGoogle\Alexa server.

So I'm really curious about how to detect hotwords with browser-side JS.
Not feeling like writing a standalone app for this yet :)

@chenguoguo
Collaborator

It's not impossible to turn a C++ library/binary into javascript, e.g., I've done this for sox with Emscripten.

For Snowboy, however, there will be a lot of difficulties. E.g., how can we turn the CBLAS functions we use in Snowboy into Javascript? Also, Javascript basically means open sourcing it (well I mean the source code not just the library), so it's also a decision to make on our side...

I'll leave this issue open for a while.

@mslinn
Author

mslinn commented Jan 5, 2017 via email

@chenguoguo
Collaborator

Thanks @mslinn for the detailed writeup! We do have algorithms that we don't want to release to the public yet. If it's just an implementation of something well known, then as you suggested licensing should solve the issue. That's why I said "it's also a decision to make on our side".

@mslinn
Author

mslinn commented Jan 5, 2017

A JS obfuscator might help. Yes, obfuscators can be cracked. I believe your existing code is equally subject to reverse engineering. Make it easy to keep regular folk honest. Those bent on criminal behavior won't be deterred from hacking your existing product.

@evancohen
Contributor

@chenguoguo I'd be very willing to help with this if you choose to take that route :)

@chenguoguo
Collaborator

@evancohen I'm seriously considering this, but no decision yet :-) so I'll leave this up for a while.

CBLAS functions like sgemm usually require quite a lot of optimization at the assembly level, and generic implementations of those functions can be very slow. So I'm also not sure how this will turn out.
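
For reference, sgemm computes C ← αAB + βC on single-precision matrices. The following is a minimal, unoptimized sketch of that operation. The function name and the list-of-lists layout are illustrative only (this is not the CBLAS signature, which also takes transpose flags and leading dimensions, and it is not Snowboy code). It shows the plain triple loop that tuned BLAS libraries replace with blocked, vectorized kernels.

```python
def sgemm_naive(alpha, a, b, beta, c):
    """Return alpha * (a x b) + beta * c for row-major lists of lists.

    Hypothetical helper for illustration; no blocking, no SIMD.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    # Start from the scaled copy of C so the input is not modified.
    out = [[beta * c[i][j] for j in range(cols)] for i in range(rows)]
    for i in range(rows):
        for k in range(inner):
            a_ik = alpha * a[i][k]
            for j in range(cols):
                out[i][j] += a_ik * b[k][j]
    return out
```

A generic port (to JavaScript or anything else) would look roughly like this loop nest, which is why its speed relative to a hand-tuned native kernel is a fair concern.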

@evancohen
Contributor

Weblas claims to have "performance comparable to native". That might be a good place to start. Having a truly cross-platform version of snowboy would be amazing!

Also, I'm just throwing this out there because I'm not sure how it handles libraries like CBLAS, but another option would be to use a Native Client. Unfortunately this would really only work in Chrome/Chromium, and having attempted to create one in the past, has its own drawbacks.

@Thalhammer

Thalhammer commented May 8, 2017

How about WebAssembly?
It's far easier to port existing C++ code, faster than JS and would work in most major browsers.
Your algorithms would be protected no less than now.

@gauthamzz

Any update on this feature? Is this coming soon?

@HeyFood

HeyFood commented Jan 30, 2018

Also looking for updates on this feature.

@chenguoguo
Collaborator

Not yet.

@marcus7777

need this too

@cfmaley3

This feature would be a big help.

@gauthamzz

I would like to work on this. Could you guide me on how to do this?

@Thalhammer

You would probably need the source of this library, which is not open source.

@gauthamzz

So will this feature ever come?

@chenguoguo
Collaborator

We don't have resources for that at this point, and we put it in low priority category...

@richtier

As a workaround, I stream HTML5 Web Audio to a Django (Python) web server via a websocket, convert the audio to WAV, and feed it into Snowboy.

I'll share the code if there is any interest in it
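
The "convert to WAV" step on the server can be sketched with Python's standard `wave` module. This is a minimal illustration, not the code mentioned above: the function name is hypothetical, and it assumes the incoming bytes are already 16-bit little-endian PCM at 16 kHz, mono.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate_hz=16000, channels=1, sample_width_bytes=2):
    """Wrap raw PCM bytes in a WAV container and return the file contents.

    Hypothetical helper; the defaults are assumptions about the stream.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width_bytes)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm)
    # The wave module does not close a file object it did not open itself.
    return buf.getvalue()
```

The returned bytes could then be written to disk or handed to whatever detector runs on the server.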

@gauthamzz

gauthamzz commented Aug 18, 2018 via email

@Nixellion

Nixellion commented Aug 19, 2018

@richtier This is a good workaround, and I thought about it too. Thanks for sharing your implementation! It's awesome.

However, one downside, and the reason I did not pursue this approach much, is that audio will be transmitted almost constantly to a web server, polluting the local network. It may not seem like much traffic, but add multiple clients and combine it with regular user traffic and other things and it starts to look worse. That is especially true if it's noisy or there's music playing, because then filtering by sound level is not enough. It certainly is better than using Google's speech recognition for the hotword :D But much worse than client-side hotword detection.

@richtier

richtier commented Aug 20, 2018

@Nixellion good point.

To quantify that, currently every 0.37 seconds approx 0.5 megabytes (533800 bytes) are sent to the server.

Given that 162 payloads of 533800 bytes are sent per minute, that's:

  • 86.5 megabytes a minute (86475600 bytes)
  • 5.2 gigabytes per hour (5188536000 bytes)
  • 124.5 gigabytes per day (124524864000 bytes)
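
The figures above follow from simple multiplication. A short sketch that reproduces them (the function name is illustrative; the payload size and the count of 162 payloads per minute are taken from the numbers above):

```python
PAYLOAD_BYTES = 533_800        # bytes sent per payload
PAYLOADS_PER_MINUTE = 162      # roughly one payload every 0.37 seconds


def traffic(payload_bytes, payloads_per_minute):
    """Return (bytes per minute, bytes per hour, bytes per day)."""
    per_minute = payload_bytes * payloads_per_minute
    per_hour = per_minute * 60
    per_day = per_hour * 24
    return per_minute, per_hour, per_day


per_minute, per_hour, per_day = traffic(PAYLOAD_BYTES, PAYLOADS_PER_MINUTE)
print(per_minute, per_hour, per_day)
# prints: 86475600 5188536000 124524864000
```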

For context, watching Netflix uses 2.5GB of data per hour, albeit external data from outside the local network (so a somewhat useful benchmark).

I'm planning on having at least four devices connected. 500 gigabytes of extra internal network usage per day certainly looks bad, and it will encourage me to optimize by not transmitting "silence". In my case, only two people are in the house, and we close doors, so normally only one device will be active at any given time.
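
One simple way to skip "silence" is an energy gate: compute the RMS level of each chunk and only send chunks above a threshold. A minimal sketch, assuming float samples in [-1.0, 1.0]; the function names and the 0.01 threshold are hypothetical and would need tuning per room and microphone. As noted earlier in the thread, a level check alone will not help when music or background noise is playing.

```python
import math


def rms(samples):
    """Root-mean-square level of a chunk of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_transmit(samples, threshold=0.01):
    """Return True when the chunk is loud enough to be worth sending.

    The threshold is an assumed value, not a recommendation.
    """
    return rms(samples) >= threshold
```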

I considered compressing the audio before transmitting it. A cursory gzipping actually increases the size. That's probably because compressing random data is "hard".
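
The effect is easy to reproduce. In this sketch, `os.urandom` stands in for noisy sample data (an assumption; real audio is not fully random) and a run of zero bytes stands in for highly repetitive data. The helper name is illustrative.

```python
import gzip
import os


def gzip_ratio(data):
    """Compressed size divided by original size (above 1.0 means it grew)."""
    return len(gzip.compress(data)) / len(data)


noisy = os.urandom(100_000)    # stand-in for noise-like audio bytes
repetitive = bytes(100_000)    # 100,000 zero bytes

print(gzip_ratio(noisy))       # slightly above 1.0: header overhead, no gain
print(gzip_ratio(repetitive))  # far below 1.0
```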

Perhaps we can compress by reducing the range from 32 bit to 16 bit somehow. It won't be lossless compression, but as long as it's good enough for Snowboy to understand, it should be good enough. That would cut the payload size in half.
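
A minimal sketch of that reduction, assuming the source is 32-bit float samples in the nominal range [-1.0, 1.0] (which is what the Web Audio API delivers). The function name is hypothetical; the same arithmetic applies whichever side of the socket does the conversion.

```python
import struct


def float32_to_int16_pcm(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clamp out-of-range values
        ints.append(int(round(s * 32767)))  # scale to the signed 16-bit range
    return struct.pack("<%dh" % len(ints), *ints)
```

Each sample shrinks from 4 bytes to 2, so the payload is halved before any other optimization.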

I wonder though if high internal traffic is too much of a bad thing? Might require a router upgrade? Maybe increase electricity usage?

@Thalhammer

Another way to reduce network usage would be to use a lower sample rate and bit depth and transmit only one channel. Snowboy only accepts 16 kHz, 16-bit, single-channel audio anyway, so there is no reason to transmit higher quality.

16,000 samples/s * 16 bit (2 bytes) * 1 channel = 32 kB/s uncompressed. You could use, for example, flac.js to compress it even further, but 32 kB/s seems low enough for an internal network.

32 kB/s = 1.92 MB/minute = 115.2 MB/hour ≈ 2.76 GB per day
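
The byte-rate arithmetic, plus a rough idea of how a stream could be reduced to one channel at a lower rate, can be sketched as follows. All function names are illustrative, the 48 kHz stereo input is an assumption, and averaging groups of samples is only a crude stand-in for a proper low-pass filter and resampler.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def to_mono(left, right):
    """Mix two channels into one by averaging each sample pair."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def decimate_by_averaging(samples, factor):
    """Reduce the sample rate by an integer factor (e.g. 3 for 48 kHz to 16 kHz).

    Averages each group of `factor` samples; leftover samples are dropped.
    """
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


print(pcm_bytes_per_second(16000, 16, 1))  # prints: 32000
print(pcm_bytes_per_second(48000, 32, 2))  # prints: 384000
```

Fewer samples per second, fewer bytes per sample, and fewer channels each multiply down the data rate, which is where the saving comes from.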

This, however, is only a solution if you control both client and server. A public website that streams all of my microphone input to a remote server (even if the intent is honest) would be a website I would never visit again.

@richtier

richtier commented Aug 20, 2018

@Thalhammer good tip. thanks!

My use case is indeed for my internal network. I'm using a browser as the human interface for controlling my smart home - primarily via voice commands. I'm not CIA :)

How would lowering the sample rate help? No matter how small we cut a pizza, we're still left with one pizza.

@Nixellion

@richtier Internal traffic usage is not THAT bad, though it will certainly put much more work on the router. It's about how much it will saturate your network. If you're watching TV over the network, playing games, watching some YouTube and downloading something, then uncompressed audio like that may slow down other things on your network, add some ping or lag, etc. I'm no expert here though.

You can also consider filtering out low frequencies, as they travel through walls and such.
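
A first-order high-pass filter is one simple way to attenuate low frequencies before deciding whether to send a chunk. This is a minimal sketch under assumptions: the function name and the cutoff value used in the example are hypothetical, and a real deployment would probably want a steeper filter.

```python
import math


def high_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order (RC-style) high-pass filter over a list of float samples."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Passes changes between neighbouring samples, lets constant offsets decay.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


# Example: a constant (0 Hz) input decays toward zero at a 100 Hz cutoff.
filtered = high_pass([1.0] * 2000, 16000, 100.0)
```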

@richtier

richtier commented Aug 23, 2018

@Nixellion that makes sense, thanks.

@richtier

richtier commented Sep 18, 2018

I now do the webaudio to wav encoding client side: ircleci.com/gh/uktrade.

This reduced data transfer by 90%

@scredii

scredii commented May 21, 2019

Any update?

@Sushantmkarande

@richtier I am

> As a workaround, I stream HTML5 Web Audio to a Django (Python) web server via a websocket, convert the audio to WAV, and feed it into Snowboy.

I did not find any HTML code that serves this purpose. Can you please help me with this?

@CT83

CT83 commented Apr 5, 2020

I found this helpful as an alternative.

https://github.com/TalAter/annyang
