Javascript Implementation of Snowboy #98
Comments
I'll +1 to this issue. It would definitely be awesome to have a front-end, JavaScript-based hotword detection system. If I'm correct, Snowboy and Sonus both require Node.js and other server-side stuff? So basically you can build a standalone Alexa-like hardware tool running Snowboy to detect hotwords from one microphone input (well, I know you can connect many mics and mix them into one channel, but it'll still be a pain in the ass compared to a web-based UI). I'm writing my own home assistant bot as well, using Python for command processing, and I only use the browser as a UI that recognizes speech and sends text commands to the Python Flask server. I chose this approach because this way I can just put a few cheap Android or Windows tablets around the house, instead of dealing with and mixing a lot of microphones routed to one PC, or building multiple RPi 'assistants'. It also allows me to use my AI when I'm not at home, which makes it more like a Cortana/OK Google/Alexa server. So I'm really curious about how to detect hotwords with browser-side JS. |
It's not impossible to turn a C++ library/binary into javascript, e.g., I've done this for For Snowboy, however, there will be a lot of difficulties. E.g., how can we turn the CBLAS functions we use in Snowboy into Javascript? Also, Javascript basically means open sourcing it (well I mean the source code not just the library), so it's also a decision to make on our side... I'll leave this issue open for a while. |
"You can use f2c to convert the BLAS/LAPACK code to c and then it
compiles straightforwardly with emscripten... The GNU Scientific Library
has a c implementation of BLAS, as well as a whole load of other useful
stuff, and it also compiles well with emscripten"
https://groups.google.com/d/msg/emscripten-discuss/4Qt1OXKCKrk/0ETZBsbFVxwJ
Do not confuse open source licensing with source readability. Just
because someone can do something does not mean they are allowed to
do so. The topics of licensing and source code availability are
orthogonal. If a customer has read-only access to source, support costs
go down. Right to modify for internal use could cost more.
Tiered licensing can have a no cost or low cost entry tier, with
successively more expensive tiers. The technique of using licensing as a
marketing device has worked for many companies over the years. A startup
that uses a free tier to launch is going to be able to pay as they gain
traction. Those same companies would be unable to pay up front.
Competitors that do not adopt this approach would be at a disadvantage.
A dual license is also popular. This would provide widest possible
distribution at no cost. Once usage by an organization or project grows
beyond some metric, payments would be required.
The market for handsfree voice control applications will explode
throughout 2017. An open source library of this type is inevitable. Will
it be yours, or a competitor's?
Mike
|
Thanks @mslinn for the detailed writeup! We do have algorithms that we don't want to release to the public yet. If it's just an implementation of something well known, then, as you suggested, licensing should solve the issue. That's why I said "it's also a decision to make on our side". |
A JS obfuscator might help. Yes, obfuscators can be cracked. I believe your existing code is equally subject to reverse engineering. Make it easy to keep regular folk honest. Those bent on criminal behavior won't be deterred from hacking your existing product. |
@chenguoguo I'd be very willing to help with this if you choose to take that route :) |
@evancohen I'm seriously considering this, but no decision yet :-) so I'll leave this up for a while. CBLAS functions like sgemm usually require quite some optimization at the assembly code level, and generic implementations of those functions can be very slow. So I'm also not sure how this will turn out. |
Weblas claims to have "performance comparable to native". That might be a good place to start. Having a truly cross-platform version of snowboy would be amazing! Also, I'm just throwing this out there because I'm not sure how it handles libraries like CBLAS, but another option would be to use a Native Client. Unfortunately this would really only work in Chrome/Chromium, and having attempted to create one in the past, has its own drawbacks. |
How about WebAssembly? |
Any update on this feature? Is this coming soon? |
Also looking for updates on this feature? |
Not yet. |
need this too |
This feature would be a big help. |
I would like to work on this. Could you guide me on how to do this? |
You would probably need the source of this library, which is not open source. |
So will this feature ever come? |
We don't have the resources for that at this point, and we've put it in the low-priority category... |
As a workaround, I stream HTML5 Web Audio to a Django (Python) web server via a WebSocket, convert the audio to WAV, and feed it into Snowboy. I'll share the code if there is any interest in it. |
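For readers wondering what the "convert to WAV" step in the workaround above might involve, here is a minimal sketch using only Python's standard `wave` module. It is not the code from the workaround itself: the function name `pcm_to_wav_bytes` is hypothetical, and the defaults (16 kHz, 16-bit, mono) are taken from the format a later comment in this thread says Snowboy accepts. Receiving the audio over a WebSocket and handing the result to Snowboy are left out.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 16000,
                     sample_width: int = 2, channels: int = 1) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV container (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)   # bytes per sample: 2 means 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # The in-memory buffer stays open after the WAV writer is closed.
    return buffer.getvalue()


# One second of 16-bit mono silence at 16 kHz, standing in for browser audio.
silence = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav_bytes(silence)
```

The audio arriving from the browser would first have to be in (or converted to) 16-bit PCM for this to produce a valid file; that conversion is a separate step.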
Yes, please do share the code; it would be a great help.
…On Sat, Aug 18, 2018 at 7:24 PM Richard Tier ***@***.***> wrote:
As a workaround I stream Html5 webaudio to a Django (python) websever via
a websocket and convert the webaudio to wav and feed it into snowboy.
I'll share the code if there is any interest in it
--
Gautham Santhosh
|
The webaudio to wav converter: https://github.com/richtier/voice-command-lifecycle/blob/1e03fb8e434a4ad86532c59952bcce91d82eca35/command_lifecycle/helpers.py#L14 I use this approach to use a browser as an Alexa sim: https://github.com/richtier/alexa-browser-client You can fork alexa-browser-client and change the behaviour of here https://github.com/richtier/alexa-browser-client/blob/2ec5b215e4fd263b37b1c2526431a819d61aed84/alexa_browser_client/alexa_browser_client/helpers.py#L27 |
@richtier This is a good workaround, and I thought about it too. Thanks for sharing your implementation! It's awesome. However, one downside to this, which is the reason why I did not pursue this approach much, is that audio will be transmitted almost constantly to a web server, polluting the local network. It may not seem like that much traffic, but add multiple clients, combine it with regular user traffic and other things, and it starts to look worse. This is especially true if it's noisy or if there's music playing, so it's not enough to filter by sound level. It certainly is better than using Google's speech recognition for the hotword :D But it's much worse than client-side hotword detection. |
@Nixellion good point. To quantify that: currently, approximately 0.5 megabytes is sent every 0.37 seconds. Given that 162 payloads of 533800 bytes are sent per minute, that's:
For context, watching Netflix uses 2.5GB of data per hour, albeit external data from outside the local network (so a somewhat useful benchmark). I'm planning on having at least four devices connected. 500 gigabytes of extra internal network usage certainly looks bad, and it will encourage me to optimize it by not transmitting "silence". In my case, only two people are in the house, and we close doors, so normally only one device will be active at any given time. I considered compressing the audio before transmitting it. A cursory gzipping actually increases the size. That's probably because compressing random data is "hard". Perhaps we can compress by reducing the range from 32 bit to 16 bit somehow. It won't be lossless compression, but as long as it's good enough for Snowboy to understand, it should be good enough. That would cut the payload size in half. I wonder, though, if high internal traffic is too much of a bad thing? Might it require a router upgrade? Maybe increase electricity usage? |
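The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 GB = 10**9 bytes) and assumes all four planned devices stream around the clock, neither of which the comment states explicitly.

```python
# Figures quoted in the comment above.
PAYLOAD_BYTES = 533_800
PAYLOADS_PER_MINUTE = 162  # roughly one payload every 0.37 s
# Assumption: the four planned devices all stream continuously.
DEVICES = 4


def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 1e9


bytes_per_minute = PAYLOAD_BYTES * PAYLOADS_PER_MINUTE
bytes_per_hour = bytes_per_minute * 60
bytes_per_day = bytes_per_hour * 24

print(f"per minute: {to_gb(bytes_per_minute):.3f} GB")
print(f"per hour:   {to_gb(bytes_per_hour):.2f} GB")
print(f"per day:    {to_gb(bytes_per_day):.1f} GB")
print(f"per day, {DEVICES} devices: {to_gb(bytes_per_day * DEVICES):.0f} GB")
```

Under those assumptions a single device sends about 86.5 MB per minute, about 5.2 GB per hour, and about 124.5 GB per day, and four devices together send roughly 498 GB per day, which appears to be where the 500 gigabytes figure comes from.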
Another way to reduce network usage would be to use a lower sample rate and bit depth and only transmit 1 channel. Snowboy only accepts audio with a 16k sample rate, 16 bit and 1 channel anyway, so there is no reason to transmit higher quality. 16k * 16bit (2 bytes) * 1 channel = 32kBytes/second uncompressed. You could use, for example, flac.js to compress it even further, but 32k/s seems low enough for an internal network. 32k/s = 1.875M/minute = 112.5M/h = about 2.7G per day. This, however, is only a solution if you control both client and server. A public website that streams all of my microphone input to a remote server (even if the intent is honest) would be a website I would never visit again. |
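A rough sketch of the reduction described above, in plain Python. The helper names are hypothetical, and the source format (48 kHz, 32-bit float, mono) is an assumption about what the browser delivers rather than something stated in the thread. Averaging groups of samples is only a crude stand-in for a proper low-pass filter and resampler.

```python
import struct


def float32_to_int16_pcm(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def decimate(samples, factor):
    """Lower the sample rate by an integer factor by averaging each group.

    Averaging is a crude anti-aliasing step; a real resampler would do better.
    """
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]


def bytes_per_second(sample_rate, bytes_per_sample, channels):
    """Uncompressed data rate of a PCM stream."""
    return sample_rate * bytes_per_sample * channels


# Assumed source: one second of 48 kHz, 32-bit float, mono audio (silence here).
source = [0.0] * 48000
pcm = float32_to_int16_pcm(decimate(source, 3))  # 48 kHz / 3 = 16 kHz

before = bytes_per_second(48000, 4, 1)  # bytes per second before the reduction
after = bytes_per_second(16000, 2, 1)   # bytes per second in Snowboy's format
print(before, after, len(pcm))
```

With these assumed numbers, each second of audio needs a sixth of the bytes: a third as many samples, each half the size. The duration of audio is unchanged, but fewer and smaller samples describe it.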
@Thalhammer good tip, thanks! My use case is indeed for my internal network. I'm using a browser as the human interface for controlling my smart home - primarily via voice commands. I'm not CIA :) How would lowering the sample rate help? No matter how small we cut a pizza, we're still left with one pizza. |
@richtier Internal traffic usage is not THAT bad, though it will certainly put much more work on the router. It's about how much it will saturate your network. If you're watching TV over the network, playing games, watching some YouTube and downloading something, then uncompressed audio like that may slow down other things on your network, add some ping or lag, etc. I'm no expert here though. You can also consider filtering low frequencies, as they transmit through walls and such. |
@Nixellion that makes sense, thanks. |
I now do the webaudio to wav encoding client side: ircleci.com/gh/uktrade. This reduced data transfer by 90% |
Any update ? |
@richtier I am
|
I found this helpful as an alternative. |
A version of Snowboy that could run in most popular web browsers would be really great!