WebSpeech API with offline STT via DeepSpeech
Categories
(Core :: Web Speech, enhancement)
People
(Reporter: tim.langhorst, Unassigned)
References
(Blocks 2 open bugs)
Details
There is ongoing work to implement the WebSpeech API using an online service, but this is about implementing it using on-device recognition. This has multiple advantages:
* Works when offline or on a slow internet connection
* Faster inference
* Everything is kept securely on the device
* Lower cost of inference servers at Mozilla

But it might be too much for embedded devices, and not as accurate.

I think the platforms that should be supported at a minimum are: arm64 (for newer phones that have enough performance) and arm64/x86_64 (32-bit is dead), and OS-wise: Windows, OSX, Linux, Android, iOS. It shouldn't be that bad to exclude some old or little-used platforms, because this is just an enhancement and the online inference is available as a fallback.
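For context, the page-facing API this bug concerns can be sketched as below. This is an illustrative example, not Mozilla code; the `webkitSpeechRecognition` fallback and the availability guard are assumptions about what a given build exposes, and whether the audio is sent to a remote service or handled by an on-device model is exactly the distinction this bug is about.

```javascript
// Sketch of page-side Web Speech API usage. Runs only where the API
// exists (a browser); elsewhere it just reports that it is absent.
const Recognition =
  (typeof window !== "undefined" &&
    (window.SpeechRecognition || window.webkitSpeechRecognition)) || null;

if (Recognition) {
  const rec = new Recognition();
  rec.lang = "en-US";
  // Log the top transcript of the first result when recognition fires.
  rec.onresult = (e) => console.log(e.results[0][0].transcript);
  rec.start();
} else {
  console.log("Web Speech API not available in this environment");
}
```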
Reporter
Comment 1•6 years ago
And about the memory and storage requirements: I think the model should be downloaded on the fly when a user first uses the API (or the user gets asked, like with DRM, and it should also be downloadable in the settings), and it should then adjust to the device, so for a phone/tablet more like 200MB memory and 50MB storage, and for a desktop more like 200MB storage and 1GB memory. There should be something like 5+ versions for different hardware capabilities.
Reporter
Comment 2•6 years ago
Correction: instead of "arm64/x86_64" I meant "amd64/x86_64".
Updated•6 years ago
Reporter
Comment 3•6 years ago
I should also mention that this is only what I think is reasonable from a normal user's perspective; I'm not familiar with Mozilla's source code.
Updated•5 years ago
Comment 5•3 years ago
I have followed the procedure laid out in this section: https://wiki.mozilla.org/Web_Speech_API_-_Speech_Recognition#How_can_I_test_with_Deep_Speech.3F , which says to set media.webspeech.service.endpoint to https://dev.speaktome.nonprod.cloudops.mozgcp.net/ .
The results are very inaccurate and would be unacceptable by today's standards. I am not trying to lambaste a product that I know the DeepSpeech people have invested much of their time and energy in, and that I am genuinely appreciative of. I am just wondering whether the article is out of date and there is a different endpoint that can be set which will give more accurate results.
I am comparing this to using the browser with the media.webspeech.service.endpoint pref removed (which I believe then falls back to the OS-native service), where accuracy is very high and more what I would expect. The point being, I do not think this is an issue with the audio hardware involved.
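For anyone reproducing the test above, the endpoint pref can also be set persistently via a user.js file in the profile directory rather than through about:config. This is a sketch based on the wiki instructions linked in the comment; the recognition-enable pref names are assumptions about what a given build may additionally require, only media.webspeech.service.endpoint is taken from the source.

```javascript
// user.js — hypothetical sketch; pref names other than
// media.webspeech.service.endpoint are assumptions, verify them
// against about:config in your build before relying on this.
user_pref("media.webspeech.recognition.enable", true);
user_pref("media.webspeech.recognition.force_enable", true);
user_pref("media.webspeech.service.endpoint",
          "https://dev.speaktome.nonprod.cloudops.mozgcp.net/");
```

Deleting the last line (or resetting the pref in about:config) should restore whatever default service the build uses, which is the comparison baseline described above.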
Updated•3 years ago
Updated•2 years ago