Closed Bug 1474084 Opened 7 years ago Closed 8 months ago

WebSpeech API with offline STT via DeepSpeech

Categories

(Core :: Web Speech, enhancement)

Tracking

RESOLVED WONTFIX

People

(Reporter: tim.langhorst, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: webcompat:platform-bug)

There is ongoing work to implement the WebSpeech API using an online service, but this bug is about implementing it with on-device recognition. This has multiple advantages:

* Works offline or over a slow internet connection
* Faster inference
* Everything is kept securely on the device
* Lower cost for Mozilla's inference servers

On the other hand, it might be too demanding for embedded devices and not as accurate.

I think the platforms that should be supported at a minimum are arm64 (newer phones have enough performance) and arm64/x86_64 (32-bit is dead), and OS-wise: Windows, OS X, Linux, Android, iOS. It shouldn't be that bad to exclude some old or rarely used platforms, because this is just an enhancement and online inference remains available as a fallback.
As for memory and storage requirements, I think the model should be downloaded on the fly when a user first uses the API (or the user gets asked, as with DRM; it should also be downloadable from the settings), and it should then adjust to the device: for a phone or tablet roughly 200 MB of memory and 50 MB of storage, and for a desktop roughly 1 GB of memory and 200 MB of storage. There should be something like five or more model versions for different hardware capabilities.
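For context, here is a minimal sketch (in TypeScript) of how a page would consume the Web Speech API; the page-facing code is the same whether the browser backs recognition with an online service or an on-device DeepSpeech model, so this proposal would not change the API surface. It assumes the standard SpeechRecognition interface, falling back to the webkit-prefixed alias where only that name is exposed.

// Sketch: page-side use of the Web Speech API recognition interface.
// The backend (online service vs. on-device model) is a browser
// implementation detail and does not appear in this code.
const Recognition =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognizer = new Recognition();
recognizer.lang = "en-US";         // language to recognize
recognizer.interimResults = false; // only deliver final transcripts
recognizer.maxAlternatives = 1;

recognizer.onresult = (event: any) => {
  const transcript = event.results[0][0].transcript;
  const confidence = event.results[0][0].confidence;
  console.log(`Heard: "${transcript}" (confidence ${confidence})`);
};

recognizer.onerror = (event: any) => {
  console.error("Recognition error:", event.error);
};

recognizer.start(); // prompts for microphone access and begins listening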
Correction: instead of "arm64/x86_64", I meant "amd64/x86_64".
Component: General → Web Speech
Product: Firefox → Core
Target Milestone: Future → ---
Version: unspecified → Trunk
See Also: → 1244460
Depends on: 1248897
Depends on: 1392065
I should also mention that this is only what I think is reasonable from an ordinary user's perspective; I am not familiar with Mozilla's source code.
Severity: normal → enhancement
Assignee: nobody → lissyx+mozillians
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

I have followed the procedure laid out in this section: https://wiki.mozilla.org/Web_Speech_API_-_Speech_Recognition#How_can_I_test_with_Deep_Speech.3F , which says to set media.webspeech.service.endpoint to https://dev.speaktome.nonprod.cloudops.mozgcp.net/ .
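For reference, a test page along these lines can be used to exercise whichever backend the pref points at (a sketch in TypeScript, assuming speech recognition is enabled in the profile being tested; the reference sentence is just an example to read aloud):

// Sketch: run one recognition pass against a known reference sentence and
// log the transcript, so runs with and without the DeepSpeech endpoint pref
// can be compared side by side.
const REFERENCE = "the quick brown fox jumps over the lazy dog"; // read this aloud

function recognizeOnce(): Promise<string> {
  const Recognition =
    (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
  const recognizer = new Recognition();
  recognizer.lang = "en-US";
  return new Promise((resolve, reject) => {
    recognizer.onresult = (e: any) => resolve(e.results[0][0].transcript);
    recognizer.onerror = (e: any) => reject(new Error(e.error));
    recognizer.start();
  });
}

recognizeOnce()
  .then((transcript) => {
    console.log("Reference :", REFERENCE);
    console.log("Transcript:", transcript.toLowerCase());
  })
  .catch((err) => console.error("Recognition failed:", err));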

The results are very inaccurate and would be unacceptable by today's standards. I am not trying to lambaste a product that I know the DeepSpeech people have invested much of their time and energy in, and which I genuinely appreciate. I am just wondering whether the wiki article is out of date and there is a different endpoint that can be set to give more accurate results.

I am comparing this to running with the media.webspeech.service.endpoint pref removed (which I believe falls back to the OS-native service), where accuracy is very high and closer to what I would expect. The point being, I do not think this is an issue with the audio hardware involved.
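"Very inaccurate" versus "very high" accuracy can be made concrete with a simple word error rate over a fixed reference sentence; here is a standard Levenshtein-based sketch (nothing specific to either backend, and the example transcripts at the end are made up):

// Sketch: word error rate (WER) = word-level edit distance / reference length.
// Lower is better; 0 means a perfect transcript.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);

  // Levenshtein distance over words via dynamic programming.
  const dp: number[][] = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost);
    }
  }
  return ref.length === 0 ? 0 : dp[ref.length][hyp.length] / ref.length;
}

// Made-up example transcripts to show the scale:
console.log(wordErrorRate("the quick brown fox", "the quick brown fox")); // 0
console.log(wordErrorRate("the quick brown fox", "a quick round fox"));   // 0.5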

Status: ASSIGNED → NEW
Assignee: lissyx+mozillians → nobody
Severity: normal → S3

The DeepSpeech project has been abandoned for three years now, so maybe this ticket can be closed?

You're right, we should WONTFIX it. I kept it alive so far in the hope that someone would continue the work, but with Coqui now being defunct as well ...

Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → WONTFIX