Closed
Bug 1051604
Opened 10 years ago
Closed 9 years ago
Adapt VAD strategy on SpeechRecognition to be less strict on some devices with poor mics
Categories
(Core :: Web Speech, defect)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox42 | --- | fixed |
People
(Reporter: anatal, Assigned: anatal)
References
Details
(Whiteboard: [webspeechapi])
Attachments
(1 file, 1 obsolete file)
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140715215003
Reporter
Comment 1•10 years ago
We should benchmark and test the pocketsphinx VAD instead of the current one, which is leading to a number of issues.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter
Updated•10 years ago
OS: Linux → All
Hardware: x86_64 → All
Reporter
Updated•10 years ago
Assignee
Updated•9 years ago
Component: DOM → Web Speech
Assignee
Comment 2•9 years ago
Attachment #8643529 -
Flags: review?(bugs)
Comment 3•9 years ago
Just set the "media.webspeech.silence_length" and "media.webspeech.long_silence_length" prefs to some
value other than the defaults? I don't see the need for the magic 3 in the code.
Updated•9 years ago
Attachment #8643529 -
Flags: review?(bugs) → review-
Assignee
Comment 4•9 years ago
(In reply to Olli Pettay [:smaug] from comment #3)
> Just set "media.webspeech.silence_length" and
> "media.webspeech.long_silence_length" pref to some other
> value than the defaults? I don't see need for the magical 3 in the code.
I know, but I tried modifying these parameters and it had no practical effect, since requested_silence_length changes dynamically based on the amount of speech input.
This VAD algorithm is not good enough, and I don't understand why it was picked instead of https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/common_audio/vad/, which is much more widely adopted by the industry and is already in the Gecko codebase.
Comment 5•9 years ago
(In reply to Andre Natal from comment #4)
> (In reply to Olli Pettay [:smaug] from comment #3)
> > Just set "media.webspeech.silence_length" and
> > "media.webspeech.long_silence_length" pref to some other
> > value than the defaults? I don't see need for the magical 3 in the code.
>
> I know, but I tried modify these parameters but had no practical effect
> since requested_silence_length changes dynamically based on the amount of
> speech input.
What if you set both "media.webspeech.silence_length" and "media.webspeech.long_silence_length"
to 3 times their current values?
Comment 6•9 years ago
(In reply to Andre Natal from comment #4)
> (In reply to Olli Pettay [:smaug] from comment #3)
> > Just set "media.webspeech.silence_length" and
> > "media.webspeech.long_silence_length" pref to some other
> > value than the defaults? I don't see need for the magical 3 in the code.
>
> I know, but I tried modify these parameters but had no practical effect
> since requested_silence_length changes dynamically based on the amount of
> speech input.
I don't understand. requested_silence_length is set to be either
long_speech_input_complete_silence_length_us_ or speech_input_complete_silence_length_us_
and both those variables are set based on the prefs.
> This Vad algorithm is not good enough and I don't understand why it was pick
> instead
> https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/
> common_audio/vad/ that is much wider adopted by the industry and is already
> on gecko codebase.
IIRC at the time endpointer.cc was initially needed, we didn't have any webrtc code in the tree. Though, the webrtc and speech API code did land around the same time.
But feel free to make the speech API use the same code that webrtc uses.
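For readers following the requested_silence_length exchange, here is a minimal sketch of the selection described in this comment. It is not the code from endpointer.cc. The struct, the function name, and the long_speech_length_us cutoff are assumptions made for illustration; only the two *_silence_length_us names and the two pref names come from the thread.

```cpp
#include <cstdint>

// Hypothetical container for the two pref-derived thresholds discussed in
// this bug, plus an assumed cutoff after which an utterance counts as "long".
struct EndpointerParams {
  int64_t speech_input_complete_silence_length_us;       // media.webspeech.silence_length
  int64_t long_speech_input_complete_silence_length_us;  // media.webspeech.long_silence_length
  int64_t long_speech_length_us;                         // assumed "long utterance" cutoff
};

// Returns the trailing silence (in microseconds) required before end of
// speech is declared. The value switches between the two thresholds
// depending on how long the user has been speaking, which is why it
// appears to change dynamically with the amount of speech input.
int64_t RequestedSilenceLength(const EndpointerParams& p,
                               int64_t now_us,
                               int64_t speech_start_us) {
  const bool long_mode_enabled =
      p.long_speech_length_us > 0 &&
      p.long_speech_input_complete_silence_length_us > 0;
  const bool utterance_is_long =
      (now_us - speech_start_us) > p.long_speech_length_us;
  if (long_mode_enabled && utterance_is_long) {
    return p.long_speech_input_complete_silence_length_us;
  }
  return p.speech_input_complete_silence_length_us;
}
```

Under these assumptions both possible results are pref-derived, so raising both prefs raises the requested silence for short and long utterances alike.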
Assignee
Comment 7•9 years ago
Changed the patch to set the default values of the preferences PREFERENCE_ENDPOINTER_SILENCE_LENGTH and PREFERENCE_ENDPOINTER_LONG_SILENCE_LENGTH.
Attachment #8643529 -
Attachment is obsolete: true
Attachment #8644114 -
Flags: review?(bugs)
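The difference between changing the built-in defaults (this patch) and overriding the prefs per product can be sketched as a lookup with a fallback. This is only an illustration. PrefStore, GetIntPref, LoadEndpointerPrefs, and the numeric defaults are hypothetical stand-ins, not Gecko's preference API or its actual values, and the microsecond unit is assumed from the _us suffix of the variables mentioned in comment 6.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical in-memory stand-in for a preference store; not Gecko's API.
using PrefStore = std::unordered_map<std::string, int32_t>;

// Placeholder defaults, chosen only for illustration.
constexpr int32_t kAssumedSilenceLengthUs = 500000;
constexpr int32_t kAssumedLongSilenceLengthUs = 1000000;

struct EndpointerPrefs {
  int32_t silence_length_us;
  int32_t long_silence_length_us;
};

// Returns the stored value when the pref is set, otherwise the fallback.
int32_t GetIntPref(const PrefStore& prefs, const std::string& name,
                   int32_t fallback) {
  const auto it = prefs.find(name);
  return it != prefs.end() ? it->second : fallback;
}

// Reads both endpointer prefs, falling back to the compiled-in defaults.
EndpointerPrefs LoadEndpointerPrefs(const PrefStore& prefs) {
  return EndpointerPrefs{
      GetIntPref(prefs, "media.webspeech.silence_length",
                 kAssumedSilenceLengthUs),
      GetIntPref(prefs, "media.webspeech.long_silence_length",
                 kAssumedLongSilenceLengthUs)};
}
```

With a scheme like this, changing the fallback constants affects every product that does not set the prefs, while a per-product pref file changes only that product.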
Assignee
Comment 8•9 years ago
(In reply to Olli Pettay [:smaug] from comment #6)
> (In reply to Andre Natal from comment #4)
> > (In reply to Olli Pettay [:smaug] from comment #3)
> > > Just set "media.webspeech.silence_length" and
> > > "media.webspeech.long_silence_length" pref to some other
> > > value than the defaults? I don't see need for the magical 3 in the code.
> >
> > I know, but I tried modify these parameters but had no practical effect
> > since requested_silence_length changes dynamically based on the amount of
> > speech input.
> I don't understand. requested_silence_length is set to be either
> long_speech_input_complete_silence_length_us_ or
> speech_input_complete_silence_length_us_
> and both those variables are set based on the prefs.
>
>
OK, I set them there.
>
> > This Vad algorithm is not good enough and I don't understand why it was pick
> > instead
> > https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/
> > common_audio/vad/ that is much wider adopted by the industry and is already
> > on gecko codebase.
> IIRC at the time endpointer.cc was initial needed, we didn't have any webrtc
> code in tree. Though, webrtc and speech API code did land around the same
> time.
> But feel free to make speech API to use the same code as what webrtc uses.
Unfortunately we don't have enough time to make such a deep change, so it is better to keep the current code and change the parameters.
Assignee
Comment 9•9 years ago
(In reply to kdavis from comment #5)
> (In reply to Andre Natal from comment #4)
> > (In reply to Olli Pettay [:smaug] from comment #3)
> > > Just set "media.webspeech.silence_length" and
> > > "media.webspeech.long_silence_length" pref to some other
> > > value than the defaults? I don't see need for the magical 3 in the code.
> >
> > I know, but I tried modify these parameters but had no practical effect
> > since requested_silence_length changes dynamically based on the amount of
> > speech input.
>
>
> What if you set both "media.webspeech.silence_length" and
> "media.webspeech.long_silence_length"
> to 3 times their current values?
Thank you Kelly. Yes, I did that. I changed it to 2.5 times, since 3 times was taking too long to decode.
Assignee
Comment 10•9 years ago
Try for this patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=82944aa09865
Comment 11•9 years ago
Comment on attachment 8644114 [details] [diff] [review]
Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech
This is fine too, though I wonder why you don't just set the prefs on b2g.js or somewhere.
Attachment #8644114 -
Flags: review?(bugs) → review+
Assignee
Comment 12•9 years ago
(In reply to Olli Pettay [:smaug] from comment #11)
> Comment on attachment 8644114 [details] [diff] [review]
> Relax current VAD algorithm increasing the amount of end silence required to
> be input to decree end of speech
>
> This is fine too, though I wonder why you don't just set the prefs on b2g.js
> or somewhere.
Thank you Olli.
Well, I preferred to change the defaults because this level looks more accurate both on desktop and on the phone, mainly for sequences of digits, which is the principal issue we're having with the VAD, so it seems better to increase the defaults across the board.
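To illustrate why a short end-silence requirement hurts digit sequences, here is a simplified sketch. It assumes a hypothetical per-frame speech/non-speech decision is already available. The function name, the frame representation, and every number used with it are illustrative assumptions; the real endpointer is more involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the frame at which end of speech is declared, or -1
// if the required trailing silence is never reached after speech starts.
// `is_speech` holds one hypothetical VAD decision per frame and `frame_us`
// is the duration of each frame in microseconds.
int FindEndOfSpeechFrame(const std::vector<bool>& is_speech,
                         int64_t frame_us,
                         int64_t required_silence_us) {
  bool speech_seen = false;
  int64_t silence_us = 0;
  for (std::size_t i = 0; i < is_speech.size(); ++i) {
    if (is_speech[i]) {
      speech_seen = true;
      silence_us = 0;  // any speech frame resets the trailing-silence timer
    } else if (speech_seen) {
      silence_us += frame_us;
      if (silence_us >= required_silence_us) {
        return static_cast<int>(i);
      }
    }
  }
  return -1;
}
```

With 100 ms frames and a 600 ms pause between two spoken digits, a 500 ms requirement ends the utterance inside the pause, while a requirement 2.5 times larger waits until the speaker has actually finished. The cost is a longer wait before decoding starts, which matches the trade-off mentioned in comment 9.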
Assignee
Updated•9 years ago
Keywords: checkin-needed
Comment 13•9 years ago
Btw, for these kinds of changes, where the backend handling is changed, I'm totally fine if
just kdavis reviews your patches (that might be faster when my review load is high).
Assignee
Comment 14•9 years ago
(In reply to Olli Pettay [:smaug] from comment #13)
> Btw, for this kinds of changes, where the backend handling is changed, I'm
> totally fine if
> just kdavis reviews your patches (that might be faster when my review load
> is high).
Thank you Olli, this is going to be really helpful.
Comment 15•9 years ago
Keywords: checkin-needed
Comment 16•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox42:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42