Adapt VAD strategy on SpeechRecognition to be less strict on some devices with poor mics

RESOLVED FIXED in Firefox 42

Status

defect
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: anatal, Assigned: anatal)

Tracking

(Blocks 2 bugs)

unspecified
mozilla42

Firefox Tracking Flags

(feature-b2g:2.5+, firefox42 fixed)

Details

(Whiteboard: [webspeechapi])

Attachments

(1 attachment, 1 obsolete attachment)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140715215003
Blocks: 1049931
We should benchmark and test the pocketsphinx VAD as a replacement for the current one, which is causing a number of issues.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: x86_64 → All
Blocks: 1067689
No longer blocks: 1049931
Whiteboard: [webspeechapi]
Assignee: nobody → anatal
Blocks: 1172883
Component: DOM → Web Speech
Blocks: 1180668
Blocks: Meta-Vaani
No longer blocks: 1180668
Blocks: 1185233
Why not just set the "media.webspeech.silence_length" and "media.webspeech.long_silence_length" prefs to some
value other than the defaults? I don't see the need for the magic 3 in the code.
Attachment #8643529 - Flags: review?(bugs) → review-
(In reply to Olli Pettay [:smaug] from comment #3)
> Just set "media.webspeech.silence_length" and
> "media.webspeech.long_silence_length" pref to some other
> value than the defaults? I don't see need for the magical 3 in the code.

I know, but I tried modifying these parameters and it had no practical effect, since requested_silence_length changes dynamically based on the amount of speech input.

This VAD algorithm is not good enough, and I don't understand why it was picked instead of the WebRTC VAD (https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/common_audio/vad/), which is much more widely adopted by the industry and is already in the Gecko codebase.
(In reply to Andre Natal from comment #4)
> (In reply to Olli Pettay [:smaug] from comment #3)
> > Just set "media.webspeech.silence_length" and
> > "media.webspeech.long_silence_length" pref to some other
> > value than the defaults? I don't see need for the magical 3 in the code.
> 
> I know, but I tried modify these parameters but had no practical effect
> since requested_silence_length changes dynamically based on the amount of
> speech input. 


What if you set both "media.webspeech.silence_length" and "media.webspeech.long_silence_length"
to 3 times their current values?
(In reply to Andre Natal from comment #4)
> (In reply to Olli Pettay [:smaug] from comment #3)
> > Just set "media.webspeech.silence_length" and
> > "media.webspeech.long_silence_length" pref to some other
> > value than the defaults? I don't see need for the magical 3 in the code.
> 
> I know, but I tried modify these parameters but had no practical effect
> since requested_silence_length changes dynamically based on the amount of
> speech input. 
I don't understand. requested_silence_length is set to be either 
long_speech_input_complete_silence_length_us_ or speech_input_complete_silence_length_us_
and both those variables are set based on the prefs.
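
The selection described here can be sketched roughly as follows. This is a simplified illustration, not the actual endpointer.cc code: the struct, the function name, and the long_speech_length_us threshold are assumptions introduced for the example. Only the two silence-length variable names come from this comment.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the endpointer's silence settings.
// The two silence-length fields mirror the variables named in this comment;
// the struct itself and long_speech_length_us are assumptions.
struct EndpointerParams {
  int64_t speech_input_complete_silence_length_us;
  int64_t long_speech_input_complete_silence_length_us;
  int64_t long_speech_length_us;  // 0 disables the long-utterance case
};

// Returns the trailing silence (in microseconds) required before the
// utterance is considered finished. The result switches between the two
// configured values depending on how long the user has been speaking.
inline int64_t RequestedSilenceLengthUs(const EndpointerParams& p,
                                        int64_t utterance_length_us) {
  if (p.long_speech_length_us > 0 &&
      utterance_length_us > p.long_speech_length_us) {
    return p.long_speech_input_complete_silence_length_us;
  }
  return p.speech_input_complete_silence_length_us;
}
```

Under these assumptions the requested silence does vary with the amount of speech input, but it can only ever take one of the two values that come from the prefs.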


 
> This Vad algorithm is not good enough and I don't understand why it was pick
> instead
> https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/
> common_audio/vad/ that is much wider adopted by the industry and is already
> on gecko codebase.
IIRC, at the time endpointer.cc was initially needed, we didn't have any webrtc code in the tree, though the webrtc and speech API code did land around the same time.
But feel free to make the speech API use the same code that webrtc uses.
Changed the patch to set the default values of the preferences PREFERENCE_ENDPOINTER_SILENCE_LENGTH and PREFERENCE_ENDPOINTER_LONG_SILENCE_LENGTH.
Attachment #8643529 - Attachment is obsolete: true
Attachment #8644114 - Flags: review?(bugs)
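
As a rough illustration of what changing a preference's default means, the sketch below reads a pref and falls back to a compiled-in default when the user has not set it. PrefStore and GetIntPref are hypothetical stand-ins written for this example, not Gecko's Preferences API, and the numeric values in the usage note are placeholders.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory pref store; this is NOT Gecko's Preferences API.
using PrefStore = std::map<std::string, int32_t>;

// Returns the user-set value if the pref exists, otherwise the fallback
// (the "default" that a patch like this one would change).
inline int32_t GetIntPref(const PrefStore& prefs, const std::string& name,
                          int32_t fallback) {
  auto it = prefs.find(name);
  return it != prefs.end() ? it->second : fallback;
}
```

With an empty store, `GetIntPref(prefs, "media.webspeech.silence_length", 1250000)` yields the fallback; once the pref is set explicitly, the stored value wins. Raising the fallback therefore affects every device that does not override the pref.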
(In reply to Olli Pettay [:smaug] from comment #6)
> (In reply to Andre Natal from comment #4)
> > (In reply to Olli Pettay [:smaug] from comment #3)
> > > Just set "media.webspeech.silence_length" and
> > > "media.webspeech.long_silence_length" pref to some other
> > > value than the defaults? I don't see need for the magical 3 in the code.
> > 
> > I know, but I tried modify these parameters but had no practical effect
> > since requested_silence_length changes dynamically based on the amount of
> > speech input. 
> I don't understand. requested_silence_length is set to be either 
> long_speech_input_complete_silence_length_us_ or
> speech_input_complete_silence_length_us_
> and both those variables are set based on the prefs.
> 
> 

OK, I set them there.

>  
> > This Vad algorithm is not good enough and I don't understand why it was pick
> > instead
> > https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/
> > common_audio/vad/ that is much wider adopted by the industry and is already
> > on gecko codebase.
> IIRC at the time endpointer.cc was initial needed, we didn't have any webrtc
> code in tree. Though, webrtc and speech API code did land around the same
> time.
> But feel free to make speech API to use the same code as what webrtc uses.

Unfortunately we don't have enough time to make such a deep change, so it's better to keep the current code and change the parameters.
(In reply to kdavis from comment #5)
> (In reply to Andre Natal from comment #4)
> > (In reply to Olli Pettay [:smaug] from comment #3)
> > > Just set "media.webspeech.silence_length" and
> > > "media.webspeech.long_silence_length" pref to some other
> > > value than the defaults? I don't see need for the magical 3 in the code.
> > 
> > I know, but I tried modify these parameters but had no practical effect
> > since requested_silence_length changes dynamically based on the amount of
> > speech input. 
> 
> 
> What if you set both "media.webspeech.silence_length" and
> "media.webspeech.long_silence_length"
> to 3 times their current values?

Thank you, Kelly. Yes, I did that. I changed it to 2.5 times, since 3 times was taking too long to decode.
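
The arithmetic behind the 2.5x versus 3x choice can be sketched as below. The baseline values are illustrative assumptions only (this thread does not state the original defaults), and ScaleSilenceUs is a hypothetical helper, not a function from the patch.

```cpp
#include <cstdint>

// Scales a silence length in microseconds by the rational factor num/den,
// using integer arithmetic to avoid floating point (2.5x is 5/2).
constexpr int64_t ScaleSilenceUs(int64_t base_us, int64_t num, int64_t den) {
  return base_us * num / den;
}

// Illustrative baselines only; the thread does not give the real defaults.
constexpr int64_t kExampleSilenceUs = 500000;       // assumed 0.5 s
constexpr int64_t kExampleLongSilenceUs = 1000000;  // assumed 1.0 s
```

With these assumed baselines, a 2.5x factor gives 1.25 s and 2.5 s of required trailing silence, while 3x gives 1.5 s and 3.0 s. The longer the required silence, the later recognition can begin after the user stops talking.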
Comment on attachment 8644114 [details] [diff] [review]
Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech

This is fine too, though I wonder why you don't just set the prefs in b2g.js or somewhere similar.
Attachment #8644114 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #11)
> Comment on attachment 8644114 [details] [diff] [review]
> Relax current VAD algorithm increasing the amount of end silence required to
> be input to decree end of speech
> 
> This is fine too, though I wonder why you don't just set the prefs on b2g.js
> or somewhere.

Thank you Olli.

Well, I preferred to change the defaults because this level looks more accurate on both desktop and phone, mainly for sequences of digits, which are the principal issue we're having with the VAD. So it seems better to increase the defaults for everyone.
Keywords: checkin-needed
Btw, for these kinds of changes, where the backend handling is changed, I'm totally fine if
only kdavis reviews your patches (that might be faster when my review load is high).
(In reply to Olli Pettay [:smaug] from comment #13)
> Btw, for this kinds of changes, where the backend handling is changed, I'm
> totally fine if
> just kdavis reviews your patches (that might be faster when my review load
> is high).

Thank you, Olli, this is going to be really helpful.
https://hg.mozilla.org/mozilla-central/rev/79ecbf9133b1
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
feature-b2g: --- → 2.5+