Adapt VAD strategy on SpeechRecognition to be less strict on some devices with poor mics

RESOLVED FIXED in Firefox 42

Status


Product: Core
Component: Web Speech
Status: RESOLVED FIXED
Reported: 4 years ago
Modified: 3 years ago

People

(Reporter: Andre Natal, Assigned: André Natal)

Tracking

(Blocks: 4 bugs)

Version: unspecified
Target Milestone: mozilla42
Points:
---

Firefox Tracking Flags

(feature-b2g:2.5+, firefox42 fixed)

Details

(Whiteboard: [webspeechapi])

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

4 years ago
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140715215003
(Reporter)

Updated

4 years ago
Blocks: 1049931
(Reporter)

Comment 1

4 years ago
We should benchmark and test the pocketsphinx VAD instead of the current one, which is leading to a number of issues.
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Reporter)

Updated

4 years ago
OS: Linux → All
Hardware: x86_64 → All
(Reporter)

Updated

4 years ago
Blocks: 1067689
No longer blocks: 1049931

Updated

3 years ago
Whiteboard: [webspeechapi]
(Assignee)

Updated

3 years ago
Assignee: nobody → anatal
Blocks: 1172883
(Assignee)

Updated

3 years ago
Component: DOM → Web Speech
(Assignee)

Updated

3 years ago
Blocks: 1180668

Updated

3 years ago
Blocks: 1172875
No longer blocks: 1180668

Updated

3 years ago
Blocks: 1185233
(Assignee)

Comment 2

3 years ago
Created attachment 8643529 [details] [diff] [review]
Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=97df26316f22
Attachment #8643529 - Flags: review?(bugs)

Comment 3

3 years ago
Just set the "media.webspeech.silence_length" and "media.webspeech.long_silence_length" prefs to values
other than the defaults? I don't see the need for the magical 3 in the code.

Updated

3 years ago
Attachment #8643529 - Flags: review?(bugs) → review-
(Assignee)

Comment 4

3 years ago
(In reply to Olli Pettay [:smaug] from comment #3)
> Just set "media.webspeech.silence_length" and
> "media.webspeech.long_silence_length" pref to some other
> value than the defaults? I don't see need for the magical 3 in the code.

I know, but I tried modifying these parameters and it had no practical effect, since requested_silence_length changes dynamically based on the amount of speech input.

This VAD algorithm is not good enough, and I don't understand why it was picked instead of the WebRTC VAD (https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/common_audio/vad/), which is much more widely adopted in the industry and is already in the Gecko codebase.

Comment 5

3 years ago
(In reply to Andre Natal from comment #4)
> (In reply to Olli Pettay [:smaug] from comment #3)
> > Just set "media.webspeech.silence_length" and
> > "media.webspeech.long_silence_length" pref to some other
> > value than the defaults? I don't see need for the magical 3 in the code.
> 
> I know, but I tried modify these parameters but had no practical effect
> since requested_silence_length changes dynamically based on the amount of
> speech input. 


What if you set both "media.webspeech.silence_length" and "media.webspeech.long_silence_length"
to 3 times their current values?

Comment 6

3 years ago
(In reply to Andre Natal from comment #4)
> (In reply to Olli Pettay [:smaug] from comment #3)
> > Just set "media.webspeech.silence_length" and
> > "media.webspeech.long_silence_length" pref to some other
> > value than the defaults? I don't see need for the magical 3 in the code.
> 
> I know, but I tried modify these parameters but had no practical effect
> since requested_silence_length changes dynamically based on the amount of
> speech input. 
I don't understand. requested_silence_length is set to be either 
long_speech_input_complete_silence_length_us_ or speech_input_complete_silence_length_us_
and both those variables are set based on the prefs.
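
Olli's point can be made concrete with a small model. This is a hypothetical Python sketch, not the endpointer.cc source: the class and function names, the long_speech_length_us cutoff, and the example values are assumptions. It shows that requested_silence_length does vary with how long the user has been speaking, but only by choosing between two values that both come from the prefs, so raising both prefs raises the threshold in every case.

```python
from dataclasses import dataclass


@dataclass
class EndpointerConfig:
    """Hypothetical stand-in for the endpointer's pref-driven settings (microseconds)."""
    silence_length_us: int       # from "media.webspeech.silence_length"
    long_silence_length_us: int  # from "media.webspeech.long_silence_length"
    long_speech_length_us: int   # assumed cutoff between "short" and "long" input


def requested_silence_length(cfg: EndpointerConfig, speech_duration_us: int) -> int:
    """Choose how much trailing silence ends the current utterance.

    Long utterances get the long-silence value; short ones get the regular one.
    Both candidates come from cfg, so changing the prefs changes the result.
    """
    if speech_duration_us > cfg.long_speech_length_us and cfg.long_silence_length_us > 0:
        return cfg.long_silence_length_us
    return cfg.silence_length_us


def speech_input_complete(cfg: EndpointerConfig, speech_duration_us: int,
                          trailing_silence_us: int) -> bool:
    """End of speech is declared once trailing silence exceeds the chosen threshold."""
    return trailing_silence_us > requested_silence_length(cfg, speech_duration_us)
```

For example, with a hypothetical EndpointerConfig(500_000, 1_000_000, 3_000_000), one second of speech followed by 600 ms of silence ends the utterance, while four seconds of speech followed by the same pause does not.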


 
> This Vad algorithm is not good enough and I don't understand why it was pick
> instead
> https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/
> common_audio/vad/ that is much wider adopted by the industry and is already
> on gecko codebase.
IIRC, at the time endpointer.cc was initially needed, we didn't have any WebRTC code in the tree, though the WebRTC and speech API code did land around the same time.
But feel free to make the speech API use the same code that WebRTC uses.
(Assignee)

Comment 7

3 years ago
Created attachment 8644114 [details] [diff] [review]
Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech

Changed the patch to set the default values of the preferences PREFERENCE_ENDPOINTER_SILENCE_LENGTH and PREFERENCE_ENDPOINTER_LONG_SILENCE_LENGTH.
Attachment #8643529 - Attachment is obsolete: true
Attachment #8644114 - Flags: review?(bugs)
(Assignee)

Comment 8

3 years ago
(In reply to Olli Pettay [:smaug] from comment #6)
> (In reply to Andre Natal from comment #4)
> > (In reply to Olli Pettay [:smaug] from comment #3)
> > > Just set "media.webspeech.silence_length" and
> > > "media.webspeech.long_silence_length" pref to some other
> > > value than the defaults? I don't see need for the magical 3 in the code.
> > 
> > I know, but I tried modify these parameters but had no practical effect
> > since requested_silence_length changes dynamically based on the amount of
> > speech input. 
> I don't understand. requested_silence_length is set to be either 
> long_speech_input_complete_silence_length_us_ or
> speech_input_complete_silence_length_us_
> and both those variables are set based on the prefs.
> 
> 

OK, I set them there.

>  
> > This Vad algorithm is not good enough and I don't understand why it was pick
> > instead
> > https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/
> > common_audio/vad/ that is much wider adopted by the industry and is already
> > on gecko codebase.
> IIRC at the time endpointer.cc was initial needed, we didn't have any webrtc
> code in tree. Though, webrtc and speech API code did land around the same
> time.
> But feel free to make speech API to use the same code as what webrtc uses.

Unfortunately, we don't have enough time for such a deep change, so it's better to keep it and change the parameters.
(Assignee)

Comment 9

3 years ago
(In reply to kdavis from comment #5)
> (In reply to Andre Natal from comment #4)
> > (In reply to Olli Pettay [:smaug] from comment #3)
> > > Just set "media.webspeech.silence_length" and
> > > "media.webspeech.long_silence_length" pref to some other
> > > value than the defaults? I don't see need for the magical 3 in the code.
> > 
> > I know, but I tried modify these parameters but had no practical effect
> > since requested_silence_length changes dynamically based on the amount of
> > speech input. 
> 
> 
> What if you set both "media.webspeech.silence_length" and
> "media.webspeech.long_silence_length"
> to 3 times their current values?

Thank you, Kelly. Yes, I did that. I changed them to 2.5 times, since 3 times was taking too long to decode.
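
Kelly's suggestion and the 2.5 times value Andre settled on amount to multiplying both silence prefs by the same factor. A minimal sketch, assuming the pref values are integer microseconds (as the _us suffix on the endpointer member names quoted in comment 6 suggests); the helper names and the baseline numbers are hypothetical, not the actual Gecko defaults.

```python
# Names of the two silence prefs discussed in this bug.
SILENCE_PREFS = ("media.webspeech.silence_length",
                 "media.webspeech.long_silence_length")


def scale_silence_prefs(prefs: dict, factor: float) -> dict:
    """Return a copy of prefs with both silence lengths multiplied by factor.

    Values are treated as integer microseconds; other entries are untouched.
    """
    scaled = dict(prefs)
    for name in SILENCE_PREFS:
        scaled[name] = round(prefs[name] * factor)
    return scaled


def added_wait_us(prefs: dict, scaled: dict, name: str) -> int:
    """Extra trailing silence a user must produce before end of speech."""
    return scaled[name] - prefs[name]


# Hypothetical baseline values, for illustration only.
baseline = {
    "media.webspeech.silence_length": 500_000,
    "media.webspeech.long_silence_length": 1_000_000,
}

for factor in (2.5, 3):
    scaled = scale_silence_prefs(baseline, factor)
    print(factor, scaled, added_wait_us(baseline, scaled, SILENCE_PREFS[0]))
```

With these hypothetical numbers, 2.5 times adds 750 ms of required trailing silence to the short threshold and 3 times adds 1 s; that extra wait before the recognizer starts is the cost being traded against fewer premature cut-offs.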

Comment 11

3 years ago
Comment on attachment 8644114 [details] [diff] [review]
Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech

This is fine too, though I wonder why you don't just set the prefs in b2g.js or somewhere.
Attachment #8644114 - Flags: review?(bugs) → review+
(Assignee)

Comment 12

3 years ago
(In reply to Olli Pettay [:smaug] from comment #11)
> Comment on attachment 8644114 [details] [diff] [review]
> Relax current VAD algorithm increasing the amount of end silence required to
> be input to decree end of speech
> 
> This is fine too, though I wonder why you don't just set the prefs on b2g.js
> or somewhere.

Thank you, Olli.

Well, I preferred to change the defaults because this level looks more accurate on both desktop and the phone, mainly for sequences of digits, which are the principal issue we're having with the VAD. So it seems better to increase the defaults everywhere.
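
Why a longer silence threshold helps with digit sequences can be illustrated with a toy timeline. A minimal sketch under stated assumptions: the segment durations and both thresholds are hypothetical, the function name is invented for illustration, and the real endpointer classifies audio frame by frame rather than reading pre-labelled segments. Pauses between spoken digits are silence, so if a pause is longer than the threshold, end of speech fires in the middle of the number.

```python
from typing import Optional, Sequence, Tuple

# Each segment is ("speech" | "silence", duration in microseconds).
Segment = Tuple[str, int]


def endpoint_time_us(segments: Sequence[Segment],
                     silence_threshold_us: int) -> Optional[int]:
    """Return when end of speech would be declared, or None if it never is.

    Leading silence (before any speech) is ignored. After speech has been
    heard, end of speech fires once a continuous run of silence passes
    silence_threshold_us; the returned time is when the run reaches it.
    """
    t = 0
    heard_speech = False
    silence_run = 0
    for kind, duration in segments:
        if kind == "speech":
            heard_speech = True
            silence_run = 0
        elif heard_speech:
            if silence_run + duration > silence_threshold_us:
                # The pause crosses the threshold inside this segment.
                return t + (silence_threshold_us - silence_run)
            silence_run += duration
        t += duration
    return None


# Three digits with 700 ms pauses between them, then a long final silence.
utterance = [("speech", 400_000), ("silence", 700_000),
             ("speech", 400_000), ("silence", 700_000),
             ("speech", 400_000), ("silence", 3_000_000)]
```

With these hypothetical numbers, a 500 ms threshold ends the utterance 900 ms in, right after the first digit, while a 1250 ms threshold lets all three digits through and ends at 3.85 s.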
(Assignee)

Updated

3 years ago
Keywords: checkin-needed

Comment 13

3 years ago
Btw, for these kinds of changes, where the backend handling is changed, I'm totally fine if
just kdavis reviews your patches (that might be faster when my review load is high).
(Assignee)

Comment 14

3 years ago
(In reply to Olli Pettay [:smaug] from comment #13)
> Btw, for this kinds of changes, where the backend handling is changed, I'm
> totally fine if
> just kdavis reviews your patches (that might be faster when my review load
> is high).

Thank you, Olli, this is going to be really helpful.
https://hg.mozilla.org/mozilla-central/rev/79ecbf9133b1
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox42: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42

Updated

3 years ago
feature-b2g: --- → 2.5+