1051604 - Adapt VAD strategy on SpeechRecognition to be less strict on some devices with poor mics

Just set the "media.webspeech.silence_length" and "media.webspeech.long_silence_length" prefs to some value other than the defaults? I don't see the need for the magic 3 in the code.

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

9 years ago

Attachment #8643529 - Flags: review?(bugs) → review-

André Natal

Assignee

Comment 4

•

9 years ago

(In reply to Olli Pettay [:smaug] from comment #3)
> Just set "media.webspeech.silence_length" and
> "media.webspeech.long_silence_length" pref to some other
> value than the defaults? I don't see need for the magical 3 in the code.

I know, but I tried modifying these parameters and it had no practical effect, since requested_silence_length changes dynamically based on the amount of speech input.

This VAD algorithm is not good enough, and I don't understand why it was picked instead of https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/common_audio/vad/ which is much more widely adopted by the industry and is already in the Gecko codebase.

kdavis

Comment 5

•

9 years ago

(In reply to Andre Natal from comment #4)
> (In reply to Olli Pettay [:smaug] from comment #3)
> > Just set "media.webspeech.silence_length" and
> > "media.webspeech.long_silence_length" pref to some other
> > value than the defaults? I don't see need for the magical 3 in the code.
>
> I know, but I tried modify these parameters but had no practical effect
> since requested_silence_length changes dynamically based on the amount of
> speech input.

What if you set both "media.webspeech.silence_length" and "media.webspeech.long_silence_length" to 3 times their current values?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 6

•

9 years ago

(In reply to Andre Natal from comment #4)
> (In reply to Olli Pettay [:smaug] from comment #3)
> > Just set "media.webspeech.silence_length" and
> > "media.webspeech.long_silence_length" pref to some other
> > value than the defaults? I don't see need for the magical 3 in the code.
>
> I know, but I tried modify these parameters but had no practical effect
> since requested_silence_length changes dynamically based on the amount of
> speech input.

I don't understand. requested_silence_length is set to either long_speech_input_complete_silence_length_us_ or speech_input_complete_silence_length_us_, and both of those variables are set from the prefs.

> This Vad algorithm is not good enough and I don't understand why it was pick
> instead
> https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/common_audio/vad/
> that is much wider adopted by the industry and is already
> on gecko codebase.

IIRC, at the time endpointer.cc was initially needed, we didn't have any WebRTC code in the tree, though the WebRTC and speech API code did land around the same time. But feel free to make the speech API use the same code that WebRTC uses.
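The selection described in this comment can be sketched roughly as follows. This is a simplified illustration, not the actual endpointer.cc code: the struct, the function name, and the `long_speech_length_us` threshold are assumptions introduced here, and only the two `*_complete_silence_length_us` names come from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the endpointer's silence settings.
// The two *_complete_silence_length_us fields mirror the variables named in
// this thread; long_speech_length_us is an assumed threshold.
struct EndpointerParams {
  int64_t speech_input_complete_silence_length_us;       // presumably from media.webspeech.silence_length
  int64_t long_speech_input_complete_silence_length_us;  // presumably from media.webspeech.long_silence_length
  int64_t long_speech_length_us;                         // assumed: utterance length that counts as "long"
};

// Returns how much trailing silence is required before declaring end of
// speech. The value only switches between the two configured lengths, so
// both ultimately depend on the prefs.
inline int64_t RequestedSilenceLengthUs(const EndpointerParams& p,
                                        int64_t utterance_length_us) {
  assert(utterance_length_us >= 0);
  const bool is_long = p.long_speech_length_us > 0 &&
                       utterance_length_us > p.long_speech_length_us;
  return is_long ? p.long_speech_input_complete_silence_length_us
                 : p.speech_input_complete_silence_length_us;
}
```

Under these assumptions the "dynamic" behaviour is just a switch between two pref-derived values, which is why changing both prefs should change the result for every utterance length.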

André Natal

Assignee

Comment 7

•

9 years ago

Attached patch: Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech

Changed the patch to set the default values of the preferences PREFERENCE_ENDPOINTER_SILENCE_LENGTH and PREFERENCE_ENDPOINTER_LONG_SILENCE_LENGTH.

Attachment #8643529 - Attachment is obsolete: true

Attachment #8644114 - Flags: review?(bugs)

André Natal

Assignee

Comment 8

•

9 years ago

(In reply to Olli Pettay [:smaug] from comment #6)
> (In reply to Andre Natal from comment #4)
> > (In reply to Olli Pettay [:smaug] from comment #3)
> > > Just set "media.webspeech.silence_length" and
> > > "media.webspeech.long_silence_length" pref to some other
> > > value than the defaults? I don't see need for the magical 3 in the code.
> >
> > I know, but I tried modify these parameters but had no practical effect
> > since requested_silence_length changes dynamically based on the amount of
> > speech input.
> I don't understand. requested_silence_length is set to be either
> long_speech_input_complete_silence_length_us_ or
> speech_input_complete_silence_length_us_
> and both those variables are set based on the prefs.

OK, I set them there.

> > This Vad algorithm is not good enough and I don't understand why it was pick
> > instead
> > https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/common_audio/vad/
> > that is much wider adopted by the industry and is already
> > on gecko codebase.
> IIRC at the time endpointer.cc was initial needed, we didn't have any webrtc
> code in tree. Though, webrtc and speech API code did land around the same
> time.
> But feel free to make speech API to use the same code as what webrtc uses.

Unfortunately we don't have enough time to make such a deep change, so it's better to keep the current code and change the parameters.

André Natal

Assignee

Comment 9

•

9 years ago

(In reply to kdavis from comment #5)
> (In reply to Andre Natal from comment #4)
> > (In reply to Olli Pettay [:smaug] from comment #3)
> > > Just set "media.webspeech.silence_length" and
> > > "media.webspeech.long_silence_length" pref to some other
> > > value than the defaults? I don't see need for the magical 3 in the code.
> >
> > I know, but I tried modify these parameters but had no practical effect
> > since requested_silence_length changes dynamically based on the amount of
> > speech input.
>
> What if you set both "media.webspeech.silence_length" and
> "media.webspeech.long_silence_length"
> to 3 times their current values?

Thank you Kelly. Yes, I did that. I changed it to 2.5 times, since 3 times was taking too long to decode.
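To make the trade-off between the 3x and 2.5x factors concrete, here is a small sketch of the scaling arithmetic. The helper name is hypothetical, and the base value used in the usage note below is a placeholder rather than the actual Gecko default; the unit is assumed to be microseconds, based on the `_us_` suffix of the variables discussed above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scales a silence length (assumed to be in microseconds) by a factor,
// rounding to the nearest whole microsecond.
inline int64_t ScaleSilenceLengthUs(int64_t base_us, double factor) {
  assert(base_us >= 0 && factor > 0.0);
  return static_cast<int64_t>(
      std::llround(static_cast<double>(base_us) * factor));
}
```

With a placeholder base of 400000, `ScaleSilenceLengthUs(400000, 3.0)` gives 1200000 and `ScaleSilenceLengthUs(400000, 2.5)` gives 1000000, so the smaller factor ends recognition 0.2 s sooner after the speaker stops.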

André Natal

Assignee

Comment 10

•

9 years ago

Try for this patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=82944aa09865

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 11

•

9 years ago

Comment on attachment 8644114
Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech

This is fine too, though I wonder why you don't just set the prefs in b2g.js or somewhere.

Attachment #8644114 - Flags: review?(bugs) → review+

André Natal

Assignee

Comment 12

•

9 years ago

(In reply to Olli Pettay [:smaug] from comment #11)
> Comment on attachment 8644114 [details] [diff] [review]
> Relax current VAD algorithm increasing the amount of end silence required to
> be input to decree end of speech
>
> This is fine too, though I wonder why you don't just set the prefs on b2g.js
> or somewhere.

Thank you Olli. Well, I preferred to change the defaults because this level looks more accurate both on desktop and on the phone, mainly for sequences of digits, which are the principal issue we're having with the VAD. So it seems better to increase the defaults altogether.
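The difference between changing the built-in default and setting the pref in a per-product file can be illustrated with a minimal lookup sketch. `PrefStore` and `GetIntPref` are hypothetical stand-ins invented for this example, not Gecko's preferences API, and the numbers in the usage note are placeholders.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a preference store; not Gecko's API.
using PrefStore = std::map<std::string, int32_t>;

// Returns the stored value for `name` if an override exists, otherwise the
// built-in default supplied by the caller.
inline int32_t GetIntPref(const PrefStore& prefs, const std::string& name,
                          int32_t default_value) {
  assert(!name.empty());
  const auto it = prefs.find(name);
  return it != prefs.end() ? it->second : default_value;
}
```

With an empty store, `GetIntPref(store, "media.webspeech.silence_length", 1000)` returns the built-in 1000, so raising that default affects every platform. If the store holds an override such as 2500 for the same name, the override wins, which is the per-product approach suggested in comment 11.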

André Natal

Assignee

Updated

•

9 years ago

Keywords: checkin-needed

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 13

•

9 years ago

Btw, for these kinds of changes, where the backend handling is changed, I'm totally fine if just kdavis reviews your patches (that might be faster when my review load is high).

André Natal

Assignee

Comment 14

•

9 years ago

(In reply to Olli Pettay [:smaug] from comment #13)
> Btw, for this kinds of changes, where the backend handling is changed, I'm
> totally fine if
> just kdavis reviews your patches (that might be faster when my review load
> is high).

Thank you Olli, this is going to be really helpful.

Pulsebot

Comment 15

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/79ecbf9133b1

Keywords: checkin-needed

Carsten Book [:Tomcat]

Comment 16

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/79ecbf9133b1

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox42: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla42

Aaron Wu

Updated

•

9 years ago

feature-b2g: --- → 2.5+

Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech (André Natal, 9 years ago, 1.74 KB patch, smaug: review-)

Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech (André Natal, 9 years ago, 1.54 KB patch, smaug: review+)