Closed Bug 1051604 Opened 10 years ago Closed 9 years ago

Adapt VAD strategy on SpeechRecognition to be less strict on some devices with poor mics


Status:

RESOLVED FIXED

Milestone:

mozilla42

Project Flags:

feature-b2g

2.5+

Tracking Flags:

Tracking

Status

firefox42

---

fixed

People

(Reporter: anatal, Assigned: anatal)

References

Details

(Whiteboard: [webspeechapi])

Attachments

(1 file, 1 obsolete file)

Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech (obsolete), 9 years ago, André Natal, 1.74 KB, patch, smaug: review-
Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech, 9 years ago, André Natal, 1.54 KB, patch, smaug: review+

Andre Natal

Reporter

Description

•

10 years ago

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140715215003

Andre Natal

Reporter

Updated

•

10 years ago

Blocks: 1049931

Andre Natal

Reporter

Comment 1

•

10 years ago

We should benchmark and test the pocketsphinx VAD as a replacement for the current one, which is causing a number of issues.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Andre Natal

Reporter

Updated

•

10 years ago

OS: Linux → All

Hardware: x86_64 → All

Andre Natal

Reporter

Updated

•

10 years ago

Blocks: 1067689
No longer blocks: 1049931

kdavis

Updated

•

9 years ago

Whiteboard: [webspeechapi]

André Natal

Assignee

Updated

•

9 years ago

Assignee: nobody → anatal

Blocks: 1172883

André Natal

Assignee

Updated

•

9 years ago

Component: DOM → Web Speech

André Natal

Assignee

Updated

•

9 years ago

Blocks: 1180668

kdavis

Updated

•

9 years ago

Blocks: Meta-Vaani
No longer blocks: 1180668

kdavis

Updated

•

9 years ago

Blocks: 1185233

André Natal

Assignee

Comment 2

•

9 years ago

Attached patch Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech (obsolete)

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=97df26316f22

Attachment #8643529 - Flags: review?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 3

•

9 years ago

Why not just set the "media.webspeech.silence_length" and "media.webspeech.long_silence_length" prefs to
values other than the defaults? I don't see the need for the magic 3 in the code.

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

9 years ago

Attachment #8643529 - Flags: review?(bugs) → review-

André Natal

Assignee

Comment 4

•

9 years ago

(In reply to Olli Pettay [:smaug] from comment #3)
> Just set "media.webspeech.silence_length" and
> "media.webspeech.long_silence_length" pref to some other
> value than the defaults? I don't see need for the magical 3 in the code.

I know, but I tried modifying these parameters and it had no practical effect, since requested_silence_length changes dynamically based on the amount of speech input.

This VAD algorithm is not good enough, and I don't understand why it was picked instead of the WebRTC VAD (https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/common_audio/vad/), which is much more widely adopted in the industry and is already in the Gecko codebase.

kdavis

Comment 5

•

9 years ago

(In reply to Andre Natal from comment #4)
> (In reply to Olli Pettay [:smaug] from comment #3)
> > Just set "media.webspeech.silence_length" and
> > "media.webspeech.long_silence_length" pref to some other
> > value than the defaults? I don't see need for the magical 3 in the code.
> 
> I know, but I tried modify these parameters but had no practical effect
> since requested_silence_length changes dynamically based on the amount of
> speech input. 


What if you set both "media.webspeech.silence_length" and "media.webspeech.long_silence_length"
to 3 times their current values?

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 6

•

9 years ago

(In reply to Andre Natal from comment #4)
> (In reply to Olli Pettay [:smaug] from comment #3)
> > Just set "media.webspeech.silence_length" and
> > "media.webspeech.long_silence_length" pref to some other
> > value than the defaults? I don't see need for the magical 3 in the code.
> 
> I know, but I tried modify these parameters but had no practical effect
> since requested_silence_length changes dynamically based on the amount of
> speech input. 
I don't understand. requested_silence_length is set to be either 
long_speech_input_complete_silence_length_us_ or speech_input_complete_silence_length_us_
and both those variables are set based on the prefs.


 
> This Vad algorithm is not good enough and I don't understand why it was pick
> instead
> https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/
> common_audio/vad/ that is much wider adopted by the industry and is already
> on gecko codebase.
IIRC at the time endpointer.cc was initially needed, we didn't have any webrtc code in the tree. Though webrtc and the speech API code did land around the same time.
But feel free to make the speech API use the same code as webrtc.
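
A minimal sketch of the selection described in comment 6, written to show how both statements in this exchange can hold: the required trailing silence is always one of two pref-derived values, yet which one applies depends on how long the user has been speaking. The struct, the function names, and the long_speech_length_us threshold are assumptions for illustration and are not copied from endpointer.cc.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the endpointer configuration.
// The first two fields mirror the variables named in the thread; the
// third is an assumed threshold after which an utterance counts as "long".
struct EndpointerParams {
  int64_t speech_input_complete_silence_length_us;       // from media.webspeech.silence_length
  int64_t long_speech_input_complete_silence_length_us;  // from media.webspeech.long_silence_length
  int64_t long_speech_length_us;                         // assumption, not named in the thread
};

// Pick the trailing silence required before end of speech is declared.
// The result is always one of the two pref-derived values.
int64_t RequestedSilenceLength(const EndpointerParams& p,
                               int64_t speech_duration_us) {
  const bool long_mode_enabled =
      p.long_speech_length_us > 0 &&
      p.long_speech_input_complete_silence_length_us > 0;
  if (long_mode_enabled && speech_duration_us > p.long_speech_length_us) {
    return p.long_speech_input_complete_silence_length_us;
  }
  return p.speech_input_complete_silence_length_us;
}

// True once the observed trailing silence exceeds the requested length.
bool SpeechComplete(const EndpointerParams& p, int64_t speech_duration_us,
                    int64_t trailing_silence_us) {
  return trailing_silence_us > RequestedSilenceLength(p, speech_duration_us);
}
```

Under this sketch, raising only one of the two prefs changes behavior for only short or only long utterances, which is one possible reason a partial change could appear to have no effect.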

André Natal

Assignee

Comment 7

•

9 years ago

Attached patch Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech

Changed the patch to set the default values of the preferences PREFERENCE_ENDPOINTER_SILENCE_LENGTH and PREFERENCE_ENDPOINTER_LONG_SILENCE_LENGTH.
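
To illustrate what changing a pref default means in practice, here is a small sketch of a pref lookup with a built-in fallback. PrefStore is a hypothetical in-memory stand-in for the real preference service, and the numeric values in the usage example are placeholders rather than the values in the patch. Only the two pref names come from this bug.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Pref names as quoted in this bug.
constexpr const char* kSilenceLengthPref = "media.webspeech.silence_length";
constexpr const char* kLongSilenceLengthPref =
    "media.webspeech.long_silence_length";

// Hypothetical in-memory pref store (an assumption for illustration only).
class PrefStore {
 public:
  void SetInt(const std::string& name, int32_t value) { prefs_[name] = value; }

  // Return the stored value, or `fallback` when the pref is unset.
  int32_t GetInt(const std::string& name, int32_t fallback) const {
    auto it = prefs_.find(name);
    return it == prefs_.end() ? fallback : it->second;
  }

 private:
  std::map<std::string, int32_t> prefs_;
};
```

With a lookup like this, `prefs.GetInt(kSilenceLengthPref, 1234)` yields 1234 on every platform until something sets the pref explicitly. Changing the fallback therefore affects desktop and phone alike, while a per-product pref file can still override it.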

Attachment #8643529 - Attachment is obsolete: true

Attachment #8644114 - Flags: review?(bugs)

André Natal

Assignee

Comment 8

•

9 years ago

(In reply to Olli Pettay [:smaug] from comment #6)
> (In reply to Andre Natal from comment #4)
> > (In reply to Olli Pettay [:smaug] from comment #3)
> > > Just set "media.webspeech.silence_length" and
> > > "media.webspeech.long_silence_length" pref to some other
> > > value than the defaults? I don't see need for the magical 3 in the code.
> > 
> > I know, but I tried modify these parameters but had no practical effect
> > since requested_silence_length changes dynamically based on the amount of
> > speech input. 
> I don't understand. requested_silence_length is set to be either 
> long_speech_input_complete_silence_length_us_ or
> speech_input_complete_silence_length_us_
> and both those variables are set based on the prefs.
> 
> 

OK, I set them there.

>  
> > This Vad algorithm is not good enough and I don't understand why it was pick
> > instead
> > https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/
> > common_audio/vad/ that is much wider adopted by the industry and is already
> > on gecko codebase.
> IIRC at the time endpointer.cc was initial needed, we didn't have any webrtc
> code in tree. Though, webrtc and speech API code did land around the same
> time.
> But feel free to make speech API to use the same code as what webrtc uses.

Unfortunately we don't have enough time for such a deep change, so it is better to keep the current code and change the parameters.

André Natal

Assignee

Comment 9

•

9 years ago

(In reply to kdavis from comment #5)
> (In reply to Andre Natal from comment #4)
> > (In reply to Olli Pettay [:smaug] from comment #3)
> > > Just set "media.webspeech.silence_length" and
> > > "media.webspeech.long_silence_length" pref to some other
> > > value than the defaults? I don't see need for the magical 3 in the code.
> > 
> > I know, but I tried modify these parameters but had no practical effect
> > since requested_silence_length changes dynamically based on the amount of
> > speech input. 
> 
> 
> What if you set both "media.webspeech.silence_length" and
> "media.webspeech.long_silence_length"
> to 3 times their current values?

Thank you, Kelly. Yes, I did that. I changed it to 2.5 times, since 3 times was taking too long to decode.
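
As a worked example of the scaling discussed here, the sketch below multiplies a silence timeout in microseconds by a factor. ScaleTimeoutUs is a hypothetical helper, and the baseline values used in the usage note are illustrative assumptions, not figures taken from this bug.

```cpp
#include <cmath>
#include <cstdint>

// Scale a timeout expressed in microseconds by `factor`,
// rounding to the nearest whole microsecond.
int64_t ScaleTimeoutUs(int64_t base_us, double factor) {
  return static_cast<int64_t>(
      std::llround(static_cast<double>(base_us) * factor));
}
```

Assuming a baseline of 500000 us, a factor of 2.5 gives 1250000 us and a factor of 3 gives 1500000 us. For an assumed 1000000 us baseline the results are 2500000 us and 3000000 us. The extra half second to a full second of required silence per utterance is the latency cost being traded against fewer premature cutoffs.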

André Natal

Assignee

Comment 10

•

9 years ago

Try for this patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=82944aa09865

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 11

•

9 years ago

Comment on attachment 8644114 [details] [diff] [review]
Relax current VAD algorithm increasing the amount of end silence required to be input to decree end of speech

This is fine too, though I wonder why you don't just set the prefs in b2g.js or somewhere.

Attachment #8644114 - Flags: review?(bugs) → review+

André Natal

Assignee

Comment 12

•

9 years ago

(In reply to Olli Pettay [:smaug] from comment #11)
> Comment on attachment 8644114 [details] [diff] [review]
> Relax current VAD algorithm increasing the amount of end silence required to
> be input to decree end of speech
> 
> This is fine too, though I wonder why you don't just set the prefs on b2g.js
> or somewhere.

Thank you Olli.

Well, I preferred to change the defaults because this level looks more accurate both on desktop and on the phone, mainly for sequences of digits, which is the principal issue we're having with the VAD. So it seems better to increase the defaults everywhere.

André Natal

Assignee

Updated

•

9 years ago

Keywords: checkin-needed

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 13

•

9 years ago

Btw, for these kinds of changes, where the backend handling is changed, I'm totally fine if
just kdavis reviews your patches (that might be faster when my review load is high).

André Natal

Assignee

Comment 14

•

9 years ago

(In reply to Olli Pettay [:smaug] from comment #13)
> Btw, for this kinds of changes, where the backend handling is changed, I'm
> totally fine if
> just kdavis reviews your patches (that might be faster when my review load
> is high).

Thank you, Olli, this is going to be really helpful.

Pulsebot

Comment 15

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/79ecbf9133b1

Keywords: checkin-needed

Carsten Book [:Tomcat]

Comment 16

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/79ecbf9133b1

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox42: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla42

Aaron Wu

Updated

•

9 years ago

feature-b2g: --- → 2.5+
