SpeechRecognition only captures 1-2 seconds of audio
Categories
(Core :: Web Speech, defect, P3)
People
(Reporter: guest271314, Unassigned)
Attachments
(1 file, 1 obsolete file: 3.70 MB, text/html)
User Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/80.0.3987.7 Chrome/80.0.3987.7 Safari/537.36
Steps to reproduce:
- Set media.webspeech.recognition.enable and media.webspeech.recognition.force_enable to true in about:config
- Execute SpeechRecognition start() and play back 33 seconds of recorded speech through an <audio> element (a minimal sketch follows below).
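A minimal sketch of the reproduction, assuming the prefs above are set, SpeechRecognition is exposed unprefixed, and an <audio> element holds roughly 33 seconds of recorded speech; the element id and file name are placeholders:

// Minimal repro sketch; "speech.wav" and the element id are placeholders
const audio = document.getElementById("speech"); // <audio id="speech" src="speech.wav" controls>
const recognition = new SpeechRecognition();

// Log each lifecycle event against the playback position of the <audio> element
for (const type of ["speechstart", "speechend", "audiostart", "audioend", "result", "end"]) {
  recognition.addEventListener(type, (event) => {
    console.log(type, "event.timeStamp", event.timeStamp, "audio.currentTime", audio.currentTime);
  });
}

recognition.start();
audio.play();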
Actual results:
speechend event.timeStamp 70529 audio.currentTime 1.101208
audioend event.timeStamp 70531 audio.currentTime 1.101208
end event.timeStamp 71561 audio.currentTime 2.111604
Only 1-2 seconds of voice audio are recognized before the end event is dispatched. Only occasionally does the result event fire.
Expected results:
speechend and audioend events should not be dispatched until at least 33 seconds of voice audio have played back.
Comment 1•4 years ago
Bugbug thinks this bug should belong to this component, but please revert this change in case of error.
Comment hidden (admin-reviewed)
Reporter
Comment 3•4 years ago
Does MAX_LISTENING_TIME commence immediately when SpeechRecognition() is executed?
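One way to probe that empirically (a sketch only; the 5 second delay before start() is arbitrary) would be to delay start() after construction and compare how long the session lasts relative to each point:

// If the listening limit runs from construction, delaying start() should
// shorten the observed window; if it runs from start(), it should not.
const recognition = new SpeechRecognition();
const constructedAt = performance.now();
let startedAt;

recognition.addEventListener("end", () => {
  const now = performance.now();
  console.log("end fired", now - constructedAt, "ms after construction,",
              now - startedAt, "ms after start()");
});

setTimeout(() => {
  startedAt = performance.now();
  recognition.start();
}, 5000); // arbitrary 5 s delay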
Comment 4•4 years ago
We don't ship this API yet, so I don't think we can give much support on it. In particular, the voice activity detector is going to be replaced.
There's no need to needinfo so many people on a minor issue like this. It can go through normal triage procedures like everything else.
Also: TypeError: document.body is null
Reporter
Comment 5•4 years ago
See updated attachment.
It is currently virtually impossible to test in the field when input is clipped to 1-2 seconds.
How can OGG files be sent to the endpoint directly for testing?
For minimal input using speechSynthesis.speak() with the default voice and the input text "Speech synthesis at Nightly" (a sketch follows the outputs below), the output for 3 runs was:
"THC"
"peach seed"
"peach seed"
For the attached file the output was
"can I watch"
"can I watch"
"can I watch"
Same input (WAV file) at https://speech-to-text-demo.ng.bluemix.net/:
Speaker 1: Now watch.
Speaker 1:
Speaker 2: This a science works.
Speaker 2: One.
Speaker 1: Researcher comes up with a result.
Speaker 1: At that is not the truth.
Speaker 3: No no.
Speaker 2: A scientific emergent.
Speaker 1: Truth is not the result of any.
Speaker 3: One.
Speaker 2: Experiment.
Speaker 1: What has to happen is.
Speaker 2: Somebody else has to.
Speaker 2: Verified.
Speaker 2: Prefer a bleak.
Speaker 1: A competitor.
Speaker 2: The food police someone who doesn't want you to be correct.
https://cloud.google.com/speech-to-text/
// Extract the recognized words from the Google Cloud Speech-to-Text demo page.
// temp2 is a DevTools console variable (presumably stored via "Store as global
// variable") referencing the transcript container element on the demo page.
[...temp2.querySelector('div').children]
  .filter(el => el.tagName === "DIV")
  .forEach(el => {
    [...el.children]
      .filter(e => e.tagName === "SP-WORD")
      .forEach(e => {
        console.log(e.shadowRoot.querySelector('.word').textContent);
      });
  });
Can I watch?
I'm just have science works.
One researcher comes up with a result.
At that is not the truth.
No, no a scientific emergent truth is not
the result of any one experiment what
has to happen is somebody else has to verify it.
Preferably a competitor.
Preferably someone who doesn't want you to be correct?
By hand
Now watch. Um, this how science works.
One researcher comes up with a result.
And that is not the truth. No, no.
A scientific emergent truth is not the
result of one experiment. What has to
happen is somebody else has to verify
it. Preferably a competitor. Preferably
someone who doesn't want you to be correct.
92nd Street Y, May 3, 2017: Neil deGrasse Tyson in conversation with Robert Krulwich
Reporter
Comment 6•4 years ago
(In reply to guest271314 from comment #5)
> How can OGG files be sent to the endpoint directly for testing?
https://wiki.mozilla.org/Web_Speech_API_-_Speech_Recognition#How_can_I_test_with_Deep_Speech.3F
Expected file type: "Body should be an Opus or Webm audio file".
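A sketch of posting a file directly to such an endpoint; the endpoint URL, file name, and Content-Type header below are placeholders, not the documented values (see the wiki page linked above for the actual endpoint):

// Placeholder endpoint; substitute the URL documented on the wiki page
const ENDPOINT = "https://example.org/speech-to-text";

fetch("speech.opus")                         // placeholder Opus file to test
  .then((response) => response.blob())
  .then((blob) => fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "audio/opus" }, // assumed content type
    body: blob
  }))
  .then((response) => response.text())
  .then((text) => console.log(text))
  .catch(console.error);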
Output for 3 runs of the same file as audio/webm;codecs=opus:
"i watch im this a science works
one researcher comes up with a result
and that is not the truth no no a scientific
emergent truth is not the result of any
one experiment what has to happen is
somebody else has to verify preferably
a competitor referred ly some one who
doesn't want you to be correct"
"i watch im this a science works
one researcher comes up with a result
and that is not the truth no no a scientific
emergent truth is not the result of any
one experiment what has to happen is
somebody else has to verify preferably
a competitor referred ly some one who
doesn't want you to be correct"
"i watch im this a science works
one researcher comes up with a result
and that is not the truth no no a scientific
emergent truth is not the result of any
one experiment what has to happen is
somebody else has to verify preferably
a competitor referred ly some one who
doesn't want you to be correct"
4th run
"now watch im this a science works
one researcher comes up with a result
and that is not the troop no no a sientific
emergent truth is not the result of any
one experiment what has to happen is
somebody else has to verify it preferably
a competitor referred ly some one who
doesn't want you to be correct"
This is the first result where "now" is the first word, reflecting the input, with the relevant headers set to false. Did the machine learn nonetheless?
Can the endpoint input time be extended to at least 30 seconds for testing?
(In reply to Andreas Pehrson [:pehrsons] (On leave; back Aug 1st 2020) from comment #4)
> In particular, the voice activity detector is going to be replaced.
Do tracking bugs exist for the removal of the voice activity detector and for the maximum input time?
Reporter
Comment 7•4 years ago
The 4th run was a .opus file.
Comment 8•4 years ago
The priority flag is not set for this bug.
:anatal, could you have a look please?
For more information, please visit auto_nag documentation.