Text to Speech changes language when paused and played again.
Categories: Core :: Web Speech, defect, P3
People: Reporter: tyuzu, Assigned: chunmin
Attachments: 2 files
User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36
Steps to reproduce:
- Open Firefox Browser from your desktop.
- Open "https://aalokindreams.github.io/forblog/tts" by writing it in address bar.
- Click on the play icon to start playing the speech.
- Now click on the pause button.
- Now click on the play button again.
Actual results:
I clicked the play button, and the TTS started playing the speech in a voice with an American accent. I paused the speech and clicked the play button again, and the TTS switched to a different voice.
Expected results:
The voice speaking the text should have remained the same. It should not switch voices every time the speech is played and paused.
Comment 1•6 years ago
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0
Hi,
I have tested your issue on the latest Firefox release (66.0.3) and the latest Nightly build, using Windows 10 and Windows 8.1, and could not reproduce it.
I have also tested the issue in Chrome, and it does not reproduce there either.
Could you please tell me which version of Firefox you are using?
Is this still reproducible on your end? If so, could you please retest with the latest Firefox release and the latest Nightly build (https://nightly.mozilla.org/) and report back whether the issue still reproduces in those versions?
Thank you for your report.
Hi There,
I tested it on [ Firefox Nightly 68.0a1 (2019-04-24) (64-bit) ] and [ Firefox Stable 66.0.3 (64-bit) ].
With the steps described above, the bug occurs only about 25% of the time.
However, I have found steps that reproduce it 100% of the time.
Here are the full reproduction steps:
- Open Firefox Browser from your desktop.
- Open "https://aalokindreams.github.io/forblog/tts" by writing it in address bar.
- Click on the "play icon" to start playing the speech.
- Now click on the "stop" button.
- Now click on the "play" button again.
This bug is Firefox-specific. Chromium-based browsers do not have it.
Thank you for asking for clarification.
Regards
Comment 3•6 years ago
Hi,
After your updated steps, I am able to reproduce the issue on the latest Firefox release (66.0.3) and the latest Nightly build (68.0a1), using Windows 10 and Windows 8.1. When I click the "play" button for the first time, a woman's voice starts speaking. If I click "stop" and then "play" again, a man's voice starts speaking. After that second play, clicking "stop" and then "play" again no longer changes the voice: the man's voice keeps speaking.
I'm going to set a component so developers can take a look at it. If this is not the right component, please feel free to move it to a more appropriate one.
Thank you for your report.
Comment 4•6 years ago
Eitan, do you know this code enough to have a look? Sadly, web speech doesn't seem to have ownership atm.
Comment 5•6 years ago
I hope to have cycles to look at this soon, so I am keeping the needinfo flag. But if someone could step through this and debug it, that would be much appreciated.
Comment 6•6 years ago
The priority flag is not set for this bug.
:anatal, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 7•6 years ago
Comment 8•6 years ago
Problem
Here is some analysis from tracing the code and from the log of the test page. The test page is a simplified version of https://aalokindreams.github.io/forblog/tts
1. speechSynthesis.getVoices() returns an empty list at first.
   - Calling speechSynthesis.getVoices() (in the content process) initiates building a voice list.
   - The code path is from [0] to [14].
2. The voice of the utterance is assigned speechSynthesis.getVoices()[0], which yields no voice at this point, so the utterance's voice is null.
3. To speak the utterance, we initialize the speech service first and add some voices if there is no available voice at all.
   - The code path is [x] -> [6] -> [7] -> [8].
   - The voice list is already built by step 1.
4. When the voice of the utterance is null, we look for a voice matching the UI language, which is en-US here.
   - Our matching algorithm starts searching for a matching voice from the last available voice. The matched one is Microsoft Zira Desktop.
5. The utterance is spoken with the voice Microsoft Zira Desktop.
6. By the time the utterance has finished speaking, the voice list has also been built (async IPC). speechSynthesis.getVoices() is NOT empty now.
7. The voice of the next utterance is assigned speechSynthesis.getVoices()[0] again, which is Microsoft David Desktop now.
8. The utterance is spoken with the voice Microsoft David Desktop.
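The sequence above can be modeled in a few lines of plain JavaScript. This is only a rough sketch of the described behavior, not Gecko's actual code: the function names findFallbackVoice and resolveVoice are hypothetical, the exact language comparison is a simplification, and the voice list is copied from the AddVoice lines in the log.

```javascript
// Voices in the order the registry adds them (taken from the AddVoice log lines).
const registryVoices = [
  { name: "Microsoft David Desktop - English (United States)", lang: "en-US" },
  { name: "Microsoft Zira Desktop - English (United States)", lang: "en-US" },
  { name: "Microsoft Hanhan Desktop - Chinese (Taiwan)", lang: "zh-TW" },
];

// Hypothetical stand-in for the fallback: scan from the LAST voice to the
// first and return the first one whose language equals the UI language.
function findFallbackVoice(voices, uiLang) {
  for (let i = voices.length - 1; i >= 0; i--) {
    if (voices[i].lang === uiLang) return voices[i];
  }
  return null;
}

// Voice used for an utterance: its own voice if set, else the fallback.
function resolveVoice(utteranceVoice, voices, uiLang) {
  return utteranceVoice || findFallbackVoice(voices, uiLang);
}

// First play: the page-visible list is still empty, so index 0 gives no voice.
const pageVoicesAtFirstPlay = [];
const first = resolveVoice(pageVoicesAtFirstPlay[0], registryVoices, "en-US");

// Second play: the list has arrived, so index 0 is the first registered voice.
const pageVoicesAtSecondPlay = registryVoices;
const second = resolveVoice(pageVoicesAtSecondPlay[0], registryVoices, "en-US");

console.log(first.name);  // Microsoft Zira Desktop - English (United States)
console.log(second.name); // Microsoft David Desktop - English (United States)
```

In this model the two utterances end up with different voices purely because the page read the list before it was populated, which matches what the log shows.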
LOG
(captured by running $ MOZ_LOG=SpeechSynthesis:5 ./mach run with the test page)
...
...
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::AddVoice uri='urn:moz-tts:sapi:Microsoft David Desktop - English (United States)?en-US' name='Microsoft David Desktop - English (United States)' lang='en-US' local=true queued=true
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::AddVoice uri='urn:moz-tts:sapi:Microsoft Zira Desktop - English (United States)?en-US' name='Microsoft Zira Desktop - English (United States)' lang='en-US' local=true queued=true
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::AddVoice uri='urn:moz-tts:sapi:Microsoft Hanhan Desktop - Chinese (Taiwan)?zh-TW' name='Microsoft Hanhan Desktop - Chinese (Taiwan)' lang='zh-TW' local=true queued=true
[Child 2148: Main Thread]: D/SpeechSynthesis SpeechSynthesis::onvoiceschanged
[Child 2148: Main Thread]: D/SpeechSynthesis SpeechSynthesis::AdvanceQueue length=1
[Child 2148, Main Thread] WARNING: 'found', file c:/mozilla-source/mozilla-central/dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp, line 481
[Child 2148, Main Thread] WARNING: 'found', file c:/mozilla-source/mozilla-central/dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp, line 481
[Child 2148, Main Thread] WARNING: 'found', file c:/mozilla-source/mozilla-central/dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp, line 481
[Child 2148: Main Thread]: D/SpeechSynthesis SpeechSynthesis::onvoiceschanged
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::FindBestMatch - Matched UI language (en-US ~= en-US)
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::Speak queueing text='Hello' lang='' uri='' rate=1.000000 pitch=1.000000
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::SpeakImpl queueing text='Hello' uri='urn:moz-tts:sapi:Microsoft Zira Desktop - English (United States)?en-US' rate=1.000000 pitch=1.000000
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSpeechTask::Setup
[Child 2148: Main Thread]: D/SpeechSynthesis nsSpeechTask::DispatchStartImpl
...
...
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::SpeakNext 0
[Child 2148: Main Thread]: D/SpeechSynthesis nsSpeechTask::DispatchEndImpl
[Child 2148: Main Thread]: D/SpeechSynthesis SpeechSynthesis::AdvanceQueue length=0
[Child 2148: Main Thread]: D/SpeechSynthesis SpeechSynthesis::AdvanceQueue length=1
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::FindBestMatch - Matched URI
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::Speak queueing text='World' lang='' uri='urn:moz-tts:sapi:Microsoft David Desktop - English (United States)?en-US' rate=1.000000 pitch=1.000000
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::SpeakImpl queueing text='World' uri='urn:moz-tts:sapi:Microsoft David Desktop - English (United States)?en-US' rate=1.000000 pitch=1.000000
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSpeechTask::Setup
[Parent 20768: Main Thread]: D/SpeechSynthesis ~nsSpeechTask
[Child 2148: Main Thread]: D/SpeechSynthesis nsSpeechTask::DispatchStartImpl
...
[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::SpeakNext 0
[Child 2148: Main Thread]: D/SpeechSynthesis nsSpeechTask::DispatchEndImpl
[Child 2148: Main Thread]: D/SpeechSynthesis SpeechSynthesis::AdvanceQueue length=0
...
[Child 2148: Main Thread]: D/SpeechSynthesis ~nsSpeechTask
[Child 2148: Main Thread]: D/SpeechSynthesis ~nsSpeechTask
[Parent 20768: Main Thread]: D/SpeechSynthesis ~nsSpeechTask
...
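To compare which voice each utterance actually used, the SpeakImpl lines can be pulled out of a log like the one above. The helper below is a small sketch, not an existing tool: the name parseSpeakImpl is made up for illustration, and the pattern assumes the text itself contains no single quote. The two sample lines are copied from the log.

```javascript
// Extract { text, uri } pairs from nsSynthVoiceRegistry::SpeakImpl log lines.
function parseSpeakImpl(logText) {
  const pattern =
    /nsSynthVoiceRegistry::SpeakImpl queueing text='([^']*)' uri='([^']*)'/;
  const results = [];
  for (const line of logText.split(/\r?\n/)) {
    const match = pattern.exec(line);
    if (match) results.push({ text: match[1], uri: match[2] });
  }
  return results;
}

// Two lines copied from the log above.
const sampleLog = [
  "[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::SpeakImpl queueing text='Hello' uri='urn:moz-tts:sapi:Microsoft Zira Desktop - English (United States)?en-US' rate=1.000000 pitch=1.000000",
  "[Parent 20768: Main Thread]: D/SpeechSynthesis nsSynthVoiceRegistry::SpeakImpl queueing text='World' uri='urn:moz-tts:sapi:Microsoft David Desktop - English (United States)?en-US' rate=1.000000 pitch=1.000000",
].join("\n");

for (const entry of parseSpeakImpl(sampleLog)) {
  console.log(entry.text + " -> " + entry.uri);
}
```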
Code path
-- content process --
[0] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/SpeechSynthesis.cpp#240
[1] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp#167
[2] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp#149
-- parent process --
[3] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/ipc/ContentParent.cpp#3758
[4] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/ipc/SpeechSynthesisParent.cpp#24
[5] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp#171
[6] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/windows/SapiService.cpp#426
[7] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/windows/SapiService.cpp#418
[8] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/windows/SapiService.cpp#201
[9] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/windows/SapiService.cpp#301
[10] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp#300
[11] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp#488,501
-- content process --
[12] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/ipc/SpeechSynthesisChild.cpp#29
[13] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp#249
[14] https://searchfox.org/mozilla-central/rev/928742d3ea30e0eb4a8622d260041564d81a8468/dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp#476,485
Comment 9•6 years ago
In fact, Chrome has the same problem. Its speechSynthesis.getVoices() is also empty at first. If we assign the voice of the utterance to speechSynthesis.getVoices()[1] instead of speechSynthesis.getVoices()[0], then opening this page in Chrome plays "Hello" and "World" in different voices.
I am not sure how Chrome assigns the voice to the utterance when the voice of the utterance is null (maybe it's here), but I guess they get lucky in this case.
Making our matching algorithm start searching for a matching voice from the first (instead of the last) available voice would work around the problem on the previous test page. However, it would still fail on this test page.
I am not sure whether there is a spec that defines what to do when the voice of the utterance is null, or whether users are expected to make sure a voice is available before they use it. Anyway, I will check the spec to see whether this behavior is defined.
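On the page side, one way to make sure a voice is available before using it is to wait for the voiceschanged event when getVoices() is still empty. The following is a minimal sketch under some assumptions: whenVoicesReady and chooseVoice are hypothetical helper names, the synth object is passed in so the helper can also be run against a stand-in outside a browser, and the preferred voice name is just the one seen in the log above.

```javascript
// Call `callback(voices)` once the voice list is non-empty.
// `synth` is expected to behave like window.speechSynthesis.
function whenVoicesReady(synth, callback) {
  const voices = synth.getVoices();
  if (voices.length > 0) {
    callback(voices);
    return;
  }
  const onChange = () => {
    const loaded = synth.getVoices();
    if (loaded.length === 0) return; // still empty, keep waiting
    synth.removeEventListener("voiceschanged", onChange);
    callback(loaded);
  };
  synth.addEventListener("voiceschanged", onChange);
}

// Hypothetical helper: pick a voice by name, else fall back to the first one.
function chooseVoice(voices, preferredName) {
  return voices.find((v) => v.name === preferredName) || voices[0];
}

// Browser usage; skipped where the Web Speech API is not available.
if (typeof speechSynthesis !== "undefined") {
  whenVoicesReady(speechSynthesis, (voices) => {
    const voice = chooseVoice(
      voices,
      "Microsoft David Desktop - English (United States)"
    );
    for (const text of ["Hello", "World"]) {
      const utterance = new SpeechSynthesisUtterance(text);
      utterance.voice = voice; // same voice for every utterance
      speechSynthesis.speak(utterance);
    }
  });
}
```

With this approach every utterance carries an explicit voice, so the result no longer depends on how the browser picks a default when the voice is null.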
Comment 10•6 years ago
(In reply to C.M.Chang[:chunmin] from comment #9)
I am not sure if there is a spec to define how we do when the voice of the utterance is null.
By the spec:
If the voice attribute of the SpeechSynthesisUtterance is unset or null at the time of the speak() method call, then the user agent must use a user agent default voice.
Thus, the problem now is how the default voice is defined, and whether we really use the default voice when utterance.voice is null.