Closed Bug 1523920 Opened 5 years ago Closed 5 years ago

SpeechSynthesis Utterances are not reusable

Categories

(Core :: Web Speech, defect)

Hardware: Desktop
OS: All
Type: defect
Priority: Not set
Severity: normal
Tracking


Status: RESOLVED FIXED
Target Milestone: mozilla70
Tracking Status
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- fixed

People

(Reporter: masterjames, Assigned: chunmin)

References

Details

Attachments

(4 files)

User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36

Steps to reproduce:

I think it would be logical and efficient if a speech utterance could be created once, saved, and reused repeatedly via speak() calls.
I save the utterance created via
let uttr = new SpeechSynthesisUtterance('some message');
and play it once fine via...
window.speechSynthesis.speak( uttr );
After it finishes, I try it again and nothing happens:
window.speechSynthesis.speak( uttr );
until you rebuild the utterance.
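
For reference, a minimal self-contained reproduction (a sketch; it assumes a voice is available and, in current Chrome, that the page already has user activation). The 'end' listener shows whether each speak() call actually played:

let uttr = new SpeechSynthesisUtterance('some message');
let plays = 0;
uttr.addEventListener('end', () => console.log('finished play #' + (++plays)));

window.speechSynthesis.speak( uttr ); // plays, 'end' fires once
// after the first playback has completed:
window.speechSynthesis.speak( uttr ); // expected: plays again; actual in Firefox: silently ignored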

Actual results:

I attempted to replay the same utterance after it had completed successfully, and it will not play a second time.

Expected results:

It should play a second time, and however many times I would like.
Also, when using cancel(), it should stop all speech, clear the queue, and let the very next speak() request play immediately, without a delay and without that request also being cleared from the queue. It should likewise be able to immediately restart speaking the same saved utterance after a call to cancel().

Hi, could you please provide some test cases or URLs so I can test this issue? Are you referring to the SpeechSynthesisUtterance API?

Flags: needinfo?(masterjames)

Sure, it's pretty simple; here is a jsfiddle that is slightly more involved.

What happens is that after the previous speech is cleared, the API (which goes out to an external service) doesn't recognize a request to speak an utterance once a previous one has been cleared.

You really should even be able to call speak() with a flag (API recommendation) that stops all others, clears the queue, and starts immediately on the new utterance being passed.

https://jsfiddle.net/MasterJames/3nxygaze/7/

let ss = window.speechSynthesis;
let ssu = SpeechSynthesisUtterance;
let uttrA = new ssu('some message');
let uttrB = new ssu('not playing');

ss.speak( uttrA );
ss.cancel();       // empty the queue (the API has no clear(); cancel() is what is meant here)
ss.speak( uttrB );
ss.speak( uttrA ); // the reused uttrA never plays

Further it is even more important to note the difference between Chrome and Firefox.

https://jsfiddle.net/MasterJames/3nxygaze/8/

Here, if you comment out the cancel call and the second utterance, you will see that a saved utterance can't be played twice.
In Chrome it queues both and repeats it twice as requested, while in Firefox the utterance becomes unable to play again after a single use.

Hi, thanks for your contribution. I've tested this issue on several machines and also on several Firefox versions (Nightly 67.0a1, Beta 66.0b6 and Release 65.0).
Here are my remarks: the issue can be reproduced on all of the versions of Firefox specified above and on different machines: Windows, Linux and Mac OS X.
Additionally, in other browsers like IE or Safari the issue does not occur. As for Chrome: from version 71 onward, the "SpeechSynthesis.speak" feature is removed: https://www.chromestatus.com/feature/5687444770914304 - so I have no basis for comparing Chrome <-> FF. (I guess you are using a version below 71.)

Status: UNCONFIRMED → NEW
Component: Untriaged → Web Speech
Ever confirmed: true
OS: Unspecified → All
Product: Firefox → Core
Hardware: Unspecified → Desktop
Version: 65 Branch → Trunk

Please be clear that speak is obviously not removed. That link explains that the ability to automatically "speak" requires the user to 'click' once anywhere first. So on the first click it should speak whatever has been queued (ideally).

In my modern world with only a canvas, voice is now an important input and output means. While it's annoying that a few abusers got this automatic speaking (without a click first) removed from Chrome, it's agreeable and tolerable. A user may want to mute their sound first, etc. (with kids sleeping or something, I guess).

Anyway, I am just clarifying that it's only 'automatic' speaking they disabled, until the browser user clicks anywhere on the screen; then speaking is enabled. We, the server developers, now have to either pop up something (also typically blocked, so it has to live inside the exclusive full-screen VR/AR canvas) that says 'click to enable speech', or just wait for the user to give up wondering what to do until they click something, at which point the voice can explain something like "say 'menu' to display your voice menu options, or 'help' to learn more" (in an extreme case, of course, showing a question mark would likely get them to click if this is their first time with this type of interface).
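
One way to wire that up (just a sketch, not the reporter's actual code; the pendingMessages queue and the message text are hypothetical) is to defer any queued speech until the first click anywhere on the page:

const pendingMessages = ["say 'menu' to display your voice menu options"];
document.addEventListener('click', () => {
  for (const text of pendingMessages) {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  }
  pendingMessages.length = 0; // the queue is only needed until activation
}, { once: true });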

In your comment above it looks like you (or another reader) might mistakenly believe the speak method was removed entirely.

Flags: needinfo?(masterjames)

This version of Chrome is Version 71.0.3578.98 (Official Build) (64-bit) and auto-updates, but yes, speaking won't work until you click anywhere in the window first. Once I get that event I start speaking, instead of doing it immediately on page load.

Hi, I said "removed" and then put the link after it for more explanation. I think I was reasonably clear in the scenario above.

Okay great, thanks.

It says 'without user activation (removed)'; after the first click it works, and will keep working well into the future. I just wanted to be clear that it's not removed; only if it is called before an initial click will it not work, and it will not even throw an error. If you click first, then it will work.

OK, I get it, so let's keep the focus on the Firefox side, where the issue occurs.

I noticed another side note: if you visit the site and click once to allow speech to be heard, and then navigate to another page, spoken speech calls will be heard without clicking to enable them on the next page.
So it only needs the initial click on the first page for the entire domain. Then speech works fine as before/normal/expected (without requiring a click to allow speech to be heard) for all subsequent pages.

I've got more to add to this now.
https://jsfiddle.net/MasterJames/3nxygaze/20/

It seems that putting a call to ss.resume() after the cancel() and the second speak() call achieves the desired effect, but this is not intuitive, nor, I guess, properly documented.
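
Condensed, the workaround looks like this (a sketch, reusing uttrA and uttrB from the earlier snippet):

const ss = window.speechSynthesis;
ss.speak( uttrA );
ss.cancel();       // stop everything and empty the queue
ss.speak( uttrB ); // queued, but nothing is heard...
ss.resume();       // ...until resume() is called, even though paused reads false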

I thought I saw a request for more info somewhere, so I thought I'd add this.
Even though resume() helps and explains a lot, I would still say it's broken and not great in design versus developer expectations.
I was trying to hack around this cancel problem and saw 'speaking' get stuck (it reports true, but nothing is paused and nothing is speaking), so I took the utterances' onend methods and tried to set speaking to false there, etc. Nothing worked (overriding the speaking setter/getter, etc.).

If it were me, I would give SpeechSynthesis an accessible and editable queue (shift, swap, etc.).
Utterance reusability is likely still broken. You should be able to create an utterance on, say, a rollover event and reuse it as the user mouses over the same item; that saves resources, volume, etc. (see the sketch below).

You could also imagine some utterances not being 'cancellable' (= false, default is true), while still being able to override that with cancelAll, etc. [cancel( item );]
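
That rollover use case might look like this (a sketch; the 'menu-item' element id and the hint text are hypothetical):

const hint = new SpeechSynthesisUtterance('Open the settings menu');
document.getElementById('menu-item').addEventListener('mouseenter', () => {
  window.speechSynthesis.cancel();    // drop whatever is still queued
  window.speechSynthesis.speak(hint); // reuse the same saved utterance each time
});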

You'll need to open the console log, and remember to click in the HTML white box first, before 'Run', to activate speaking (now blocked until the first click).
https://jsfiddle.net/MasterJames/3nxygaze/27/

Ultimately the values for speaking and paused are like this:
speak A - ss.speaking:false
speak A - ss.paused:false
cancel - ss.speaking:false
cancel - ss.paused:false
speak B - ss.speaking:false
speak B - ss.paused:false
resume - ss.speaking:false
resume - ss.paused:false

I think the issue might be that they all need to be async or something? It was extremely confusing to determine what was going on and when.
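
The trace above could be captured with something like this (a sketch; uttrA and uttrB as before). Since speak() only queues the utterance and playback proceeds asynchronously, speaking and paused still read false immediately after each call:

const ss = window.speechSynthesis;
function trace(label) {
  console.log(label + ' - ss.speaking:' + ss.speaking);
  console.log(label + ' - ss.paused:' + ss.paused);
}
ss.speak( uttrA );  trace('speak A');
ss.cancel();        trace('cancel');
ss.speak( uttrB );  trace('speak B');
ss.resume();        trace('resume');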

I guess I know why.
The reason is that the current implementation only pushes an utterance whose state is none. The state of an utterance is none when it is first created. After the utterance has been "spoken", its state changes. SpeechSynthesis::Speak only accepts an utterance whose state is none, so that's why a used utterance cannot be reused.

let uttrA = new ssu('hello'); // State of uttrA is none
let uttrB = new ssu('world'); // State of uttrB is none

ss.speak( uttrA ); // uttrA would be pushed to queue. The state of uttrA changes from none to pending.
ss.speak( uttrB ); // uttrB would be pushed to queue. The state of uttrB changes from none to pending.
ss.speak( uttrA ); // State of uttrA is pending, so it would be ignored.

I don't see the Chrome code using a state in the utterance, so maybe it could be removed. I am not familiar with the speech code, so this is just my guess.

Assignee: nobody → cchang

All the LOG calls are placed in the Dispatch*Impl methods except DispatchErrorImpl. Move the LOG from DispatchError to DispatchErrorImpl to align with the LOG policy in nsSpeechTask.

When an error occurs, there is no need to use audio.

Depends on D35461

It would be easier to reuse the utterance if it were stateless. The state can still be tracked by moving it from SpeechSynthesisUtterance to nsSpeechTask, which is where the state was changed in the original implementation. By removing the state from the utterance, we don't need to check the utterance's state when it is pushed into the speech queue.

Depends on D35463
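
The exact web-platform test that landed isn't quoted here, but a rough sketch in testharness.js style of the behavior P3 checks (speaking the same utterance twice and expecting two 'end' events, assuming a voice is available) might look like:

promise_test(async () => {
  const uttr = new SpeechSynthesisUtterance('reuse me');
  const firstEnd = new Promise(r => uttr.addEventListener('end', r, { once: true }));
  speechSynthesis.speak(uttr);
  await firstEnd;
  const secondEnd = new Promise(r => uttr.addEventListener('end', r, { once: true }));
  speechSynthesis.speak(uttr); // with the fix, the reused utterance is spoken again
  await secondEnd;
}, 'SpeechSynthesisUtterance can be reused across speak() calls');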

I think utterances should be immutable after they are sent to speak().

If they aren't, then attributes like voice, rate, pitch and volume can be changed while queued, and the result would be nondeterministic depending on whether the speech has started or not. Also, since they are event targets, listeners would not get an accurate set of parameters for the current speech.
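
The race being described is roughly this (a sketch):

const uttr = new SpeechSynthesisUtterance('queued text');
uttr.rate = 1.0;
speechSynthesis.speak(uttr);
uttr.rate = 2.0; // whether this takes effect depends on whether playback has already started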

Keywords: checkin-needed

Pushed by rmaries@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2e49430660ae
P1: Align the LOG policy in the nsSpeechTask. r=eeejay
https://hg.mozilla.org/integration/autoland/rev/18d8187317cd
P2: Destroy AudioChannelAgent when error occurs. r=eeejay
https://hg.mozilla.org/integration/autoland/rev/1d4911d7a26c
P3: Enable the web-platform test to check the utterance is reusable. r=eeejay
https://hg.mozilla.org/integration/autoland/rev/c5c1ac80fd43
P4: Move the state from SpeechSynthesisUtterance to nsSpeechTask. r=eeejay

Keywords: checkin-needed
