Support SpeechSynthesis global queue

RESOLVED FIXED in Firefox 42

Status

()

defect
RESOLVED FIXED
4 years ago
4 months ago

People

(Reporter: eeejay, Assigned: eeejay)

Tracking

unspecified
mozilla42
Points:
---

Firefox Tracking Flags

(firefox42 fixed)

Details

Attachments

(4 attachments, 4 obsolete attachments)

We need to have the ability to harmonize a few windows trying to use the synth Web Speech API when the platform only allows one speaker at a time. One window should not be able to monopolize the speech output.

A list of cases, and the expected outcomes:
https://docs.google.com/spreadsheets/d/1QojxT6JdNW7lyhRRMjt44ipthhpfGVI_UGftQoJvbbw/edit?usp=sharing

More discussion about concurrent vs. non-concurrent speech:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=21110
Do we want this to be cross-process? I guess so. So the queue would be somehow managed by the parent process?
Yeah, it would be a global queue and it would schedule the utterances from each window that wants to speak. This isn't perfect at all, since the platform will be doing its own scheduling, and the browser windows will all be allocated as one app.

There will be 3 queues:
1. window.speechSynthesis queue
2. Global browser queue
3. Platform queue

I generally don't like having to do this, but it is a common platform limitation, and we will need to work with it in a way that keeps users and developers sane with some relative determinism when calling speak().
Attachment #8641377 - Flags: review?(kdavis)
Attachment #8641378 - Flags: review?(kdavis)
Attachment #8641379 - Flags: review?(kdavis)
Attachment #8641380 - Flags: review?(kdavis)
In order to bring in more reviewers, I persuaded kdavis to look at the patches.
Attachment #8641380 - Flags: review?(kdavis)
Attachment #8641380 - Flags: review?(kdavis)
(In reply to Eitan Isaacson [:eeejay] from comment #2)
> There will be 3 queues:
> 1. window.speechSynthesis queue
> 2. Global browser queue
> 3. Platform queue

What you mean with platform queue?
Is that some queue in the parent/system process?
Comment on attachment 8641377 [details] [diff] [review]
(Part 1) Enable/disable global queue depending on voices and pref.

Since we don't really use 'queued' for anything in this patch, looks ok.
It is not clear from this patch why SapiService.cpp passes true for queued, but
nsPicoService.cpp and nsFakeSynthServices.cpp pass false.

All those 3 cases need some comment (and I can't say from this patch whether they are right, but I assume they are, and I assume other patches explain why.)
Attachment #8641377 - Flags: review?(bugs) → review+
Comment on attachment 8641377 [details] [diff] [review]
(Part 1) Enable/disable global queue depending on voices and pref.

Review of attachment 8641377 [details] [diff] [review]:
-----------------------------------------------------------------

Seems fine, assuming the subsequent patches make sense in the context
of this patch. The only nit I have is with naming.

::: dom/media/webspeech/synth/nsISynthVoiceRegistry.idl
@@ +17,5 @@
>     * @param aUri          a unique identifier for this voice.
>     * @param aName         human-readable name for this voice.
>     * @param aLang         a BCP 47 language tag.
>     * @param aLocalService true if service does not require network.
> +   * @param aIsQueued     true if voice only handles one utterance at a time

Here, and other places, I am wondering if "aIsQueued" is the right variable
name to use. (This reminds me of the Phil Karlton quote.)

Here "aLocalService" indicates if the voice "is a locale service"; so, arguing
by analogy, one would think "aIsQueued" would indicate if the voice "is queued"
(Or, if one were being very literal, "is is queued".) But "aIsQueued" does not
indicate that.

Maybe "aQueuesUtterances" would be better. Using the analogy with "aLocalService"
again "aQueuesUtterances" would indicate if the voice queues utterances. Which I 
think captures things a bit better.
Attachment #8641377 - Flags: review?(kdavis)
Attachment #8641377 - Flags: review?(bugs)
Attachment #8641377 - Flags: review+
It looks like something happened and killed Olli's review+ for the first patch
when I submitted my review+ for the first patch. I think my window had been
open over the weekend and didn't have Olli's review+ and killed his review+
I am just going to flip Olli's review+ bit.
Attachment #8641377 - Flags: review?(bugs) → review+
That didn't work.

I flipped Olli's review+ bit comment 13 and it showed up as flipping my review+ bit again,
see patch 1. I'd guess Olli needs to flip the bit.
Comment on attachment 8641378 [details] [diff] [review]
(Part 2) Introduce global queue and track speaking state across windows.

Review of attachment 8641378 [details] [diff] [review]:
-----------------------------------------------------------------

Overall fine.

I have two questions reflected in the comments on nsSpeechTask.cpp and nsSynthVoiceRegistry.cpp
that may be bugs, or maybe I am simply misunderstanding the code at these two places.

If both of these points are addressed, either by telling me I don't know what I'm talking about—
which may be the case—or by fixing them, I'm fine with the remainder of the code.

::: dom/media/webspeech/synth/nsSpeechTask.cpp
@@ +171,5 @@
>      }
>      return NS_OK;
>    }
>  
> +  // mStream is set up in Init() that should be called before this.

What happens when Init() is passed a null aStream? As far as I can see,
mStream is a nullptr in this case.

::: dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp
@@ +700,5 @@
> +  if (mGlobalQueue.IsEmpty()) {
> +    return;
> +  }
> +
> +  mGlobalQueue.RemoveElementAt(0);

What if the element at 0 was pre-paused? Should we remove it?
It seems like this causes paused tasks to get lost, never to
by heard.
Attachment #8641378 - Flags: review?(kdavis) → review+
Comment on attachment 8641378 [details] [diff] [review]
(Part 2) Introduce global queue and track speaking state across windows.

> SpeechSynthesis::Speaking() const
> {
>-  if (mSpeechQueue.IsEmpty()) {
>-    return false;
>-  }
>+  bool localSpeaking = !mSpeechQueue.IsEmpty() &&
>+                       mSpeechQueue.ElementAt(0)->GetState() == SpeechSynthesisUtterance::STATE_SPEAKING;
> 
>-  return mSpeechQueue.ElementAt(0)->GetState() == SpeechSynthesisUtterance::STATE_SPEAKING;
>+  return localSpeaking || nsSynthVoiceRegistry::GetInstance()->IsSpeaking();
> }
Why do we want nsSynthVoiceRegistry::GetInstance()->IsSpeaking(); here?
Feels rather odd that (new SpeechSynthesis()).speaking might return true.
Is this because Google wants only the global queue?
If we support that, do we even need to check local speaking check?
Please explain or change code. Add also some comment to the code.
...
and I see later in the patch that nsSynthVoiceRegistry::GetInstance()->IsSpeaking()
depends on whether global queue is enabled. Ok, fine.
So please add a comment to the code here why nsSynthVoiceRegistry::GetInstance()->IsSpeaking() is used.
 
>@@ -154,17 +167,17 @@ nsSpeechTask::Setup(nsISpeechTaskCallback* aCallback,
> 
>   if (mIndirectAudio) {
>     if (argc > 0) {
>       NS_WARNING("Audio info arguments in Setup() are ignored for indirect audio services.");
>     }
>     return NS_OK;
>   }
> 
>-  // mStream is set up in BindStream() that should be called before this.
>+  // mStream is set up in Init() that should be called before this.
>   MOZ_ASSERT(mStream);
Ah, this is ok, because mIndirectAudio is set to true when mStream is null.
So, to self-document the code, perhaps add 
MOZ_ASSERT(!mStream); inside the if (mIndirectAudio) ?



>+
>+  nsTArray<nsRefPtr<GlobalQueueItem> > mGlobalQueue;
s/> >/>>/
(even the compiler on Windows supports >> these days.)
Attachment #8641378 - Flags: review?(bugs) → review+
Comment on attachment 8641379 [details] [diff] [review]
(Part 3) Introduce [ChromeOnly] SpeechSynthesis.forceCancel for tests.


> 
>-// For testing purposes, allows us to drop anything in the global queue from
>-// content, and bring the browser to initial state.
>+// For testing purposes, allows us to cancel the current task that is
>+// misbehaving, and flush the queue.

Why "misbehaving". You just want to cancel the current task.
forceCancel() is after all quite similar to normal cancel().
I would call the method CancelCurrentTask()
(in webidl cancelCurrentTask())
And please add some comment to the .webidl what the method does.


With that, r+
Attachment #8641379 - Flags: review?(bugs) → review+
Comment on attachment 8641380 [details] [diff] [review]
(Part 4) Introduce mochitest coverage for speech global queue.


>+testFunc(function() {
>+  SpecialPowers.pushPrefEnv(
>+    { set: [['media.webspeech.synth.force_global_queue', true]] }, testFunc);
>+});
This setup is non-obvious. Please add a comment that the latter testFunc doesn't get any callback, so SimpleTest.finish is called, 
or make this somehow easier to read.

The same issue is in couple of places.
Attachment #8641380 - Flags: review?(bugs) → review+
(In reply to kdavis from comment #15)
> Comment on attachment 8641378 [details] [diff] [review]
> (Part 2) Introduce global queue and track speaking state across windows.
> 
> Review of attachment 8641378 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Overall fine.
> 
> I have two questions reflected in the comments on nsSpeechTask.cpp and
> nsSynthVoiceRegistry.cpp
> that may be bugs, or maybe I am simply misunderstanding the code at these
> two places.
> 
> If both of these points are addressed, either by telling me I don't know
> what I'm talking about—
> which may be the case—or by fixing them, I'm fine with the remainder of the
> code.
> 
> ::: dom/media/webspeech/synth/nsSpeechTask.cpp
> @@ +171,5 @@
> >      }
> >      return NS_OK;
> >    }
> >  
> > +  // mStream is set up in Init() that should be called before this.
> 
> What happens when Init() is passed a null aStream? As far as I can see,
> mStream is a nullptr in this case.

If init is passes a null stream, then it is an indirect service (mIndirectService==true), and it returns in the block above.

> 
> ::: dom/media/webspeech/synth/nsSynthVoiceRegistry.cpp
> @@ +700,5 @@
> > +  if (mGlobalQueue.IsEmpty()) {
> > +    return;
> > +  }
> > +
> > +  mGlobalQueue.RemoveElementAt(0);
> 
> What if the element at 0 was pre-paused? Should we remove it?
> It seems like this causes paused tasks to get lost, never to
> by heard.

This function only gets called when the current task has ended (mGlobalQueue index 0), so it can never be in the paused state.
(In reply to Olli Pettay [:smaug] from comment #16)
> Comment on attachment 8641378 [details] [diff] [review]
> (Part 2) Introduce global queue and track speaking state across windows.
> 
> > SpeechSynthesis::Speaking() const
> > {
> >-  if (mSpeechQueue.IsEmpty()) {
> >-    return false;
> >-  }
> >+  bool localSpeaking = !mSpeechQueue.IsEmpty() &&
> >+                       mSpeechQueue.ElementAt(0)->GetState() == SpeechSynthesisUtterance::STATE_SPEAKING;
> > 
> >-  return mSpeechQueue.ElementAt(0)->GetState() == SpeechSynthesisUtterance::STATE_SPEAKING;
> >+  return localSpeaking || nsSynthVoiceRegistry::GetInstance()->IsSpeaking();
> > }
> Why do we want nsSynthVoiceRegistry::GetInstance()->IsSpeaking(); here?
> Feels rather odd that (new SpeechSynthesis()).speaking might return true.
> Is this because Google wants only the global queue?

Yes.

> If we support that, do we even need to check local speaking check?

Ah, good point. I guess not.

> Please explain or change code. Add also some comment to the code.

I'll comment.

> ...
> and I see later in the patch that
> nsSynthVoiceRegistry::GetInstance()->IsSpeaking()
> depends on whether global queue is enabled. Ok, fine.
> So please add a comment to the code here why
> nsSynthVoiceRegistry::GetInstance()->IsSpeaking() is used.
>  
> >@@ -154,17 +167,17 @@ nsSpeechTask::Setup(nsISpeechTaskCallback* aCallback,
> > 
> >   if (mIndirectAudio) {
> >     if (argc > 0) {
> >       NS_WARNING("Audio info arguments in Setup() are ignored for indirect audio services.");
> >     }
> >     return NS_OK;
> >   }
> > 
> >-  // mStream is set up in BindStream() that should be called before this.
> >+  // mStream is set up in Init() that should be called before this.
> >   MOZ_ASSERT(mStream);
> Ah, this is ok, because mIndirectAudio is set to true when mStream is null.
> So, to self-document the code, perhaps add 
> MOZ_ASSERT(!mStream); inside the if (mIndirectAudio) ?
> 
> 

Done.

> 
> >+
> >+  nsTArray<nsRefPtr<GlobalQueueItem> > mGlobalQueue;
> s/> >/>>/
> (even the compiler on Windows supports >> these days.)

Done, and fixing the nsTArray template lines above.
(In reply to Olli Pettay [:smaug] from comment #17)
> Comment on attachment 8641379 [details] [diff] [review]
> (Part 3) Introduce [ChromeOnly] SpeechSynthesis.forceCancel for tests.
> 
> 
> > 
> >-// For testing purposes, allows us to drop anything in the global queue from
> >-// content, and bring the browser to initial state.
> >+// For testing purposes, allows us to cancel the current task that is
> >+// misbehaving, and flush the queue.
> 
> Why "misbehaving". You just want to cancel the current task.

We have an "indirect" test service that misbehaves by not doing dispatchEnd when it gets a cancel(). "direct audio" services (ie. ones with media streams), automatically get an end event since we terminate the stream. But if we don't send a dispatchEnd from an "indirect" service, we are left in limbo and the queue can't advance.

Before the "global queue", it didn't matter because we only tested per-window queues. So once you closed the window we would be back in a good state. With a global queue we need to make sure that no misbehaving service is tying up the global browser queue from the previous test.

> forceCancel() is after all quite similar to normal cancel().
> I would call the method CancelCurrentTask()
> (in webidl cancelCurrentTask())

I hope the explanation above gives context of why I called this forceCancel(). I would be open to another name, but I think cancelCurrentTask() is even more obscure.

> And please add some comment to the .webidl what the method does.
> 

Will do.

> 
> With that, r+
And per IRC, it will be called forceEnd()
Changed argument name, and added comments everywhere it is used.
Attachment #8641377 - Attachment is obsolete: true
More comments where needed, whitespace, and added an assertion as per smaug's suggestion.
Attachment #8641378 - Attachment is obsolete: true
Renamed forceEnd and documented better.
Attachment #8641379 - Attachment is obsolete: true
Attachment #8641379 - Flags: review?(kdavis)
Attachment #8642647 - Flags: review?(kdavis)
Made tests that run twice (with and without global queue) more clear.
Attachment #8641380 - Attachment is obsolete: true
Attachment #8641380 - Flags: review?(kdavis)
Attachment #8642651 - Flags: review?(kdavis)
Assignee: nobody → eitan
Comment on attachment 8642647 [details] [diff] [review]
(Part 3) Introduce [ChromeOnly] SpeechSynthesis.forceEnd for tests. r=smaug

Review of attachment 8642647 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Attachment #8642647 - Flags: review?(kdavis) → review+
Comment on attachment 8642651 [details] [diff] [review]
(Part 4) Introduce mochitest coverage for speech global queue. r=smaug

Review of attachment 8642651 [details] [diff] [review]:
-----------------------------------------------------------------

I have a small few issues I'd like clarified, mostly dealing with timing of tests,
which are indicated in the comments, but other than those it looks good.

::: dom/media/webspeech/synth/test/file_global_queue.html
@@ +60,5 @@
> +      SimpleTest.finish();
> +    });
> +    win1.speechSynthesis.speak(utterance1);
> +    win1.speechSynthesis.speak(utterance3);
> +    win2.speechSynthesis.speak(utterance2);

Should the order of the last two lines be switched?

::: dom/media/webspeech/synth/test/file_global_queue_pause.html
@@ +59,5 @@
> +      win2.speechSynthesis.pause();
> +
> +      testSynthState(win1, { speaking: true, pending: false, paused: false});
> +      // 1188099: currently, paused state is not gaurenteed to be immediate.
> +      testSynthState(win2, { speaking: true, pending: true });

We don't have to, but couldn't we test the paused state in a pause event handler for utterance2?

@@ +64,5 @@
> +    });
> +    utterance1.addEventListener('end', function(e) {
> +      is(eventOrder.shift(), 'end1', 'end1');
> +      testSynthState(win1, { speaking: false, pending: false, paused: false});
> +      testSynthState(win2, { speaking: false, pending: true, paused: true});

If paused state is not immediate, are we guaranteed that paused is true for win2 here?

@@ +81,5 @@
> +      testSynthState(win2, { speaking: false, pending: false, paused: false});
> +
> +      win1.speechSynthesis.pause();
> +
> +      testSynthState(win1, { speaking: false, pending: false, paused: true});

Is paused only immediate when nothing is queued?

@@ +87,5 @@
> +
> +      win1.speechSynthesis.speak(utterance3);
> +      win2.speechSynthesis.speak(utterance4);
> +
> +      testSynthState(win1, { speaking: false, pending: true, paused: true});

Again if paused is not immediate how do we know it's always true here?

::: dom/media/webspeech/synth/test/file_speech_cancel.html
@@ +83,3 @@
>  
> +  speechSynthesis.speak(utterance);
> +  ok(!speechSynthesis.speaking, "speechSynthesis is not speaking yet.");

Isn't there the danger that this line is hit after the onstart event handler
of utterance is called, but before speechSynthesis.cancel() from the onstart
event handler of utterance is called?

In other words, there seems to be a chance, with the right timings, that
speechSynthesis.speaking is true.

::: dom/media/webspeech/synth/test/test_global_queue.html
@@ +14,5 @@
> +<a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1188099">Mozilla Bug 1188099</a>
> +<p id="display"></p>
> +<iframe id="testFrame"></iframe>
> +<div id="content" style="display: none">
> +  

Could you remove the trailing spaces here
Attachment #8642651 - Flags: review?(kdavis) → review+
(In reply to kdavis from comment #28)
> Comment on attachment 8642651 [details] [diff] [review]
> (Part 4) Introduce mochitest coverage for speech global queue. r=smaug
> 
> Review of attachment 8642651 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I have a small few issues I'd like clarified, mostly dealing with timing of
> tests,
> which are indicated in the comments, but other than those it looks good.
> 
> ::: dom/media/webspeech/synth/test/file_global_queue.html
> @@ +60,5 @@
> > +      SimpleTest.finish();
> > +    });
> > +    win1.speechSynthesis.speak(utterance1);
> > +    win1.speechSynthesis.speak(utterance3);
> > +    win2.speechSynthesis.speak(utterance2);
> 
> Should the order of the last two lines be switched?
> 

The numbering is confusing. The numbers are for the expected order, and not the speak() order. I renamed them so that now it is in speak order, and we expect the utterance order to be 1,3,2.

> ::: dom/media/webspeech/synth/test/file_global_queue_pause.html
> @@ +59,5 @@
> > +      win2.speechSynthesis.pause();
> > +
> > +      testSynthState(win1, { speaking: true, pending: false, paused: false});
> > +      // 1188099: currently, paused state is not gaurenteed to be immediate.
> > +      testSynthState(win2, { speaking: true, pending: true });
> 
> We don't have to, but couldn't we test the paused state in a pause event
> handler for utterance2?
> 

No event is emitted since the utterance is not spoken yet.

https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#dfn-utteranceonpause

> @@ +64,5 @@
> > +    });
> > +    utterance1.addEventListener('end', function(e) {
> > +      is(eventOrder.shift(), 'end1', 'end1');
> > +      testSynthState(win1, { speaking: false, pending: false, paused: false});
> > +      testSynthState(win2, { speaking: false, pending: true, paused: true});
> 
> If paused state is not immediate, are we guaranteed that paused is true for
> win2 here?

Good question. Our implementation guarantees that at this stage the paused state will be true.

> 
> @@ +81,5 @@
> > +      testSynthState(win2, { speaking: false, pending: false, paused: false});
> > +
> > +      win1.speechSynthesis.pause();
> > +
> > +      testSynthState(win1, { speaking: false, pending: false, paused: true});
> 
> Is paused only immediate when nothing is queued?
> 

Correct. If there is no speech queued then the SpeechSynthesis instance could immediately switch to paused state without any async calls to the parent process.

It generally is not something we need to check for from a compliance perspective. But it does allow us to reliably assert that the state changed without listening for a non-existent event.

> @@ +87,5 @@
> > +
> > +      win1.speechSynthesis.speak(utterance3);
> > +      win2.speechSynthesis.speak(utterance4);
> > +
> > +      testSynthState(win1, { speaking: false, pending: true, paused: true});
> 
> Again if paused is not immediate how do we know it's always true here?
> 

Same as above.

> ::: dom/media/webspeech/synth/test/file_speech_cancel.html
> @@ +83,3 @@
> >  
> > +  speechSynthesis.speak(utterance);
> > +  ok(!speechSynthesis.speaking, "speechSynthesis is not speaking yet.");
> 
> Isn't there the danger that this line is hit after the onstart event handler
> of utterance is called, but before speechSynthesis.cancel() from the onstart
> event handler of utterance is called?
> 
> In other words, there seems to be a chance, with the right timings, that
> speechSynthesis.speaking is true.
> 

Maybe theoretically? That has never been a failure that we have seen in similar tests till now. I'm not familiar enough with our JS engine, but I assume that both lines are evaled in the same event loop iteration.

> ::: dom/media/webspeech/synth/test/test_global_queue.html
> @@ +14,5 @@
> > +<a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1188099">Mozilla Bug 1188099</a>
> > +<p id="display"></p>
> > +<iframe id="testFrame"></iframe>
> > +<div id="content" style="display: none">
> > +  
> 
> Could you remove the trailing spaces here

Sure!
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.