Text to Speech API on Firefox for Linux

RESOLVED FIXED in Firefox 44

Status

()

defect
RESOLVED FIXED
5 years ago
a month ago

People

(Reporter: yash.girdhar, Assigned: eeejay)

Tracking

({dev-doc-complete})

unspecified
mozilla44
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox44 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

5 years ago
As a part of Web Speech API (https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html), implement Text to Speech API on firefox for Linnux.

Plan is to use speechd, as the gnome-speech has not been active for a time now.
(Reporter)

Updated

5 years ago
So the API has been implemented already, but backend is missing.
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Assignee)

Updated

4 years ago
Assignee: nobody → eitan
(Assignee)

Updated

4 years ago
Depends on: 1207806
Comment on attachment 8665201 [details] [diff] [review]
Support Web Speech API synthesis via speech-dispatcher

kdavis said he could take look.
(sorry, my review load is a bit high)
Attachment #8665201 - Flags: review?(bugs) → review?(kdavis)

Comment 4

4 years ago
Comment on attachment 8665201 [details] [diff] [review]
Support Web Speech API synthesis via speech-dispatcher

Review of attachment 8665201 [details] [diff] [review]:
-----------------------------------------------------------------

Overall it's fine, but I have a few nits, noted throughout.

However, there is at least one bug in SpeechDispatcherService::Speak where rate is set.
If this is bug fixed, along with some of the nits, it's a review+

::: configure.in
@@ +4785,5 @@
> +if test "$MOZ_ENABLE_GTK" -o "$MOZ_ENABLE_QT"
> +then
> +    MOZ_SYNTH_SPEECHD=1
> +
> +    MOZ_ARG_DISABLE_BOOL(dbus,

The name here 'dbus' duplicates the name of the dbus support bool above. A unique name should be given to this boolean.

::: dom/media/webspeech/synth/speechd/SpeechDispatcherService.cpp
@@ +5,5 @@
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "nsISupports.h"
> +#include "nsPrintfCString.h"
> +#include "SpeechDispatcherService.h"

Should come before all other includes[http://mzl.la/1Fv89o8]

@@ +14,5 @@
> +#include "nsReadableUtils.h"
> +
> +#include "mozilla/dom/nsSynthVoiceRegistry.h"
> +#include "mozilla/dom/nsSpeechTask.h"
> +#include "mozilla/Preferences.h"

Includes should be in alphabetic order[http://mzl.la/1Fv89o8]

@@ +59,5 @@
> +  SPDCallback callback_pause;
> +  SPDCallback callback_resume;
> +  SPDCallbackIM callback_im;
> +
> +  /* partial, more private fields in structure */

If this is to have private fields, why is it a struct and not a class?

@@ +134,5 @@
> +
> +  // Voice name
> +  nsString mName;
> +
> +  // Voice language, in BCB-47 syntax

Should be BCP-47

@@ +248,5 @@
> +    case SPD_EVENT_END:
> +      mTask->DispatchEnd((TimeStamp::Now() - mStartTime).ToSeconds(), 0);
> +      remove = true;
> +      break;
> +

Can you explicitly indicate here that SPD_EVENT_INDEX_MARK is not supported?
Something as simple as:

case SPD_EVENT_INDEX_MARK:
  // Not yet supported
  break;

@@ +352,5 @@
> +
> +  spd_set_notification_on(mSpeechdClient, SPD_BEGIN);
> +  spd_set_notification_on(mSpeechdClient, SPD_END);
> +  spd_set_notification_on(mSpeechdClient, SPD_CANCEL);
> +  spd_set_notification_on(mSpeechdClient, SPD_RESUME);

I assume no pause due to reasons explained in SpeechDispatcherCallback::OnPause.

@@ +442,5 @@
> +SpeechDispatcherService::Speak(const nsAString& aText, const nsAString& aUri,
> +                               float aVolume, float aRate, float aPitch,
> +                               nsISpeechTask* aTask)
> +{
> +  NS_ENSURE_TRUE(mInitialized, NS_ERROR_NOT_AVAILABLE);

Should use NS_WARN_IF instead of NS_ENSURE_* [http://mzl.la/1NWF2eJ]

@@ +467,5 @@
> +  int rate = 0;
> +
> +  if (aRate > 1) {
> +    rate = static_cast<int>((aRate - 1) * 10);
> +  } else if (aRate < 0) {

This should be (aRate < 1) as according to the comments above (aRate < 0) never occurs as "We provide a rate of 0.1 to 10 with 1 being default."

@@ +468,5 @@
> +
> +  if (aRate > 1) {
> +    rate = static_cast<int>((aRate - 1) * 10);
> +  } else if (aRate < 0) {
> +    rate = static_cast<int>((aRate * 100) - 110);

This also can't be correct as aRate = 1 should map to rate = 0. It doesn't. It maps to -10.

There is a discontinuity in the mapping as [assuming the (aRate < 0)=>(aRate < 1) switch above is made]  aRate = 1+epsilon maps to rate = 0+epsilon but aRate = 1-epsilon maps to rate = -10+epsilon. (Where of course epsilon is small and positive.)

Actually this should be:

rate = static_cast<int>((aRate - 1) * (100/0.9));

@@ +477,5 @@
> +  // We provide a pitch of 0 to 2 with 1 being the default.
> +  // speech-dispatcher expects -100 to 100 with 0 being default.
> +  spd_set_voice_pitch(mSpeechdClient, static_cast<int>((aPitch - 1) * 100));
> +
> +  // The last three parameters doesn't matter for an indirect service

Type-o "don't matter"

::: dom/media/webspeech/synth/speechd/SpeechDispatcherService.h
@@ +11,5 @@
> +#include "nsIThread.h"
> +#include "nsISpeechService.h"
> +#include "nsRefPtrHashtable.h"
> +#include "nsTArray.h"
> +#include "mozilla/StaticPtr.h"

Include "mozilla/StaticPtr.h" should be sorted alphabetically[http://mzl.la/1Fv89o8]
Attachment #8665201 - Flags: review?(kdavis) → review+
(Assignee)

Comment 5

4 years ago
(In reply to kdavis from comment #4)
> Comment on attachment 8665201 [details] [diff] [review]
> Support Web Speech API synthesis via speech-dispatcher
> 
> Review of attachment 8665201 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Overall it's fine, but I have a few nits, noted throughout.
> 
> However, there is at least one bug in SpeechDispatcherService::Speak where
> rate is set.
> If this is bug fixed, along with some of the nits, it's a review+
> 
> ::: configure.in
> @@ +4785,5 @@
> > +if test "$MOZ_ENABLE_GTK" -o "$MOZ_ENABLE_QT"
> > +then
> > +    MOZ_SYNTH_SPEECHD=1
> > +
> > +    MOZ_ARG_DISABLE_BOOL(dbus,
> 
> The name here 'dbus' duplicates the name of the dbus support bool above. A
> unique name should be given to this boolean.
> 

Oops.

> ::: dom/media/webspeech/synth/speechd/SpeechDispatcherService.cpp
> @@ +5,5 @@
> > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> > +
> > +#include "nsISupports.h"
> > +#include "nsPrintfCString.h"
> > +#include "SpeechDispatcherService.h"
> 
> Should come before all other includes[http://mzl.la/1Fv89o8]

Done.

> 
> @@ +14,5 @@
> > +#include "nsReadableUtils.h"
> > +
> > +#include "mozilla/dom/nsSynthVoiceRegistry.h"
> > +#include "mozilla/dom/nsSpeechTask.h"
> > +#include "mozilla/Preferences.h"
> 
> Includes should be in alphabetic order[http://mzl.la/1Fv89o8]
> 

Done.

> @@ +59,5 @@
> > +  SPDCallback callback_pause;
> > +  SPDCallback callback_resume;
> > +  SPDCallbackIM callback_im;
> > +
> > +  /* partial, more private fields in structure */
> 
> If this is to have private fields, why is it a struct and not a class?
> 

It is a C struct in the libspeechd library. When I say private here, I mean that there is a comment
 in the speechd headers that says that the rest of the fields are private:
http://git.freebsoft.org/?p=speechd.git;a=blob;f=src/api/c/libspeechd.h;h=050c9f59cd5be1039d73d7ea2c289d11290420bf;hb=HEAD#l90

I'll make the comment clearer.

> @@ +134,5 @@
> > +
> > +  // Voice name
> > +  nsString mName;
> > +
> > +  // Voice language, in BCB-47 syntax
> 
> Should be BCP-47
> 

Yup.

> @@ +248,5 @@
> > +    case SPD_EVENT_END:
> > +      mTask->DispatchEnd((TimeStamp::Now() - mStartTime).ToSeconds(), 0);
> > +      remove = true;
> > +      break;
> > +
> 
> Can you explicitly indicate here that SPD_EVENT_INDEX_MARK is not supported?
> Something as simple as:
> 
> case SPD_EVENT_INDEX_MARK:
>   // Not yet supported
>   break;
> 

Done.

> @@ +352,5 @@
> > +
> > +  spd_set_notification_on(mSpeechdClient, SPD_BEGIN);
> > +  spd_set_notification_on(mSpeechdClient, SPD_END);
> > +  spd_set_notification_on(mSpeechdClient, SPD_CANCEL);
> > +  spd_set_notification_on(mSpeechdClient, SPD_RESUME);
> 
> I assume no pause due to reasons explained in
> SpeechDispatcherCallback::OnPause.
> 

Yeah, removed that.

> @@ +442,5 @@
> > +SpeechDispatcherService::Speak(const nsAString& aText, const nsAString& aUri,
> > +                               float aVolume, float aRate, float aPitch,
> > +                               nsISpeechTask* aTask)
> > +{
> > +  NS_ENSURE_TRUE(mInitialized, NS_ERROR_NOT_AVAILABLE);
> 
> Should use NS_WARN_IF instead of NS_ENSURE_* [http://mzl.la/1NWF2eJ]
> 

Done, thanks.

> @@ +467,5 @@
> > +  int rate = 0;
> > +
> > +  if (aRate > 1) {
> > +    rate = static_cast<int>((aRate - 1) * 10);
> > +  } else if (aRate < 0) {
> 
> This should be (aRate < 1) as according to the comments above (aRate < 0)
> never occurs as "We provide a rate of 0.1 to 10 with 1 being default."
> 

Oops!

> @@ +468,5 @@
> > +
> > +  if (aRate > 1) {
> > +    rate = static_cast<int>((aRate - 1) * 10);
> > +  } else if (aRate < 0) {
> > +    rate = static_cast<int>((aRate * 100) - 110);
> 
> This also can't be correct as aRate = 1 should map to rate = 0. It doesn't.
> It maps to -10.
> 
> There is a discontinuity in the mapping as [assuming the (aRate < 0)=>(aRate
> < 1) switch above is made]  aRate = 1+epsilon maps to rate = 0+epsilon but
> aRate = 1-epsilon maps to rate = -10+epsilon. (Where of course epsilon is
> small and positive.)
> 
> Actually this should be:
> 
> rate = static_cast<int>((aRate - 1) * (100/0.9));
> 

Yes! Thank you! I was sloppy there. The new block looks like this:
  if (aRate > 1) {
    rate = static_cast<int>((aRate - 1) * 10);
  } else if (aRate <= 1) {
    rate = static_cast<int>((aRate - 1) * (100/0.9));
  }

> @@ +477,5 @@
> > +  // We provide a pitch of 0 to 2 with 1 being the default.
> > +  // speech-dispatcher expects -100 to 100 with 0 being default.
> > +  spd_set_voice_pitch(mSpeechdClient, static_cast<int>((aPitch - 1) * 100));
> > +
> > +  // The last three parameters doesn't matter for an indirect service
> 
> Type-o "don't matter"
> 

Yup.

> ::: dom/media/webspeech/synth/speechd/SpeechDispatcherService.h
> @@ +11,5 @@
> > +#include "nsIThread.h"
> > +#include "nsISpeechService.h"
> > +#include "nsRefPtrHashtable.h"
> > +#include "nsTArray.h"
> > +#include "mozilla/StaticPtr.h"
> 
> Include "mozilla/StaticPtr.h" should be sorted
> alphabetically[http://mzl.la/1Fv89o8]

Done.
https://hg.mozilla.org/mozilla-central/rev/01472662a9ab
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44

Updated

4 years ago
Depends on: 1227848
This has now been documented — see all the goodness at https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API! A technical review would be really nice, although I appreciate that it is a fairly big API, so we might want to share the work between the different engineers involved.
Depends on: 1229489
Component: DOM → DOM: Core & HTML
Product: Core → Core
You need to log in before you can comment on or make changes to this bug.