1003457 - Text to Speech API on Firefox for Windows

Reporter

Description

•

11 years ago

As a part of Web Speech API (https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html), implement Text to Speech API on firefox for OS X. Plan is to use SAPI (Speech Application Programming API). The Speech Application Programming Interface or SAPI is an API developed by Microsoft to allow the use of speech synthesis within Windows applications.

Yash

Reporter

Updated

•

11 years ago

Blocks: TTS_for_firefox

Component: Disability Access APIs → DOM

Matt Brubeck (:mbrubeck)

Updated

•

11 years ago

Assignee: nobody → yash.girdhar

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Yash

Reporter

Comment 1

•

10 years ago

Attached patch [Work in Progress] A very basic version of the service, using Microsoft Speech API(SAPI) (obsolete) — Details — Splinter Review

First and a very basic version of the Sapi service. Following functionalities have been implemented : -> speak() function -> getVoices() function -> specific voice can be selected. Demo : http://eeejay.github.io/webspeechdemos/ -> events support(start and end of input stream, word and sentence boundary), the stop/pause/resume/interrupt functions (Though I have tested them at a basic level). Though I am still working on the patch right now, feedback/suggestions are welcome.

Eitan Isaacson [:eeejay]

Comment 2

•

10 years ago

I just tried this with a debug build on windows. It works indeed! Some quick feedback, please use the feedback flag in the future when you are ready with a new version. 1. The patch needs to be rebased, nsStaticXULComponents.cpp went away. 2. Please clean up all the whitespace, replace all tabs with spaces and remove trailing whitespace. 3. Please conform to the coding style in that module. After playing with it, I noticed: 1. The voice doesn't change, even though I choose different voices. 2. The boundary events don't give accurate timestamps. 3. It crashes a lot. This is a 64bit debug try build, I'm not really set up for getting a stack trace in windows, soI don't know if it related to the patch.

Yash

Reporter

Comment 3

•

10 years ago

Attached patch Added some more functionalities to the service (obsolete) — Details — Splinter Review

-> support for cancelling the current speech, along with pause and resume events.I have tested them at some level, but they need more testing. -> support for changing pitch, rate and volume. -> It gives correct timestamps for the boundary events now. I have used windows inbuilt function and I am searching if there is any better option. @Eitan : Regarding changing the voice, it works fine on my system, so I am still looking for what could the problem in your case. Next to do : Make the speech concurrent, so that two windows can simultaneously use it. Right now, the speech for two different windows is not independent.

Attachment #8451741 - Attachment is obsolete: true

Attachment #8460158 - Flags: feedback?

Yash

Reporter

Updated

•

10 years ago

Attachment #8460158 - Flags: feedback? → feedback?(eitan)

Eitan Isaacson [:eeejay]

Comment 4

•

10 years ago

Comment on attachment 8460158 [details] [diff] [review] Added some more functionalities to the service Review of attachment 8460158 [details] [diff] [review]: ----------------------------------------------------------------- Hey Yash. Before I give feedback, please rebase this on top of most recent mozilla-central. Thanks!

Attachment #8460158 - Flags: feedback?(eitan)

Yash

Reporter

Comment 5

•

10 years ago

Attached patch Added some more functionalities to the service (obsolete) — Details — Splinter Review

Some minor changes and rebased on the Mozilla-central head.

Attachment #8460158 - Attachment is obsolete: true

Attachment #8461732 - Flags: feedback+

Yash

Reporter

Comment 6

•

10 years ago

Attached patch Added some more functionalities to the service (obsolete) — Details — Splinter Review

some changes, to resolve the try errors.

Attachment #8461732 - Attachment is obsolete: true

Attachment #8462011 - Flags: feedback+

Yash

Reporter

Comment 7

•

10 years ago

Attached patch Added some more functionalities to the service (obsolete) — Details — Splinter Review

The previous one failed on try on windows, due to a known bug in visual studio 2010 - http://connect.microsoft.com/VisualStudio/feedback/details/621653/including-stdint-after-intsafe-generates-warnings . Added a workaround for that in this patch. Didn't test locally.

Attachment #8462011 - Attachment is obsolete: true

Attachment #8462960 - Flags: feedback+

Eitan Isaacson [:eeejay]

Comment 8

•

10 years ago

Still same error: https://tbpl.mozilla.org/?tree=Try&rev=6337168f13a7

Yash

Reporter

Comment 9

•

10 years ago

Attached patch Added more functionalities to the service (obsolete) — Details — Splinter Review

Disabled 'FAIL_ON_WARNINGS', so that the patch runs on the try server. Removed extra whitespace and corrected newlines.

Attachment #8462960 - Attachment is obsolete: true

Attachment #8470183 - Flags: feedback+

Yash

Reporter

Comment 10

•

10 years ago

Attached patch Added more functionalities to the service (obsolete) — Details — Splinter Review

Rebased on top of Mozilla-central.

Attachment #8470183 - Attachment is obsolete: true

Attachment #8470206 - Flags: feedback+

Yash

Reporter

Comment 11

•

10 years ago

Attached patch Added more functionalities to the service (obsolete) — Details — Splinter Review

Corrected a minor mistake.

Attachment #8470206 - Attachment is obsolete: true

Attachment #8470414 - Flags: feedback+

Eitan Isaacson [:eeejay]

Comment 12

•

10 years ago

Comment on attachment 8470414 [details] [diff] [review] Added more functionalities to the service Review of attachment 8470414 [details] [diff] [review]: ----------------------------------------------------------------- This works as expected! To be honest, the windows code is a bit hard for me to understand. I don't really get how this does not completely block the UI thread, but it magically doesn't. - Still plenty of whitespace issues, both trailing whitespaces and tabs. Also, code style is wrong. - The mochitests fail on windows. Looks like the addition of more voices are not accounted for in the test. - The voice lang name is wrong, for David I see "409", it should be "en-US". - The voice URLs are strange (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_DAVID_11.0). They should be consistent with other speech services: urn:moz-tts:[servicename]:[voicename]?[languagecode]. ::: content/media/webspeech/synth/nsISpeechService.idl @@ +149,5 @@ > * @param aTask task instance for utterance, used for sending events or audio > * data back to browser. > */ > void speak(in DOMString aText, in DOMString aUri, > + in float aRate, in float aPitch, in float aVolume, Add documentation above. Explain that the volume should only be used in indirect services. ::: content/media/webspeech/synth/nsSpeechTask.cpp @@ +462,5 @@ > DebugOnly<nsresult> rv = mCallback->OnPause(); > NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onPause() callback"); > } > + > + if (!mIndirectAudio) Trailing spaces (above if line). As a rule, indirect services should emit their own events, correct? ::: content/media/webspeech/synth/nsSynthVoiceRegistry.cpp @@ +542,5 @@ > const nsAString& aLang, > const nsAString& aUri, > const float& aRate, > const float& aPitch, > + const float& aVolume, Remove tab indentations. ::: content/media/webspeech/synth/nsSynthVoiceRegistry.h @@ +34,5 @@ > already_AddRefed<nsSpeechTask> SpeakUtterance(SpeechSynthesisUtterance& aUtterance, > const nsAString& aDocLang); > > void Speak(const nsAString& aText, const nsAString& aLang, > + const nsAString& aUri, const float& aRate, const float& aPitch, const float& aVolume, Wrap lines at 80 columns. ::: content/media/webspeech/synth/sapi/nsSapiService.cpp @@ +22,5 @@ > +#include "mozilla/DebugOnly.h" > + > +enum TtsGenderType { > + TTS_GENDER_NONE, > + TTS_GENDER_MALE, indentation @@ +26,5 @@ > + TTS_GENDER_MALE, > + TTS_GENDER_FEMALE > +}; > + > +double starting_time; Why is this a global? @@ +39,5 @@ > + public: > + SapiVoice() {} > + ~SapiVoice() {} > + > + NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SapiVoice) No need for this to be thread safe, AFAICT. @@ +43,5 @@ > + NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SapiVoice) > + > + WCHAR *mName; > + WCHAR *mLanguage; > + TtsGenderType mGender; Why do we need gender? Our API does not use that, and I don't see it used anywhere. @@ +47,5 @@ > + TtsGenderType mGender; > + CComPtr<ISpObjectToken> voice_token; > + }; > + > + HRESULT WaitAndPumpMessagesWithTimeout(HANDLE hWaitHandle, DWORD dwMilliseconds) Not sure why this needs do be a windowsy function. One argument could be gotten from a field off of runnable, and the other is always the same (INFINITE). This could be a private method of the runnable and return void. @@ +50,5 @@ > + > + HRESULT WaitAndPumpMessagesWithTimeout(HANDLE hWaitHandle, DWORD dwMilliseconds) > + { > + HRESULT hr = S_OK; > + BOOL fContinue = TRUE; Don't use platform types when not necessary. @@ +54,5 @@ > + BOOL fContinue = TRUE; > + > + while (fContinue) > + { > + DWORD dwWaitId = ::MsgWaitForMultipleObjectsEx(1, &hWaitHandle, dwMilliseconds, QS_ALLINPUT, MWMO_INPUTAVAILABLE); This DWORD is the only platform type you need to use in this entire file. @@ +106,5 @@ > + std::wstring mText; > + > + // We use this pointer to compare it with the current service task. > + // If they differ, this runnable should stop. > + nsISpeechTask* mTask; The comment above is not true. You use mTask for dispatching events to gecko. @@ +211,5 @@ > + DebugOnly<nsresult> rv = > + data->mRegistry->AddVoice( > + data->mService, aUri, nsDependentString(static_cast<char16_t*>(aVoice->mName)), nsDependentString(static_cast<char16_t*>(aVoice->mLanguage)), true); > + NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Failed to add voice"); > + rv = data->mRegistry->SetDefaultVoice(aUri, true); So every voice is set as default? That doesn't make sense.. @@ +221,5 @@ > + { > + //not sure if we need this > + /*if (mCurrentTask != callbackRunnable->mTask) { > + return; > + } */ From some crude testing, it is not. Not sure why. But you should remove this before review. @@ +241,5 @@ > + break; > + } > + case SPEI_TTS_BOOKMARK:{ > + currentIndex = static_cast<ULONG>(speechEvent.lParam) - prefixLen; > + callbackRunnable->mTask->DispatchBoundary(nsDependentString(static_cast<char16_t*>(L"mark")), GetTickCount() - starting_time, currentIndex); Wrap long lines @@ +291,5 @@ > + mVoice->SetInterest(event_mask, event_mask); > + > + /*set the callback function for receiving the events*/ > + SPNOTIFYCALLBACK *cb = SpeechEventCallback; > + mVoice->SetNotifyCallbackFunction(cb, 0, 0); Can't you directly cast SpeechEventCallback here instead of an intermediate variable? @@ +315,5 @@ > + > + for (unsigned int i = 0; i < voice_count; i++){ > + SapiVoice *voice = new SapiVoice(); > + CComPtr<ISpObjectToken> voice_token; > + if (S_OK != voice_tokens->Next(1, &voice_token, NULL)){ How about that condition being the while loop instead of a for loop? @@ +342,5 @@ > + else if (0 == _wcsicmp(gender_s, L"female")){ > + gender = TTS_GENDER_FEMALE; > + } > + } > + voice->mGender = gender; This gender stuff looks useless. @@ +345,5 @@ > + } > + voice->mGender = gender; > + > + WCHAR *language = nullptr; > + if (S_OK != attributes->GetStringValue(L"Language", &language)){ It looks like this returns an MS LangID, it needs to be converted to a BCP 47 language tag (eg. en-GB). @@ +365,5 @@ > + > + void nsSapiService::RegisterVoices() > + { > + VoiceTraverserData data = { this, nsSynthVoiceRegistry::GetInstance() }; > + mVoices.Enumerate(SapiAddVoiceTraverser, &data); Can't this happen in PopulateVoices instead of going over the list again here? ::: content/media/webspeech/synth/sapi/nsSapiService.h @@ +63,5 @@ > + bool mInitialized; > + > + nsRefPtrHashtable<nsStringHashKey, SapiVoice> mVoices; > + > + ISpVoice* mVoice; The name is misleading, since it does not refer to the same distinct voices that we use. I would call this mSapiVoice or mSapiClient.

Attachment #8470414 - Flags: feedback+

Yash

Reporter

Comment 13

•

10 years ago

(In reply to Eitan Isaacson [:eeejay] from comment #12) > Comment on attachment 8470414 [details] [diff] [review] > Added more functionalities to the service > > Review of attachment 8470414 [details] [diff] [review]: > ----------------------------------------------------------------- > > This works as expected! > > To be honest, the windows code is a bit hard for me to understand. I don't > really get how this does not completely block the UI thread, but it > magically doesn't. > > - Still plenty of whitespace issues, both trailing whitespaces and tabs. > Also, code style is wrong. > - The mochitests fail on windows. Looks like the addition of more voices are > not accounted for in the test. > - The voice lang name is wrong, for David I see "409", it should be "en-US". > - The voice URLs are strange > (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN- > US_DAVID_11.0). They should be consistent with other speech services: > urn:moz-tts:[servicename]:[voicename]?[languagecode]. > > ::: content/media/webspeech/synth/nsISpeechService.idl > @@ +149,5 @@ > > * @param aTask task instance for utterance, used for sending events or audio > > * data back to browser. > > */ > > void speak(in DOMString aText, in DOMString aUri, > > + in float aRate, in float aPitch, in float aVolume, > > Add documentation above. Explain that the volume should only be used in > indirect services. Done. > > ::: content/media/webspeech/synth/nsSpeechTask.cpp > @@ +462,5 @@ > > DebugOnly<nsresult> rv = mCallback->OnPause(); > > NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onPause() callback"); > > } > > + > > + if (!mIndirectAudio) > > Trailing spaces (above if line). > > As a rule, indirect services should emit their own events, correct? Correct, AFAIK. > > ::: content/media/webspeech/synth/nsSynthVoiceRegistry.cpp > @@ +542,5 @@ > > const nsAString& aLang, > > const nsAString& aUri, > > const float& aRate, > > const float& aPitch, > > + const float& aVolume, > > Remove tab indentations. Done. > > ::: content/media/webspeech/synth/nsSynthVoiceRegistry.h > @@ +34,5 @@ > > already_AddRefed<nsSpeechTask> SpeakUtterance(SpeechSynthesisUtterance& aUtterance, > > const nsAString& aDocLang); > > > > void Speak(const nsAString& aText, const nsAString& aLang, > > + const nsAString& aUri, const float& aRate, const float& aPitch, const float& aVolume, > > Wrap lines at 80 columns. Done. > ::: content/media/webspeech/synth/sapi/nsSapiService.cpp > @@ +22,5 @@ > > +#include "mozilla/DebugOnly.h" > > + > > +enum TtsGenderType { > > + TTS_GENDER_NONE, > > + TTS_GENDER_MALE, > > indentation Gender Code removed. > @@ +26,5 @@ > > + TTS_GENDER_MALE, > > + TTS_GENDER_FEMALE > > +}; > > + > > +double starting_time; > > Why is this a global? It is used in various functions and I didn't think it's important enough to be a field of the class. > > @@ +39,5 @@ > > + public: > > + SapiVoice() {} > > + ~SapiVoice() {} > > + > > + NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SapiVoice) > > No need for this to be thread safe, AFAICT. If I remove this, it gives error in using SapiVoice in the nsRefPtrHashtable. > @@ +43,5 @@ > > + NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SapiVoice) > > + > > + WCHAR *mName; > > + WCHAR *mLanguage; > > + TtsGenderType mGender; > > Why do we need gender? Our API does not use that, and I don't see it used > anywhere. Removed. > @@ +47,5 @@ > > + TtsGenderType mGender; > > + CComPtr<ISpObjectToken> voice_token; > > + }; > > + > > + HRESULT WaitAndPumpMessagesWithTimeout(HANDLE hWaitHandle, DWORD dwMilliseconds) > > Not sure why this needs do be a windowsy function. One argument could be > gotten from a field off of runnable, and the other is always the same > (INFINITE). This could be a private method of the runnable and return void. Have made this a method of the runnable. > @@ +50,5 @@ > > + > > + HRESULT WaitAndPumpMessagesWithTimeout(HANDLE hWaitHandle, DWORD dwMilliseconds) > > + { > > + HRESULT hr = S_OK; > > + BOOL fContinue = TRUE; > > Don't use platform types when not necessary. Converted. > @@ +54,5 @@ > > + BOOL fContinue = TRUE; > > + > > + while (fContinue) > > + { > > + DWORD dwWaitId = ::MsgWaitForMultipleObjectsEx(1, &hWaitHandle, dwMilliseconds, QS_ALLINPUT, MWMO_INPUTAVAILABLE); > > This DWORD is the only platform type you need to use in this entire file. > > @@ +106,5 @@ > > + std::wstring mText; > > + > > + // We use this pointer to compare it with the current service task. > > + // If they differ, this runnable should stop. > > + nsISpeechTask* mTask; > > The comment above is not true. You use mTask for dispatching events to gecko. Changed. > @@ +211,5 @@ > > + DebugOnly<nsresult> rv = > > + data->mRegistry->AddVoice( > > + data->mService, aUri, nsDependentString(static_cast<char16_t*>(aVoice->mName)), nsDependentString(static_cast<char16_t*>(aVoice->mLanguage)), true); > > + NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Failed to add voice"); > > + rv = data->mRegistry->SetDefaultVoice(aUri, true); > > So every voice is set as default? That doesn't make sense.. Changed it. It was initially written for some testing, forgot to remove it. > @@ +221,5 @@ > > + { > > + //not sure if we need this > > + /*if (mCurrentTask != callbackRunnable->mTask) { > > + return; > > + } */ > > From some crude testing, it is not. Not sure why. But you should remove this > before review. Removed. > @@ +241,5 @@ > > + break; > > + } > > + case SPEI_TTS_BOOKMARK:{ > > + currentIndex = static_cast<ULONG>(speechEvent.lParam) - prefixLen; > > + callbackRunnable->mTask->DispatchBoundary(nsDependentString(static_cast<char16_t*>(L"mark")), GetTickCount() - starting_time, currentIndex); > > Wrap long lines Done. > @@ +291,5 @@ > > + mVoice->SetInterest(event_mask, event_mask); > > + > > + /*set the callback function for receiving the events*/ > > + SPNOTIFYCALLBACK *cb = SpeechEventCallback; > > + mVoice->SetNotifyCallbackFunction(cb, 0, 0); > > Can't you directly cast SpeechEventCallback here instead of an intermediate > variable? Done. > @@ +315,5 @@ > > + > > + for (unsigned int i = 0; i < voice_count; i++){ > > + SapiVoice *voice = new SapiVoice(); > > + CComPtr<ISpObjectToken> voice_token; > > + if (S_OK != voice_tokens->Next(1, &voice_token, NULL)){ > > How about that condition being the while loop instead of a for loop? Done. > @@ +342,5 @@ > > + else if (0 == _wcsicmp(gender_s, L"female")){ > > + gender = TTS_GENDER_FEMALE; > > + } > > + } > > + voice->mGender = gender; > > This gender stuff looks useless. Removed. > @@ +345,5 @@ > > + } > > + voice->mGender = gender; > > + > > + WCHAR *language = nullptr; > > + if (S_OK != attributes->GetStringValue(L"Language", &language)){ > > It looks like this returns an MS LangID, it needs to be converted to a BCP > 47 language tag (eg. en-GB). Will do that in the next version. Tried some in-Built methods, but no success. > @@ +365,5 @@ > > + > > + void nsSapiService::RegisterVoices() > > + { > > + VoiceTraverserData data = { this, nsSynthVoiceRegistry::GetInstance() }; > > + mVoices.Enumerate(SapiAddVoiceTraverser, &data); > > Can't this happen in PopulateVoices instead of going over the list again > here? Done. > ::: content/media/webspeech/synth/sapi/nsSapiService.h > @@ +63,5 @@ > > + bool mInitialized; > > + > > + nsRefPtrHashtable<nsStringHashKey, SapiVoice> mVoices; > > + > > + ISpVoice* mVoice; > > The name is misleading, since it does not refer to the same distinct voices > that we use. I would call this mSapiVoice or mSapiClient. Renamed to mSapiClient.

Yash

Reporter

Comment 14

•

10 years ago

Attached patch First version of the service (obsolete) — Details — Splinter Review

worked on the feedback. except MS LangID still not converted to a BCP 47 language tag. will do that in the next version.

Attachment #8470414 - Attachment is obsolete: true

Attachment #8473735 - Flags: feedback?(eitan)

Eitan Isaacson [:eeejay]

Comment 15

•

10 years ago

Comment on attachment 8473735 [details] [diff] [review] First version of the service Review of attachment 8473735 [details] [diff] [review]: ----------------------------------------------------------------- Overall this is progress. I am sure smaug will have a lot more feedback in his review. Before you request a review make sure that: 1. The language string is fixed. 2. Mochitests pass on Windows. ::: content/media/webspeech/synth/nsSpeechTask.cpp @@ +454,5 @@ > return; > } > > + if (mStream) { > + mStream->ChangeExplicitBlockerCount(1); whitespace @@ +463,5 @@ > NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onPause() callback"); > } > > + if (!mIndirectAudio){ > + DispatchPauseImpl(GetCurrentTime(), GetCurrentCharOffset()); whitespace @@ +477,5 @@ > return; > } > > + if (mStream) { > + mStream->ChangeExplicitBlockerCount(-1); whitespace @@ +486,5 @@ > NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onResume() callback"); > } > > + if (!mIndirectAudio){ > + DispatchResumeImpl(GetCurrentTime(), GetCurrentCharOffset()); whitespace @@ +498,5 @@ > > LOG(PR_LOG_DEBUG, ("nsSpeechTask::Cancel")); > > + if (mStream) { > + mStream->ChangeExplicitBlockerCount(1); whitespace @@ +507,5 @@ > NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onCancel() callback"); > } > > + if (!mIndirectAudio){ > + DispatchEndImpl(GetCurrentTime(), GetCurrentCharOffset()); whitespace ::: content/media/webspeech/synth/sapi/nsSapiService.cpp @@ +20,5 @@ > +#include "prenv.h" > + > +#include "mozilla/DebugOnly.h" > + > +double starting_time; This still seems wrong to me, especially since starting_time is specific to each call of speak(). Please make this a field in the runnable class (have it public so service instance could access it).

Attachment #8473735 - Flags: feedback?(eitan) → feedback+

Yash

Reporter

Comment 16

•

10 years ago

Attached patch Updated patch for the sapi service (obsolete) — Details — Splinter Review

Made the code changes suggested in the last feedback. However, regarding the mochitest failing, I am getting some test timed out errors on running the mochitest for the synth directory(on two tests) on my system. Eitan, I guess you were working on something similar here(https://bugzilla.mozilla.org/show_bug.cgi?id=1059472) . I tried after adding your patch also, but its not working. I am also attaching the errors which I am getting with this patch.

Attachment #8473735 - Attachment is obsolete: true

Flags: needinfo?(eitan)

Yash

Reporter

Comment 17

•

10 years ago

Attached file Mochitest Errors on my system — Details

Here is the errors which I am getting on my system, when running the mochitest.

Flags: needinfo?(eitan)

Eitan Isaacson [:eeejay]

Comment 18

•

10 years ago

I was finally able to build it and test. First of all: this patch introduces a dependency on atlbase.h. In the past, I tried to get this to compile with visual studio express 2013 with no luck because apparently, that file is not included. I decided this week to give it a second chance on a clean VM, and it builds fine. The 3 variables I could think of that would make this work are: 1. Something else in m-c introduced an include for atlbase.h and we somehow added a workaround for it. 2. A newer MozillaBuild is doing something different. 3. I built it with 32bit toolchain (i think), perhaps it is missing in 64bit? I have no clue... But I am still concerned about this introduction, and I am not sure how we should be dealing with this. Should we disable TTS support on windows during configure if no atlbase.h is found? Ryan, what are the best practices for this?

Flags: needinfo?(ryanvm)

Eitan Isaacson [:eeejay]

Comment 19

•

10 years ago

(In reply to Yash from comment #16) > I am also attaching the errors which I am getting with this patch. Those errors have to do with the fact that the SAPI voices have higher priority since they were added earlier. They are interfering with our voice selecting priority test. You will need to provide a preference for removing SAPI voices during mochitests, and disable it in the test's pushPrefEnv().

Eitan Isaacson [:eeejay]

Comment 20

•

10 years ago

Comment on attachment 8513423 [details] [diff] [review] Updated patch for the sapi service Review of attachment 8513423 [details] [diff] [review]: ----------------------------------------------------------------- Besides the whitespace comments, please rebase this on mozilla-central. Also, the voice.lang attribute needs to be fixed and return a proper locale string. Maybe someone more familiar with windows could give better insight on how to fix it. But it is definitely a show stopper. After you rebase, and address these issues. Please flag smaug for feedback, he should probably review this. ::: content/media/webspeech/synth/moz.build @@ +41,5 @@ > > + if CONFIG['MOZ_SYNTH_SAPI']: > + DIRS = ['sapi'] > + CFLAGS += [ > + '-wd4005' Can't this be put in sapi/moz.build? @@ +49,5 @@ > 'ipc/PSpeechSynthesis.ipdl', > 'ipc/PSpeechSynthesisRequest.ipdl', > ] > > +FAIL_ON_WARNINGS = False Is this necessary? ::: content/media/webspeech/synth/nsSpeechTask.cpp @@ +474,5 @@ > DebugOnly<nsresult> rv = mCallback->OnPause(); > NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onPause() callback"); > } > > + if (!mIndirectAudio){ Add space between ) and { @@ +497,5 @@ > DebugOnly<nsresult> rv = mCallback->OnResume(); > NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onResume() callback"); > } > > + if (!mIndirectAudio){ Same @@ +518,5 @@ > DebugOnly<nsresult> rv = mCallback->OnCancel(); > NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Unable to call onCancel() callback"); > } > > + if (!mIndirectAudio){ Same. ::: content/media/webspeech/synth/sapi/moz.build @@ +6,5 @@ > + > +UNIFIED_SOURCES += [ > +'nsSapiService.cpp', > +'SapiModule.cpp' > +] Indentation. ::: content/media/webspeech/synth/sapi/nsSapiService.cpp @@ +32,5 @@ > + SapiVoice() {} > + ~SapiVoice() {} > + NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SapiVoice) > + WCHAR *mName; > + WCHAR *mLanguage; * goes near type. @@ +64,5 @@ > + nsRefPtr<nsSapiService> mService; > + > + // A windows handle to receive speech events, passed on to WaitForEvents() > + HANDLE hEventsWaitHandle; > + Trailing whitespace @@ +79,5 @@ > +void SapiCallbackRunnable::WaitForEvents(HANDLE hEventsWaitHandle) > +{ > + DWORD dwMilliseconds = INFINITE; > + bool fContinue = true; > + while (fContinue){ space between ) and {. @@ +82,5 @@ > + bool fContinue = true; > + while (fContinue){ > + DWORD dwWaitId = ::MsgWaitForMultipleObjectsEx(1, &hEventsWaitHandle, dwMilliseconds, > + QS_ALLINPUT, MWMO_INPUTAVAILABLE); > + switch (dwWaitId){ Same. @@ +83,5 @@ > + while (fContinue){ > + DWORD dwWaitId = ::MsgWaitForMultipleObjectsEx(1, &hEventsWaitHandle, dwMilliseconds, > + QS_ALLINPUT, MWMO_INPUTAVAILABLE); > + switch (dwWaitId){ > + case WAIT_OBJECT_0:{ Same. @@ +87,5 @@ > + case WAIT_OBJECT_0:{ > + fContinue = false; > + break; > + } > + case WAIT_OBJECT_0 + 1:{ Same. Not sure I like this + in a switch statement. There is no proper constant to use? @@ +89,5 @@ > + break; > + } > + case WAIT_OBJECT_0 + 1:{ > + MSG Msg; > + while (::PeekMessage(&Msg, NULL, 0, 0, PM_REMOVE)){ Same. @@ +95,5 @@ > + ::DispatchMessage(&Msg); > + } > + break; > + } > + case WAIT_TIMEOUT:{ Same. @@ +99,5 @@ > + case WAIT_TIMEOUT:{ > + fContinue = false; > + break; > + } > + default:{ Same. I think we don't do curly brackets if there are no scoped variables in the switch statement. @@ +110,5 @@ > +} > + > +NS_IMETHODIMP SapiCallbackRunnable::Run() > +{ > + if (!IsCurrentTask()){ Same. @@ +117,5 @@ > + } > + > + nsISpeechTask* task = mTask; > + task->Setup(this, 1, 16000, 2); //the last three parameters doesn't matter > + //for an indirect service Either align these comments or just make it one above the Setup() line. @@ +128,5 @@ > + > +NS_IMETHODIMP > +SapiCallbackRunnable::OnPause() > +{ > + if (S_OK != mService->mSapiClient->Pause()){ Same. @@ +160,5 @@ > + mService->mSapiClient->Speak(L"", SPF_PURGEBEFORESPEAK | SPF_ASYNC, NULL); > + > + mService->mCurrentTask = nullptr; > + mTask->DispatchEnd(GetTickCount() - starting_time, mService->currentIndex); > + if (mService->isPaused){ Same. You get the picture, fix this everywhere :) I'll stop annotating these. @@ +263,5 @@ > + > + mSapiClient->SetInterest(event_mask, event_mask); > + > + /*set the callback function for receiving the events*/ > + mSapiClient->SetNotifyCallbackFunction((SPNOTIFYCALLBACK *)SpeechEventCallback, 0, 0); No space in cast (SPNOTIFYCALLBACK*) @@ +285,5 @@ > + unsigned long voice_count, i = 0; > + voice_tokens->GetCount(&voice_count); > + > + while(i < voice_count){ > + SapiVoice *voice = new SapiVoice(); * goes near type. @@ +291,5 @@ > + if (S_OK != voice_tokens->Next(1, &voice_token, NULL)){ > + return; > + } > + > + WCHAR *description = nullptr; Same. @@ +304,5 @@ > + NS_WARNING("unable to get voice attributes from sapi"); > + return; > + } > + > + WCHAR *language = nullptr; Same. @@ +378,5 @@ > +{ > + MOZ_ASSERT(NS_IsMainThread()); > + if (XRE_GetProcessType() != GeckoProcessType_Default) { > + MOZ_ASSERT(false, "nsSapiService can only be started on main gecko process"); > + return nullptr; indentation.

Jim Mathies [:jimm]

Comment 21

•

10 years ago

+void SapiCallbackRunnable::WaitForEvents(HANDLE hEventsWaitHandle) +{ + DWORD dwMilliseconds = INFINITE; + bool fContinue = true; + while (fContinue){ + DWORD dwWaitId = ::MsgWaitForMultipleObjectsEx(1, &hEventsWaitHandle, dwMilliseconds, + QS_ALLINPUT, MWMO_INPUTAVAILABLE); + switch (dwWaitId){ + case WAIT_OBJECT_0:{ + fContinue = false; + break; + } + case WAIT_OBJECT_0 + 1:{ + MSG Msg; + while (::PeekMessage(&Msg, NULL, 0, 0, PM_REMOVE)){ + ::TranslateMessage(&Msg); + ::DispatchMessage(&Msg); + } + break; + } + case WAIT_TIMEOUT:{ + fContinue = false; + break; + } + default:{ + fContinue = false; + break; + } + } + } + return; +} Is this getting called on the main thread? (It looks like it since we dispatch the runnable to the main thread.) + callbackRunnable = new SapiCallbackRunnable(final_text, aTask, this); + return NS_DispatchToMainThread(callbackRunnable); In which case this is dangerous. Can we move it to a background thread?

Jim Mathies [:jimm]

Comment 22

•

10 years ago

Comment on attachment 8513423 [details] [diff] [review] Updated patch for the sapi service Review of attachment 8513423 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/media/webspeech/synth/sapi/nsSapiService.cpp @@ +242,5 @@ > +{ > + MOZ_ASSERT(!mInitialized); > + > + /*initialize the COM*/ > + if (FAILED(::CoInitialize(NULL))){ You shouldn't need this, and there's a risk here that if it fails for some reason, you still call couninitialize in the destructor, which would leave our coinit count unbalanced. If you do call this, you should track the success of the call and always match that to the uninitialize. (Actually looking at this more, if this did fail we'd just crash in destructor as it's currently defined, which isn't much better.) @@ +326,5 @@ > + nsDependentString(static_cast<char16_t*>(voice->mName)), > + nsDependentString(static_cast<char16_t*>(voice->mLanguage)), > + true); > + NS_WARN_IF_FALSE(NS_SUCCEEDED(rv), "Failed to add voice"); > + if(i==0){ nit - if (!i) { also please use something like 'idx' or 'index'. @@ +355,5 @@ > + > + original_text = aText.BeginReading(); > + > + /*set the pitch using xml*/ > + std::wstring pitch = std::to_wstring(static_cast<long long>(aPitch * 10 - 10)); I don't think std::wstring is approved for use in mozilla code. Look at the mozilla string classes as a replacement. @@ +388,5 @@ > +} > + > +nsSapiService::~nsSapiService() > +{ > + mSapiClient->Release(); Isn't it possible for mSapiClient to be null here? What versions of Windows support ISpVoice?

Ryan VanderMeulen [:RyanVM]

Comment 23

•

10 years ago

(In reply to Eitan Isaacson [:eeejay] from comment #18) > Ryan, what are the best practices for this? My understanding is that it doesn't come with the Express editions. However, the MSVC 2013 Community Edition (functionally equivalent to Professional but with a free for open source development license) has largely usurped Express AFAICT and definitely does come with it. Given that we've officially dropped support for anything older than MSVC2013, I'd just steer people to the Community Edition and get on with life, personally :) I'm not sure if there's catches to implicitly dropping support for the Express edition that I'm not aware of, though. Nor am I sure who the right person to answer that would be either. Is this a feature that could be disabled via .mozconfig if such a need were to arise? Maybe that would suffice.

Flags: needinfo?(ryanvm)

Biru [:poiru]

Comment 24

•

10 years ago

(In reply to Eitan Isaacson [:eeejay] from comment #18) > First of all: this patch introduces a dependency on atlbase.h. Based on a cursory glance, the only ATL thing the patch uses is CComPtr. The ATL dependency can be dropped by switching to Microsoft::WRL::ComPtr.

Jim Mathies [:jimm]

Comment 25

•

10 years ago

(In reply to Birunthan Mohanathas [:poiru] from comment #24) > (In reply to Eitan Isaacson [:eeejay] from comment #18) > > First of all: this patch introduces a dependency on atlbase.h. > > Based on a cursory glance, the only ATL thing the patch uses is CComPtr. The > ATL dependency can be dropped by switching to Microsoft::WRL::ComPtr. There's no need for this dependency, the uses of CComPtr should simply be removed. COM pointers can be stored in nsRefPtr if need be.

Yash

Reporter

Comment 26

•

10 years ago

(In reply to Jim Mathies [:jimm] from comment #25) > (In reply to Birunthan Mohanathas [:poiru] from comment #24) > > (In reply to Eitan Isaacson [:eeejay] from comment #18) > > > First of all: this patch introduces a dependency on atlbase.h. > > > > Based on a cursory glance, the only ATL thing the patch uses is CComPtr. The > > ATL dependency can be dropped by switching to Microsoft::WRL::ComPtr. > > There's no need for this dependency, the uses of CComPtr should simply be > removed. COM pointers can be stored in nsRefPtr if need be. I think I would need help on that.

Flags: needinfo?(jmathies)

Jim Mathies [:jimm]

Comment 27

•

10 years ago

(In reply to Yash from comment #26) > (In reply to Jim Mathies [:jimm] from comment #25) > > (In reply to Birunthan Mohanathas [:poiru] from comment #24) > > > (In reply to Eitan Isaacson [:eeejay] from comment #18) > > > > First of all: this patch introduces a dependency on atlbase.h. > > > > > > Based on a cursory glance, the only ATL thing the patch uses is CComPtr. The > > > ATL dependency can be dropped by switching to Microsoft::WRL::ComPtr. > > > > There's no need for this dependency, the uses of CComPtr should simply be > > removed. COM pointers can be stored in nsRefPtr if need be. > > I think I would need help on that. We do this throughout the code base, here are some example in hal of using nsrefptr to store a com pointer - http://mxr.mozilla.org/mozilla-central/search?find=%2Fhal%2Fwindows%2F&string=nsRefPtr

Flags: needinfo?(jmathies)

Yash

Reporter

Comment 28

•

10 years ago

Attached patch Updated patch for the service. (obsolete) — Details — Splinter Review

Worked on the feedback provided and also removed the ATL dependency. But, the application is crashing on pause, remove and resume buttons. I tried to run it with the last successful patch, but it's still resulting in a crash. So, I guess the recent changes to remove the atl dependency are not the cause behind the crash. I have not been getting enough time to debug it as it's my last month of graduation and there is so much going on. I don't want to hold up this patch. So, I am removing myself as the assignee. Anyone who is interested can take this up.

Attachment #8513423 - Attachment is obsolete: true

Yash

Reporter

Updated

•

10 years ago

Assignee: yash.girdhar → nobody

Status: ASSIGNED → NEW

kdavis

Updated

•

10 years ago

Whiteboard: [webspeechapi]

Makoto Kato [:m_kato]

Assignee

Updated

•

10 years ago

Depends on: 1160844

Makoto Kato [:m_kato]

Assignee

Comment 29

•

10 years ago

Attached patch Implement Windows SAPI backend (obsolete) — Details — Splinter Review

based on yash's code. At first, we shouldn't use nsRunnable for SAPI implementation. Speak method is async and this COM interface has callback function for notifications such as start and end, so it is unnecessary to use nsRunnable. Now although I discuss with Eitan about current mochitests, some tests has bugs that it doesn't work with any backend even if Pico for B2G.

Assignee: nobody → m_kato

Makoto Kato [:m_kato]

Assignee

Comment 30

•

10 years ago

Attached patch Implement Windows SAPI backend v1.1 — Details — Splinter Review

Attachment #8592161 - Attachment is obsolete: true

Attachment #8602033 - Attachment is obsolete: true

Makoto Kato [:m_kato]

Assignee

Comment 31

•

10 years ago

Comment on attachment 8603164 [details] [diff] [review] Implement Windows SAPI backend v1.1 based no Yash's code.

Attachment #8603164 - Flags: review?(jmathies)

Jim Mathies [:jimm]

Comment 32

•

10 years ago

Comment on attachment 8603164 [details] [diff] [review] Implement Windows SAPI backend v1.1 Review of attachment 8603164 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/media/webspeech/synth/windows/SapiService.cpp @@ +98,5 @@ > + // After cancel, mCurrentIndex may be updated. > + // At cancel case, use mCurrentIndex for DispatchEnd. > + mSpeakTextLen = 0; > + // Purge all the previous utterances and speak an empty string > + if (FAILED(mSapiClient->Speak(nullptr, SPF_PURGEBEFORESPEAK, nullptr))) { any risk of jank here? @@ +347,5 @@ > +{ > + MOZ_ASSERT(NS_IsMainThread()); > + if (XRE_GetProcessType() != GeckoProcessType_Default) { > + MOZ_ASSERT(false, > + "SapiService can only be started on main gecko process"); will this trigger on startup in content processes?

Attachment #8603164 - Flags: review?(jmathies) → review+

Eitan Isaacson [:eeejay]

Comment 33

•

9 years ago

Hey Makoto, Looks like this may be ready to land? Do you have answers about jank/startup?

Flags: needinfo?(m_kato)

David Bolter [:davidb] (NeedInfo me for attention)

Comment 34

•

9 years ago

Vladan, if we land this on nightly and there is jank will that be automagically caught via tooling?

Flags: needinfo?(vdjeric)

Makoto Kato [:m_kato]

Assignee

Comment 35

•

9 years ago

(In reply to Jim Mathies [:jimm] from comment #32) > Comment on attachment 8603164 [details] [diff] [review] > Implement Windows SAPI backend v1.1 > > Review of attachment 8603164 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: dom/media/webspeech/synth/windows/SapiService.cpp > @@ +98,5 @@ > > + // After cancel, mCurrentIndex may be updated. > > + // At cancel case, use mCurrentIndex for DispatchEnd. > > + mSpeakTextLen = 0; > > + // Purge all the previous utterances and speak an empty string > > + if (FAILED(mSapiClient->Speak(nullptr, SPF_PURGEBEFORESPEAK, nullptr))) { > > any risk of jank here? No. all speak queue is reset only > @@ +347,5 @@ > > +{ > > + MOZ_ASSERT(NS_IsMainThread()); > > + if (XRE_GetProcessType() != GeckoProcessType_Default) { > > + MOZ_ASSERT(false, > > + "SapiService can only be started on main gecko process"); > > will this trigger on startup in content processes? This implementation is XPCOM. So it is possible that addon creates instance.

Makoto Kato [:m_kato]

Assignee

Comment 36

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=494b4060608a

Makoto Kato [:m_kato]

Assignee

Comment 37

•

9 years ago

(In reply to Makoto Kato (:m_kato) from comment #35) > > will this trigger on startup in content processes? > > This implementation is XPCOM. So it is possible that addon creates instance. But I think that we can remove this. I will file new bug if needed

Vladan Djeric (:vladan)

Comment 38

•

9 years ago

(In reply to David Bolter [:davidb] from comment #34) > Vladan, if we land this on nightly and there is jank will that be > automagically caught via tooling? If there is a regression in "hot" startup time (startup time without disk I/O cost), it will be flagged. The jank from actually using the API will not be caught because existing Talos tests don't exercise text-to-speech functionality.

Flags: needinfo?(vdjeric)

Makoto Kato [:m_kato]

Assignee

Comment 39

•

9 years ago

We turn off webspeech prefs for Firefox desktop now. And mochitest always uses Mock implementation.

Flags: needinfo?(m_kato)

Pulsebot

Comment 40

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/cd54ba9f7d53

Ryan VanderMeulen [:RyanVM]

Comment 41

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/cd54ba9f7d53

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox42: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla42

Xidorn Quan [:xidorn] UTC+10

Comment 42

•

9 years ago

It seems this patch breaks compiling with VS2013 Express. It reports error: > C:\Program Files (x86)\Windows Kits\8.1\include\um\sphelper.h(51) : fatal error C1083: > Cannot open include file: 'atlbase.h': No such file or directory when compiling Unified_cpp_synth_windows0.cpp. And it seems ATL is part of Visual Studio except Express Edition [1]. Well, I know we have official deprecated Express Edition... (And I eventually have to upgrade MozillaBuild and try VS2015 now) [1] http://stackoverflow.com/a/16404536

Nicholas Nethercote [inactive]

Comment 43

•

9 years ago

> It seems this patch breaks compiling with VS2013 Express. I just hit this as well.

(Away)

Comment 44

•

9 years ago

Maybe we could detect lack of ATL during configure, and disable this code?

Makoto Kato [:m_kato]

Assignee

Comment 45

•

9 years ago

Why does Windows SDK use ATL? :-< But I think that I can replace SpEnumToken with SAPI COM API, then I can remove sphelper.h.

Makoto Kato [:m_kato]

Assignee

Updated

•

9 years ago

Blocks: 1187155

Makoto Kato [:m_kato]

Assignee

Comment 46

•

9 years ago

patch is coming at bug 1187155 soon.

Sebastian Zartner [:sebo]

Updated

•

9 years ago

Keywords: dev-doc-needed

Sebastian Zartner [:sebo]

Updated

•

9 years ago

Depends on: 1221443

[:philipp]

Updated

•

9 years ago

Depends on: 1228134

Chris Mills [:cmills]

Comment 47

•

9 years ago

This has now been documented — see all the goodness at https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API! A technical review would be really nice, although I appreciate that it is a fairly big API, so we might want to share the work between the different engineers involved.

Keywords: dev-doc-needed → dev-doc-complete

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

Depends on: 1230533

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: DOM → DOM: Core & HTML

[Work in Progress] A very basic version of the service, using Microsoft Speech API(SAPI) 10 years ago Yash 19.30 KB, patch		Details \| Diff \| Splinter Review
Added some more functionalities to the service 10 years ago Yash 36.38 KB, patch		Details \| Diff \| Splinter Review
Added some more functionalities to the service 10 years ago Yash 34.78 KB, patch	yash.girdhar : feedback+	Details \| Diff \| Splinter Review
Added some more functionalities to the service 10 years ago Yash 35.38 KB, patch	yash.girdhar : feedback+	Details \| Diff \| Splinter Review
Added some more functionalities to the service 10 years ago Yash 35.34 KB, patch	yash.girdhar : feedback+	Details \| Diff \| Splinter Review
Added more functionalities to the service 10 years ago Yash 33.25 KB, patch	yash.girdhar : feedback+	Details \| Diff \| Splinter Review
Added more functionalities to the service 10 years ago Yash 33.13 KB, patch	yash.girdhar : feedback+	Details \| Diff \| Splinter Review
Added more functionalities to the service 10 years ago Yash 33.13 KB, patch		Details \| Diff \| Splinter Review
First version of the service 10 years ago Yash 30.88 KB, patch	eeejay : feedback+	Details \| Diff \| Splinter Review
Updated patch for the sapi service 10 years ago Yash 31.67 KB, patch		Details \| Diff \| Splinter Review
Mochitest Errors on my system 10 years ago Yash 19.89 KB, text/plain		Details
Updated patch for the service. 10 years ago Yash 29.99 KB, patch		Details \| Diff \| Splinter Review
Implement Windows SAPI backend 10 years ago Makoto Kato [:m_kato] 14.94 KB, patch		Details \| Diff \| Splinter Review
Implement Windows SAPI backend v1.1 10 years ago Makoto Kato [:m_kato] 15.24 KB, patch	jimm : review+	Details \| Diff \| Splinter Review