Prep for Bug 1178326: Make speech recognition services language dependent and remove assumption of a single service

RESOLVED FIXED in Firefox 42

Status

Core
Web Speech
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: kdavis, Assigned: kdavis)

Tracking

(Blocks: 1 bug)

unspecified
mozilla42
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox42 fixed)

Details

(Whiteboard: [webspeechapi][vaani][systemsfe])

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

3 years ago
Currently the WebSpeech API's code makes several assumptions that will be
false once we allow for dynamically changing speech recognition languages.

The core assumption the code makes is that there is a single global SpeechRecognition engine that has the "correct" language set.

As a result of this assumption it reaches several conclusions that will not
be true once we allow for dynamically changing speech recognition languages:

1. On each call to SpeechGrammarList::addFromString() it is possible to
   validate the passed grammar using the global SpeechRecognition engine.
   This is false once multiple engines and languages are allowed as each
   engine may have a different notion of "valid grammar".
2. The global SpeechRecognition engine is language independent. This is
   false as the global engine works only for a single language. In the future
   a global engine will exist for each language.
3. SpeechRecognition::grammars is set implicitly once one creates a
   SpeechGrammarList instance, as there is only one global SpeechRecognition.
   Once there can be multiple SpeechRecognition instances present, one
   needs to explicitly assign SpeechRecognition::grammars to a given
   SpeechGrammarList instance.
4. SpeechRecognition::lang need never be set, as there is only a single
   language, en-US. Once multiple languages are allowed, the value of
   this variable will determine the SpeechRecognition instance used.
(Assignee)

Updated

3 years ago
Summary: Prep for dynamically changing speech recognition languages: Make speech recognition services language dependent and remove assumption of a single service → Prep for Bug 1178326: Make speech recognition services language dependent and remove assumption of a single service
(Assignee)

Updated

3 years ago
Assignee: nobody → kdavis
Blocks: 1178326
(Assignee)

Updated

3 years ago
Whiteboard: [webspeechapi][vaani][systemsfe]
(Assignee)

Updated

3 years ago
Blocks: 1185234
(Assignee)

Comment 1

3 years ago
Bug 1185234 is blocked only because both bugs edit the same file
and this patch must be applied first.
(Assignee)

Comment 2

3 years ago
Created attachment 8636502 [details] [diff] [review]
Part 1 of 1: Made speech recognition services language dependent and removed assumption of a single service

Part 1 of 1: Made speech recognition services language dependent and removed
assumption of a single service

REZA: This will break your code!!! Please take a look at how this changes things
in the examples below. This break, unfortunately, is required to properly
support multiple languages.

In preparation for Bug 1178326 the current WebSpeech API needs to be modified.
The core modification of this patch consists of removing the assumption that
there exists a single speech recognition engine. After this patch the assumption
is that there exists a single speech recognition engine *per language*.

The patch consists of various changes related to removing this assumption

1. The name of the default speech recognition service isn't "pocketsphinx" but
"pocketsphinx-en-US". This is in preparation for the support of multiple
languages where each language has an appropriately named service.

2. SpeechGrammarList::AddFromString no longer validates and sets the grammar
on *the* speech recognition engine as there may be multiple engines present each
with different notions of "valid" grammar.

3. The SpeechRecognition::grammars member must be explicitly assigned to as
there may be multiple instances of SpeechRecognition present each with a
different language, and the assignment can no longer happen implicitly.

4. The member SpeechRecognition::lang must be explicitly set or the lang is
derived from the lang attribute of the enclosing HTML root element. If
SpeechRecognition::lang is not set and the lang attribute of the enclosing HTML
root element is not set, an error will occur on calls to
SpeechRecognition::start().

5. Validation of a SpeechGrammarList only occurs upon calls to
SpeechRecognition::start() as this is the only time at which all of the required
elements (language and SpeechGrammarList) can be assumed to be correctly
set by the user. Anytime before a call to SpeechRecognition::start(), the user
may be in the process of assigning to the SpeechRecognition attributes.


Practically these items have the effect that old code of the form

var sr = new SpeechRecognition();
var sgl = new SpeechGrammarList();
sgl.addFromString(
"#JSGF V1.0; grammar test; public <simple> =   how to recognize speech |" +
                                              "how to wreck a nice beach ;", 1);
sr.start();

is no longer valid as the SpeechRecognition has no associated SpeechGrammarList
or language, assuming that enclosing HTML root has no lang set.

If the enclosing HTML root has no lang set

<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>

one must explicitly set SpeechRecognition::grammars and SpeechRecognition::lang

var sr = new SpeechRecognition();
sr.lang = "en-US";
var sgl = new SpeechGrammarList();
sgl.addFromString(
"#JSGF V1.0; grammar test; public <simple> =   how to recognize speech |" +
                                              "how to wreck a nice beach ;", 1);
sr.grammars = sgl;
sr.start(); // Validation of sr.grammars occurs here

If the enclosing HTML root has a lang set

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
...
</html>

one must only explicitly set SpeechRecognition::grammars

var sr = new SpeechRecognition();
var sgl = new SpeechGrammarList();
sgl.addFromString(
"#JSGF V1.0; grammar test; public <simple> =   how to recognize speech |" +
                                              "how to wreck a nice beach ;", 1);
sr.grammars = sgl;
sr.start(); // Validation of sr.grammars occurs here

as the language will be automatically picked up from the HTML root.
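The fallback described above can be modelled with a small helper; this is an illustrative sketch only (effectiveLang is a hypothetical name, not a Gecko or Web Speech API function):

```javascript
// Hypothetical helper modelling the language fallback described above:
// use SpeechRecognition.lang when set, otherwise the lang attribute of
// the enclosing HTML root element, otherwise no language at all (in
// which case start() will error, per the patch description).
function effectiveLang(srLang, htmlRootLang) {
  if (srLang) {
    return srLang;       // an explicit sr.lang always wins
  }
  if (htmlRootLang) {
    return htmlRootLang; // picked up from <html lang="...">
  }
  return null;           // neither set: start() reports an error
}
```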


Addendum:
In making this patch the behaviour of SpeechRecognition::start() had to change
with respect to having mRecognitionService null. 

Previously, as there was a single service, if mRecognitionService was null this
was an error in gecko and a warning was issued but SpeechRecognition::start()
didn’t throw.

Now, as the user's language choice causes mRecognitionService to be set to point
to the appropriate service, if it's null because the user chose an unsupported
language, start() throws to tell the user they messed up.

This change unfortunately exposed a problem in test_audio_capture_error.html
which is also fixed in this patch. (On non-B2G platforms mRecognitionService
was always null as the fake backend was never used and on non-B2G platforms
pocketsphinx was also not there. This caused the new code to throw. Now the
fake backend is used everywhere and it doesn’t throw.)
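Given the new throwing behaviour of start(), page code may want to guard the call. The following is a hedged sketch using a toy stand-in class; ModelSpeechRecognition and SUPPORTED are illustrative names, not real Gecko or Web Speech API identifiers:

```javascript
// Toy model of the new start() behaviour: when no recognition service
// exists for the chosen language, start() throws instead of merely
// logging a warning as before.
const SUPPORTED = new Set(["en-US"]); // assumed set of installed services

class ModelSpeechRecognition {
  constructor() {
    this.lang = "";
    this.grammars = null;
  }
  start() {
    // Mirrors the addendum: a null mRecognitionService (unsupported or
    // unset language) now surfaces as a thrown error.
    if (!SUPPORTED.has(this.lang)) {
      throw new Error("no recognition service for lang: " + this.lang);
    }
    return "started";
  }
}
```

Callers can then wrap start() in try/catch to detect an unsupported language early.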
Attachment #8636502 - Flags: review?(bugs)
Attachment #8636502 - Flags: review?(anatal)
(Assignee)

Comment 3

3 years ago
Almost forgot, the try is running here https://treeherder.mozilla.org/#/jobs?repo=try&revision=59b406dd1ea1
(Assignee)

Comment 4

3 years ago
Any progress on reviews for this bug?

Comment 5

3 years ago
review is coming. Sorry, was on vacation and my review load is what it is.
(Assignee)

Comment 6

3 years ago
(In reply to Olli Pettay [:smaug] from comment #5)
> review is coming. Sorry, was on vacation and my review load is what it is.

No problem.

As there are two reviewers and no review, I was just a little worried that
it was, for whatever reason, lost and not queued anywhere.

Comment 7

3 years ago
Comment on attachment 8636502 [details] [diff] [review]
Part 1 of 1: Made speech recognition services language dependent and removed assumption of a single service

>+GetSpeechRecognitionService(const nsAString& aLang)
> {
>   nsAutoCString speechRecognitionServiceCID;
> 
>@@ -70,19 +74,23 @@ GetSpeechRecognitionService()
>   Preferences::GetCString(PREFERENCE_DEFAULT_RECOGNITION_SERVICE);
>   nsAutoCString speechRecognitionService;
> 
>-  if (!prefValue.get() || prefValue.IsEmpty()) {
>-    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
>-  } else {
>+  if (!prefValue.IsEmpty()) {
>     speechRecognitionService = prefValue;
>+  } else if (!aLang.IsEmpty()) {
>+    speechRecognitionService =
>+      NS_LITERAL_CSTRING(DEFAULT_RECOGNITION_SERVICE_PREFIX) +
>+      NS_ConvertUTF16toUTF8(aLang);
>+  } else {
>+    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
>   }

I would first check if (!aLang.IsEmpty()) ...
then the pref based value, and then DEFAULT_RECOGNITION_SERVICE.
Otherwise we always pick en-US on b2g, since the patch sets the
pocketsphinx-en-US pref.
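The ordering suggested here can be sketched outside of Gecko as follows; pickService and the constant values are illustrative assumptions, not the actual C++:

```javascript
// Sketch of the suggested selection order: language-derived service name
// first, then the pref value, then the hard-coded default.
const DEFAULT_RECOGNITION_SERVICE_PREFIX = "pocketsphinx-"; // assumed
const DEFAULT_RECOGNITION_SERVICE = "pocketsphinx-en-US";   // assumed

function pickService(lang, prefValue) {
  if (lang) {
    return DEFAULT_RECOGNITION_SERVICE_PREFIX + lang;
  }
  if (prefValue) {
    return prefValue;
  }
  return DEFAULT_RECOGNITION_SERVICE;
}
```

With this order a b2g pref of "pocketsphinx-en-US" no longer shadows a page that asked for another language.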




> 
>-  if (!SpeechRecognition::mTestConfig.mFakeRecognitionService){
>+  if (SpeechRecognition::mTestConfig.mFakeRecognitionService){

while you're here, want to add the missing space between ) and {

> SpeechRecognition::GetGrammars(ErrorResult& aRv) const
> {
>-  aRv.Throw(NS_ERROR_NOT_IMPLEMENTED);
>-  return nullptr;
>+  nsRefPtr<SpeechGrammarList> speechGrammarList = mSpeechGrammarList;
>+  return speechGrammarList.forget();
> }
Per WebIDL .grammars is never null, but nothing guarantees mSpeechGrammarList isn't null.
So, this would crash if you access speechRecognition.grammars before anyone has set any grammars.
The spec is vague here, but could we do something like
if (!mSpeechGrammarList) {
  mSpeechGrammarList = new SpeechGrammarList(GetParentObject());
}
nsRefPtr<SpeechGrammarList> speechGrammarList = mSpeechGrammarList;
return speechGrammarList.forget();
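The lazy-initialization pattern suggested above has a direct JS analogue; a sketch, with ModelRecognition as a hypothetical stand-in for the C++ class:

```javascript
// JS analogue of the suggested C++ getter: create an empty grammar list
// on first access so .grammars never returns null, and cache it so
// repeated reads return the same instance.
class ModelRecognition {
  #grammarList = null;
  get grammars() {
    if (this.#grammarList === null) {
      this.#grammarList = []; // stand-in for new SpeechGrammarList(...)
    }
    return this.#grammarList;
  }
  set grammars(list) {
    this.#grammarList = list;
  }
}
```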


> SpeechRecognition::SetGrammars(SpeechGrammarList& aArg, ErrorResult& aRv)
> {
>-  aRv.Throw(NS_ERROR_NOT_IMPLEMENTED);
>-  return;
>+  mSpeechGrammarList = &aArg;
> }
And since neither the getter nor the setter can throw anymore, you'll want to remove [Throws] from the .webidl and then remove the ErrorResult param.



> 
> void
> SpeechRecognition::GetLang(nsString& aRetVal, ErrorResult& aRv) const
> {
>-  aRv.Throw(NS_ERROR_NOT_IMPLEMENTED);
>+  aRetVal = mLang;
>   return;
> }
> 
> void
> SpeechRecognition::SetLang(const nsAString& aArg, ErrorResult& aRv)
> {
>-  aRv.Throw(NS_ERROR_NOT_IMPLEMENTED);
>+  mLang = aArg;
>   return;
> }
ditto here about [Throws] and ErrorResult
>+bool SpeechRecognition::SetRecognitionService(ErrorResult& aRv) {
Nit,
bool
SpeechRecognition::SetRecognitionService(ErrorResult& aRv)
{


>+bool SpeechRecognition::ValidateAndSetGrammarList(ErrorResult& aRv) {
ditto



>+  if (nullptr == mSpeechGrammarList) {
just 
if (!mSpeechGrammarList)

>+  for (uint32_t count = 0; count < grammarListLength; ++count) {
>+    nsRefPtr<SpeechGrammar> speechGrammar = mSpeechGrammarList->Item(count,aRv);
space after ,


>+    if (NS_OK != mRecognitionService->ValidateAndSetGrammarList(speechGrammar.get(), nullptr)) {
if (NS_FAILED(mRecognitionService->ValidateAndSetGrammarList(speechGrammar.get(), nullptr)))

>@@ -249,6 +250,10 @@ private:
>   nsCOMPtr<nsITimer> mSpeechDetectionTimer;
>   bool mAborted;
> 
>+  nsString mLang;
>+
>+  nsRefPtr<SpeechGrammarList> mSpeechGrammarList;
We need to cycle collect this one.
Which means... interesting, SpeechRecognition doesn't add anything to cycle collection yet.
Looks like mDOMStream member variable is also cycle collectable and should have been traversed/unlinked.
So (this is perhaps new to you, if you haven't dealt with cycle collection in other patches yet.)
You add
  NS_DECL_CYCLE_COLLECTION_CLASS_INHERITED(SpeechRecognition, DOMEventTargetHelper)
right after NS_DECL_ISUPPORTS_INHERITED in SpeechRecognition.h

Then in .cpp
NS_IMPL_CYCLE_COLLECTION_INHERITED(SpeechRecognition, DOMEventTargetHelper, mDOMStream, mSpeechGrammarList)

That will add those two member variables to cycle collection, so that they get traversed and unlinked properly.

You need to also change
NS_INTERFACE_MAP_BEGIN(SpeechRecognition)
to
NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION_INHERITED(SpeechRecognition)


If you're curious about cycle collection
http://blog.kylehuey.com/post/27564411715/cycle-collection is a good blog post, and
http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsCycleCollector.cpp#8 has some more technical information.
Attachment #8636502 - Flags: review?(bugs)
Attachment #8636502 - Flags: review?(anatal)
Attachment #8636502 - Flags: review-
(Assignee)

Comment 8

3 years ago
(In reply to Olli Pettay [:smaug] from comment #7)
> Comment on attachment 8636502 [details] [diff] [review]
> Part 1 of 1: Made speech recognition services language dependent and removed
> assumption of a single service
> 
> >+GetSpeechRecognitionService(const nsAString& aLang)
> > {
> >   nsAutoCString speechRecognitionServiceCID;
> > 
> >@@ -70,19 +74,23 @@ GetSpeechRecognitionService()
> >   Preferences::GetCString(PREFERENCE_DEFAULT_RECOGNITION_SERVICE);
> >   nsAutoCString speechRecognitionService;
> > 
> >-  if (!prefValue.get() || prefValue.IsEmpty()) {
> >-    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
> >-  } else {
> >+  if (!prefValue.IsEmpty()) {
> >     speechRecognitionService = prefValue;
> >+  } else if (!aLang.IsEmpty()) {
> >+    speechRecognitionService =
> >+      NS_LITERAL_CSTRING(DEFAULT_RECOGNITION_SERVICE_PREFIX) +
> >+      NS_ConvertUTF16toUTF8(aLang);
> >+  } else {
> >+    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
> >   }
> 
> I would first check if (!aLang.IsEmpty()) ...
> then the pref based value, and then DEFAULT_RECOGNITION_SERVICE.
> Otherwise we always pick en-US on b2g, since the patch sets 
> pocketsphinx-en-US pref.
> 


Right, don't know how I missed that.


> > 
> >-  if (!SpeechRecognition::mTestConfig.mFakeRecognitionService){
> >+  if (SpeechRecognition::mTestConfig.mFakeRecognitionService){
> 
> while you're here, want to add the missing space between ) and {

OK
 
> > SpeechRecognition::GetGrammars(ErrorResult& aRv) const
> > {
> >-  aRv.Throw(NS_ERROR_NOT_IMPLEMENTED);
> >-  return nullptr;
> >+  nsRefPtr<SpeechGrammarList> speechGrammarList = mSpeechGrammarList;
> >+  return speechGrammarList.forget();
> > }
> Per WebIDL .grammars is never null, but nothing guarantees
> mSpeechGrammarList isn't null.
> So, this would crash if you access speechRecognition.grammars before anyone
> has set any grammars.

Just so I learn, how would this cause a crash? Is this a problem with
nsRefPtr not allowing nullptr? Or is it a problem with the WebIDL machinery?

If mSpeechGrammarList is nullptr, shouldn't it cause an "undefined", or at
worst "null", JavaScript value to be returned if someone asked for it?

> The spec is vague here, but could we do something like
> if (!mSpeechGrammarList) {
>   mSpeechGrammarList = new SpeechGrammarList(GetParentObject());
> }
> nsRefPtr<SpeechGrammarList> speechGrammarList = mSpeechGrammarList;
> return speechGrammarList.forget();

In light of the above having to explicitly new up a SpeechGrammarList if
mSpeechGrammarList is nullptr seems like the wrong thing to do, as it does
not reflect the fact that the member SpeechRecognition.grammars has been
declared but not assigned to and thus should be "undefined" when
accessing it from JavaScript.

> 
> > SpeechRecognition::SetGrammars(SpeechGrammarList& aArg, ErrorResult& aRv)
> > {
> >-  aRv.Throw(NS_ERROR_NOT_IMPLEMENTED);
> >-  return;
> >+  mSpeechGrammarList = &aArg;
> > }
> And since the getter or setter can't throw anymore, want to remove [Throws]
> from the .webidl and then remove ErrorResult param.
> 

Ok.

> > 
> > void
> > SpeechRecognition::GetLang(nsString& aRetVal, ErrorResult& aRv) const
> > {
> >-  aRv.Throw(NS_ERROR_NOT_IMPLEMENTED);
> >+  aRetVal = mLang;
> >   return;
> > }
> > 
> > void
> > SpeechRecognition::SetLang(const nsAString& aArg, ErrorResult& aRv)
> > {
> >-  aRv.Throw(NS_ERROR_NOT_IMPLEMENTED);
> >+  mLang = aArg;
> >   return;
> > }
> ditto here about [Throws] and ErrorResult

OK.

> >+bool SpeechRecognition::SetRecognitionService(ErrorResult& aRv) {
> Nit,
> bool
> SpeechRecognition::SetRecognitionService(ErrorResult& aRv)
> {
> 

OK

> >+bool SpeechRecognition::ValidateAndSetGrammarList(ErrorResult& aRv) {
> ditto

OK 

> 
> >+  if (nullptr == mSpeechGrammarList) {
> just 
> if (!mSpeechGrammarList)

OK
 
> >+  for (uint32_t count = 0; count < grammarListLength; ++count) {
> >+    nsRefPtr<SpeechGrammar> speechGrammar = mSpeechGrammarList->Item(count,aRv);
> space after ,

OK

> >+    if (NS_OK != mRecognitionService->ValidateAndSetGrammarList(speechGrammar.get(), nullptr)) {
> if
> (NS_FAILED(mRecognitionService->ValidateAndSetGrammarList(speechGrammar.
> get(), nullptr)))

OK
 
> >@@ -249,6 +250,10 @@ private:
> >   nsCOMPtr<nsITimer> mSpeechDetectionTimer;
> >   bool mAborted;
> > 
> >+  nsString mLang;
> >+
> >+  nsRefPtr<SpeechGrammarList> mSpeechGrammarList;
> We need to cycle collect this one.
> Which means... interesting, SpeechRequest doesn't add anything to cycle
> collection yet.
> Looks like mDOMStream member variable is also cycle collectable and should
> have been traversed/unlinked.
> So (this is perhaps new to you, if you haven't dealt with cycle collection
> in other patches yet.)
> You add
>   NS_DECL_CYCLE_COLLECTION_CLASS_INHERITED(SpeechRecognition,
> DOMEventTargetHelper)
> right after NS_DECL_ISUPPORTS_INHERITED in SpeechRecognition.h
> 
> Then in .cpp
> NS_IMPL_CYCLE_COLLECTION_INHERITED(SpeechRecognition, DOMEventTargetHelper,
> mDOMStream, mSpeechGrammarList)
> 
> That will add those two member variables to cycle collection, so that they
> get traversed and unlinked properly.

OK
 
> You need to also change
> NS_INTERFACE_MAP_BEGIN(SpeechRecognition)
> to
> NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION_INHERITED(SpeechRecognition)

OK 

> If you're curious about cycle collection
> http://blog.kylehuey.com/post/27564411715/cycle-collection is a good blog
> post, and
> http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsCycleCollector.
> cpp#8 has some more technical information.

I'll take a look
(Assignee)

Comment 9

3 years ago
(In reply to Olli Pettay [:smaug] from comment #7)
> Comment on attachment 8636502 [details] [diff] [review]
> Part 1 of 1: Made speech recognition services language dependent and removed
> assumption of a single service
> 
> > SpeechRecognition::GetGrammars(ErrorResult& aRv) const
> > {
> >-  aRv.Throw(NS_ERROR_NOT_IMPLEMENTED);
> >-  return nullptr;
> >+  nsRefPtr<SpeechGrammarList> speechGrammarList = mSpeechGrammarList;
> >+  return speechGrammarList.forget();
> > }
> Per WebIDL .grammars is never null, but nothing guarantees
> mSpeechGrammarList isn't null.
> So, this would crash if you access speechRecognition.grammars before anyone
> has set any grammars.
> The spec is vague here, but could we do something like
> if (!mSpeechGrammarList) {
>   mSpeechGrammarList = new SpeechGrammarList(GetParentObject());
> }
> nsRefPtr<SpeechGrammarList> speechGrammarList = mSpeechGrammarList;
> return speechGrammarList.forget();
> 

Confirmed crash, but returning an empty SpeechGrammarList also seems incorrect
as it does not reflect the fact that the member SpeechRecognition.grammars has
been declared but not assigned to and thus should be "undefined" when accessing
it from JavaScript.

Is there any way I can check if mSpeechGrammarList is nullptr and then return
some special C++ value that will be translated into "undefined" in the JS
layer? Or, at worst, a "null" JS value?
(Assignee)

Comment 10

3 years ago
(In reply to kdavis from comment #8)
> (In reply to Olli Pettay [:smaug] from comment #7)
> > Comment on attachment 8636502 [details] [diff] [review]
> > Part 1 of 1: Made speech recognition services language dependent and removed
> > assumption of a single service
> > 
> > >+GetSpeechRecognitionService(const nsAString& aLang)
> > > {
> > >   nsAutoCString speechRecognitionServiceCID;
> > > 
> > >@@ -70,19 +74,23 @@ GetSpeechRecognitionService()
> > >   Preferences::GetCString(PREFERENCE_DEFAULT_RECOGNITION_SERVICE);
> > >   nsAutoCString speechRecognitionService;
> > > 
> > >-  if (!prefValue.get() || prefValue.IsEmpty()) {
> > >-    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
> > >-  } else {
> > >+  if (!prefValue.IsEmpty()) {
> > >     speechRecognitionService = prefValue;
> > >+  } else if (!aLang.IsEmpty()) {
> > >+    speechRecognitionService =
> > >+      NS_LITERAL_CSTRING(DEFAULT_RECOGNITION_SERVICE_PREFIX) +
> > >+      NS_ConvertUTF16toUTF8(aLang);
> > >+  } else {
> > >+    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
> > >   }
> > 
> > I would first check if (!aLang.IsEmpty()) ...
> > then the pref based value, and then DEFAULT_RECOGNITION_SERVICE.
> > Otherwise we always pick en-US on b2g, since the patch sets 
> > pocketsphinx-en-US pref.
> > 
> 
> 
> Right, don't know how I missed that.
> 

Now I see how I missed it. I was testing on desktop where the pref is not set.
(Assignee)

Comment 11

3 years ago
(In reply to kdavis from comment #10)
> (In reply to kdavis from comment #8)
> > (In reply to Olli Pettay [:smaug] from comment #7)
> > > Comment on attachment 8636502 [details] [diff] [review]
> > > Part 1 of 1: Made speech recognition services language dependent and removed
> > > assumption of a single service
> > > 
> > > >+GetSpeechRecognitionService(const nsAString& aLang)
> > > > {
> > > >   nsAutoCString speechRecognitionServiceCID;
> > > > 
> > > >@@ -70,19 +74,23 @@ GetSpeechRecognitionService()
> > > >   Preferences::GetCString(PREFERENCE_DEFAULT_RECOGNITION_SERVICE);
> > > >   nsAutoCString speechRecognitionService;
> > > > 
> > > >-  if (!prefValue.get() || prefValue.IsEmpty()) {
> > > >-    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
> > > >-  } else {
> > > >+  if (!prefValue.IsEmpty()) {
> > > >     speechRecognitionService = prefValue;
> > > >+  } else if (!aLang.IsEmpty()) {
> > > >+    speechRecognitionService =
> > > >+      NS_LITERAL_CSTRING(DEFAULT_RECOGNITION_SERVICE_PREFIX) +
> > > >+      NS_ConvertUTF16toUTF8(aLang);
> > > >+  } else {
> > > >+    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
> > > >   }
> > > 
> > > I would first check if (!aLang.IsEmpty()) ...
> > > then the pref based value, and then DEFAULT_RECOGNITION_SERVICE.
> > > Otherwise we always pick en-US on b2g, since the patch sets 
> > > pocketsphinx-en-US pref.
> > > 
> > 
> > 
> > Right, don't know how I missed that.
> > 
> 
> Now I see how I missed it. I was testing on desktop where the pref is not set.

Now that I think about it more I am wondering if it wouldn't be best to remove the default
value of the pref PREFERENCE_DEFAULT_RECOGNITION_SERVICE from the B2G build and leave the
code as is. Then...

1. When testing, setting PREFERENCE_DEFAULT_RECOGNITION_SERVICE could override everything.
2. If SpeechRecognition.lang is set in JS, it is used.
3. If SpeechRecognition.lang is not set in JS but the HTML lang attribute is set, it is used.
4. If none of the above are set, the default DEFAULT_RECOGNITION_SERVICE is used.

To me this seems more natural. You have the ability to force things with the preference value,
the ability to work within the normal functionality setting SpeechRecognition.lang and/or the
HTML lang attribute, and also, if all else fails, you default to DEFAULT_RECOGNITION_SERVICE.
(Assignee)

Comment 12

3 years ago
Created attachment 8641500 [details] [diff] [review]
Part 1 of 1: Made speech recognition services language dependent and removed assumption of a single service.

Part 1 of 1: Made speech recognition services language dependent and removed assumption of a single service.

Part 1 of 1 of this bug.

This patch obsoletes the previous patch and makes several changes
relative to the previous patch:

1. It removes the default value of the B2G preference
   PREFERENCE_DEFAULT_RECOGNITION_SERVICE in accord
   with Comment 11
2. Fixes formatting in accord with Comment 7
3. Adds cycle collection code in accord with Comment 7
4. Removes throw specifier from SpeechRecognition.grammars
   in accord with Comment 7
5. Removes throw specifier from SpeechRecognition.lang
   in accord with Comment 7
6. Makes SpeechRecognition.grammars nullable to overcome
   the problems of Comment 7 and Comment 9

The try for this patch is here https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa797f44a12a

The full interdiff between the previous patch and this
patch is as follows:

diff -u b/b2g/app/b2g.js b/b2g/app/b2g.js
--- b/b2g/app/b2g.js
+++ b/b2g/app/b2g.js
@@ -1016,7 +1016,6 @@
 
 // Enable Web Speech recognition API
 pref("media.webspeech.recognition.enable", true);
-pref("media.webspeech.service.default", "pocketsphinx-en-US");
 
 // Downloads API
 pref("dom.mozDownloads.enabled", true);
diff -u b/dom/media/webspeech/recognition/SpeechRecognition.cpp b/dom/media/webspeech/recognition/SpeechRecognition.cpp
--- b/dom/media/webspeech/recognition/SpeechRecognition.cpp
+++ b/dom/media/webspeech/recognition/SpeechRecognition.cpp
@@ -84,7 +84,7 @@
     speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
   }
 
-  if (SpeechRecognition::mTestConfig.mFakeRecognitionService){
+  if (SpeechRecognition::mTestConfig.mFakeRecognitionService) {
     speechRecognitionServiceCID =
       NS_SPEECH_RECOGNITION_SERVICE_CONTRACTID_PREFIX "fake";
   } else {
@@ -99,7 +99,9 @@
   return recognitionService.forget();
 }
 
-NS_INTERFACE_MAP_BEGIN(SpeechRecognition)
+NS_IMPL_CYCLE_COLLECTION_INHERITED(SpeechRecognition, DOMEventTargetHelper, mDOMStream, mSpeechGrammarList)
+
+NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION_INHERITED(SpeechRecognition)
   NS_INTERFACE_MAP_ENTRY(nsIObserver)
 NS_INTERFACE_MAP_END_INHERITING(DOMEventTargetHelper)
 
@@ -628,30 +630,28 @@
 }
 
 already_AddRefed<SpeechGrammarList>
-SpeechRecognition::GetGrammars(ErrorResult& aRv) const
+SpeechRecognition::GetGrammars() const
 {
   nsRefPtr<SpeechGrammarList> speechGrammarList = mSpeechGrammarList;
   return speechGrammarList.forget();
 }
 
 void
-SpeechRecognition::SetGrammars(SpeechGrammarList& aArg, ErrorResult& aRv)
+SpeechRecognition::SetGrammars(SpeechGrammarList* aArg)
 {
-  mSpeechGrammarList = &aArg;
+  mSpeechGrammarList = aArg;
 }
 
 void
-SpeechRecognition::GetLang(nsString& aRetVal, ErrorResult& aRv) const
+SpeechRecognition::GetLang(nsString& aRetVal) const
 {
   aRetVal = mLang;
-  return;
 }
 
 void
-SpeechRecognition::SetLang(const nsAString& aArg, ErrorResult& aRv)
+SpeechRecognition::SetLang(const nsAString& aArg)
 {
   mLang = aArg;
-  return;
 }
 
 bool
@@ -750,7 +750,9 @@
   NS_DispatchToMainThread(event);
 }
 
-bool SpeechRecognition::SetRecognitionService(ErrorResult& aRv) {
+bool
+SpeechRecognition::SetRecognitionService(ErrorResult& aRv)
+{
   // See: https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html#dfn-lang
   if (!mLang.IsEmpty()) {
     mRecognitionService = GetSpeechRecognitionService(mLang);
@@ -791,8 +793,10 @@
   return true;
 }
 
-bool SpeechRecognition::ValidateAndSetGrammarList(ErrorResult& aRv) {
-  if (nullptr == mSpeechGrammarList) {
+bool
+SpeechRecognition::ValidateAndSetGrammarList(ErrorResult& aRv)
+{
+  if (!mSpeechGrammarList) {
     aRv.Throw(NS_ERROR_DOM_INVALID_STATE_ERR);
     return false;
   }
@@ -804,11 +808,11 @@
   }
 
   for (uint32_t count = 0; count < grammarListLength; ++count) {
-    nsRefPtr<SpeechGrammar> speechGrammar = mSpeechGrammarList->Item(count,aRv);
+    nsRefPtr<SpeechGrammar> speechGrammar = mSpeechGrammarList->Item(count, aRv);
     if (aRv.Failed()) {
       return false;
     }
-    if (NS_OK != mRecognitionService->ValidateAndSetGrammarList(speechGrammar.get(), nullptr)) {
+    if (NS_FAILED(mRecognitionService->ValidateAndSetGrammarList(speechGrammar.get(), nullptr))) {
       aRv.Throw(NS_ERROR_DOM_INVALID_STATE_ERR);
       return false;
     }
diff -u b/dom/media/webspeech/recognition/SpeechRecognition.h b/dom/media/webspeech/recognition/SpeechRecognition.h
--- b/dom/media/webspeech/recognition/SpeechRecognition.h
+++ b/dom/media/webspeech/recognition/SpeechRecognition.h
@@ -58,6 +58,7 @@
   explicit SpeechRecognition(nsPIDOMWindow* aOwnerWindow);
 
   NS_DECL_ISUPPORTS_INHERITED
+  NS_DECL_CYCLE_COLLECTION_CLASS_INHERITED(SpeechRecognition, DOMEventTargetHelper)
 
   NS_DECL_NSIOBSERVER
 
@@ -70,13 +71,13 @@
   static already_AddRefed<SpeechRecognition>
   Constructor(const GlobalObject& aGlobal, ErrorResult& aRv);
 
-  already_AddRefed<SpeechGrammarList> GetGrammars(ErrorResult& aRv) const;
+  already_AddRefed<SpeechGrammarList> GetGrammars() const;
 
-  void SetGrammars(mozilla::dom::SpeechGrammarList& aArg, ErrorResult& aRv);
+  void SetGrammars(mozilla::dom::SpeechGrammarList* aArg);
 
-  void GetLang(nsString& aRetVal, ErrorResult& aRv) const;
+  void GetLang(nsString& aRetVal) const;
 
-  void SetLang(const nsAString& aArg, ErrorResult& aRv);
+  void SetLang(const nsAString& aArg);
 
   bool GetContinuous(ErrorResult& aRv) const;
 
only in patch2:
unchanged:
--- a/dom/webidl/SpeechRecognition.webidl
+++ b/dom/webidl/SpeechRecognition.webidl
@@ -10,19 +10,17 @@
  * liability, trademark and document use rules apply.
  */
 
 [Constructor,
  Pref="media.webspeech.recognition.enable",
  Func="SpeechRecognition::IsAuthorized"]
 interface SpeechRecognition : EventTarget {
     // recognition parameters
-    [Throws]
-    attribute SpeechGrammarList grammars;
-    [Throws]
+    attribute SpeechGrammarList? grammars;
     attribute DOMString lang;
     [Throws]
     attribute boolean continuous;
     [Throws]
     attribute boolean interimResults;
     [Throws]
     attribute unsigned long maxAlternatives;
     [Throws]
Attachment #8636502 - Attachment is obsolete: true
Attachment #8641500 - Flags: review?(bugs)
Comment on attachment 8641500 [details] [diff] [review]
Part 1 of 1: Made speech recognition services language dependent and removed assumption of a single service.


>-  if (!prefValue.get() || prefValue.IsEmpty()) {
>-    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
>-  } else {
>+  if (!prefValue.IsEmpty()) {
>     speechRecognitionService = prefValue;
>+  } else if (!aLang.IsEmpty()) {
>+    speechRecognitionService =
>+      NS_LITERAL_CSTRING(DEFAULT_RECOGNITION_SERVICE_PREFIX) +
>+      NS_ConvertUTF16toUTF8(aLang);
>+  } else {
>+    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
>   }
> 
>-  if (!SpeechRecognition::mTestConfig.mFakeRecognitionService){
>+  if (SpeechRecognition::mTestConfig.mFakeRecognitionService) {
>     speechRecognitionServiceCID =
>-      NS_LITERAL_CSTRING(NS_SPEECH_RECOGNITION_SERVICE_CONTRACTID_PREFIX) +
>-      speechRecognitionService;
>+      NS_SPEECH_RECOGNITION_SERVICE_CONTRACTID_PREFIX "fake";
>   } else {
>     speechRecognitionServiceCID =
>-      NS_SPEECH_RECOGNITION_SERVICE_CONTRACTID_PREFIX "fake";
>+      NS_LITERAL_CSTRING(NS_SPEECH_RECOGNITION_SERVICE_CONTRACTID_PREFIX) +
>+      speechRecognitionService;
>   }
What about the localization case in Firefox? If we enable pocketsphinx there, I could imagine the Preferences UI having some way to change the default recognition lang or something.
So I would use the following order
1. lang
2. pref
3. default (which is en-US)


>+  nsCOMPtr<nsPIDOMWindow> window = GetOwner();
>+  if(!window) {
>+    aRv.Throw(NS_ERROR_DOM_INVALID_STATE_ERR);
>+    return false;
>+  }
>+  nsCOMPtr<nsIDocument> document = window->GetDoc();
You actually want GetExtantDoc() here.

> interface SpeechRecognition : EventTarget {
>     // recognition parameters
>-    [Throws]
>-    attribute SpeechGrammarList grammars;
>-    [Throws]
>+    attribute SpeechGrammarList? grammars;
You made .grammars nullable. That does make sense to me, but that isn't what the spec says.
So, file a spec bug - just so that the issue is recorded somewhere. I wonder how to get the spec updated these days.
Or change GetGrammars to be what I said in comment 8; either way.
(Returning undefined here would be wrong per the WebIDL spec; null is OK.)


The crash when you return null where the .webidl doesn't expect null is largely because our WebIDL bindings are heavily optimized; any extra checks or additional code paths would make things slower.
Attachment #8641500 - Flags: review?(bugs) → review+
(In reply to kdavis from comment #11)
> 1. When testing, setting PREFERENCE_DEFAULT_RECOGNITION_SERVICE could
> override everything.
> 2. If SpeechRecognition.lang is set in JS, it is used.
> 3. If SpeechRecognition.lang is not set in JS but the HTML lang attribute is
> set, it is used.
> 4. If none of the above are set, the default DEFAULT_RECOGNITION_SERVICE is
> used.
> 
> To me this seems more natural. You have the ability to force things with the
> preference value,
> the ability to work within the normal functionality setting
> SpeechRecognition.lang and/or the
> HTML lang attribute, and also, if all else fails, you default to
> DEFAULT_RECOGNITION_SERVICE.

In other words, I disagree with this. Why should the pref override .lang?

Hmm, we hardcode the recognition service here anyway.
Shouldn't we always use the value from "media.webspeech.service.default", and have a separate pref for the default language?
(Assignee)

Comment 15

3 years ago
(In reply to Olli Pettay [:smaug] from comment #13)
> Comment on attachment 8641500 [details] [diff] [review]
> Part 1 of 1: Made speech recognition services language dependent and removed
> assumption of a single service.
> 
> 
> >-  if (!prefValue.get() || prefValue.IsEmpty()) {
> >-    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
> >-  } else {
> >+  if (!prefValue.IsEmpty()) {
> >     speechRecognitionService = prefValue;
> >+  } else if (!aLang.IsEmpty()) {
> >+    speechRecognitionService =
> >+      NS_LITERAL_CSTRING(DEFAULT_RECOGNITION_SERVICE_PREFIX) +
> >+      NS_ConvertUTF16toUTF8(aLang);
> >+  } else {
> >+    speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
> >   }
> > 
> >-  if (!SpeechRecognition::mTestConfig.mFakeRecognitionService){
> >+  if (SpeechRecognition::mTestConfig.mFakeRecognitionService) {
> >     speechRecognitionServiceCID =
> >-      NS_LITERAL_CSTRING(NS_SPEECH_RECOGNITION_SERVICE_CONTRACTID_PREFIX) +
> >-      speechRecognitionService;
> >+      NS_SPEECH_RECOGNITION_SERVICE_CONTRACTID_PREFIX "fake";
> >   } else {
> >     speechRecognitionServiceCID =
> >-      NS_SPEECH_RECOGNITION_SERVICE_CONTRACTID_PREFIX "fake";
> >+      NS_LITERAL_CSTRING(NS_SPEECH_RECOGNITION_SERVICE_CONTRACTID_PREFIX) +
> >+      speechRecognitionService;
> >   }
> What about localization case in Firefox. If we enable pocketsphinx there, I
> could imagine Preferences UI to have some way to change the default
> recognition lang or something.
> So I would use the following order
> 1. lang
> 2. pref
> 3. default (which is en-US)

I'm fine with this order, with a pref for the default.

But, the pref will come in a future patch.

> 
> 
> >+  nsCOMPtr<nsPIDOMWindow> window = GetOwner();
> >+  if(!window) {
> >+    aRv.Throw(NS_ERROR_DOM_INVALID_STATE_ERR);
> >+    return false;
> >+  }
> >+  nsCOMPtr<nsIDocument> document = window->GetDoc();
> You actually want GetExtantDoc() here.

OK.

> 
> > interface SpeechRecognition : EventTarget {
> >     // recognition parameters
> >-    [Throws]
> >-    attribute SpeechGrammarList grammars;
> >-    [Throws]
> >+    attribute SpeechGrammarList? grammars;
> You made .grammars nullable. That does make sense to me, but that isn't what
> the spec says.
> So, file a spec bug - just that the issue is recorded somewhere. I wonder
> how to get the spec updated these days.
> Or change GetGrammar to be what I said in comment 8. either way.
> (About returning undefined here would be wrong per Webidl spec. null is ok.)

To adhere to the spec, I'll add code as in comment 8 with a non-nullable grammars attribute.
 
> And the crash if you return null when .webidl doesn't expect null is largely
> because our webidl bindings are
> super optimized, where any extra checks or more possible code paths make
> things slower.

Understood.
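The resolution agreed on here can be sketched with plain C++ stand-ins (not the real Gecko classes, which use nsRefPtr/already_AddRefed): initialize the member to an empty list in the constructor, so a non-nullable getter can never return null.

```cpp
#include <memory>

// Stand-in for dom::SpeechGrammarList; the real class holds grammar data.
struct SpeechGrammarList {};

class SpeechRecognition {
public:
  // Initialize grammars to an empty list so it is non-null by construction.
  SpeechRecognition()
    : mSpeechGrammarList(std::make_shared<SpeechGrammarList>()) {}

  // Mirrors the non-nullable `attribute SpeechGrammarList grammars`:
  // always returns a live object, never null.
  std::shared_ptr<SpeechGrammarList> Grammars() const {
    return mSpeechGrammarList;
  }

  // Setter takes a reference, so callers cannot store null either.
  void SetGrammars(SpeechGrammarList& aList) {
    mSpeechGrammarList = std::shared_ptr<SpeechGrammarList>(
        &aList, [](SpeechGrammarList*) {});  // non-owning in this sketch
  }

private:
  std::shared_ptr<SpeechGrammarList> mSpeechGrammarList;
};
```

This keeps the WebIDL attribute non-nullable, matching the spec, while avoiding the null-return crash discussed above.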
(Assignee)

Comment 16

3 years ago
(In reply to Olli Pettay [:smaug] from comment #14)
> (In reply to kdavis from comment #11)
> > 1. When testing, setting PREFERENCE_DEFAULT_RECOGNITION_SERVICE could
> > override everything.
> > 2. If SpeechRecognition.lang is set in JS, it is used.
> > 3. If SpeechRecognition.lang is not set in JS but the HTML lang attribute is
> > set, it is used.
> > 4. If none of the above are set, the default DEFAULT_RECOGNITION_SERVICE is
> > used.
> > 
> > To me this seems more natural. You have the ability to force things with the
> > preference value,
> > the ability to work within the normal functionality setting
> > SpeechRecognition.lang and/or the
> > HTML lang attribute, and also, if all else fails, you default to
> > DEFAULT_RECOGNITION_SERVICE.
> 
> In other words, I disagree with this. Why should the pref override .lang?
> 
> Hmm, we hardcode recognition service here anyway.
> Shouldn't we always use the value from "media.webspeech.service.default",
> and then have a separate pref for the default language

I'm fine with the order you suggest

1. lang
2. pref
3. default

with a separate pref for default, to be added in another commit.
(Assignee)

Comment 17

3 years ago
Created attachment 8642200 [details] [diff] [review]
Part 1 of 1: Made speech recognition services language dependent and removed assumption of a single service

Part 1 of 1 of this bug.

This patch modifies the previous patch with the changes suggested in comment 13.

In particular this patch, relative to the previous patch:

1. Changes the fallback order for the service name to aLang, pref, and default.
2. Makes SpeechRecognition.grammars non-nullable.
3. Initializes SpeechRecognition.grammars to an empty SpeechGrammarList.
4. Gets the window document with GetExtantDoc() instead of GetDoc().

The try run is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2ee1a6097272

The interdiff between this patch and the last is

diff -u b/dom/media/webspeech/recognition/SpeechRecognition.cpp b/dom/media/webspeech/recognition/SpeechRecognition.cpp
--- b/dom/media/webspeech/recognition/SpeechRecognition.cpp
+++ b/dom/media/webspeech/recognition/SpeechRecognition.cpp
@@ -74,12 +74,12 @@
   Preferences::GetCString(PREFERENCE_DEFAULT_RECOGNITION_SERVICE);
   nsAutoCString speechRecognitionService;
 
-  if (!prefValue.IsEmpty()) {
-    speechRecognitionService = prefValue;
-  } else if (!aLang.IsEmpty()) {
+  if (!aLang.IsEmpty()) {
     speechRecognitionService =
       NS_LITERAL_CSTRING(DEFAULT_RECOGNITION_SERVICE_PREFIX) +
       NS_ConvertUTF16toUTF8(aLang);
+  } else if (!prefValue.IsEmpty()) {
+    speechRecognitionService = prefValue;
   } else {
     speechRecognitionService = DEFAULT_RECOGNITION_SERVICE;
   }
@@ -115,7 +115,7 @@
   , mEndpointer(kSAMPLE_RATE)
   , mAudioSamplesPerChunk(mEndpointer.FrameSize())
   , mSpeechDetectionTimer(do_CreateInstance(NS_TIMER_CONTRACTID))
-  , mSpeechGrammarList(nullptr)
+  , mSpeechGrammarList(new SpeechGrammarList(GetParentObject()))
 {
   SR_LOG("created SpeechRecognition");
 
@@ -630,16 +630,16 @@
 }
 
 already_AddRefed<SpeechGrammarList>
-SpeechRecognition::GetGrammars() const
+SpeechRecognition::Grammars() const
 {
   nsRefPtr<SpeechGrammarList> speechGrammarList = mSpeechGrammarList;
   return speechGrammarList.forget();
 }
 
 void
-SpeechRecognition::SetGrammars(SpeechGrammarList* aArg)
+SpeechRecognition::SetGrammars(SpeechGrammarList& aArg)
 {
-  mSpeechGrammarList = aArg;
+  mSpeechGrammarList = &aArg;
 }
 
 void
@@ -770,7 +770,7 @@
     aRv.Throw(NS_ERROR_DOM_INVALID_STATE_ERR);
     return false;
   }
-  nsCOMPtr<nsIDocument> document = window->GetDoc();
+  nsCOMPtr<nsIDocument> document = window->GetExtantDoc();
   if(!document) {
     aRv.Throw(NS_ERROR_DOM_INVALID_STATE_ERR);
     return false;
diff -u b/dom/media/webspeech/recognition/SpeechRecognition.h b/dom/media/webspeech/recognition/SpeechRecognition.h
--- b/dom/media/webspeech/recognition/SpeechRecognition.h
+++ b/dom/media/webspeech/recognition/SpeechRecognition.h
@@ -71,9 +71,9 @@
   static already_AddRefed<SpeechRecognition>
   Constructor(const GlobalObject& aGlobal, ErrorResult& aRv);
 
-  already_AddRefed<SpeechGrammarList> GetGrammars() const;
+  already_AddRefed<SpeechGrammarList> Grammars() const;
 
-  void SetGrammars(mozilla::dom::SpeechGrammarList* aArg);
+  void SetGrammars(mozilla::dom::SpeechGrammarList& aArg);
 
   void GetLang(nsString& aRetVal) const;
 
diff -u b/dom/webidl/SpeechRecognition.webidl b/dom/webidl/SpeechRecognition.webidl
--- b/dom/webidl/SpeechRecognition.webidl
+++ b/dom/webidl/SpeechRecognition.webidl
@@ -15,7 +15,7 @@
  Func="SpeechRecognition::IsAuthorized"]
 interface SpeechRecognition : EventTarget {
     // recognition parameters
-    attribute SpeechGrammarList? grammars;
+    attribute SpeechGrammarList grammars;
     attribute DOMString lang;
     [Throws]
     attribute boolean continuous;
Attachment #8641500 - Attachment is obsolete: true
Attachment #8642200 - Flags: review+
(Assignee)

Updated

3 years ago
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/ee7df762c195

FWIW, you probably don't need "Part 1 of 1" in the commit message :)
https://developer.mozilla.org/en-US/docs/Mercurial/Using_Mercurial#Commit_Message_Conventions
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox42: --- → fixed
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
You need to log in before you can comment on or make changes to this bug.