Closed Bug 1051554 Opened 10 years ago Closed 9 years ago

Adapt Speech Grammars to generate phonetic representation of words

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: anatal, Assigned: anatal)

References

(Blocks 1 open bug)

Details

(Whiteboard: [webspeechapi])

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140716183446
Blocks: 1049931
OS: Linux → All
Hardware: x86_64 → All
Blocks: 1051148
No longer blocks: 1049931
Status: UNCONFIRMED → NEW
Ever confirmed: true
For each language we'll want to support, we need a g2p model to generate phonetic representation of words lacking on cmudict. This way we avoid the need to ship the dictionary (reducing download time and model size) and also supports new words.

We also will need to integrate phonetisaurus: https://code.google.com/p/phonetisaurus/wiki/ReadMe

phonetisaurus-g2p --model=cmudict.fst --input=sandip

And train cmudict.fst from gsp model
http://sourceforge.net/projects/cmusphinx/files/G2P%20Models/en_us_nostress.tar.gz/download


This is required for each language we want to support
Summary: Adapt Speech Grammars to generate phonetic representation of words lacking on dictionary → Adapt Speech Grammars to generate phonetic representation of words
Blocks: 1067683
No longer blocks: 1051148
Whiteboard: [webspeechapi]
Assignee: nobody → anatal
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
These patches introduces methods on the SpeechRecognition class and underlying SpeechRecognitioServices to allow management of phonemes dictionaries in the decoder.  

Trye:  https://treeherder.mozilla.org/#/jobs?repo=try&revision=e0fe350a02bc
Comment on attachment 8624598 [details] [diff] [review]
0001-Bug-1051554-Introducing-methods-on-SpeechRecognition.patch

># HG changeset patch
># User Andre Natal <anatal@gmail.com>
>
>Bug 1051554 Introducing methods on SpeechRecognition to manage phonemes on speech engine r=smaug
>
>Signed-off-by: Andre Natal <anatal@gmail.com>
>---
> dom/media/webspeech/recognition/SpeechRecognition.cpp | 14 ++++++++++++++
> dom/media/webspeech/recognition/SpeechRecognition.h   |  6 ++++++
> dom/webidl/SpeechRecognition.webidl                   |  6 ++++++
> 3 files changed, 26 insertions(+)
>
>diff --git a/dom/media/webspeech/recognition/SpeechRecognition.cpp b/dom/media/webspeech/recognition/SpeechRecognition.cpp
>index 0c03ad1..45b3170 100644
>--- a/dom/media/webspeech/recognition/SpeechRecognition.cpp
>+++ b/dom/media/webspeech/recognition/SpeechRecognition.cpp
>@@ -759,6 +759,20 @@ SpeechRecognition::Abort()
> }
> 
> void
>+SpeechRecognition::AddPhonemeToDictionary(const nsAString& aString,
>+  const nsAString& aString2, ErrorResult& aRv)

Call the parameters something descriptive, not just aString and aString2
And align parameters


>+SpeechRecognition::VerifyPhonemeInDictionary(const nsAString& aString,
>+                                 ErrorResult& aRv)
>+{
Ditto

>@@ -27,6 +27,12 @@ interface SpeechRecognition : EventTarget {
>     attribute unsigned long maxAlternatives;
>     [Throws]
>     attribute DOMString serviceURI;
>+    
>+    // methods to manage phonetic dictionary
>+    [Throws]
>+    boolean verifyPhonemeInDictionary(DOMString string);
>+    [Throws]
>+    void addPhonemeToDictionary(DOMString string, DOMString string2);
Same here, some better names for the parameters, and please add
a comment that the methods are non-standard, temporary mozilla only things.
And I would call verifyPhonemeInDictionary isPhonemeInDictionary


Hopefully kdavis can take a look at too.
(I'll be mostly offline today because of Midsummer in Finland)
Attachment #8624598 - Flags: review?(kdavis)
Attachment #8624598 - Flags: review?(bugs)
Attachment #8624598 - Flags: review+
Comment on attachment 8624599 [details] [diff] [review]
0002-Bug-1051554-Integrating-methods-to-manage-phonemes-i.patch


>+PocketSphinxSpeechRecognitionService::AddPhonemeToDictionary(
>+  const nsAString& aGrapheme, const nsAString& aPhoneme)
align parameters (unless exceeding 80 chars per line)

>+{
>+  ps_add_word(mPSHandle, ToNewUTF8String(aGrapheme), ToNewUTF8String(aPhoneme),0);
space before 0.
What happens if aGrapheme or aPhoneme is empty string.
I would add some checks and call ps_add_word only if Length() > 0

Who releases the memory allocated by ToNewUTF8String call?
Based on http://www.speech.cs.cmu.edu/sphinx/doc/doxygen/pocketsphinx/pocketsphinx_8h.html I think
PocketSphinx doesn't take ownership of that memory, so we have a leak here.

So, something like
NS_ENSURE_TRUE(!aGrapheme.IsEmpty() && !aPhoneme.IsEmpty(), NS_ERROR_DOM_SYNTAX_ERR);

NS_ConvertUTF16toUTF8 grapheme(aGrapheme);
NS_ConvertUTF16toUTF8 phoneme(aPhoneme);

And then pass
grapheme.get(), and phoneme.get() as param.


>+PocketSphinxSpeechRecognitionService::VerifyPhonemeInDictionary(const nsAString& aGrapheme)
>+{
>+  const char * word;
    const char* word = ps_lookup_word(mPSHandle, ToNewUTF8String(aGrapheme));
But again, you're leaking memory here, as far as I see, so use NS_ConvertUTF16toUTF8


>+  word = ps_lookup_word(mPSHandle, ToNewUTF8String(aGrapheme));
>+  if (word == nullptr)
>+    return NS_ERROR_FAILURE;
>+  else
>+    return NS_OK;
>+}
Always {} with 'if' and 'else'

>+++ b/dom/media/webspeech/recognition/nsISpeechRecognitionService.idl
>@@ -36,6 +36,8 @@ interface nsISpeechRecognitionService : nsISupports {
>     void validateAndSetGrammarList(in SpeechGrammarPtr aSpeechGrammar, in nsISpeechGrammarCompilationCallback aCallback);
>     void soundEnd();
>     void abort();
>+    void addPhonemeToDictionary(in DOMString aGrapheme, in DOMString aPhoneme);
>+    void verifyPhonemeInDictionary(in DOMString aGrapheme);
You're updating an .idl interface, so update also the uuid



>+FakeSpeechRecognitionService::AddPhonemeToDictionary(
>+  const nsAString& aGrapheme, const nsAString& aPhoneme)
align params


>+FakeSpeechRecognitionService::VerifyPhonemeInDictionary(
>+   const nsAString& aGrapheme)
Does the param fit into 80 chars per line if it is at the end of the previous line.
Attachment #8624599 - Flags: review?(kdavis)
Attachment #8624599 - Flags: review?(bugs)
Attachment #8624599 - Flags: review+
Is the current title of this bug:

"Adapt Speech Grammars to generate phonetic representation of words"

really reflective of the patch? The patch is not adapting speech grammars
to the phonetic representation of words.

It seems like the title should change to reflect what the patch is doing
or a new bug should be opened.

Maybe a better title would be something like:

"Add methods to the WebSpeech API that allow for the addition of words to the phoneme dictionary"
Comment on attachment 8624598 [details] [diff] [review]
0001-Bug-1051554-Introducing-methods-on-SpeechRecognition.patch

Review of attachment 8624598 [details] [diff] [review]:
-----------------------------------------------------------------

I am still unsure as to the logic of adding this patch in now instead of trying for a
more sustainable solution.

The only logic I could conceive of is that we need a temporary solution for Whistler.

However, for Whistler we will not need this patch as we are doing "Open App" with only
the core apps and we are not doing "Calling".

I think the code, save the few little nit picks is fine. My only reservation is if we
should add this now? Can you maybe educate me a bit as to why we need this patch now?

::: dom/media/webspeech/recognition/SpeechRecognition.cpp
@@ +759,5 @@
>  }
>  
>  void
> +SpeechRecognition::AddPhonemeToDictionary(const nsAString& aString,
> +  const nsAString& aString2, ErrorResult& aRv)

Like Olli mentioned formatting and parameter names should be more descriptive.

@@ +766,5 @@
> +}
> +
> +bool
> +SpeechRecognition::VerifyPhonemeInDictionary(const nsAString& aString,
> +                                 ErrorResult& aRv)

Here again, formatting, parameter names should be more descriptive, and IsPhonemeInDictionary is a more standard method name.

::: dom/media/webspeech/recognition/SpeechRecognition.h
@@ +102,4 @@
>  
>    void Abort();
>  
> +  void AddPhonemeToDictionary(const nsAString& aString, const nsAString& aString2, 

Trailing space and parameter names should be more descriptive

@@ +104,5 @@
>  
> +  void AddPhonemeToDictionary(const nsAString& aString, const nsAString& aString2, 
> +    ErrorResult& aRv);
> +
> +  bool VerifyPhonemeInDictionary(const nsAString& aString, 

Here again trailing space, parameter names should be more descriptive, and IsPhonemeInDictionary is a more standard method name

::: dom/webidl/SpeechRecognition.webidl
@@ +27,4 @@
>      attribute unsigned long maxAlternatives;
>      [Throws]
>      attribute DOMString serviceURI;
> +    

More trailing space

@@ +29,5 @@
>      attribute DOMString serviceURI;
> +    
> +    // methods to manage phonetic dictionary
> +    [Throws]
> +    boolean verifyPhonemeInDictionary(DOMString string);

Better parameter names and you should add a comment that this is a Mozilla extension and will be removed. Also isPhonemeInDictionary is a more standard method name.

@@ +31,5 @@
> +    // methods to manage phonetic dictionary
> +    [Throws]
> +    boolean verifyPhonemeInDictionary(DOMString string);
> +    [Throws]
> +    void addPhonemeToDictionary(DOMString string, DOMString string2);

Better parameter names, and you should add a comment that this is a Mozilla extension and will be removed
Attachment #8624598 - Flags: review+ → review-
Comment on attachment 8624599 [details] [diff] [review]
0002-Bug-1051554-Integrating-methods-to-manage-phonemes-i.patch

Review of attachment 8624599 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webspeech/recognition/PocketSphinxSpeechRecognitionService.cpp
@@ +291,5 @@
>  NS_IMETHODIMP
> +PocketSphinxSpeechRecognitionService::AddPhonemeToDictionary(
> +  const nsAString& aGrapheme, const nsAString& aPhoneme)
> +{
> +  ps_add_word(mPSHandle, ToNewUTF8String(aGrapheme), ToNewUTF8String(aPhoneme),0);

A few questions and comments:

1. Why is the final parameter 0? According to the documentation of ps_add_word the final parameter "if TRUE, updates the search module (whichever one is currently active) to recognize the newly added word. If adding multiple words, it is more efficient to pass FALSE here in all but the last word." So, it seems as if the final parameter when set to 0 doesn't allow the new word to be recognized.

2. ToNewUTF8String creates a new UTF8 string[1]. The code of ps_add_word does not take responsibility for the strings passed to it. Thus, this leaks two UTF8 strings per execution.

3. Why are error codes not checked here. The method ps_add_word returns "the internal ID (>= 0) of the newly added word, or <0 on failure". As you indicated this method could throw in SpeechRecognition.webidl, you could throw on failure.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings#Memory_Allocation_-_how_to_avoid_it.2C_which_methods_to_use

@@ +299,5 @@
> +NS_IMETHODIMP
> +PocketSphinxSpeechRecognitionService::VerifyPhonemeInDictionary(const nsAString& aGrapheme)
> +{
> +  const char * word;
> +  word = ps_lookup_word(mPSHandle, ToNewUTF8String(aGrapheme));

A few points:

1. Again ToNewUTF8String news up a string[1]. The code of ps_lookup_word does not take responsibility for this newed up string. Hence, the return value of ToNewUTF8String is leaked.

2. According to the documentation of ps_lookup_word it returns a string that "is allocated and must be freed by the user." The code here never deallocates this returned word. Hence there is a leak.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings#Memory_Allocation_-_how_to_avoid_it.2C_which_methods_to_use
Attachment #8624599 - Flags: review+ → review-
Attachment #8624598 - Flags: review?(kdavis)
Attachment #8624599 - Flags: review?(kdavis)
(In reply to kdavis from comment #9)
> Comment on attachment 8624598 [details] [diff] [review]
> 0001-Bug-1051554-Introducing-methods-on-SpeechRecognition.patch
> 
> Review of attachment 8624598 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I am still unsure as to the logic of adding this patch in now instead of
> trying for a
> more sustainable solution.
> 
> The only logic I could conceive of is that we need a temporary solution for
> Whistler.
> 

I thought we already had this discussion.

This is as temporary as Bug 1167875. What have no logic is try to integrate openfst plus all those 3rd party libs to gecko, plus a 22mb model, and in addition, phonetisaurus (that is the only reason to use openfst) that have itself other 3rd party libs as dependencies. 

The goal of this patch was take off the weight of gecko, but if you believe is more reasonable to put all that stuff on it, instead on gaia, then it's up to you.
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → WONTFIX
(In reply to anatal from comment #11)
> (In reply to kdavis from comment #9)
> > Comment on attachment 8624598 [details] [diff] [review]
> > 0001-Bug-1051554-Introducing-methods-on-SpeechRecognition.patch
>
> This is as temporary as Bug 1167875. What have no logic is try to integrate
> openfst plus all those 3rd party libs to gecko

I have all the patches for Bug 1167875 for all platforms other than desktop
OS X. (Actually the patches work fox OS X too, but our try OS X build system
does not reflect the newest OS X build tools.)

Maybe we can just do that later today when we meet? We should just talk then
about what's best to do.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: