Open Bug 1181269 Opened 9 years ago Updated 2 years ago

SpeechGrammarList's addFromString should accept numerals and uppercase letters

Categories

(Core :: Web Speech, defect)

defect

Tracking


Tracking Status
firefox42 --- affected

People

(Reporter: cvan, Unassigned)

References

Details

(Whiteboard: [webspeechapi])

Attachments

(4 files, 5 obsolete files)

Attached file lowercase example
If you pass an uppercase (English) letter or a number in the JSGF expansion when calling `SpeechGrammarList.prototype.addFromString`, the browser crashes.

Using all lowercase letters [1] works fine without crashing:

```
new SpeechGrammarList().addFromString('#JSGF V1.0; grammar test; public <simple> = hello;', 1);
```

Using an uppercase letter [2] causes a crash:

```
new SpeechGrammarList().addFromString('#JSGF V1.0; grammar test; public <simple> = Hello;', 1);
```

Using a number [3] causes a crash:

```
new SpeechGrammarList().addFromString('#JSGF V1.0; grammar test; public <simple> = 1;', 1);
```

––
[1] http://cvan.io/bug-tests/webspeech-grammar-case-test/lower.html
[2] http://cvan.io/bug-tests/webspeech-grammar-case-test/upper.html
[3] http://cvan.io/bug-tests/webspeech-grammar-case-test/number.html
Attached file uppercase example
Attached file number example
Assignee: nobody → anatal
Blocks: 1172883
Summary: SpeechGrammarList's addFromURI/addFromString should accept numerals and uppercase letters → SpeechGrammarList's addFromString should accept numerals and uppercase letters
At least initially we are not supporting addFromURI.
Hi Andre, can I take this bug?
This is the patch that handles the upper case grammar; the numeral case is solved in Bug 1182384.
Assignee: anatal → kechen
Kevin, who should review the patch?
I am still working on the test files. After that, can you or Andre help me review this?
I'll help
Flags: needinfo?(anatal)
Hi, Kelly
I want to write a mochitest for this, but I found that nsISpeechGrammarCompilationCallback is not implemented yet, which means I cannot tell whether the grammar I pass to validateAndSetGrammar compiled successfully.
Should I file another bug for this and solve it, or just start reviewing this patch first?
Flags: needinfo?(kdavis)
(In reply to Kevin Chen from comment #9)
> Hi, Kelly
> I want to write a mochitest for this, but I found that the
> nsISpeechGrammarCompilationCallback is not implemented yet, which means I
> cannot figure out if the grammar I pass to validateAndSetGrammar is
> successfully compiled.
> Should I file another bug for this and solve it or just start to review this
> patch first?

ValidateAndSetGrammarList on PocketSphinxSpeechRecognitionService returns NS_OK
if the grammar was compiled and NS_ERROR_NOT_INITIALIZED otherwise. Shouldn't
that be enough?
Flags: needinfo?(kdavis)
Also, I have a patch for Bug 1185018 in the review queue which will
cause SpeechRecognition.start() in JavaScript to throw if the grammar is
invalid.
Hi Kelly, sorry for making you wait so long. Please help me review this patch, thank you.
Attachment #8637023 - Attachment is obsolete: true
Attachment #8640320 - Flags: review?(kdavis)
(In reply to Kevin Chen from comment #12)
> Created attachment 8640320 [details] [diff] [review]
> Transfer grammars to lower case in jsgf_scanner.c
> 
> Hi Kelly, sorry for waiting so long. Please help me to review this patch,
> thank you.

Hey Kevin, did you run try for this patch?
(In reply to Kevin Chen from comment #14)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b94806a4f773
> Hi Kelly, I am still running the try.

Hey Kevin, I guess you've seen that the test is failing.

The major problem is that SpeechGrammarList is not available. From SpeechGrammarList.webidl:

[Constructor, Pref="media.webspeech.recognition.enable",
 Func="SpeechRecognition::IsAuthorized"]
interface SpeechGrammarList {
  ...
};

thus SpeechGrammarList is only available in JS if SpeechRecognition::IsAuthorized
returns true and

bool
SpeechRecognition::IsAuthorized(JSContext* aCx, JSObject* aGlobal)
{
  bool inCertifiedApp = IsInCertifiedApp(aCx, aGlobal);
  bool enableTests = Preferences::GetBool(TEST_PREFERENCE_ENABLE);
  bool enableRecognitionEnable = Preferences::GetBool(TEST_PREFERENCE_RECOGNITION_ENABLE);
  bool enableRecognitionForceEnable = Preferences::GetBool(TEST_PREFERENCE_RECOGNITION_FORCE_ENABLE);
  return (inCertifiedApp || enableRecognitionForceEnable || enableTests) && enableRecognitionEnable;
}

is not returning true for the test suite.
Comment on attachment 8640320 [details] [diff] [review]
Transfer grammars to lower case in jsgf_scanner.c

Review of attachment 8640320 [details] [diff] [review]:
-----------------------------------------------------------------

Hey Kevin, I guess you've seen that the test is failing.

The major problem is that SpeechGrammarList is not available. From SpeechGrammarList.webidl:

[Constructor, Pref="media.webspeech.recognition.enable",
 Func="SpeechRecognition::IsAuthorized"]
interface SpeechGrammarList {
  ...
};

thus SpeechGrammarList is only available in JS if SpeechRecognition::IsAuthorized
returns true and

bool
SpeechRecognition::IsAuthorized(JSContext* aCx, JSObject* aGlobal)
{
  bool inCertifiedApp = IsInCertifiedApp(aCx, aGlobal);
  bool enableTests = Preferences::GetBool(TEST_PREFERENCE_ENABLE);
  bool enableRecognitionEnable = Preferences::GetBool(TEST_PREFERENCE_RECOGNITION_ENABLE);
  bool enableRecognitionForceEnable = Preferences::GetBool(TEST_PREFERENCE_RECOGNITION_FORCE_ENABLE);
  return (inCertifiedApp || enableRecognitionForceEnable || enableTests) && enableRecognitionEnable;
}

is not returning true for the test suite.

The easiest solution is to remove the test. I can look at the rest of the code
once the tests are not failing.
Attachment #8640320 - Flags: review?(kdavis) → review-
Hi Kelly, can you help me review this patch again?
Sorry for not checking the try run last time.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=31787db00b4a
I think Android 4.0 API11+ opt M(5), Android 4.0 API11+ debug M(5) are caused by Bug 1188323.
Attachment #8640320 - Attachment is obsolete: true
Attachment #8641501 - Flags: review?(kdavis)
Comment on attachment 8641501 [details] [diff] [review]
Transfer grammars to lower case in jsgf_scanner.c

Review of attachment 8641501 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webspeech/recognition/SpeechGrammarList.cpp
@@ +93,5 @@
>    mItems.AppendElement(speechGrammar);
> +  nsresult rv = mRecognitionService->ValidateAndSetGrammarList(speechGrammar, nullptr);
> +  if(rv != NS_OK){
> +    aRv.Throw(NS_ERROR_INVALID_ARG);
> +  }

I think this change is great, but unfortunately Bug 1185018, which is
currently in the commit queue, is changing where this check is done and
is placing it in the start() method of SpeechRecognition.

The reason for this is that when dealing with multiple languages, or
for that matter multiple SpeechRecognition backends, one can't simply
obtain one single global mRecognitionService as there will be at least
one for English, at least one for Mandarin, at least one for Spanish....

So the way SpeechGrammarList obtains the global mRecognitionService
will only work in a very limited case of a single language and a
single SpeechRecognition backend for that language. As in the future we
will have multiple languages, Bug 1185018 lifts the assumptions
made by SpeechGrammarList and the code here.

So, I'd suggest waiting until Bug 1185018 lands then re-evaluating
this code, as it will then be obsolete.

::: dom/media/webspeech/recognition/test/test_grammar_registry.html
@@ +27,5 @@
> +            try{
> +                this.speechGrammarList = new SpeechGrammarList();
> +                var grammar = '#JSGF v1.0; grammar registryTest; ' +
> +                              'public <sample> = Firefox 1;'
> +                this.speechGrammarList.addFromString(grammar, 1);

When the patches for Bug 1185018, which are currently in the commit queue,
land this code will not function as expected as the check you added for the
method addFromString() will now be moved.

I think you could take a look at Bug 1185018 and the explanation there
as to why and how the grammar check now occurs.

Once Bug 1185018 lands, you could add a check like this one, but it would
have to be a bit more involved, something like

var sr = new SpeechRecognition();
sr.lang = "en-US";
var sgl = new SpeechGrammarList();
var grammar = "#JSGF v1.0; grammar registryTest; public <sample> = Firefox 1;";
sgl.addFromString(grammar, 1);
sr.grammars = sgl;
sr.start(); // This is where the throw will now occur

::: media/pocketsphinx/src/pocketsphinx.c
@@ +448,5 @@
>  ps_set_search(ps_decoder_t *ps, const char *name)
>  {
>      ps_search_t *search = ps_find_search(ps, name);
> +    if (!search){
> +        ps->pl_window = 0;

I think you'll have to explain this to me. I'm a bit slow.

How does the window size for phoneme lookahead affect uppercase/numeric values?

::: media/sphinxbase/src/libsphinxbase/lm/jsgf_parser.c
@@ +1531,5 @@
>      { (yyval.rule) = jsgf_optional_new(jsgf, (yyvsp[(2) - (3)].rhs)); }
>      break;
>  
>    case 28:
> +   //Change the tokens to lower case

Shouldn't we place jsgf_parser.y under source control,
then make changes in jsgf_parser.y and regenerate this
file instead of making changes here?

Also, conversion from upper case to lower case, even
though it sounds simple, is actually complicated and
locale dependent.

I do not think this code will work in general, but I'd
be happy to be proved wrong. For example, will it work
for UTF-8 strings?
Attachment #8641501 - Flags: review?(kdavis) → review-
(In reply to kdavis from comment #18)
> Comment on attachment 8641501 [details] [diff] [review]
> Transfer grammars to lower case in jsgf_scanner.c
> 
> Review of attachment 8641501 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/media/webspeech/recognition/SpeechGrammarList.cpp
> @@ +93,5 @@
> >    mItems.AppendElement(speechGrammar);
> > +  nsresult rv = mRecognitionService->ValidateAndSetGrammarList(speechGrammar, nullptr);
> > +  if(rv != NS_OK){
> > +    aRv.Throw(NS_ERROR_INVALID_ARG);
> > +  }
> 
> I think this change is great, but unfortunately Bug 1185018, which is
> currently in the commit queue, is changing where this check is done and
> is placing it in the start() method of SpeechRecognition.
> 
> The reason for this is that when dealing with multiple languages, or
> for that matter multiple SpeechRecognition backends, one can't simply
> obtain one single global mRecognitionService as there will be at least
> one for English, at least one for Mandarin, at least one for Spanish....
> 
> So the way SpeechGrammarList obtains the global mRecognitionService
> will only work in a very limited case of a single language and a
> single SpeechRecognition backend for that language. As in future we
> will have multiple languages, Bug 1185018 lifts the assumptions
> made by SpeechGrammarList and the code here.
> 
> So, I'd suggest waiting until Bug 1185018 lands then re-evaluating
> this code, as it will then be obsolete.
> 
> ::: dom/media/webspeech/recognition/test/test_grammar_registry.html
> @@ +27,5 @@
> > +            try{
> > +                this.speechGrammarList = new SpeechGrammarList();
> > +                var grammar = '#JSGF v1.0; grammar registryTest; ' +
> > +                              'public <sample> = Firefox 1;'
> > +                this.speechGrammarList.addFromString(grammar, 1);
> 
> When the patches for Bug 1185018, which are currently in the commit queue,
> land this code will not function as expected as the check you added for the
> method addFromString() will now be moved.
> 
> I think you could take a look at Bug 1185018 and the explanation there
> as to why and how the grammar check now occurs.
> 
> Once Bug 1185018 lands, you could add a check like this one, but it would
> have to be a bit more involved something like
> 
> var sr = new SpeechRecognition();
> sr.lang = "en-US",
> var sgl = new SpeechGrammarList();
> var grammar = "#JSGF v1.0; grammar registryTest; public <sample> = Firefox
> 1;";
> sgl.addFromString(grammar, 1);
> sr.grammars = sgl;
> sr.start(); // This is where the throw will now occur

I've checked Bug 1185018, and I will rewrite my test case. Thank you.
 
> ::: media/pocketsphinx/src/pocketsphinx.c
> @@ +448,5 @@
> >  ps_set_search(ps_decoder_t *ps, const char *name)
> >  {
> >      ps_search_t *search = ps_find_search(ps, name);
> > +    if (!search){
> > +        ps->pl_window = 0;
> 
> I think you'll have to explain this to me. I'm a bit slow.
> 
> How does the window size for phoneme lookahead affect uppercase/numeric
> values?
> 

Once validating and setting the grammar in the speech recognition engine fails, the pointer 'search' is assigned nullptr.

This then causes a segmentation fault at the strcmp in [1], because strcmp cannot be called with a nullptr argument.

Should I fix that error in this bug, since I think it relates to the grammar registry error, or should I file another bug to deal with it?

[1] https://dxr.mozilla.org/mozilla-central/source/media/pocketsphinx/src/pocketsphinx.c#455

> ::: media/sphinxbase/src/libsphinxbase/lm/jsgf_parser.c
> @@ +1531,5 @@
> >      { (yyval.rule) = jsgf_optional_new(jsgf, (yyvsp[(2) - (3)].rhs)); }
> >      break;
> >  
> >    case 28:
> > +   //Change the tokens to lower case
> 
> Shouldn't we place jsgf_parser.y under source control,
> then make changes in jsgf_parser.y and regenerate this
> file instead of making changes here?
> 
> Also, conversion from upper case to lower case, even
> though it sounds simple, is actually complicated and
> locale dependent.
> 
> I do not think this code will work in general, but I'd
> be happy to be proved wrong. For example, will it work
> for UTF-8 strings?

Yes, I cannot guarantee that English letters always map to the same code under different encodings; I will modify my code and place it in jsgf_parser.y.
Flags: needinfo?(anatal)
Hi Kelly, in this patch I added the following changes to deal with the upper case problem.

- Add a function "jsgf_token_new" in jsgf.c to convert upper case letters to lower case.
According to [1], the recognition engine converts the grammar to UTF-8. So in this function I handle UTF-8's variable-length encoding to restrict the conversion to English letters only.

- Regenerate jsgf_parser.c from jsgf_parser.y to change the parser flow.

- Add a test case "test_grammar_registry" that depends on the modification in Bug 1185018.

- Fix a logic error in "pocketsphinx.c" to prevent the recognition engine from crashing when grammar registration fails.

This is my try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2738ed0d8d37
The error in Linux debug M-e10s is caused by Bug 1166297, which is not related to my patch.

Thank you.

[1] https://dxr.mozilla.org/mozilla-central/source/dom/media/webspeech/recognition/PocketSphinxSpeechRecognitionService.cpp?offset=0#268
Attachment #8641501 - Attachment is obsolete: true
Attachment #8648598 - Flags: review?(kdavis)
Comment on attachment 8648598 [details] [diff] [review]
[patch] Transfer grammars to lower case

Review of attachment 8648598 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/pocketsphinx/src/pocketsphinx.c
@@ +444,5 @@
>      return acmod_update_mllr(ps->acmod, mllr);
>  }
>  
>  int
>  ps_set_search(ps_decoder_t *ps, const char *name)

On looking at how this is used in PocketSphinxSpeechRecognitionService.cpp
I wonder if this is the correct solution.

Shouldn't one only call this function if ps_set_jsgf_string() does not fail?

In other words change PocketSphinxSpeechRecognitionService.cpp as follows:

--- a/dom/media/webspeech/recognition/PocketSphinxSpeechRecognitionService.cpp
+++ b/dom/media/webspeech/recognition/PocketSphinxSpeechRecognitionService.cpp
@@ -274,21 +274,20 @@ PocketSphinxSpeechRecognitionService::ValidateAndSetGrammarList(
   } else if (aSpeechGrammar) {
     nsAutoString grammar;
     ErrorResult rv;
     aSpeechGrammar->GetSrc(grammar, rv);
 
     int result = ps_set_jsgf_string(mPSHandle, "name",
                                     NS_ConvertUTF16toUTF8(grammar).get());
 
-    ps_set_search(mPSHandle, "name");
-
     if (result != 0) {
       ISGrammarCompiled = false;
     } else {
+      ps_set_search(mPSHandle, "name");
       ISGrammarCompiled = true;
     }
   } else {
     ISGrammarCompiled = false;
   }
 
   return ISGrammarCompiled ? NS_OK : NS_ERROR_NOT_INITIALIZED;
 }

::: media/sphinxbase/src/libsphinxbase/lm/jsgf.c
@@ +58,5 @@
>   * into Sphinx finite-state grammars.
>   **/
>  
>  static int expand_rule(jsgf_t *grammar, jsgf_rule_t *rule, int rule_entry, int rule_exit);
> +void convert_to_lower_case(char *name);

Shouldn't this be static? It isn't used in other translation units.

@@ +206,5 @@
>      grammar->links = glist_add_ptr(grammar->links, link);
>  }
>  
> +void
> +convert_to_lower_case(char* name)

Unfortunately, having convert_to_lower_case() work only for English is
not sufficient. We need to support many other languages in FxOS[1].

For example[2], in Turkish a uppercase "I" is the same character as
an English uppercase "I". However, in Turkish the lowercase of "I"
is "ı", a character which is not in the English alphabet. Therefore,
as we need to support Turkish[1], the function convert_to_lower_case()
is not sufficient for FxOS.

In general for conversion to lowercase you will need the language the
conversion is being done for ("en", "de", "tr"...) and the conversion
algorithm.

Case conversion is extremely complicated and has many odd rules, like
the one above for Turkish. So, I would strongly recommend using the ICU
library[3] for the conversion algorithm. All of these strange rules are
already accounted for in the ICU library, and, furthermore, ICU is
already a part of gecko[4].

From a quick look at the ICU documentation, it seems as if the proper
method to use is ucasemap_utf8ToLower()[5]; however, I myself have not
tried this method.

In addition, the length of a string can change when going from
upper case to lower case. For example, a lower case letter may
require more bytes than its uppercase counterpart. So it is likely
that this approach, replacing characters in place in the string
passed into this function, cannot work.

Maybe we should try discussing another means of solving this
problem in the comments of this bug.


[1] https://en.wikipedia.org/wiki/Mozilla_localizations#Mozilla_Firefox_OS
[2] http://www.fileformat.info/info/unicode/char/0049/index.htm
[3] http://site.icu-project.org/
[4] https://github.com/mozilla/gecko-dev/tree/master/intl/icu
[5] http://icu-project.org/apiref/icu4c/ucasemap_8h.html#a8d14045335e130a16b68213194a04cc0
Attachment #8648598 - Flags: review?(kdavis) → review-
For more on the complexities of case mappings you can refer to section 5.18 of
the Unicode spec[1].

In addition to lots of gory details, it also gives several examples of what I
was worried about at the end of comment 21: string length changes as a result
of case changes.

For example in UTF-8 in Turkish "TOPKAPI" takes 7 bytes while its lowercase
form "topkapı" takes 8 bytes.

Also, characters have differing case mappings dependent upon the surrounding
text. For example, Σ in Greek lowercases to σ if it is followed by another
letter, but if it is not followed by another letter it lowercases to ς. So,
changing case letter by letter can't work.

Basically, the take-away is that case conversion is extremely complicated
and you should use the ICU library to do it.

However, in our case even the ICU library will not help as we have to
maintain the same length for the upper and lowercase strings, which is
impossible in UTF-8. So, we have to find another solution.

[1] http://www.unicode.org/versions/Unicode6.2.0/ch05.pdf#G21180
Comment on attachment 8667790 [details] [diff] [review]
SpeechGrammarList's addFromString should accept numerals and uppercase letters

># HG changeset patch
># User Kevin Chen <kechen@mozilla.com>
>
>[PATCH 1/2] [WIP]Bug 1181269 - SpeechGrammarList's addFromString should accept numerals and uppercase letters
>
>---
> media/pocketsphinx/src/fsg_history.c               |   2 +-
> media/pocketsphinx/src/fsg_lextree.c               |   8 +-
> media/pocketsphinx/src/fsg_search.c                |  41 ++++----
> media/sphinxbase/moz.build                         |   3 +
> media/sphinxbase/sphinxbase/fsg_model.h            |  26 ++++-
> media/sphinxbase/src/libsphinxbase/lm/fsg_model.c  | 105 +++++++++++++++++++--
> media/sphinxbase/src/libsphinxbase/lm/jsgf.c       |   2 +-
> .../src/libsphinxbase/lm/jsgf_internal.h           |   1 +
> 8 files changed, 147 insertions(+), 41 deletions(-)
>
>diff --git a/media/pocketsphinx/src/fsg_history.c b/media/pocketsphinx/src/fsg_history.c
>index 25c6eb0..aa81042 100644
>--- a/media/pocketsphinx/src/fsg_history.c
>+++ b/media/pocketsphinx/src/fsg_history.c
>@@ -300,17 +300,17 @@ fsg_history_print(fsg_history_t *h, dict_t *dict)
>     	    char const *baseword;
>     	    int32 wid;
>     	    bp = fsg_hist_entry_pred(hist_entry);
>     	    wid = fsg_link_wid(fl);
> 
>     	    if (fl == NULL)
>         	    continue;
> 
>-    	    baseword = fsg_model_word_str(h->fsg, wid);
>+    	    baseword = fsg_model_word_str_ori(h->fsg, wid);// word_orgin
> 
>     	    printf("%s(%d->%d:%d) ", baseword, 
>     				     fsg_link_from_state(hist_entry->fsglink), 
>     				     fsg_link_to_state(hist_entry->fsglink), 
>     				     hist_entry->frame);
> 	}
> 	printf("\n");
>     }
>diff --git a/media/pocketsphinx/src/fsg_lextree.c b/media/pocketsphinx/src/fsg_lextree.c
>index 573f06b..66a21a5 100644
>--- a/media/pocketsphinx/src/fsg_lextree.c
>+++ b/media/pocketsphinx/src/fsg_lextree.c
>@@ -113,17 +113,17 @@ fsg_lextree_lc_rc(fsg_lextree_t *lextree)
>     for (s = 0; s < fsg->n_state; s++) {
>         fsg_arciter_t *itor;
>         for (itor = fsg_model_arcs(fsg, s); itor; itor = fsg_arciter_next(itor)) {
>             fsg_link_t *l = fsg_arciter_get(itor);
>             int32 dictwid; /**< Dictionary (not FSG) word ID!! */
> 
>             if (fsg_link_wid(l) >= 0) {
>                 dictwid = dict_wordid(lextree->dict,
>-                                      fsg_model_word_str(lextree->fsg, l->wid));
>+                                      fsg_model_word_str(lextree->fsg, l->wid)); // word_trans
> 
>                 /*
>                  * Add the first CIphone of l->wid to the rclist of state s, and
>                  * the last CIphone to lclist of state d.
>                  * (Filler phones are a pain to deal with.  There is no direct
>                  * marking of a filler phone; but only filler words are supposed to
>                  * use such phones, so we use that fact.  HACK!!  FRAGILE!!)
>                  *
>@@ -383,17 +383,17 @@ psubtree_add_trans(fsg_lextree_t *lextree,
>     int n_lc_alloc = 0, n_int_alloc = 0, n_rc_alloc = 0;
> 
>     silcipid = bin_mdef_silphone(lextree->mdef);
>     n_ci = bin_mdef_n_ciphone(lextree->mdef);
> 
>     wid = fsg_link_wid(fsglink);
>     assert(wid >= 0);           /* Cannot be a null transition */
>     dictwid = dict_wordid(lextree->dict,
>-                          fsg_model_word_str(lextree->fsg, wid));
>+                          fsg_model_word_str(lextree->fsg, wid)); // word_trans
>     pronlen = dict_pronlen(lextree->dict, dictwid);
>     assert(pronlen >= 1);
> 
>     assert(lclist[0] >= 0);     /* At least one phonetic context provided */
>     assert(rclist[0] >= 0);
> 
>     head = *alloc_head;
>     pred = NULL;
>@@ -724,17 +724,17 @@ fsg_psubtree_init(fsg_lextree_t *lextree,
>         int32 dst;
>         fsglink = fsg_arciter_get(itor);
>         dst = fsglink->to_state;
> 
>         if (fsg_link_wid(fsglink) < 0)
>             continue;
> 
>         E_DEBUG(2,("Building lextree for arc from %d to %d: %s\n",
>-                   from_state, dst, fsg_model_word_str(fsg, fsg_link_wid(fsglink))));
>+                   from_state, dst, fsg_model_word_str(fsg, fsg_link_wid(fsglink)))); // word_trans
>         root = psubtree_add_trans(lextree, root, &glist, fsglink,
>                                   lextree->lc[from_state],
>                                   lextree->rc[dst],
>                                   alloc_head);
>         ++n_arc;
>     }
>     E_DEBUG(2,("State %d has %d outgoing arcs\n", from_state, n_arc));
> 
>@@ -776,17 +776,17 @@ void fsg_psubtree_dump_node(fsg_lextree_t *tree, fsg_pnode_t *node, FILE *fp)
>         fprintf(fp, " [");
>         for (i = 0; i < FSG_PNODE_CTXT_BVSZ; i++)
>             fprintf(fp, "%08x", node->ctxt.bv[i]);
>         fprintf(fp, "]");
>     }
>     if (node->leaf) {
>         tl = node->next.fsglink;
>         fprintf(fp, " {%s[%d->%d](%d)}",
>-                fsg_model_word_str(tree->fsg, tl->wid),
>+                fsg_model_word_str(tree->fsg, tl->wid), // word_origin
>                 tl->from_state, tl->to_state, tl->logs2prob);
>     } else {
>         fprintf(fp, " %p.NXT", node->next.succ);
>     }
>     fprintf(fp, "\n");
> 
>     return;
> }
>diff --git a/media/pocketsphinx/src/fsg_search.c b/media/pocketsphinx/src/fsg_search.c
>index f24a0fb..39ee7e0 100644
>--- a/media/pocketsphinx/src/fsg_search.c
>+++ b/media/pocketsphinx/src/fsg_search.c
>@@ -132,26 +132,26 @@ fsg_search_check_dict(fsg_search_t *fsgs, fsg_model_t *fsg)
>     dict_t *dict;
>     int i;
> 
>     dict = ps_search_dict(fsgs);
>     for (i = 0; i < fsg_model_n_word(fsg); ++i) {
>         char const *word;
>         int32 wid;
> 
>-        word = fsg_model_word_str(fsg, i);
>+        word = fsg_model_word_str(fsg, i);// should use trans
>         wid = dict_wordid(dict, word);
>         if (wid == BAD_S3WID) {
>             E_WARN("The word '%s' is missing in the dictionary. Trying to create new phoneme \n", word);
>             if (!dict->ngram_g2p_model) {
>                 E_ERROR("NO dict->ngram_g2p_model. Aborting..");
>                 return FALSE;
>             }
> 
>-            int new_wid = dict_add_g2p_word(dict, word);
>+            int new_wid = dict_add_g2p_word(dict, word); // deal with g2p
>             if (new_wid > 0){
>                 /* Now we also have to add it to dict2pid. */
>                 dict2pid_add_word(ps_search_dict2pid(fsgs), new_wid);
>             } else {
>                 E_ERROR("Exiting... \n");
>                 return FALSE;
>             }
>         }
>@@ -167,24 +167,25 @@ fsg_search_add_altpron(fsg_search_t *fsgs, fsg_model_t *fsg)
>     int n_alt, n_word;
>     int i;
> 
>     dict = ps_search_dict(fsgs);
>     /* Scan FSG's vocabulary for words that have alternate pronunciations. */
>     n_alt = 0;
>     n_word = fsg_model_n_word(fsg);
>     for (i = 0; i < n_word; ++i) {
>-        char const *word;
>+        vocab_t *vocab;
>         int32 wid;
> 
>-        word = fsg_model_word_str(fsg, i);
>-        wid = dict_wordid(dict, word);
>+        vocab = fsg_model_word_vocab(fsg, i); // get vocab
>+        if (!vocab) continue;
>+        wid = dict_wordid(dict, vocab->word_trans);
>         if (wid != BAD_S3WID) {
>             while ((wid = dict_nextalt(dict, wid)) != BAD_S3WID) {
>-	        n_alt += fsg_model_add_alt(fsg, word, dict_wordstr(dict, wid));
>+	        n_alt += fsg_model_add_alt(fsg, vocab->word_origin, dict_wordstr(dict, wid));
>     	    }
>     	}
>     }
> 
>     E_INFO("Added %d alternate word transitions\n", n_alt);
>     return n_alt;
> }
> 
>@@ -467,17 +468,17 @@ fsg_search_pnode_exit(fsg_search_t *fsgs, fsg_pnode_t * pnode)
>     /*
>      * Check if this is filler or single phone word; these do not model right
>      * context (i.e., the exit score applies to all right contexts).
>      */
>     if (fsg_model_is_filler(fsgs->fsg, wid)
>         /* FIXME: This might be slow due to repeated calls to dict_to_id(). */
>         || (dict_is_single_phone(ps_search_dict(fsgs),
>                                    dict_wordid(ps_search_dict(fsgs),
>-                                               fsg_model_word_str(fsgs->fsg, wid))))) {
>+                                               fsg_model_word_str(fsgs->fsg, wid))))) { // trans
>         /* Create a dummy context structure that applies to all right contexts */
>         fsg_pnode_add_all_ctxt(&ctxt);
> 
>         /* Create history table entry for this word exit */
>         fsg_history_entry_add(fsgs->history,
>                               fl,
>                               fsgs->frame,
>                               hmm_out_score(hmm),
>@@ -999,19 +1000,17 @@ fsg_search_hyp(ps_search_t *search, int32 *out_score, int32 *out_is_final)
>         fsg_link_t *fl = fsg_hist_entry_fsglink(hist_entry);
>         char const *baseword;
>         int32 wid;
> 
>         bp = fsg_hist_entry_pred(hist_entry);
>         wid = fsg_link_wid(fl);
>         if (wid < 0 || fsg_model_is_filler(fsgs->fsg, wid))
>             continue;
>-        baseword = dict_basestr(dict,
>-                                dict_wordid(dict,
>-                                            fsg_model_word_str(fsgs->fsg, wid)));
>+        baseword = fsg_model_word_str_ori(fsgs->fsg, wid); // use origin
>         len += strlen(baseword) + 1;
>     }
>     
>     ckd_free(search->hyp_str);
>     if (len == 0) {
> 	search->hyp_str = NULL;
> 	return search->hyp_str;
>     }
>@@ -1024,19 +1023,17 @@ fsg_search_hyp(ps_search_t *search, int32 *out_score, int32 *out_is_final)
>         fsg_link_t *fl = fsg_hist_entry_fsglink(hist_entry);
>         char const *baseword;
>         int32 wid;
> 
>         bp = fsg_hist_entry_pred(hist_entry);
>         wid = fsg_link_wid(fl);
>         if (wid < 0 || fsg_model_is_filler(fsgs->fsg, wid))
>             continue;
>-        baseword = dict_basestr(dict,
>-                                dict_wordid(dict,
>-                                            fsg_model_word_str(fsgs->fsg, wid)));
>+        baseword = fsg_model_word_str_ori(fsgs->fsg, wid);// use origin
>         len = strlen(baseword);
>         c -= len;
>         memcpy(c, baseword, len);
>         if (c > search->hyp_str) {
>             --c;
>             *c = ' ';
>         }
>     }
>@@ -1048,17 +1045,17 @@ static void
> fsg_seg_bp2itor(ps_seg_t *seg, fsg_hist_entry_t *hist_entry)
> {
>     fsg_search_t *fsgs = (fsg_search_t *)seg->search;
>     fsg_hist_entry_t *ph = NULL;
>     int32 bp;
> 
>     if ((bp = fsg_hist_entry_pred(hist_entry)) >= 0)
>         ph = fsg_history_entry_get(fsgs->history, bp);
>-    seg->word = fsg_model_word_str(fsgs->fsg, hist_entry->fsglink->wid);
>+    seg->word = fsg_model_word_str(fsgs->fsg, hist_entry->fsglink->wid); // use_trans
>     seg->ef = fsg_hist_entry_frame(hist_entry);
>     seg->sf = ph ? fsg_hist_entry_frame(ph) + 1 : 0;
>     /* This is kind of silly but it happens for null transitions. */
>     if (seg->sf > seg->ef) seg->sf = seg->ef;
>     seg->prob = 0; /* Bogus value... */
>     /* "Language model" score = transition probability. */
>     seg->lback = 1;
>     seg->lscr = fsg_link_logs2prob(hist_entry->fsglink) >> SENSCR_SHIFT;
>@@ -1233,34 +1230,34 @@ find_start_node(fsg_search_t *fsgs, ps_lattice_t *dag)
>     ps_latnode_t *node;
>     glist_t start = NULL;
>     int nstart = 0;
> 
>     /* Look for all nodes starting in frame zero with some exits. */
>     for (node = dag->nodes; node; node = node->next) {
>         if (node->sf == 0 && node->exits) {
>             E_INFO("Start node %s.%d:%d:%d\n",
>-                   fsg_model_word_str(fsgs->fsg, node->wid),
>+                   fsg_model_word_str_ori(fsgs->fsg, node->wid),// word_orign
>                    node->sf, node->fef, node->lef);
>             start = glist_add_ptr(start, node);
>             ++nstart;
>         }
>     }
> 
>     /* If there was more than one start node candidate, then we need
>      * to create an artificial start node with epsilon transitions to
>      * all of them. */
>     if (nstart == 1) {
>         node = gnode_ptr(start);
>     }
>     else {
>         gnode_t *st;
>         int wid;
> 
>-        wid = fsg_model_word_add(fsgs->fsg, "<s>");
>+        wid = fsg_model_word_add(fsgs->fsg, "<s>", 0);
>         if (fsgs->fsg->silwords)
>             bitvec_set(fsgs->fsg->silwords, wid);
>         node = new_node(dag, fsgs->fsg, 0, 0, wid, -1, 0);
>         for (st = start; st; st = gnode_next(st))
>             ps_lattice_link(dag, node, gnode_ptr(st), 0, 0);
>     }
>     glist_free(start);
>     return node;
>@@ -1272,17 +1269,17 @@ find_end_node(fsg_search_t *fsgs, ps_lattice_t *dag)
>     ps_latnode_t *node;
>     glist_t end = NULL;
>     int nend = 0;
> 
>     /* Look for all nodes ending in last frame with some entries. */
>     for (node = dag->nodes; node; node = node->next) {
>         if (node->lef == dag->n_frames - 1 && node->entries) {
>             E_INFO("End node %s.%d:%d:%d (%d)\n",
>-                   fsg_model_word_str(fsgs->fsg, node->wid),
>+                   fsg_model_word_str_ori(fsgs->fsg, node->wid),// word_origin
>                    node->sf, node->fef, node->lef, node->info.best_exit);
>             end = glist_add_ptr(end, node);
>             ++nend;
>         }
>     }
> 
>     if (nend == 1) {
>         node = gnode_ptr(end);
>@@ -1297,26 +1294,26 @@ find_end_node(fsg_search_t *fsgs, ps_lattice_t *dag)
>             if (node->lef > ef && node->entries) {
>                 last = node;
>                 ef = node->lef;
>             }
>         }
>         node = last;
>         if (node)
>             E_INFO("End node %s.%d:%d:%d (%d)\n",
>-                   fsg_model_word_str(fsgs->fsg, node->wid),
>+                   fsg_model_word_str_ori(fsgs->fsg, node->wid),// word_origin
>                    node->sf, node->fef, node->lef, node->info.best_exit);
>     }    
>     else {
>         /* If there was more than one end node candidate, then we need
>          * to create an artificial end node with epsilon transitions
>          * out of all of them. */
>         gnode_t *st;
>         int wid;
>-        wid = fsg_model_word_add(fsgs->fsg, "</s>");
>+        wid = fsg_model_word_add(fsgs->fsg, "</s>", 0);
>         if (fsgs->fsg->silwords)
>             bitvec_set(fsgs->fsg->silwords, wid);
>         node = new_node(dag, fsgs->fsg, fsgs->frame, fsgs->frame, wid, -1, 0);
>         /* Use the "best" (in reality it will be the only) exit link
>          * score from this final node as the link score. */
>         for (st = end; st; st = gnode_next(st)) {
>             ps_latnode_t *src = gnode_ptr(st);
>             ps_lattice_link(dag, src, node, src->info.best_exit, fsgs->frame);
>@@ -1496,26 +1493,26 @@ fsg_search_lattice(ps_search_t *search)
>     }
>     if ((dag->end = find_end_node(fsgs, dag)) == NULL) {
> 	E_WARN("Failed to find the end node\n");
>         goto error_out;
>     }
> 
> 
>     E_INFO("lattice start node %s.%d end node %s.%d\n",
>-           fsg_model_word_str(fsg, dag->start->wid), dag->start->sf,
>-           fsg_model_word_str(fsg, dag->end->wid), dag->end->sf);
>+           fsg_model_word_str_ori(fsg, dag->start->wid), dag->start->sf, // word_origin
>+           fsg_model_word_str_ori(fsg, dag->end->wid), dag->end->sf); // word_origin
>     /* FIXME: Need to calculate final_node_ascr here. */
> 
>     /*
>      * Convert word IDs from FSG to dictionary.
>      */
>     for (node = dag->nodes; node; node = node->next) {
>         node->wid = dict_wordid(dag->search->dict,
>-                                fsg_model_word_str(fsg, node->wid));
>+                                fsg_model_word_str(fsg, node->wid)); //word_trans
>         node->basewid = dict_basewid(dag->search->dict, node->wid);
>     }
> 
>     /*
>      * Now we are done, because the links in the graph are uniquely
>      * defined by the history table.  However we should remove any
>      * nodes which are not reachable from the end node of the FSG.
>      * Everything is reachable from the start node by definition.
>diff --git a/media/sphinxbase/moz.build b/media/sphinxbase/moz.build
>index ab16dab..4a5ade2 100644
>--- a/media/sphinxbase/moz.build
>+++ b/media/sphinxbase/moz.build
>@@ -65,20 +65,23 @@ SOURCES += [
>     'src/libsphinxbase/util/utf8.c',
> ]
> 
> # Suppress warnings in third-party code.
> if CONFIG['GNU_CC']:
>     CFLAGS += [
>         '-Wno-parentheses',
>         '-Wno-sign-compare',
>     ]
>+    CFLAGS += CONFIG['MOZ_ICU_CFLAGS']
> 
> # Add define required of third party code.
> if CONFIG['GNU_CC']:
>     DEFINES['HAVE_CONFIG_H'] = True
> 
> if CONFIG['GKMEDIAS_SHARED_LIBRARY']:
>     NO_VISIBILITY_FLAGS = True,
> 
> ALLOW_COMPILER_WARNINGS = True
> 
> FINAL_LIBRARY = 'gkmedias'
>+
>+USE_LIBS += ['icu']
>diff --git a/media/sphinxbase/sphinxbase/fsg_model.h b/media/sphinxbase/sphinxbase/fsg_model.h
>index cb82b90..c48314f 100644
>--- a/media/sphinxbase/sphinxbase/fsg_model.h
>+++ b/media/sphinxbase/sphinxbase/fsg_model.h
>@@ -77,28 +77,37 @@ typedef struct fsg_link_s {
> #define fsg_link_logs2prob(l)	((l)->logs2prob)
> 
> /**
>  * Adjacency list (opaque) for a state in an FSG.
>  */
> typedef struct trans_list_s trans_list_t;
> 
> /**
>+ * Vocabulary object for this FSG.
>+ */
>+typedef struct vocab_s {
>+    char *word_origin;
>+    char *word_trans;
>+} vocab_t;
>+
>+
>+/**
>  * Word level FSG definition.
>  * States are simply integers 0..n_state-1.
>  * A transition emits a word and has a given probability of being taken.
>  * There can also be null or epsilon transitions, with no associated emitted
>  * word.
>  */
> typedef struct fsg_model_s {
>     int refcount;       /**< Reference count. */
>     char *name;		/**< A unique string identifier for this FSG */
>     int32 n_word;       /**< Number of unique words in this FSG */
>     int32 n_word_alloc; /**< Number of words allocated in vocab */
>-    char **vocab;       /**< Vocabulary for this FSG. */
>+    vocab_t* vocab;     /**< Collection of vocabularies for this FSG. */
>     bitvec_t *silwords; /**< Indicates which words are silence/fillers. */
>     bitvec_t *altwords; /**< Indicates which words are pronunciation alternates. */
>     logmath_t *lmath;	/**< Pointer to log math computation object. */
>     int32 n_state;	/**< number of states in FSG */
>     int32 start_state;	/**< Must be in the range [0..n_state-1] */
>     int32 final_state;	/**< Must be in the range [0..n_state-1] */
>     float32 lw;		/**< Language weight that's been applied to transition
> 			   logprobs */
>@@ -109,17 +118,19 @@ typedef struct fsg_model_s {
> /* Access macros */
> #define fsg_model_name(f)		((f)->name)
> #define fsg_model_n_state(f)		((f)->n_state)
> #define fsg_model_start_state(f)	((f)->start_state)
> #define fsg_model_final_state(f)	((f)->final_state)
> #define fsg_model_log(f,p)		logmath_log((f)->lmath, p)
> #define fsg_model_lw(f)			((f)->lw)
> #define fsg_model_n_word(f)		((f)->n_word)
>-#define fsg_model_word_str(f,wid)       (wid == -1 ? "(NULL)" : (f)->vocab[wid])
>+#define fsg_model_word_vocab(f,wid)     (wid == -1 ? NULL : &((f)->vocab[wid]))
>+#define fsg_model_word_str(f,wid)       (wid == -1 ? "(NULL)" : (f)->vocab[wid].word_trans)
>+#define fsg_model_word_str_ori(f,wid)   (wid == -1 ? "(NULL)" : (f)->vocab[wid].word_origin)
> 
> /**
>  * Iterator over arcs.
>  */
> typedef struct fsg_arciter_s fsg_arciter_t;
> 
> /**
>  * Have silence transitions been added?
>@@ -208,17 +219,26 @@ SPHINXBASE_EXPORT
> int fsg_model_free(fsg_model_t *fsg);
> 
> /**
>  * Add a word to the FSG vocabulary.
>  *
>  * @return Word ID for this new word.
>  */
> SPHINXBASE_EXPORT
>-int fsg_model_word_add(fsg_model_t *fsg, char const *word);
>+int fsg_model_word_add(fsg_model_t *fsg, char const *word, int is_pronounce);
>+
>+/**
>+ * Add an alternate word to the FSG vocabulary.
>+ *
>+ * @return Word ID for this new word.
>+ */
>+SPHINXBASE_EXPORT
>+int fsg_model_alt_word_add(fsg_model_t * fsg, char const *baseword, char const *word);
>+
> 
> /**
>  * Look up a word in the FSG vocabulary.
>  *
>  * @return Word ID for this word
>  */
> SPHINXBASE_EXPORT
> int fsg_model_word_id(fsg_model_t *fsg, char const *word);
>diff --git a/media/sphinxbase/src/libsphinxbase/lm/fsg_model.c b/media/sphinxbase/src/libsphinxbase/lm/fsg_model.c
>index 3748977..bd30c6d 100644
>--- a/media/sphinxbase/src/libsphinxbase/lm/fsg_model.c
>+++ b/media/sphinxbase/src/libsphinxbase/lm/fsg_model.c
>@@ -48,16 +48,23 @@
> #include "sphinxbase/err.h"
> #include "sphinxbase/pio.h"
> #include "sphinxbase/ckd_alloc.h"
> #include "sphinxbase/prim_type.h"
> #include "sphinxbase/strfuncs.h"
> #include "sphinxbase/hash_table.h"
> #include "sphinxbase/fsg_model.h"
> 
>+/* icu headers */
>+#include <unicode/ucasemap.h>
>+#include <unicode/ucol.h>
>+#include <unicode/uiter.h>
>+#include <unicode/ustring.h>
>+#include <unicode/unorm.h>
>+
> /**
>  * Adjacency list (opaque) for a state in an FSG.
>  *
>  * Actually we use hash tables so that random access is a bit faster.
>  * Plus it allows us to make the lookup code a bit less ugly.
>  */
> 
> struct trans_list_s {
>@@ -80,16 +87,54 @@ struct fsg_arciter_s {
> #define FSG_MODEL_S_DECL			"S"
> #define FSG_MODEL_START_STATE_DECL	"START_STATE"
> #define FSG_MODEL_F_DECL			"F"
> #define FSG_MODEL_FINAL_STATE_DECL	"FINAL_STATE"
> #define FSG_MODEL_T_DECL			"T"
> #define FSG_MODEL_TRANSITION_DECL	"TRANSITION"
> #define FSG_MODEL_COMMENT_CHAR		'#'
> 
>+static char*
>+get_lower_case(const char *name){
>+
>+    UCaseMap *caseMap;
>+    UErrorCode status = U_ZERO_ERROR;
>+    char *utf8Dest;
>+    const char *utf8Src = name;
>+    int32_t utf8SrcLen = strlen(utf8Src);
>+    int32_t destCapacity = utf8SrcLen + 1;
>+    int32_t destLen;
>+
>+    status = U_ZERO_ERROR;
>+    utf8Dest = (char*)ckd_malloc(sizeof(char)*destCapacity);
>+    caseMap = ucasemap_open("en-US", 0, &status);
>+
>+    if (U_FAILURE(status)) {
>+        ckd_free(utf8Dest);
>+        return NULL;
>+    }
>+
>+    destLen = ucasemap_utf8ToLower(caseMap,
>+                                   utf8Dest,
>+                                   destCapacity,
>+                                   utf8Src,
>+                                   utf8SrcLen,
>+                                   &status);
>+
>+    if (status == U_BUFFER_OVERFLOW_ERROR) {
>+        status = U_ZERO_ERROR; /* ICU calls are no-ops on a failure status */
>+        utf8Dest = (char*)ckd_realloc(utf8Dest, sizeof(char)*(destLen + 1));
>+        destLen = ucasemap_utf8ToLower(caseMap,
>+                                       utf8Dest,
>+                                       destLen + 1,
>+                                       utf8Src,
>+                                       utf8SrcLen,
>+                                       &status);
>+    }
>+
>+    ucasemap_close(caseMap);
>+    return utf8Dest;
>+}
> 
> static int32
> nextline_str2words(FILE * fp, int32 * lineno,
>                    char **lineptr, char ***wordptr)
> {
>     for (;;) {
>         size_t len;
>         int32 n;
>@@ -375,27 +420,27 @@ fsg_arciter_free(fsg_arciter_t * itor)
> 
> int
> fsg_model_word_id(fsg_model_t * fsg, char const *word)
> {
>     int wid;
> 
>     /* Search for an existing word matching this. */
>     for (wid = 0; wid < fsg->n_word; ++wid) {
>-        if (0 == strcmp(fsg->vocab[wid], word))
>+        if (0 == strcmp(fsg->vocab[wid].word_origin, word))
>             break;
>     }
>     /* If not found, add this to the vocab. */
>     if (wid == fsg->n_word)
>         return -1;
>     return wid;
> }
> 
> int
>-fsg_model_word_add(fsg_model_t * fsg, char const *word)
>+fsg_model_word_add(fsg_model_t * fsg, char const *word, int is_pronounce)
> {
>     int wid, old_size;
> 
>     /* Search for an existing word matching this. */
>     wid = fsg_model_word_id(fsg, word);
>     /* If not found, add this to the vocab. */
>     if (wid == -1) {
>         wid = fsg->n_word;
>@@ -408,31 +453,69 @@ fsg_model_word_add(fsg_model_t * fsg, char const *word)
>             if (fsg->silwords)
>                 fsg->silwords =
>                     bitvec_realloc(fsg->silwords, old_size, fsg->n_word_alloc);
>             if (fsg->altwords)
>                 fsg->altwords =
>                     bitvec_realloc(fsg->altwords, old_size, fsg->n_word_alloc);
>         }
>         ++fsg->n_word;
>-        fsg->vocab[wid] = ckd_salloc(word);
>+        fsg->vocab[wid].word_origin = ckd_salloc(word);
>+        if (is_pronounce) {
>+            fsg->vocab[wid].word_trans = get_lower_case(word);
>+        }
>+        else {
>+            fsg->vocab[wid].word_trans = ckd_salloc(word);
>+        }
>     }
>     return wid;
> }
> 
> int
>+fsg_model_alt_word_add(fsg_model_t * fsg, char const *baseword, char const *word)
>+{
>+    int wid, old_size;
>+
>+    /* Search for an existing word matching this. */
>+    wid = fsg_model_word_id(fsg, word);
>+    /* If not found, add this to the vocab. */
>+    if (wid == -1) {
>+        wid = fsg->n_word;
>+        if (fsg->n_word == fsg->n_word_alloc) {
>+            old_size = fsg->n_word_alloc;
>+            fsg->n_word_alloc += 10;
>+            fsg->vocab = ckd_realloc(fsg->vocab,
>+                                     fsg->n_word_alloc *
>+                                     sizeof(*fsg->vocab));
>+            if (fsg->silwords)
>+                fsg->silwords =
>+                    bitvec_realloc(fsg->silwords, old_size, fsg->n_word_alloc);
>+            if (fsg->altwords)
>+                fsg->altwords =
>+                    bitvec_realloc(fsg->altwords, old_size, fsg->n_word_alloc);
>+        }
>+        ++fsg->n_word;
>+        fsg->vocab[wid].word_origin = ckd_salloc(baseword);
>+        fsg->vocab[wid].word_trans = ckd_salloc(word);
>+    }
>+    return wid;
>+}
>+
>+
>+int
> fsg_model_add_silence(fsg_model_t * fsg, char const *silword,
>                       int state, float32 silprob)
> {
>     int32 logsilp;
>     int n_trans, silwid, src;
> 
>     E_INFO("Adding silence transitions for %s to FSG\n", silword);
> 
>-    silwid = fsg_model_word_add(fsg, silword);
>+    silwid = fsg_model_word_add(fsg, silword, 0);
>     logsilp = (int32) (logmath_log(fsg->lmath, silprob) * fsg->lw);
>     if (fsg->silwords == NULL)
>         fsg->silwords = bitvec_alloc(fsg->n_word_alloc);
>     bitvec_set(fsg->silwords, silwid);
> 
>     n_trans = 0;
>     if (state == -1) {
>         for (src = 0; src < fsg->n_state; src++) {
>@@ -453,23 +536,23 @@ int
> fsg_model_add_alt(fsg_model_t * fsg, char const *baseword,
>                   char const *altword)
> {
>     int i, basewid, altwid;
>     int ntrans;
> 
>     /* FIXME: This will get slow, eventually... */
>     for (basewid = 0; basewid < fsg->n_word; ++basewid)
>-        if (0 == strcmp(fsg->vocab[basewid], baseword))
>+        if (0 == strcmp(fsg->vocab[basewid].word_origin, baseword))
>             break;
>     if (basewid == fsg->n_word) {
>         E_ERROR("Base word %s not present in FSG vocabulary!\n", baseword);
>         return -1;
>     }
>-    altwid = fsg_model_word_add(fsg, altword);
>+    altwid = fsg_model_alt_word_add(fsg, baseword, altword);
>     if (fsg->altwords == NULL)
>         fsg->altwords = bitvec_alloc(fsg->n_word_alloc);
>     bitvec_set(fsg->altwords, altwid);
>     if (fsg_model_is_filler(fsg, basewid)) {
>          if (fsg->silwords == NULL)
> 	      fsg->silwords = bitvec_alloc(fsg->n_word_alloc);
>          bitvec_set(fsg->silwords, altwid);
>     }
>@@ -708,17 +791,17 @@ fsg_model_read(FILE * fp, logmath_t * lmath, float32 lw)
>     /* Now create a string table from the "dictionary" */
>     fsg->n_word = hash_table_inuse(vocab);
>     fsg->n_word_alloc = fsg->n_word + 10;       /* Pad it a bit. */
>     fsg->vocab = ckd_calloc(fsg->n_word_alloc, sizeof(*fsg->vocab));
>     for (itor = hash_table_iter(vocab); itor;
>          itor = hash_table_iter_next(itor)) {
>         char const *word = hash_entry_key(itor->ent);
>         int32 wid = (int32) (long) hash_entry_val(itor->ent);
>-        fsg->vocab[wid] = (char *) word;
>+        fsg->vocab[wid].word_origin = (char *) word;
>+        fsg->vocab[wid].word_trans = ckd_salloc(word);
>     }
>     hash_table_free(vocab);
> 
>     /* Do transitive closure on null transitions */
>     nulls = fsg_model_null_trans_closure(fsg, nulls);
>     glist_free(nulls);
> 
>     ckd_free(lineptr);
>@@ -787,22 +870,24 @@ fsg_model_free(fsg_model_t * fsg)
>     int i;
> 
>     if (fsg == NULL)
>         return 0;
> 
>     if (--fsg->refcount > 0)
>         return fsg->refcount;
> 
>-    for (i = 0; i < fsg->n_word; ++i)
>-        ckd_free(fsg->vocab[i]);
>+    for (i = 0; i < fsg->n_word; ++i) {
>+        ckd_free(fsg->vocab[i].word_trans);
>+        ckd_free(fsg->vocab[i].word_origin);
>+    }
>+    ckd_free(fsg->vocab);
>     for (i = 0; i < fsg->n_state; ++i)
>         trans_list_free(fsg, i);
>     ckd_free(fsg->trans);
>-    ckd_free(fsg->vocab);
>     listelem_alloc_free(fsg->link_alloc);
>     bitvec_free(fsg->silwords);
>     bitvec_free(fsg->altwords);
>     ckd_free(fsg->name);
>     ckd_free(fsg);
>     return 0;
> }
> 
>diff --git a/media/sphinxbase/src/libsphinxbase/lm/jsgf.c b/media/sphinxbase/src/libsphinxbase/lm/jsgf.c
>index 90e161c..7400537 100644
>--- a/media/sphinxbase/src/libsphinxbase/lm/jsgf.c
>+++ b/media/sphinxbase/src/libsphinxbase/lm/jsgf.c
>@@ -545,17 +545,17 @@ jsgf_build_fsg_internal(jsgf_t *grammar, jsgf_rule_t *rule,
>         jsgf_link_t *link = gnode_ptr(gn);
> 
>         if (link->atom) {
>             if (jsgf_atom_is_rule(link->atom)) {
>                 fsg_model_null_trans_add(fsg, link->from, link->to,
>                                         logmath_log(lmath, link->atom->weight));
>             }
>             else {
>-                int wid = fsg_model_word_add(fsg, link->atom->name);
>+                int wid = fsg_model_word_add(fsg, link->atom->name, 1);
>                 fsg_model_trans_add(fsg, link->from, link->to,
>                                    logmath_log(lmath, link->atom->weight), wid);
>             }
>         }
>         else {
>             fsg_model_null_trans_add(fsg, link->from, link->to, 0);
>         }            
>     }
>diff --git a/media/sphinxbase/src/libsphinxbase/lm/jsgf_internal.h b/media/sphinxbase/src/libsphinxbase/lm/jsgf_internal.h
>index a5cbc98..68599ae 100644
>--- a/media/sphinxbase/src/libsphinxbase/lm/jsgf_internal.h
>+++ b/media/sphinxbase/src/libsphinxbase/lm/jsgf_internal.h
>@@ -117,16 +117,17 @@ struct jsgf_link_s {
>     jsgf_atom_t *atom; /**< Name, tags, weight */
>     int from;          /**< From state */
>     int to;            /**< To state */
> };
> 
> #define jsgf_atom_is_rule(atom) ((atom)->name[0] == '<')
> 
> void jsgf_add_link(jsgf_t *grammar, jsgf_atom_t *atom, int from, int to);
>+jsgf_atom_t *jsgf_token_new(char *name, float weight);
> jsgf_atom_t *jsgf_atom_new(char *name, float weight);
> jsgf_atom_t *jsgf_kleene_new(jsgf_t *jsgf, jsgf_atom_t *atom, int plus);
> jsgf_rule_t *jsgf_optional_new(jsgf_t *jsgf, jsgf_rhs_t *exp);
> jsgf_rule_t *jsgf_define_rule(jsgf_t *jsgf, char *name, jsgf_rhs_t *rhs, int is_public);
> jsgf_rule_t *jsgf_import_rule(jsgf_t *jsgf, char *name);
> 
> int jsgf_atom_free(jsgf_atom_t *atom);
> int jsgf_rule_free(jsgf_rule_t *rule);
>
Attachment #8667790 - Attachment is obsolete: true
Hi Kelly, in this patch I changed the data structure of fsg_model_s to support upper case grammars; here is the reasoning:

-Make sure applications get case-sensitive output:
  When doing recognition, the engine needs lower case words to search the dictionary, and it needs to map the words back to their original form when returning recognition results, so that applications get the expected words (e.g., <greeting> = Hello; will return "Hello" instead of "hello").
  To implement that, I changed the structure of fsg_model_s to store each word in both forms (original and lower case). Does this change make sense to you?
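The dual-spelling design can be sketched in JavaScript (the names `vocab`, `origin`, and `trans` here are illustrative only; the real implementation is the C `fsg_model_word_add` in the patch):

```javascript
// Toy model of the patch's vocab_t design: each entry keeps the
// original spelling (word_origin) for results and a lowered form
// (word_trans) for the dictionary search. Illustrative only.
var vocab = [];

function wordAdd(word) {
  // Reuse an existing entry with the same original spelling.
  for (var i = 0; i < vocab.length; i++) {
    if (vocab[i].origin === word) return i;
  }
  vocab.push({ origin: word, trans: word.toLocaleLowerCase() });
  return vocab.length - 1;
}

var wid = wordAdd('Hello');
console.log(vocab[wid].trans);   // 'hello'  -- used for the lexicon lookup
console.log(vocab[wid].origin);  // 'Hello'  -- reported in the result
```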

There are two problems that need to be considered under this design:
   1. Compatibility problem:
      PocketSphinx provides fsg_model_read and fsg_model_write functions to let the engine read/write an FSG model from/to a file [1]. Should we change the file format when writing the file? That would make our FSG model files incompatible with other PocketSphinx engines, though we don't use these functions in our code right now.
   2. Multi-result problem:
      We don't support multiple recognition results in the current implementation, so when there is a grammar "<greeting> = Hello | hello;", which result should we return, "Hello" or "hello"? Maybe we should replace ps_get_hyp with ps_nbest_hyp? [2]


[1] http://cca.nuigroup.com/docs/0.6/fsg__model_8h.html#a3168d49e09047a8fac127d40645404a7
[2] https://dxr.mozilla.org/mozilla-central/source/dom/media/webspeech/recognition/PocketSphinxSpeechRecognitionService.cpp#114
Flags: needinfo?(kdavis)
Here are my thoughts on the two issues:

1. Compatibility problem - I think changing the Pocketsphinx code to break compatibility with future and past versions of Pocketsphinx is not the right thing to do without a very good reason. I think letting the grammar accept uppercase letters doesn't reach that bar of a "very good reason" as there are easier solutions, for example using toLocaleLowerCase() in the JavaScript calling the WebSpeech API, that have the same effect without the downsides.

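For illustration, the JS-side workaround could look something like the sketch below; `lowercaseJsgfExpansions` is a hypothetical helper, and this naive split-on-`;`/`=` version deliberately ignores the JSGF parsing subtleties (rule names, Kleene star, imports) raised later in this bug:

```javascript
// Naive sketch of the page-side workaround: lowercase only the text
// after the '=' of each rule before handing the grammar to
// addFromString(). A real fix would need a proper JSGF tokenizer.
function lowercaseJsgfExpansions(grammar) {
  return grammar.split(';').map(function (part) {
    var eq = part.indexOf('=');
    if (eq === -1) return part;  // header and declarations: untouched
    return part.slice(0, eq + 1) + part.slice(eq + 1).toLocaleLowerCase();
  }).join(';');
}

var g = lowercaseJsgfExpansions(
    '#JSGF V1.0; grammar test; public <simple> = Hello;');
console.log(g);  // '#JSGF V1.0; grammar test; public <simple> = hello;'
```

The page would then call `new SpeechGrammarList().addFromString(g, 1)` with the lowered string, sidestepping the crash at the cost of losing the original casing in results.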
2. Multi-result problem - There are various problems you are considering at once:

a. Capitonym - A word that changes meaning and sometimes pronunciation when capitalized. For example, in German "Laut" means sound while "laut" means loud. If a capitonym changes pronunciation when capitalized, Pocketsphinx simply can not handle this case as the Pocketsphinx phonetic dictionary is all in lowercase. So, I think we shouldn't worry about this while using Pocketsphinx as Pocketsphinx is not equipped to deal with this case.

b. Returning multiple recognition results - Changing ps_get_hyp to ps_nbest_hyp. Unfortunately this change breaks other things in the API. For example, Bug 1185235, would have to be revisited.
Flags: needinfo?(kdavis)
(In reply to kdavis from comment #26)
> Here are my thoughts on the two issues:
> 
> 1. Compatibility problem - I think changing the Pocketsphinx code to break
> compatibility with future and past versions of Pocketsphinx is not the right
> thing to do without a very good reason. I think letting the grammar accept
> uppercase letters doesn't reach that bar of a "very good reason" as there
> are easier solutions, for example using toLocaleLowerCase() in the
> JavaScript calling the WebSpeech API, that have the same effect without the
> downsides.
> 

Hi Kelly, thanks for your reply. 

-I've considered doing the case conversion outside of the PocketSphinx source code; however, it's a little complex to implement.
  1. To change the grammar to lower case, we have to parse the JSGF and convert only the word tokens (some tokens, like capitalized rule names, are allowed to stay as-is).
  2. When getting the results, we need to map the lower case result back to the original one, which is also complex, because JSGF has special tokens like the Kleene star, import rules, alternatives, etc.
  And this parsing has already been done once in the PocketSphinx engine, so I think it is duplicated work.
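A minimal sketch of the result-mapping half of option 1, assuming a hypothetical `originals` table built while lowering the grammar; as noted, it cannot distinguish two spellings that lower to the same token:

```javascript
// Remember each word's original spelling while preprocessing the
// grammar, then restore it in the engine's lower-case transcript.
// Hypothetical helper names; collisions ("Hello" vs "hello") and
// JSGF operators are exactly the complications described above.
var originals = Object.create(null);

function rememberWord(word) {
  var lower = word.toLocaleLowerCase();
  if (!(lower in originals)) originals[lower] = word;
  return lower;
}

function restoreTranscript(transcript) {
  return transcript.split(' ').map(function (w) {
    return (w in originals) ? originals[w] : w;
  }).join(' ');
}

rememberWord('Hello');
console.log(restoreTranscript('hello world'));  // 'Hello world'
```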

-Another possible solution is to convert the words when doing the dictionary query [1].

Can I have your advice on which way is better? Thank you.

[1] https://dxr.mozilla.org/mozilla-central/source/media/pocketsphinx/src/dict.c#397

> 2.  Multi-result problem - There are various problems your are considering
> at once:
> 
> a. Capitonym - A word that changes meaning and sometimes pronunciation when
> capitalized. For example, in German "Laut" means sound while "laut" means
> loud. If a capitonym changes pronunciation when capitalized, Pocketsphinx
> simply can not handle this case as the Pocketsphinx phonetic dictionary is
> all in lowercase. So, I think we shouldn't worry about this while using
> Pocketsphinx as Pocketsphinx is not equipped to deal with this case.
> 
> b. Returning multiple recognition results - Changing ps_get_hyp to
> ps_nbest_hyp. Unfortunately this change breaks other things in the API. For
> example, Bug 1185235, would have to be revisited.

Do you think returning multiple recognition results is necessary?
Flags: needinfo?(kdavis)
> Can I have your advice of which way is better?

I think leaving it as it is for now is best.

> Do you think returning multiple recognition results is necessary?

No. Later we should add this, but we should learn to walk before we run.
Flags: needinfo?(kdavis)
(In reply to kdavis from comment #28)
> > Can I have your advice of which way is better?
> 
> I think leaving it as it is for now is best.
> 

Hi Kelly, in the current code, if we add an upper case word to PocketSphinx, it will be handled by g2p (Bug 1180113), which kind of solves the upper case problem on some level.

Should I change the status to "RESOLVED WONTFIX"?

> > Do you think returning multiple recognition results is necessary?
> 
> No. Later we should add this, but we should learn to walk before we run.
Flags: needinfo?(kdavis)
(In reply to Kevin Chen from comment #29)
> (In reply to kdavis from comment #28)
> > > Can I have your advice of which way is better?
> > 
> > I think leaving it as it is for now is best.
> > 
> 
> Hi Kelly, in current code, if we add a upper case word to pocketsphinx, it
> will be handled by g2p (Bug 1180113), it kind of solve upper case problem on
> some level.

As far as I know this is not the case. The g2p code is a machine learning model
trained on the dictionary. Thus it has the same problems the dictionary does: it
cannot deal with upper case words.

> Should I change the status to "RESOLVED WONTFIX" ?

Maybe we can leave this open, but not try and deal with it in the 2.5 timeframe.
Flags: needinfo?(kdavis)
Assignee: kechen → nobody
Severity: normal → S3