Closed Bug 1182384 Opened 9 years ago Closed 9 years ago

Include digits in numerical form to standard english pocketsphinx dictionary

Categories

(Core :: Web Speech, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
firefox42 --- fixed

People

(Reporter: anatal, Assigned: anatal)

Details

Attachments

(1 file)

Standard pocketsphinx dictionary does not support single digits in form of numbers.
Summary: Include digits to standard dictionary → Include digits in numeric form to standard dictionary
Summary: Include digits in numeric form to standard dictionary → Include digits in numerical form to standard dictionary
Summary: Include digits in numerical form to standard dictionary → Include digits in numerical form to standard pocketsphinx dictionary
Summary: Include digits in numerical form to standard pocketsphinx dictionary → Include digits in numerical form to standard english pocketsphinx dictionary
Comment on attachment 8631981 [details] [diff] [review]
This patch adds digits in numerical form to standard pocketsphinx english dictionary

Looks ok to me, though I wonder if there is some actual reason why numbers aren't included.


And kdavis could review this kind of patches.


Are we going to try to upload this kinds of patches to the original
repository of cmu07a.dic?
Attachment #8631981 - Flags: review?(kdavis)
Attachment #8631981 - Flags: review?(bugs)
Attachment #8631981 - Flags: review+
(In reply to Olli Pettay [:smaug] from comment #2)
> Comment on attachment 8631981 [details] [diff] [review]
> This patch adds digits in numerical form to standard pocketsphinx english
> dictionary
> 
> Looks ok to me, though I wonder if there is some actual reason why numbers
> aren't included.
> 
> 
> And kdavis could review this kind of patches.
> 
> 
> Are we going to try to upload this kinds of patches to the original
> repository of cmu07a.dic?

The reasoj that digits aren't included because they aren't included. Actual situation of poceketsphinx is that developers are responsible for dictionaries by themselves, and they simple are not responsible to have phonemes for every word existent in the world.
I took a quick look, and have a few comments, but I need to look in
a bit more detail into the pocketsphinx code, before I do an official
review. (I have to finish preparing a presentation tonight for the
talk at https://mozweekend.de so time is a bit tight.)

My few comments are:

1. It looks like the dictionary was modified by someone already, our
   "copy" has firefox as the first entry.
2. All entries, other than firefox and the new numerical values, are
   in alphabetic order.
3. The dict code in pocketsphinx may require the entries to be in al-
   phabetic order, in order to, for example, construct an internal
   data structure used to access the phonemes for a given word.(On
   this I am not sure and want to check before I give an official
   review.)
> 
> 1. It looks like the dictionary was modified by someone already, our
>    "copy" has firefox as the first entry.


Yes, I added this word. And as you just copied the patches I wrote, you received this modification too.



> 2. All entries, other than firefox and the new numerical values, are
>    in alphabetic order.

Yes, they are.



> 3. The dict code in pocketsphinx may require the entries to be in al-
>    phabetic order, 

Clearly don't require, otherwise wouldn't be working at all, since "firefox" is already the first word since one year ago.

> in order to, for example, construct an internal data structure used to access the phonemes for a given word.(On this I am not sure and want to check before I give an official review.)

The internal structure to store the dictionary is a hashtable so clearly the words in the dict file does not need to be in alphabetical order
Comment on attachment 8631981 [details] [diff] [review]
This patch adds digits in numerical form to standard pocketsphinx english dictionary

Review of attachment 8631981 [details] [diff] [review]:
-----------------------------------------------------------------

After looking at the Pocketsphinx code, it looks like adding
entries in non-alphabetical order is fine.

Also, it seems as if adding numerical entries is also, as far
as the dictionary is concerned, OK as it seems the keys are
uninterpreted. Thus is doesn't matter if they are numeric or
alphabetic.

A few other entries in the dictionary are not in alphabetic order
according to "sort u" in "vim" in my default locale. However, I
think we should alphabetize the entire file if for nor other reason
than helping us to not enter the same word twice.

Andre could you open another Bug and assign to to yourself,
whose sole purpose is to alphabetize the dictionary file by
the grapheme?
Attachment #8631981 - Flags: review?(kdavis) → review+
FYI CMU already has a tool for, among other things, alphabetizing
such dictionaries https://github.com/cmusphinx/cmudict-tools
> Andre could you open another Bug and assign to to yourself,
> whose sole purpose is to alphabetize the dictionary file by
> the grapheme?

Sure, I can do that.
(In reply to kdavis from comment #7)
> FYI CMU already has a tool for, among other things, alphabetizing
> such dictionaries https://github.com/cmusphinx/cmudict-tools

Yes, I know that. Thank you.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/26346ab6682c
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: