Closed Bug 1180687 Opened 10 years ago Closed 9 years ago

Complete Acoustic Model Back-end of "Community Portion" functionality in Vaani

Categories

(Cloud Services :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kdavis, Assigned: kdavis)

References

Details

(Whiteboard: [webspeechapi][vaani][systemsfe])

Attachments

(4 files)

Complete Acoustic Model Back-end of "Community Portion" functionality in Vaani:

1. Create acoustic model infrastructure that
   a. Stores acoustic data and related data, e.g. the audio transcript
   b. Serves acoustic data and related data, e.g. the audio transcript
2. Create resources to collect acoustic data for 'en' and 'es'
   a. Create texts to be read for 'en'
   b. Create texts to be read for 'es'
3. Obtain acoustic data to create acoustic models from
   a. Obtain acoustic, and related, data for the language 'en'
   b. Store acoustic, and related, data for the language 'en' in the acoustic model infrastructure
   c. Obtain acoustic, and related, data for the language 'es'
   d. Store acoustic, and related, data for the language 'es' in the acoustic model infrastructure
4. Create "workflows" to create acoustic models from raw audio data for the appropriate speech recognition engines
   a. Create workflow (hand-triggered scripts) to create acoustic models for the Kaldi recognition engine
   b. Create workflow (hand-triggered scripts) to create acoustic models for the Pocketsphinx recognition engine
Assignee: nobody → kdavis
Blocks: 1178322
Whiteboard: [webspeechapi][vaani][systemsfe]
The additional language fr should also be included.
Field IDs of the texts to be read, see http://bit.ly/1MdsnCd
English texts to be read, see http://bit.ly/1MdsnCd
French texts to be read, see http://bit.ly/1MdsnCd
Spanish texts to be read, see http://bit.ly/1MdsnCd
The acoustic model infrastructure that:
a. Stores acoustic data and related data, e.g. the audio transcript
b. Serves acoustic data and related data, e.g. the audio transcript
is provided by the S3 bucket s3://acoustic-models.
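For concreteness, storing and serving against this bucket are plain S3 operations. A minimal sketch, assuming the standard AWS CLI is installed with credentials that can reach s3://acoustic-models (the 'en' path is just an example):

    # Sketch only: assumes the AWS CLI and credentials for s3://acoustic-models.

    # a. Store: upload a transcript (here for 'en', as an example) to the bucket.
    aws s3 cp vaani.transcription s3://acoustic-models/raw/en/vaani.transcription

    # b. Serve: list and fetch the stored acoustic data and transcripts.
    aws s3 ls s3://acoustic-models/raw/en/
    aws s3 cp s3://acoustic-models/raw/en/vaani.transcription .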
The resources to collect acoustic data for 'en', 'es', and 'fr':
a. Texts to be read for 'en'
b. Texts to be read for 'es'
c. Texts to be read for 'fr'
are attached to this bug and in the S3 bucket s3://acoustic-models.
The acoustic data and storage infrastructure to:
a. Obtain acoustic, and related, data for the language 'en'
b. Store acoustic, and related, data for the language 'en' in the acoustic model infrastructure
c. Obtain acoustic, and related, data for the language 'es'
d. Store acoustic, and related, data for the language 'es' in the acoustic model infrastructure
e. Obtain acoustic, and related, data for the language 'fr'
f. Store acoustic, and related, data for the language 'fr' in the acoustic model infrastructure
is half finished: obtaining the acoustic data has to wait on the frontend bug 1180682. However, the infrastructure to store such data is provided by the S3 bucket s3://acoustic-models.
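Until the frontend lands, here is a hedged sketch of the storing half of the contract it will need to satisfy, assuming sox and the AWS CLI are available; the file names and the 'en' tag are illustrative:

    # Sketch only: how the frontend (bug 1180682) could store one utterance.
    # Assumes sox and the AWS CLI; names and the 'en' tag are illustrative.

    # Convert a raw recording to a 16 kHz mono wav named after the utterance id.
    sox recording.wav -r 16000 -c 1 vaani_0000001.wav

    # Store it under the language's folder in the acoustic model infrastructure.
    aws s3 cp vaani_0000001.wav s3://acoustic-models/raw/en/vaani_0000001.wav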
The “workflows" to create acoustic models from raw audio data for apropos speech recognition engines a. Workflow, hand triggered scripts, to create acoustic models for Kaldi recognition engine b. Workflow, hand triggered scripts, to create acoustic models for Pocketsphinx recognition engine is described for Pocketsphinx here[1]. The workflow for Kaldi will be created in future. As Kaldi is currently not in use, its workflow can safely be delayed. [1] http://bit.ly/1MdsnCd
With regards to workflows, an Amazon AMI, ami-f75e40c7, has been created that contains:
a. All software required to adapt acoustic models, in particular:
   1. sphinx_fe
   2. pocketsphinx_mdef_convert
   3. bw
   4. mllr_solve
   5. map_adapt
   6. mk_s2sendump
b. Base acoustic models and dictionaries for the languages:
   1. en-gb
   2. en-za
   3. en
   4. es-ar
   5. es-cl
   6. es-es
   7. es-mx
   8. es
   9. fr-nl
   10. fr
c. Field IDs of the texts to be read, and the texts to be read, for all the above languages in the expected format[1].

[1] http://bit.ly/1MdsnCd
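For orientation, a sketch of how these six tools usually chain together in the CMUSphinx adaptation procedure. The authoritative steps are in [1]; every path, model name, and feature flag below is an illustrative assumption, not the exact invocation from the AMI:

    # Hedged sketch of the usual CMUSphinx MAP-adaptation chain; 'en' is an
    # example. Assumes wavs, vaani.fileids, and vaani.transcription in the cwd;
    # feature flags depend on the particular base model.

    # 1. Extract MFCC features from the recorded wav files.
    sphinx_fe -argfile en/feat.params -samprate 16000 \
        -c vaani.fileids -di . -do . -ei wav -eo mfc -mswav yes

    # 2. Convert the binary model definition to its text form.
    pocketsphinx_mdef_convert -text en/mdef en/mdef.txt

    # 3. Accumulate observation counts from the adaptation data.
    bw -hmmdir en -moddeffn en/mdef.txt -ts2cbfn .ptm. \
        -feat 1s_c_d_dd -cmn current -agc none \
        -dictfn cmudict-en.dict -ctlfn vaani.fileids \
        -lsnfn vaani.transcription -accumdir .

    # 4. Optionally compute an MLLR transform from the accumulated counts.
    mllr_solve -meanfn en/means -varfn en/variances \
        -outmllrfn mllr_matrix -accumdir .

    # 5. MAP-update the parameters into an adapted copy, here 'en-adapt'.
    map_adapt -moddeffn en/mdef.txt -ts2cbfn .ptm. \
        -meanfn en/means -varfn en/variances \
        -mixwfn en/mixture_weights -tmatfn en/transition_matrices \
        -accumdir . \
        -mapmeanfn en-adapt/means -mapvarfn en-adapt/variances \
        -mapmixwfn en-adapt/mixture_weights \
        -maptmatfn en-adapt/transition_matrices

    # 6. Rebuild the compressed sendump file used by Pocketsphinx.
    mk_s2sendump -pocketsphinx yes -moddeffn en/mdef.txt \
        -mixwfn en-adapt/mixture_weights -sendumpfn en-adapt/sendump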
With regards to workflows, a repository[1] containing the scripts required to adapt the models has been created. The repository contains instructions on how to adapt a model.

[1] http://bit.ly/1OuMLlz
In completion of this bug the following tasks have been completed:

1. An acoustic model infrastructure, the S3 bucket "acoustic-models", was created that:

   a. Stores acoustic data and related data, e.g. the audio transcript.

      The audio transcripts that the contributors should read are stored in the following files:

         s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani.transcription

      where each line of the file corresponds to a single utterance a contributor should read. The texts are from the EuroParl corpus[1], which is not under any copyright restrictions[1]. Each line of vaani.transcription has a unique id identified by the corresponding line of:

         s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani.fileids

      When the frontend is completed, it should write to the folder s3://acoustic-models/raw/<BCP 47 Language Tag>/ a 16 kHz mono wav file for each utterance. The file should be named after the unique id of the utterance. So, the frontend will produce files of the form:

         s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani_0000001.wav
         s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani_0000002.wav
         s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani_0000003.wav
         s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani_0000004.wav
         ...

      In addition to the audio data and audio transcript, the folder also has an acoustic model which is to be refined using the utterances. This is in the directory:

         s3://acoustic-models/raw/<BCP 47 Language Tag>/<BCP 47 Language Tag>/

      It also contains a phonetic dictionary:

         s3://acoustic-models/raw/<BCP 47 Language Tag>/cmudict-<BCP 47 Language Tag>.dict

      for testing the adapted model. The models and dictionaries are already contained in FxOS and/or in our language packs and have the proper licenses.

   b. Serves acoustic data and related data, e.g. the audio transcript.

      The permissions of the audio transcript, acoustic data, and acoustic models will be made public when the S3 bucket is switched from my personal AWS account to Mozilla's. Until then, access can be obtained by request.

2. Created resources to collect acoustic data for "en", "es", "fr", and variants:

   a. Created texts to be read for "en" and variants. English texts to be read were created from the EuroParl corpus[1], which is not under any copyright restrictions[1]. They are all in S3 in the following files:

         s3://acoustic-models/raw/en/vaani.transcription
         s3://acoustic-models/raw/en-gb/vaani.transcription
         s3://acoustic-models/raw/en-za/vaani.transcription

      The format of each line is as follows:

         <s> A "yes" vote is a vote to reunite Europe. </s> (vaani_0000001)

      The text contained within the <s>...</s> tags is the utterance, and the identifier of the utterance is contained in (...), in this case vaani_0000001.

   b. Created texts to be read for "es" and variants. Spanish texts to be read were created from the EuroParl corpus[1], which is not under any copyright restrictions[1]. They are all in S3 in the following files:

         s3://acoustic-models/raw/es/vaani.transcription
         s3://acoustic-models/raw/es-ar/vaani.transcription
         s3://acoustic-models/raw/es-cl/vaani.transcription
         s3://acoustic-models/raw/es-es/vaani.transcription
         s3://acoustic-models/raw/es-mx/vaani.transcription

      The format of each line is as follows:

         <s> El "sí" es el voto en favor de la reunificación de Europa. </s> (vaani_0000001)

      The text contained within the <s>...</s> tags is the utterance, and the identifier of the utterance is contained in (...), in this case vaani_0000001.
Created texts to be read for "fr" and variants - French texts to be read were cre- ated from the EuroParl corpus[1] which is not under any copyright restrictions[1]. They are all in S3 in the following files s3://acoustic-models/raw/fr/vaani.transcription s3://acoustic-models/raw/fr-nl/vaani.transcription the format of each line is as follows <s> Voter "oui", c'est voter pour la réunification de l'Europe. </s> (vaani_0000001) The text contained within the <s>...</s> tags is the utterance and the identifier of the utterance is contained in (...), in this case vaani_0000001. In addition, the texts are parallel. This means texts with the same identifier in different languages are translations of one another. This will help us when we, possibly in future, decide to create an open source translation engine. 3. "Workflows" to create acoustic models from raw audio data for the Pocketsphinx speech recognition engines were created. a. A repository "acoustic-model-generator" was created https://github.com/kdavis-mozilla/acoustic-model-generator that contains the bash scripts required to refine the various acoustic models. The repository README.md contains instructions on its use. b. An Amazon AMI "pocketsphinx-acoustic-model-adaptor" was created that contains all the required software to refine the various acoustic models. This software is as follows: i. Sphinxbase - Which is on github[2] and under a license[3] which allows for Sphinxbase to be included in open source software, in fact it is already part of FxOS. ii. Pocketsphinx - Which is on github[4] and under a license[5] which allows for Pocketsphinx to be included in open source software, in fact it is already part of FxOS. iii. Sphinxtrain - Which is on github[6] and under a license[7] which allows for Sphinxtrain to be used in open source software, in fact the license is the same as that of Pocketsphinx and Sphinxbase. In addition this AMI contains acoustic models to be refined. These acoustic models are those that are already part of FxOS, either built in or on Market- place. Thus, they have the proper licenses. [1] http://www.statmt.org/europarl/ [2] https://github.com/cmusphinx/sphinxbase [3] https://github.com/cmusphinx/sphinxbase/blob/master/LICENSE [4] https://github.com/cmusphinx/pocketsphinx [5] https://github.com/cmusphinx/pocketsphinx/blob/master/LICENSE [6] https://github.com/cmusphinx/sphinxtrain [7] https://github.com/cmusphinx/sphinxtrain/blob/master/LICENSE
Is the work of comment 12 ok?
Flags: needinfo?(anatal)
That's great Kelly, thanks. So let's deploy this AMI to an EC2 instance and move the speech collection server side to it?
Flags: needinfo?(anatal)
(In reply to Andre Natal from comment #14)
> That's great Kelly, thanks.
>
> So let's deploy this AMI to an EC2 instance and move the speech collection
> server side to it?

I am fine with collecting MozSpeech data on S3, but I want to wait until they transfer the billing from my credit card to Mozilla's. I have an open ServiceNow ticket, RITM0039203, which, when completed, will transfer the billing.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED