Closed
Bug 1180687
Opened 10 years ago
Closed 9 years ago
Complete Acoustic Model Back-end of "Community Portion" functionality in Vaani
Categories
(Cloud Services :: General, defect)
Cloud Services
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kdavis, Assigned: kdavis)
References
Details
(Whiteboard: [webspeechapi][vaani][systemsfe])
Attachments
(4 files)
Complete Acoustic Model Back-end of "Community Portion" functionality in Vaani:
1. Create acoustic model infrastructure that
a. Stores acoustic data and related data, e.g. the audio transcript
b. Serves acoustic data and related data, e.g. the audio transcript
2. Create resources to collect acoustic data for 'en' and 'es'
a. Create texts to be read for 'en'
b. Create texts to be read for 'es'
3. Obtain acoustic data from which to create acoustic models
a. Obtain acoustic, and related, data for the language 'en'
b. Store acoustic, and related, data for the language 'en' in the acoustic model infrastructure
c. Obtain acoustic, and related, data for the language 'es'
d. Store acoustic, and related, data for the language 'es' in the acoustic model infrastructure
4. Create "workflows" to create acoustic models from raw audio data for the appropriate speech recognition engines
a. Create a workflow (hand-triggered scripts) to create acoustic models for the Kaldi recognition engine
b. Create a workflow (hand-triggered scripts) to create acoustic models for the Pocketsphinx recognition engine
Field IDs of the texts to be read, see http://bit.ly/1MdsnCd
English texts to be read, see http://bit.ly/1MdsnCd
French texts to be read, see http://bit.ly/1MdsnCd
Spanish texts to be read, see http://bit.ly/1MdsnCd
The acoustic model infrastructure that
a. Stores acoustic data and related data, e.g. the audio transcript
b. Serves acoustic data and related data, e.g. the audio transcript
is provided by the s3 bucket s3://acoustic-models
The resources to collect acoustic data for 'en', 'es', and 'fr'
a. Texts to be read for 'en'
b. Texts to be read for 'es'
c. Texts to be read for 'fr'
are attached to this bug and in the s3 bucket s3://acoustic-models
The acoustic data and storage infrastructure to:
a. Obtain acoustic, and related, data for the language 'en'
b. Store acoustic, and related, data for the language 'en' in the acoustic model infrastructure
c. Obtain acoustic, and related, data for the language 'es'
d. Store acoustic, and related, data for the language 'es' in the acoustic model infrastructure
e. Obtain acoustic, and related, data for the language 'fr'
f. Store acoustic, and related, data for the language 'fr' in the acoustic model infrastructure
is half finished: obtaining the acoustic data has to wait on the
frontend Bug 1180682. However, the infrastructure to store such
data is provided by the s3 bucket s3://acoustic-models.
The "workflows" to create acoustic models from raw audio data for the appropriate speech recognition engines
a. A workflow (hand-triggered scripts) to create acoustic models for the Kaldi recognition engine
b. A workflow (hand-triggered scripts) to create acoustic models for the Pocketsphinx recognition engine
are described for Pocketsphinx here[1]. The workflow for Kaldi will be created in the future. As Kaldi
is currently not in use, its workflow can safely be delayed.
[1] http://bit.ly/1MdsnCd
Comment 10•9 years ago
With regards to workflows an Amazon AMI ami-f75e40c7 has been created that contains:
a. All software required to adapt acoustic models, in particular:
1. sphinx_fe
2. pocketsphinx_mdef_convert
3. bw
4. mllr_solve
5. map_adapt
6. mk_s2sendump
b. Base acoustic models and dictionaries for the languages
1. en-gb
2. en-za
3. en
4. es-ar
5. es-cl
6. es-es
7. es-mx
8. es
9. fr-nl
10. fr
c. Field IDs of the texts to be read and texts to be read for all the above
languages in the expected format[1].
[1] http://bit.ly/1MdsnCd
Comment 11•9 years ago
With regards to workflows, a repository[1] containing the scripts required to adapt
the models has been created. This repository contains instructions on how to adapt
a model.
[1] http://bit.ly/1OuMLlz
Comment 12•9 years ago
In completing this bug, the following tasks were finished:
1. An acoustic model infrastructure, the S3 bucket "acoustic-models", was created that:
a. Stores acoustic data and related data, e.g. the audio transcript - The audio
transcripts that the contributors should read are stored in the following
files:
s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani.transcription
where each line of the file corresponds to a single utterance a contributor
should read. The texts are from the EuroParl corpus[1] which is not under
any copyright restrictions[1].
Each line of vaani.transcription has a unique id identified by the corresponding
line of
s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani.fileids
When the frontend is completed it should write to the folder
s3://acoustic-models/raw/<BCP 47 Language Tag>/
for each utterance a 16 kHz mono WAV file. The file should be named after
the unique id of the utterance. So, the frontend will produce files of the
form
s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani_0000001.wav
s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani_0000002.wav
s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani_0000003.wav
s3://acoustic-models/raw/<BCP 47 Language Tag>/vaani_0000004.wav
...
In addition to the audio data and audio transcript the folder also has
an acoustic model which is to be refined using the utterances. This is
in the directory
s3://acoustic-models/raw/<BCP 47 Language Tag>/<BCP 47 Language Tag>/
It also contains a phonetic dictionary
s3://acoustic-models/raw/<BCP 47 Language Tag>/cmudict-<BCP 47 Language Tag>.dict
for testing the adapted model. The models and dictionaries are already contained
in FxOS and/or in our language packs and have the proper licences.
b. Serves acoustic data and related data, e.g. the audio transcript - The permissions
of the audio transcript, acoustic data, and acoustic models will be made public as
the S3 bucket is switched from my personal AWS account to that of Mozilla. Until
then, access can be obtained by request.
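The bucket layout above follows a simple naming convention. The following is a minimal sketch of that convention; the bucket paths and the vaani_NNNNNNN id format come from this comment, while the helper function names are purely illustrative and not part of any Vaani code:

```python
# Hypothetical helpers illustrating the S3 key layout described above.
# The paths and the 7-digit utterance id come from the comment; the
# function names are illustrative only.

def transcription_key(lang_tag):
    """Key of the file whose lines are the utterances contributors read."""
    return "raw/%s/vaani.transcription" % lang_tag

def fileids_key(lang_tag):
    """Key of the file whose lines are the matching unique utterance ids."""
    return "raw/%s/vaani.fileids" % lang_tag

def wav_key(lang_tag, utterance_number):
    """Key the frontend would use when uploading a recorded utterance."""
    return "raw/%s/vaani_%07d.wav" % (lang_tag, utterance_number)

print(wav_key("en-gb", 1))  # raw/en-gb/vaani_0000001.wav
```

Here lang_tag is any BCP 47 tag from the list in comment 10 (en, en-gb, es-mx, fr, ...).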
2. Created resources to collect acoustic data for "en", "es", "fr", and variants:
a. Created texts to be read for "en" and variants - English texts to be read were
created from the EuroParl corpus[1], which is not under any copyright restrictions[1].
They are all in S3 in the following files
s3://acoustic-models/raw/en/vaani.transcription
s3://acoustic-models/raw/en-gb/vaani.transcription
s3://acoustic-models/raw/en-za/vaani.transcription
the format of each line is as follows
<s> A "yes" vote is a vote to reunite Europe. </s> (vaani_0000001)
The text contained within the <s>...</s> tags is the utterance and the identifier
of the utterance is contained in (...), in this case vaani_0000001.
b. Created texts to be read for "es" and variants - Spanish texts to be read were
created from the EuroParl corpus[1], which is not under any copyright restrictions[1].
They are all in S3 in the following files
s3://acoustic-models/raw/es/vaani.transcription
s3://acoustic-models/raw/es-ar/vaani.transcription
s3://acoustic-models/raw/es-cl/vaani.transcription
s3://acoustic-models/raw/es-es/vaani.transcription
s3://acoustic-models/raw/es-mx/vaani.transcription
the format of each line is as follows
<s> El "sí" es el voto en favor de la reunificación de Europa. </s> (vaani_0000001)
The text contained within the <s>...</s> tags is the utterance and the identifier
of the utterance is contained in (...), in this case vaani_0000001.
c. Created texts to be read for "fr" and variants - French texts to be read were
created from the EuroParl corpus[1], which is not under any copyright restrictions[1].
They are all in S3 in the following files
s3://acoustic-models/raw/fr/vaani.transcription
s3://acoustic-models/raw/fr-nl/vaani.transcription
the format of each line is as follows
<s> Voter "oui", c'est voter pour la réunification de l'Europe. </s> (vaani_0000001)
The text contained within the <s>...</s> tags is the utterance and the identifier
of the utterance is contained in (...), in this case vaani_0000001.
In addition, the texts are parallel. This means texts with the same identifier
in different languages are translations of one another. This will help us if
we decide, possibly in the future, to create an open source translation engine.
3. "Workflows" to create acoustic models from raw audio data for the Pocketsphinx
speech recognition engine were created.
a. A repository "acoustic-model-generator" was created
https://github.com/kdavis-mozilla/acoustic-model-generator
that contains the bash scripts required to refine the various acoustic models.
The repository README.md contains instructions on its use.
b. An Amazon AMI "pocketsphinx-acoustic-model-adaptor" was created that contains
all the required software to refine the various acoustic models. This software
is as follows:
i. Sphinxbase - Which is on github[2] and under a license[3] that allows
Sphinxbase to be included in open source software; in fact, it is already
part of FxOS.
ii. Pocketsphinx - Which is on github[4] and under a license[5] that allows
Pocketsphinx to be included in open source software; in fact, it is already
part of FxOS.
iii. Sphinxtrain - Which is on github[6] and under a license[7] that allows
Sphinxtrain to be used in open source software; in fact, the license is the
same as that of Pocketsphinx and Sphinxbase.
In addition, this AMI contains acoustic models to be refined. These acoustic
models are those that are already part of FxOS, either built in or on Marketplace.
Thus, they have the proper licenses.
[1] http://www.statmt.org/europarl/
[2] https://github.com/cmusphinx/sphinxbase
[3] https://github.com/cmusphinx/sphinxbase/blob/master/LICENSE
[4] https://github.com/cmusphinx/pocketsphinx
[5] https://github.com/cmusphinx/pocketsphinx/blob/master/LICENSE
[6] https://github.com/cmusphinx/sphinxtrain
[7] https://github.com/cmusphinx/sphinxtrain/blob/master/LICENSE
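For orientation, the adaptation pipeline on the AMI can be summarized as an ordered sequence of the tools listed in comment 10. This sketch only records the stage order of the standard CMUSphinx MAP-adaptation procedure; the exact command-line flags live in the acoustic-model-generator README and are deliberately not reproduced here:

```python
import shutil

# Pocketsphinx MAP-adaptation stages, in the order the standard
# CMUSphinx procedure runs them. Tool names are those installed on
# the AMI (comment 10); descriptions are informal summaries.
PIPELINE = [
    ("sphinx_fe",                 "extract MFCC features from the wav files"),
    ("pocketsphinx_mdef_convert", "convert the binary mdef to text form"),
    ("bw",                        "accumulate observation counts from the data"),
    ("mllr_solve",                "estimate an MLLR transform (optional)"),
    ("map_adapt",                 "MAP-update the acoustic model parameters"),
    ("mk_s2sendump",              "rebuild the compressed sendump file"),
]

def missing_tools():
    """Return pipeline tools not found on PATH (e.g. when run off the AMI)."""
    return [tool for tool, _ in PIPELINE if shutil.which(tool) is None]

for tool, purpose in PIPELINE:
    print("%-25s %s" % (tool, purpose))
```

missing_tools() gives a quick sanity check before kicking off the hand-triggered scripts on a fresh instance.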
Comment 14•9 years ago
That's great Kelly, thanks.
So let's deploy this AMI into an EC2 and move the speech collection server side to it?
Updated•9 years ago
Flags: needinfo?(anatal)
Comment 15•9 years ago
(In reply to Andre Natal from comment #14)
> That's great Kelly, thanks.
>
> So let's deploy this AMI into an EC2 and move the speech collection server
> side to it?
I am fine with collecting MozSpeech data on S3, but I want to wait until the
billing is transferred from my credit card to Mozilla's. I have an open ServiceNow
ticket, RITM0039203, which when completed will transfer the billing.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED