Bug 1184849: Build ARPA models for G2P from French and Spanish LM models
Status: RESOLVED FIXED
Opened 10 years ago, closed 10 years ago
Categories: Core :: Web Speech, defect
People: Reporter: anatal, Assigned: anatal
Whiteboard: [webspeechapi][vaani]
We should build ARPA models from the French and Spanish language models so that grapheme-to-phoneme (G2P) conversion can work for these languages.
Comment 1•10 years ago (Assignee)
This is the repo [1] with the instructions for generating the ARPA models for Spanish, French, and English.
These are the models for Spanish [2], French [3], and English [4].
[1] https://github.com/mozilla/g2p
[2] https://github.com/mozilla/g2p/tree/master/dicts/spanish
[3] https://github.com/mozilla/g2p/tree/master/dicts/french
[4] https://github.com/mozilla/g2p/tree/master/dicts/english
Comment 2•10 years ago (Assignee)
kdavis and smaug, are these instructions for generating the models, plus the models themselves, enough for you?
Flags: needinfo?(kdavis)
Flags: needinfo?(bugs)
Comment 3•10 years ago
(In reply to Andre Natal from comment #2)
> kdavis and smaug, are these instructions for generating the models, plus the
> models themselves, enough for you?
I'll comment on the instructions in a bit, but one thing I noticed is the license of srilm,
the so-called "SRILM Research Community License"[1].
This looks to be relatively non-standard. I am wondering about the legal issues involved
in us using srilm. I'd guess gerv should take a look.
[1] https://github.com/mozilla/g2p/blob/master/deps/srilm/License
Flags: needinfo?(gerv)
Comment 4•10 years ago
(In reply to Andre Natal from comment #2)
> kdavis and smaug, are these instructions for generating the models, plus the
> models themselves, enough for you?
A few points:
1. OpenFST says to configure and install OpenFST one should run:
localhost:openfst-1.4.1 kdavis$ ./configure
localhost:openfst-1.4.1 kdavis$ make
localhost:openfst-1.4.1 kdavis$ make install
while Phonetisaurus says to configure and install OpenFST one should run:
localhost:openfst-1.4.1 kdavis$ ./configure --enable-far
localhost:openfst-1.4.1 kdavis$ make
localhost:openfst-1.4.1 kdavis$ make install
Which should it be? Are finite-state archives on or off?
2. Not really something you have control over, but Phonetisaurus
Makefile does not pick up standard include paths:
localhost:src kdavis$ make
g++ -O2 -I3rdparty/sparsehash -I3rdparty/utfcpp -c Phonetisaurus.cpp -o Phonetisaurus.o
Phonetisaurus.cpp:34:10: fatal error: 'fst/fstlib.h' file not found
#include <fst/fstlib.h>
^
1 error generated.
make: *** [Phonetisaurus.o] Error 1
localhost:src kdavis$ ls -l /usr/local/include/fst/fst.h
-rw-r--r-- 1 root admin 30539 Jul 27 15:07 /usr/local/include/fst/fst.h
3. Again, not really something you have control over, but Phonetisaurus,
more specifically its dependency sparsehash, expects TR1 extensions
to be installed and not just standard C++11
localhost:src kdavis$ make
g++ -O2 -I/usr/local/include -I3rdparty/sparsehash -I3rdparty/utfcpp -c Phonetisaurus.cpp -o Phonetisaurus.o
In file included from Phonetisaurus.cpp:39:
In file included from ./Phonetisaurus.hpp:36:
In file included from ./MBRDecoder.hpp:33:
3rdparty/sparsehash/google/dense_hash_map:106:10: fatal error: 'tr1/functional' file not found
#include HASH_FUN_H // defined in config.h
^
3rdparty/sparsehash/google/sparsehash/sparseconfig.h:10:20: note: expanded from macro 'HASH_FUN_H'
#define HASH_FUN_H <tr1/functional>
^
1 error generated.
make: *** [Phonetisaurus.o] Error 1
[I will assume all of the compilation problems are solvable and just continue as if they were]
4. Could you better explain the steps in "Install Sphinxbase"? I assume the steps are:
1. Download Sphinxbase from (https://github.com/cmusphinx/sphinxbase/tree/18aec4d11c5fc724a15f899bc1222bfcfe589def)
2. Build Sphinxbase using the instructions from (http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx)
If this is the case, it's just slightly confusing as the first link has sources *and*
instructions on how to install Sphinxbase and the second link is titled "Building
application with pocketsphinx", which doesn't seem like something we want to do in
this process, but it *also* has instructions on how to install Sphinxbase.
[1] http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
Flags: needinfo?(kdavis)
Comment 5•10 years ago (Assignee)
(In reply to kdavis from comment #4)
> (In reply to Andre Natal from comment #2)
> > kdavis and smaug, are these instructions for generating the models, plus the
> > models themselves, enough for you?
>
> A few points:
>
> 1. OpenFST says to configure and install OpenFST one should run:
>
> localhost:openfst-1.4.1 kdavis$ ./configure
> localhost:openfst-1.4.1 kdavis$ make
> localhost:openfst-1.4.1 kdavis$ make install
>
> while Phonetisaurus says to configure and install OpenFST one should run:
>
> localhost:openfst-1.4.1 kdavis$ ./configure --enable-far
> localhost:openfst-1.4.1 kdavis$ make
> localhost:openfst-1.4.1 kdavis$ make install
>
> Which should it be? Are finite-state archives on or off?
>
--enable-far should be on. I fixed this in the documentation.
>
> 2. Not really something you have control over, but Phonetisaurus
> Makefile does not pick up standard include paths:
>
> localhost:src kdavis$ make
> g++ -O2 -I3rdparty/sparsehash -I3rdparty/utfcpp -c Phonetisaurus.cpp -o Phonetisaurus.o
> Phonetisaurus.cpp:34:10: fatal error: 'fst/fstlib.h' file not found
> #include <fst/fstlib.h>
> ^
> 1 error generated.
> make: *** [Phonetisaurus.o] Error 1
> localhost:src kdavis$ ls -l /usr/local/include/fst/fst.h
> -rw-r--r-- 1 root admin 30539 Jul 27 15:07 /usr/local/include/fst/fst.h
>
Yes, I noticed that. One should add this folder to ld.so.conf and run ldconfig to update the bindings.
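For reference, the 'fst/fstlib.h' error quoted above comes from the compiler's header search rather than from the runtime linker, so the header directory also has to reach the g++ command line; the log in point 3 shows the build getting past this step once -I/usr/local/include is passed. A minimal sketch of locating that directory, assuming a hypothetical helper named find_include_flag (it is not part of the g2p repo or of Phonetisaurus) and candidate directories that are only guesses:

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the g2p repo):
# print a -I flag for the first directory that contains the given header.
# Usage: find_include_flag HEADER DIR...
find_include_flag() {
    header=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            # Found it: emit the flag the compiler needs.
            printf '%s\n' "-I$dir"
            return 0
        fi
    done
    # Nothing matched: report on stderr and signal failure.
    echo "header $header not found in: $*" >&2
    return 1
}

# Example call (the directory list is an assumption about common prefixes):
#   find_include_flag fst/fstlib.h /usr/local/include /opt/local/include /usr/include
```

The printed flag could then be passed to the build, for example through CPPFLAGS if the Phonetisaurus Makefile honors that variable; that is an assumption, and otherwise the flag would have to be added to the Makefile by hand.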
> 3. Again, not really something you have control over, but Phonetisaurus,
> more specifically its dependency sparsehash, expects TR1 extensions
> to be installed and not just standard C++11
>
> localhost:src kdavis$ make
> g++ -O2 -I/usr/local/include -I3rdparty/sparsehash -I3rdparty/utfcpp -c Phonetisaurus.cpp -o Phonetisaurus.o
> In file included from Phonetisaurus.cpp:39:
> In file included from ./Phonetisaurus.hpp:36:
> In file included from ./MBRDecoder.hpp:33:
> 3rdparty/sparsehash/google/dense_hash_map:106:10: fatal error: 'tr1/functional' file not found
> #include HASH_FUN_H // defined in config.h
> ^
> 3rdparty/sparsehash/google/sparsehash/sparseconfig.h:10:20: note: expanded from macro 'HASH_FUN_H'
> #define HASH_FUN_H <tr1/functional>
> ^
> 1 error generated.
> make: *** [Phonetisaurus.o] Error 1
>
Yes, I updated the doc.
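One way to catch this kind of failure before starting the build is to ask the compiler directly whether it can see the header. A minimal sketch, assuming a hypothetical helper named has_header; the -x c++ and -fsyntax-only options are standard for g++ and clang++, but the helper itself is not part of Phonetisaurus or sparsehash:

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from Phonetisaurus or sparsehash):
# succeed if the given compiler can resolve "#include <HEADER>".
# Usage: has_header COMPILER HEADER
has_header() {
    cxx=$1
    header=$2
    # Feed a tiny translation unit on stdin; -fsyntax-only skips code
    # generation, so nothing is written to disk.
    "$cxx" -x c++ -fsyntax-only - >/dev/null 2>&1 <<EOF
#include <$header>
int main() { return 0; }
EOF
}

# Example call:
#   has_header g++ tr1/functional || echo "tr1/functional is not available"
```

On a toolchain like the one in the quoted log, where 'tr1/functional' is absent, the example call would take the failure branch.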
> [I will assume all of the compilation problems are solvable and just
> continue as if they were]
>
> 4. Could you better explain the steps in "Install Sphinxbase"? I assume the steps are:
>
> 1. Download Sphinxbase from (https://github.com/cmusphinx/sphinxbase/tree/18aec4d11c5fc724a15f899bc1222bfcfe589def)
> 2. Build Sphinxbase using the instructions from (http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx)
>
> If this is the case, it's just slightly confusing as the first link has sources *and*
> instructions on how to install Sphinxbase and the second link is titled "Building
> application with pocketsphinx", which doesn't seem like something we want to do in
> this process, but it *also* has instructions on how to install Sphinxbase.
>
> [1] http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
I removed the link to the build instructions and kept only the link to the correct commit.
Comment 6•10 years ago (Assignee)
(In reply to kdavis from comment #3)
> (In reply to Andre Natal from comment #2)
> > kdavis and smaug, are these instructions for generating the models, plus the
> > models themselves, enough for you?
>
> I'll comment on the instructions in a bit, but one thing I noticed is the license of srilm,
> the so-called "SRILM Research Community License"[1].
>
> This looks to be relatively non-standard. I am wondering about the legal issues involved
> in us using srilm. I'd guess gerv should take a look.
>
>
> [1] https://github.com/mozilla/g2p/blob/master/deps/srilm/License
Yes. We are not distributing SRILM itself, but we are distributing the output it generates. Maybe we shouldn't keep its sources in the repo?
If the license restricts doing that, we can replace SRILM with mitlm (https://code.google.com/p/mitlm/). Gerv, please advise us.
Comment 7•10 years ago
(In reply to Andre Natal from comment #6)
> > [1] https://github.com/mozilla/g2p/blob/master/deps/srilm/License
This license is not open source. We can't check in or ship code under it.
> Yes. We are not distributing SRILM itself, but we are distributing the output
> it generates. Maybe we shouldn't keep its sources in the repo?
The code is only licensed for particular purposes - see 1.12. If you were to use it for purposes not listed, you would be using it in contravention of its license. I'm not going to give an opinion in a public non-legally-privileged bug on whether or not we would be contravening the license by using this code to generate stuff for Mozilla, but:
> If the license restricts doing that, we can replace SRILM with mitlm
> (https://code.google.com/p/mitlm/).
I suggest you do this, for the sake of clarity if nothing else :-)
Gerv
Flags: needinfo?(gerv)
Comment 8•10 years ago (Assignee)
(In reply to Gervase Markham [:gerv] from comment #7)
> (In reply to Andre Natal from comment #6)
> > > [1] https://github.com/mozilla/g2p/blob/master/deps/srilm/License
>
> This license is not open source. We can't check in or ship code under it.
>
> > Yes. We are not distributing SRILM itself, but we are distributing the output
> > it generates. Maybe we shouldn't keep its sources in the repo?
>
> The code is only licensed for particular purposes - see 1.12. If you were to
> use it for purposes not listed, you would be using it in contravention of
> its license. I'm not going to give an opinion in a public
> non-legally-privileged bug on whether or not we would be contravening the
> license by using this code to generate stuff for Mozilla, but:
>
> > If the license restricts doing that, we can replace SRILM with mitlm
> > (https://code.google.com/p/mitlm/).
>
> I suggest you do this, for the sake of clarity if nothing else :-)
>
> Gerv
Thank you Gerv. I updated the whole toolchain to use mitlm.
Updated•10 years ago (Assignee)
Flags: needinfo?(bugs)
Updated•10 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED