Closed Bug 1124459 (Yoruba-WordPrediction) Opened 9 years ago Closed 8 years ago

Add Yoruba (yo) wordlist/dictionary

Categories

(Firefox OS Graveyard :: Gaia::Keyboard, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: delphine, Unassigned, NeedInfo)

Yoruba word prediction is needed from 2.0 onwards.
Alias: Yoruba-WordPrediction
Adding Kevin and localizers who might be able to help with word prediction input. Thanks!
Flags: needinfo?(soki2ng)
As no response from localizer, asking Devon and Ian from Rubric to pitch in here.
thanks!
Flags: needinfo?(ian.henderson)
Flags: needinfo?(devon.bezuidenhout)
Flags: needinfo?(ian.henderson)
(In reply to Delphine Lebédel [:delphine - use need info] from comment #2)
> As no response from localizer, asking Devon and Ian from Rubric to pitch in
> here.
> thanks!

Once we have FFOS translated, we can supply the translated .po files to Kevin as per Bug 1121734.
Comment offline from Dwayne: "The other issue is languages that use tone marking (Yoruba), i.e. lots of diacritics, or where the wordlist is very dirty, i.e. usually bad input where people use ASCII equivalents for missing letters. I think it would be good to query Kevin and see if anything is needed in code. Kevin's homegrown scripts might be enough. But I think it would be helpful if we're thinking on how to more easily involve the community in a process of iteratively improving these wordlists."
Flagging Kevin so he can speak to this, and to what we might need to consider before anything else.
Flags: needinfo?(kscanne)
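To make the "dirty wordlist" concern concrete, here is a minimal sketch of how candidate ASCII fallbacks could be flagged for community review. It is not taken from Kevin's scripts; `strip_marks` and `flag_ascii_fallbacks` are hypothetical names. It assumes a flat list of words and flags any unmarked word whose bare form collides with a marked word in the same list. Since mid-tone Yoruba words are legitimately written without tone marks, a hit is only a candidate for review, not a confirmed error.

```python
import unicodedata
from collections import defaultdict


def strip_marks(word: str) -> str:
    """Drop combining marks (tone marks, under-dots) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def flag_ascii_fallbacks(words):
    """Map each unmarked word to the marked variants that share its bare form.

    Hypothetical helper: only words that appear both bare and marked are
    returned, so a reviewer can decide whether the bare form is real.
    """
    by_base = defaultdict(set)
    for word in words:
        word = unicodedata.normalize("NFC", word)
        by_base[strip_marks(word)].add(word)
    return {
        base: sorted(variants - {base})
        for base, variants in by_base.items()
        if base in variants and len(variants) > 1
    }
```

Calling `flag_ascii_fallbacks` on a list that contains both a bare spelling and a marked spelling of the same base returns that bare spelling together with its marked variants; words that only ever occur in one form are left out.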
I have code for statistical diacritic restoration (http://sourceforge.net/projects/lingala/files/charlifter/) but Yoruba is exceptionally challenging (a) because it uses so many tone marks and (b) because there is virtually no training data and no free word list with all of the correct marks.  So that approach is probably a dead end in this case (as would be any approach that relies on web crawling).  My best suggestion would be to partner with a language expert like Tunde Adegbola who's already worked on stuff like this to see if a good word list can be assembled manually.
Flags: needinfo?(kscanne)
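For readers unfamiliar with the idea, the simplest form of statistical diacritic restoration can be sketched as a unigram lookup. This is an illustrative baseline only, not how charlifter works; `train` and `restore` are hypothetical names. It also shows why training data matters: the model can only restore marks it has seen on correctly marked text, and it ignores context entirely, which is a serious limitation when many words differ only by tone.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_marks(word: str) -> str:
    """Drop combining marks (tone marks, under-dots) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(tokens):
    """Learn, for each bare form, the marked form seen most often in training."""
    counts = defaultdict(Counter)
    for token in tokens:
        token = unicodedata.normalize("NFC", token)
        counts[strip_marks(token)][token] += 1
    return {base: forms.most_common(1)[0][0] for base, forms in counts.items()}


def restore(text, model):
    """Replace each whitespace-separated word with its most frequent marked form.

    Words never seen in training are passed through unchanged.
    """
    return " ".join(model.get(strip_marks(word), word) for word in text.split())
```

With little or no correctly marked training text, the `model` dictionary stays nearly empty and `restore` returns its input almost unchanged, which is the practical problem described above.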
Hello Kevin,

Have you looked at: http://yo.wikipedia.org/wiki/Oj%C3%BAew%C3%A9_%C3%80k%E1%BB%8D%CC%81k%E1%BB%8D%CC%81

You can use Wikipedia as a corpus source for many languages.
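As a rough sketch of what using a text corpus for a word list could look like, the snippet below counts word frequencies from lines of plain text. It assumes the article text has already been extracted to plain text by some other means; `tokenize` and `build_wordlist` are hypothetical names. Tokens are built from letters plus combining marks, because a plain `\w+` regular expression would split Yoruba words at combining tone marks. The result is only as good as the marking in the source text, which is the concern raised earlier in this bug.

```python
import unicodedata
from collections import Counter


def tokenize(line):
    """Yield lowercase words made of letters and combining marks, NFC-normalized."""
    word = []
    for ch in unicodedata.normalize("NFC", line):
        # Unicode categories starting with L are letters, with M are marks.
        if unicodedata.category(ch)[0] in ("L", "M"):
            word.append(ch)
        elif word:
            yield "".join(word).lower()
            word = []
    if word:
        yield "".join(word).lower()


def build_wordlist(lines, min_count=2):
    """Return (word, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    kept = [(word, n) for word, n in counts.items() if n >= min_count]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))
```

The `min_count` threshold is an assumed, crude filter against one-off typos; a list produced this way would still need review by a language expert before being used for prediction.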
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX