Closed Bug 1119426 Opened 9 years ago Closed 8 years ago

Add Xhosa dictionary/wordlist

Categories

(Firefox OS Graveyard :: Gaia::Keyboard, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: delphine, Assigned: kscanne)

References

Details

Attachments

(6 files)

164.49 KB, application/x-zip-compressed
631.93 KB, application/x-zip-compressed
86.59 KB, application/x-zip-compressed
305.28 KB, application/x-zip-compressed
9.96 KB, application/x-zip-compressed
3.95 KB, application/x-zip-compressed
Xhosa shipping on 2.0. Needs dictionary/wordlist.
Friedel, Kevin: can you help out on this as well? 
thanks! :)
[Blocking Requested - why for this release]:
As per our call with Bus Dev, Rel Man and l10n team this morning, it is confirmed that this is needed on 2.0 (and onwards). Thus nominating for 2.0 work
blocking-b2g: --- → 2.0?
blocking-b2g: 2.0? → 2.0+
This is a complex problem, just as with Zulu (bug 1113395) and Swahili (bug 1098502). I'm not sure there is a quick solution for 2.0.

Kevin do you think it is worthwhile to simply brute force it to try and get something minimally useful together? I think I'll have to rely on you for a good corpus. I can add some of what I have here, but it won't be enough on its own. I'm not sure how well it would work... maybe we can discuss outside the bug if you have time.
(In reply to Friedel Wolff from comment #2)
 
> Kevin do you think it is worthwhile to simply brute force it to try and get
> something minimally useful together?

I don't think it's worth doing by brute force.  I know you understand these issues well but let me attach some numbers to this for the people working on FxOS who might not be familiar with Bantu languages.  

Here's a quick experiment one can do for any language: take a sample corpus of, say, 250k words in the language and create an autocorrect dictionary from that - just include all the words you see. Then choose a disjoint test corpus with ~25k words and count the percentage of words that are recognized by the autocorrect dictionary. For web text in English, Spanish, Italian, French, etc. this number hovers around 96-97%. Turkish, Basque, Finnish, Hungarian, Georgian are between 87-90%, which is bordering on unusable (> 1 out of every 10 words unrecognized). When I do this for Zulu I get 82% and Xhosa 81%, more like 1 out of every 5 words unrecognized.
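
For anyone who wants to try the coverage experiment on their own corpora, here is a minimal sketch in Python. It is not the script used to produce the numbers above: the function names (`tokenize`, `build_wordlist`, `coverage`) and the tokenization rule (lowercased runs of letters, optionally joined by an apostrophe or hyphen) are assumptions made for illustration, and a real run would need language-appropriate tokenization.

```python
import re

# Assumed tokenization rule: runs of Unicode letters, optionally joined
# by an apostrophe or hyphen, lowercased. Illustrative only.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_wordlist(training_text):
    """Build the 'dictionary': every distinct word seen in the training corpus."""
    return set(tokenize(training_text))


def coverage(wordlist, test_text):
    """Return the fraction of test-corpus tokens that appear in the wordlist."""
    tokens = tokenize(test_text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in wordlist)
    return known / len(tokens)


if __name__ == "__main__":
    # Toy stand-ins for the ~250k-word training and ~25k-word test corpora.
    training = "the cat sat on the mat"
    test = "the dog sat on a mat"
    print(f"coverage: {coverage(build_wordlist(training), test):.1%}")
```

The training and test corpora should be disjoint, as described above; otherwise the coverage figure is inflated by words the dictionary has already seen in the same text.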
Hi Kevin, Friedel, 

Thanks for your input. Then in your opinion, is it feasible to have something basic ready for 2.0? 

David
I don't think we can do anything that can be beneficial for the user in v2.0 time frame, per comment 3. I recommend we remove the 2.0 requirement for this feature, and properly work on this in a later release.
Hey Kevin and Friedel,
I know this is kind of last minute, but do you think we could have something ready by this upcoming Friday? We might also be able to find someone who can help you out as well if needed.
And if this is really an unrealistic date, can you please take this work up so that it will be available in the future? thanks!
Flags: needinfo?(kscanne)
Flags: needinfo?(friedel)
See comment 3.  In short, anything we do via the existing autocorrect engine is going to be virtually unusable for Xhosa and Zulu.
Flags: needinfo?(kscanne)
Flags: needinfo?(friedel)
Is it possible to generate a dictionary/wordlist in the meantime, while we figure out internally the engine issue? (sorry if I've missed something, this is really tricky for me ;) )
Dear Delphine,
I just discussed with Wesly and I think we need to rejustify whether this is really needed in 2.0 for Fire E according to launch plan of https://mana.mozilla.org/wiki/display/PM/T2M. Wesly also mentioned he will discuss with partner to see whether they can create Xhosa dictionary/wordlist by themselves. 

I agree we should try to have our own Xhosa dictionary/wordlist thus I am nominating this to 2.2?

Dear Howie,
Please help to triage this bug for 2.2 Thanks!
blocking-b2g: 2.0+ → 2.2?
Flags: needinfo?(hochang)
Josh: according to https://mana.mozilla.org/wiki/display/PM/Firefox+OS+Wave+Launch+Cross+Functional+View and our weekly discussion with Business development and PMs, Xhosa is committed to 2.0.
David Palomino mentioned he was supposed to hand this to partners last Friday (comment 6). I have to admit I don't understand why we're still going back and forth on this and can't define clearly the scope, after multiple conversations. If this has changed, then the mana has to be updated.
Flagging Karen Ward and David Palomino so they can confirm the scope of this and advise on how to go forward. thanks
Flags: needinfo?(kward)
Flags: needinfo?(dpalomino.bugzilla)
Hi Delphine, 

There have been no changes in the schedule for the launch (apart from a couple of days as we're closing some details regarding preload of apps). We just needed to confirm that everything was ready from the l10n part. TCL will generate the build in one or two days.

And a BIG thanks to all of you for the effort committing the South African languages on time to launch with the partners. This for sure will help a lot to the launch (for product, marketing, etc).

Thanks!
David
Flags: needinfo?(dpalomino.bugzilla)
clearing ni
Flags: needinfo?(kward)
hi David. Basically this bug means that there will not be a dictionary/wordlist available for Xhosa in time for the launch. Is that ok? Just want to make sure we're all on the same page. thanks
Flags: needinfo?(dpalomino.bugzilla)
Hi Delphine, 

I think we have no choice here. IMO it is not a blocking issue, but it is something that would be very nice to have in the future, so I agree to have this in 2.2. If there are also plans regarding 2.1, I'll let you know.

Thanks!
David
Flags: needinfo?(dpalomino.bugzilla)
Per David's non-blocking comment #14, removing the 2.2? nom.
blocking-b2g: 2.2? → ---
(In reply to Stephany Wilkes from comment #15)
> Per David's non-blocking comment #14, removing the 2.2? nom.

Hi Stephany, 

I meant in comment #14 that we cannot delay the 2.0 launch because of not having the wordlist for Xhosa, but we're definitely missing that functionality a lot. I think we need to include this in future releases, so restoring the 2.2? nom

Cheers, 
David
blocking-b2g: --- → 2.2?
Hi,

As commented in bug #1113395, we'd need this for 2.2 (even though there are no committed launches for 2.2 yet). Just copying the comment here.

South Africa is one of our tier 1 countries, so it is expected to continue the work there with 2.2. 

The problem is that the timing managed by carriers and OEMs is different from ours, and by the time they decide to go with 2.2, it will probably be too late to include this in 2.2, or 2.2 may even be closed.

Please, let me know if we need to include this info in mana to get the 2.2+ (I think it can add some confusion, I'd prefer not to include this if it's not needed). 

Cheers, 
David
Triage: Not blocking, it's too late for 2.2 feature. But to keep moving forward.
blocking-b2g: 2.2? → -
tracking-b2g: --- → +
Flags: needinfo?(hochang)
Attached file firefoxos_2.0-xh.zip
Kevin,
FYI
Spoke offline with Josh, explained Howie's concern. This needs min. engineering resource and has min. risk to land on 2.2. Patch done by community/contractors, just needs to land once it's there
Renominating for 2.2. Thanks!
blocking-b2g: - → 2.2?
Thanks for the clarification on Comment 20, blocking as 2.2+
blocking-b2g: 2.2? → 2.2+
tracking-b2g: + → ---
Delphine, which patch are you referring to? I didn't see any mention of a patch, just repeated comments that this is currently an unsolved problem.
Flags: needinfo?(lebedel.delphine)
Assignee: nobody → ian.henderson
Attached file firefox-xh.zip
FF37 corpus text
Attached file mobile-xh.zip
FF37 corpus text
Attached file mozilla_lang-xh.zip
FF37 corpus text
I have added various xh corpus files. That is about as much as we are able to do.
Assignee: ian.henderson → kscanne
Depends on: 1139255
See Also: 1139255
(Friedel: meant when patch will be there. Was talking quickly and on multiple bugs ;) )
Flags: needinfo?(lebedel.delphine)
Attached file fireplace-xh.zip
Attached file spartacus-xh.zip
This keyboard is confirmed complete: http://www.101languages.net/xhosa/keyboard/
So are we ready to land and close this one?
Flags: needinfo?(ian.henderson)
Hi Howie: Ian actually doesn't land stuff, he works with Kevin Scannell on generating wordlists for this. Also, there's no patch in this bug.
I think this is currently still a WIP.
Also, as per comment 7, we need to work on an autocorrect engine to make this happen. We're looking into resolving this issue
Flags: needinfo?(ian.henderson)
Moving this out of 2.2 as our engine needs an update.  Bug 1139255 will need to be completed first.
blocking-b2g: 2.2+ → ---
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX