Add Xhosa dictionary/wordlist

RESOLVED WONTFIX

Status

Firefox OS
Gaia::Keyboard
RESOLVED WONTFIX
4 years ago
2 years ago

People

(Reporter: delphine, Assigned: Kevin Scannell)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(6 attachments)

164.49 KB, application/x-zip-compressed
631.93 KB, application/x-zip-compressed
86.59 KB, application/x-zip-compressed
305.28 KB, application/x-zip-compressed
9.96 KB, application/x-zip-compressed
3.95 KB, application/x-zip-compressed
Xhosa shipping on 2.0. Needs dictionary/wordlist.
Friedel, Kevin: can you help out on this as well? 
thanks! :)
[Blocking Requested - why for this release]:
As per our call with Bus Dev, Rel Man and l10n team this morning, it is confirmed that this is needed on 2.0 (and onwards). Thus nominating for 2.0 work
blocking-b2g: --- → 2.0?

Updated

4 years ago
blocking-b2g: 2.0? → 2.0+

Comment 2

4 years ago
This is a complex problem, just as with Zulu (bug 1113395) and Swahili (bug 1098502). I'm not sure there is a quick solution for 2.0.

Kevin do you think it is worthwhile to simply brute force it to try and get something minimally useful together? I think I'll have to rely on you for a good corpus. I can add some of what I have here, but it won't be enough on its own. I'm not sure how well it would work... maybe we can discuss outside the bug if you have time.
(Assignee)

Comment 3

4 years ago
(In reply to Friedel Wolff from comment #2)
 
> Kevin do you think it is worthwhile to simply brute force it to try and get
> something minimally useful together?

I don't think it's worth doing by brute force.  I know you understand these issues well but let me attach some numbers to this for the people working on FxOS who might not be familiar with Bantu languages.  

Here's a quick experiment one can do for any language: take a sample corpus of, say, 250k words in the language and create an autocorrect dictionary from that - just include all the words you see.  Then choose a disjoint test corpus with ~25k words and count the percentage of words that are recognized by the autocorrect dictionary.  For web text in English, Spanish, Italian, French, etc. this number hovers around 96-97%.  Turkish, Basque, Finnish, Hungarian, Georgian are between 87-90%, which is bordering on unusable (> 1 out of every 10 words unrecognized).  When I do this for Zulu I get 82% and for Xhosa 81%, more like 1 out of every 5 words unrecognized.
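
For readers who want to try the experiment above, here is a minimal sketch in Python. It is not the script used to produce the numbers in this comment: the function names (`tokenize`, `build_wordlist`, `coverage`) and the tokenization rule (lowercased runs of letters, with internal apostrophes and hyphens kept) are assumptions made for illustration, and a different tokenizer will give somewhat different percentages.

```python
import re

# Assumption: a "word" is a lowercased run of letters, optionally joined
# by internal apostrophes or hyphens. This is an illustrative choice only.
WORD_RE = re.compile(r"[^\W\d_]+(?:['\-][^\W\d_]+)*")


def tokenize(text):
    """Split raw text into lowercase word tokens."""
    return WORD_RE.findall(text.lower())


def build_wordlist(training_text):
    """Include every word seen in the training corpus, as described above."""
    return set(tokenize(training_text))


def coverage(wordlist, test_text):
    """Fraction of test-corpus tokens that appear in the wordlist."""
    tokens = tokenize(test_text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in wordlist)
    return known / len(tokens)


if __name__ == "__main__":
    # Toy English strings stand in for the ~250k-word training corpus and
    # the disjoint ~25k-word test corpus.
    wordlist = build_wordlist("the cat sat on the mat")
    print(f"{coverage(wordlist, 'the dog sat'):.1%}")
```

Coverage is counted per token rather than per distinct word, which matches "the percentage of words that are recognized" in running text. The training and test corpora must be disjoint for the result to mean anything.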
Hi Kevin, Friedel, 

Thanks for your input. Then in your opinion, is it feasible to have something basic ready for 2.0? 

David
I don't think we can do anything that can be beneficial for the user in v2.0 time frame, per comment 3. I recommend we remove the 2.0 requirement for this feature, and properly work on this in a later release.

Updated

3 years ago
Blocks: 1129838
Hey Kevin and Friedel,
I know this is kind of last minute, but do you think we could have something ready by this upcoming Friday? We might also be able to find someone who can help you out as well if needed.
And if this is a really unrealistic date, can you please take this work up so that it will be available in the future? thanks!
Flags: needinfo?(kscanne)
Flags: needinfo?(friedel)
(Assignee)

Comment 7

3 years ago
See comment 3.  In short, anything we do via the existing autocorrect engine is going to be virtually unusable for Xhosa and Zulu.
Flags: needinfo?(kscanne)
Flags: needinfo?(friedel)
Is it possible to generate a dictionary/wordlist in the meantime, while we figure out internally the engine issue? (sorry if I've missed something, this is really tricky for me ;) )

Comment 9

3 years ago
Dear Delphine,
I just discussed this with Wesly, and I think we need to rejustify whether this is really needed in 2.0 for Fire E according to the launch plan at https://mana.mozilla.org/wiki/display/PM/T2M. Wesly also mentioned he will talk to the partner to see whether they can create a Xhosa dictionary/wordlist themselves.

I agree we should try to have our own Xhosa dictionary/wordlist, so I am nominating this as 2.2?

Dear Howie,
Please help to triage this bug for 2.2. Thanks!
blocking-b2g: 2.0+ → 2.2?
Flags: needinfo?(hochang)
Josh: according to https://mana.mozilla.org/wiki/display/PM/Firefox+OS+Wave+Launch+Cross+Functional+View and our weekly discussion with Business development and PMs, Xhosa is committed to 2.0.
David Palomino mentioned he was supposed to hand this to partners last Friday (comment 6). I have to admit I don't understand why we're still going back and forth on this and can't define clearly the scope, after multiple conversations. If this has changed, then the mana has to be updated.
Flagging Karen Ward and David Palomino so they can confirm the scope of this and advise on how to go forward. thanks
Flags: needinfo?(kward)
Flags: needinfo?(dpalomino.bugzilla)
Hi Delphine, 

There have been no changes in the schedule for the launch (apart from a couple of days as we're closing some details regarding preload of apps). We just needed to confirm that everything was ready from the l10n part. TCL will generate the build in one or two days.

And a BIG thanks to all of you for the effort in committing the South African languages in time to launch with the partners. This will certainly help the launch a lot (for product, marketing, etc).

Thanks!
David
Flags: needinfo?(dpalomino.bugzilla)
clearing ni
Flags: needinfo?(kward)
hi David. Basically this bug means that there will not be a dictionary/wordlist available for Xhosa in time for the launch. Is that ok? Just want to make sure we're all on the same page. thanks
Flags: needinfo?(dpalomino.bugzilla)
Hi Delphine, 

I think we have no choice here. IMO it is not a blocking issue, but it is something that would be very nice to have in the future, so I agree to have this in 2.2. If there are also plans regarding 2.1, I'll let you know.

Thanks!
David
Flags: needinfo?(dpalomino.bugzilla)

Comment 15

3 years ago
Per David's non-blocking comment #14, removing the 2.2? nom.
blocking-b2g: 2.2? → ---
(In reply to Stephany Wilkes from comment #15)
> Per David's non-blocking comment #14, removing the 2.2? nom.

Hi Stephany, 

I meant in comment #14 that we cannot delay the 2.0 launch because of not having the wordlist for Xhosa, but we're definitely missing that functionality a lot. I think we need to include this in future releases, so I'm restoring the 2.2? nom.

Cheers, 
David
blocking-b2g: --- → 2.2?
Hi,

As commented in bug #1113395, we'd need this for 2.2 (even though there are no committed launches for 2.2 yet). Just copying the comment here.

South Africa is one of our tier 1 countries, so work there is expected to continue with 2.2.

The problem is that the timing managed by carriers and OEMs is different from ours, and by the time they decide to go with 2.2, it will probably be too late to include this in 2.2, or 2.2 will even be closed.

Please, let me know if we need to include this info in mana to get the 2.2+ (I think it can add some confusion, I'd prefer not to include this if it's not needed). 

Cheers, 
David

Comment 18

3 years ago
Triage: Not blocking; it's too late for a 2.2 feature. But tracking to keep this moving forward.
blocking-b2g: 2.2? → -
tracking-b2g: --- → +
Flags: needinfo?(hochang)

Comment 19

3 years ago
Created attachment 8571935 [details]
firefoxos_2.0-xh.zip

Kevin,
FYI
Spoke offline with Josh, explained Howie's concern. This needs min. engineering resource and has min. risk to land on 2.2. Patch done by community/contractors, just needs to land once it's there
Renominating for 2.2. Thanks!
blocking-b2g: - → 2.2?

Comment 21

3 years ago
Thanks for the clarification on Comment 20, blocking as 2.2+
blocking-b2g: 2.2? → 2.2+
tracking-b2g: + → ---

Comment 22

3 years ago
Delphine, which patch are you referring to? I didn't see any mention of a patch, just repeated comments that this is currently an unsolved problem.
Flags: needinfo?(lebedel.delphine)

Updated

3 years ago
Assignee: nobody → ian.henderson

Comment 23

3 years ago
Created attachment 8572536 [details]
firefox-xh.zip

FF37 corpus text

Comment 24

3 years ago
Created attachment 8572537 [details]
mobile-xh.zip

FF37 corpus text

Comment 25

3 years ago
Created attachment 8572538 [details]
mozilla_lang-xh.zip

FF37 corpus text

Comment 26

3 years ago
I have added various xh corpus files. That is about as much as we are able to do.
Assignee: ian.henderson → kscanne
Depends on: 1139255
See Also: bug 1139255
(Friedel: meant when patch will be there. Was talking quickly and on multiple bugs ;) )
Flags: needinfo?(lebedel.delphine)

Comment 28

3 years ago
Created attachment 8573934 [details]
fireplace-xh.zip

Comment 29

3 years ago
Created attachment 8573935 [details]
spartacus-xh.zip

Comment 30

3 years ago
This keyboard is confirmed complete: http://www.101languages.net/xhosa/keyboard/

Comment 32

3 years ago
So are we ready to land and close this one?
Flags: needinfo?(ian.henderson)
Hi Howie: Ian actually doesn't land stuff, he works with Kevin Scannell on generating wordlists for this. Also, there's no patch in this bug.
I think this is currently still a WIP.
Also, as per comment 7, we need to work on the autocorrect engine to make this happen. We're looking into resolving this issue.

Updated

3 years ago
Flags: needinfo?(ian.henderson)
Moving this out of 2.2 as our engine needs an update.  Bug 1139255 will need to be completed first.
blocking-b2g: 2.2+ → ---
(Assignee)

Updated

2 years ago
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WONTFIX