Closed
Bug 896363
Opened 11 years ago
Closed 11 years ago
[ca] Catalan wordlist of text prediction
Categories
(Firefox OS Graveyard :: Gaia::Keyboard, defect)
Tracking
(blocking-b2g:leo+, b2g18 verified, b2g-v1.1hd fixed)
RESOLVED
FIXED
People
(Reporter: Pike, Assigned: djf)
Attachments
(1 file)
Kevin, can you help with this? We're looking to add Catalan to fx os, and need the text prediction stuff, of course.
Comment 1•11 years ago
Just a note that text prediction is not mandatory for 1.1; we can include it in a later release.
Comment 2•11 years ago
Glad to help - I can probably have something by the end of the week.
Comment 3•11 years ago
Here's a draft of the file needed for predictive text in Catalan:

http://borel.slu.edu/obair/ca.zip

It's based on a corpus of ~45 million words of Catalan crawled from the web. I only kept words that are accepted by v. 2.5.0 of the Firefox spellchecking addon:

https://addons.mozilla.org/en-us/firefox/addon/general-catalan-dictionary/

Let me know how this looks.
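As an illustration of the approach described in this comment (count words in a corpus, keep only those a spellchecker accepts), here is a minimal sketch. It is not Kevin's actual tooling, which the thread does not show; the function name `buildWordList`, the tokenization rule, and the sample data are all assumptions made for the example.

```javascript
// Hypothetical sketch: count word frequencies in a corpus and keep only
// words that a spellchecker accepts. All names here are illustrative.
function buildWordList(corpusText, acceptedWords) {
  const counts = new Map();
  // Treat letters, the Catalan middle dot (·), apostrophes and hyphens as
  // word characters; everything else separates words.
  const tokens = corpusText.toLowerCase().match(/[\p{L}·'-]+/gu) || [];
  for (const token of tokens) {
    if (!acceptedWords.has(token)) continue; // drop words the checker rejects
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  // Most frequent first; ties broken alphabetically for stable output.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([word, count]) => ({ word, count }));
}

// Tiny made-up corpus and accepted-word set, for demonstration only.
const sampleAccepted = new Set(['el', 'gat', 'goril·la']);
const sampleList = buildWordList('El gat i el goril·la. El gatt!', sampleAccepted);
console.log(sampleList);
```

Filtering against the spellchecker's vocabulary keeps misspellings that appear in web text (like "gatt" in the sample) out of the suggestions, at the cost of dropping valid words the spellchecker does not know.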
Comment 4•11 years ago
(In reply to Kevin P. Scannell from comment #3)
> Here's a draft of the file needed for predictive text in Catalan:
>
> http://borel.slu.edu/obair/ca.zip
>
> It's based on a corpus of ~45 million words of Catalan crawled from the web.
> I only kept words that are accepted by v. 2.5.0 of the Firefox spellchecking
> addon:
>
> https://addons.mozilla.org/en-us/firefox/addon/general-catalan-dictionary/
>
> Let me know how this looks.

That's awesome, Kevin. Joan, can you test it? I'll try it as well.
Reporter | ||
Comment 5•11 years ago
Hi David, as you mentioned on the mailing list that you'd take this, I'm assigning it to you.
Assignee: nobody → dflanagan
Assignee | ||
Comment 6•11 years ago
(In reply to Toni Hermoso Pulido from comment #4)
>
> That's awesome, Kevin. Joan, can you test it? I'll try it as well.

Toni and Joan: note that there isn't yet anything to test, unless you want to just look at Kevin's word list. I now need to take that wordlist, convert it to a binary dictionary and create a patch to add the dictionary to Gaia.

Nominating this bug for 1.1 because I've heard rumors that Leo will turn off auto-correction by default unless Catalan is supported.
blocking-b2g: --- → leo?
Comment 7•11 years ago
Kevin, thanks for this great word list!!! I will check it, but from reading it in plain text, I'm sure you've done a really good job :) Can we use it in other open-source projects like Android?

David, thanks for the info. I will do some minor tests for its use in Catalan (l·l digraph, apostrophe and hyphen), but this frequency word list is the best available under an open-source licence and it's a very good starting point.
Comment 8•11 years ago
(In reply to Joan Montané from comment #7)
> Kevin, thanks for this great word list!!! I will check it, but reading in
> plain text, I'm sure you've done a really good job, :) Can we use it in
> other opensource projects like Android?
>

Yes, feel free to use it under any open source license you like.
Comment 9•11 years ago
(In reply to David Flanagan [:djf] from comment #6)
> (In reply to Toni Hermoso Pulido from comment #4)
> >
> > That's awesome, Kevin. Joan, can you test it? I'll try it as well.
>
>
> Toni and Joan: note that there isn't yet anything to test, unless you want
> to just look at Kevin's word list. I now need to take that wordlist,
> convert it to a binary dictionary and create a patch to add the dictionary
> to Gaia.
>
> Nominating this bug for 1.1 because I've heard rumors that Leo will turn off
> auto-correction by default unless Catalan is supported.

Hi David, just in case, before you start preparing the binary: Kevin is generating new versions based on feedback from Joan (and Jaume, who is not in Cc). We will comment back, hopefully soon, when there is a new version.
Comment 10•11 years ago
Since we're attempting to ship Catalan as part of 1.1, this is leo+ for now at least.
blocking-b2g: leo? → leo+
Comment 11•11 years ago
Hi,

Kevin has built a new version of the Catalan predictive word list:

http://borel.slu.edu/obair/ca-v3.zip

It's much better than the first one. So, if possible, replace the first list with this latest one.
Assignee | ||
Comment 12•11 years ago
Rudy, This patch adds a Catalan wordlist and dictionary, and includes a trivial change to layout.js to associate the dictionary with the already-existing Catalan keyboard layout.
Attachment #782714 -
Flags: review?(rlu)
Assignee | ||
Comment 13•11 years ago
(In reply to Joan Montané from comment #11)
> Hi,
>
> Kevin has built a new version for Catalan predictive list:
>
> http://borel.slu.edu/obair/ca-v3.zip
>
> It's much better than the 1st one. So, if possible, replace 1st list with
> this last one.

The patch above is based on this latest version of the wordlist.
blocking-b2g: leo+ → leo?
Comment 14•11 years ago
After some testing (I generated a ca.dic and uploaded it to a Unagi), I can say that the experience is really good, and I'd say it is suitable to be included.

The only issue is with words containing l·l (goril·la, tranquil·litat, paral·lel), which do not seem to be suggested if 'l·l' is entered from 'alt l' (3 chars in one). There is no problem if it is entered as 3 chars one after the other (· is the alt of .).
Comment 15•11 years ago
Comment on attachment 782714 [details]
link to patch on github
Looks good, r=me.
I have seen what Toni mentioned in Comment 14, but I think that could be handled by a follow-up bug.
Attachment #782714 -
Flags: review?(rlu) → review+
Assignee | ||
Comment 16•11 years ago
Toni,

Thanks for reporting the issues with l·l. It looks like there is an issue with all alternate keys that have more than one character: none of them get sent to the input method at all, so they do not interact with auto-correct, and they put the input method into an inconsistent state, breaking future auto-correct.

I'm going to fix it as part of this bug because it already has leo+, and it is a serious bug that needs to be fixed.
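To make the failure mode concrete, here is a minimal sketch of one way a multi-character key value could be delivered to an input method so that its internal state matches what is on screen. This is not the Gaia code or the actual fix, neither of which appears in the thread; `SketchInputMethod`, `handleChar`, and `sendKey` are hypothetical names.

```javascript
// Hypothetical sketch: an input method that tracks the word being typed.
// None of these names come from Gaia.
class SketchInputMethod {
  constructor() {
    this.currentWord = '';
  }

  handleChar(ch) {
    if (/[\s.,;:!?]/.test(ch)) {
      this.currentWord = ''; // whitespace or punctuation ends the word
    } else {
      this.currentWord += ch; // note: the middle dot (·) is a word character
    }
  }
}

// Deliver a key's value one character at a time, so a three-character
// alternate such as 'l·l' updates the input method like three key presses.
function sendKey(inputMethod, keyValue) {
  for (const ch of keyValue) {
    inputMethod.handleChar(ch);
  }
}

const demoIm = new SketchInputMethod();
for (const key of ['g', 'o', 'r', 'i', 'l·l', 'a']) {
  sendKey(demoIm, key);
}
console.log(demoIm.currentWord);
```

If the 'l·l' key bypassed `handleChar` entirely, the input method would believe the current word was "goria" while the screen showed "goril·la", which is the kind of inconsistent state the comment describes.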
Assignee | ||
Comment 17•11 years ago
I notice that at the beginning of a sentence, l·l gets capitalized to L·L, but Wikipedia tells me that L·l is correct. I'll make sure this gets fixed, too.
Comment 18•11 years ago
(In reply to David Flanagan [:djf] from comment #17)
> I notice that at the beginning of a sentence, l·l gets capitalized to L·L,
> but Wikipedia tells me that L·l is correct. I'll make sure this gets fixed,
> too.

I'm not fully sure about this. Joan could tell more. Where does Wikipedia say that? Actually, in the Catalan Wikipedia, the main entry is 'L·L':
http://ca.wikipedia.org/wiki/L%C2%B7L

In any case, no word starts with 'l·l', so the dilemma of L·l vs L·L would never arise.
Updated•11 years ago
blocking-b2g: leo? → leo+
Assignee | ||
Comment 19•11 years ago
Comment on attachment 782714 [details]
link to patch on github
Rudy,
I've added a new commit to the PR to correctly handle l.l (and other multi-character alternatives) and to correctly capitalize them.
l.l will capitalize to L.l normally, but to L.L if caps lock is on. This seems like the right thing to me. I don't think any of our other keyboard layouts have similar cases. Other multi-character alternatives are already in uppercase (like R$) or begin with a digit (like 3rd) or are in the alt layout without a shift key and can't be upper-cased.
You may notice that this patch does not affect the ".com" key on the URL keyboard. That one emits lowercase ".com" regardless of the uppercase or caps lock state of the keyboard. That is because of line 888 in getUpperCaseValue(). Do you think I should change it so that if caps lock is on the .com key emits .COM?
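The capitalization rule described in this comment can be sketched roughly as follows. This is an illustration only, not the body of Gaia's getUpperCaseValue(), which the thread does not quote; the function name `upperCaseKeyValue`, the `isShifted` and `isCapsLock` flags, and the hard-coded '.com' check are assumptions.

```javascript
// Hypothetical sketch of the rule described above. The function name and
// the isShifted / isCapsLock flags are assumptions, not Gaia's real API.
function upperCaseKeyValue(value, { isShifted = false, isCapsLock = false } = {}) {
  // Simplification: the URL keyboard's '.com' key always stays lowercase.
  if (value === '.com') return value;

  // Caps lock upper-cases every character: l·l -> L·L
  if (isCapsLock) return value.toUpperCase();

  // A plain shift capitalizes only the first character: l·l -> L·l
  if (isShifted) return value.charAt(0).toUpperCase() + value.slice(1);

  return value;
}

console.log(upperCaseKeyValue('l·l', { isShifted: true }));
console.log(upperCaseKeyValue('l·l', { isCapsLock: true }));
console.log(upperCaseKeyValue('.com', { isCapsLock: true }));
```

Under this rule, values that are already uppercase (like R$) or that begin with a digit (like 3rd) come out unchanged when only shift is active, which matches the observation that no other layout is affected.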
Attachment #782714 -
Flags: review+ → review?(rlu)
Comment 20•11 years ago
(In reply to David Flanagan [:djf] from comment #19)
> Comment on attachment 782714 [details]
> link to patch on github
>
> Rudy,
>
> I've added a new commit to the PR to correctly handle l.l (and other
> multi-character alternatives) and to correctly capitalize them.
>
> l.l will capitalize to L.l normally, but to L.L if caps lock is on. This
> seems like the right thing to me. I don't think any of our other keyboard
> layouts have similar cases. Other multi-character alternatives are already
> in uppercase (like R$) or begin with a digit (like 3rd) or are in the alt
> layout without a shift key and can't be upper-cased.
>
> You may notice that this patch does not affect the ".com" key on the URL
> keyboard. That one emits lowercase ".com" regardless of the uppercase or
> caps lock state of the keyboard. That is because of line 888 in
> getUpperCaseValue(). Do you think I should change it so that if caps lock is
> on the .com key emits .COM?

I think we don't have to. I checked my iPhone and it won't output .COM even when uppercase or caps lock is on.

Thanks for handling this.
Comment 21•11 years ago
Comment on attachment 782714 [details]
link to patch on github
This looks really great, r+.
Thanks again.
Attachment #782714 -
Flags: review?(rlu) → review+
Assignee | ||
Comment 22•11 years ago
Landed on master: https://github.com/mozilla-b2g/gaia/commit/247eec1e8cf4a5ccb4077ecc3e48dd3bda6ef108
Assignee | ||
Updated•11 years ago
Assignee | ||
Comment 23•11 years ago
This patch does not apply cleanly to v1-train. It looks like we've got to at least uplift some previous fix that added the Catalan keyboard layout. I didn't realize that wasn't already in v1-train. Setting needinfo on myself so I don't forget about uplifting this bug now that it has been closed.
Flags: needinfo?(dflanagan)
Assignee | ||
Comment 24•11 years ago
I've uplifted bug 866746 to v1-train, adding the Catalan keyboard layout, so this patch should uplift much more cleanly now.
Assignee | ||
Comment 25•11 years ago
uplifted to v1-train: https://github.com/mozilla-b2g/gaia/commit/d98a10641f1c6d87b5eb9914cde23e836d1d03c7
Comment 26•11 years ago
Checked on a Unagi build. This works nicely!
Comment 27•11 years ago
Verified on Leo V1.1 MOZ RIL; Catalan text prediction is working as expected.
Environmental Variables:
Build ID: 20130806071254
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/a2a9b89ef5ee
Gaia: 4c1a20570e20f64782ba170c14604395c48f7381
Platform Version: 18.1
Comment 28•11 years ago
v1.1.0hd: d98a10641f1c6d87b5eb9914cde23e836d1d03c7
v1.1.0hd: 5c2bf86ec9fde0c52a92abf4afdc0575c01389a7