[ca] Catalan wordlist of text prediction

RESOLVED FIXED

Status

Firefox OS
Gaia::Keyboard
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: Pike, Assigned: djf)

Tracking

unspecified

Firefox Tracking Flags

(blocking-b2g:leo+, b2g18 verified, b2g-v1.1hd fixed)

Details

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
Kevin, can you help with this?

We're looking to add Catalan to Firefox OS, and need the text prediction stuff, of course.

Comment 1

5 years ago
Just to note that text prediction is not mandatory for 1.1; we can include it in later releases.

Comment 2

5 years ago
Glad to help - I can probably have something by the end of the week.

Comment 3

5 years ago
Here's a draft of the file needed for predictive text in Catalan:

http://borel.slu.edu/obair/ca.zip

It's based on a corpus of ~45 million words of Catalan crawled from the web.  I only kept words that are accepted by v. 2.5.0 of the Firefox spellchecking addon:

https://addons.mozilla.org/en-us/firefox/addon/general-catalan-dictionary/

Let me know how this looks.
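
For readers following along, the filtering step described above could look roughly like this. It is only a sketch under stated assumptions: `buildWordlist` and `isAccepted` are hypothetical names, the tokenizer is a guess, and the real list was produced with Kevin's own tooling against the spellchecking addon.

```javascript
// Sketch only: count word frequencies in a corpus and keep the words that a
// spellchecker accepts. `isAccepted` is a hypothetical stand-in for the real
// dictionary lookup.
function buildWordlist(corpusText, isAccepted) {
  const counts = new Map();
  // Split on anything that is not a letter, apostrophe, middle dot or hyphen,
  // so forms such as "paral·lel" stay in one piece (an assumed tokenization).
  const tokens = corpusText.toLowerCase().split(/[^\p{L}'·-]+/u);
  for (const token of tokens) {
    if (token === '' || !isAccepted(token)) continue;
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  // Most frequent first; ties are broken alphabetically for a stable order.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([word, frequency]) => ({ word, frequency }));
}
```

Passing the dictionary check in as a function keeps the sketch independent of any particular spellchecker.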

Comment 4

5 years ago
(In reply to Kevin P. Scannell from comment #3)
> Here's a draft of the file needed for predictive text in Catalan:
> 
> http://borel.slu.edu/obair/ca.zip
> 
> It's based on a corpus of ~45 million words of Catalan crawled from the web.
> I only kept words that are accepted by v. 2.5.0 of the Firefox spellchecking
> addon:
> 
> https://addons.mozilla.org/en-us/firefox/addon/general-catalan-dictionary/
> 
> Let me know how this looks.

That's awesome, Kevin. Joan, can you test it? I'll try it as well.
(Reporter)

Comment 5

5 years ago
Hi David, as you mentioned on the mailing list that you'd take this, I'm assigning it to you.
Assignee: nobody → dflanagan
(Assignee)

Comment 6

5 years ago
(In reply to Toni Hermoso Pulido from comment #4)
> 
> That's awesome, Kevin. Joan, can you test it? I'll try it as well.


Toni and Joan: note that there isn't yet anything to test, unless you want to just look at Kevin's word list.  I now need to take that wordlist, convert it to a binary dictionary and create a patch to add the dictionary to Gaia.

Nominating this bug for 1.1 because I've heard rumors that Leo will turn off auto-correction by default unless Catalan is supported.
blocking-b2g: --- → leo?

Comment 7

5 years ago
Kevin, thanks for this great word list!!! I will check it, but from reading the plain text, I'm sure you've done a really good job :) Can we use it in other open-source projects like Android?

David, thanks for the info. I will do some minor tests of its use in Catalan (l·l digraph, apostrophe and hyphen), but this frequency word list is the best available under an open-source licence and it's a very good starting point.

Comment 8

5 years ago
(In reply to Joan Montané from comment #7)
> Kevin, thanks for this great word list!!! I will check it, but reading in
> plain text, I'm sure you've done a really good job, :) Can we use it in
> other opensource projects like Android?
> 

Yes, feel free to use under any open source license you like.

Comment 9

5 years ago
(In reply to David Flanagan [:djf] from comment #6)
> (In reply to Toni Hermoso Pulido from comment #4)
> > 
> > That's awesome, Kevin. Joan, can you test it? I'll try it as well.
> 
> 
> Toni and Joan: note that there isn't yet anything to test, unless you want
> to just look at Kevin's word list.  I now need to take that wordlist,
> convert it to a binary dictionary and create a patch to add the dictionary
> to Gaia.
> 
> Nominating this bug for 1.1 because I've heard rumors that Leo will turn off
> auto-correction by default unless Catalan is supported.

Hi David, 
just in case, before you start preparing the binary: Kevin is generating new versions based on feedback from Joan (and Jaume, not in Cc).
We will comment back, hopefully soon, when there is a new version.

Comment 10

5 years ago
Since we're attempting to ship Catalan as part of 1.1, this is leo+ for now at least.
blocking-b2g: leo? → leo+

Comment 11

5 years ago
Hi,

Kevin has built a new version of the Catalan predictive list:

http://borel.slu.edu/obair/ca-v3.zip

It's much better than the first one. So, if possible, please replace the first list with this latest one.
(Assignee)

Comment 12

5 years ago
Created attachment 782714 [details]
link to patch on github

Rudy,

This patch adds a Catalan wordlist and dictionary, and includes a trivial change to layout.js to associate the dictionary with the already-existing Catalan keyboard layout.
Attachment #782714 - Flags: review?(rlu)
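
As a rough illustration of what associating a dictionary with a layout means, here is a minimal sketch. The property names `imEngine` and `autoCorrectLanguage`, the `xx` layout, and the `predictionLanguage` helper are assumptions made for this example; the actual change is the one in the linked patch.

```javascript
// Hypothetical layout table. The property names are assumptions for
// illustration, not a confirmed copy of the schema used in layout.js.
const Keyboards = {
  ca: {
    label: 'Catalan',
    imEngine: 'latin',
    // The association: which dictionary the input method should load.
    autoCorrectLanguage: 'ca'
  },
  xx: {
    // A made-up layout with no dictionary, to show the fallback.
    label: 'Layout without a dictionary',
    imEngine: 'latin'
  }
};

// Returns the dictionary language for a layout, or null when the layout is
// unknown or has no dictionary associated with it.
function predictionLanguage(layoutName) {
  const layout = Keyboards[layoutName];
  return layout && layout.autoCorrectLanguage ? layout.autoCorrectLanguage : null;
}
```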
(Assignee)

Comment 13

5 years ago
(In reply to Joan Montané from comment #11)
> Hi,
> 
> Kevin has built a new version of the Catalan predictive list:
> 
> http://borel.slu.edu/obair/ca-v3.zip
> 
> It's much better than the 1st one. So, if possible, replace 1st list with
> this last one.

The patch above is based on this latest version of the wordlist.
blocking-b2g: leo+ → leo?

Comment 14

5 years ago
After some testing (I generated a ca.dic and uploaded it to an Unagi), I must say that the experience is really good, and I'd say it is suitable to be included.
The only issue is with words containing l·l (goril·la, tranquil·litat, paral·lel), which do not seem to be suggested if 'l·l' is entered from 'alt l' (3 chars in one). There is no problem if it is entered as 3 chars one after the other (· is the alt of .).

Comment 15

5 years ago
Comment on attachment 782714 [details]
link to patch on github

Looks good, r=me.

I have seen what Toni mentioned in Comment 14, but I think that could be handled by a follow-up bug.
Attachment #782714 - Flags: review?(rlu) → review+
(Assignee)

Comment 16

5 years ago
Toni,

Thanks for reporting the issues with l·l.  It looks like there is an issue with all alternate keys that have more than one character: none of them get sent to the input method at all, so they do not interact with auto-correct, and they put the input method into an inconsistent state, breaking future auto-correct.
 
I'm going to fix it as part of this bug because it already has leo+, and it is a serious bug that needs to be fixed.
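
One way to picture the fix is to hand a multi-character alternative to the input method one character at a time, so that its view of the text stays in step with what was actually typed. This is a sketch under assumptions, not the code from the patch: `sendKeyValue` is a hypothetical helper and `inputMethod.click` is a stand-in for whatever interface the real input method exposes.

```javascript
// Sketch: deliver a (possibly multi-character) key value to the input method
// character by character. `inputMethod.click(code)` is a hypothetical
// interface that receives one character code per call.
function sendKeyValue(inputMethod, value) {
  for (const ch of value) {
    inputMethod.click(ch.codePointAt(0));
  }
}

// Usage with a fake input method that just records what it receives.
const received = [];
const fakeInputMethod = { click: (code) => received.push(code) };
sendKeyValue(fakeInputMethod, 'l·l');
// `received` now holds the codes for 'l', '·' and 'l', in that order.
```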
(Assignee)

Comment 17

5 years ago
I notice that at the beginning of a sentence, l·l gets capitalized to L·L, but Wikipedia tells me that L·l is correct. I'll make sure this gets fixed, too.

Comment 18

5 years ago
(In reply to David Flanagan [:djf] from comment #17)
> I notice that at the beginning of a sentence, l·l gets capitalized to L·L,
> but Wikipedia tells me that L·l is correct. I'll make sure this gets fixed,
> too.

I'm not fully sure about this. Joan could tell more. Where does Wikipedia say that? Actually, in the Catalan Wikipedia, the main entry is 'L·L': http://ca.wikipedia.org/wiki/L%C2%B7L
In any case, there is no word starting with 'l·l', so the dilemma of L·l vs L·L would never arise.

Updated

5 years ago
blocking-b2g: leo? → leo+
(Assignee)

Comment 19

5 years ago
Comment on attachment 782714 [details]
link to patch on github

Rudy,

I've added a new commit to the PR to correctly handle l.l (and other multi-character alternatives) and to correctly capitalize them.

l.l will capitalize to L.l normally, but to L.L if caps lock is on.  This seems like the right thing to me. I don't think any of our other keyboard layouts have similar cases.  Other multi-character alternatives are already in uppercase (like R$) or begin with a digit (like 3rd) or are in the alt layout without a shift key and can't be upper-cased.
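
The rule described above can be sketched as follows. This is an illustration under assumptions, not the code in the PR: `upperCaseAlternative` is a hypothetical name, it assumes it is only called for keys on a layout that has a shift key, and it does not model any per-key exceptions.

```javascript
// Sketch of the capitalization rule for multi-character alternatives:
// with shift only the first character is upper-cased, with caps lock all
// of them are.
function upperCaseAlternative(value, capsLock) {
  if (capsLock) {
    return value.toUpperCase();
  }
  return value.charAt(0).toUpperCase() + value.slice(1);
}

const shifted = upperCaseAlternative('l·l', false); // first letter only
const locked = upperCaseAlternative('l·l', true);   // every letter
```

Values that are already uppercase (like R$) or that begin with a digit (like 3rd) pass through the shift branch unchanged, which matches the behaviour described above.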

You may notice that this patch does not affect the ".com" key on the URL keyboard. That one emits lowercase ".com" regardless of the uppercase or caps lock state of the keyboard.  That is because of line 888 in getUpperCaseValue(). Do you think I should change it so that if caps lock is on the .com key emits .COM?
Attachment #782714 - Flags: review+ → review?(rlu)

(In reply to David Flanagan [:djf] from comment #19)
> Comment on attachment 782714 [details]
> link to patch on github
> 
> Rudy,
> 
> I've added a new commit to the PR to correctly handle l.l (and other
> multi-character alternatives) and to correctly capitalize them.
> 
> l.l will capitalize to L.l normally, but to L.L if caps lock is on.  This
> seems like the right thing to me. I don't think any of our other keyboard
> layouts have similar cases.  Other multi-character alternatives are already
> in uppercase (like R$) or begin with a digit (like 3rd) or are in the alt
> layout without a shift key and can't be upper-cased.
> 
> You may notice that this patch does not affect the ".com" key on the URL
> keyboard. That one emits lowercase ".com" regardless of the uppercase or
> caps lock state of the keyboard.  That is because of line 888 in
> getUpperCaseValue(). Do you think I should change it so that if caps lock is
> on the .com key emits .COM?

I think we don't have to.
I checked my iPhone and it doesn't output .COM even when uppercase/caps lock is on.

Thanks for handling this.

Comment on attachment 782714 [details]
link to patch on github

This looks really great, r+.
Thanks again.
Attachment #782714 - Flags: review?(rlu) → review+
(Assignee)

Updated

5 years ago
Status: NEW → RESOLVED
Last Resolved: 5 years ago
status-b2g18: --- → affected
Resolution: --- → FIXED
(Assignee)

Comment 23

5 years ago
This patch does not apply cleanly to v1-train. It looks like we've got to at least uplift some previous fix that added the Catalan keyboard layout. I didn't realize that wasn't already in v1-train.

Setting needinfo on myself so I don't forget about uplifting this bug now that it has been closed.
Flags: needinfo?(dflanagan)
(Assignee)

Updated

5 years ago
Depends on: 866746
(Assignee)

Comment 24

5 years ago
I've uplifted bug 866746 to v1-train, adding the Catalan keyboard layout, so this patch should uplift much more cleanly now.
(Assignee)

Comment 25

5 years ago
uplifted to v1-train: https://github.com/mozilla-b2g/gaia/commit/d98a10641f1c6d87b5eb9914cde23e836d1d03c7
status-b2g18: affected → fixed
status-b2g-v1.1hd: --- → affected
Flags: needinfo?(dflanagan)

Comment 26

5 years ago
Checked on an Unagi build. This works nicely!

Updated

5 years ago
Depends on: 900355

Updated

5 years ago
Whiteboard: [LeoVB+]

Comment 27

5 years ago
Verified on Leo V1.1 MOZ RIL,
Catalan text prediction is working as expected

Environmental Variables:
Build ID: 20130806071254
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/a2a9b89ef5ee
Gaia: 4c1a20570e20f64782ba170c14604395c48f7381
Platform Version: 18.1
status-b2g18: fixed → verified
v1.1.0hd: d98a10641f1c6d87b5eb9914cde23e836d1d03c7
v1.1.0hd: 5c2bf86ec9fde0c52a92abf4afdc0575c01389a7
status-b2g-v1.1hd: affected → fixed