Closed
Bug 1138782
Opened 10 years ago
Closed 10 years ago
validChars() in predictions.js should consider different casing
Categories
(Firefox OS Graveyard :: Gaia::Keyboard, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
2.2 S9 (3apr)
People
(Reporter: mnjul, Assigned: mnjul)
References
Details
(Whiteboard: [p=2])
Attachments
(2 files, 1 obsolete file)
This came as frustration during my work in bug 1102835: validChars() in predictions.js currently doesn't accept variant forms (different casing or accentedness) of characters in the input.
As input may be substituted with variant forms during process(), we should accept them.
This issue is especially prominent when dealing with user dictionaries: Suppose that there is only one single word in user dictionary: levicorpus[*], the character table will only contain the lowercase version of L. If user types Levic with the intention to type Levicorpus (such as at the beginning of a sentence), the predictions engine for user dictionary rejects the input right away with validChars() even though Levicorpus will be suggested should it get passed to processCandidates() -- and it's the only one that gets suggested and that matches the user's intention, at that.
Of course this change will modify the behavior of built-in dictionary too, but:
1. Built-in dictionary character table is much much less likely to have only a specific casing of a character [†].
2. I actually think it has benefits for built-in dictionary that we don't reject potentially viable candidates here at validChars().
[*] A spell from Harry Potter for that matter
[†] Since our substitution rule says that we can't substitute an accented version of a char with a plain version of a char without counting it as a correction, we'll actually only consider different casing here, as "ç" should not be valid for "c".
Assignee | ||
Comment 1•10 years ago
|
||
(In reply to John Lu [:mnjul] [MoCoTPE] from comment #0)
> This came as frustration during my work in bug 1102835: validChars() in
> predictions.js currently doesn't accept variant forms (different casing or
> accentedness) of characters in the input.
>
> As input may be substituted with variant forms during process(), we should
> accept them.
>
> This issue is especially prominent when dealing with user dictionaries:
> Suppose that there is only one single word in user dictionary:
> levicorpus[*], the character table will only contain the lowercase version
> of L. If user types Levic with the intention to type Levicorpus (such as at
> the beginning of a sentence), the predictions engine for user dictionary
> rejects the input right away with validChars() even though Levicorpus will
> be suggested should it get passed to processCandidates() -- and it's the
> only one that gets suggested and that matches the user's intention, at that.
>
> Of course this change will modify the behavior of built-in dictionary too,
> but:
> 1. Built-in dictionary character table is much much less likely to have only
> a specific casing of a character [†].
> 2. I actually think it has benefits for built-in dictionary that we don't
> reject potentially viable candidates here at validChars().
>
> [*] A spell from Harry Potter for that matter
> [†] Since our substitution rule says that we can't substitute an accented
> version of a char with a plain version of a char without counting it as a
> correction, we'll actually only consider different casing here, as "ç"
> should not be valid for "c".
Neh, that's wrong. If user dictionary has only "Çelik", we shouldn't reject "celik". This adds complexity :/
Comment 2•10 years ago
|
||
Comment 3•10 years ago
|
||
Thanks for investigating, I noticed this behavior before but didn't know this was the cause. I thought it was low frequency or whatever.
Assignee | ||
Comment 4•10 years ago
|
||
Comment on attachment 8571792 [details] [review]
[gaia] mnjul:bug_1138782_validchar_consider_variants > mozilla-b2g:master
Jan, please check the patch out. Thanks!
Attachment #8571792 -
Flags: review?(janjongboom)
Comment 5•10 years ago
|
||
Wow, I just started looking through the code today to see how I could fix this for Greek. I'm so happy that it's already being taken care of!
Comment 6•10 years ago
|
||
I'm trying to see this problem on action on gaia master but I cannot find any case where the suggestions go wrong. F.e. based on your description I'd figure that (en_us) <w f="83" flags="">émigré</w> would be problematic but it gets shown as suggestion for any input like 'emigre', 'Emigre', 'Emigré', etc.
Can you find a use case reproducible on master?
Flags: needinfo?(jlund)
Comment 7•10 years ago
|
||
I can:
1) enable the Greek keyboard
2) open the contacts app and click the + icon to add a contact
3) click on the first name and type: Κάθ
4) see how Κάθε (every) is suggested
5) now delete that and type Καθ instead
6) observe that Κάθε is no longer suggested. Even if you type Καθε, it still won't appear in the list
I admit I haven't tried this patch, but hopefully it takes care of this issue.
Comment 8•10 years ago
|
||
I'm guessing you mean to ni jlu@mozilla.com :)
/me 302's on that assumption
Flags: needinfo?(jlund) → needinfo?(jlu)
Assignee | ||
Comment 9•10 years ago
|
||
(In reply to Jan Jongboom [:janjongboom] (Telenor) from comment #6)
> I'm trying to see this problem on action on gaia master but I cannot find
> any case where the suggestions go wrong. F.e. based on your description I'd
> figure that (en_us) <w f="83" flags="">émigré</w> would be problematic but
> it gets shown as suggestion for any input like 'emigre', 'Emigre', 'Emigré',
> etc.
>
> Can you find a use case reproducible on master?
Uhm...not really. since é, e, and E are all in the dictionary's character table, the inputs you said will be accepted normally. It can become problematic if dictionary does not have 'E' or 'e' at all, and only has 'é'.
Welllllll....this is a little bit embarrassing. I dug through all the latin dictionaries and layouts and I believe all of them are "well-behaved" and cannot exhibit this buggy behavior. ("Well-behaved" is that all words in the dict file can be input'ed by corresponding layout definition's keys.)
And this bug might only be relevant when we need to predict from user dictionary; with that said, though, applying the bug's patch doesn't affect the existing behavior with built-in dictionaries either (since all dicts+layouts are "well-behaved".))
So I guess this fix should be folded into bug 1102835. Jan, what do you think about this?
Flags: needinfo?(jlu) → needinfo?(janjongboom)
Assignee | ||
Comment 10•10 years ago
|
||
(In reply to Panos Astithas [:past] from comment #7)
> I can:
> 1) enable the Greek keyboard
> 2) open the contacts app and click the + icon to add a contact
> 3) click on the first name and type: Κάθ
> 4) see how Κάθε (every) is suggested
> 5) now delete that and type Καθ instead
> 6) observe that Κάθε is no longer suggested. Even if you type Καθε, it still
> won't appear in the list
>
> I admit I haven't tried this patch, but hopefully it takes care of this
> issue.
Unfortunately that's a different issue. But rest assured, I've got the cause identified. Please see bug 1145955.
Comment 11•10 years ago
|
||
Nah, I don't mind that it's only relevant to user dictionaries. Just need a reproducible path. Let's keep it in this bug :-)
Assignee | ||
Comment 12•10 years ago
|
||
Alright, you need to reproduce the issue with bug 1102835's patch at https://github.com/mozilla-b2g/gaia/pull/28648/commits . The patch has got four commits, one being from this bug.
A1) Please first flash your Gaia at the tip commit of that PR branch.
A2) With user dictionary settings (Settings -> Keyboards -> Built-in Keyboard -> User dictionary) add a user word, say, "émigra".
A3) In SMS app's message field, type "Emig" or "Emigr", observe that Émigra is suggested (see attached screenshot).
B1) Now revert the bug 1138782's commit at that branch, which is currently c1fae95. I don't know why I got a revert conflict there but there is only one conflicting chunk, for which you can just paste what's there in master: https://github.com/mozilla-b2g/gaia/blob/master/apps/keyboard/js/imes/latin/predictions.js#L367-L391.
B2) Flash your Gaia with |make install-gaia APP=keyboard| such that the user word doesn't get erased (as with reset-gaia).
B3) In SMS app's message field, type "Emig" or "Emigr", observe that Émigra is no longer suggested.
B4) ...and this is because e is not in the user dictionary's character table so it gets rejected by |validChars()|.
Comment 13•10 years ago
|
||
Comment on attachment 8571792 [details] [review]
[gaia] mnjul:bug_1138782_validchar_consider_variants > mozilla-b2g:master
See GitHub, idea is good.
Flags: needinfo?(janjongboom)
Attachment #8571792 -
Flags: review?(janjongboom) → review-
Assignee | ||
Comment 14•10 years ago
|
||
Comment on attachment 8571792 [details] [review]
[gaia] mnjul:bug_1138782_validchar_consider_variants > mozilla-b2g:master
Hi Jan,
Thanks for the comments. I have drastically revised the patch and we now have a new Set() to contain all valid chars, inferred from the dictionary’s character table, the layout’s nearbykeys, and variants forms of them. During prediction we just check if all the characters in input string are present in the Set.
There is however something that’s not been done very nicely. As the inference of the validChars Set requires knowledge of both |nearbyKeys| and |variants|, we want to generate the Set after both setNearByKeys() and setDictionary() are called. This is currently ensured by placing the generation code at setDict(), because from the client codes we always call setNBK before setDict, and setNBK may be optional, from latin.js#activate() to latin.js#setLanguageSync() (and from unit testing codes too). Sadly, it’s quite hard to lessen this ordering constraint, as predictions.js doesn’t really have knowledge on whether it’s to be re-initialized, and whether it should wait for any potential incoming setNBK, when setDict is called (or any combination of them), since we don’t really have a clear-cut “reinitialization” command for it. IMO, any attempt to reason such reinitialization concept out of current codes wouldn’t be architecturally better than my current assumption about the order of setNBK and setDict. How do you think about this?
Anyway, please check out the patch again. Thanks!
Attachment #8571792 -
Flags: review- → review?(janjongboom)
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → jlu
Status: NEW → ASSIGNED
Whiteboard: [p=2]
Target Milestone: --- → 2.2 S9 (3apr)
Comment 15•10 years ago
|
||
Comment on attachment 8571792 [details] [review]
[gaia] mnjul:bug_1138782_validchar_consider_variants > mozilla-b2g:master
Good, way cleaner.
Attachment #8571792 -
Flags: review?(janjongboom) → review+
Assignee | ||
Updated•10 years ago
|
Keywords: checkin-needed
Updated•10 years ago
|
Keywords: checkin-needed
Comment 16•10 years ago
|
||
http://docs.taskcluster.net/tools/task-graph-inspector/#DYgR-OagTSmJry0FNUORVg
The pull request failed to pass integration tests. It could not be landed, please try again.
Assignee | ||
Comment 17•10 years ago
|
||
Turns out all the errors are not related. Let me try again...
Keywords: checkin-needed
Updated•10 years ago
|
Keywords: checkin-needed
Comment 18•10 years ago
|
||
http://docs.taskcluster.net/tools/task-graph-inspector/#4CV9geYGQJmC-EhHrqojAA
The pull request failed to pass integration tests. It could not be landed, please try again.
Updated•10 years ago
|
Keywords: checkin-needed
Comment 20•10 years ago
|
||
Pull request has landed in master: https://github.com/mozilla-b2g/gaia/commit/dc27ee5eae72f2f3e50402c36e08fb1fb0a2d749
Updated•10 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 21•10 years ago
|
||
Thanks Jan and Autolander(?) !
Comment 22•10 years ago
|
||
Backed out for frequent Gip failures. ~50% failure rate.
Master: https://github.com/mozilla-b2g/gaia/commit/b3b95731fa37d4f73471292b123880759972b799
https://treeherder.mozilla.org/logviewer.html#?job_id=1573118&repo=b2g-inbound
https://treeherder.mozilla.org/logviewer.html#?job_id=1573074&repo=b2g-inbound
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(jlu)
Comment 23•10 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Attachment #8571792 -
Attachment is obsolete: true
Flags: needinfo?(jlu)
Assignee | ||
Comment 24•10 years ago
|
||
Timing issue. Added a guard on that. Rudy, as offline discussed, could you carry over the review+ from attachment 8571792 [details] [review]? Thanks!
Flags: needinfo?(rlu)
Comment 25•10 years ago
|
||
Comment on attachment 8583607 [details] [review]
[gaia] mnjul:bug_1138782_validchar_consider_variants > mozilla-b2g:master
Rubber stamp to carry over the r+, this is for autolander to work.
Flags: needinfo?(rlu)
Attachment #8583607 -
Flags: review+
Assignee | ||
Comment 26•10 years ago
|
||
All those Gip tests and retriggers look happy.
Keywords: checkin-needed
Updated•10 years ago
|
Keywords: checkin-needed
Comment 27•10 years ago
|
||
Pull request has landed in master: https://github.com/mozilla-b2g/gaia/commit/e04b63ca46a741ee6c228b9b86fa596d91853af9
Updated•10 years ago
|
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•