Closed Bug 890896 Opened 12 years ago Closed 12 years ago

Autocorrect gives different scores when capitalized is true

Categories

(Firefox OS Graveyard :: Gaia::Keyboard, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: janjongboom, Unassigned)

Details

Attachments

(2 files)

Consider the following scenario's: - Input 'City'. Results: [ 'City', 23.759999999999998 ], [ 'City\'s', 3.0096000000000003 ], [ 'Citywide', 0.30412800000000006 ] - Input 'city'. Results: [ 'city', 24 ], [ 'city\'s', 3.0400000000000005 ], [ 'citywide', 0.3072000000000001 ] These words should generate the same scores I would say. Some words are worse than others, f.e. (with my profanity patch): - 'Penis': [ 'Panis', 1.3860000000000001 ], [ 'Punish', 1.1880000000000002 ], [ 'Ponies', 0.7722 ] - 'penis': [ 'penis', 1 ]
Attached file Worker_tests diff
Here are tests to be added when the worker_tests are merged into master
Attached file Patch
We give different results back at the moment if the first character is a capital. I don't think we should do that, because in 99% of the case the capital is not selected by the user but rather the result of auto-capitalization. In this patch we'll always look for the lowercase version of the word so we get the same results in both cases. :RudyL I think this is the behavior as it should be but feel free to voice your ideas.
Attachment #772046 - Flags: review?(rlu)
Attachment #772046 - Flags: review?(rlu) → review?(dflanagan)
Comment on attachment 772046 [details] [review] Patch The patch looks okay, though a more thorough way to do it would be to modify the variants[] array if case differences are no longer going to be taken into account. But then I suppose you'd have to convert the entire string to lowercase. I'm giving r- here, though, because I'm not convinced that there is actually a bug. It is intentional that the scores are slightly different. Mis-matched case is covered by the variant form multiplier and the weight is multiplied by .99. The idea being that exact matches are slightly favored over inexact matches. Most of the time (as in your first example) the suggested words are exactly the same despite the slightly different weights. If the user does explicitly use the shift key, then we ought to give preference to capitalized proper nouns. I would welcome a patch that undoes the effect of auto-capitalization in latin.js and sends a lower-case string to the prediction engine in that case. But when the user explicitly shifts case, it would be nice to honor that in the predictions if we can. Or, if you want to land this patch, please give more examples of where the current behavior gives sub-optimal corrections. If there really is a problem here, we should fix it. I'm just not convinced yet.
Attachment #772046 - Flags: review?(dflanagan) → review-
I'm pretty sure there is a bug, because f.e. in the example of city, all variants are lower case in the dictionary, but the score differs in a non-structured way (City is lower, City's is lower, Citywide is higher). This gets worse for profane words because they will never be matched due to low frequency; e.g. unable to type a word considered profane when autocapitalize is on. The idea of fixing this in latin.js might be nicer, but there was already code covering lcase/ucase in the predictions worker. I'll see if I can move it.
I was playing around with frequency levels and things get worse there: ◦ [keyboard] Capital input should give capital output: { cmd: 'predictions', input: 'City', suggestions: [ [ 'City', 23.759999999999998 ], [ 'City\'s', 3.168 ], [ 'Cite', 0.4752 ] ] } 2) [keyboard] Capital input should give capital output ◦ [keyboard] Non-Capital input should give non-capital output: { cmd: 'predictions', input: 'city', suggestions: [ [ 'city', 24 ], [ 'cite', 0.48 ], [ 'city\'s', 0.096 ] ] } So now the difference gets really high...
Hmm, I think that `Citywide` might actually be correct. There is `Cityscape` (with capital C) in the dictionary and s and w are close to each other. That's why it's overruling here. So that's one mystery solved :-)
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID
Attachment mime type: text/plain → text/x-github-pull-request
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: