Closed Bug 1164421 Opened 6 years ago Closed 6 years ago
`im` should autocorrect to `I'm` instead of `in`
46 bytes, text/x-github-pull-request
|Details | Review|
46 bytes, text/x-github-pull-request
|Details | Review|
52.19 KB, image/png
One of the biggest painpoints right now that's shared between dogfooders is that our autocorrect *feels* off in for a wide variety of words. The biggest offender is probably `im` correcting to `in` instead of `I'm`.
Recommending blocking here because currently typing on the keyboard is an extremely frustrating experience, and this is one of the most egregious offenders.
blocking-b2g: --- → 2.2?
Not to pile on, but if we are going to make this change, can we also stop autocorrect from changing 'jam' to 'ham' all the time? 'jam' is a word. This is really hampering my ability to write jamming-related messages.
Think this is a recent regression. Last time I dogfooded I doubt this was the case.
On Firefox OS 2.1 en-US keyboard Im becomes I'm. On Firefox OS 2.2 en-US keyboard Im becomes In
Thanks Jan. In requesting a regression window here if possible between 2.1 and 2.2.
b2g-inbound Regression Window: Last Working Environmental Variables: Device: Flame 2.2 BuildID: 20141212003454 Gaia: 2e6a35dfa098aeb274835e812b18b5f6104622b0 Gecko: c73a1414a9b1 Version: 37.0a1 (2.2) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0 First Broken Environmental Variables: Device: Flame 2.2 BuildID: 20141212014254 Gaia: 739dd45496e9d0f1806dc67faf0244c18c5dd3ac Gecko: 6fe07f7f9c81 Version: 37.0a1 (2.2) Firmware Version: v18D-1 User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0 Last Working Gaia First Broken Gecko: Issue does NOT reproduce Gaia: 2e6a35dfa098aeb274835e812b18b5f6104622b0 Gecko: 6fe07f7f9c81 First Broken Gaia Last Working Gecko: Issue DOES reproduce Gaia: 739dd45496e9d0f1806dc67faf0244c18c5dd3ac Gecko: c73a1414a9b1 https://github.com/mozilla-b2g/gaia/compare/2e6a35dfa098aeb274835e812b18b5f6104622b0...739dd45496e9d0f1806dc67faf0244c18c5dd3ac Caused by bug 971688
Jan, can you take a look at this please? This might have been caused by the work done for bug 971688.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker) → needinfo?(janjongboom)
Triage: regression, blocking.
blocking-b2g: 2.2? → 2.2+
Hi Jan, do you mind take a look? Thank you.
Hello David, do you have any suggestion here? Fixing bug 971688 is causing this bug, and I'd say the case here is more important/frequently used than the case in bug 971688. I'd propose the back out bug 971688 if fine tuning the algorithm is not something we can easily achieve soon.
I don't think we should backout bug 971688. Jan argues convincingly in that bug that it can only make suggestions better by considering more possibilities. I'm guessing that before 971688 landed, the autocorrect code was so busy looking at words that begin with "im" that it never even noticed "in" as a possibility. That was a bug, and 971688 fixed it so that "in" could be considered. (I should flash a 2.1 build and see what the suggestions for "im" are on that release; my guess is that they do not include "in"). So now that the autocorrection code correctly finds "I'm" and "in" as the two best possible matches for "im" the issue is that our weighting is off and we're picking the wrong one. I'll take this and investigate whether a simple weighting tweak can fix it.
Assignee: nobody → dflanagan
I've confirmed the hypothesis above. In 2.1, typing "im" suggests "I'm", "IMF" and "IMG". It doesn't even consider "in" as an option.
When we type "im", the prediction engine returns these results: [["in",21.743896484375],["I'm",16.929],["km",9.737735704393822],["um",9.118408203125]] If I type "Im", it returns: [["In",21.52645751953125],["I'm",17.099999999999998],["Km",9.737735704393822],["Um",9.118408203125]] I'm surprised that this would be different by such a small amount. I'm surprised that we autocorrect to "in", given that "I'm" is also a highly-weighted candidate. Even if we don't autocorrect to I'm, we should at least not autocorrect to "in". So that is another minimal approach we could take here is tuning the threshold for deciding when to autocorrect instead of just suggesting.
Here's the relevant code in latin.js for how we decide to autocorrect: // Don't offer to autocorrect unless we're reasonably certain that the // user wants this correction. The first suggested word must be at least // this much more highly weighted than the second suggested word. var AUTO_CORRECT_THRESHOLD = 1.30; 21.53 / 17.1 is 1.26, which is less than this threshold, so this should not have been an autocorrect in the first place. Something is going wrong here.
Nevermind. I misremembered. If the user's input is not a valid word, then we always autocorrect to a valid word. It is only when we're overriding one valid word with an autocorrection that we apply that threshold value.
Looking at the english wordlist now. Part of the problem here is that "in" is the 5th most common word in english, with a weight of 210. "I'm" is down below the 5000th word, and has a weight of 116. So it would take a pretty big adjustment to our weight to overcome that difference. Here's a thought, though. English apostrophes are special, and users should never have to type them. So in an important sense, when the user types "im" we should treat that as having typed a valid word even though it does not exactly match any of the suggestions. If we did that, then the threshold thing above would come into play and I'm would be the autocorrection. I'll see if I can do that in latin.js, which has the benefit of not needing to mess with predictions.js
'id' currently autocorrects to 'is'. By the logic of this bug, it should autocorrect to "I'd" instead. According to the wordlist "I'd" is much more common than "I'm" so autocorrecting "id" to "is" is even worse than autocorrecting "im" to "in".
(In reply to David Flanagan [:djf] from comment #17) > 'id' currently autocorrects to 'is'. By the logic of this bug, it should > autocorrect to "I'd" instead. According to the wordlist "I'd" is much more > common than "I'm" so autocorrecting "id" to "is" is even worse than > autocorrecting "im" to "in". for the record, I do actually find this really annoying too - especially if I capitalise the 'i' and it still auto-corrects to is (and same with Im auto-correcting to In).
Comment on attachment 8609016 [details] [review] [gaia] davidflanagan:1164421 > mozilla-b2g:master This patch makes "im" autocorrect to "I'm" and "id" autocorrect to "I'd". Rudy: can you review this, please, or find a reviewer? (Note that it is a 2.2+ bug) Jan: if you have time to take a look, your feedback on this (or your review if you want to steal it from Rudy) would be great.
Comment on attachment 8609016 [details] [review] [gaia] davidflanagan:1164421 > mozilla-b2g:master I could rubber stamp on this patch since you understand latin.js much more than I do. :) Thanks for your help to fix this blocker.
Attachment #8609016 - Flags: review?(rlu) → review+
Comment on attachment 8609016 [details] [review] [gaia] davidflanagan:1164421 > mozilla-b2g:master [Approval Request Comment] [Bug caused by] (feature/regressing bug #): 971688 [User impact] if declined: really frustrating autocorrect behavior where correct input "im" and "id" is turned into things that the user did not type. [Testing completed]: yes, but the change needs some more dogfooding to make sure it has not introduced other unwelcome autocorrect behavior changes. [Risk to taking this patch] (and alternatives if risky): the code itself is simple and not riskty: it simply re-arranges word suggestions from the suggestion engine so that for the user input `im` the word `I'm` comes before a more-common word `in` because if we ignore case and punctuation, the user typed the letters in `I'm`. This seems like it will always make autocorrection better. But we don't have a test suite for all possible autocorrections, so there is a chance that this change could break some other common autocorrection. I still think we should uplift it, however. I can't think of any simpler or safer way to fix this bug. [String changes made]: none
The patch will not cherry-pick cleanly to v2.2. I will prepare a PR for that branch
http://docs.taskcluster.net/tools/task-graph-inspector/#jQsEm7pJRUW7pmgaD6TLOg The pull request failed to pass integration tests. It could not be landed, please try again.
The tests are passing when I push the commit, so I assume these are infrastructure failures. Trying again.
http://docs.taskcluster.net/tools/task-graph-inspector/#z7atNthxSXqxURMk2sIyHA The pull request failed to pass integration tests. It could not be landed, please try again.
I think this is some autolander bug + issues since the tree closure yesterday, might have been resolved after a rebase, though you shouldn't need to do that. Sorry about that, let's manually land for now: https://github.com/mozilla-b2g/gaia/commit/99711e1fb0561d1413befed10d2eb87ce79e6c4f
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment on attachment 8609447 [details] [review] [gaia] davidflanagan:bug1164421-v2.2 > mozilla-b2g:v2.2 This is the version of the patch that applies to 2.2, so moving the uplift request to this one. See comment #22 for the approval uplift commenet
Attachment #8609447 - Flags: approval-gaia-v2.2?(jocheng)
Comment on attachment 8609447 [details] [review] [gaia] davidflanagan:bug1164421-v2.2 > mozilla-b2g:v2.2 Thanks David!
Attachment #8609447 - Flags: approval-gaia-v2.2?(jocheng) → approval-gaia-v2.2+
Requesting QA verifyme and exploratory testing for autocorrections on 2.2 and master.
Target Milestone: --- → 2.2 S13 (29may)
This bug has been verified as pass on latest Nightly build of Flame v2.2&3.0 and Nexus 5 v2.2&3.0 by the STR in Comment 0. See attachment: verified_v2.2&3.0.png Reproduce rate: 0/6 STR: 1.Open Messages app. 2.Input "im" /"id" in message content area. **it shows "I'm" /"I'd" correctly with highly-weighted candidate on keyboard. ------------------------------------------------------------------------ Device: Flame v2.2 build(Pass) Build ID 20150526162504 Gaia Revision 15fa6e486b5c82956a8e8f8a99c39d297e8f0523 Gaia Date 2015-05-26 20:07:31 Gecko Revision https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/663bf4703588 Gecko Version 37.0 Device Name flame Firmware(Release) 4.4.2 Firmware(Incremental) eng.cltbld.20150526.201446 Firmware Date Tue May 26 20:14:55 EDT 2015 Bootloader L1TC000118D0 Device: Flame v3.0 build(Pass) Build ID 20150526160204 Gaia Revision 8ca93673869a64e09ed6153c5402896822dfb253 Gaia Date 2015-05-26 19:31:37 Gecko Revision https://hg.mozilla.org/mozilla-central/rev/1e4e369822ac Gecko Version 41.0a1 Device Name flame Firmware(Release) 4.4.2 Firmware(Incremental) eng.cltbld.20150526.195035 Firmware Date Tue May 26 19:50:45 EDT 2015 Bootloader L1TC000118D0 Device: Nexus 5 v2.2 build(Pass) Build ID 20150526162504 Gaia Revision 15fa6e486b5c82956a8e8f8a99c39d297e8f0523 Gaia Date 2015-05-26 20:07:31 Gecko Revision https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/663bf4703588 Gecko Version 37.0 Device Name hammerhead Firmware(Release) 5.1 Firmware(Incremental) eng.cltbld.20150526.201116 Firmware Date Tue May 26 20:11:30 EDT 2015 Bootloader HHZ12f Device: Nexus 5 v3.0 build(Pass) Build ID 20150526160204 Gaia Revision 8ca93673869a64e09ed6153c5402896822dfb253 Gaia Date 2015-05-26 19:31:37 Gecko Revision https://hg.mozilla.org/mozilla-central/rev/1e4e369822ac Gecko Version 41.0a1 Device Name hammerhead Firmware(Release) 5.1 Firmware(Incremental) eng.cltbld.20150526.195039 Firmware Date Tue May 26 19:50:56 EDT 2015 Bootloader HHZ12f
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][MGSEI-Triage+]
Attachment #8611097 - Attachment is obsolete: true
Comment on attachment 8609016 [details] [review] [gaia] davidflanagan:1164421 > mozilla-b2g:master Late f+
Attachment #8609016 - Flags: feedback?(janjongboom) → feedback+
You need to log in before you can comment on or make changes to this bug.