Implement predictive text using ternary DAGs

VERIFIED FIXED

Status

Firefox OS
Gaia::Keyboard
VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: ckerschb, Assigned: ckerschb)

Tracking

({access})

unspecified
All
Gonk (Firefox OS)
access
Dependency tree / graph
Bug Flags:
in-moztrap +

Firefox Tracking Flags

(blocking-b2g:-, b2g18+ fixed)

Details

(Whiteboard: ux-tracking)

Attachments

(1 attachment)

(Assignee)

Description

5 years ago
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:18.0) Gecko/20100101 Firefox/18.0
Build ID: 20130116073211
Assignee: nobody → mozilla
Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: DOM: Device Interfaces → Gaia::Keyboard
Product: Core → Boot2Gecko
QA Contact: wachen
(Assignee)

Comment 1

5 years ago
dictionary size improvement: ternary DAGs vs current implementation
de.dict    3.4MB (5.6MB)
en_us.dict 1.8MB (3.6MB)
es.dict    1.7MB (3.1MB)
fr.dict    2.8MB (4.7MB)
pt_br.dict 2.0MB (3.7MB)
Apologies; a commit landed with this bug number instead of Bug 838827. Leaving this as a breadcrumb trail.

https://hg.mozilla.org/mozilla-central/rev/595da71ae45f
OS: Mac OS X → Gonk (Firefox OS)
Hardware: x86 → All
(Assignee)

Comment 3

5 years ago
Performance comparison for input word 'restaurant'. (left = DAG, right = prefix trie)

r         48 ms  |   97 ms
re        49 ms  |  192 ms
res       72 ms  |  141 ms
rest      56 ms  |  108 ms
resta     65 ms  |   85 ms
restau    69 ms  |   49 ms
restaur   75 ms  |   50 ms
(Assignee)

Comment 4

5 years ago
Currently we are evaluating two different node encodings:

node {
  int16 ch;
  int16 lPtr;  // left child
  int16 cPtr;  // center child
  int16 rPtr;  // right child
  int16 nPtr;  // next child, holds a pointer to the node with the next highest frequency
  int16 high;  // holds an overflow byte for lPtr, cPtr, rPtr, nPtr
  in16 frequency;
}

de.dict    3.4MB
en_us.dict 1.8MB
es.dict    1.7MB
fr.dict    2.8MB
pt_br.dict 2.0MB

r         48 ms
re        49 ms
res       72 ms
rest      56 ms
resta     65 ms
restau    69 ms
restaur   75 ms

source: https://github.com/ckerschb/gaia/tree/dag_ll

==============================================================

node {
  int16 frequency;
  int16 ch;
  int32 lPtr;
  int32 cPtr;
  int32 rPtr;
}

de.dict    6.6MB
en_us.dict 3.1MB
es.dict    2.8MB
fr.dict    4.9MB
pt_br.dict 3.1MB

r         14 ms
re        78 ms
res       70 ms
rest      42 ms
resta      6 ms
restau    26 ms
restaur    6 ms

source: https://github.com/ckerschb/gaia/tree/dag_typos
(Assignee)

Comment 5

5 years ago
Next, we should try to apply the same technique with the pointer overflow byte to the second approach and see if we can further reduce the size of the dictionaries.
(Assignee)

Comment 6

5 years ago
We improved the quality of the engine, by doing:
1) insertion; deletion; replacement; and transposition of characters
2) upper/lower case predictions
3) diacritics (umlaut)

Performance:
r          49 ms
re         51 ms
res        25 ms
rest        4 ms
resta       8 ms
restau      8 ms
restaur     5 ms
restaura   54 ms
restauran   7 ms

find the code here:
https://github.com/ckerschb/gaia/tree/tst


I guess, this is ready to merge, I will do a pull request!
(Assignee)

Comment 7

5 years ago
Created attachment 721279 [details] [diff] [review]
new pull request on github
Attachment #721279 - Flags: review+
Attachment #721279 - Flags: review+ → review?(dflanagan)
Sorry the review is taking a while. Its in my queue, but I've got some blocker bugs that are taking up my time.  I look forward to reviewing this soon!

Comment 10

5 years ago
r=me with the nits by gregor and me addressed. David, feel free to request additional changes as a follow-up when you get around to this. Very nice work guys.

Updated

5 years ago
Attachment #721279 - Flags: review?(dflanagan)
Attachment #721279 - Flags: review?
Attachment #721279 - Flags: review+
(Assignee)

Comment 11

5 years ago
updated patch following the suggestions of Gregor and Andreas.
(Assignee)

Comment 12

5 years ago
please find the new pull request here:

https://github.com/mozilla-b2g/gaia/pull/8605
Guys, can you clarify the benefits of this implementation? I did some background reading at http://www.codeproject.com/Articles/32089/Using-Ternary-DAGs-for-Spelling-Correction my understanding is Ternary DAG enables 1) smaller dictionary file sizes and 2) faster word suggestions. 

If so, is the determination of _how_ to surface those recommendations (eg: as a row of buttons across the top of the keyboard, versus as a button hovering near the cursor in the text field) an independent issue?
Whiteboard: u=user c=keyboard s=ux-most-wanted
(Assignee)

Comment 14

5 years ago
both correct, ternary dags allow for 1) smaller dictionaries and 2) faster suggestions.

This implementation solely focuses on the prediction engine, which takes an input word (partial word) and returns an array of suggestions, everything else is an independent issue.
Attachment #721279 - Flags: review? → review+
https://github.com/mozilla-b2g/gaia/commit/a783fe0514709000bc63ae73ea5138707b82348d
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
tracking? -- significant improvements in text prediction
tracking-b2g18: --- → ?
Blocks: 851314
This looks like a feature and we're now past the 3/15 feature complete date - is there a partner or product push for this to be uplifted?
Adding access keyword because this helps accessibility (input impairment).
Keywords: access
(In reply to Lukas Blakk [:lsblakk] from comment #17)
> This looks like a feature and we're now past the 3/15 feature complete date
> - is there a partner or product push for this to be uplifted?

What does this mean now? I understand that we don't want to take it for 1.0.1 but we are shipping future versions based on b2g18. Does this mean we are not allowed to put any code in b2g18 that is not requested by a partner? That doesn't sound right to me.
No partner is (yet) asking for this -- this one is sponsored by Mozilla. UX can probably best speak to how important they think this is for users. needinfo to Josh. 

Ideally I think we'd like to ship with text prediction turned on but we were forced to turn it off by default in 1.0.1 because performance was too slow.
Flags: needinfo?(jcarpenter)
Its definitely too late for 1.0.1.  1.1 could be an option but we are late in the cycle for features too and there's no room in that schedule for chasing regressions and performance work for non-blocking features.  IF we want to land it we should do so now though to enable maximum testing time.
Sorry, I didn't mean to imply 1.0.1 -- I would like to lobby for 1.1, though.
If the UX team is making a case for this in v1.1 go ahead with nomination for uplift.
tracking-b2g18: ? → +
Hello folks. Sorry for the slow response; was on PTO last week. 

(In reply to Lucas Adamski [:ladamski] (plz needinfo) from comment #21)
> Its definitely too late for 1.0.1.  1.1 could be an option but we are late
> in the cycle for features too and there's no room in that schedule for
> chasing regressions and performance work for non-blocking features.  IF we
> want to land it we should do so now though to enable maximum testing time.

In general we're strongly in favor of improvements to keyboard quality. If this improves performance of Word Suggestions to the point that we can turn them on by default, that's quite exciting. Who can I work with to get this loaded onto a build and tested? If the responsiveness if acceptable I'd push strongly for a v1.1 uplift.
Flags: needinfo?(jcarpenter)
Hey folks, I'm working w/ Gordon to test this and will follow up tomorrow (Friday Mar 29).
Depends on: 851565
Hi all, let's uplift this. Improved keyboard performance is one of our highest priorities right now, and this looks like it may enable us to turn suggestions back on, which is huge. Let me know what I can do to nudge the process along. I'm not up to speed on the latest processes for nominations, uplifts, etc.
Comment on attachment 721279 [details] [diff] [review]
new pull request on github

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: Current performance of predictive text is known to be slow and partners are unhappy. We are unable to turn on word suggestions by default so the keyboard experience is compromised
Testing completed: 
Risk to taking this patch (and alternatives if risky): relatively low risk in that we have only changed the way we generate the predictions and have not made any changes to UX. However, we will also need to fix bug 851565 in a follow up.
String or UUID changes made by this patch: none
Attachment #721279 - Flags: approval-gaia-v1?
Blocks: 856521

Comment 28

5 years ago
Comment on attachment 721279 [details] [diff] [review]
new pull request on github

Approving for v1 due to the keyboard performance concerns people are raising.
Attachment #721279 - Flags: approval-gaia-v1? → approval-gaia-v1+

Comment 29

5 years ago
Marking as blocking tef? and requesting uplift.
blocking-b2g: --- → tef?
The pull request doesn't show it being merged to master....  did it land there?  What commit is it on master branch?
Depends on: 857239
(In reply to John Ford [:jhford] from comment #30)
> The pull request doesn't show it being merged to master....  did it land
> there?  What commit is it on master branch?

Hm isn't this master?
https://github.com/mozilla-b2g/gaia/commit/a783fe0514709000bc63ae73ea5138707b82348d
Ahh, didn't see comment 15 and the pull request was closed without the merge taking place there.  You also already cherry-picked this onto v1-train :)

v1-train: 1dfb75d6a6c7478b2d3251ea510769fc4a8a26b9
status-b2g18: --- → fixed
I don't think is the right time to add this to v1.0.1, but happy to hear other opinions.
blocking-b2g: tef? → -
Christoph,

As a very late review comment, I'd like to ask you to document (here, or even better in the code) how the linked list (nPtr) works.

I haven't been able to figure it out by reading your code, and I have serious concerns about our ability to make good predictions because of the way the conversion from TST to DAG combines suffixes and discards word frequency information. I assume that the linked list is intended to work around this problem, but I'd like to understand the workaround before we go any further with the DAG-based dictionaries.

http://www.strchr.com/ternary_dags is about offering suggestions for misspelled words. In this case the input is a complete word that the user is done typing. It is only necessary to find a full list of correctly-spelled words that are close to it. This can be done adequately without word frequency information.

But for word prediction, we're trying to guess what the user wants to type, and I think that doing that requires us to know how common different words are. Since the DAG approach combines common suffixes, how do we store the frequency for two words (like "reading" and "writing") that end with the same suffix?  The word "being" has a frequency of 167, and the word "yacking" has a frequency of 1.  I see code in the dictionary creation script that averages node frequencies. Is that what's happening in this case?  Would being and yacking end up with the same frequency? Or does the linked list allow us to avoid this problem?

For quality auto-correction and word prediction, I don't think we want to compromise on the word frequency data. So if the linked list isn't a fully-adequate workaround, I think we should use a bigger dictionary to get better results. We could switch back to a plain TST (doubling the dictionary size) or compromise and only allow combining of suffixes for words whose frequency is less than some threshold.  (In the english dictionary, there are 12.5K words with a frequency of 100 and above, and 148K words with frequencies 99 and below, so it might not be too big a memory hit to create a complete tree for all of the high-frequency words).
(Assignee)

Comment 35

5 years ago
David,

for illustration purposes I created a XML file based on your example:

<wordlist locale="en_US" description="English (US)" date="1340038693" version="16">
<w f="167" flags="">being</w>
<w f="2" flags="">yacking</w>
</wordlist>

Please note, that I updated the frequency of yacking from 1 to 2 because we skip words with a frequency less or equal to 1.

I compiled the XML to a dict file, using:
python3 xml2dict.py dictionaries/test.xml -v -o test.dict

Using the -v option allows us to inspect the generated TST (including the DAG).

[1] { ch: b, L: 0, C: 2, R: 3, N: 3, h: 0, f: 167}
[2] { ch: e, L: 0, C: 4, R: 0, N: 0, h: 0, f: 167}
[3] { ch: y, L: 0, C: 5, R: 0, N: 0, h: 0, f: 2}
[4] { ch: i, L: 0, C: 6, R: 0, N: 0, h: 0, f: 84}
[5] { ch: a, L: 0, C: 7, R: 0, N: 0, h: 0, f: 2}
[6] { ch: n, L: 0, C: 8, R: 0, N: 0, h: 0, f: 84}
[7] { ch: c, L: 0, C: 9, R: 0, N: 0, h: 0, f: 2}
[8] { ch: g, L: 0, C: 10, R: 0, N: 0, h: 0, f: 84}
[9] { ch: k, L: 0, C: 4, R: 0, N: 0, h: 0, f: 2}
[10] { ch: $, L: 0, C: 0, R: 0, N: 0, h: 0, f: 84}

You are right, we store the average for frequencies in common suffix nodes (e.g. node [4]).

Loading the dict file and applying the following diff shows what happens behind the scenes:

diff --git a/apps/keyboard/js/imes/latin/predictions.js b/apps/keyboard/js/imes/latin/predictions.js
index c4adf30..cc66e84 100644
--- a/apps/keyboard/js/imes/latin/predictions.js
+++ b/apps/keyboard/js/imes/latin/predictions.js
@@ -277,6 +277,7 @@ var Predictions = function() {
       }
       // Record the suggestion and move to the next best candidate
       if (!(prefix in _suggestions_index)) {
+        dump("cand: " + cand.prefix + ", sugg: " + prefix + ", cand.freq: " + cand.freq + ", mult: " + cand.multiplier + "\n");
         _suggestions.push(prefix);
         _suggestions_index[prefix] = true;
       }

The user taps 'b':
  cand: b, sugg: being, cand.freq: 668, mult: 4
  cand: , sugg: yacking, cand.freq: 2, mult: 1

The user taps 'y':
  cand: , sugg: being, cand.freq: 167, mult: 1
  cand: y, sugg: yacking, cand.freq: 8, mult: 4

We see that both, 'being' for 'b' and 'yacking' for 'y' are suggested displaying the right frequency.

For longer input words, the user taps 'bei':
  cand: bei, sugg: being, cand.freq: 168, mult: 2

You were again right with your assumption that we now use the average frequency in the node.
In my opinion, this doesn't really matter because we have already narrowed the search space to 'bei'.

Anyway, one of the big issues we had with our previous engine was the dictionary size and lookup speed.
We have optimized the file size of the dictionary as well as the lookup speed using TDAGs.

In my opinion we are providing accurate predictions even with averaging the suffix frequencies in the node.

I guess it's not my decision, Andreas?
Thanks for working through the example. I wish I'd proposed something more realistic!

I'm not so worried about the frequency of "being" being under-estimated when the user types "bei" as I am about the frequency of "yacking" being inflated when the user types "yacki".  If "yackic" was a real word with a medium frequency of 50 or something, wouldn't the low-frequency "yacking" be suggested first?

Again, I wish I had a realistic example. http://www.strchr.com/ternary_dags mentions "head" and "read" as words that share a suffix (and I'd add "bead", "dead","lead", and "mead"). The fact that these have common suffixes so close to their stem means that their frequency will be indistinguishable after the user has typed the e.  "lead" is common and "mead" is uncommon. But l and m are close on the keyboard and one could be mis-typed for the other.  Once we start looking at the actual x,y coordinates of the user's touch, consider what happens if the user types right in between the l and the m and then types e and a (and hits those two key right in the center so they are unambiguous). If we don't have frequency information to distinguish the common word from the uncommon one we can't make a meaningful prediction here.

I'm not sure that exaample is one that would make a practical difference, so I'm sort of playing devil's advocate here, but I am still concerned about this
(Assignee)

Comment 37

5 years ago
David,

I am totally with you. There is a realistic chance that this algorithm overestimates the actual frequency of a candidate. On the other hand, we return 3 suggestions total, which means we would still display 2 other suggestions in such a case. Having the same overestimation problem on all of the 3 suggestions is rather unlikely, isn't it?

As already mentioned, the dictionary size was a bottleneck in the previous implementation of the engine. Therefore I applied the following patch and compiled all dictionaries again to see what difference it would make if we only compress suffix-nodes with the same frequency (which would avoid averaging the frequency in compressed nodes).

diff --git a/apps/keyboard/js/imes/latin/dictionaries/xml2dict.py b/apps/keyboard/js/imes/latin/dictionaries/xml2dict.py
index e7ba3e1..04d0243 100644
--- a/apps/keyboard/js/imes/latin/dictionaries/xml2dict.py
+++ b/apps/keyboard/js/imes/latin/dictionaries/xml2dict.py
@@ -218,6 +218,7 @@ class TSTTree:
         # children are equal
         mapping[nodeA] = nodeB
         return mapping if nodeA.ch == nodeB.ch and \
+               nodeA.frequency == nodeB.frequency and \
                self.equal(nodeA.left,   nodeB.left, mapping) and \
                self.equal(nodeA.center, nodeB.center, mapping) and \
                self.equal(nodeA.right,  nodeB.right, mapping) else False

This minor fix compresses suffix nodes only if the frequency of every node matches in addition to the character, which is reflected in the bigger file size of the dictionaries:

de.dict    3.4MB VS 11.9MB
en_us.dict 1.8MB VS  5.5MB
es.dict    1.7MB VS  5.7MB
fr.dict    2.8MB VS  8.9MB
pt_br.dict 2.0MB VS  6.4MB

As we can see, compressing suffix nodes only if the character and the frequency in the node match increases the file size dramatically.

For the en_us.dict for example, we create:
  (474298 - 79632 = 394666 nodes)

in comparsion when we would average the frequency in compressed nodes:
  (474298 - 345723 = 128575 nodes)

This means, our algorithm compresses 345k nodes if only the character in the node matches whereas we can only compress 79k nodes if the character and the frequency in the nodes matches.

I guess if we really want to get more accuracy here, it's a better idea to switch algorithms again, because such big dictionaries are not an option what I remember.

Updated

5 years ago
Blocks: 797170

Updated

5 years ago
Whiteboard: u=user c=keyboard s=ux-most-wanted → ux-tracking

Updated

5 years ago
Blocks: 873934

Comment 38

5 years ago
Predictive Text keyboard feature has been implemented and is functional on the current master build. 
Tested on b2g/nightly/mozilla-central-unagi/latest.

Kernel Date: Dec 5
Gecko: http://hg.mozilla.org/mozilla-central/rev/3c6f2394995d
Gaia: d2ad0f1a5a6be8a2d6df721ba3c2b33e037c423b
Status: RESOLVED → VERIFIED
Please add a testcase for this bug to moztrap for 1.1 testsuite.  If yes, mark this in-moztrap+ when completed.  If not, mark this in-moztrap-.
Flags: in-moztrap?(cschmoeckel)

Comment 40

5 years ago
Added Keyboard Suite Test Case #8501 - [Keyboard] Auto Correct's text prediction works correctly in the English Language & TC #8502 [Keyboard] Text prediction text entry in English keeps the capitalization entered by the user
Flags: in-moztrap?(cschmoeckel) → in-moztrap+
You need to log in before you can comment on or make changes to this bug.