Closed Bug 857850 Opened 11 years ago Closed 11 years ago

[keyboard] predictions aren't ranked correctly

Categories

(Firefox OS Graveyard :: Gaia::Keyboard, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: djf, Unassigned)

I added the following patch to predictions.js to help me understand what the prediction engine was doing:

diff --git a/apps/keyboard/js/imes/latin/predictions.js b/apps/keyboard/js/imes/
index c4adf30..515eeb4 100644
--- a/apps/keyboard/js/imes/latin/predictions.js
+++ b/apps/keyboard/js/imes/latin/predictions.js
@@ -277,6 +277,10 @@ var Predictions = function() {
       }
       // Record the suggestion and move to the next best candidate
       if (!(prefix in _suggestions_index)) {
+        log("candidate: " + cand.prefix +
+            " suggestion: " + prefix +
+            " frequency: " + node.freq +
+            " multiplier: " + cand.multiplier);
         _suggestions.push(prefix);
         _suggestions_index[prefix] = true;
       }

When I typed 'r', I got this output:

E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: r suggestion: released frequency: 47 multiplier: 4
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: re suggestion: received frequency: 47 multiplier: 4
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: rec suggestion: record frequency: 154 multiplier: 4

Notice that the third candidate has a much higher frequency than the first two.

Also, after considering 'r' itself and picking 'released' as the best match, it then uses 're' as the candidate, picks 'received', and then uses 'rec' as the candidate and suggests 'record'. It doesn't seem to consider words beginning with 'ra', 'ri', etc.

As another example, if I type 'te', I get this output:

E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: te suggestion: team frequency: 164 multiplier: 2.5
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: te suggestion: television frequency: 88 multiplier: 2.5
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: te suggestion: term frequency: 153 multiplier: 2.5

The second candidate has a much lower frequency than the third candidate.
Blocks: 797170
You just printed the wrong freq: the one stored in the candidate is the right one, not the one in the node. Try applying this diff.

diff --git a/apps/keyboard/js/imes/latin/predictions.js b/apps/keyboard/js/imes/latin/predictions.js
index c4adf30..cc66e84 100644
--- a/apps/keyboard/js/imes/latin/predictions.js
+++ b/apps/keyboard/js/imes/latin/predictions.js
@@ -277,6 +277,7 @@ var Predictions = function() {
       }
       // Record the suggestion and move to the next best candidate
       if (!(prefix in _suggestions_index)) {
+        dump("cand: " + cand.prefix + ", sugg: " + prefix + ", cand.freq: " + cand.freq + ", mult: " + cand.multiplier + "\n");
         _suggestions.push(prefix);
         _suggestions_index[prefix] = true;
       }

Tapping 'r' returns this:

cand: r, sugg: released, cand.freq: 648, mult: 4
cand: re, sugg: received, cand.freq: 628, mult: 4
cand: rec, sugg: record, cand.freq: 616, mult: 4

which makes sense because, e.g., for 'released' the frequency is 162, and 162 * 4 = 648.
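
The arithmetic behind those numbers can be sketched roughly as follows. This is only an illustration, not the code in predictions.js: `weightedFreq`, `rank`, and the `words` array are hypothetical names, and the base frequencies for 'received' (157) and 'record' (154) are inferred by dividing the logged values by the multiplier of 4.

```javascript
// Hypothetical sketch: the frequency stored in a candidate is assumed to be
// the dictionary frequency scaled by the candidate's multiplier.
function weightedFreq(freq, multiplier) {
  return freq * multiplier;
}

// 162 is the frequency quoted for 'released'; 157 and 154 are inferred
// from the logged values (628 / 4 and 616 / 4).
var words = [
  { word: 'record', freq: 154, multiplier: 4 },
  { word: 'released', freq: 162, multiplier: 4 },
  { word: 'received', freq: 157, multiplier: 4 }
];

// Sort by weighted frequency, highest first.
function rank(list) {
  return list
    .map(function(w) {
      return { word: w.word, score: weightedFreq(w.freq, w.multiplier) };
    })
    .sort(function(a, b) { return b.score - a.score; });
}

var ranked = rank(words);
ranked.forEach(function(r) {
  console.log(r.word + ': ' + r.score);
});
```

Under these assumptions the order comes out as released (648), received (628), record (616), which matches the order of the logged suggestions.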

Nevertheless, I agree we should use multipliers in the range of 1.1 to 1.4, which on the one hand favors matched prefixes but on the other hand leaves room for alternative suggestions to be ranked higher, while still staying in the range below 255.
It's hard to believe that "released", "received", and "record" are the three most common words that start with r in English, but that is what the dictionary says. I wonder what sort of corpus Google was using when compiling those? Sounds like technical or business language.
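
To illustrate why a smaller multiplier leaves room for alternatives, here is a rough sketch. The two candidates and their frequencies are made up for the example, and `best` is a hypothetical helper, not part of predictions.js.

```javascript
// Hypothetical data: 'matched' continues the typed prefix exactly and gets
// the prefix multiplier; 'alternative' gets no boost but is more frequent.
function best(prefixMultiplier) {
  var candidates = [
    { name: 'matched', freq: 100, multiplier: prefixMultiplier },
    { name: 'alternative', freq: 130, multiplier: 1 }
  ];
  // Highest weighted frequency first.
  candidates.sort(function(a, b) {
    return b.freq * b.multiplier - a.freq * a.multiplier;
  });
  return candidates[0].name;
}

console.log(best(4));   // matched: 400 beats 130
console.log(best(1.2)); // alternative: 120 loses to 130
```

With a multiplier of 4 the matched prefix wins no matter how common the alternative is within this range; with 1.2 a sufficiently frequent alternative can still come out on top.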

So I guess that for any given node in the tree, the frequency is the frequency of the most common word underneath that node?  I need to pass this frequency back to latin.js, so I'll change my code to use cand.freq instead of node.freq.
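
If that reading is right, the structure would look something like the sketch below. This is a guess at the idea, not the actual dictionary format: `makeNode`, `insert`, and `prefixFreq` are hypothetical helpers, and the frequencies simply reuse the values logged for 'te' above as sample data.

```javascript
// Hypothetical prefix tree: each node's freq is assumed to be the highest
// frequency of any word stored at or below that node.
function makeNode() {
  return { freq: 0, children: {} };
}

function insert(root, word, freq) {
  var node = root;
  node.freq = Math.max(node.freq, freq);
  for (var i = 0; i < word.length; i++) {
    var ch = word[i];
    if (!node.children[ch]) {
      node.children[ch] = makeNode();
    }
    node = node.children[ch];
    // Propagate the best frequency seen so far along the path.
    node.freq = Math.max(node.freq, freq);
  }
}

// Returns the best frequency beneath a prefix, or 0 if the prefix is absent.
function prefixFreq(root, prefix) {
  var node = root;
  for (var i = 0; i < prefix.length; i++) {
    node = node.children[prefix[i]];
    if (!node) {
      return 0;
    }
  }
  return node.freq;
}

var root = makeNode();
insert(root, 'team', 164);
insert(root, 'television', 88);
insert(root, 'term', 153);

console.log(prefixFreq(root, 'te'));  // 164
console.log(prefixFreq(root, 'tel')); // 88
console.log(prefixFreq(root, 'ter')); // 153
```

In this sketch the node for 'te' reports 164 because 'team' is the most frequent word beneath it, while the nodes for 'tel' and 'ter' report the lower values of their own subtrees.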
David, can we close this bug?
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID
Blocks: 873934