Closed Bug 857850 Opened 12 years ago Closed 12 years ago

[keyboard] predictions aren't ranked correctly

Tracking

(Not tracked)

Status:

RESOLVED INVALID

People

(Reporter: djf, Unassigned)

References

Details

David Flanagan [:djf]

Reporter

Description

•

12 years ago

No description provided.

David Flanagan [:djf]

Reporter

Comment 1

•

12 years ago

I added the following patch to predictions.js to help me understand what the prediction engine was doing:

diff --git a/apps/keyboard/js/imes/latin/predictions.js b/apps/keyboard/js/imes/
index c4adf30..515eeb4 100644
--- a/apps/keyboard/js/imes/latin/predictions.js
+++ b/apps/keyboard/js/imes/latin/predictions.js
@@ -277,6 +277,10 @@ var Predictions = function() {
       }
       // Record the suggestion and move to the next best candidate
       if (!(prefix in _suggestions_index)) {
+        log("candidate: " + cand.prefix +
+            " suggestion: " + prefix +
+            " frequency: " + node.freq +
+            " multiplier: " + cand.multiplier);
         _suggestions.push(prefix);
         _suggestions_index[prefix] = true;
       }

When I typed 'r', I got this output:

E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: r suggestion: released frequency: 47 multiplier: 4
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: re suggestion: received frequency: 47 multiplier: 4
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: rec suggestion: record frequency: 154 multiplier: 4

Notice that the third candidate has a much higher frequency than the first two. Also, after considering 'r' itself and picking 'released' as the best match, it then uses 're' as the candidate, picks 'received', and then uses 'rec' as the candidate and suggests 'record'. It doesn't seem to consider words beginning with 'ra', 'ri', etc.

As another example, if I type 'te', I get this output:

E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: te suggestion: team frequency: 164 multiplier: 2.5
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: te suggestion: television frequency: 88 multiplier: 2.5
E/GeckoConsole( 7019): Content JS LOG at app://keyboard.gaiamobile.org/js/imes/latin/latin.js:175 in anonymous: candidate: te suggestion: term frequency: 153 multiplier: 2.5

The second candidate has a much lower frequency than the third candidate.

David Flanagan [:djf]

Reporter

Updated

•

12 years ago

Blocks: 797170

Christoph Kerschbaumer [:ckerschb, back Sept 8th]

Comment 2

•

12 years ago

You just printed the wrong freq: the one stored in the candidate is the right one, not the one in the node. Try applying this diff.

diff --git a/apps/keyboard/js/imes/latin/predictions.js b/apps/keyboard/js/imes/latin/predictions.js
index c4adf30..cc66e84 100644
--- a/apps/keyboard/js/imes/latin/predictions.js
+++ b/apps/keyboard/js/imes/latin/predictions.js
@@ -277,6 +277,7 @@ var Predictions = function() {
       }
       // Record the suggestion and move to the next best candidate
       if (!(prefix in _suggestions_index)) {
+        dump("cand: " + cand.prefix + ", sugg: " + prefix + ", cand.freq: " + cand.freq + ", mult: " + cand.multiplier + "\n");
         _suggestions.push(prefix);
         _suggestions_index[prefix] = true;
       }

Tapping 'r' returns this:

cand: r, sugg: released, node.freq: 648, mult: 4
cand: re, sugg: received, node.freq: 628, mult: 4
cand: rec, sugg: record, node.freq: 616, mult: 4

which makes sense because, e.g., the frequency of 'released' is 162 * 4 = 648.

Nevertheless, I agree we should use multipliers in the range of 1.1 to 1.4. That still favors matched prefixes, but leaves room for alternative suggestions to be ranked higher, while staying in the range less than 255.
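To make the arithmetic above concrete, here is a minimal sketch of ranking by weighted frequency (word frequency times multiplier). The helper name rankCandidates and the object shape are hypothetical and are not taken from predictions.js; the frequencies 162, 157 and 154 are simply the logged values divided by the multiplier of 4.

```javascript
// Hypothetical helper, not part of predictions.js: rank candidates by
// weighted frequency, i.e. the word's frequency times its multiplier.
function rankCandidates(candidates) {
  return candidates
    .map(function (c) {
      return { word: c.word, weighted: c.freq * c.multiplier };
    })
    .sort(function (a, b) {
      return b.weighted - a.weighted; // highest weighted frequency first
    });
}

// Frequencies derived from the output above (648 / 4, 628 / 4, 616 / 4).
var ranked = rankCandidates([
  { word: 'record', freq: 154, multiplier: 4 },
  { word: 'released', freq: 162, multiplier: 4 },
  { word: 'received', freq: 157, multiplier: 4 }
]);

ranked.forEach(function (r) {
  console.log(r.word + ': ' + r.weighted);
});
// released: 648
// received: 628
// record: 616
```

With the same multiplier applied to every candidate the order is just the order of the raw frequencies; the multiplier only matters when candidates carry different multipliers.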

David Flanagan [:djf]

Reporter

Comment 3

•

12 years ago

It's hard to believe that "released", "received" and "record" are the three most common words that start with r in English, but that is what the dictionary says. I wonder what sort of corpus Google was using when compiling those? Sounds like technical or business language. So I guess that for any given node in the tree, the frequency is the frequency of the most common word underneath that node? I need to pass this frequency back to latin.js, so I'll change my code to use cand.freq instead of node.freq.
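A sketch of the guess above, under the assumption that each node in a prefix tree caches the highest frequency of any word beneath it. This only illustrates the idea; the names makeNode, insertWord and prefixFreq are hypothetical, and the actual dictionary layout used by predictions.js may differ. The frequencies are the ones logged for 'te' in comment 1.

```javascript
// Hypothetical prefix tree where every node stores the frequency of the
// most common word at or below it. Not the real predictions.js structure.
function makeNode() {
  return { children: {}, freq: 0, isWord: false };
}

function insertWord(root, word, freq) {
  var node = root;
  node.freq = Math.max(node.freq, freq);
  for (var i = 0; i < word.length; i++) {
    var ch = word.charAt(i);
    if (!node.children[ch]) {
      node.children[ch] = makeNode();
    }
    node = node.children[ch];
    // Propagate the best frequency seen so far along the path.
    node.freq = Math.max(node.freq, freq);
  }
  node.isWord = true;
}

// Returns the cached frequency for a prefix, or 0 if no word has it.
function prefixFreq(root, prefix) {
  var node = root;
  for (var i = 0; i < prefix.length; i++) {
    node = node.children[prefix.charAt(i)];
    if (!node) {
      return 0;
    }
  }
  return node.freq;
}

var root = makeNode();
insertWord(root, 'team', 164);
insertWord(root, 'television', 88);
insertWord(root, 'term', 153);

console.log(prefixFreq(root, 'te'));  // 164
console.log(prefixFreq(root, 'tel')); // 88
console.log(prefixFreq(root, 'ter')); // 153
```

Caching the maximum at each node would let a search decide which branch to follow without visiting every word under it.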

Christoph Kerschbaumer [:ckerschb, back Sept 8th]

Comment 4

•

12 years ago

David, can we close this bug?

David Flanagan [:djf]

Reporter

Updated

•

12 years ago

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → INVALID

Leo

Updated

•

12 years ago

Blocks: 873934

Categories

(Firefox OS Graveyard :: Gaia::Keyboard, defect)
