Closed Bug 902120 Opened 11 years ago Closed 11 years ago

Wordlist files and dictionary files updated

Categories

(Firefox OS Graveyard :: Gaia::Keyboard, defect)

Platform: ARM, Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: luigitedone, Assigned: luigitedone)

Details

Attachments

(1 file)

Some wordlists are missing or are outdated.
Summary: Added and updated wordlist files → Add and update wordlist files
Attached file Proposed patch
Attachment #786462 - Flags: review?(dflanagan)
Comment on attachment 786462 [details]
Proposed patch

><!DOCTYPE html>
><meta charset="utf-8">
><meta http-equiv="refresh" content="1;https://github.com/mozilla-b2g/gaia/pull/11386/">
><title>Bugzilla Code Review</title>
><p>Redirecting to <a href="https://github.com/mozilla-b2g/gaia/pull/11386/">» pull request on github</a></p>
Attachment #786462 - Attachment mime type: text/plain → text/html
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Comment on attachment 786462 [details]
Proposed patch

Luigi,

Thanks for this. I'm giving r- because it is a duplicate of bug 884752. The reason I haven't landed a fix for that bug is that we want to make it configurable, so that vendors can choose which dictionaries to ship with a build.  If we put all the dictionaries into the build, we worry that it will reduce the space available for apps.
Attachment #786462 - Flags: review?(dflanagan) → review-
Hi David,
I'm sorry for this mistake; I didn't find the existing bug. I wrote this patch in order to implement new keyboard layouts, each with a corresponding dictionary for the autocomplete function.
Assignee: nobody → luigitedone
I'm reopening this bug because I noticed that Andreas' patch has older wordlist files than mine. For example, in my patch the version number of the en_US wordlist is 31, while in Andreas' patch it is 16. Can you confirm this?
Status: RESOLVED → REOPENED
Ever confirmed: true
Flags: needinfo?(timdream)
Flags: needinfo?(rlu)
Flags: needinfo?(dflanagan)
Resolution: DUPLICATE → ---
I have no idea...
Flags: needinfo?(timdream)
We also pinged David with a related question in bug 884752, comment 84.
Flags: needinfo?(rlu)
Rudy: I responded in bug 884752. Updating the word lists doesn't need to be a blocker, but is probably worth doing.

Luigi: thanks for remembering this. The keyboard/dictionaries/README file has the link to the Android sources. Note that the Android wordlists have been updated again since Andreas' original patch, and the file format has changed from XML to a nested CSV format. So to update the wordlists to the very latest version, you'll need to write a script that converts the new format to the old XML format, because the wordlists we get from Kevin Scannell are in XML.
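A conversion script of the kind described above could look roughly like the minimal Python sketch below. It rests on two assumptions that should be checked against the real files and against xml2dict.py before use: that the Android combined wordlist has an unindented header line, top-level entries such as ` word=the,f=222,flags=,originalFreq=222` indented by one space, and nested shortcut/bigram lines indented by two spaces; and that the target XML uses a `<wordlist>` root with `<w f="…">word</w>` entries. The function names `parse_combined` and `to_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, Tuple


def parse_combined(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (word, frequency) pairs from an Android-style combined wordlist.

    Assumed layout (not verified against the Android sources): an
    unindented header line, top-level entries indented by one space
    (" word=the,f=222,..."), and nested shortcut/bigram lines indented
    by two spaces. Only the top-level word entries are kept.
    """
    for line in lines:
        if not line.startswith(" word="):
            continue  # skips the header, nested lines and blank lines
        # Naive split: a word that itself contains a comma would break this.
        fields = dict(
            part.split("=", 1)
            for part in line.strip().split(",")
            if "=" in part
        )
        yield fields["word"], int(fields["f"])


def to_xml(entries: Iterable[Tuple[str, int]]) -> str:
    """Serialize entries as <wordlist><w f="...">word</w>...</wordlist>.

    ElementTree escapes characters such as & and < in the word text.
    """
    root = ET.Element("wordlist")
    for word, freq in entries:
        entry = ET.SubElement(root, "w", f=str(freq))
        entry.text = word
    return ET.tostring(root, encoding="unicode")
```

Used on an open file, `to_xml(parse_combined(f))` returns the whole XML document as one string; frequency-0 entries are passed through unchanged, so any profanity markers in the source list survive the conversion.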
Flags: needinfo?(dflanagan)
Summary: Add and update wordlist files → Wordlist files and dictionary files updated
Hi David, I updated my patch to match the current keyboard layouts. I took the latest wordlist files from the Android repository, converted them to XML, and then generated the .dict files.
Attachment #786462 - Flags: review- → review?(janjongboom)
Could you also check in the CSV->XML converter? It would come in handy.
Comment on attachment 786462 [details]
Proposed patch

Hi Luigi, thanks for the patch.

1. Running |make| fails with `Makefile:33: *** missing separator.  Stop.` This looks like a missing hard tab.
2. It's no longer possible to write profane words: typing `fuck` gets autocorrected to `duck`. The profanity tests are also failing (run |worker_test.js|).
Attachment #786462 - Flags: review?(janjongboom) → review-
Hi Jan, the first problem is easy to fix. Regarding the second one, I found this comment in gaia/keyboard/dictionaries/xml2dict.py:

    <!-- This is a sample wordlist that can be converted to a binary
         dictionary for use by the Latin IME. The format of the word
         list is a flat list of word entries. Each entry has a frequency
         between 255 and 0. Highest frequency words get more weight in
         the prediction algorithm. As a special case, a weight of 0 is
         taken to mean profanity - words that should not be considered a
         typo, but that should never be suggested explicitly. You can
         capitalize words that must always be capitalized, such as
         "January". You can have a capitalized and a non-capitalized
         word as separate entries, such as "robin" and "Robin". -->

So should we suggest profane words? If not, the tests are wrong.
Flags: needinfo?(janjongboom)
We don't suggest profane words (we don't suggest |fuck| if you type |fuc|), but if you type the whole word (|fuck|), it should never be replaced by a different word (in this case, |duck| gets selected). With your patch it does get replaced; on master it doesn't.

So "but that should never be suggested explicitly" just means that we never offer these words as suggestions. The words should still be in the dictionary.
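The rule described above (never offer a frequency-0 word as a completion, but never replace it once it has been typed in full) can be illustrated with the small Python sketch below. It is not Gaia's actual prediction code; `suggest`, `autocorrect`, the `corrector` callback, and the stand-in word "darn" (playing the role of a frequency-0 profane entry) are all hypothetical.

```python
from typing import Callable, Dict, List


def suggest(prefix: str, wordlist: Dict[str, int], limit: int = 3) -> List[str]:
    """Return up to `limit` completions of `prefix`, highest frequency first.

    Words with frequency 0 (profanity) are never offered as suggestions.
    """
    candidates = [
        (word, freq)
        for word, freq in wordlist.items()
        if word.startswith(prefix) and freq > 0
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in candidates[:limit]]


def autocorrect(typed: str, wordlist: Dict[str, int],
                corrector: Callable[[str], str]) -> str:
    """Keep any exact dictionary word, including frequency-0 ones.

    Only input that is not in the wordlist is handed to the corrector.
    """
    if typed in wordlist:
        return typed
    return corrector(typed)


words = {"duck": 200, "ducks": 150, "darn": 0}
print(suggest("d", words))                           # ['duck', 'ducks']
print(autocorrect("darn", words, lambda t: "duck"))  # darn
```

The key point is that the frequency-0 entry stays in the dictionary: removing it would make the exact-match check fail, and the corrector would then replace the typed word, which is the `duck` regression reported above.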
Flags: needinfo?(janjongboom) → needinfo?(luigitedone)
Comment on attachment 786462 [details]
Proposed patch

Hi Jan, I updated the patch. It should be OK now.
Attachment #786462 - Flags: review- → review?(janjongboom)
Patch updated with the latest dictionaries and wordlist files.
Flags: needinfo?(luigitedone)
Blocks: 900626
Comment on attachment 786462 [details]
Proposed patch

r=me. Autosuggest still works, profanity still works, all tests pass, makefile works. So lgtm!
Attachment #786462 - Flags: review?(janjongboom) → review+
Landed in https://github.com/mozilla-b2g/gaia/commit/84e58eb4d0a22a10dabfce9d6c28f60b5e50339e
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED