Import custom words from Chrome into spellchecker

RESOLVED FIXED in Firefox 59

Status


enhancement
P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: mlissner, Assigned: ananuti)

Tracking

Trunk
mozilla59

Firefox Tracking Flags

(firefox59 fixed)

Details

Attachments

(1 attachment, 3 obsolete attachments)

Chrome's wordlist is here: https://cs.chromium.org/codesearch/f/chromium/src/third_party/hunspell_dictionaries/en_US.dic_delta?cl=a9bac57ce6c9d390a52ebaad3259f5fdb871210e

And they have a second list with additional words here: https://cs.chromium.org/codesearch/f/chromium/src/third_party/hunspell_dictionaries/en_US.dic?cl=a9bac57ce6c9d390a52ebaad3259f5fdb871210e

The first list is essentially worse than the list in Firefox. I diffed it, skimmed through the first 1000 lines or so, and didn't see any words that weren't already in FF.

The second list is more useful. If I figure out which of those words are useful additions, do we have any copyright concerns about pulling these in? 

It looks like these are available under the MPL: https://cs.chromium.org/chromium/src/third_party/hunspell_dictionaries/COPYING?dr
Seems like this should block 499593, but I don't know how to flag it as such.
Ekanan, you've been merging words into the en-US dictionary. Maybe you're the right person to make a decision on this.

(In reply to mlissner from comment #1)
> Seems like this should block 499593, but I don't know how to flag it as such.

That's for adding specific words to the en-US dictionary. I don't think a mass import like this fits there.
Flags: needinfo?(ananuti)
(In reply to mlissner from comment #0)
> Chrome's wordlist is here:
> https://cs.chromium.org/codesearch/f/chromium/src/third_party/hunspell_dictionaries/en_US.dic_delta?cl=a9bac57ce6c9d390a52ebaad3259f5fdb871210e
> 
> And they have a second list with additional words here:
> https://cs.chromium.org/codesearch/f/chromium/src/third_party/hunspell_dictionaries/en_US.dic?cl=a9bac57ce6c9d390a52ebaad3259f5fdb871210e
> 
> The first list is essentially worse than the list in Firefox. I diffed it,
> skimmed through the first 1000 lines or so, and didn't see any words that
> weren't already in FF.

I've just had a quick look at Chrome. This delta dic is like our 5-mozilla-added file [a]: it's a diff of their modifications (to the binary dictionary) on top of upstream. We can use it to see which words we can take.

> The second list is more useful. If I figure out which of those words are
> useful additions, do we have any copyright concerns about pulling these in? 
> 
> It looks like these are available under the MPL:
> https://cs.chromium.org/chromium/src/third_party/hunspell_dictionaries/COPYING?dr

The second list is not useful. It's the same as upstream version 2017.01.22 [b].

Firefox's dictionary is based on the more recent upstream version, 2017.08.24.

[a] https://hg.mozilla.org/mozilla-central/file/tip/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/5-mozilla-added 
[b] https://sourceforge.net/projects/wordlist/files/speller/2017.01.22/hunspell-en_US-2017.01.22.zip/download
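
For anyone repeating the comparison described above (finding which entries in Chrome's delta file are not yet in Firefox's dictionary), here is a minimal sketch. It assumes both files use the usual hunspell .dic layout of one `stem/FLAGS` entry per line with an optional entry count on the first line, and it compares stems only, so it ignores differences in affix flags and does not handle escaped slashes or multi-word entries. The function names and sample data are made up for illustration.

```python
def parse_stems(dic_text):
    """Collect the stems from hunspell .dic-style text (one entry per line)."""
    stems = set()
    for index, raw in enumerate(dic_text.splitlines()):
        line = raw.strip()
        if not line:
            continue
        if index == 0 and line.isdigit():
            continue  # the first line of a .dic file is usually an entry count
        entry = line.split()[0]            # ignore trailing morphological fields
        stems.add(entry.split("/", 1)[0])  # drop the affix flags after "/"
    return stems


def missing_stems(candidate_text, base_text):
    """Stems present in candidate_text but absent from base_text, sorted."""
    return sorted(parse_stems(candidate_text) - parse_stems(base_text))


# Made-up sample data, not the real file contents.
chrome_delta = "auteur/MS\nhexane/MS\nproboscis/MS\n"
firefox_dic = "2\nproboscis/MS\nprofuseness/M\n"
print(missing_stems(chrome_delta, firefox_dic))
```

The output is only a candidate list. Each word would still need a manual check, since a stem missing from one file may be covered by an inflected or differently flagged entry in the other.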
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(ananuti)
Posted file Chrome_modifications.txt (obsolete) —
These words are marked as misspelled.
(In reply to Ekanan Ketunuti from comment #4)
> Created attachment 8935326 [details]
> Chrome_modifications.txt
> 
> these words are marked as misspelled.

Hi Kevin, do you know what any of these affix flags are for?

licence//101
licences/14
licencing/14
Flags: needinfo?(kevin.bugzilla)
Assignee: nobody → ananuti
Posted file Chrome_modifications.txt (obsolete) —
rm false positives
Attachment #8935326 - Attachment is obsolete: true
Posted patch bug1423678.patch (obsolete)
I only took words rated 3-4 stars by the Aspell checker, and checked them against the Oxford AmEng dictionary. Here are the results.
Flags: needinfo?(kevin.bugzilla)
Attachment #8935643 - Flags: review?(bugs)
Priority: -- → P1
Comment on attachment 8935643 [details] [diff] [review]
bug1423678.patch


> Letizia/M
>+Lett/M

Strike? Common misspelling.

> proboscis/MS
>+proc

Strike? Common misspelling.

> profuseness/M
>+prog

Strike? Common misspelling.

r+
Attachment #8935643 - Flags: review?(bugs) → review+
Attachment #8935342 - Attachment is obsolete: true
Attachment #8935643 - Attachment is obsolete: true
Attachment #8942476 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/d350d4735de6
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
You added Muhammad/M which was already in the file:
https://hg.mozilla.org/mozilla-central/rev/d350d4735de6#l1.76

You added +auteur/MS when you already had auteur's:
https://hg.mozilla.org/mozilla-central/rev/d350d4735de6#l1.186

These don't look right either:
 hexane's
+hexane/MS
 hexanes

 neuroscience's
+neuroscience/MS
 neurosciences

 polypeptide's
+polypeptide/MS
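
The kinds of redundancy listed above could be caught before landing with a small check like the sketch below. It assumes single-character affix flags and that, as in the en-US affix file, `M` generates the possessive `'s` form and `S` generates the plural. Only the plain `+s` plural is checked, so `y` to `ies` and `+es` forms are not covered. The function name and the labels it returns are hypothetical.

```python
from collections import Counter


def find_redundant(lines):
    """Report duplicate stems and explicit forms already generated by M or S."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if line:
            stem, _, flags = line.partition("/")
            entries.append((stem, flags))

    counts = Counter(stem for stem, _ in entries)
    problems = []

    # The same stem listed more than once (e.g. Muhammad/M added twice).
    for stem, count in counts.items():
        if count > 1:
            problems.append(("duplicate", stem))

    # Explicit possessive or plural entries that a flagged stem already covers.
    for stem, flags in entries:
        if "M" in flags and stem + "'s" in counts:
            problems.append(("covered_by_M", stem + "'s"))
        if "S" in flags and stem + "s" in counts:
            problems.append(("covered_by_S", stem + "s"))
    return problems


sample = ["Muhammad/M", "Muhammad/M", "hexane's", "hexane/MS", "hexanes", "polypeptide/MS"]
for kind, word in find_redundant(sample):
    print(kind, word)
```

A hit from this check is only a prompt to look at the entry. Whether to drop the explicit form or the new flagged stem is still a judgment call for the reviewer.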
Flags: needinfo?(bugs)
Flags: needinfo?(ananuti)
Depends on: 1430714
(In reply to Jorg K (GMT+1) from comment #12)
> You added Muhammad/M which was already in the file:
> https://hg.mozilla.org/mozilla-central/rev/d350d4735de6#l1.76
> 
> You added +auteur/MS when you already had auteur's:
> https://hg.mozilla.org/mozilla-central/rev/d350d4735de6#l1.186
> 
> These don't look right either:
>  hexane's
> +hexane/MS
>  hexanes
> 
>  neuroscience's
> +neuroscience/MS
>  neurosciences
> 
>  polypeptide's
> +polypeptide/MS

Sorry, my mistake. :(
Flags: needinfo?(bugs)
Flags: needinfo?(ananuti)