1235506 - en-US dictionary: Additional Mozilla words need to be cleaned up. Other issues discussed: See comment #10 and comment #12.

Description

10 years ago

Firstly, in bug 1137544 the Mozilla-maintained en-US dictionary was refreshed, and many words it previously contained were removed, causing bug 1183512 and bug 1198052. These bugs would not have occurred had SCOWL's en_US-large been used:
http://app.aspell.net/lookup?dict=en_US-large&words=relict%0D%0Aresiduary%0D%0Aenforceability%0D%0Aadvisor%0D%0Ainfeasible%0D%0Aclich%E9%0D%0ABogot%E1%0D%0Ainfeasible%0D%0Aunfeasible

Secondly, there seems to be a problem with Mozilla's update process, since words that don't exist in the upstream dictionary ("Aurthur", bug 301712) seem to hang around for ten years:
http://app.aspell.net/lookup?dict=en_US-large&words=Aurthur

Adding single words only to en-US.dic, as seems to be the practice, appears suboptimal:
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic
Instead, such words should be fed upstream and/or kept in a separate file, "Mozilla knows better than Aspell.net", like Fukushima:
http://app.aspell.net/lookup?dict=en_US-large&words=Fukushima
so they can easily be added on the next merge.

Thirdly, we should ensure that accented words are included. Many are already included in en_US-large. There was talk of switching to UTF-8 in bug 1144254 to be able to include "naïve", which is not in the upstream dictionary, but that doesn't seem to be necessary, since the charset is already ISO8859-1, which includes accented characters such as ï. BTW, "naïve", if we want it, would go into the "Mozilla knows better" file.
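To make the "separate file" idea and the charset point concrete, here is a minimal Python sketch. It is not Mozilla's actual merge tooling. It assumes the usual Hunspell .dic layout (a word count on the first line, then one `word` or `word/FLAGS` entry per line), and it assumes a hypothetical format for the "Mozilla knows better" file: one entry per line, with `#` starting a comment line. The function names `merge_extra_words` and `fits_charset` are made up for illustration.

```python
def merge_extra_words(dic_text: str, extra_text: str) -> str:
    """Return the .dic content with entries from the extra list added if missing.

    Assumes the first line of dic_text is the entry count and every other
    non-blank line is "word" or "word/FLAGS".
    """
    lines = dic_text.splitlines()
    entries = [ln for ln in lines[1:] if ln.strip()]
    # Compare on the bare word so "advisor/MS" and "advisor" count as the same entry.
    known = {ln.split("/", 1)[0] for ln in entries}

    for raw in extra_text.splitlines():
        entry = raw.strip()
        # Hypothetical convention: blank lines and "#" comments are ignored.
        if not entry or entry.startswith("#"):
            continue
        word = entry.split("/", 1)[0]
        if word not in known:
            entries.append(entry)
            known.add(word)

    entries.sort()
    # Rewrite the count line so it matches the merged entry list.
    return "\n".join([str(len(entries))] + entries) + "\n"


def fits_charset(word: str, charset: str = "iso-8859-1") -> bool:
    """Report whether every character of word can be encoded in charset."""
    try:
        word.encode(charset)
        return True
    except UnicodeEncodeError:
        return False
```

Running `merge_extra_words` after each upstream refresh would re-apply the Mozilla additions without anyone having to remember them, and `fits_charset("naïve")` can serve as a sanity check that a word does not force a switch to UTF-8.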

331 wrong dictionary entries of the form "<verb>'s". 10 years ago Jorg K (CEST = GMT+2) 3.66 KB, text/plain
Words Mozilla adds to the SCOWL data (unreviewed) 10 years ago Jorg K (CEST = GMT+2) 4.30 KB, text/plain
340 wrong dictionary entries of the form "<verb>'s" or "<adjective>'s" 10 years ago Jorg K (CEST = GMT+2) 3.43 KB, text/plain
336 wrong dictionary entries of the form "<verb>'s" or "<adjective>'s" 10 years ago Jorg K (CEST = GMT+2) 3.35 KB, text/plain
387 Mozilla-added words (unreviewed) 10 years ago Jorg K (CEST = GMT+2) 4.36 KB, text/plain
342 wrong dictionary entries of the form "<verb>'s" or "<adjective>'s" 10 years ago Jorg K (CEST = GMT+2) 3.43 KB, text/plain
347 Mozilla-added words (reviewed), 10 to be removed. 10 years ago Jorg K (CEST = GMT+2) 3.89 KB, text/plain
List of 5670 words that got removed in bug 1137544 10 years ago Jorg K (CEST = GMT+2) 57.68 KB, text/plain
354 dictionary corrections 10 years ago Jorg K (CEST = GMT+2) 76.50 KB, patch	ehsan.akhgari : review+
353 dictionary corrections 10 years ago Jorg K (CEST = GMT+2) 76.25 KB, patch	jorgk-bmo : review+ Sylvestre : approval-mozilla-aurora+
add-variants.patch 10 years ago Kevin Atkinson 1.28 MB, patch
add-variants.patch 10 years ago Kevin Atkinson 1.28 MB, patch
add-variants.patch 10 years ago Kevin Atkinson 217.61 KB, patch
add-variants.patch 10 years ago Kevin Atkinson 217.62 KB, patch