Updated pt-BR Hunspell Dictionary
Categories
(Mozilla Localizations :: pt-BR / Portuguese (Brazil), enhancement)
Tracking
(Not tracked)
People
(Reporter: hultmann, Assigned: flod)
Details
Attachments
(1 obsolete file)
Attached is the updated pt-BR Hunspell dictionary, packaged as an extension for testing.
Here are the key changes:
- Switched the encoding from UTF-8 to ISO-8859-1 to address reports of accented letters not being recognized. I haven't been able to reproduce the problem myself, but I suspect it is encoding-related.
- Revamped the suggestion system to give more accurate and relevant suggestions for misspelled words.
- The dictionary file includes several thousand changes accumulated over the past few years.
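For reference, a Hunspell dictionary declares its encoding with the SET line of the .aff file (SET UTF-8, SET ISO8859-1, ...), and the .dic file has to be stored in that same encoding. The sketch below shows one way such a switch could be done; it is not the script used for this attachment, and the function name and the file names in the test (pt_BR.aff, pt_BR.dic) are hypothetical. It refuses to write anything if a word contains a character outside ISO-8859-1.

```python
import re
from pathlib import Path


def convert_to_latin1(aff_path: Path, dic_path: Path) -> None:
    """Re-encode a Hunspell .aff/.dic pair from UTF-8 to ISO-8859-1 in place.

    Raises UnicodeEncodeError if either file contains a character with no
    ISO-8859-1 equivalent, instead of silently dropping it.
    """
    # "utf-8-sig" also strips a leading byte-order mark if one is present.
    aff_text = aff_path.read_bytes().decode("utf-8-sig")
    dic_text = dic_path.read_bytes().decode("utf-8-sig")

    # Point the SET directive at the new encoding (Hunspell spells it
    # ISO8859-1); add one at the top if the file has none.
    set_line = re.compile(r"^SET[ \t]+\S+", flags=re.MULTILINE)
    if set_line.search(aff_text):
        aff_text = set_line.sub("SET ISO8859-1", aff_text, count=1)
    else:
        aff_text = "SET ISO8859-1\n" + aff_text

    # Python's codec name for ISO-8859-1 is "latin-1". Encode both files
    # before writing either, so a failure leaves both files unchanged;
    # writing bytes directly also leaves line endings untouched.
    aff_bytes = aff_text.encode("latin-1")
    dic_bytes = dic_text.encode("latin-1")
    aff_path.write_bytes(aff_bytes)
    dic_path.write_bytes(dic_bytes)
```

Every letter used in pt-BR (á, à, â, ã, ç, é, ê, í, ó, ô, õ, ú, ü) exists in ISO-8859-1, so for this dictionary the conversion should be lossless; for languages that need characters outside Latin-1 it would fail instead.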
Assignee
Comment 1 • 1 year ago
I will land this later in the week, since today is merge day for the release candidate.
Make sure to test this extensively, since it's a pretty significant change.
As for the problem, you might be hitting this? https://bugzilla.mozilla.org/show_bug.cgi?id=1164263
Reporter
Updated • 1 year ago
Reporter
Comment 2 • 1 year ago
(In reply to Francesco Lodolo [:flod] from comment #1)
> Make sure to test this extensively, since it's a pretty significant change.
I've written a script that automatically checks that correctly spelled words are accepted and misspelled words are flagged, and evaluates the quality of the suggestions. This gives me a high level of confidence in the state of the dictionary.
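The script itself isn't attached, so here is a rough sketch of the kind of check described, under the assumption that it works from a list of correct words and a list of (misspelling, expected correction) pairs. It reports correct words the dictionary rejects, misspellings it accepts, and how often the expected correction appears among the first few suggestions. The spell/suggest callables stand in for whatever Hunspell binding is used; evaluate, Report and top_n are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence


@dataclass
class Report:
    false_rejects: list[str]   # correct words the dictionary flagged
    false_accepts: list[str]   # misspellings the dictionary let through
    hits: int                  # flagged misspellings whose fix was suggested
    flagged: int               # misspellings the dictionary flagged

    @property
    def hit_rate(self) -> float:
        return self.hits / self.flagged if self.flagged else 0.0


def evaluate(
    spell: Callable[[str], bool],
    suggest: Callable[[str], Sequence[str]],
    correct_words: Iterable[str],
    misspellings: Iterable[tuple[str, str]],
    top_n: int = 5,
) -> Report:
    """Check acceptance, rejection and suggestion quality against word lists."""
    false_rejects = [w for w in correct_words if not spell(w)]
    false_accepts: list[str] = []
    hits = flagged = 0
    for wrong, expected in misspellings:
        if spell(wrong):
            # Accepted as correct, so the user would never see suggestions.
            false_accepts.append(wrong)
            continue
        flagged += 1
        # Count a hit only if the expected fix is among the first top_n.
        if expected in list(suggest(wrong))[:top_n]:
            hits += 1
    return Report(false_rejects, false_accepts, hits, flagged)
```

With the third-party pyhunspell binding, for example (assuming it is installed), h = hunspell.HunSpell("pt_BR.dic", "pt_BR.aff") would supply h.spell and h.suggest; any binding exposing a spell check and a suggestion list would do.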
The updated dictionary will be available on AMO, where the extension has 66k users.
I also use a bookmarklet that sets 'contentEditable=true' on pages, which lets me inspect the spellchecking on any page.
> As for the problem, you might be hitting this? https://bugzilla.mozilla.org/show_bug.cgi?id=1164263
No, the problem is with suggestions. I believe it is related to how Hunspell works: slower machines tend to produce worse suggestions.
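To make the speed hypothesis concrete: if a suggestion search stops at a fixed wall-clock deadline, a slower machine examines fewer candidates before time runs out and can miss the right one. Parts of Hunspell's suggestion code are, as far as I know, time-limited, but the sketch below is only a generic model of that effect, not Hunspell's code; the function name and the injectable clock parameter are assumptions, the latter added so the behaviour can be tested deterministically.

```python
import time
from typing import Callable, Iterable


def suggest_within_budget(
    candidates: Iterable[str],
    is_word: Callable[[str], bool],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
    max_suggestions: int = 5,
) -> list[str]:
    """Keep candidates that are real words until the time budget runs out.

    Candidates are examined in order; the slower each check is, the fewer
    of them get examined before the deadline, so later candidates are lost.
    """
    deadline = clock() + budget_s
    found: list[str] = []
    for cand in candidates:
        if clock() > deadline:
            break  # out of time: return whatever has been found so far
        if is_word(cand) and cand not in found:
            found.append(cand)
            if len(found) >= max_suggestions:
                break
    return found
```

Passing a fake clock that advances by a fixed step per call simulates machines of different speed; on real hardware, running the same word list on a fast and a slow machine and diffing the suggestions would be one way to confirm or rule out the effect.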
I've just come up with an idea to enhance a few crucial suggestions. I'll be updating some REP rules, consequently deprecating the previous dictionary.
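For context, the REP table in the .aff file lists replacements Hunspell tries first when building suggestions, which helps with typical mistakes that differ from the right form by more than one letter. A header line "REP <count>" is followed by "REP <what> <replacement>" lines. The sketch below is a simplified model of how such a table produces suggestions: the pairs in the test (ss→ç, x→ch) are hypothetical examples, not the rules being added here, and the sketch ignores the ^/$ anchors and the _-for-space convention that, as far as I know, Hunspell supports in REP patterns.

```python
from typing import Callable


def parse_rep(aff_text: str) -> list[tuple[str, str]]:
    """Collect the (what, replacement) pairs from the REP table of an .aff file."""
    pairs: list[tuple[str, str]] = []
    for line in aff_text.splitlines():
        parts = line.split()
        # Data lines have three fields ("REP what replacement"); the
        # header line "REP <count>" has only two and is skipped.
        if len(parts) >= 3 and parts[0] == "REP":
            pairs.append((parts[1], parts[2]))
    return pairs


def rep_suggestions(
    word: str,
    pairs: list[tuple[str, str]],
    is_word: Callable[[str], bool],
) -> list[str]:
    """Apply each REP pair at each position where 'what' occurs; keep real words."""
    out: list[str] = []
    for what, repl in pairs:
        start = word.find(what)
        while start != -1:
            # Replace this single occurrence, leaving the others untouched.
            cand = word[:start] + repl + word[start + len(what):]
            if is_word(cand) and cand not in out:
                out.append(cand)
            start = word.find(what, start + 1)
    return out
```

With a table containing REP ss ç and REP x ch, "cabessa" yields "cabeça" and "xuva" yields "chuva", corrections that a single-edit search would not find.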
Assignee
Comment 3 • 1 year ago
Sounds good. Landed in: https://hg.mozilla.org/l10n-central/pt-BR/rev/10c434a26e673495ab186659086ee00abbb7c897
> I've just come up with an idea to enhance a few crucial suggestions. I'll be updating some REP rules, consequently deprecating the previous dictionary.
I didn't realize the problem was limited to suggestions. It might be worth filing a bug if you can pin the issue down to a performance problem.
English has a minimal affix file and still uses ISO8859-1 encoding, and yet it often provides spelling suggestions that are completely off, so I always assumed it was due to Hunspell limitations.
Reporter
Comment 4 • 1 year ago
(In reply to Francesco Lodolo [:flod] from comment #3)
> English has a minimal affix file and still uses ISO8859-1 encoding, and yet it often provides spelling suggestions that are completely off, so I always assumed it was due to Hunspell limitations.
Indeed, it has the limitation that it only corrects a single error; for more complex misspellings, it has to fall back on n-gram (NGRAM) suggestions.
But I just ran the en-US dictionary against the misspellings listed at https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines, and the Firefox dictionary offered the correct spelling for 92% of them. Good enough?
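The two-stage behaviour described in this comment can be modelled roughly as follows: first collect dictionary words one edit (deletion, transposition, substitution, insertion) away, and only if there are none, rank the whole word list by n-gram overlap. This is a simplified stand-in, not Hunspell's actual set of edit operations or its n-gram scoring; ALPHABET, one_edit, similarity and suggest are hypothetical names.

```python
# Lowercase Latin letters plus the accented letters used in Portuguese.
ALPHABET = "abcdefghijklmnopqrstuvwxyzáàâãçéêíóôõúü"


def one_edit(word: str) -> set[str]:
    """Every string one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    subs = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | swaps | subs | inserts


def grams(s: str, max_n: int = 3) -> set[str]:
    """All substrings of length 1..max_n."""
    return {s[i:i + n] for n in range(1, max_n + 1) for i in range(len(s) - n + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two words' n-gram sets (0.0 to 1.0)."""
    ga, gb = grams(a), grams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def suggest(word: str, dictionary: set[str], limit: int = 3) -> list[str]:
    # Stage 1: dictionary words a single edit away, in alphabetical order.
    close = sorted(w for w in one_edit(word) if w in dictionary)
    if close:
        return close[:limit]
    # Stage 2 (fallback): rank every dictionary word by n-gram similarity.
    ranked = sorted(dictionary, key=lambda w: (-similarity(word, w), w))
    return ranked[:limit]
```

A misspelling like "acomodate" is two insertions away from "accommodate", so in this model it only gets corrected through the n-gram stage, which is also where the off-target suggestions come from.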