Closed Bug 1874726 Opened 1 year ago Closed 1 year ago

Updated pt-BR Hunspell Dictionary

Categories

(Mozilla Localizations :: pt-BR / Portuguese (Brazil), enhancement)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hultmann, Assigned: flod)

Attachments

(1 obsolete file)

Attached file dic_pt-BR-ao_123.2024.15.112.xpi (obsolete)

Attached is the updated pt-BR Hunspell dictionary, provided as an extension for your testing purposes.

Here are the key changes:

  • Shifted from UTF-8 to ISO-8859-1 to address reported issues with accented letters not being recognized (while I have been unable to reproduce the problem myself, I suspect it may be related to encoding)

  • Revamped suggestion system to provide more accurate and relevant recommendations for misspelled words

  • The dictionary file has undergone several thousand changes accumulated over the past few years
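For readers unfamiliar with what the encoding switch involves, here is a minimal sketch, not the script actually used for this dictionary. It assumes the source files are valid UTF-8 and that the affix file declares its encoding with a Hunspell `SET` line; the function name `utf8_to_latin1` is hypothetical.

```python
def utf8_to_latin1(data: bytes, is_aff: bool = False) -> bytes:
    """Re-encode the contents of a Hunspell .dic or .aff file from UTF-8 to ISO-8859-1.

    Raises UnicodeEncodeError if the file contains a character that
    ISO-8859-1 cannot represent, so nothing is silently dropped.
    """
    text = data.decode("utf-8")
    if is_aff:
        # The affix file names its own encoding; keep that declaration in sync.
        lines = [
            "SET ISO8859-1" if line.startswith("SET ") else line
            for line in text.split("\n")
        ]
        text = "\n".join(lines)
    return text.encode("iso-8859-1")


if __name__ == "__main__":
    print(utf8_to_latin1("SET UTF-8\nTRY aeo\n".encode("utf-8"), is_aff=True))
    print(utf8_to_latin1("ação\n".encode("utf-8")))
```

Every accented letter used in Portuguese fits in ISO-8859-1, so the strict encode step mainly guards against stray characters such as typographic quotes.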

I will land this during the week, since today is merge day for the release candidate.

Make sure to test this extensively, since it's a pretty significant change.

As for the problem, you might be hitting this? https://bugzilla.mozilla.org/show_bug.cgi?id=1164263

Attachment #9372877 - Attachment is obsolete: true

(In reply to Francesco Lodolo [:flod] from comment #1)

> Make sure to test this extensively, since it's a pretty significant change.

I've developed a script that automatically checks correct spellings, identifies misspelled words, and evaluates the quality of suggestions. This gives me a high level of confidence in the state of the dictionary.
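The script itself is not attached to this bug. As a rough illustration of the three checks described above, a harness could look like the sketch below. The `check` and `suggest` callables are hypothetical stand-ins for whatever spellchecker binding is used, and `top_n` is an assumed cutoff for what counts as a good suggestion.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Report:
    false_rejects: list[str]       # correct words the checker flags as wrong
    false_accepts: list[str]       # misspellings the checker lets through
    missed_suggestions: list[str]  # misspellings whose fix is not in the top N


def evaluate(
    check: Callable[[str], bool],
    suggest: Callable[[str], list[str]],
    correct_words: Iterable[str],
    misspellings: dict[str, str],
    top_n: int = 3,
) -> Report:
    """Run the three checks: correct words, misspelled words, suggestion quality."""
    false_rejects = [w for w in correct_words if not check(w)]
    false_accepts = [w for w in misspellings if check(w)]
    missed = [
        w for w, fix in misspellings.items()
        if fix not in suggest(w)[:top_n]
    ]
    return Report(false_rejects, false_accepts, missed)


if __name__ == "__main__":
    vocabulary = {"casa", "você"}
    fixes = {"caza": ["casa"], "voce": ["você"]}
    report = evaluate(
        check=lambda w: w in vocabulary,
        suggest=lambda w: fixes.get(w, []),
        correct_words=["casa", "você"],
        misspellings={"caza": "casa", "voce": "você"},
    )
    print(report)
```

Keeping the checker behind two plain callables means the same harness can be pointed at different dictionary versions to compare them.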

The updated dictionary will also be available on AMO, where the add-on has 66k users.

Additionally, I use a handy bookmarklet that enables 'contentEditable=true' on pages. This allows me to inspect the spellchecking on every page.

> As for the problem, you might be hitting this? https://bugzilla.mozilla.org/show_bug.cgi?id=1164263

No, it's related to suggestions. I believe it may be connected to the functioning of hunspell: slower machines tend to yield inferior suggestions.

I've just come up with an idea to enhance a few crucial suggestions. I'll be updating some REP rules, consequently deprecating the previous dictionary.
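For context, REP entries in a Hunspell affix file are a replacement table: a `REP <count>` header followed by `REP <from> <to>` lines, which are tried when generating suggestions. The sketch below is a simplified model of that idea, not Hunspell's implementation; it ignores the `^`/`$` anchors that REP entries may use, and the sample rules are made up for illustration rather than taken from the pt-BR dictionary.

```python
def parse_rep(aff_text: str) -> list[tuple[str, str]]:
    """Collect (from, to) pairs from the REP lines of an affix file."""
    rules = []
    for line in aff_text.splitlines():
        parts = line.split()
        # The "REP <count>" header has only two fields and is skipped here.
        if len(parts) == 3 and parts[0] == "REP":
            rules.append((parts[1], parts[2]))
    return rules


def rep_candidates(word: str, rules: list[tuple[str, str]], vocabulary: set[str]) -> list[str]:
    """Apply each rule at each position where it matches; keep results found in the vocabulary."""
    found = []
    for old, new in rules:
        start = word.find(old)
        while start != -1:
            candidate = word[:start] + new + word[start + len(old):]
            if candidate in vocabulary and candidate not in found:
                found.append(candidate)
            start = word.find(old, start + 1)
    return found


if __name__ == "__main__":
    # Hypothetical rules: "ss" and "ç" are easily confused in Portuguese.
    sample_aff = "REP 2\nREP ss ç\nREP ç ss\n"
    rules = parse_rep(sample_aff)
    print(rep_candidates("cabessa", rules, {"cabeça"}))
```

A rule of this kind lets a well-chosen suggestion surface for a common, language-specific confusion without relying on generic edit-based guesses.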

Sounds good. Landed in: https://hg.mozilla.org/l10n-central/pt-BR/rev/10c434a26e673495ab186659086ee00abbb7c897

> I've just come up with an idea to enhance a few crucial suggestions. I'll be updating some REP rules, consequently deprecating the previous dictionary.

I didn't understand the problem was limited to suggestions. Might be worth filing a bug if you can pinpoint the issue to a performance problem.

English has a minimal affix file and still uses ISO8859-1 encoding, and yet it often provides spelling suggestions that are completely off, so I always assumed it was due to Hunspell limitations.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED

(In reply to Francesco Lodolo [:flod] from comment #3)

> English has a minimal affix file and still uses ISO8859-1 encoding, and yet it often provides spelling suggestions that are completely off, so I always assumed it was due to Hunspell limitations.

Indeed, it has the limitation that it only corrects one error; for more complex misspellings, you have to rely on NGRAM suggestions.

But I just ran the en-US dictionary to check misspellings from https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines, and the Firefox dictionary would offer the right spelling for 92% of them. Good enough?
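To make the distinction between single-error correction and n-gram suggestions concrete, here is a toy sketch. It is a deliberate simplification and does not reproduce Hunspell's algorithm: it tries every single-edit variant of a word first and, if none is in the vocabulary, ranks the whole vocabulary by character-bigram overlap. All function names and the scoring formula are assumptions for illustration.

```python
def single_edits(word: str, alphabet: str) -> set[str]:
    """All strings one deletion, swap, replacement, or insertion away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | swaps | replaces | inserts


def bigrams(word: str) -> set[str]:
    padded = f"^{word}$"  # mark word boundaries so prefixes and suffixes count
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def ngram_score(a: str, b: str) -> float:
    """Jaccard overlap of character bigrams, between 0.0 and 1.0."""
    ga, gb = bigrams(a), bigrams(b)
    return len(ga & gb) / len(ga | gb)


def suggest(word: str, vocabulary: set[str],
            alphabet: str = "abcdefghijklmnopqrstuvwxyz") -> list[str]:
    direct = sorted(single_edits(word, alphabet) & vocabulary)
    if direct:
        return direct
    # No single edit works, so fall back to bigram similarity.
    ranked = sorted(vocabulary, key=lambda v: ngram_score(word, v), reverse=True)
    return ranked[:3]


if __name__ == "__main__":
    vocabulary = {"necessary", "banana", "separate"}
    print(suggest("necesary", vocabulary))   # one missing letter
    print(suggest("neccesary", vocabulary))  # two errors, needs the fallback
```

In this toy model the fallback ranking is the only path to a fix for words with more than one error, which is why its quality matters for complex misspellings.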
