Closed Bug 471799 Opened 16 years ago Closed 2 years ago

Hunspell doesn't recognize misspelled words if they are in different encoding

Component: Core :: Spelling checker
Platform: x86 Linux
Type: defect
Priority: Not set
Severity: normal
Status: RESOLVED WORKSFORME
Reporter: rail
Assignee: Unassigned

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090101 Minefield/3.2a1pre Ubiquity/0.1.4
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090101 Minefield/3.2a1pre Ubiquity/0.1.4

Hunspell does not properly handle words containing characters that cannot be represented in the current dictionary's encoding. This is true at least for inline spell checking.

Reproducible: Always

Steps to Reproduce:
1. Set the current dictionary to en-US (which uses ISO8859-1)
2. Type a _misspelled_ word that uses non-Western characters. For example, the Russian word "тестх" (the correct spelling is "тест").

Actual Results:  
The misspelled word is not underlined.

Expected Results:  
The misspelled word should be underlined.

The problem can be worked around by changing the encoding of the en-US dictionary (s/SET ISO8859-1/SET UTF-8/ in the .aff file, and recoding the files if needed).
I haven't investigated, but I suspect some characters are lost during character conversion within the hunspell module.
BTW, the Hunspell used in OpenOffice.org returns the expected result.
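To illustrate the suspected loss during conversion, here is a minimal Python sketch (not Hunspell's or Gecko's actual code path). An 8-bit dictionary encoding such as ISO8859-1 has no code points for Cyrillic letters, so a word like "тестх" cannot be converted to it at all. The helper `to_dictionary_encoding` is hypothetical, and the assumption that a caller simply skips words it cannot convert is ours; it would be consistent with the "not underlined" symptom.

```python
from typing import Optional


def to_dictionary_encoding(word: str, dic_encoding: str) -> Optional[bytes]:
    """Hypothetical helper: convert a Unicode word to the dictionary's 8-bit
    encoding, returning None when the word cannot be represented in it."""
    try:
        return word.encode(dic_encoding)
    except UnicodeEncodeError:
        # No ISO8859-1 code point exists for these characters.
        return None


# Latin letters fit into ISO8859-1 and can be checked.
print(to_dictionary_encoding("colour", "ISO8859-1"))
# Cyrillic letters do not; a checker that skips such words never flags them.
print(to_dictionary_encoding("тестх", "ISO8859-1"))
# A lossy conversion is no better: every letter becomes '?'.
print("тестх".encode("ISO8859-1", errors="replace"))
```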
Version: unspecified → Trunk
Confirmed with Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.2a1pre) Gecko/20081226 Minefield/3.2a1pre ID:20081226103856
Status: UNCONFIRMED → NEW
Ever confirmed: true
Just applying s/SET ISO8859-1/SET UTF-8/ is definitely not enough. With this change alone, I get the following errors: 

This UTF-8 encoding can't convert to UTF-16:
smörgåsbord
UTF-8 encoding error. Missing continuation byte in 5. character position:
soigné

So the dictionary definitely needs to be recoded.
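The errors above are what one would expect if the dictionary bytes are still ISO8859-1 while SET declares UTF-8: "é" (0xE9) and "ö" (0xF6) are single bytes in ISO8859-1, but a byte of 0x80 or above is never a complete character on its own in UTF-8. A minimal Python sketch of the check and of the recoding step follows, assuming the whole file is ISO8859-1 text; the helper names are hypothetical, and in practice a tool such as `iconv -f ISO-8859-1 -t UTF-8` does the same job.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def recode_latin1_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: reinterpret ISO8859-1 bytes and re-encode as UTF-8.
    Every byte value is a valid ISO8859-1 character, so the decode never fails."""
    return raw.decode("ISO8859-1").encode("utf-8")


for word in ("smörgåsbord", "soigné"):
    raw = word.encode("ISO8859-1")    # bytes as stored in an ISO8859-1 .dic file
    print(raw, is_valid_utf8(raw))    # invalid once SET claims UTF-8
    fixed = recode_latin1_to_utf8(raw)
    print(fixed, is_valid_utf8(fixed))
```

Recoding the file contents does not touch the SET line itself, which is plain ASCII; that directive still has to be edited to say UTF-8.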

This seems to fall within the multi-language spellchecking design work, and if I understand correctly, it works now.

Status: NEW → RESOLVED
Closed: 2 years ago
Depends on: 69687
Resolution: --- → WORKSFORME

Bug 1773802 changed this behaviour for the case where multiple dictionaries are enabled. If only a single dictionary is enabled, we still don't spellcheck words using other encodings. This was a deliberate choice, to avoid changing the behaviour for users with a single dictionary.
