Hunspell doesn't recognize misspelled words if they are in a different encoding

NEW
Unassigned

Core
Spelling checker
9 years ago
6 years ago

People

(Reporter: rail, Unassigned)

Tracking

Trunk
x86
Linux

Firefox Tracking Flags

(Not tracked)

(Reporter)

Description

9 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090101 Minefield/3.2a1pre Ubiquity/0.1.4
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090101 Minefield/3.2a1pre Ubiquity/0.1.4

Hunspell cannot properly handle words whose characters are not "compatible" with (that is, not representable in) the current dictionary's encoding. At least, this is true for inline spell checking.
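
To make "compatible" concrete, here is a minimal Python sketch. It is not Hunspell or Mozilla code, and the helper name fits_encoding is hypothetical. It only checks whether a word can be represented in a given dictionary encoding at all.

```python
def fits_encoding(word: str, encoding: str) -> bool:
    """Return True if every character of `word` exists in `encoding`."""
    try:
        word.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A Latin word fits the en-US dictionary encoding; a Cyrillic one does not.
print(fits_encoding("test", "iso8859-1"))
print(fits_encoding("тестх", "iso8859-1"))
print(fits_encoding("тестх", "utf-8"))

# A lossy conversion would erase the word completely. Whether the spell
# checker does anything like this internally is an assumption, not verified.
print("тестх".encode("iso8859-1", errors="replace"))
```

If something similar to the lossy conversion on the last line happens on the way into the dictionary lookup, the checker would never see the original word.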

Reproducible: Always

Steps to Reproduce:
1. Set the current dictionary to en-US (which uses ISO8859-1)
2. Type a _misspelled_ word that uses non-Western characters. For example, I use the Russian word "тестх" (the correct spelling is "тест").

Actual Results:  
Wrong word is not underlined.

Expected Results:  
Wrong word should be underlined.

The problem can be worked around by changing the encoding of the en-US dictionary (s/SET ISO8859-1/SET UTF-8/, recoding the files if needed).
I have not investigated this, but I think something is lost during character conversion within the hunspell module.
BTW, the hunspell used in OpenOffice.org returns the expected result.
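
A rough sketch of the workaround in Python, assuming the dictionary files really are ISO8859-1 and that the encoding is declared on a single "SET" line in the .aff file. The function name recode_dictionary_text and the file names in the usage note are hypothetical.

```python
import re


def recode_dictionary_text(data: bytes, old_encoding: str = "iso8859-1") -> bytes:
    """Recode raw .aff/.dic bytes to UTF-8 and update the SET line if present."""
    text = data.decode(old_encoding)
    # Only the .aff file is expected to carry a SET line; for a .dic file
    # this substitution simply matches nothing.
    text = re.sub(r"^SET[ \t]+\S+", "SET UTF-8", text, count=1, flags=re.MULTILINE)
    return text.encode("utf-8")


sample_aff = "SET ISO8859-1\nTRY esia\n".encode("iso8859-1")
print(recode_dictionary_text(sample_aff))
```

To apply it, read each of en-US.aff and en-US.dic in binary mode, pass the bytes through recode_dictionary_text, and write the result back (after making a backup).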
(Reporter)

Updated

9 years ago
Version: unspecified → Trunk
Confirmed with Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.2a1pre) Gecko/20081226 Minefield/3.2a1pre ID:20081226103856
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 2

9 years ago
s/SET ISO8859-1/SET UTF-8/ is definitely not enough. I get the following errors with this change:

This UTF-8 encoding can't convert to UTF-16:
smörgåsbord
UTF-8 encoding error. Missing continuation byte in 5. character position:
soigné

So the dictionary files definitely need to be recoded as well.
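
The same failure can be reproduced outside Hunspell. A small Python sketch (the helper name utf8_error is hypothetical) that stores a word as ISO8859-1 bytes and then tries to read those bytes as UTF-8, which is roughly what happens when only the SET line is changed:

```python
def utf8_error(word: str, stored_as: str = "iso8859-1"):
    """Decode the stored bytes as UTF-8; return (byte offset, reason) on failure."""
    raw = word.encode(stored_as)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


for word in ("smörgåsbord", "soigné", "test"):
    print(word, utf8_error(word))
```

For "soigné" the failing offset is 5 (counting from zero), which appears to match the "5. character position" in the Hunspell message above. Plain ASCII words decode fine, so only entries with accented characters are affected.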