Closed Bug 466127 Opened 17 years ago Closed 16 years ago

The tokenization of words for spellcheck is wrong when there is a hyphen ('-') in the word.

Categories

(Core :: Spelling checker, defect)

defect
Not set
normal

RESOLVED DUPLICATE of bug 355178

People

(Reporter: in_fant, Unassigned)

Details

Attachments

(1 file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.18) Gecko/20081029 Firefox/2.0.0.18
Build Identifier: Gecko rv:1.8.*, Gecko rv:1.9.*

Engine-Gecko-incorrectly-tokenizes-words-for-the-spelling-module. For-example-in-this-text-of-a-word-are-"correctly"-distinguished,-in-spite-of-the-fact-that-in-it-there-is-no-blank!!! To-be-convinced-of-it,-simply-copy-this-text-in-any-text-field-and-check-up-its-spelling.

Reproducible: Always

Steps to Reproduce:
1. Copy the text above into any text box.
2. Check its spelling.
P.S. A dictionary for the appropriate language must be installed for the check to work.

Actual Results:
The "hyperword" made of words joined by hyphens is not underlined as misspelled, because it is split into separate words that are each checked on their own.

Expected Results:
The hyperword is underlined, because it is not in the dictionary.
In some languages, for example Russian, the hyphen can be part of a compound word: "кто-то" (someone), "где-либо" (somewhere), "когда-нибудь" (sometime), etc. Because of the error described above, these words are split into two separate words, not all of which exist on their own, for example "нибудь". Hunspell's WORDCHARS parameter extends the tokenizer with additional word characters. For example, dot, dash, n-dash, numbers, and percent sign are word characters in Hungarian. However, Gecko already passes two separate words to Hunspell for checking instead of one compound.
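The effect described above can be reproduced with a minimal sketch. This is not Gecko's or Hunspell's actual tokenizer; the `tokenize` and `misspelled` helpers, the `extra_word_chars` argument (loosely modelled on the idea behind WORDCHARS), and the toy dictionary are all assumptions made for illustration.

```python
import re

# Toy dictionary: an assumption for illustration, not a real Hunspell word list.
DICTIONARY = {"кто-то", "кто", "то", "когда-нибудь", "когда"}


def tokenize(text, extra_word_chars=""):
    """Split text into words; characters in extra_word_chars count as word characters."""
    pattern = r"[\w" + re.escape(extra_word_chars) + r"]+"
    return re.findall(pattern, text)


def misspelled(text, extra_word_chars=""):
    """Return the tokens that are not found in the toy dictionary."""
    return [w for w in tokenize(text, extra_word_chars) if w.lower() not in DICTIONARY]


# Splitting at the hyphen produces the fragment "нибудь", which is not a word.
split_result = misspelled("когда-нибудь")

# Treating the hyphen as a word character keeps the compound whole.
whole_result = misspelled("когда-нибудь", extra_word_chars="-")
```

With the hyphen treated as a separator, the valid compound is wrongly flagged through its fragment; with the hyphen treated as a word character, the compound is looked up as a single entry.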
WFM in Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.1b2pre) Gecko/20081122 Minefield/3.1b2pre ID:20081122072820
Looks like it was fixed in Hunspell 1.2.3; see http://sourceforge.net/project/shownotes.php?release_id=607512&group_id=143754
A month and a half has passed, and the bug is still sitting unconfirmed. Does that mean I am the only one who sees this behaviour? By the way, OpenOffice.org behaves similarly; see the screenshot.
The development version of Firefox can handle this problem correctly. Without any dictionary modification, the newly integrated Hunspell 1.2.8 can break the input token at hyphens, like the current tokenizer (the default, for backward compatibility), but Hunspell also checks the whole token (with the hyphen, as in "кто-то" or "scot-free") before breaking it up. (This breaking can be modified with the BREAK parameter of the Hunspell affix file.) The only remaining task for Mozilla development is to change the tokenization at hyphens. This report is a duplicate of bug 355178.
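The check order described in this comment (whole token first, then the parts obtained by breaking at hyphens) can be sketched as follows. This is a simplified model under stated assumptions, not Hunspell's implementation: `check_token`, its `break_char` argument, and the toy dictionary are hypothetical names introduced for the example.

```python
# Toy dictionary: an assumption for illustration, not a real Hunspell word list.
DICTIONARY = {"кто-то", "scot-free", "well", "known"}


def check_token(token, dictionary=DICTIONARY, break_char="-"):
    """Accept a token if it is in the dictionary as a whole, or if every part is."""
    # Step 1: look up the whole token, hyphen included.
    if token in dictionary:
        return True
    # Step 2: fall back to breaking at the hyphen and checking each part.
    parts = [part for part in token.split(break_char) if part]
    if len(parts) < 2:
        return False
    return all(part in dictionary for part in parts)


whole_entry = check_token("scot-free")     # accepted as a whole entry
by_parts = check_token("well-known")       # accepted because both parts are words
rejected = check_token("когда-нибудь")     # neither the whole nor the parts are listed
```

For this to work, the spell checker has to receive the full hyphenated token; if the caller has already split at hyphens, step 1 can never match.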
For OpenOffice.org, see http://www.openoffice.org/issues/show_bug.cgi?id=64400. I hope this problem will be fixed in OOo 3.1, too.
co-ordinating
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE