Bug 466127
Opened 17 years ago
Closed 16 years ago
The tokenization of words for spellcheck is wrong when there is a hyphen ('-') in the word.
Categories
(Core :: Spelling checker, defect)
Tracking
RESOLVED DUPLICATE of bug 355178
People
(Reporter: in_fant, Unassigned)
Attachments
(1 file) 23.61 KB, image/png
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.18) Gecko/20081029 Firefox/2.0.0.18
Build Identifier: Gecko rv:1.8.*, Gecko rv:1.9.*
The-Gecko-engine-incorrectly-tokenizes-words-for-the-spelling-checker.
For-example,-the-words-in-this-text-are-"correctly"-recognized,-even-though-it-contains-no-spaces!!!
To-verify-this,-simply-copy-this-text-into-any-text-field-and-check-its-spelling.
Reproducible: Always
Steps to Reproduce:
1. Copy the text above into any text box.
2. Check its spelling.
P.S. The check requires an installed dictionary for the appropriate language.
Actual Results:
The hyphenated compound (the "hyperword", made of words joined by hyphens) is not underlined as misspelled, because it is tokenized as separate words.
Expected Results:
The hyphenated compound is underlined as not found in the dictionary.
In some languages, for example Russian, the hyphen can be part of a compound word: "кто-то" (someone), "где-либо" (somewhere), "когда-нибудь" (sometime), etc. Because of the error described above, these words are split into two separate words, not all of which exist on their own, for example "нибудь".
Hunspell's WORDCHARS parameter lets a dictionary extend the tokenizer with additional word characters. In Hungarian, for example, the dot, hyphen, n-dash, digits, and percent sign are word characters. Gecko, however, has already split the compound into two separate words before passing it to Hunspell for checking.
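The difference between the two tokenizations can be illustrated with a small Python sketch. It is not Gecko's or Hunspell's tokenizer: the function `tokenize` and its `wordchars` argument are hypothetical, loosely modelled on WORDCHARS, and for simplicity the extra characters only count when they sit between two letters.

```python
import re

# Any Unicode letter: word characters minus digits and underscore.
LETTER = r"[^\W\d_]"


def tokenize(text: str, wordchars: str = "") -> list[str]:
    """Split text into words for spell checking (hypothetical sketch).

    With no extra characters, a hyphen ends the word, which is the
    behaviour described in this report. Characters listed in `wordchars`
    are kept inside a word when they sit between letters.
    """
    if not wordchars:
        return re.findall(rf"{LETTER}+", text)
    # Character class of the extra characters, escaped so '-' is literal.
    joiner = "[" + re.escape(wordchars) + "]"
    return re.findall(rf"{LETTER}+(?:{joiner}{LETTER}+)*", text)


text = "кто-то когда-нибудь"
print(tokenize(text))       # ['кто', 'то', 'когда', 'нибудь']
print(tokenize(text, "-"))  # ['кто-то', 'когда-нибудь']
```

Only the second call hands the whole compound to the checker, which is what a dictionary entry such as "кто-то" needs in order to be found.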
Comment 2•17 years ago
WFM in Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.1b2pre) Gecko/20081122 Minefield/3.1b2pre ID:20081122072820
Looks like it was fixed in Hunspell 1.2.3, see http://sourceforge.net/project/shownotes.php?release_id=607512&group_id=143754
A month and a half has passed, and the bug is still unconfirmed. Am I really the only one seeing this behaviour?
By the way, OpenOffice.org behaves the same way; see the attached screenshot.
Comment 4•17 years ago
The development version of Firefox can handle this problem correctly. Without any dictionary modification, the newly integrated Hunspell 1.2.8 can break the input token at hyphens, like the current tokenizer (the default, for backward compatibility), but Hunspell also checks the whole token, hyphen included (as in "кто-то" or "scot-free"), before breaking it. (This breaking can be modified with the BREAK parameter of the Hunspell affix file.) The only remaining task for Mozilla developers is to change the tokenization at hyphens. This report is a duplicate of bug 355178.
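The check-then-break order described in this comment can be modelled roughly as follows. This is a simplified sketch, not Hunspell's implementation: `check` is a hypothetical function, the dictionary is a plain Python set, and only the first break point inside the token is tried, whereas real Hunspell behaviour is governed by the BREAK entries of the affix file.

```python
def check(word: str, dictionary: set[str], breaks: tuple[str, ...] = ("-",)) -> bool:
    """Return True if `word` is accepted (hypothetical BREAK-style sketch)."""
    # Step 1: look up the whole token, hyphen included.
    if word in dictionary:
        return True
    # Step 2: break at the first separator found strictly inside the token
    # and require both parts to pass (recursion handles several hyphens).
    for sep in breaks:
        pos = word.find(sep, 1)
        if 0 < pos < len(word) - len(sep):
            head, tail = word[:pos], word[pos + len(sep):]
            if check(head, dictionary, breaks) and check(tail, dictionary, breaks):
                return True
    return False


DICTIONARY = {"кто-то", "когда-нибудь", "когда", "well", "known"}
print(check("когда-нибудь", DICTIONARY))  # True: found as a whole
print(check("well-known", DICTIONARY))    # True: both parts found
print(check("нибудь", DICTIONARY))        # False: not a word on its own
```

The last call corresponds to what happens when the compound is split before it reaches the checker: the part "нибудь" is rejected even though the full word is in the dictionary.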
Comment 5•17 years ago
Regarding OpenOffice.org, see http://www.openoffice.org/issues/show_bug.cgi?id=64400. I hope this problem will be fixed in OOo 3.1, too.
Comment 6•16 years ago
co-ordinating
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE