Closed
Bug 360240
Opened 18 years ago
Closed 3 years ago
Spell checking should allow for word groups, abbreviations, hyphenated words
Categories
(Core :: Spelling checker, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: hendrik, Unassigned)
Sometimes, words cannot occur by themselves but have to be followed/preceded by another one in order to be correct. Some spell checking word lists contain such word groups. However, FF still marks the offending word as incorrect. The same applies for common abbreviations of words with dashes.
Example: in Dutch, ‘laag-bij-de-gronds’ is correct and is in the word list being assembled by the OpenTaal project (www.opentaal.org; the list can be found at https://www.uitwisselplatform.nl/frs/download.php/178/nl_NL-pack_b3.xpi, although there still seems to be a problem with the packaging).
When this language is selected, ‘gronds’ is marked as wrong. That would be right if it stood alone, but it should be accepted as part of the word above. The same applies to English abbreviations like ‘i.e.’, where the ‘i’ is marked as wrong.
Another example is ‘a priori’: ‘priori’ should not be considered correct unless it is preceded by ‘a’.
Comment 1•18 years ago
laag-bij-de-gronds is an existing word in the dictionary downloaded here: https://addons.mozilla.org/firefox/3291/
Assignee: nobody → mscott
Component: Form Manager → Spelling checker
Product: Firefox → Core
QA Contact: form.manager → spelling-checker
Version: 2.0 Branch → Trunk
Comment 2•18 years ago (Reporter)
(In reply to comment #1)
> laag-bij-de-gronds is an existing word in the dictionary downloaded here:
> https://addons.mozilla.org/firefox/3291/
That may be, but it doesn’t change anything. The spell checker accepts ‘laag-bij-de-gronds’ only because it accepts every single word in it, even ‘gronds’, which is not correct on its own. (‘gronds’ does occur in Dutch pages, but mostly in the combination above, where people often (incorrectly!) leave the dashes out.)
Comment 3•18 years ago
Hardware and OS should be "all"
Updated•18 years ago
OS: Linux → All
Hardware: PC → All
I don't know whether this should be classified as a bug or enhancement, but I agree that it would be nice for the spellchecker to support phrases. Multiple words are not treated as a single unit even when they are added to the dictionary as such.
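To illustrate what phrase support could look like, here is a minimal Python sketch, not the Mozilla or Hunspell implementation. The word list, the `misspelled` function and the `max_phrase_len` parameter are all hypothetical. It tries the longest multi-word dictionary entry first, so "Las" is accepted only as part of "Las Vegas":

```python
import re

# Hypothetical word list that mixes single words and multi-word entries.
DICTIONARY = {"a", "the", "is", "in", "city", "las vegas", "vice versa", "a priori"}

def misspelled(text, dictionary=DICTIONARY, max_phrase_len=3):
    """Return the tokens that are not covered by any dictionary entry."""
    tokens = re.findall(r"[^\W\d_]+", text)  # runs of letters only
    flagged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to a single word.
        for n in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in dictionary:
                i += n  # the whole phrase is accepted as one unit
                break
        else:
            flagged.append(tokens[i])  # no entry starts at this token
            i += 1
    return flagged
```

With this word list, `misspelled("Las Vegas is a city")` flags nothing, while `misspelled("the priori is in Las")` flags both "priori" and "Las" because neither is an entry on its own.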
Comment 5•18 years ago
The fact that words with dashes are not treated as single units is a bug for Irish. We have many words like "dea-scéal" for which the "dea" prefix is not a correct word on its own, but FF/TB flag it as an error.
Even more bothersome (since they are much more common) are words like "t-ainm" and "n-ainm", where the single "t" or "n" is flagged as an error. If it's too complicated to solve the general problem (compare the similar OOo bug: http://www.openoffice.org/issues/show_bug.cgi?id=64400), I'd be satisfied if the spellchecker just ignored one-letter words, as is common practice for standalone checkers like ispell/aspell.
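The one-letter workaround can be sketched in a few lines of Python. This is only an illustration under assumed names (`DICTIONARY` and `flagged_parts` are hypothetical), and it assumes the checker has already split the word at hyphens:

```python
# Hypothetical Irish word list; "t", "n" and "dea" are deliberately not entries.
DICTIONARY = {"ainm", "scéal"}

def flagged_parts(word, dictionary=DICTIONARY):
    """Split a word at hyphens and flag unknown parts, skipping one-letter parts."""
    flagged = []
    for part in word.split("-"):
        if len(part) <= 1:
            continue  # ignore one-letter (and empty) parts entirely
        if part.lower() not in dictionary:
            flagged.append(part)
    return flagged
```

Under these assumptions "t-ainm" and "n-ainm" are no longer flagged, but "dea-scéal" still is, because "dea" is longer than one letter. That is why this only helps with the most common cases and not the general problem.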
Comment 6•17 years ago
IMO, the real solution is to stop treating hyphenated words as separate words
and treat them as a single word. Today, it is possible to add a hyphenated
word to the dictionary, but the spell checker fails to ever match on the
added definition, because it never checks hyphenated words as a single word.
The name Wan-Teh frequently appears in my correspondence. The spell checker
always triggers on Teh. I do not want to add Teh to the dictionary, because
it is a common misspelling of The. But I do want the hyphenated word to be
in the dictionary, and to stop triggering spelling errors.
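A minimal Python sketch of that idea, assuming a hypothetical personal dictionary and a hypothetical `check_hyphenated` helper (this is not the actual Mozilla code): look up the whole hyphenated word first, and only fall back to checking the parts if the whole word is unknown.

```python
# Hypothetical dictionary: "wan-teh" was added as a whole word, "teh" was not.
DICTIONARY = {"wan-teh", "the", "well", "known"}

def check_hyphenated(word, dictionary=DICTIONARY):
    """Return the parts of the word that should be flagged as misspelled."""
    if word.lower() in dictionary:
        return []  # the whole hyphenated word matches an entry
    # Fall back to checking each hyphen-separated part on its own.
    return [part for part in word.split("-")
            if part and part.lower() not in dictionary]
```

With this word list "Wan-Teh" is accepted, a standalone "Teh" is still flagged, and "well-known" passes through the fallback because both parts are entries.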
Summary: Spell checking should allow for word groups, abbreviations, words with dashes... → Spell checking should allow for word groups, abbreviations, hyphenated words
Comment 7•16 years ago
This is a problem for Ukrainian too: words with a dash are often correct even if the component words inside them are not. Currently, Hunspell can handle Ukrainian words correctly with the following lines in the affix file:
WORDCHARS -
BREAK 1
BREAK -
But Firefox does not pass compound words to the spell checker.
Updated•16 years ago
Assignee: mscott → nobody
Comment 8•16 years ago
Hunspell can check word groups, abbreviations and hyphenated words, so this is a tokenization and spell checker usage problem in Mozilla. The recently integrated Hunspell (version 1.2.8) has an improved BREAK method with better suggestions and with default tokenization of hyphenated words (Changelog: http://sourceforge.net/project/shownotes.php?group_id=143754&release_id=637489). The hyphen will be a word character in OpenOffice.org (http://www.openoffice.org/issues/show_bug.cgi?id=64400), and it could be one in Mozilla too, which would solve this and other issues (see Bug 355178).
I can give some English examples that still get marked with a recent nightly. (Worth noting that I'm basing this on SeaMonkey trunk)
Las Vegas -- "Las" is flagged
vice versa -- "versa" is flagged
Notre Dame -- "Notre" is flagged
Abu Dhabi
children's -- irregular plural possessive
women's -- sometimes flagged, sometimes not
Los Angeles
de facto
Comment 10•15 years ago
Children's and women's are a different issue -- the lack of an 's possessive rule for those nouns -- and will be fixed by bug 479334.
Hyphenation issue is Bug 355178.
Multi-word group issue ("Las Vegas") is slightly different so I will leave this bug open, marking it dependent on Bug 355178.
Depends on: 355178
Updated•14 years ago
Comment 12•14 years ago
There seems to be some development:
http://ehsanakhgari.org/blog/2011-02-09/important-changes-firefox-4-spell-checker
Comment 13•3 years ago
All the referenced bugs are fixed now.
All the examples from comment 9 work now except for "Abu Dhabi", which I assume is simply missing from the dictionary.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME