Closed Bug 306336 Opened 15 years ago Closed 13 years ago
Spelling does not handle compound words (prefixes and postfixes)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6

In a lot of Western European languages you create new words by combining existing words. The spellchecker in Thunderbird does not handle this very well. For example, if I have the words 'sail', 'boat' and 'name' in my dictionary and spellcheck the word 'sailboat', it suggests 'sail boat', and I have to add 'sailboat' to the dictionary to avoid a warning on this word. If I make it more complicated and write 'sailboatname' (a perfectly valid compound word in many European languages), I have to add that word too, even though all the words it is compounded from are already in my dictionary.

If the spellchecker can't handle situations like this, dictionaries must be enormous to cover all possible combinations of words. Size might not be an issue today, but it might slow spell checking down, and no dictionary will ever be close to complete. It is just not possible. Incomplete dictionaries are 'less useful'.

Reproducible: Always

Expected Results:
The spellchecker should handle compound words. It should also support some grammatical rules, because some compound words need a 'joint letter' to be valid. For example, in the sailboat case an 'S' might be necessary: 'sailboatSname'. This, of course, varies between languages.

My suggestion is that, as a first step, when the spellchecker encounters a word it suspects is misspelled, it should try to combine words in the simplest possible way, i.e., by looking for a concatenation of existing dictionary words that matches the suspected word. If it finds one, and the language is one where compound words are common, it should accept the spelling.
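The suggestion above can be illustrated with a minimal sketch in Python. This is not how MySpell or Hunspell implement compounding (real dictionaries mark which words may take part in compounds); it only shows the naive "concatenation of known words" check, with optional joint letters. The function name `split_compound` and the `joints` parameter are hypothetical names chosen for this example.

```python
def split_compound(word, dictionary, joints=("",)):
    """Return one way to split `word` into dictionary words, or None.

    `joints` lists the linking strings allowed between two parts;
    "" means plain concatenation, "s" allows a joint letter 's'.
    """
    word = word.lower()
    failed = set()  # start positions already known to lead nowhere

    def parts_from(start):
        if start in failed:
            return None
        for end in range(start + 1, len(word) + 1):
            part = word[start:end]
            if part not in dictionary:
                continue
            if end == len(word):
                return [part]
            for joint in joints:
                if word.startswith(joint, end):
                    rest = parts_from(end + len(joint))
                    if rest is not None:
                        return [part] + rest
        failed.add(start)
        return None

    parts = parts_from(0)
    # A single part is an ordinary dictionary word, not a compound.
    return parts if parts and len(parts) >= 2 else None


words = {"sail", "boat", "name"}
print(split_compound("sailboatname", words))                     # ['sail', 'boat', 'name']
print(split_compound("sailboatSname", words))                    # None
print(split_compound("sailboatSname", words, joints=("", "s")))  # ['sail', 'boat', 'name']
```

A check this permissive would also accept nonsense concatenations of short words, which is one reason a real spellchecker needs per-language rules rather than blind concatenation.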
How does OOo handle that?
This was just added to myspell recently and is in the latest OOo beta. The next time we do our regular myspell update on the trunk, we'll probably end up picking this up.
AFAIK, support for compounds was added with bug 240600. Thunderbird builds from trunk and the 1.8 branch, with a Swedish dictionary, happily accept compounds like "segelbåtsnamnförvirring". Tb 1.0.6 does not, however. There have been older versions of the Swedish dictionary around that didn't have support for compounds, so be sure to get the latest version from http://www.mozilla.org/products/thunderbird/dictionaries.html Bug 240600, comment#25 says that this was also fixed on the aviary branch, but it seems it was only checked in to AVIARY_1_0_20040515_BRANCH, and not to AVIARY_1_0_1_20050124_BRANCH, where Tb 1.0.2 and 1.0.6 come from.
I don't understand what you mean, can I upgrade the dictionary to add support for compound words in TB 1.0.6? I have the most recent version of the dictionaries and they still don't support compounds. Btw, where can I report misspellings in the dictionaries available at http://www.mozilla.org/products/thunderbird/dictionaries.html? Specifically the word 'återkomm'.
No, Thunderbird 1.0.6 can't handle compounds, no matter what dictionary you use. But please try a nightly build from http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8/ and see if you can reproduce this bug. Compounds should work in these new Thunderbird builds.
How stable are those nightly builds? I am using TB on a production machine and I want to avoid being forced to recreate my profile, lose mail, ask my admins to restore a backup and so on. Do TB 1 and the newer nightly builds use the same profile format, or does installing a newer build require me to create a new profile or deal with similar issues?
The profile format is the same, and in general you can go back and forth between builds while using the same profile. We're always backwards compatible and we try to maintain forward compatibility as well. The stability of the trunk nightly builds varies from day to day; they're mainly recommended for testing. However, the 1.8 branch builds are relatively stable, and should be close to beta quality.
db, can you verify this works in the latest builds?
I don't understand the version numbering in that directory. Is this version <http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8/thunderbird-1.4.en-US.mac.dmg> supposed to have a fix for the compound word bug?
I downloaded the linked version of TB but I can't install a dictionary in it. I go through the usual steps:
1. download the xpi
2. install the extension, with no error message
3. restart TB
but no new dictionary appears in the dictionary pop-up in prefs -> spelling
Even the OpenOffice implementation of MySpell has many difficulties with compound words. The Thunderbird 1.5 branch has a better MySpell that is capable of handling compound words to some extent, given a proper dictionary. The 1.0.x branch is unusable with, e.g., Hungarian: Hungarian is a truly agglutinative language, and every second word is a compound. The same holds for German and the other Germanic languages and, according to some research, for half of the world's languages.

BUT the big news is that there is Hunspell. Hunspell was initially developed as a Hungarian version of MySpell (some features were backported to MySpell, too), but the project has now gone far beyond that. It seems that Hunspell will replace MySpell in OOo 2.0.1 or 2.0.2, and it will be the default spell checker. I suggest moving to Hunspell in Mozilla too.

Here you can find the transition log for OOo: http://qa.openoffice.org/issues/show_bug.cgi?id=52383
The project homepage is here: http://hunspell.sourceforge.net/
The source can be downloaded from here: http://sourceforge.net/project/showfiles.php?group_id=143754

And the comment of the MySpell owner, Kevin B. Hendricks (author of Myspell), about the public prerelease: "You have really grown past MySpell and fixed most of its main faults. I am truly impressed. I think this is great and would love to replace Myspell with your HunSpell officially in OOo if you are willing to make sure of the following: "

Let me know your opinion!
Zoltan
*** Bug 355017 has been marked as a duplicate of this bug. ***
Confirming based on a dupe. Reassigning component to core.
Status: UNCONFIRMED → NEW
Component: General → Spelling checker
Ever confirmed: true
Product: Thunderbird → Core
QA Contact: spelling-checker
Summary: Spelling does not handle compound words → Spelling does not handle compound words (prefixes and postfixes)
Version: unspecified → Trunk
This is fixed by bug 319778, which replaced MySpell with Hunspell. So far this is only available in trunk builds, so it will be in Firefox 3 and Thunderbird 3. I installed the German Hunspell dictionary from here: http://www.j3e.de/ispell/igerman98/ http://j3e.de/hunspell/de_DE.zip It accepts Segelbootkapitän even though only Segelboot and Kapitän are in the dictionary.
Depends on: hunspell
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9 M8