Closed Bug 306336 Opened 19 years ago Closed 17 years ago

Spelling does not handle compound words (prefixes and postfixes)

Categories

(Core :: Spelling checker, defect)

Type: defect
Priority: Not set
Severity: major

Tracking

Status: RESOLVED FIXED
Milestone: mozilla1.9alpha8

People

(Reporter: lagrave+bugs+mozilla.org, Assigned: mscott)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6

In a lot of Western European languages you create new words by combining old
words. The spellchecker in Thunderbird does not handle this very well.

For example, if I have the words 'sail', 'boat' and 'name' in my dictionary and
spellcheck the word 'sailboat', it suggests 'sail boat', and you have to add
'sailboat' if you want to avoid a warning on this word. To make it even more
complicated, write 'sailboatname' (a perfectly valid compound word in many
European languages): you will have to add that word too, even though all the
words it is compounded from are already in your dictionary.

If the spell checker can't handle situations like this, dictionaries must be
enormous to cover all possible combinations of words. Size might not be an issue
today, but it might slow spell checking down, and no dictionary will ever be
close to complete. It is just not possible. Incomplete dictionaries are 'less
useful'.

Reproducible: Always



Expected Results:
The spellchecker should handle compound words. It should also support some
grammatical rules, because some compound words need a 'joint letter' to be
valid. For example, in the sailboat case an 'S' might be necessary:
'sailboatSname'. This, of course, varies between languages.

My suggestion is that when the spellchecker encounters a word it suspects is
misspelled, it should first try to build that word from other words in the
simplest possible way, i.e., by finding a concatenation of existing words that
matches the suspected word. If it finds one, and the language is one where
compound words are common, it should accept the spelling.
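The suggestion above amounts to a word-break search. A minimal sketch in Python, assuming the dictionary is a plain list of words; `split_compound`, the `joints` tuple of allowed linking letters and the `min_part` threshold are hypothetical names for illustration, not MySpell or Hunspell APIs:

```python
from functools import lru_cache


def split_compound(word, dictionary, joints=("",), min_part=3):
    """Hypothetical helper (not a MySpell/Hunspell API).

    Return the dictionary words that `word` is built from, or None.
    `joints` lists the linking letters allowed between two parts; the
    default ("",) means plain concatenation only. `min_part` rejects
    very short parts so that tiny words cannot glue a split together.
    """
    text = word.lower()
    words = {w.lower() for w in dictionary}

    @lru_cache(maxsize=None)
    def split_from(start):
        # Try every dictionary word that begins at index `start`.
        for end in range(start + min_part, len(text) + 1):
            part = text[start:end]
            if part not in words:
                continue
            if end == len(text):
                return (part,)
            # Skip an optional joint letter, then split the remainder.
            for joint in joints:
                if text.startswith(joint, end):
                    rest = split_from(end + len(joint))
                    if rest is not None:
                        return (part,) + rest
        return None

    parts = split_from(0)
    return list(parts) if parts is not None else None


words = ["sail", "boat", "name"]
print(split_compound("sailboatname", words))                       # ['sail', 'boat', 'name']
print(split_compound("sailboatSname", words))                      # None
print(split_compound("sailboatSname", words, joints=("", "s")))    # ['sail', 'boat', 'name']
```

Memoizing `split_from` means each suffix is examined only once, so the search stays polynomial in the word length instead of retrying the same suffix over and over. As the report notes, which joint letters are allowed varies between languages, so in practice `joints` would have to come from per-language data rather than being hard-coded.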
How does OOo handle that?
This was just added to MySpell recently and is in the latest OOo beta. The next
time we do our regular MySpell update on the trunk, we'll probably end up picking
this up.
AFAIK, support for compounds was added with bug 240600. Thunderbird builds from
the trunk and the 1.8 branch, with a Swedish dictionary, happily accept compounds
like "segelbåtsnamnförvirring". Tb 1.0.6 does not, however.

There have been older versions of the Swedish dictionary around that didn't have
support for compounds, so be sure to get the latest version from
http://www.mozilla.org/products/thunderbird/dictionaries.html

Bug 240600, comment #25 says that this was also fixed on the aviary branch, but
it seems it was only checked in to AVIARY_1_0_20040515_BRANCH, and not to
AVIARY_1_0_1_20050124_BRANCH, which Tb 1.0.2 and 1.0.6 come from.
I don't understand what you mean. Can I upgrade the dictionary to add support
for compound words in TB 1.0.6? I have the most recent version of the
dictionaries, and they still don't support compounds.

Btw, where can I report misspellings in the dictionaries available at
http://www.mozilla.org/products/thunderbird/dictionaries.html? Specifically the
word 'återkomm'.
No, Thunderbird 1.0.6 can't handle compounds, no matter what dictionary you use.
But please try a nightly build from
http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8/
and see if you can reproduce this bug. Compounds should work in these new
Thunderbird builds. 
How stable are those nightly builds? I am using TB on a production machine, and I
want to avoid being forced to recreate my profile, lose mail, ask my admins to
restore a backup and so on.

Do TB 1 and newer nightly builds use the same profile format, or does installing
a newer build require me to create a new profile or cause similar issues?
The profile format is the same, and in general, you can go back and forth
between builds while using the same profile. We're always backwards compatible,
and we try to maintain forward compatibility as well.

The stability of the trunk nightly builds varies from day to day - they're
mainly recommended for testing. However, the 1.8 branch builds are relatively
stable, and should be close to beta quality.
db, can you verify this works in the latest builds?
I don't understand the version numbering in that directory.

Is this version

<http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-mozilla1.8/thunderbird-1.4.en-US.mac.dmg>

supposed to have a fix for the compound word bug?
I downloaded the linked version of TB, but I can't install a dictionary in it. I go
through the usual steps:

1. download the xpi
2. install the extension (no error message)
3. restart TB, but no new dictionary appears in the dictionary pop-up in prefs -> spelling
Even the OpenOffice implementation of MySpell has many difficulties with compound words. The Thunderbird 1.5 branch has a better MySpell that can handle compound words to some extent with a proper dictionary. The 1.0.x branch is unusable with Hungarian, for example: Hungarian is a truly agglutinative language, and every second word is a compound. The same is true for German and the other Germanic languages, and, according to some research, for half of the world's languages.

BUT the big news is that there is Hunspell. Hunspell was initially developed as a Hungarian version of MySpell (some features were backported to MySpell, too), but the project has now gone far beyond that. It seems that Hunspell will replace MySpell in OOo 2.0.1 or 2.0.2 and become the default spell checker.

I suggest moving to Hunspell in Mozilla too. Here you can find the transition log for OOo:
http://qa.openoffice.org/issues/show_bug.cgi?id=52383

The project homepage is here:
http://hunspell.sourceforge.net/

The source can be downloaded from here:
http://sourceforge.net/project/showfiles.php?group_id=143754

And here is what Kevin B. Hendricks, the owner and author of MySpell, said about the public prerelease:
"
You have really grown past MySpell and fixed most of its main  
faults.  I am truly impressed.

I think this is great and would love to replace Myspell with your  
HunSpell officially in OOo if you are willing to make sure of the  
following:
"

Let me know your opinion!

Zoltan
*** Bug 355017 has been marked as a duplicate of this bug. ***
Confirming based on a dupe.  Reassigning component to core.
Status: UNCONFIRMED → NEW
Component: General → Spelling checker
Ever confirmed: true
Product: Thunderbird → Core
QA Contact: spelling-checker
Summary: Spelling does not handle compound words → Spelling does not handle compound words (prefixes and postfixes)
Version: unspecified → Trunk
This is fixed by bug 319778, which replaced MySpell with Hunspell. So far this is only available in trunk builds, so it will be in Firefox 3 and Thunderbird 3.

I installed the German Hunspell dictionary from here:
http://www.j3e.de/ispell/igerman98/
http://j3e.de/hunspell/de_DE.zip

It accepts Segelbootkapitän despite only Segelboot and Kapitän being in the dictionary.
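The German example shows a detail a naive splitter would miss: nouns such as "Kapitän" are stored capitalized, but inside a compound only the first part keeps its capital letter. The sketch below is hypothetical and does not reproduce Hunspell's actual compounding rules (which a dictionary configures through its affix file); `accepts_compound` and `min_part` are illustrative names, and the casing rule is an assumption made for German-style nouns:

```python
def accepts_compound(word, dictionary, min_part=3):
    """Hypothetical check: is `word` a dictionary word or a chain of them?

    Assumed casing rule for German-style nouns: entries are stored
    capitalized ("Kapitän"), but inside a compound only the first part
    keeps its capital ("Segelbootkapitän"). Later parts must therefore
    start in lower case and are also looked up with their first letter
    capitalized.
    """

    def matches(part, initial):
        if initial:
            return part in dictionary
        if not part[:1].islower():
            return False  # reject "SegelbootKapitän"
        return part in dictionary or part[:1].upper() + part[1:] in dictionary

    n = len(word)
    # reachable[i] is True when word[:i] can be split into valid parts.
    reachable = [False] * (n + 1)
    reachable[0] = True
    for start in range(n):
        if not reachable[start]:
            continue
        for end in range(start + min_part, n + 1):
            if matches(word[start:end], initial=(start == 0)):
                reachable[end] = True
    return n > 0 and reachable[n]


entries = {"Segelboot", "Kapitän"}
print(accepts_compound("Segelbootkapitän", entries))  # True
print(accepts_compound("SegelbootKapitän", entries))  # False
```

This version uses a bottom-up table of reachable prefix lengths instead of recursion, which makes it easy to see that only the first part is matched with its original capitalization.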
Depends on: 319778
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9 M8