Bug 319778 - (hunspell) Replace MySpell with HunSpell
Status: VERIFIED FIXED
Product: Core
Classification: Components
Component: Spelling checker
Version: Trunk
Platform: All All
Importance: -- enhancement with 16 votes
Target Milestone: mozilla1.9alpha8
Assigned To: Scott MacGregor
Depends on: 391147 391447
Blocks: 338291 254814 306336 335813 340362 366077 379847 381860 391659 397150 403347 413950
Reported: 2005-12-10 02:04 PST by Petres, Zoltan
Modified: 2014-04-26 03:21 PDT


Attachments
patch v1, based on hunspell-1.1.4 (706.30 KB, patch)
2006-03-21 11:51 PST, Michiel van Leeuwen (email: mvl+moz@)
no flags
patch to upgrade myspell to hunspell (435.79 KB, patch)
2006-11-12 08:42 PST, Caolan McNamara
no flags
Hunspell 1.1.5 patch to Firefox 2.0.0.2 (527.89 KB, patch)
2007-03-20 09:57 PDT, Németh László
no flags
OpenOffice.org Hunspell en-US dictionary patch (with Mozilla words and several bug fixes) (735.51 KB, patch)
2007-03-20 10:02 PDT, Németh László
no flags
Hunspell 1.1.5 patch to Firefox 2.0.0.2 (fixed) (527.89 KB, patch)
2007-03-27 00:25 PDT, Németh László
no flags
OpenOffice.org Hunspell en-US dictionary patch with Mozilla words and several bug fixes (fixed) (734.97 KB, patch)
2007-03-27 00:31 PDT, Németh László
no flags
Hunspell 1.1.5 trunk patch WIP (537.12 KB, patch)
2007-06-05 21:21 PDT, Ryan VanderMeulen [:RyanVM]
no flags
OOo Hunspell en-US dictionary trunk patch (796.70 KB, patch)
2007-06-05 21:23 PDT, Ryan VanderMeulen [:RyanVM]
no flags
Hunspell 1.1.5 trunk patch WIP2 (536.95 KB, patch)
2007-06-06 07:11 PDT, Ryan VanderMeulen [:RyanVM]
no flags
Hunspell 1.1.5 trunk patch WIP3 (533.21 KB, patch)
2007-06-17 18:37 PDT, Ryan VanderMeulen [:RyanVM]
no flags
Hunspell 1.2.0beta WIP4 - Hunspell Bits (282.33 KB, application/zip)
2007-06-23 07:49 PDT, Ryan VanderMeulen [:RyanVM]
no flags
Hunspell 1.2.0beta WIP4 - Mozilla Bits (37.80 KB, patch)
2007-06-23 07:51 PDT, Ryan VanderMeulen [:RyanVM]
no flags
Hunspell 1.2.0beta WIP5 - Hunspell Bits (285.30 KB, application/x-zip-compressed)
2007-06-27 22:00 PDT, Ryan VanderMeulen [:RyanVM]
no flags
Hunspell 1.2.0beta WIP5.1 - Hunspell Bits (285.74 KB, application/x-zip-compressed)
2007-06-27 22:39 PDT, Ryan VanderMeulen [:RyanVM]
no flags
Hunspell 1.2beta patch to Hunspell 1.1.6 (released version of the next Hunspell) (1.99 KB, patch)
2007-06-29 04:46 PDT, Németh László
no flags
Hunspell 1.2beta patch to Hunspell 1.1.6 (released version of the next Hunspell) (1.99 KB, patch)
2007-06-29 04:47 PDT, Németh László
no flags
[checked in] Hunspell 1.1.6 WIP6 - Hunspell Bits (286.91 KB, application/x-zip-compressed)
2007-06-29 09:32 PDT, Ryan VanderMeulen [:RyanVM]
mscott: superreview+
Hunspell Mozilla Bits WIP5 (34.55 KB, patch)
2007-07-09 20:03 PDT, Ryan VanderMeulen [:RyanVM]
no flags
Hunspell Mozilla Bits WIP5.1 (32.85 KB, patch)
2007-07-09 20:39 PDT, Ryan VanderMeulen [:RyanVM]
mscott: review+
mscott: superreview+
Hunspell 1.1.6 to 1.1.7 patch (12.48 KB, patch)
2007-07-15 17:33 PDT, Ryan VanderMeulen [:RyanVM]
no flags
[checked in] Hunspell 1.1.6 to 1.1.8 patch (33.10 KB, patch)
2007-07-16 15:19 PDT, Ryan VanderMeulen [:RyanVM]
mscott: review+

Description Petres, Zoltan 2005-12-10 02:04:28 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

MySpell has not been developed for quite a long time, either in Mozilla or in OpenOffice.org. HunSpell is a fork of MySpell; it has great improvements in several ways and is fully backward compatible with MySpell. Starting with OpenOffice.org 2.0.2, Hunspell will be the default spell check engine. I guess Mozilla users would also benefit from it.

From HunSpell's homepage:
"Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex word compounding or character encoding. Hunspell interfaces: Ispell-like terminal interface using Curses library, Ispell pipe interface, OpenOffice.org UNO module.

Hunspell's code base comes from the OpenOffice.org MySpell. Hunspell is designed to eventually replace Myspell in OpenOffice.org.

Main features of Hunspell spell checker and morphological analyzer:

- Unicode support (first 65535 Unicode character)

- morphological analysis (in custom item and arrangement style)

- Max. 65535 affix classes and twofold affix stripping (for agglutinative languages, like Azeri, Basque, Estonian, Finnish, Hungarian, Turkish, etc.)

- Support complex compoundings (for example, Hungarian and German)

- Support language specific algorithms (for example, handling Azeri and Turkish dotted i, or German sharp s)

- Handling conditional affixes, circumfixes, fogemorphemes, forbidden words, pseudoroots and homonyms.

- LGPL license"
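
To make the "twofold affix stripping" item in the feature list concrete, here is a minimal Python sketch. The stem list and the two suffix tables are hypothetical English-like illustrations, not Hunspell data, and the code is not Hunspell's algorithm; it only shows the idea of accepting a word when removing up to two stacked suffixes leaves a known stem.

```python
# Hypothetical data for illustration only (not taken from any Hunspell dictionary).
STEMS = {"help", "care"}
INNER_SUFFIXES = ("ful", "less")   # attach directly to a stem
OUTER_SUFFIXES = ("ly", "ness")    # attach after an inner suffix


def strip_suffix(word, suffixes):
    """Yield every remainder obtained by removing one listed suffix."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            yield word[: -len(suffix)]


def accepts(word):
    """Accept stem, stem + inner suffix, or stem + inner + outer suffix."""
    if word in STEMS:
        return True
    # One level of stripping: stem + inner suffix.
    for rest in strip_suffix(word, INNER_SUFFIXES):
        if rest in STEMS:
            return True
    # Two levels of stripping: remove the outer suffix, then the inner one.
    for rest in strip_suffix(word, OUTER_SUFFIXES):
        for stem in strip_suffix(rest, INNER_SUFFIXES):
            if stem in STEMS:
                return True
    return False
```

With these toy tables, "helpfully" and "carelessness" are accepted after two strips, while "helply" is rejected because "ly" is only allowed on top of an inner suffix.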

You can download it from:
http://hunspell.sourceforge.net

Zoltan


Reproducible: Always
Comment 1 Michiel van Leeuwen (email: mvl+moz@) 2005-12-10 02:50:50 PST
> - LGPL license"

And that's a big problem. LGPL code can't just be put into the Mozilla tree. It should be GPL/LGPL/MPL. (And MPL would be the problem: it allows more than LGPL allows.)
Comment 2 zug_treno 2005-12-10 15:57:35 PST
See also Core bug 307052 comment 22.
Comment 3 Andras Timar 2005-12-11 01:15:24 PST
The license should not be a problem. It can be changed to LGPL/MPL if required. I would like to know whether Mozilla developers are interested in integrating the hunspell library, whether there are technical issues, and, if it can be done, how long it would take. Meanwhile I'll talk to the author of hunspell and ask for a license change. I have talked to him about this issue a couple of times. He is not against extending the license. If Mozilla developers send a clear message that they want hunspell, the license issue can be sorted out.
Comment 4 Andras Timar 2006-01-05 14:03:29 PST
Hunspell 1.1.3 was released today. License changed to GPL/LGPL/MPL.
http://sourceforge.net/project/shownotes.php?group_id=143754&release_id=383043
Comment 5 Németh László 2006-01-07 17:34:53 PST
It seems spell checking is also a wanted feature in the browser: Google toolbar (http://toolbar.google.com/firefox/index.html), SpellBound (http://spellbound.sourceforge.net/) or the Linspire inline spell checking patch (https://bugzilla.mozilla.org/show_bug.cgi?id=58612).

With Hunspell, Firefox could support many more users than with MySpell: Hunspell is a real international spell checker. On the Hunspell homepage, there are 11 UTF-8 encoded Aspell dictionaries converted to Hunspell format (Hunspell also handles MySpell dictionaries): Amharic (አማርኛ), Azerbaijani (Azərbaycanca), Bengali (বাংলা), Kashubian (Kaszëbsczi), Persian (فارسی), Hindi (हिंदी), Marathi (मराठी), Oriya (ଓଡ଼ିଆ), Punjabi (ਪੰਜਾਬੀ), Tamil (தமிழ்), Vietnamese (Việt ngữ). There are dictionary developers (and dictionaries) for Akan, Arabic, Finnish (http://www.hunspell-fi.org/), German (http://j3e.de/ispell/igerman98/), Gujarati, Hungarian (http://magyarispell.sourceforge.net), Moore, Nepali, Urdu etc.

Hunspell has improved support also for English (for example, much better suggestions: http://qa.openoffice.org/issues/show_bug.cgi?id=35725) and French (oe ligature, http://lingucomponent.openoffice.org/issues/show_bug.cgi?id=54980), and it is under active development.
Comment 6 Michiel van Leeuwen (email: mvl+moz@) 2006-03-21 11:51:05 PST
Created attachment 215797
patch v1, based on hunspell-1.1.4

This patch makes mozilla use hunspell 1.1.4. It was actually not all that hard. I had to port the old myspell based Makefile, rename *.cxx to .cpp, copy some glue from myspell, fix some small problems in #ifdef debug, and things worked.

The problem with this current patch: codesize.
   text    data     bss     dec     hex filename
  77564    2160       8   79732   13774 myspell/src/libmyspell.so
 199938   66856       8  266802   41232 hunspell/libhunspell.so

and disk size:

-rwxrwxr-x 1 michiel michiel 681101 Mar 20 23:29 hunspell/libhunspell.so
-rwxrwxr-x 1 michiel michiel 366878 Mar 20 23:22 myspell/src/libmyspell.so

These are Linux debug builds, but I expect the situation to be even worse for opt builds, because a lot of the size is due to charset-related tables.
For myspell, a trick is used where the tables are built at runtime. That trick can be ported, but that means a deviation from the original source. Do we want to go down that road?
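
For readers unfamiliar with the runtime table trick mentioned here, the following is a minimal Python sketch of the general idea: instead of shipping a static 256-entry case table per 8-bit charset, derive it at load time from conversion functions that already exist. The function name `build_case_tables` is hypothetical, and Python's codecs stand in for Mozilla's charset converters; this is not the myspell or hunspell code.

```python
def build_case_tables(encoding):
    """Build 256-entry upper/lower tables for an 8-bit charset at run time."""
    to_upper = list(range(256))  # start from the identity mapping
    to_lower = list(range(256))
    for byte in range(256):
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue  # unassigned byte: keep the identity mapping
        for table, converted in ((to_upper, char.upper()), (to_lower, char.lower())):
            try:
                encoded = converted.encode(encoding)
            except UnicodeEncodeError:
                continue  # converted form does not exist in this charset
            if len(encoded) == 1:  # skip expansions such as one letter -> two
                table[byte] = encoded[0]
    return bytes(to_upper), bytes(to_lower)
```

The trade-off is the one raised in this comment: the static data disappears from the binary, but the library now depends on whatever conversion code it calls at startup.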
Comment 7 Scott MacGregor 2006-03-21 12:25:33 PST
Thanks for jumping in on this mvl. I'm not sure what the best course of action is for the charset tables at the moment.

Some unrelated questions (not necessarily for you)

1) Do we need to use new dictionary files to take advantage of hunspell? 
2) If there are new dictionaries, any chance they have a better license story than LGPL, which currently prevents us from shipping non-English dictionaries? That would be a huge win :).

Comment 8 Michiel van Leeuwen (email: mvl+moz@) 2006-03-21 12:30:45 PST
(In reply to comment #7)
> 1) Do we need to use new dictionary files to take advantage of hunspell? 
Suggestions are improved, even with the old dictionary. With myspell, 'permenant' produced a bunch of suggestions, the top one being 'newspapermen'. With hunspell (and the same dictionary file), 'permanent' is the top suggestion. Looks like an improvement to me :)

I forgot to mention that the glue (mozHun*) is just copied from the myspell files, with some minor tweaks.
Comment 9 Axel Hecht [:Pike] 2006-03-21 12:33:49 PST
Additional question on size, if we used hunspell dictionaries, how would that 
compare in size of the dictionaries themselves?
Comment 10 Frank 2006-03-21 12:47:47 PST
(In reply to comment #7)
> 2) If there are new dictionaries, any chance they have an improved license
> story than lgpl which currently prevents us from shipping non english
> dictionaries? That would be a huge win :).

I'm no expert, but OpenOffice.org also switched to hunspell and now they include their dictionaries in the distribution, at least the Dutch one.
Comment 11 Németh László 2006-03-23 05:07:39 PST
------- Comment #6 from mvl@exedo.nl  2006-03-21 11:51 PST -------
> This patch makes mozilla use hunspell 1.1.4. 

Many thanks for your work!

> The problem with this current patch: codesize and disk size.
> These are linux debug builds, but I expect the situation to be even worse for
> opt builds, because a lot of the size is due to charset related tables.
> For myspell, a trick is used where the tables are built at runtime. That trick
> can be ported, but that means a deviation from the original source. Do we want
> to go that road?

I'm the author of Hunspell, and I appreciate every suggestion, patch,
etc., and I would like to help. About the charset tables: the Unicode
casing data won't need to be included if Hunspell uses external
Unicode case conversion functions (-50 kB of Hunspell object code).
The runtime trick for 8-bit tables can also be ported (-13 kB). I can make
some optimizations and remove unnecessary experimental code, so
a 100 kB reduction is not impossible.

>------- Comment #7 from mscott@mozilla.org  2006-03-21 12:25 PST -------
> Some unrelated questions (not necessarily for you)

> 1) Do we need to use new dictionary files to take advantage of hunspell?

No, we don't. Hunspell has improved suggestions:
http://qa.openoffice.org/issues/show_bug.cgi?id=35725

With Hunspell, we can forbid the suggestion of taboo words
(http://lingucomponent.openoffice.org/issues/show_bug.cgi?id=55498)
and handle ordinal numbers (http://www.openoffice.org/issues/show_bug.cgi?id=53643), see improved English dictionaries of OpenOffice.org 2.0.2.
These features can be ported to the English dictionary of
Mozilla under the appropriate (MPL or other) license.

> 2) If there are new dictionaries, any chance they have an improved license
> story than lgpl which currently prevents us from shipping non english
> dictionaries? That would be a huge win :).

Next version of the Hungarian dictionary will be licensed under MPL, too.
It would be fine to ask other dictionary developers, too.
I will also recommend MPL license for the new dictionary developers.

> ------- Comment #9 from l10n@mozilla.com  2006-03-21 12:33 PST -------
> Additional question on size, if we used hunspell dictionaries, how would that
> compare in size of the dictionaries themselves?

Hunspell dictionaries may be more compact on disk.
For example, the Hungarian dic file was 4 MB; now it is only 1.2 MB
with two-level suffixes and alias compression.

Memory usage also changes:
OOo's en_US dictionary uses 4.1 MB of memory with MySpell and 4.5 MB with
Hunspell, but only 3.9 MB with Hunspell's alias compression. I haven't
implemented the two-level suffix compression utility for Hunspell yet,
but I can imagine only 3.5 MB of memory usage with two-level suffix
compression. (This compensates for the bigger program size.) There
are plans to implement a more compact data structure, too.

Memory usage of the affix rich Hungarian dictionary:

          8-bit      UTF-8      UTF-8 + alias compression (1.1.4 feature)
MySpell   17 MB*     -***       -***
Hunspell  16 MB**    17 MB**    14 MB**

  * Magyar Ispell 0.94 (one-level suffixation for Ispell and MySpell)
 ** Magyar Ispell 1.0 (two-level suffixation, complete description
    of Hungarian derivational suffixes and extended vocabulary)
*** not supported

(UTF-8 is good for European languages too, not only for the new African and
Asian spell checking dictionaries, because we can add foreign proper names
and idioms to the dictionary with correct spelling. OpenOffice.org 2.0.2 already contains a default UTF-8 encoded Hungarian dictionary.
In the English dictionary, for example, we could fix the name of the former Romanian dictator, Ceaușescu. And so on.)
Comment 12 Scott MacGregor 2006-03-24 15:48:09 PST
nemeth, thanks for the information and willingness to help out.

1) mvl, I think it makes sense to port the unicode table generation at runtime patch from myspell to the hunspell core. Sounds like nemeth may be able to do that for us and that way it's in the hunspell core, making it easier for us to synch up changes down the road. Some of nemeth's other optimizations sounded promising as well. 

2) nemeth, can you point me to the new english dictionaries that I assume openoffice is shipping with? We'd probably update our dictionary files in conjunction with moving to hunspell.

3) About licensing of non en-US dictionaries. Great to hear about the hungarian dictionary getting an MPL license. That's been a huge problem for us, lots of localizers would like to ship a dictionary for their language but they can't today due to the licensing issue. Any influence or help you can provide to help encourage relicensing would be *much* appreciated.

4) Did I read your comment correctly that hunspell works for Asian languages? Wow!

I think this change is really exciting and worth pursuing. Would anyone object if I put this on the Thunderbird 2 release train to make sure we don't lose track of it? 


Comment 13 Németh László 2006-03-25 18:13:41 PST
1) Hunspell uses run-time Unicode table generation (it generates a 6*64 kB
conversion table from the 6*8 kB constant data when Hunspell loads a UTF-8 encoded dictionary). Perhaps Hunspell could use the Unicode conversion functions of Mozilla to reduce memory and disk usage.

2) By anonymous CVS in bash shell:

$ export CVSROOT=:pserver:anoncvs@anoncvs.services.openoffice.org:/cvs
$ cvs login # password is anoncvs
$ cvs co dictionaries/en_{GB,US}

These dictionaries haven't been alias compressed yet (Hunspell's alias compression utility, makealias, is in src/tools/).

> 3. Any influence or help you can provide to help
> encourage relicensing would be *much* appreciated.

This is also a problem for the LGPLed OpenOffice.org,
so I will encourage a GPL/LGPL/MPL tri-license in the Lingucomponent project
of OpenOffice.org.

4) Did I read your comment correctly that hunspell works for Asian languages?

We have a lot of Asian dictionaries (mostly conversions from the Aspell project, but the Nepali Hunspell dictionary has been developed for OOo), and new developments for Urdu and Tamil. Unfortunately, Hunspell doesn't support Thai and Khmer word tokenization (these languages traditionally have no word separators), and Hunspell handles only the Basic Multilingual Plane of the Unicode standard, but this plane of 2^16 characters contains most of the characters in current use in Asian and non-Asian languages. Other features of Hunspell are interesting for European languages too: agglutinative languages of Europe, like Basque, Estonian, Finnish, Hungarian, Sami and Turkish, get much better support through two-level suffixation and other improvements. The compounding features of Hunspell are useful for Dutch, German, Hungarian, Swedish etc.
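
As a small illustration of the Basic Multilingual Plane limit described in this comment, a dictionary maintainer could pre-check words with something like the Python sketch below. The helper name `is_bmp_only` is hypothetical; it simply tests whether every code point is at most U+FFFF.

```python
def is_bmp_only(word):
    """Return True if every code point fits in the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in word)


def split_by_plane(words):
    """Separate words into (BMP-only, needs-supplementary-planes) lists."""
    inside = [w for w in words if is_bmp_only(w)]
    outside = [w for w in words if not is_bmp_only(w)]
    return inside, outside
```

Words in the scripts listed earlier in this thread (Tamil, Bengali, Amharic and so on) pass this check; only characters beyond U+FFFF, such as many mathematical alphanumeric symbols, fail it.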

Thanks for the questions, and your kind words!
Comment 14 Scott MacGregor 2006-03-31 18:45:25 PST
"These dictionaries haven't alias compressed yet (Hunspell's alias compression
utility (makealias) there is in the src/tools/)."

What's the advantage to using alias compression on the dictionary? Smaller dictionary disk footprint at the cost of run time performance? Does open office use alias compression on the dictionaries they ship?
Comment 15 Michiel van Leeuwen (email: mvl+moz@) 2006-04-01 07:38:09 PST
I tried to copy the trick I used in myspell to generate the case-conversion tables  (all except utf8), and it turned out that the resulting .so was larger. I don't know why, maybe because it has to link in all mozilla's case conversion libraries?
Comment 16 Németh László 2006-04-01 14:20:58 PST
(In reply to comment #14)
> What's the advantage to using alias compression on the dictionary? Smaller
> dictionary disk footprint at the cost of run time performance?

Alias compressed dictionaries have the same or better (CPU cache effect) run time performance, too. (Alias compressed dictionaries use constant flag vectors instead of redundant data. For example, the alias compressed 8-bit en_US dictionary allocates only 2,743 flag vector strings instead of the 51,000 strings of the original uncompressed variant.)
The only disadvantage is that MySpell cannot handle alias compressed Hunspell dictionaries (MySpell doesn't know affixation).

>  Does open office
> use alias compression on the dictionaries they ship?

No, it doesn't use alias compression yet. This is quite a new feature (2006-01-05), and so far I have tested it only with the Arabic and Hungarian dictionaries (plus test data). It works perfectly, so I will suggest alias compression for other dictionaries of OOo.
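
The idea behind alias compression described above can be sketched in a few lines of Python: every distinct flag string is stored once and dictionary entries refer to it by number. This is a simplified illustration, not the makealias utility; the `AF` line layout reflects my understanding of the Hunspell affix file format and should be treated as an assumption, and the sample flags are made up.

```python
def alias_compress(entries):
    """Replace repeated flag strings with 1-based alias numbers.

    entries: list of (word, flags) pairs; flags may be "" for no flags.
    Returns (af_lines, dic_lines).
    """
    alias_of = {}    # flag string -> alias number, in first-seen order
    dic_lines = []
    for word, flags in entries:
        if not flags:
            dic_lines.append(word)
            continue
        # The default is computed before insertion, so numbering starts at 1.
        number = alias_of.setdefault(flags, len(alias_of) + 1)
        dic_lines.append(f"{word}/{number}")
    # Assumed layout: a count line followed by one line per flag vector.
    af_lines = [f"AF {len(alias_of)}"] + [f"AF {flags}" for flags in alias_of]
    return af_lines, dic_lines
```

Because many words share the same flag combination, the number of distinct flag strings is far smaller than the number of words, which is where the drop from about 51,000 strings to 2,743 in the en_US example comes from.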
Comment 17 Axel Hecht [:Pike] 2006-05-05 05:00:31 PDT
Could we have separate patches for the hunspell landing, plus a diff of the 
changes?

Nemeth, what's #ifdef HUNSTEM?

Could we get rid of the printf statements in hunspell? I know that they are in 
myspell, too, but that doesn't make them better ;-)
I guess a macro that we could replace with NS_WARNING per -D would be great.

The daunting question I have, is 1.1.4 the hunspell version we should convert
to?

I'd still be interested in the use of the mozilla utf-8 code, and maybe a review
on why the dynamic table stuff blew up code size. If it helps, try to build
statically into libxul. We want to ship this for both fx and tb, on by default.
If there's a code size win, we should propose that.
Comment 18 Németh László 2006-05-15 06:40:17 PDT
This week I will make a reduced Hunspell version (Hunspell 1.1.4 minus experimental, morphological etc. codes, separated by a macro) for Mozilla and OpenOffice.org.
(HUNSTEM is for experimental stemming, namely, HUNSTEM dependent code stores words in the recursive compound word analysing function to stem compound words.)
Comment 19 Christian Persch (GNOME) (away; not receiving bug mail) 2006-06-06 14:01:38 PDT
-  mSpellCheckingEngine = do_GetService("@mozilla.org/spellchecker/myspell;1",&rv);
+  mSpellCheckingEngine = do_GetService("@mozilla.org/spellchecker/hunspell;1",&rv);

Can this please be replaced by a neutral contract ID, like "@mozilla.org/spellchecker/engine;1" so  embedders can have a constant contract ID to provide instead of chasing a moving target?
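
To illustrate why a neutral contract ID helps embedders, here is a toy Python sketch. `ServiceRegistry` and the two engine classes are hypothetical stand-ins, not XPCOM or Mozilla APIs; only the ID string is the one proposed in this comment.

```python
# Neutral ID proposed above; callers never mention a concrete engine.
ENGINE_CONTRACT_ID = "@mozilla.org/spellchecker/engine;1"


class ServiceRegistry:
    """Toy stand-in for a component registry keyed by contract ID."""

    def __init__(self):
        self._factories = {}

    def register(self, contract_id, factory):
        # A later registration under the same ID replaces the earlier one.
        self._factories[contract_id] = factory

    def get_service(self, contract_id):
        return self._factories[contract_id]()


class MySpellEngine:
    name = "myspell"


class HunspellEngine:
    name = "hunspell"


def current_engine_name(registry):
    """Caller code: identical no matter which engine is registered."""
    return registry.get_service(ENGINE_CONTRACT_ID).name
```

Swapping the engine then only changes the registration, while `current_engine_name` and any embedder code stay the same.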
Comment 20 Zibi Braniecki [:gandalf][:zibi] 2006-06-21 15:35:36 PDT
(In reply to comment #19)
> -  mSpellCheckingEngine =
> do_GetService("@mozilla.org/spellchecker/myspell;1",&rv);
> +  mSpellCheckingEngine =
> do_GetService("@mozilla.org/spellchecker/hunspell;1",&rv);
> 
> Can this please be replaced by a neutral contract ID, like
> "@mozilla.org/spellchecker/engine;1" so  embedders can have a constant contract
> ID to provide instead of chasing a moving target?

Same request here!
Comment 21 Mike Hommey [:glandium] 2006-07-17 22:46:11 PDT
Is that targeted for Firefox 2.0? Or is it already too late?
Comment 22 Axel Hecht [:Pike] 2006-07-18 01:30:52 PDT
I doubt that this will make Firefox 2.
On the other hand, we don't have a patch yet anyway.

This should land and bake on the trunk first, and we should take it from there.
Comment 23 bjoern 2006-10-27 03:34:01 PDT
From the point of view of a dictionary maintainer for a language with lots of compound words, myspell is obsoleted by hunspell. Recent versions of the German dictionary used for OOo support only hunspell and no longer myspell. myspell does not offer enough possibilities for languages with heavy use of compound words. Without compound word support such languages will always be badly supported. hunspell can be seen as a direct successor of myspell; I wonder why Firefox is now starting out with myspell.
Comment 24 Kami 2006-10-27 12:04:21 PDT
I love HunSpell. It works perfectly in OpenOffice.org. For example, the performance of the current spelling capabilities for the Hungarian language in Mozilla Firefox is bad. I hope HunSpell gets installed as soon as possible.
Thanks,
KAMI
Comment 25 guanxi 2006-11-04 11:38:34 PST
Slightly OT, but I've asked this question several places with no luck:

What dictionaries will be used, and who maintains them?  http://hunspell.sourceforge.net/ refers to both of the following:
   Aspell:  ftp://ftp.gnu.org/gnu/aspell/dict/en
   Myspell: http://wiki.services.openoffice.org/wiki/Dictionaries

I ask because I'm trying to get errors corrected in the Myspell en-us dictionary's affixes (see bug 254814).

Comment 26 Caolan McNamara 2006-11-12 08:42:54 PST
Created attachment 245392
patch to upgrade myspell to hunspell

another possibly plausible cut at upgrading myspell to hunspell, same set of changes to myspell reapplied for hunspell + dropin unicode tables and use moz upper/lower case when available (http://qa.openoffice.org/issues/show_bug.cgi?id=71449)
Comment 27 Christopher Aillon (sabbatical, not receiving bugmail) 2007-01-11 09:44:54 PST
So, we have dead code in the mozilla tree.  It's not going to be getting any updates and I doubt mozilla.org wants to maintain it.

We have commitment from the Hunspell guy, who has even been so kind as to relicense the entire source code as MPL-tri.

We even have a patch which has been sitting for 2 months now.

Who do I poke to look at this?  :)
Comment 28 Michiel van Leeuwen (email: mvl+moz@) 2007-01-11 10:01:33 PST
There is also my patch, which is about 10 months old.

The newer patch looks hackish to me: it puts hunspell in a directory called myspell. Quite confusing.
What is the codesize effect of the second patch? That's where my patch got stuck. (and with me getting busy with other things)
Comment 29 Axel Hecht [:Pike] 2007-01-11 10:02:33 PST
As I said in comment 22, we don't have a patch. We have experimental work in progress, but no patch, as you can tell from the missing follow ups to comment 18.

We're still waiting for a patch that is reasonably stable and gives good code and working set size.
Comment 30 Christopher Aillon (sabbatical, not receiving bugmail) 2007-01-11 12:52:40 PST
Axel, Comment 26 looks like a patch to me.  Nobody commented on that yet AFAICT.  If it's a bad patch, that's fine, but someone should at least say it's a bad patch: here's what's wrong with it, and what needs to be done.
Comment 31 Axel Hecht [:Pike] 2007-01-11 13:14:49 PST
"possibly plausible" isn't really convincing to boot with. It doesn't seem to have any review requests either, not that I knew who'd be a good reviewer.

Nemeth on the other hand is the maintainer of hunspell, and looking at the corresponding bug at OOo, that isn't resolved yet, and nemeth has counter arguments to the corresponding patch there.
Comment 32 Axel Hecht [:Pike] 2007-01-11 15:06:11 PST
Comment on attachment 245392
patch to upgrade myspell to hunspell

Dao, review requests without requestee are pointless.

I don't see Mozilla going for any hunspell patch without Nemeth's blessing, thus I'm pointing that review request at him.
Comment 33 Németh László 2007-01-17 01:53:21 PST
(In reply to comment #26)
> Created an attachment (id=245392) [details]
> patch to upgrade myspell to hunspell
> 
> another possibly plausible cut at upgrading myspell to hunspell, same set of
> changes to myspell reapplied for hunspell + dropin unicode tables and use moz
> upper/lower case when available
> (http://qa.openoffice.org/issues/show_bug.cgi?id=71449)
> 

(Oops, I have confused this Mozilla patch with Caolan's OOo Hunspell patch. I'm very sorry.) It's wonderful. The original Mozilla integration problem, the big unicode table is removed by Caolan. I will include the Mozilla specific codes in the next Hunspell releases. Thanks a lot.
Comment 34 Németh László 2007-01-17 02:01:04 PST
(In reply to comment #32)
> (From update of attachment 245392 [details] [diff] [review])
> Dao, review requests without requestee are pointless.
> 
> I don't see Mozilla going for any hunspell patch without Nemeth's blessing,
> thus I'm pointing that review request at him.
> 

It's o.k. I will use this patch to Hunspell, too. Thanks once again.
Comment 35 Németh László 2007-01-17 03:02:10 PST
(In reply to comment #25)
> Slightly OT, but I've asked this question several places with no luck:
> 
> What dictionaries will be used, and who maintains them? 
> http://hunspell.sourceforge.net/ refers to both of the following:
>    Aspell:  ftp://ftp.gnu.org/gnu/aspell/dict/en
>    Myspell: http://wiki.services.openoffice.org/wiki/Dictionaries
> 
> I ask because I'm try to get errors corrected in the Myspell en-us dictionary's
> affixes (see bug 254814).
> 

Thanks a lot. I believe I made the last modification of the MySpell en_US dictionary in OOo's source, but http://wiki.services.openoffice.org/wiki/Dictionaries hasn't been updated yet. So I will fix the problem that you reported, and update this Wiki page. I'd also like to use the en_US dictionary patch from Mozilla CVS. (It seems to me that the best place for future bug reports would be the OpenOffice.org Wiki, using bug report pages for each language.)

Comment 36 thanosmoz 2007-01-19 10:52:41 PST
In case it's of some interest to you all.

This feature is currently targeted for Fedora 7: Fix the dictionary proliferation problem (http://fedoraproject.org/wiki/Releases/FeatureDictionary), where they will use hunspell for all gecko based apps, OOo and others.
Comment 37 Christopher Aillon (sabbatical, not receiving bugmail) 2007-02-01 12:20:28 PST
(In reply to comment #36)
> This feature is currently targeted for Fedora 7

For the record, while I agree that we should migrate to hunspell, I was not aware of my name being placed on that as owning anything and haven't signed up for anything on that wiki page and have not committed to a timeline.  I'm not sure who targeted it, but I was not involved in that process.

That said, I would like to see this patch go in, and it looks like we have buy-in from nemeth now...  What can I do to help move this forward?
Comment 38 Michiel van Leeuwen (email: mvl+moz@) 2007-02-01 12:35:37 PST
There is still my comment 28 that has not gotten any answer...
Comment 39 Németh László 2007-02-01 23:58:42 PST
(In reply to comment #28)
> There also i my patch, which is about 10 months old. 
> 
> The newer patch looks hackish to me: it put hunspell in a directory called
> myspell. Quite confusing.
> What is the codesize effect of the second patch? That's where my patch got
> stuck. (and with me getting busy with other things)
> 

Caolan's patch removes the 50 kB character table. I will publish Hunspell 1.1.5 with this patch within a week, so you will be able to refresh your patch. Thanks in advance.
Comment 40 Michiel van Leeuwen (email: mvl+moz@) 2007-02-03 07:54:48 PST
But what is the total codesize of the new (mozilla) lib, compared to the current codesize of myspell?
Comment 41 Dão Gottwald [:dao] 2007-03-19 02:01:00 PDT
any progress?
Comment 42 Petres, Zoltan 2007-03-19 09:17:13 PDT
Today, a new release (1.1.5) has been published on Hunspell's website (hunspell.sf.net). This release addresses a lot of issues mentioned here in the previous month.
Comment 43 Steffen Wilberg 2007-03-19 10:12:55 PDT
Comment on attachment 245392
patch to upgrade myspell to hunspell

Great to hear! So we need new patches here.

Would it make sense to just check the original hunspell code into extensions/spellcheck/hunspell and then create a patch with the mozilla-related changes we need? That would make updating to later hunspell revisions easier, and I guess nobody wants to review a >400K patch containing all the changes from myspell to hunspell anyway.

Also see http://mxr.mozilla.org/seamonkey/search?string=myspell for a list of places where we use the term myspell. Those need to be updated if we change the package name and signature.
Comment 44 Németh László 2007-03-19 12:44:18 PDT
(In reply to comment #40)
> But what is the total codesize of the new (mozilla) lib, compared to the
> current codesize of myspell?
> 

There is only a 71 kB difference now, thanks to the patches from David and Caolan and to conditional compilation without experimental code and warning messages:

 libmyspell.so:  94 kB
libhunspell.so: 165 kB
Comment 45 Németh László 2007-03-19 12:59:52 PDT
(In reply to comment #42)
> Today, a new release (1.1.5) has been published on Hunspell's website
> (hunspell.sf.net). This release addresses a lot of issues mentioned here in the
> previous month.
> 

Thanks for the post. I will send also a patch based on the previous patches very soon.

(By the way, release notes of Hunspell 1.1.5:
  - optimizations: 10-100% speed up, smaller code size and memory footprint
    (conditional experimental code and warning messages)

  - extended Unicode support:
    - non BMP Unicode characters in dictionary words and affixes (except
      affix rules and conditions)
    - support BOM sequence in aff and dic file

  - IGNORE feature for Arabic diacritics and other optional characters

  - New edit distance suggestion methods:
    - capitalisation: nasa -> NASA
    - long swap: permenant -> permanent
    - long move: Ghandi -> Gandhi, greatful -> grateful
    - double two characters: vacacation -> vacation
    - spaces in REP sug.: REP alot a_lot (NOTE: "a lot" must be a dictionary word)

  - patches and bug fixes for Mozilla, OpenOffice.org, Emacs, MinGW, Aqua,
    German and Arabic language, etc.)
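
The new edit-distance suggestion methods listed above can be illustrated with a minimal Python sketch. This is not Hunspell's actual code: the function names, the `max_distance` bound and the toy word list are all assumptions made for illustration. It only shows the kind of candidate each method generates before the candidates are checked against the dictionary.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT
# Hunspell's implementation of the 1.1.5 suggestion methods.

def capitalisation(word):
    """All-caps candidate, e.g. nasa -> NASA."""
    return [word.upper()]

def long_swap(word, max_distance=5):
    """Swap two non-adjacent characters, e.g. permenant -> permanent.

    max_distance is an assumed bound on how far apart the two characters may be.
    """
    out = []
    for i in range(len(word)):
        for j in range(i + 2, min(len(word), i + max_distance + 1)):
            if word[i] != word[j]:
                chars = list(word)
                chars[i], chars[j] = chars[j], chars[i]
                out.append("".join(chars))
    return out

def long_move(word, max_distance=5):
    """Move one character elsewhere, e.g. Ghandi -> Gandhi, greatful -> grateful."""
    out = []
    for i in range(len(word)):
        rest = word[:i] + word[i + 1:]
        for j in range(len(rest) + 1):
            # j == i would just rebuild the original word, so skip it.
            if j != i and abs(j - i) <= max_distance:
                out.append(rest[:j] + word[i] + rest[j:])
    return out

def doubled_pair(word):
    """Drop one copy of a doubled two-character sequence, e.g. vacacation -> vacation."""
    out = []
    for i in range(len(word) - 3):
        if word[i:i + 2] == word[i + 2:i + 4]:
            out.append(word[:i] + word[i + 2:])
    return out

def suggest(word, dictionary):
    """Return candidates that are dictionary words, in generation order, without duplicates."""
    found = []
    for method in (capitalisation, long_swap, long_move, doubled_pair):
        for candidate in method(word):
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
    return found

# Toy dictionary, standing in for a real .dic word list.
WORDS = {"NASA", "permanent", "Gandhi", "grateful", "vacation"}
```

For example, `suggest("Ghandi", WORDS)` finds "Gandhi" through the long-move method, because moving the "h" three positions to the right yields a dictionary word.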
Comment 46 Scott MacGregor 2007-03-19 13:15:41 PDT
Cool! FYI, once Thunderbird 2 is released, I'll be interested to help shepherd hunspell into the tree.
Comment 47 Piotr Komoda 2007-03-19 13:26:59 PDT
(In reply to comment #46)
> Cool! FYI, once Thunderbird 2 is released, I'll be interested to help shepherd
> hunspell into the tree.

So when can we expect hunspell to be implemented in Thunderbird? 2.0.0.x or 3.0?
Comment 48 Ryan VanderMeulen [:RyanVM] 2007-03-19 13:46:53 PDT
From what Scott said, I would assume we're talking about the Gecko 1.9 (Fx3 & Tb3) timeframe.
Comment 49 Németh László 2007-03-20 09:57:32 PDT
Created attachment 259100 [details] [diff] [review]
Hunspell 1.1.5 patch to Firefox 2.0.0.2
Comment 50 Németh László 2007-03-20 10:02:43 PDT
Created attachment 259101 [details] [diff] [review]
OpenOffice.org Hunspell en-US dictionary patch (with Mozilla words and several bug fixes)
Comment 51 Németh László 2007-03-20 10:09:27 PDT
Changelog of the attached OpenOffice.org en_US Hunspell dictionary:

2007-03-20 nemeth AT OOo

- alot -> a_lot REP suggestion, add "a lot"
- add Mozilla words (blog, cafe, inline, online, eBay, PayPal, etc.)
- add cybercafé
- alias compression (saving 15 kB disk space + 0.8 MB memory)

Mozilla 355178 - add scot-free
Mozilla 374411 - add Scotty
Mozilla 359305 - add archaeology, archeological, archeologist
Mozilla 358754 - add doughnut
Mozilla 254814 - add gauging, canoeing, *canoing, proactively
Issue 71718 - remove *opthalmic, *opthalmology; *opthalmologic -> ophthalmologic
Issue 68550 - *estoppal -> estoppel
Issue 69345 - add tapenade
Issue 67975 - add assistive
Issue 63541 - remove *dessicate
Issue 62599 - add toolbar

2006-02-07 nemeth AT OOo

Issue 48060 - add ordinal numbers with COMPOUNDRULE (1st, 11th, 101st etc.)
Issue 29112, 55498 - add NOSUGGEST flags to taboo words
Issue 56755 - add sequitor (non sequitor)
Issue 50616 - add open source words (GNOME, KDE, OOo, OpenOffice.org)
Issue 56389 - add Mozilla words (Mozilla, Firefox, Thunderbird)
Issue 29110 - add okay
Issue 58468 - add advisors
Issue 58708 - add hiragana & katakana
Issue 60240 - add arginine, histidine, monovalent, polymorphism, pyroelectric, pyroelectricity

2005-11-01 dnaber AT OOo

Issue 25797 - add proven, advisor, etc.
Comment 52 Steffen Wilberg 2007-03-24 11:19:01 PDT
Comment on attachment 259100 [details] [diff] [review]
Hunspell 1.1.5 patch to Firefox 2.0.0.2

Németh, thanks a lot for working on this yourself. I notice the #ifdef MOZILLA_CLIENT in the hunspell code; I assume that will make updating to a newer version in the future much easier. It would be nice nonetheless if you could explain what one has to do to import the latest hunspell code in a README.mozilla like http://mxr.mozilla.org/seamonkey/source/extensions/spellcheck/myspell/src/README.mozilla did.

When trying to apply the patch, I get this error:
patch: **** malformed patch at line 16661: diff -ru mozilla.orig/mailnews/compose/prefs/resources/content/pref-composing_messages.js mozilla/mailnews/compose/prefs/resources/content/pref-composing_messages.js
Comment 53 Steffen Wilberg 2007-03-24 11:41:47 PDT
+++ mozilla/browser/installer/removed-files.in	2007-03-19 15:01:30.000000000
-components/myspell/en-US.dic
-components/myspell/en-US.aff
Please don't remove these two lines. They're in removed-files.in so that the files get removed when installing a newer version into a directory containing an earlier version, because they're now located in (appdir)/dictionaries.

+components/hunspell/en-US.dic
+components/hunspell/en-US.aff
Don't add these two lines either. These two files will never get installed to components/hunspell, so we don't need to remove them.
Comment 54 Németh László 2007-03-27 00:25:59 PDT
Created attachment 259773 [details] [diff] [review]
Hunspell 1.1.5 patch to Firefox 2.0.0.2 (fixed)

fix bad diff
Comment 55 Németh László 2007-03-27 00:31:44 PDT
Created attachment 259774 [details] [diff] [review]
OpenOffice.org Hunspell en-US dictionary patch with Mozilla words and several bug fixes (fixed)
Comment 56 Németh László 2007-03-27 00:45:57 PDT
(In reply to comment #52)
Many thanks for your comments. I have fixed the patches.

README.mozilla: upstream Hunspell and the Hunspell in this Mozilla patch are identical
except for one small difference: I have commented out the #include of config.h (but I will remove this difference too, with MOZILLA_CLIENT-dependent code, in the next Hunspell release).
Comment 57 Mike Hommey [:glandium] 2007-04-09 01:08:50 PDT
Why not just rename the component to a generic name, so that whatever the backend library is, the component remains the same, which would be much better for stuff using it (like epiphany, to name only one) ?
Comment 58 Mike Hommey [:glandium] 2007-04-15 06:16:22 PDT
Actually, it's even worse for epiphany, because it registers its own spell checker with the myspell contract ID, and thus gets called by parts of the code referring to the myspell contract ID. Switching to a hunspell contract ID breaks this. The contract ID should really be generic...
Comment 59 Steffen Wilberg 2007-04-21 12:12:09 PDT
Comment on attachment 259773 [details] [diff] [review]
Hunspell 1.1.5 patch to Firefox 2.0.0.2 (fixed)

(In reply to comment #46)
> Cool! FYI, once Thunderbird 2 is released, I'll be interested to help shepherd hunspell into the tree.
This patch as well as the dictionary patch apply to the 1.8 branch. It works just fine with Firefox.
To make it work in Thunderbird, you need to do a global s/myspell/hunspell/g in mozilla/mail.

Try some of hunspell's new features:
- capitalization: nasa -> NASA
- long swap: permenant -> permanent
- long move: Ghandi -> Gandhi, greatful -> grateful
- double two characters: vacacation -> vacation

Disksize effect is like this:
 79103 libmyspell.so
145684 libhunspell.so
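
The global s/myspell/hunspell/g mentioned above could be sketched in Python roughly as follows. This is a hypothetical helper, not part of any attached patch: the function name and its behaviour (case-sensitive replacement in file contents only, skipping files that are not valid UTF-8, no renaming of files or directories) are assumptions, and the result would still need to be reviewed by hand.

```python
import os

def replace_in_tree(root, old="myspell", new="hunspell"):
    """Replace `old` with `new` in every UTF-8 text file under `root`.

    Hypothetical helper for illustration. Returns the sorted list of
    files that were rewritten. File and directory names are left alone.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # newline="" keeps the original line endings untouched.
                with open(path, encoding="utf-8", newline="") as f:
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file: skip it
            if old in text:
                with open(path, "w", encoding="utf-8", newline="") as f:
                    f.write(text.replace(old, new))
                changed.append(path)
    return sorted(changed)
```

Calling `replace_in_tree("mozilla/mail")` from the directory above the source tree would rewrite the affected files in place, so it is best run on a clean checkout where the changes can be inspected with a diff afterwards.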
Comment 60 Ryan VanderMeulen [:RyanVM] 2007-06-05 15:20:29 PDT
mvl, do the latest patches posted here address your previous concerns adequately?
Comment 61 Ryan VanderMeulen [:RyanVM] 2007-06-05 21:19:53 PDT
The forthcoming two patches are the previous two patches ported to the trunk (along with a few other relevant changes). However, they currently don't compile. Here's the error I get during compilation:
mozInlineSpellChecker.cpp
mozInlineSpellWordUtil.cpp
make[5]: *** No rule to make target `../hunspell/src/hunspell_s.lib', needed by `spellchk.lib'.  Stop.

If someone knows what to do for that problem, I'm all ears.

A few other notes:
1.) The relevant Mozilla license.html files need to be updated for Hunspell (and their Myspell equivalents removed). Or do they need to be there at all given that Hunspell is tri-licensed?
2.) Are /tools/module-deps/module2dir.map and /tools/module-deps/all.dot a simple s/myspell/hunspell replacement or does that need to be looked at more carefully?
3.) I'm assuming that the comments at lines 139 and 149 of /extensions/spellcheck/osxspell/src/mozOSXSpell.mm are still relevant to Hunspell.
4.) In /extensions/spellcheck/osxspell/src/mozOSXSpell.h, does the CID need to be changed (line #52)?
5.) I think /extensions/spellcheck/src/mozSpellCheckerFactory.cpp needs some cleanup, but I decided it was outside the scope of this specific patch. However, IMO it'll probably need to be done prior to Hunspell being turned on for real. For example, does the CID need a new revision?

Németh, feel free to take over the maintenance of this patch from here on (assuming we can't get this landed before your next Hunspell release).

Hopefully this is enough to keep Hunspell on the 1.9 radar :-)
Comment 62 Ryan VanderMeulen [:RyanVM] 2007-06-05 21:21:33 PDT
Created attachment 267377 [details] [diff] [review]
Hunspell 1.1.5 trunk patch WIP
Comment 63 Ryan VanderMeulen [:RyanVM] 2007-06-05 21:23:00 PDT
Created attachment 267378 [details] [diff] [review]
OOo Hunspell en-US dictionary trunk patch
Comment 64 Ryan VanderMeulen [:RyanVM] 2007-06-06 07:11:38 PDT
Created attachment 267413 [details] [diff] [review]
Hunspell 1.1.5 trunk patch WIP2

Well, after some makefile manipulation (using the old myspell makefiles as a reference), it compiles! Not only that, but combined with the dictionary patch it actually seems to work quite well!

To date, I've only had a chance to test this patch with my static win32 Firefox build. I don't have time this morning to do a dynamic build, but if nobody else wants to test that, I can do that too.

I don't have access to any Linux or OSX boxes, though, so if anyone can attempt to compile these patches on those OSes and make sure things work OK, it would be appreciated. Thunderbird testing would also probably be a good idea!
Comment 65 Ryan VanderMeulen [:RyanVM] 2007-06-06 07:15:13 PDT
Comment on attachment 267413 [details] [diff] [review]
Hunspell 1.1.5 trunk patch WIP2

mvl and mscott, can you take a look at this? Also, any insight on my questions in comment #61 would be appreciated.
Comment 66 Michiel van Leeuwen (email: mvl+moz@) 2007-06-06 09:18:39 PDT
My questions on filesize are because in the past, download size was an issue. I don't know if that is still true. But anyway, that's for thunderbird/firefox/toolkit drivers to decide on. I asked in order to give them the right info.

About the review-request, it likely will take me some time to get around to it. You can leave the request in my queue, as long as you don't have too high hopes.
Comment 67 Ryan VanderMeulen [:RyanVM] 2007-06-07 12:09:42 PDT
mvl, some codesize numbers for you. For dynamic win32 builds, overall codesize increases by 27400 bytes (26.76KB) and for static win32 builds, codesize increases by 23304 bytes (22.76KB), so I wouldn't say that's an overly prohibitive increase given the large benefits Hunspell has over MySpell.

I've also tested the attached patches under every win32 build scenario I could think of (including dynamic + libxul) and all compiled fine and spellchecking worked afterwards.
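
The byte-to-KB figures quoted above can be double-checked with a short Python sketch, assuming that "KB" here means 1024 bytes. The function name and the dictionary of measurements are only illustrative.

```python
def bytes_to_kib(n):
    """Convert a byte count to units of 1024 bytes, rounded to two decimals."""
    return round(n / 1024, 2)

# Codesize increases reported for the win32 builds (in bytes).
INCREASES = {"dynamic win32": 27400, "static win32": 23304}

for build, n in INCREASES.items():
    print(f"{build}: +{n} bytes = +{bytes_to_kib(n):.2f} KB")
```

Both conversions agree with the numbers given in the comment (26.76 and 22.76).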
Comment 68 Magnus Melin 2007-06-17 08:25:16 PDT
Comment on attachment 267413 [details] [diff] [review]
Hunspell 1.1.5 trunk patch WIP2

I tried building this for tb on linux. 

c++  -fno-rtti -fno-exceptions -Wall -Wconversion -Wpointer-arith -Wcast-align -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wno-long-long -pedantic -fshort-wchar -pthread -pipe  -DNDEBUG -DTRIMMED -O -fPIC -shared -Wl,-z,defs -Wl,-h,libspellchecker.so -o libspellchecker.so  mozSpellCheckerFactory.o mozSpellChecker.o mozPersonalDictionary.o mozEnglishWordUtils.o mozGenericWordUtils.o mozSpellI18NManager.o mozInlineSpellChecker.o mozInlineSpellWordUtil.o    -lpthread    -Wl,--whole-archive ../hunspell/src/libhunspell_s.a -Wl,--no-whole-archive -L../../../dist/bin -L../../../dist/lib -L../../../dist/bin -lxpcom -lxpcom_core  -L../../../dist/bin -L../../../dist/lib -lplds4 -lplc4 -lnspr4 -lpthread -ldl ../../../dist/lib/libunicharutil_s.a  -Wl,--version-script -Wl,/opt/CVSROOT/mozilla/build/unix/gnu-ld-scripts/components-version-script -Wl,-Bsymbolic -ldl -lm
../hunspell/src/libhunspell_s.a(mozHunspellFactory.o): In function `NSGetModule':
mozHunspellFactory.cpp:(.text+0x1fc): multiple definition of `NSGetModule'
mozSpellCheckerFactory.o:mozSpellCheckerFactory.cpp:(.text+0x436): first defined here
collect2: ld returned 1 exit status
make[6]: *** [libspellchecker.so] Error 1
make[6]: Leaving directory `/opt/mozbuild/mail/extensions/spellcheck/src'
make[5]: *** [libs] Error 2
make[5]: Leaving directory `/opt/mozbuild/mail/extensions/spellcheck'
make[4]: *** [libs_tier_toolkit] Error 2
make[4]: Leaving directory `/opt/mozbuild/mail'
make[3]: *** [tier_toolkit] Error 2
make[3]: Leaving directory `/opt/mozbuild/mail'
make[2]: *** [default] Error 2
make[2]: Leaving directory `/opt/mozbuild/mail'
make[1]: *** [build] Error 2
make[1]: Leaving directory `/opt/CVSROOT/mozilla'
make: *** [build] Error 2
Comment 69 Ryan VanderMeulen [:RyanVM] 2007-06-17 08:31:42 PDT
Yeah, I ran into that too when compiling it with Firefox. If you use --enable-libxul, you should be able to compile it OK (assuming libxul compiles OK for TB :-)...)
Comment 70 Ryan VanderMeulen [:RyanVM] 2007-06-17 08:38:55 PDT
Oh, I should also note that you should be able to compile it OK if you do a static build. Dynamic Linux builds without libxul are literally the only builds that aren't working currently with the attached patches. I've tried all the other combinations :). Hopefully Nemeth and/or myself can figure it out for the next round, though.
Comment 71 Magnus Melin 2007-06-17 10:40:27 PDT
Unfortunately (for tb):
configure: error: --enable-libxul is not compatible with --enable-static

... and yeah, the dynamic build didn't work either.
Comment 72 Ryan VanderMeulen [:RyanVM] 2007-06-17 10:43:24 PDT
--enable-libxul and --enable-static are mutually exclusive options. You either do a static build or a dynamic+libxul build. Static+libxul isn't possible as you've already discovered.
Comment 73 Robert Kaiser 2007-06-17 13:17:04 PDT
--enable-libxul doesn't work with Thunderbird or SeaMonkey from what I know due to mailnews using internal linkage at a few places.
SeaMonkey doesn't even support static builds at the moment due to linkage errors.
So I think this can only go in when (fully) dynamic linkage does work with it.
Comment 74 Michiel van Leeuwen (email: mvl+moz@) 2007-06-17 14:06:06 PDT
What does the current myspell impl support? If it works with a dynamic build (aren't those actually called shared builds?), I don't see any reason for hunspell not to work with it.
Comment 75 Ryan VanderMeulen [:RyanVM] 2007-06-17 14:09:26 PDT
mvl: Of course that's the case. Like I said in comment #70, it's something which needs to be resolved and I'm aware of that. I was just sharing what did work (for Fx anyway) so he might be able to finish a compile.
Comment 76 Ryan VanderMeulen [:RyanVM] 2007-06-17 18:37:37 PDT
Created attachment 268731 [details] [diff] [review]
Hunspell 1.1.5 trunk patch WIP3

OK, I've tested this patch for both win32 and Linux Firefox builds with the following build options:
1.) Dynamic (shared) + libxul
2.) Dynamic (shared) without libxul
3.) Static

All compiled fine and spell checking worked fine.

Nemeth - to fix this I just removed mozHunspellFactory.cpp outright. As far as I can tell, mozSpellCheckerFactory.cpp does everything it ever did anyway.

If the Seamonkey and Thunderbird folks could test this, that would be great :-).
Comment 77 Ryan VanderMeulen [:RyanVM] 2007-06-21 10:39:43 PDT
(In reply to comment #61)
> 1.) The relevant Mozilla license.html files need to be updated for Hunspell
> (and their Myspell equivalents removed). Or do they need to be there at all
> given that Hunspell is tri-licensed?

To answer my own question on this topic, I talked to Gerv the other day about this. As long as Myspell isn't shipping with any products, we can safely remove it from license.html. Also, Hunspell falls under the generic license at the top, so it doesn't need anything special for it.
Comment 78 Ryan VanderMeulen [:RyanVM] 2007-06-23 07:49:52 PDT
Created attachment 269517 [details]
Hunspell 1.2.0beta WIP4 - Hunspell Bits

OK, here's the WIP4 round of the patches. The main changes from WIP3 are that I fixed a bug where an error message appeared in TBird when trying to compose a new message, and that Hunspell has been updated to version 1.2.0beta. I've tested the patches with the following build configs and all passed and spell checked fine:

Firefox		Linux		win32
---------------------------------------------------
Shared		OK		OK
Shared+libxul	OK		OK
Static		OK		OK

Thunderbird	Linux		win32
---------------------------------------------------
Shared		OK		OK
Static		OK		OK

Seamonkey	Linux		win32
---------------------------------------------------
Shared		OK		OK

I've changed the patches around a bit this time. I've split them into a Hunspell patch and a Mozilla patch. Basically, the Hunspell patch contains all the new Hunspell code going into the tree (including the dictionary changes). The Mozilla patch contains all the changes to the Mozilla tree needed to get Hunspell compiling and working properly.

After discussing licensing with Gerv on IRC, I've gone ahead and removed the Myspell license info from license.html as Hunspell can fully replace Myspell in all apps which make use of spell checking. Since Hunspell is tri-licensed, it falls under the standard boilerplate at the top of the file.

Regarding my IID questions from earlier, from what I've been able to gather from talking to people on IRC, we do NOT need to rev them for this since nothing is changing on the interface level. Hunspell is a direct replacement for Myspell.

So, a couple questions still remain:
1.) Are /tools/module-deps/module2dir.map and /tools/module-deps/all.dot a
simple s/myspell/hunspell replacement or does that need to be looked at more
carefully?
2.) I'm assuming that the comments at lines 139 and 149 of
/extensions/spellcheck/osxspell/src/mozOSXSpell.mm are still relevant to
Hunspell. Is that correct, Németh?

At this point, I'd say these patches are review ready. Scott, you feeling up to the challenge? :-)
Comment 79 Ryan VanderMeulen [:RyanVM] 2007-06-23 07:51:42 PDT
Created attachment 269518 [details] [diff] [review]
Hunspell 1.2.0beta WIP4 - Mozilla Bits
Comment 80 Scott MacGregor 2007-06-26 18:34:54 PDT
Ryan and I are going to land the hunspell back end as NPOTB shortly. That will make it easier for us to maintain the hunspell portion as we look to flesh out and test the implementation in our own builds of Firefox and Thunderbird.

I have two last questions before I check this in.

1) Seems like we should be able to remove license.myspell and all of the #include license.myspell in the hunspell files.

2) Is there any reason to use the existing license.hunspell file, including it in each source file as opposed to pasting in our standard license boiler plate at the top of each file like we do for most of the files in our tree? Does it make it easier to keep these files in sync with the open office hunspell equivalents if we don't have our boiler plate getting in the way?
Comment 81 Mike Hommey [:glandium] 2007-06-26 22:44:47 PDT
Please oh please, before landing, please consider comments #57 and #58
Comment 82 Nicolas Mailhot 2007-06-27 00:51:12 PDT
Please push this ASAP. Having many different variants of the same dicts on-disk is a major disincentive to do any work to fix them. Having only the hunspell variant to care about will unfreeze linguistic efforts.
Comment 83 Németh László 2007-06-27 12:02:53 PDT
Remove license.hunspell and license.myspell, and use Mozilla headers, if you like. (The patch tool ignores headers, so it is no problem for applying new patches.) Thanks for the question, Laci
Comment 84 Németh László 2007-06-27 12:28:00 PDT
(In reply to comment #82)

It would be nice to implement a user-friendly installer for spell checking dictionaries, like OpenOffice.org (DicOOo: http://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries/dicooo/DicOOo.sxw) and IE7Pro (http://www.ie7pro.com/spell-checker.html).


Comment 85 Nicolas Mailhot 2007-06-27 12:37:27 PDT
(In reply to comment #84)
> (In reply to comment #82)
> 
> It would be nice to implement a user-friendly installer for spell checking
> dictionaries, like OpenOffice.org (DicOOo:

Please don't mix issues. The problem is not the installer, the problem is incompatible backends. Unless you want me to rant about DicOOo.

Some platforms have sane industrialised update infrastructure (and have had it for years).

Comment 86 Ryan VanderMeulen [:RyanVM] 2007-06-27 22:00:43 PDT
Created attachment 270129 [details]
Hunspell 1.2.0beta WIP5 - Hunspell Bits

OK, this is the same as WIP4, except the two license files are removed and the standard boilerplate is at the top of every source file. Look OK, Scott & Nemeth?

Regarding comment #81, that's something we can look at for the Mozilla bits once the NPOTB Hunspell source is checked in.

Regarding comment #84 and comment #85, that sounds like a topic for another bug. How about someone file another bug for that and mark it depending on this one?
Comment 87 Ryan VanderMeulen [:RyanVM] 2007-06-27 22:39:16 PDT
Created attachment 270135 [details]
Hunspell 1.2.0beta WIP5.1 - Hunspell Bits

Holy crap, I made a huge mistake on the last patch (I forgot to actually make the license block a comment :-\...). This fixes that minor mistake.
Comment 88 Németh László 2007-06-29 04:46:49 PDT
Created attachment 270308 [details] [diff] [review]
Hunspell 1.2beta patch to Hunspell 1.1.6 (released version of the next Hunspell)

It contains two changes, a bug fix and a small improvement for Catalan, Italian and French languages.
Comment 89 Németh László 2007-06-29 04:47:41 PDT
Created attachment 270309 [details] [diff] [review]
Hunspell 1.2beta patch to Hunspell 1.1.6 (released version of the next Hunspell)

It contains two changes, a bug fix and a small improvement for Catalan, Italian and French languages.
Comment 90 Ryan VanderMeulen [:RyanVM] 2007-06-29 09:32:05 PDT
Created attachment 270337 [details]
[checked in]Hunspell 1.1.6 WIP6 - Hunspell Bits

Updated to version 1.1.6 (this is actually a bit newer than the 1.2.0beta code posted earlier, plus it's an official release). Scott, can we please get this checked in soon? Pretty please? :)
Comment 91 Scott MacGregor 2007-07-02 10:15:55 PDT
Comment on attachment 270337 [details]
[checked in]Hunspell 1.1.6 WIP6 - Hunspell Bits

sr=mscott for landing the hunspell engine as NPOTB for now. I'm going to do that shortly.
Comment 92 Ryan VanderMeulen [:RyanVM] 2007-07-03 19:20:13 PDT
OK, now that the source is landed in the tree, how do you want me to proceed, Scott? Mozilla WIP4 should still be all that's needed to enable Hunspell for Fx, TB, and SM on Linux & Win32. Otherwise, do you want me to update the patch with comments #57 and #58 taken into account?
Comment 93 Ryan VanderMeulen [:RyanVM] 2007-07-09 20:03:25 PDT
Created attachment 271606 [details] [diff] [review]
Hunspell Mozilla Bits WIP5

This is a minor update to the WIP4 patch to take care of some recent bitrot on the trunk. It doesn't change anything with the CIDs. At this point, I'm inclined to spin that off to a new bug after this one lands.

After seeing Schrep's post about milestone releases in m.d.planning today, I think we need to get this in sooner rather than later or face more hurdles in doing so. Scott, any chance you can review this soon?
Comment 94 Ryan VanderMeulen [:RyanVM] 2007-07-09 20:39:33 PDT
Created attachment 271607 [details] [diff] [review]
Hunspell Mozilla Bits WIP5.1

Whoops, missed a couple more obsolete files.
Comment 95 Scott MacGregor 2007-07-12 22:50:42 PDT
Ryan, can you test something out for me? I want to verify that we'll do the right thing for i18n users who have downloaded myspell dictionaries as extensions using Firefox 2 / Firefox trunk with myspell.

If you download, say, the German dictionary extension in Firefox and make it the default, and then run a Firefox build with this patch, do we correctly continue to use the same dictionary? Thanks!
Comment 96 Ryan VanderMeulen [:RyanVM] 2007-07-13 18:21:46 PDT
As you requested, I installed the de-de dictionary from a.m.o while running a MySpell build and verified that it worked OK. I then started up a Hunspell build on the same profile. German spellchecking worked fine (as it should have given that Hunspell advertises compatibility with MySpell dictionaries :)...).

I should note, however, that the de-de dictionary I downloaded had a maxVersion of 2.0b2 set on it (which I had to manually change before installing it), so when they upgrade to 3.0, they're going to get told their dictionary is incompatible anyway. That may not be a bad thing, though, because it could open the door for a new round of Hunspell-optimized tri-licensed dictionaries with a minVersion of 3.0 set on them :). Otherwise, the maxVersion can just be safely bumped and be done with it.
Comment 97 Ryan VanderMeulen [:RyanVM] 2007-07-15 17:33:39 PDT
Created attachment 272433 [details] [diff] [review]
Hunspell 1.1.6 to 1.1.7 patch

Hunspell recently got updated to version 1.1.7. This patch updates the copy in the tree accordingly.
Comment 98 Ryan VanderMeulen [:RyanVM] 2007-07-16 05:25:38 PDT
Comment on attachment 272433 [details] [diff] [review]
Hunspell 1.1.6 to 1.1.7 patch

Hunspell 1.1.8 was released today to fix some problems found by Valgrind amongst some other things. I'll post a new 1.1.6 to 1.1.8 patch later.
Comment 99 Ryan VanderMeulen [:RyanVM] 2007-07-16 15:19:00 PDT
Created attachment 272555 [details] [diff] [review]
[checked in] Hunspell 1.1.6 to 1.1.8 patch

This fixes some memory handling errors amongst some other problems.
Release notes: http://sourceforge.net/project/shownotes.php?group_id=143754&release_id=523522
Comment 100 Scott MacGregor 2007-08-05 19:18:05 PDT
I received an a=mconnor/schrep to land this on the trunk via e-mail.
Comment 101 Scott MacGregor 2007-08-05 19:20:00 PDT
Comment on attachment 271607 [details] [diff] [review]
Hunspell Mozilla Bits WIP5.1

Ryan and I are going to land this as is and then file a new bug for improving the contract ID for the spell check engine.
Comment 102 Scott MacGregor 2007-08-05 21:00:31 PDT
I've checked this in and will mark this bug fixed if everything stays green and the various tinderbox tests don't change at all.
Comment 103 Scott MacGregor 2007-08-05 22:02:04 PDT
Thanks again to Nemeth, Ryan, mvl and everyone else who helped get this into Mozilla.

Also, see Bug 391039 for changing the contract ID.
Comment 105 Ben Turner (not reading bugmail, use the needinfo flag!) 2007-08-06 15:07:55 PDT
This patch broke the vc71 build on windows:

affentry.cpp
Building deps for /cygdrive/c/builds/xulrunner/xr_trunk_washington/mozilla/extensions/spellcheck/hunspell/src/affentry.cpp
/cygdrive/c/builds/xulrunner/xr_trunk_washington/mozilla/build/cygwin-wrapper cl -Foaffentry.obj -c  -DMOZILLA_INTERNAL_API -D_IMPL_NS_COM -DEXPORT_XPT_API -DEXPORT_XPTC_API -D_IMPL_NS_COM_OBSOLETE -D_IMPL_NS_GFX -D_IMPL_NS_WIDGET -DIMPL_XREAPI -DIMPL_NS_NET  -DZLIB_INTERNAL -DOSTYPE=\"WINNT5.1\" -DOSARCH=WINNT  -I../../../../dist/include/xpcom -I../../../../dist/include/string -I../../../../dist/include/uconv -I../../../../dist/include/unicharutil -I../../../../dist/include/spellchecker -I../../../../dist/include/xulapp -I../../../../dist/include   -I../../../../dist/include/hunspell -I../../../../dist/include/nspr  -DMOZ_PNG_READ -DPNG_NO_MMX_CODE -DMOZ_PNG_WRITE   -I../../../../dist/sdk/include       -GR- -TP -nologo -W3 -Gy -Fdhunspell_s.pdb  -DNDEBUG -DTRIMMED -Zi -O1 -UDEBUG -DNDEBUG -MD            -DWINVER=0x500 -D_WIN32_WINNT=0x500 -D_WIN32_IE=0x0500 -DX_DISPLAY_MISSING=1 -DMOZILLA_VERSION=\"1.9a8pre\" -DMOZILLA_VERSION_U=1.9a8pre -DHAVE_SNPRINTF=1 -D_WINDOWS=1 -D_WIN32=1 -DWIN32=1 -DXP_WIN=1 -DXP_WIN32=1 -DHW_THREADS=1 -DSTDC_HEADERS=1 -DWIN32_LEAN_AND_MEAN=1 -DNO_X11=1 -D_X86_=1 -DD_INO=d_ino -DMOZ_EMBEDDING_LEVEL_DEFAULT=1 -DMOZ_EMBEDDING_LEVEL_BASIC=1 -DMOZ_EMBEDDING_LEVEL_MINIMAL=1 -DMOZ_XULRUNNER=1 -DMOZ_BUILD_APP=xulrunner -DMOZ_XUL_APP=1 -DMOZ_DEFAULT_TOOLKIT=\"cairo-windows\" -DMOZ_THEBES=1 -DMOZ_CAIRO_GFX=1 -DMOZ_DISTRIBUTION_ID=\"org.mozilla\" -DOJI=1 -DIBMBIDI=1 -DMOZ_VIEW_SOURCE=1 -DACCESSIBILITY=1 -DMOZ_XPINSTALL=1 -DMOZ_JSLOADER=1 -DNS_PRINTING=1 -DNS_PRINT_PREVIEW=1 -DMOZ_NO_XPCOM_OBSOLETE=1 -DMOZ_XTF=1 -DMOZ_AIRBAG=1 -DMOZ_MATHML=1 -DMOZ_ENABLE_CANVAS=1 -DMOZ_SVG=1 -DMOZ_SVG_FOREIGNOBJECT=1 -DMOZ_UPDATE_CHANNEL=default -DMOZ_PLACES=1 -DMOZ_FEEDS=1 -DMOZ_STORAGE=1 -DMOZ_LOGGING=1 -DMOZ_USER_DIR=\"Mozilla\" -DMOZ_ENABLE_LIBXUL=1 -DHAVE_UINT64_T=1 -DMOZ_XUL=1 -DMOZ_PROFILELOCKING=1 -DMOZ_RDF=1 -DMOZ_MORKREADER=1 -DMOZ_DLL_SUFFIX=\".dll\" -DJS_THREADSAFE=1 -DMOZILLA_LOCALE_VERSION=\"1.9a1\" -DMOZILLA_REGION_VERSION=\"1.9a1\" 
-DMOZILLA_SKIN_VERSION=\"1.8\"  -D_MOZILLA_CONFIG_H_ -DMOZILLA_CLIENT /cygdrive/c/builds/xulrunner/xr_trunk_washington/mozilla/extensions/spellcheck/hunspell/src/affentry.cpp
affentry.cpp
c:\builds\xulrunner\xr_trunk_washington\mozilla\extensions\spellcheck\hunspell\src\atypes.hxx(65) : error C2010: '.' : unexpected in macro formal parameter list
c:\builds\xulrunner\xr_trunk_washington\mozilla\extensions\spellcheck\hunspell\src\atypes.hxx(65) : error C2010: '.' : unexpected in macro formal parameter list
c:\builds\xulrunner\xr_trunk_washington\mozilla\extensions\spellcheck\hunspell\src\atypes.hxx(65) : error C2010: '.' : unexpected in macro formal parameter list
c:\builds\xulrunner\xr_trunk_washington\mozilla\extensions\spellcheck\hunspell\src\atypes.hxx(65) : error C2010: ')' : unexpected in macro formal parameter list

vc71 isn't officially supported, but figured i'd comment here anyway.
Comment 106 :Gavin Sharp [email: gavin@gavinsharp.com] 2007-08-06 15:32:03 PDT
(In reply to comment #105)
> This patch broke the vc71 build on windows:

filed bug 391147.
Comment 107 u60234 2007-08-18 07:48:17 PDT
From http://lxr.mozilla.org/mozilla/source/extensions/spellcheck/locales/en-US/hunspell/README.txt
------
This dictionary is based on a subset of the original
English wordlist created by Kevin Atkinson for Pspell 
and  Aspell and thus is covered by his original 
LGPL license.
------

Is that really acceptable?
Comment 108 Shawn Wilsher :sdwilsh 2007-08-18 09:19:00 PDT
(In reply to comment #107)
> Is that really acceptable?
acceptable for what?
Comment 109 u60234 2007-08-18 09:27:06 PDT
For inclusion of the dictionary in Mozilla products. I thought only dictionaries with a GPL/LGPL/MPL or a BSD license could be used.
Comment 110 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2007-08-18 15:24:26 PDT
(In reply to comment #109)
> For inclusion of the dictionary in Mozilla products. I thought only
> dictionaries with a GPL/LGPL/MPL or a BSD license could be used.

Gerv, could you please give an answer for that question?
Comment 111 Brett Wilson 2007-08-18 16:26:35 PDT
The English dictionary in Fx2 is LGPL.
Comment 112 Gervase Markham [:gerv] 2007-08-28 04:06:13 PDT
Our policy is that only dictionaries compatible with all three of Mozilla's licences may be included in shipped builds of Firefox. If this is not true for some builds we are shipping, I need to know about it.

Gerv
Comment 113 u60234 2007-09-22 00:41:15 PDT
Filed bug 397150 to sort out the license issue.
Comment 114 Mike Lierman 2008-02-14 20:18:37 PST
*** Bug 362453 has been marked as a duplicate of this bug. ***
Comment 115 Mike Lierman 2008-02-14 20:24:12 PST
I am wondering why this bug hasn't been marked as VERIFIED yet. Hunspell is working well over here on Firefox 3 Beta 3 and the Nightly Firefox 3 PreB4.

I have VERIFIED. Please Check-In the patches if needed and mark as verified.

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9b3) Gecko/2008020514 Firefox/3.0b3 ID:2008020514

-Mike
Comment 116 Jeff Walden [:Waldo] (remove +bmo to email) 2008-02-16 12:17:21 PST
No particular reason.  VERIFIED is sort of a holdover from the days when each and every fix was double-checked afterward for correctness, but the manpower isn't there now to guarantee that always happens (and frankly, I don't think it's a hugely useful way to use manpower compared to things like writing automated testcases or actually fixing bugs) in general, so whether or not it happens for a bug depends on whether someone makes an effort to do it or not.

But yeah, we are and have been doing fine with this...
