Closed
Bug 240600
Opened 20 years ago
Closed 19 years ago
merge myspell back from the openoffice original
Categories
(Core :: Spelling checker, defect)
Core
Spelling checker
Tracking
()
RESOLVED
FIXED
mozilla1.8beta1
People
(Reporter: mvl, Assigned: mscott)
References
Details
(Keywords: fixed-aviary1.0)
Attachments
(1 file, 2 obsolete files)
Once upon a time the openoffice spellchecker was ported to mozilla. All cool. But since then, development has been split. There are bugs in the mozilla spellchecker that are not in the openoffice one. (bug 237114). Features are missing and added back way later with lots of pain (bug 227214) I think it would make sense to remerge if possible. If the openoffice spellchecker is sufficient stand alone, it could be done with minimum pain. If we don't change much (in the ideal case, nothing) merging later would be easy. Problem (that I can think of now) that might arise are licensing and charset conversion. I don't know under which conditions myspell was relicensed as mpl/gpl/lgpl. The openoffice spellchecker seems to bu onder something bsd like or so. Charset conversion depends on how openoffice handles that. If the caller must do the conversion, that's fine. We can write a wrapper. If myspell does it itself, and call openoffice functions or so, we must work around that. Ofcourse, we must contact the openoffice developers. http://whiteboard.openoffice.org/source/browse/whiteboard/lingucomponent/source/spellcheck/myspell/CONTRIBUTORS?rev=1.3&content-type=text/vnd.viewcvs-markup has the addresses of the developer and contributors.
Assignee | ||
Comment 1•20 years ago
|
||
this might be a dupe. It was always our intention to maintain our fork and the open office fork but our fork has stagnated. I don't think there are additional licensing issues with trying to update our 2 year old fork with all the other changes that went into the open office fork. I've looked into it several times, just haven't had time.
Reporter | ||
Comment 2•20 years ago
|
||
Our port contains all kind of mozilla specific code, like the string classes. This makes updating quite hard. My idea is to put in the openoffice code, with as few changes as possible. No string classes, no own hash tables, but just a straight copy. This makes updating in the future easier. So i don't want to update our code, but replace it with openoffice code and a wrapper.
Reporter | ||
Comment 3•20 years ago
|
||
I got a straight copy of myspell from openoffice working. All i needed was a to change mozMySpell into a wrapper calling myspell functions and doing charset conversions. It wasn't as hard as i thought. Now, I am still somewhat worried about the licensing. What is in the tree now seems to be a complete rewrite by david einstein. I am putting in a straigt copy will lots of updates. It is under a bsd-lookalike license. (I think, i don't know bsd that well. It allows far more then gpl does, even more then mpl)
Assignee | ||
Comment 4•20 years ago
|
||
I was envisioning us doing a merge and keeping our infrastructure such as strings, etc. i.e. take specific features straight from the myspell engine and port them back to our fork. What are the code footprint ramifications of just wrapping the engine in its entiriety? Are we pulling in lots of plumbing / string classes & stuff? I'm really impressed by how quickly you were able to put together a wrapper! I just need to better understand the costs. As far as the licensing goes, last time we talked to them we were in the clear (in fact the myspell guy worked hand in hand with david einstein when they were getting the fork going). If we have to we can move some stuff into other-licenses but I don't think we will. I will try to find out more.
Reporter | ||
Comment 5•20 years ago
|
||
It surprised me too :) basicly, it is just two functions: check and suggest. The rest is syntactic sugar. I didn't touch stuff like personal dictionary. Only extension/spellcheck/myspell/src is changed. I checked codesize (using size): current myspell, fixed makefile, debug: text data bss dec hex 100221 1232 8 101461 18c55 current, whouth unichar utils (dies on some tinderboxes) 82701 1024 4 83729 14711 OO myspell, new makefile, debug: 66917 12720 4 79641 13719 So, putting in the OpenOffice spellchecker saves code, but adds lots of data. That is due to a unicode table it has to convert to lowercase. I think that with some fudging we can make that use existing mozilla code, and not have a huge table.
Assignee | ||
Comment 6•20 years ago
|
||
wow! That's extremely promising news! Can you run this test on the new spell check engine with the latest myspell stuff? : "I noticed spellchecker does not offer suggestions on contractions. For example, if you type shouldnt, it will not offer shouldn't as a suggestion. Same for wouldn' couldn't, etc...." That was one of the scenarios myspell had fixed since we forked. Kind of a proving test case that we have some of the new features in the myspell engine...
Reporter | ||
Comment 7•20 years ago
|
||
The patch doesn't fix that bug. that bug is way easier to solve: update the .aff file. I copies the TRY line from the en-GB affix file from the openoffice installation i have, and with both the current and the new engine, it suggested wouldn't. This is because the TRY line needs a ' to suggest it. The patch solves something: bug 237114. And added code comments :)
Assignee | ||
Comment 8•20 years ago
|
||
your saying the .aff file we ship with by default (for US english) is just missing a single TRY line in it for the contraction case to work?
Reporter | ||
Comment 9•20 years ago
|
||
No, it is missing a ' char in the TRY line. So it myspell will never try to insert it as a forgotten char.
Reporter | ||
Comment 10•20 years ago
|
||
Attaching the patch i created. It doesn't include removing old files, to somewhat reduce the size. I didn't touch the licensing, so it isn't ready to go in as is, but it is just for experiments.
Reporter | ||
Comment 11•20 years ago
|
||
Got some codesize numbers on an linux opt build: current text data bss dec hex filename 30966 888 4 31858 7c72 new 32411 12560 4 44975 afaf Now, the new version is somewhat larger. Mostly to blame are affixmgr.cpp and mozMySpell.cpp. This isn't surprising, as those two files do most the work. csutils.cpp is indeed what causes the data size. (checked with running size on the .o files)
Assignee | ||
Comment 12•20 years ago
|
||
the change in code size seems pretty small / not important. It would be interesting to try to look at elminating the large data size jump which I assume is because of the unicode tables it has in the code? How hard is it to use our own data for just that part?
Reporter | ||
Comment 13•20 years ago
|
||
The tables are used for upper <-> lowcase conversion. I couldn't find a case conversion of random charsets in our code. It is possible to convert to unicode, then do the conversion, and then convert back. That would add some code, but is doable. The other option is to convert the tables into code. Most have a very simple rule, and could be created on-the-fly.
Reporter | ||
Comment 14•20 years ago
|
||
I now made it so that the tables are created when needed using the caseconversion already in mozilla. It is somewhat ugly, because the caseconversion only works on unichar, and not on random charsets. I was wrnig when i said it had unicode tables. It has case conversion tables for a few charsets, like ISO-8859-* Anyway, the date size went down (debug): text data bss dec hex filename 69291 1840 4 71135 115df obj/dist/bin/components/libmyspell.so 3k of code for 10k of data.
Reporter | ||
Comment 15•20 years ago
|
||
It's been a while, but here is the updated patch. It creates the case conversion tables when needed, so the code size reduces a lot. Can someone with more licensing knowledge take a look at the licensing?
Reporter | ||
Updated•20 years ago
|
Attachment #146717 -
Attachment is obsolete: true
Assignee | ||
Comment 16•20 years ago
|
||
Ok, now that 1.0 is done, I've come back and revived this great patch that mvl put together. I've got it working again on the trunk. Now I just need to find some english dictionary examples of things that work with the updated myspell engine from open offfice that wouldn't work in our old forked version so I can test that things are working.
Assignee: nobody → mscott
Reporter | ||
Comment 17•20 years ago
|
||
There are bugs that this bugs block that should be fixed by this, but they all involve non-english dictionaries.
Assignee | ||
Comment 18•20 years ago
|
||
Michiel, do you remember where you got the original myspell .cpp files that you added to this patch? Somewhere on development.openoffice.org? I want to integrate the latest files into the patch since this work last happened in May. I'm assuming myspell has had further improvements.
Status: NEW → ASSIGNED
Assignee | ||
Comment 19•20 years ago
|
||
I think I found the files: http://lingucomponent.openoffice.org/source/browse/lingucomponent/source/spellcheck/myspell/
Reporter | ||
Comment 20•20 years ago
|
||
Yes, that is were i got them from (using cvs). It doesn't look like there have been changes over there...
Assignee | ||
Comment 21•20 years ago
|
||
Yeah it looks like things haven't changed in quite a while according to CVS. They must be doing something different that I'm not seeing though. I have a group of words that open office 1.1.3 offers suggestions for using myspell. But we don't offer suggestions for them before and after this patch. I even tried copying over the enlish .aff and .dic files from 1.1.3 to our apps and still failed to get suggestions. I wonder what they are doing outside of the myspell core.... The words I'm using include: cemoflag edeosincrysy egnomeneious emeasureble enouspoucious
Reporter | ||
Comment 22•20 years ago
|
||
In their cvs, csutil.cxx and .hxx disapeared, but they are still in the makefile. I really doubt if that builds. They must have moved the spellchecked. We could ask them :)
Assignee | ||
Comment 23•20 years ago
|
||
I also update the dictionary and .aff file for en-US to reflect the lastest versions being shipped with open office. I'm now getting the same set of suggestions for varous mispellings with Open Office 1.1.3 and a build with this patch. Before this patch, we had 0 suggestions for these words: absoolve addvocat afectoinate aglomerition allimentory ambedixterety culoomniate cemoflag dedly decifrable dilitrious erthcake eflorecscense farenhaet falascieous gagetrea gasttruintestenal hafharted egnomeneious emeasureble enouspoucious juditioary jureesproodentail calieduscop cangaru kichenet leccikografy macromolicooler magnenemous medeocrety memmorandem narsicissim neggociable nutricious obscekuieous peliolithick paralingistick parlamentarean quintesintial reaprochment reckonasance repiticious scritzofrenia sintilate scroopulocity Thanks for getting this started mvl! This is a huge jump in effectiveness for our spell checker.
Attachment #149682 -
Attachment is obsolete: true
Comment 24•20 years ago
|
||
Just a random plea to update the other dictionaries as well (http://www.mozilla.org/products/thunderbird/dictionaries.html)
Assignee | ||
Comment 25•20 years ago
|
||
fixed on the aviary branch and the trunk. Thanks a lot for getting this going mvl!
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Keywords: fixed-aviary1.0
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.8beta
Comment 26•19 years ago
|
||
This fix breaks the Solaris build, error messages are: "affixmgr.hxx", line 54: Error: FILE is not defined. "affixmgr.hxx", line 55: Error: FILE is not defined. "affixmgr.hxx", line 56: Error: FILE is not defined. "affentry.cpp", line 29: Error: The function "memcpy" must have a prototype. "affentry.cpp", line 39: Error: The function "free" must have a prototype. <snip> a namespace std issue?
Comment 27•19 years ago
|
||
reopen to get people's attention.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 28•19 years ago
|
||
re-closing this bug is fixed. I'm happy to check in any patches you need for this to build on ports but right now ports shows all green. Please file a new bug....
Status: REOPENED → RESOLVED
Closed: 20 years ago → 19 years ago
Resolution: --- → FIXED
Comment 29•19 years ago
|
||
This caused bug 307052 -- using NS_LossyConvertUCS2toASCII() to get a char* to pass to myspell cannot possibly work for most of the world's languages -- it will lose chars (hence Lossy).
You need to log in
before you can comment on or make changes to this bug.
Description
•