Closed
Bug 1517477
Opened 5 years ago
Closed 5 years ago
Update Hunspell to 1.7.0
Categories: Core :: Spelling checker, enhancement
Tracking: RESOLVED FIXED, mozilla70
People: Reporter: RyanVM, Assigned: RyanVM
Details: Whiteboard: [third-party-lib-audit]
Attachments: 1 file
2018-11-12: Hunspell 1.7.0 release:
New features and bug fixes by László Németh, supported by FSF.hu Foundation:
* No annoying suggestion times any more, especially in languages with compound word handling and complex morphology. By adding balanced multi-level time limits, suggestions are now guaranteed within half a second, not seconds (or dozens of seconds or more in extreme cases), for longer misspellings, too.
* Add SPELLML support for run-time dictionary extension with optional affixation of user words. See the new "Grammar By" feature of language-specific user dictionaries in LibreOffice 6.0.
* Improved, highly customizable suggestions on the level of dictionary words: pronunciations and typical misspellings defined by the optional "ph:" fields of dictionary words are used not only in n-gram suggestions, but also as elements of the REP replacement list, getting the highest priority in normal suggestions and giving the best suggestions for short words, too. More information: see "ph:" in man 5 hunspell.
* Handling multiple-word suggestions is much easier. As in a traditional spelling dictionary, to get the correct suggestion "a lot" in first place for the typical misspelling "alot", it is now enough to put the following line in the dic(tionary) file:
  a lot
* Limit compound overgeneration by dictionary-based word pairs: it is now possible to filter bad compound words by listing the correct word pairs, separated by a space, in the dictionary, as in a traditional spelling dictionary.
* Clean-up of suggestions:
  * no n-gram and compound word suggestions if a "good" suggestion exists, i.e. uppercase, REP, ph: or dictionary word pair suggestions
  * word pairs are always suggested if they exist in the dic file
  * word pairs have top priority in suggestions, and these are the only suggestions if there is no other good suggestion
  * dictionary word pairs separated by a dash instead of a space are also handled specially in two-word suggestions (depending on the language)
* Limit bad suggestions by improved n-gram suggestion rules: don't suggest capitalized dictionary words for lower case misspellings in n-gram suggestions, except
  * PHONE usage, or
  * in the case of German, where not only proper nouns are capitalized, or
  * the capitalized word has a special pronunciation
  and don't suggest if the difference in length between misspelling and suggestion is 5 or more characters.
* Extend dotless i and dotted I rules to the Crimean Tatar language: allow dotted I in the dictionary, and disable bad capitalization of i.
* BREAK: extended recursive word breaking algorithm to handle words, or words with suffixes, when they already contain word break characters. For example, "e-mail" is a dictionary word with a word break character, and it wasn't accepted before in compounds in some languages.
* FORBIDDENWORD precedes BREAK: it is now possible to forbid compound forms recognized by BREAK word breaking by adding the bad compounds to the dictionary with FORBIDDENWORD flags.
* Lower limit for the "doubletwochars" suggestion algorithm: one of the typical misspellings recognized by the Hunspell suggestion mechanism is syllable duplication. Alongside the old pattern ABABA -> ABA, for example nutrITITIon -> nutrITIon, the simpler ABAB -> AB pattern is now also recognized in non-starting position, for example regretTETEd -> regretTEd.
* Lower limit for longswapchar and movechar: only distances of at most 4 characters are recognized, to avoid slow and bad suggestions.
* Fix compound handling for the new Hungarian orthography reform.
* Allow suggestion search for prefix + *two suffixes*: remove an artificial performance limit to get correct suggestions for relatively simple misspellings in Hungarian, etc., when the word form contains a prefix and both derivative and inflectional suffixes, too: lefikszálása -> lefixálása
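The "doubletwochars" item in the release notes can be illustrated with a small sketch. This is not Hunspell's implementation (which is C++ and applies further limits); it only shows the ABAB -> AB idea under simplifying assumptions. The function names `doubled_pair_candidates` and `suggest_undoubled`, and the use of a plain Python set as the dictionary, are hypothetical.

```python
def doubled_pair_candidates(word):
    """Return candidates made by deleting one copy of a doubled two-character run.

    Index 0 is skipped so that only non-starting positions are considered,
    mirroring the ABAB -> AB wording of the release notes (a simplification).
    """
    candidates = []
    for i in range(1, len(word) - 3):
        # word[i:i+2] immediately followed by the same two characters: ABAB
        if word[i:i + 2] == word[i + 2:i + 4]:
            candidate = word[:i] + word[i + 2:]
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates


def suggest_undoubled(word, dictionary):
    """Keep only the candidates that are known words (hypothetical dictionary set)."""
    return [c for c in doubled_pair_candidates(word) if c in dictionary]
```

With the examples from the notes, "regretteted" and "nutritition" each yield a single candidate with one syllable copy removed, which would then be offered only if the dictionary accepts it.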
Comment 1 • 5 years ago (Assignee)
One thing I don't understand is why bug 1410214 and bug 1460600 made changes directly to the upstream files without attempting to do so in a way which doesn't make future updates a pain. No in-tree patches, no MOZILLA_CLIENT ifdefs (so they could be upstreamed), etc. I've manually worked around it for now, but I really think we should find a better long-term solution :\
Flags: needinfo?(masayuki)
Flags: needinfo?(kmaglione+bmo)
Comment 2 • 5 years ago (Assignee)
I also went ahead and removed hunzip.cxx since it was removed from moz.build in bug 1410214.
Comment 3 • 5 years ago (Assignee)
Comment 4 • 5 years ago (Assignee)
Green on Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c40782aeb7d5c884772fe9ec323df663cca19f6e
Assignee: nobody → ryanvm
Comment 5 • 5 years ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #1)
> One thing I don't understand is why bug 1410214 and bug 1460600 made changes
> directly to the upstream files without attempting to do so in a way which
> doesn't make future updates a pain. No in-tree patches, no MOZILLA_CLIENT
> ifdefs (so they could be upstreamed), etc. I've manually worked around it
> for now, but I really think we should find a better long-term solution :\

I'm not sure what you mean. Bug 1460600 didn't change Hunspell at all, and the only change made in bug 1410214 was to replace the entire contents of a header with an include of another header, specifically to make it easier to update. Clearly that couldn't be upstreamed...
Flags: needinfo?(kmaglione+bmo)
Comment 6 • 5 years ago (Assignee)
We have a lot of code in Hunspell behind MOZILLA_CLIENT ifdefs, which they're happy to take patches for whenever we submit them. I can't understand at all how silently changing upstream source files without so much as a patch in the directory or a mention of it somewhere is supposed to make future updates easier to perform.
Comment 7 • 5 years ago (Assignee)
And RE: bug 1460600, hunvisapi.h *is* a part of the upstream library.
Comment 8 • 5 years ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #6)
> We have a lot of code in Hunspell behind MOZILLA_CLIENT ifdefs, which
> they're happy to take patches for whenever we submit them. I can't
> understand at all how silently changing upstream source files without so
> much as a patch in the directory or a mention of it somewhere is supposed to
> make future updates easier to perform.

I made a minimal change to one header, with all of the meaningful code living in mozhunspell. If you want to put it behind an ifdef and try to upstream it, that's fine by me, but I can't imagine I'd accept that sort of patch for a library I maintained.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #7)
> And RE: bug 1460600, hunvisapi.h *is* a part of the upstream library.

Oh, that change wasn't intentional. It doesn't have any meaningful effect. I just forgot to revert it before I committed.
Comment 9 • 5 years ago
Usually, when dealing with third-party code, we do something like the following:
* https://searchfox.org/mozilla-central/source/intl/update-icu.sh#84
  a script with a list of patches ( https://searchfox.org/mozilla-central/source/intl/icu-patches ) which are applied on top of the third-party code
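The approach described in this comment, keeping local changes as patch files that an update script reapplies on top of a fresh upstream import, might be sketched as follows. This is an illustration only, not the contents of update-icu.sh; the function names, the directory arguments, and the patch file names are hypothetical. It assumes the standard `patch` utility is available, and uses its `-d`, `-p` and `-i` options.

```python
import os
import subprocess


def plan_patch_commands(vendor_dir, patch_dir, patch_names):
    """Build one `patch` command per local patch, in the order listed.

    Paths are made absolute because `patch -d` changes directory before
    the patch file given with -i is opened.
    """
    vendor = os.path.abspath(vendor_dir)
    return [
        ["patch", "-d", vendor, "-p1", "-i",
         os.path.abspath(os.path.join(patch_dir, name))]
        for name in patch_names
    ]


def apply_patches(vendor_dir, patch_dir, patch_names):
    """Apply each patch in turn, stopping at the first one that fails."""
    for argv in plan_patch_commands(vendor_dir, patch_dir, patch_names):
        # check=True raises CalledProcessError if a patch no longer applies,
        # which flags local changes that need rebasing onto the new upstream.
        subprocess.run(argv, check=True)
```

Keeping the list of patches explicit and ordered means a failed update points at the exact local change that conflicts with the new upstream release.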
Comment 10 • 5 years ago
Ah, filemgr.cxx and filemgr.hxx are in upstream... I didn't realize that, sorry. It sounds like the changes to them should be turned into a patch, as in comment 9.
Flags: needinfo?(masayuki)
Comment 11 • 5 years ago
There's an r+ patch which didn't land and no activity in this bug for 2 weeks.
:RyanVM, could you have a look please?
Flags: needinfo?(ryanvm)
Comment 12 • 5 years ago (Assignee)
Sorry, I need to update the patch to address comments 8 and 10 but haven't had time to do so.
Flags: needinfo?(ryanvm)
Comment 13 • 5 years ago (Assignee)
I've updated the patch now after the landing of bug 1560517. Per comment 8, I've reverted the hunvisapi.h change back to the upstream version. At this point, we only need to carry forward the changes from bug 1410214.
Try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=698270570123612e2e6614d552a6a935c1bf7504
Not planning to land this until after the Gecko 70 version bump, however.
Comment 14 • 5 years ago
Pushed by rvandermeulen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fa68a5b78e08
Upgrade Hunspell to version 1.7.0. r=masayuki
Comment 15 • 5 years ago
bugherder
Status: NEW → RESOLVED
Closed: 5 years ago
status-firefox70: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla70