Closed Bug 1240916 Opened 4 years ago Closed 4 years ago
Update the en-US dictionary to SCOWL 2016
A new version of the en_US Hunspell dictionary is now available. Please note that the "install-new-dict" script will need to be fixed. Just change "SET UTF8" in the line:
sed -i=bak -e '/^ICONV/d' -e 's/^SET UTF8$/SET ISO8859-1/' en_US-mozilla.aff
to "SET UTF-8" to get:
sed -i=bak -e '/^ICONV/d' -e 's/^SET UTF-8$/SET ISO8859-1/' en_US-mozilla.aff
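As a rough illustration of why the pattern has to change, the two sed expressions above can be mimicked in Python. This is only a sketch: the helper name convert_aff_header and the sample header are made up for illustration, and the real contents of en_US-mozilla.aff are not shown in this bug.

```python
import re

# Hypothetical helper mirroring the two sed expressions quoted above.
def convert_aff_header(text, set_pattern=r"^SET UTF-8$"):
    kept = []
    for line in text.splitlines():
        if re.match(r"^ICONV", line):
            continue  # same effect as sed's /^ICONV/d
        # same effect as sed's s/^SET UTF-8$/SET ISO8859-1/
        kept.append(re.sub(set_pattern, "SET ISO8859-1", line))
    return "\n".join(kept) + "\n"

# Made-up sample header; not copied from the real .aff file.
sample = "SET UTF-8\nICONV 1\nTRY esianrtolcdugmphbyfvkwz\n"

fixed = convert_aff_header(sample)                 # corrected pattern
stale = convert_aff_header(sample, r"^SET UTF8$")  # old pattern
```

With the old pattern the ICONV line is still dropped, but the "SET UTF-8" line is left as it was, because "^SET UTF8$" never matches it.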
Thank you, Kevin. New version at: http://sourceforge.net/projects/wordlist/files/SCOWL/2016.01.19/ We need to wait for bug 301712 to land before doing this. Ehsan, do you have time? Do you think we can still get this into ESR 45?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: New version of Hunspell en_US dictionary available → Update the en-US dictionary to SCOWL 2016.01.19
Sure. I'll do this later today. Backporting it to ESR depends on what the release managers decide, not my call.
Assignee: nobody → ehsan
Comment on attachment 8710455 [details] [diff] [review] Update the en-US dictionary to SCOWL 2016.01.19 This is a dictionary update, should be safe to backport to Aurora to include it in ESR45.
This should be backed out. Apparently the merge has gone wrong, see bug 1241494 comment #6.
Would you like to back this out or fix up the problem in bug 1241494?
There's no rush for this. I'll back out.
Sorry about that, Ehsan. The attached patch fixes my scripts so the upgrade should work smoothly. Please first apply this patch, then do the upgrade. Do a sanity check to make sure 1-base.txt is not empty and that 5-mozilla-added does not contain "get's" or any non-ASCII words.
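The sanity check described above could be scripted along these lines. This is a minimal sketch, not part of the attached patch: the function name sanity_check is hypothetical, and it assumes both files sit in one directory with one word per line.

```python
from pathlib import Path

# Hypothetical checker; assumes one word per line in 5-mozilla-added.
def sanity_check(source_dir):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    base = Path(source_dir) / "1-base.txt"
    added = Path(source_dir) / "5-mozilla-added"

    if not base.is_file() or base.stat().st_size == 0:
        problems.append("1-base.txt is missing or empty")

    if not added.is_file():
        problems.append("5-mozilla-added is missing")
        return problems

    # Read bytes so the check does not depend on the file's encoding.
    for number, raw in enumerate(added.read_bytes().splitlines(), start=1):
        word = raw.strip()
        if word == b"get's":
            problems.append(f"5-mozilla-added line {number}: contains get's")
        if not word.isascii():
            problems.append(f"5-mozilla-added line {number}: non-ASCII word")
    return problems
```

Calling sanity_check on the dictionary-sources directory after the upgrade and getting an empty list back would correspond to the check passing.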
(Fix bug number in commit message)
Attachment #8710606 - Attachment is obsolete: true
Thanks Kevin, this fixes the issue!
Nice job. If you look at the changes to 5-mozilla-added
https://hg.mozilla.org/integration/mozilla-inbound/diff/e1208d0c3551/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/5-mozilla-added
you can see the effects of bug 301712: Lots of questionable names got removed. Also removed bad data like "Don't" or "The". Sadly this file also shows the SCOWL deficiencies. While for example "guestbook", "wildcard" and "weaponize" are no longer Mozilla-added since they are now SCOWL-provided, SCOWL is still missing the derived forms (plural, possessive or conjugated forms):
guestbook's
guestbooks
weaponized
weaponizes
weaponizing
wildcard's
wildcards
which Mozilla needs to add.
http://app.aspell.net/lookup?dict=en_US;words=guestbook%27s%0D%0Aguestbooks%0D%0Aweaponized%0D%0Aweaponizes%0D%0Aweaponizing%0D%0Awildcard%27s%0D%0Awildcards
Kevin, you might want to take a look at 5-mozilla-added and straighten out your data. Would you like me to raise an issue at GitHub for that?
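The relationship described above amounts to a set difference: whatever Mozilla wants in the dictionary, minus what SCOWL already provides, is what has to stay in 5-mozilla-added. A small sketch, where the function name is hypothetical and scowl_provided holds only the three base words named in this comment rather than the real SCOWL list:

```python
# Hypothetical helper: words Mozilla must still add itself.
def still_needed_from_mozilla(wanted_words, scowl_words):
    return sorted(set(wanted_words) - set(scowl_words))

# Only the three base words mentioned in the comment; the real list is far larger.
scowl_provided = {"guestbook", "wildcard", "weaponize"}

wanted = [
    "guestbook", "guestbook's", "guestbooks",
    "weaponize", "weaponized", "weaponizes", "weaponizing",
    "wildcard", "wildcard's", "wildcards",
]

missing = still_needed_from_mozilla(wanted, scowl_provided)
```

Here missing contains exactly the seven derived forms listed in the comment, while the three base words drop out.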
Jorg K: Yes, please file an upstream issue on GitHub. Now that I got a release out it may be a while before I have a chance to look into it.
Comment on attachment 8710617 [details] [diff] [review] Part 1: Fix dictionary upgrade scripts
Approval Request Comment
[Feature/regressing bug #]: No regression.
[User impact if declined]: A poorer dictionary with fewer words.
[Describe test coverage new/current, TreeHerder]: N/A.
[Risks and why]: No risk, en-US dictionary change only.
[String/UUID change made/needed]: None.
It would be good to include a richer, more up-to-date dictionary in ESR 45 (also, dare I say it, for the benefit of Thunderbird users). This is the fourth and really final dictionary fix I'm requesting uplift for ;-) For the other three see bug 1235506 comment #63, bug 1238031 comment #21 and bug 301712 comment #46.
Comment on attachment 8710624 [details] [diff] [review] Part 2: Update the en-US dictionary to SCOWL 2016.01.19 Approval Request Comment - see previous comment. Sorry about the NI, it'd be good to uplift before the branch date. When uplifting, please uplift bug 301712 first, otherwise the patches won't apply.
Attachment #8710624 - Flags: approval-mozilla-aurora?
Comment on attachment 8710617 [details] [diff] [review] Part 1: Fix dictionary upgrade scripts Please don't ni me on uplift requests. I am watching these daily.
Attachment #8710617 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8710624 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
ehsan: this somehow landed twice on central, see comment #21 and your push in comment #25?
OK, it seems this was initially backed out yesterday and then checked in again, which caused the two central comments here, at least when I check https://hg.mozilla.org/mozilla-central/log/7459d6a67610/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict
Actually, it's more messy than that: See https://hg.mozilla.org/mozilla-central/pushloghtml in chronological order:
First attempt:
3543f727e23a Ehsan Akhgari - Bug 1240916 - Update the en-US dictionary to SCOWL 2016.01.19
Got backed out:
7f3168c2fb0a Ehsan Akhgari - Backout bug 1240916 because the update from upstream seems to be broken a=merge
Second attempt got landed:
c5da92c5b490 Ehsan Akhgari - Bug 1240916 - Part 2: Update the en-US dictionary to SCOWL 2016.01.19 a=merge
3f280d724c8a Kevin Atkinson - Bug 1240916 - Part 1: Fix dictionary upgrade scripts a=merge
And backed out? Why?
05d3068b8573 Ehsan Akhgari - Backout bug 1240916 because the update from upstream seems to be broken
Then the same got landed a second time:
e1208d0c3551 Ehsan Akhgari - Bug 1240916 - Part 2: Update the en-US dictionary to SCOWL 2016.01.19
7459d6a67610 Kevin Atkinson - Bug 1240916 - Part 1: Fix dictionary upgrade scripts
Oops, looking at this again:
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic
shows 3543f727e23a 7f3168c2fb0a c5da92c5b490
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict
shows 3543f727e23a 7f3168c2fb0a 3f280d724c8a
Looks like e1208d0c3551 and 7459d6a67610 got backed out??
Looks like 05d3068b8573, e1208d0c3551 and 7459d6a67610 appear in the pushlog but they never really happened. So the real landed patches are in comment #21, and comment #25 is wrong. Wes, can you please confirm?
Also would be nice if we could clarify which csets should be uplifted now?
(In reply to Carsten Book [:Tomcat] from comment #32)
> also would be nice if we could clarify which csets should be uplifted now ?
There seems to be a choice of two ;-)
https://hg.mozilla.org/mozilla-central/rev/3f280d724c8a
https://hg.mozilla.org/mozilla-central/rev/c5da92c5b490

https://hg.mozilla.org/mozilla-central/rev/7459d6a67610
https://hg.mozilla.org/mozilla-central/rev/e1208d0c3551
From my looking at it, the first two are real and the second two are identical, but never landed on the files, see:
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict
Frankly, I don't think Ehsan can help us here. He pushed some stuff to inbound and is not responsible for what happened on M-C. Here are the inbound pushes. They make sense.
1st attempt: https://hg.mozilla.org/integration/mozilla-inbound/rev/3543f727e23a
Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/05d3068b8573
2nd attempt: https://hg.mozilla.org/integration/mozilla-inbound/rev/7459d6a67610 https://hg.mozilla.org/integration/mozilla-inbound/rev/e1208d0c3551
So... The reason I waited to request uplift was that I wanted to signal to the sheriff that I want to land on Aurora myself to avoid precisely the confusion here. :-) I'll land on Aurora myself.
(In reply to :Ehsan Akhgari from comment #34)
> So... The reason I waited to request uplift was that I wanted to signal to
> the sheriff that I want to land on Aurora myself to avoid precisely the
> confusion here. :-)
That's cool, but can you explain the two landings
https://hg.mozilla.org/mozilla-central/rev/3f280d724c8a
https://hg.mozilla.org/mozilla-central/rev/c5da92c5b490
vs.
https://hg.mozilla.org/mozilla-central/rev/7459d6a67610
https://hg.mozilla.org/mozilla-central/rev/e1208d0c3551
I have no idea, but one set is not something that I pushed: <https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=3f280d724c8a>!
I think that was just the tool we use to mark merges to m-c getting confused by the backout and relanding.
(In reply to Wes Kocher (:KWierso) from comment #38)
> I think that was just the tool we use to mark merges to m-c getting confused
> by the backout and relanding.
Actually, I remember what happened better now that I've talked about it some more with jorgk:

Ehsan originally landed one changeset on inbound, and later pushed a backout to inbound, and after that re-landed two changesets to inbound.

When I was working on getting a merge from inbound to m-c, the best candidate for the merge at the time was between the push where ehsan originally landed the one changeset and the backout. To make sure what I was about to merge was green, I grafted the backout commit onto what I was about to push to m-c. This generated a new revision hash for the backout commit because that's how `hg graft` works. I then pushed that merge to m-c and did merges from all of the other integration branches onto m-c.

While trying to merge the newly updated m-c back around to the integration branches, I was hitting merge conflicts with inbound because of ehsan's relanded 2-cset push, since they touched the same files that were included in my original merge from inbound to m-c. To get things to cleanly merge everywhere (without me having to do a bunch of manual merge conflict resolution and potentially breaking things), I grafted ehsan's 2-cset push onto m-c. (And again, those grafted csets get new hashes since that's what graft does). With all of the changes from ehsan's pushes to inbound effectively combined on m-c, I was able to cleanly push everything back to inbound without conflicts.

Then when tomcat did merges from inbound to m-c this morning, it included all of the original csets of the things I had grafted onto m-c yesterday, causing them to show up twice. The two batches of csets on m-c should be identical other than the differing hashes, and the combination of the two should show as being dealt with in the "merge inbound to m-c" merge commit at the top of tomcat's merge push.
tl;dr: I made this a lot more confusing than it needed to be with all of the grafting of things, and this could've been avoided if I just skipped merging inbound to m-c yesterday and left it for tomcat to get today, but everything worked out okay in the end.
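The duplicate entries follow from a changeset's identifier depending on where it sits in history, not only on the change it carries. The following is a toy model of that idea and deliberately not Mercurial's actual hashing scheme: changeset_id, graft and the parent names "inbound-parent" and "m-c-parent" are all hypothetical.

```python
import hashlib

# Toy identifier: derived from the parent as well as the change itself.
def changeset_id(parent_id, description, diff):
    payload = "\n".join([parent_id, description, diff]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:12]

def make_changeset(parent_id, description, diff):
    return {
        "parent": parent_id,
        "description": description,
        "diff": diff,
        "id": changeset_id(parent_id, description, diff),
    }

# Toy graft: replay the same change on top of a different parent.
def graft(changeset, new_parent_id):
    return make_changeset(new_parent_id, changeset["description"], changeset["diff"])

original = make_changeset(
    "inbound-parent",
    "Bug 1240916 - Part 1: Fix dictionary upgrade scripts",
    "-SET UTF8\n+SET UTF-8\n",
)
grafted = graft(original, "m-c-parent")
```

In this model original and grafted carry the same description and diff but get different identifiers, which is the same shape as the two batches of changesets that showed up on m-c.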