Update the en-US dictionary to SCOWL 2016.01.19

RESOLVED FIXED in Firefox 45

Status

Core
Spelling checker
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: Kevin Atkinson, Assigned: Ehsan)

Tracking

unspecified
mozilla46
Points:
---

Firefox Tracking Flags

(firefox45 fixed, firefox46 fixed)

Details

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

2 years ago
A new version of the en_US Hunspell dictionary is now available.

Please note that the "install-new-dict" script will need to be fixed.  Just change "SET UTF8" in the line:
  sed -i=bak -e '/^ICONV/d' -e 's/^SET UTF8$/SET ISO8859-1/' en_US-mozilla.aff
to "SET UTF-8" to get:
  sed -i=bak -e '/^ICONV/d' -e 's/^SET UTF-8$/SET ISO8859-1/' en_US-mozilla.aff
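
Editor's sketch (not part of the original report): the effect of the two sed expressions above can be mimicked in a few lines of Python, which makes it easier to see why the old "SET UTF8" pattern stops matching once the upstream file uses "SET UTF-8". The function name `rewrite_aff` and the sample .aff content are hypothetical, and this assumes the header appears on a line of its own, as the anchored sed pattern implies.

```python
def rewrite_aff(text, utf8_header="SET UTF-8"):
    """Mimic the two sed expressions: drop ICONV lines, swap the SET header."""
    out = []
    for line in text.splitlines():
        if line.startswith("ICONV"):
            continue  # corresponds to: -e '/^ICONV/d'
        if line == utf8_header:
            line = "SET ISO8859-1"  # corresponds to: -e 's/^SET UTF-8$/SET ISO8859-1/'
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical .aff fragment, only for illustration.
sample = "SET UTF-8\nICONV 1\nICONV a b\nTRY esia\n"

fixed = rewrite_aff(sample)                 # pattern from the corrected script
stale = rewrite_aff(sample, "SET UTF8")     # pattern from the old script

print(fixed)
print(stale)
```

With the corrected pattern the header becomes "SET ISO8859-1"; with the old pattern the ICONV lines are still removed but the header is left as "SET UTF-8".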

Comment 1

2 years ago
Thank you, Kevin.
New version at: http://sourceforge.net/projects/wordlist/files/SCOWL/2016.01.19/

We need to wait for bug 301712 to land before doing this.

Ehsan, do you have time? Do you think we can still get this into ESR 45?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(ehsan)
Summary: New version of Hunspell en_US dictionary available → Update the en-US dictionary to SCOWL 2016.01.19
(Assignee)

Comment 2

2 years ago
Sure.  I'll do this later today.

Backporting it to ESR depends on what the release managers decide, not my call.
Assignee: nobody → ehsan
Flags: needinfo?(ehsan)

Comment 3

2 years ago
I'm sure you read my comment #1: Waiting for bug 301712 to land first.
(Assignee)

Comment 4

2 years ago
Created attachment 8710455 [details] [diff] [review]
Update the en-US dictionary to SCOWL 2016.01.19
(Assignee)

Comment 6

2 years ago
Comment on attachment 8710455 [details] [diff] [review]
Update the en-US dictionary to SCOWL 2016.01.19

This is a dictionary update, so it should be safe to backport to Aurora for inclusion in ESR 45.
Attachment #8710455 - Flags: approval-mozilla-aurora?

Comment 7

2 years ago
This should be backed out. Apparently the merge has gone wrong, see bug 1241494 comment #6.

Comment 8

2 years ago
Would you like to back this out or fix up the problem in bug 1241494?
Flags: needinfo?(ehsan)
(Assignee)

Comment 9

2 years ago
There's no rush for this.  I'll back out.
Flags: needinfo?(ehsan)

Updated

2 years ago
Blocks: 1198052
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1241494
(Assignee)

Updated

2 years ago
Attachment #8710455 - Attachment is obsolete: true
Attachment #8710455 - Flags: approval-mozilla-aurora?
(Reporter)

Comment 12

2 years ago
Created attachment 8710606 [details] [diff] [review]
Fix dictionary upgrade scripts

Sorry about that, Ehsan.  The attached patch fixes my scripts so the upgrade should work smoothly.  Please first apply this patch, then do the upgrade.  Do a sanity check to make sure 1-base.txt is not empty and that 5-mozilla-added does not contain "get's" or any non-ASCII words.
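
Editor's sketch (not part of the original comment): the sanity check described above could be automated roughly as follows. It assumes both files are plain word lists with one entry per line and that they live in the same directory; the function name `sanity_check` and the directory argument are hypothetical.

```python
from pathlib import Path


def sanity_check(sources_dir):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    base = Path(sources_dir) / "1-base.txt"
    added = Path(sources_dir) / "5-mozilla-added"

    # Check 1: 1-base.txt must exist and must not be empty.
    if not base.is_file() or base.stat().st_size == 0:
        problems.append("1-base.txt is missing or empty")

    if not added.is_file():
        problems.append("5-mozilla-added is missing")
        return problems

    # Checks 2 and 3: read bytes so that any encoding can be inspected safely.
    for raw in added.read_bytes().splitlines():
        word = raw.strip()
        if word == b"get's":
            problems.append("5-mozilla-added contains \"get's\"")
        elif any(byte > 127 for byte in word):
            problems.append("5-mozilla-added contains non-ASCII word %r" % word)
    return problems
```

Calling `sanity_check` on the dictionary-sources directory after running the upgrade and seeing an empty list returned would correspond to the check passing.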
Flags: needinfo?(ehsan)
(Reporter)

Comment 13

2 years ago
Created attachment 8710617 [details] [diff] [review]
Part 1: Fix dictionary upgrade scripts

(Fix bug number in commit message)
Attachment #8710606 - Attachment is obsolete: true
(Assignee)

Comment 14

2 years ago
Thanks Kevin, this fixes the issue!
Flags: needinfo?(ehsan)
(Assignee)

Comment 15

2 years ago
Created attachment 8710624 [details] [diff] [review]
Part 2: Update the en-US dictionary to SCOWL 2016.01.19

Comment 17

2 years ago
Nice job. If you look at the changes to 5-mozilla-added

https://hg.mozilla.org/integration/mozilla-inbound/diff/e1208d0c3551/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/5-mozilla-added

you can see the effects of bug 301712: Lots of questionable names got removed. Bad data like "Don't" or "The" was also removed.

Sadly this file also shows the SCOWL deficiencies. While for example "guestbook", "wildcard" and "weaponize" are no longer Mozilla-added since they are now SCOWL-provided, SCOWL is still missing the derived forms (plural, possessive or conjugated forms):
 guestbook's
 guestbooks

 weaponized
 weaponizes
 weaponizing

 wildcard's
 wildcards
which Mozilla needs to add.

http://app.aspell.net/lookup?dict=en_US;words=guestbook%27s%0D%0Aguestbooks%0D%0Aweaponized%0D%0Aweaponizes%0D%0Aweaponizing%0D%0Awildcard%27s%0D%0Awildcards

Kevin, you might want to take a look at 5-mozilla-added and straighten out your data. Would you like me to raise an issue at Github for that?
(Reporter)

Comment 18

2 years ago
Jorg K: Yes, please file an upstream issue on GitHub.  Now that I have a release out, it may be a while before I have a chance to look into it.
remote:   https://hg.mozilla.org/mozilla-central/rev/3f280d724c8a
remote:   https://hg.mozilla.org/mozilla-central/rev/c5da92c5b490
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox46: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46

Comment 22

2 years ago
Comment on attachment 8710617 [details] [diff] [review]
Part 1: Fix dictionary upgrade scripts

Approval Request Comment
[Feature/regressing bug #]: No regression.
[User impact if declined]: A poorer dictionary with fewer words.
[Describe test coverage new/current, TreeHerder]: N/A.
[Risks and why]: No risk, en-US dictionary change only.
[String/UUID change made/needed]: None.

It would be good to include a richer, more up-to-date dictionary in ESR 45 (also, dare I say it, for the benefit of Thunderbird users).

This is the fourth and really final dictionary fix I'm requesting uplift for ;-)
For the other three see bug 1235506 comment #63, bug 1238031 comment #21 and bug 301712 comment #46.
Attachment #8710617 - Attachment description: Fix dictionary upgrade scripts → Part 1: Fix dictionary upgrade scripts
Attachment #8710617 - Flags: approval-mozilla-aurora?

Comment 23

2 years ago
Comment on attachment 8710624 [details] [diff] [review]
Part 2: Update the en-US dictionary to SCOWL 2016.01.19

Approval Request Comment - see previous comment.

Sorry about the NI, it'd be good to uplift before the branch date.

When uplifting, please uplift bug 301712 first, otherwise the patches won't apply.
Flags: needinfo?(sledru)
Attachment #8710624 - Flags: approval-mozilla-aurora?
status-firefox45: --- → affected
Flags: needinfo?(sledru)
Comment on attachment 8710617 [details] [diff] [review]
Part 1: Fix dictionary upgrade scripts

Please don't ni me on uplift requests. I am watching these daily.
Attachment #8710617 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8710624 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
ehsan: did this somehow land twice on central, as in comment #21 and your push in comment #25?
Flags: needinfo?(ehsan)
OK, it seems this was initially backed out yesterday and then re-checked in, which caused the two central comments here, at least when I check https://hg.mozilla.org/mozilla-central/log/7459d6a67610/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict

Updated

2 years ago
Flags: needinfo?(ehsan)

Comment 29

2 years ago
Actually, it's messier than that:
See https://hg.mozilla.org/mozilla-central/pushloghtml in chronological order:

First attempt:
3543f727e23a	Ehsan Akhgari — Bug 1240916 - Update the en-US dictionary to SCOWL 2016.01.19
Got backed out:
7f3168c2fb0a	Ehsan Akhgari — Backout bug 1240916 because the update from upstream seems to be broken a=merge

Second attempt got landed:
c5da92c5b490	Ehsan Akhgari — Bug 1240916 - Part 2: Update the en-US dictionary to SCOWL 2016.01.19 a=merge
3f280d724c8a	Kevin Atkinson — Bug 1240916 - Part 1: Fix dictionary upgrade scripts a=merge

And backed out? Why?
05d3068b8573	Ehsan Akhgari — Backout bug 1240916 because the update from upstream seems to be broken

Then the same got landed a second time:
e1208d0c3551	Ehsan Akhgari — Bug 1240916 - Part 2: Update the en-US dictionary to SCOWL 2016.01.19
7459d6a67610	Kevin Atkinson — Bug 1240916 - Part 1: Fix dictionary upgrade scripts

Comment 30

2 years ago
Oops, looking at this again:

https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic
Shows 3543f727e23a 7f3168c2fb0a c5da92c5b490

https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict
Shows 3543f727e23a 7f3168c2fb0a 3f280d724c8a

Looks like e1208d0c3551 and 7459d6a67610 got backed out??
Flags: needinfo?(wkocher)
Flags: needinfo?(ehsan)

Comment 31

2 years ago
Looks like 05d3068b8573 e1208d0c3551 and 7459d6a67610 appear in the pushlog but they never really happened.

So the real landed patches are in comment #21 and comment #25 is wrong.

Wes, can you please confirm?
Also, it would be nice if we could clarify which csets should be uplifted now.

Comment 33

2 years ago
(In reply to Carsten Book [:Tomcat] from comment #32)
> also would be nice if we could clarify which csets should be uplifted now ?

There seems to be a choice of two ;-)

https://hg.mozilla.org/mozilla-central/rev/3f280d724c8a
https://hg.mozilla.org/mozilla-central/rev/c5da92c5b490

https://hg.mozilla.org/mozilla-central/rev/7459d6a67610
https://hg.mozilla.org/mozilla-central/rev/e1208d0c3551

From what I can see, the first two are real and the second two are identical, but never landed on the files, see:

https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/dictionary-sources/install-new-dict

Frankly, I don't think Ehsan can help us here. He pushed some stuff to inbound and is not responsible for what happened on M-C. Here are the inbound pushes. They make sense.

1st attempt:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3543f727e23a
Backout:
https://hg.mozilla.org/integration/mozilla-inbound/rev/05d3068b8573
2nd attempt:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7459d6a67610
https://hg.mozilla.org/integration/mozilla-inbound/rev/e1208d0c3551
(Assignee)

Comment 34

2 years ago
So...  The reason I waited to request uplift was that I wanted to signal to the sheriff that I want to land on Aurora myself to avoid precisely the confusion here.  :-)

I'll land on Aurora myself.
status-firefox45: affected → fixed
Flags: needinfo?(ehsan)

Comment 36

2 years ago
(In reply to :Ehsan Akhgari from comment #34)
> So...  The reason I waited to request uplift was that I wanted to signal to
> the sheriff that I want to land on Aurora myself to avoid precisely the
> confusion here.  :-)
That's cool, but can you explain the two landings:

https://hg.mozilla.org/mozilla-central/rev/3f280d724c8a
https://hg.mozilla.org/mozilla-central/rev/c5da92c5b490
vs.
https://hg.mozilla.org/mozilla-central/rev/7459d6a67610
https://hg.mozilla.org/mozilla-central/rev/e1208d0c3551
(Assignee)

Comment 37

2 years ago
I have no idea, but one set is not something that I pushed: <https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=3f280d724c8a>!
I think that was just the tool we use to mark merges to m-c getting confused by the backout and relanding.
Flags: needinfo?(wkocher)
(In reply to Wes Kocher (:KWierso) from comment #38)
> I think that was just the tool we use to mark merges to m-c getting confused
> by the backout and relanding.

Actually, I remember what happened better now that I've talked about it some more with jorgk:

Ehsan originally landed one changeset on inbound, and later pushed a backout to inbound, and after that re-landed two changesets to inbound.

When I was working on getting a merge from inbound to m-c, the best candidate for the merge at the time was between the push where ehsan originally landed the one changeset and the backout. 

To make sure what I was about to merge was green, I grafted the backout commit onto what I was about to push to m-c. This generated a new revision hash for the backout commit because that's how `hg graft` works.

I then pushed that merge to m-c and did merges from all of the other integration branches onto m-c.

While trying to merge the newly updated m-c back around to the integration branches, I was hitting merge conflicts with inbound because of ehsan's relanded 2-cset push, since they touched the same files that were included in my original merge from inbound to m-c. 

To get things to cleanly merge everywhere (without me having to do a bunch of manual merge conflict resolution and potentially breaking things), I grafted ehsan's 2-cset push onto m-c. (And again, those grafted csets get new hashes since that's what graft does).

With all of the changes from ehsan's pushes to inbound effectively combined on m-c, I was able to cleanly push everything back to inbound without conflicts.

Then when tomcat did merges from inbound to m-c this morning, it included all of the original csets of the things I had grafted onto m-c yesterday, causing them to show up twice.

The two batches of csets on m-c should be identical other than the differing hashes, and the combination of the two should show as being dealt with in the "merge inbound to m-c" merge commit at the top of tomcat's merge push.
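
Editor's sketch (not part of the original comment): the reason a grafted changeset gets a new hash can be illustrated with a deliberately simplified model in which a changeset's identifier is a SHA-1 digest over its parent's identifier plus its content. This is a toy model, not Mercurial's actual node-ID computation; `toy_changeset_id`, `NULL_ID` and the "tip of ..." strings are hypothetical names used only for illustration.

```python
import hashlib

NULL_ID = "0" * 40  # placeholder parent for a root changeset in this toy model


def toy_changeset_id(parent_id, content):
    """Toy identifier: SHA-1 over the parent ID plus the changeset content."""
    data = (parent_id + "\n" + content).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


backout = "Backout bug 1240916 because the update from upstream seems to be broken"

# Two different parents: one stands in for inbound, one for m-c.
inbound_parent = toy_changeset_id(NULL_ID, "tip of inbound (hypothetical)")
central_parent = toy_changeset_id(NULL_ID, "tip of m-c (hypothetical)")

# The same change applied on top of each parent.
on_inbound = toy_changeset_id(inbound_parent, backout)
grafted_on_central = toy_changeset_id(central_parent, backout)

# Same content, different parent, therefore a different identifier.
print(on_inbound != grafted_on_central)
```

In this model the identical backout ends up with two different identifiers simply because it sits on two different parents, which matches the situation described above where the same changes appear on m-c under two sets of hashes.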



tl;dr: I made this a lot more confusing than it needed to be with all of the grafting of things, and this could've been avoided if I just skipped merging inbound to m-c yesterday and left it for tomcat to get today, but everything worked out okay in the end.