Closed Bug 1490541 Opened 6 years ago Closed 6 years ago

add words to en-US.dic

Categories

(Core :: Spelling checker, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: ananuti, Assigned: ananuti)

References

Details

User Story

merchantability - The condition, state, or quality of being merchantable; saleability. (Chiefly in legal contexts.).
https://en.oxforddictionaries.com/definition/us/merchantability

salability - capable of being or fit to be sold
https://en.oxforddictionaries.com/definition/us/salability

sucky - very bad or unpleasant
‘What a sucky, sucky way to end a sucky, sucky day.’
https://en.oxforddictionaries.com/definition/us/sucky


====
From:
https://www.merriam-webster.com/words-at-play/new-words-in-the-dictionary-september-2018

Latinx - (Capitalized) - a gender-neutral alternative to Latino or Latina
https://www.merriam-webster.com/dictionary/Latinx
https://en.oxforddictionaries.com/definition/latinx

adorbs - extremely charming or appealing; adorable
https://www.merriam-webster.com/dictionary/adorbs
https://en.oxforddictionaries.com/definition/us/adorbs

avo - an avocado
https://www.merriam-webster.com/dictionary/avo

bingeable - something you can binge
https://www.merriam-webster.com/dictionary/bingeable

biohacking, biohacker - biological experimentation (as by gene editing or the use of drugs or implants) done to improve the qualities or capabilities of living organisms especially by individuals and groups outside of a traditional medical or scientific research environment
https://www.merriam-webster.com/dictionary/biohacking
https://en.oxforddictionaries.com/definition/us/biohacking

bougie - bourgeois
https://www.merriam-webster.com/dictionary/bougie
https://en.oxforddictionaries.com/definition/us/bougie

fav - synonym for fave (a verb and noun)
https://www.merriam-webster.com/dictionary/fav
https://en.oxforddictionaries.com/definition/us/fave

fintech - Financial technology
(plural fintechs - can't use magic "S" 🎯)
https://www.merriam-webster.com/dictionary/fintech
https://en.oxforddictionaries.com/definition/us/fintech

gochujang - Korean chili paste ‘lamb cutlets with gochujang, pickled cucumber, and carrot’
https://www.merriam-webster.com/dictionary/gochujang
https://en.oxforddictionaries.com/definition/gochujang

guac - short for guacamole ‘we got chips, salsa, and guac’
https://www.merriam-webster.com/dictionary/guac
https://en.oxforddictionaries.com/definition/guac

hangry, hangrier, hangriest - angry from hunger
https://www.merriam-webster.com/dictionary/hangry
https://en.oxforddictionaries.com/definition/hangry

hophead - one who likes beer
https://www.merriam-webster.com/dictionary/hophead
https://en.oxforddictionaries.com/definition/hophead

iftar - the meal taken by Muslims at sundown
https://www.merriam-webster.com/dictionary/iftar
https://en.oxforddictionaries.com/definition/iftar

mise - from mise en place
https://www.merriam-webster.com/dictionary/mise%20en%20place

mise - the issue in a legal proceeding upon a writ of right; also : the writ itself
https://www.merriam-webster.com/dictionary/mise

mocktail - an alcohol-free cocktail
https://www.merriam-webster.com/dictionary/mocktail
https://en.oxforddictionaries.com/definition/mocktail

rando - a random person
https://www.merriam-webster.com/dictionary/rando
https://en.oxforddictionaries.com/definition/us/rando

ribbie - a spelling based on RBI in baseball land
https://www.merriam-webster.com/dictionary/ribbie
https://en.oxforddictionaries.com/definition/us/ribbie

zuke - a zucchini
https://www.merriam-webster.com/dictionary/zuke

Attachments

(1 file)

      No description provided.
Blocks: 499593
Attached patch bug1490541.patch (Splinter Review)
Attachment #9008273 - Flags: review?(ehsan)
Comment on attachment 9008273 [details] [diff] [review]
bug1490541.patch

Review of attachment 9008273 [details] [diff] [review]:
-----------------------------------------------------------------

::: extensions/spellcheck/locales/en-US/hunspell/en-US.dic
@@ +26397,5 @@
>  fink/MDGS
>  finned
>  finny
> +fintech
> +fintechs

Good observation on not using S!
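
For readers unfamiliar with the "S" remark, here is a minimal sketch of why the flag would likely misfire on this word. It assumes the `S` suffix rules in en-US.aff follow the pattern commonly found in Hunspell en_US affix files; the rule table and the helper name `apply_s_flag` are illustrative assumptions, not quoted from the patch or the tree.

```python
import re

# ASSUMPTION: approximate "S" suffix rules as commonly seen in en_US affix files.
# Each entry is (characters to strip, characters to add, condition on the word ending).
S_RULES = [
    ("y", "ies", r"[^aeiou]y$"),  # calamity -> calamities
    ("", "s", r"[aeiou]y$"),      # day -> days
    ("", "es", r"[sxzh]$"),       # box -> boxes, and any word ending in "h"
    ("", "s", r"[^sxzhy]$"),      # mocktail -> mocktails
]


def apply_s_flag(word: str) -> str:
    """Return the plural that the assumed S rules would generate for `word`."""
    for strip, add, condition in S_RULES:
        if re.search(condition, word):
            stem = word[: len(word) - len(strip)] if strip else word
            return stem + add
    return word


# "fintech" ends in "h", so the assumed rules append "es" rather than "s".
print(apply_s_flag("fintech"))   # finteches
print(apply_s_flag("mocktail"))  # mocktails
```

Under these assumed rules, `fintech/S` would accept "finteches" and still reject "fintechs", which is consistent with listing both forms explicitly as the patch does.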
Attachment #9008273 - Flags: review?(ehsan) → review+
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/84b407430f08
Bug - Add words to en-US dictionary. r=ehsan
Once upon a time we had a long discussion on what should be in the Mozilla en-US dictionary. It's basically based on the (regular) SCOWL dataset, plus some common names, accented words, plus some Mozilla terms and some extra words.

I argued back then that Mozilla should use the large SCOWL dataset and Ehsan argued against it, IIRC, basically saying that users should get a "basic" dictionary without slang or niche or speciality words, also in order to avoid misspellings.
See bug 1235506 comment #9: mask spelling of more common words for example "calender"/"calendar"
See bug 1235506 comment #20: large -> WONTFIX

Here you've added a bunch of words, most of which should *not* have been added at all. The ambition has never been to offer the most complete dictionary. If you want that and don't care about en-US, use https://addons.mozilla.org/en-GB/firefox/addon/british-english-dictionary-2/, that's the most complete one available.

Coming back to the words added. "hangry" is of course an absolute no-no, since it will now allow the spelling mistake "hangry" instead of "hungry". Particularly bad since phonetically the "u" in "hungry" is pronounced as "a" (/ˈhʌŋɡri/).

Please take the time and check those words against SCOWL. You can just paste them all
===
merchantability
salability
sucky
latinx
adorbs
bingeable
biohacking
fav
fintech
fintechs
gochujang
guac
hangry
hophead
iftar
mise
mocktail
rando
ribbie
zuke
===

here: http://app.aspell.net/lookup

You will see that most words are not recommended for their dictionary, and some are in the large dataset.

I would back this out. According to SCOWL large, only merchantability, salability, hophead and mocktail are desirable.
Flags: needinfo?(ehsan)
I agree about hangry actually, that's a good point, but do you mind filing a new bug?  No point in backing out the whole patch just because of one hunk.

(In reply to Jorg K (GMT+2) from comment #4)
> You will see that most words are not recommended for their dictionary, and
> some are in the large dataset.

I'm well aware that our en-US dictionary diverges from SCOWL, intentionally so.  I'd rather not have that debate again, since I think it's a matter of opinion.
Flags: needinfo?(ehsan)
Indeed, but could you please state some clear guidelines that lay out what words should be included. What is the definition for the Mozilla en-US dictionary? What are your plans? Only due to lack of clear guidelines there has been the discussion in the past.

IMHO the criterion for inclusion cannot be that someone presents a patch and anything is included. That will make for a very inconsistent and patchy result.

Ehsan, can you as the custodian please make sure such rules are defined and followed. Please don't put the onus of rectifying the current situation on someone who made a drive-by comment (after coincidentally seeing the changeset on inbound).

Looking at the history of manual additions:
https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/en-US/hunspell/en-US.dic
most seem very welcome, so I don't quite understand what happened this time.

Ekanan, can you please fix the problem. I think not only "hangry" should be removed, but in fact many of the words added this time, see comment #4. I'm happy to stand corrected if Ehsan comes up with some guidelines.

Personally I'd suggest to look all future additions up in http://app.aspell.net/lookup. If they are in the large dictionary, there's no problem including them. If the word has a "should include" rating of one star, I would generally not include it.
Flags: needinfo?(ananuti)
> Personally I'd suggest to look all future additions up in
> http://app.aspell.net/lookup. If they are in the large dictionary, there's
> no problem including them. If the word has a "should include" rating of one
> star, I would generally not include it.

Usually, I use that toy ONLY if they are NOT in the AmEng dictionary (OFD/M-W).
If they are, I'll bake a patch. That's my modus operandi. gabish? period.



> I agree about hangry actually, that's a good point, but do you mind filing a new bug? 
> No point in backing out the whole patch just because of one hunk.

Next time around, I'll take it away.
Flags: needinfo?(ananuti)
Ehsan, I really think we need some guidelines here. If anything in the Oxford or Merriam-Webster dictionaries should be included, then we need a different approach to making the dictionary complete.

I don't agree that Kevin Atkinson's tool is a "toy". I believe he and his SCOWL friends do analysis on Google Books regarding the frequency of words and maintain their word lists with great care. Maintaining a dictionary is really the job of a linguist, and neither you nor I have English as a native language.
https://hg.mozilla.org/mozilla-central/rev/84b407430f08
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
I researched the words added here and they seem to have some usage from what I see in web searches. I came to the conclusion adding them was OK, so please accept my apologies.

Apparently we're aiming for a certain completeness of Mozilla's en-US dictionary, so may I suggest the following:
Download SCOWL's "toy" word lists in the "normal" and "large" size from
https://sourceforge.net/projects/wordlist/files/speller/2018.04.16/wordlist-en_US-2018.04.16.zip/download
https://sourceforge.net/projects/wordlist/files/speller/2018.04.16/wordlist-en_US-large-2018.04.16.zip/download

Compare en_US.txt to en_US-large.txt using some comparison tool and note "useful" words not contained in the smaller set and most likely therefore not contained in Mozilla's dictionary. I did that for the letter "A" and found these useful words:
  Alicante (city in Spain, we already have Madrid, Barcelona, Valencia), Amazonas, Americanist, Anglicist
(all marked as spelling errors).
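
The comparison described above can be sketched roughly as follows. This is only an illustration: it assumes the two archives have been unpacked so that en_US.txt and en_US-large.txt are plain one-word-per-line files readable as UTF-8, and the function name `extra_words` is hypothetical.

```python
from pathlib import Path


def load_words(path: str) -> set[str]:
    """Read a one-word-per-line list into a set, skipping blank lines."""
    # ASSUMPTION: the file is UTF-8; undecodable bytes are replaced, not fatal.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return {line.strip() for line in text.splitlines() if line.strip()}


def extra_words(normal_path: str, large_path: str, initial: str) -> list[str]:
    """Words present only in the large list that start with `initial`."""
    normal = load_words(normal_path)
    large = load_words(large_path)
    return sorted(word for word in large - normal if word.startswith(initial))


# Example call (file names taken from the comment above):
#   for word in extra_words("en_US.txt", "en_US-large.txt", "A"):
#       print(word)
```

The output is only a candidate list; each word would still need a manual judgment on whether it is "useful" enough to propose.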

May I also mention
  enrobe, relict, residuary, enforceability
which were in the Mozilla dictionary before May 2015 (picked from bug 1235506 comment #10).
(In reply to Jorg K (GMT+2) from comment #6)
> Indeed, but could you please state some clear guidelines that lay out what
> words should be included. What is the definition for the Mozilla en-US
> dictionary? What are your plans? Only due to lack of clear guidelines there
> has been the discussion in the past.

There are *no guidelines* at this time.  I invite everyone who is interested in such guidelines to start putting in the time and expertise necessary to research and develop the kind of guidelines they would like to see here, instead of simply demanding them into existence.

Until such a day where we have some guidelines that we would follow, the process will remain as follows:

Contributions to the en-US dictionary are encouraged.  Contributors are encouraged to study resources including dictionaries, the SCOWL wordset, and any other data sources that should be helpful in the word selection process.  The reviewers will make their best effort to provide guidance.  Occasionally we will get things wrong, and when that happens we encourage bug reports so that we can fix the mistakes.

> Ehsan, can you as the custodian please make sure such rules are defined and
> followed. Please don't put the onus of rectifying the current situation
> on someone who made a drive-by comment (after coincidentally seeing the
> changeset on inbound).

I did no such thing.  I simply asked you to file a bug instead of commenting on a bug with a patch landed (common Mozilla development practice).

> Looking at the history of manual additions:
> https://hg.mozilla.org/mozilla-central/log/tip/extensions/spellcheck/locales/
> en-US/hunspell/en-US.dic
> most seem very welcome, so I don't quite understand what happened this time.

Well, mistakes happen.

Anyway, Ekanan will take care of the issue as mentioned in comment 7, so I don't think there's more to discuss here.  For future discussions, I invite you to read https://bugzilla.mozilla.org/page.cgi?id=etiquette.html again before commenting.  Thank you.