Closed Bug 339123 Opened 15 years ago Closed 15 years ago

Words that should or need to be added to the dictionary

Categories

(Core :: Spelling checker, defect)

1.8 Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.8.1beta2

People

(Reporter: u88484, Assigned: brettw)

Details

(Keywords: verified1.8.1)

Attachments

(1 file, 2 obsolete files)

Tracking bug for all words that need to be added to the dictionary.
Summary: [meta] Words that should/need to be added to the dictionary → [meta] Words that should or need to be added to the dictionary
Depends on: 308744
Whiteboard: swag: 1d
Depends on: 339899
No longer depends on: 339899
Depends on: 340176
"proven" doesn't seem to be in the dictionary
+ add "online"
Should words be added to this bug, or should a new bug be filed and added here?
Add words to this bug. I'm closing all the other bugs and listing them here:

Linux distributions:
+ debian
+ ubuntu
+ slackware
+ gentoo
Rationale: Nerds will complain if we don't add them :) and they aren't similar to real words that would confuse "normal" people. I explicitly did NOT add "suse" because it is very likely normal people will type "sues" "sue" "use", etc. and we don't want to confuse them.

Internet:
+ online (as above)
+ unsubscribe
+ blog
+ blogger
+ blogging
+ blogged
+ blogs
+ podcast
+ internet (currently it has only the capitalized version)

Random:
+ caffeinated
+ gauge

Mozilla:
+ Mozilla
+ Thunderbird
+ Firefox
+ Sunbird
+ Seamonkey

Companies and product names:
+ Google
+ eBay
+ PayPal
+ PowerPoint
Company names are a very slippery slope as there is no place to stop. These are the company names that I think are very likely to appear in web forms among the general internet population. "Microsoft" is already in the dictionary, and "Yahoo" is a real word. We can consider adding a FEW other names, but not many. If this is going to be contentious, I'd prefer adding none.
*** Bug 223322 has been marked as a duplicate of this bug. ***
*** Bug 259916 has been marked as a duplicate of this bug. ***
*** Bug 236757 has been marked as a duplicate of this bug. ***
*** Bug 214519 has been marked as a duplicate of this bug. ***
No longer depends on: 214519
*** Bug 308744 has been marked as a duplicate of this bug. ***
No longer depends on: 308744
*** Bug 340176 has been marked as a duplicate of this bug. ***
No longer depends on: 340176
No longer depends on: 223322, 236757, 259916
Summary: [meta] Words that should or need to be added to the dictionary → Words that should or need to be added to the dictionary
+ webcast
That would be SeaMonkey, with capital M, no?
And the Linux distros, shouldn't they also be capitalized (or maybe both)? And why not add SUSE in all caps?
- Seamonkey above
+ SeaMonkey
+ JavaScript
+ inline
+ http
+ ftp
+ https
Target Milestone: --- → mozilla1.8.1beta2
TCP is in the dictionary, but IP and UDP are not
Assignee: mscott → brettw
By "the" dictionary, are we talking:
- English (Australia)?
- English (Canada)?
- English (United Kingdom)?
- English (US)?
- English (New Zealand)?
- something else entirely?

We should look at them all when deciding which words to add. OK, so I'd expect a lot of these computing neologisms to be more or less dialect-neutral, but there might be a few that aren't. For example, what about "online" vs. "on-line" and "inline" vs. "in-line"? And is "caffeinated" (which strikes me as possibly a back-formation) established in most English dialects?
The dictionary allows (and suggests) some invalid words, such as "yous" and "thats".  Should I start a new bug about this?
+ focussed
I think focussed must be British spelling or something. I've never seen it and it looks very strange to me. MS Word marks it as misspelled. I wonder if there is a separate en-GB dictionary?
I'm not sure, but both Answers.com ("v., -cused or -cussed") and Dictionary.com ("v. fo·cused, or fo·cussed") list it, and neither suggests that it's a British variant.
(In reply to comment #18)
> I think focussed must be British spelling or something. I've never seen it and
> it looks very strange to me.

A quick look through some BrE dictionaries shows that both spellings are valid.

> MS Word marks it as misspelled.

Regardless of which dictionary you use?  Here (MS Office X), only the US dictionary rejects "focussed" - for UK and Australia it accepts both "focused" and "focussed".

> I wonder if there is a separate en-GB dictionary?

Separate - from what?
en-GB dictionary - of course there is.  That's exactly what I was talking about in comment 15, as you could have seen for yourself by selecting "Download More" from the spellchecker UI.
*** Bug 344372 has been marked as a duplicate of this bug. ***
+ proven
D'oh -- someone has already said "proven".  Apologies.
Attached patch Words described above (obsolete) — Splinter Review
I added all the words given above, focussed, and SUSE. SUSE looks OK because I was unable to get it to suggest SUSE for a misspelling of a real word.
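The check described above (whether a new entry could end up being suggested for a slip in an ordinary word) can be roughly approximated with an edit-distance test. This is only a sketch under stated assumptions: it is not MySpell's actual suggestion algorithm, and the function names and the sample word list are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and swaps of two adjacent letters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "suse" <-> "sues"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def close_neighbours(candidate: str, common_words: list[str], max_distance: int = 1) -> list[str]:
    """Return the common words within max_distance edits of the candidate.
    A non-empty result hints that the candidate may confuse ordinary users."""
    c = candidate.lower()
    return sorted(
        w for w in common_words
        if w.lower() != c and osa_distance(c, w.lower()) <= max_distance
    )


# Hypothetical sample of ordinary words, for illustration only.
sample = ["sues", "sue", "use", "gentle", "gent"]
suse_hits = close_neighbours("suse", sample)
gentoo_hits = close_neighbours("gentoo", sample)
```

With this sample, the lowercase "suse" sits one edit away from several everyday words while "gentoo" does not, which matches the earlier rationale for treating the two differently. A real check would still need to be run against the spellchecker itself.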
Attachment #229555 - Flags: review?(mscott)
Attachment #229555 - Flags: approval1.8.1?
The patch above is for en-US only (the one that is checked into the tree). We should apply this to the other English dictionaries. I think all the words apply to other variants, but I don't know where those live.
Whiteboard: swag: 1d → has patch
Version: Trunk → 1.8 Branch
Comment on attachment 229555 [details] [diff] [review]
Words described above

If we add these words here, all of these words are going to get clobbered the next time we update the dictionary. This file gets copied over from OpenOffice.org every time we update the spell checker. At least that's the idea anyway :)
(In reply to comment #26)
> (From update of attachment 229555 [details] [diff] [review] [edit])
> If we add these words here, all of these words are going to get clobbered the
> next time we update the dictionary. This file gets copied over from open
> office.org every time we update the spell checker. At least that's the idea
> anyway :)
> 

Heh, mid-air collision with basically the same comment. Some dictionaries on openoffice.org haven't been updated in years though, so I would think that the clobber wouldn't happen too often, but it's still a pain to make sure it doesn't before each release.

Via http://lingucomponent.openoffice.org/spell_dic.html

English (United States) en_US 2004-06-23
(In reply to comment #26)
> (From update of attachment 229555 [details] [diff] [review] [edit])
> If we add these words here, all of these words are going to get clobbered the
> next time we update the dictionary. This file gets copied over from open
> office.org every time we update the spell checker.

I thought OO wasn't updating MySpell anymore and had switched to Hunspell.

We've never updated this file before.

Even if you are right, what would you suggest doing? We really need some words added to the dictionary, and I wouldn't expect MySpell/HunSpell to add all of them (for example, Mozilla, SeaMonkey, etc). I don't see any way around it.

We could check the patch in to the directory like we do with sqlite if you are really worried.
Comment on attachment 229555 [details] [diff] [review]
Words described above

Clearing approval request until reviewed
Attachment #229555 - Flags: approval1.8.1?
Don't know if it is too late, but "spam" is also missing
I'll do a new patch with
+ spam
+ cafe
+ webmaster

I'm less sure about:
+ phishing (and variants)
FWIW, it seems that "proven" is missing from the current patch.
Phishing protection is one of the new features of Firefox, so why not add it?

Also saw that "spammer" and "spammers" are missing.
+ uninstall/uninstalling 
Attached patch A few more (obsolete) — Splinter Review
Attachment #229555 - Attachment is obsolete: true
Attachment #230586 - Flags: review?(mscott)
Attachment #229555 - Flags: review?(mscott)
Some of these words should probably be added to the source dictionary, although I'm not sure if it's still being maintained.

In any case, many of these other words (like "Mozilla") should not be added there, but we need them in our dictionary. How about if I check this patch in to the dictionary directory so we can apply it if we ever update the dictionary? We do this successfully with sqlite.
viewport
preload
preloading
JavaScript
CSS
XHTML
Do we really want to add every tech word we can come up with? If so, do we also want to add words from other jargons? If so, where do we stop?
(In reply to comment #35)
> Created an attachment (id=230586) [edit]
> A few more

Why -HTTP?
The criteria are words that are likely to be typed into webmail or other forms by a large number of our target audience. We're not adding HTML tag names, and I would also argue against "viewport" for this reason. But JavaScript is OK, as well as some words like "unsubscribe" that are often found in email.

The Linux names are a bit silly and maybe we shouldn't add them, but I checked to make sure they won't get suggested by other words, so it doesn't really hurt.
Because I added "http" which is not case sensitive. If we don't get URL identification working, this will help in URLs.
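The case behaviour referred to here (a lowercase entry such as "http" also covering "HTTP", while a capitalized-only entry such as "Internet" does not cover "internet") can be illustrated with a simplified model. This is an assumption-laden sketch, not the spellchecker's real code; the function name and the two-entry word set are hypothetical.

```python
def accepts(word: str, entries: set[str]) -> bool:
    """Simplified model of case handling in a dictionary lookup:
    - an exact match is always accepted;
    - Initial-capital input also matches an all-lowercase entry;
    - ALL-CAPS input also matches a lowercase or Initial-capital entry.
    A lowercase input never matches a capitalized-only entry."""
    if word in entries:
        return True
    lower = word.lower()
    if word == lower.capitalize():
        return lower in entries
    if word.isupper():
        return lower in entries or lower.capitalize() in entries
    return False


# Hypothetical two-entry dictionary, for illustration only.
entries = {"http", "Internet"}
results = {w: accepts(w, entries) for w in ["http", "Http", "HTTP", "internet", "INTERNET"]}
```

Under this model a separate "HTTP" entry is redundant once "http" is present, whereas "internet" has to be added explicitly because only "Internet" was listed.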
(In reply to comment #28)
> We've never updated this file before.

I've updated the en-US dictionary files several times over the years. But they have moved into the locale-specific directory without preserving CVS history, which is why you can't see that from the log.

> We could check the patch in the directory like with do with sqlite if you are
> really worried.

I was just going to suggest adding a README comment like we do for the myspell changes that can sit alongside the dictionary, but your suggestion of checking in the actual patch alongside the dictionary is probably an even better idea. I like it.

This implements the requirements of the previous discussion, checking in the patch alongside the dictionary. I also added a small readme.

There's no reason that we need to let this bake on trunk before checking into branch, so I'm requesting approval.
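The idea of keeping local additions separate so they can be re-applied after an upstream dictionary update can be sketched as follows. Note the substitution: the attachment on this bug checks in a patch file and a readme, whereas this sketch re-applies a plain word list. It assumes the MySpell .dic layout of a count on the first line followed by one entry per line (optionally with "/FLAGS"); the function name and sample data are hypothetical.

```python
def merge_words(dic_text: str, extra_words: list[str]) -> str:
    """Append local words to the text of a .dic file and refresh the count line.
    Words already present (ignoring any /FLAGS suffix) are not duplicated."""
    lines = dic_text.splitlines()
    entries = lines[1:]  # first line is the entry count, not a word
    known = {entry.split("/", 1)[0] for entry in entries}
    for word in extra_words:
        if word not in known:
            entries.append(word)
            known.add(word)
    return "\n".join([str(len(entries))] + entries) + "\n"


# Hypothetical upstream file contents and local additions.
upstream = "3\nMicrosoft\ngauge\nword/S\n"
local_words = ["Mozilla", "gauge", "SeaMonkey"]
merged = merge_words(upstream, local_words)
remerged = merge_words(merged, local_words)  # re-applying changes nothing
```

Because the merge skips words that are already present, it can be run again after every upstream refresh without creating duplicates.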
Attachment #230586 - Attachment is obsolete: true
Attachment #231249 - Flags: review?(mscott)
Attachment #231249 - Flags: approval1.8.1?
Attachment #230586 - Flags: review?(mscott)
Whiteboard: has patch → [needs review]
Comment on attachment 231249 [details] [diff] [review]
Patch with patch in it

thanks for adding the readme and the patch files Brett.
Attachment #231249 - Flags: review?(mscott) → review+
Attachment #231249 - Attachment description: Patch with path in it → Patch with patch in it
Whiteboard: [needs review] → [needs approval]
Fixed on trunk, leaving open for branch checkin.
Comment on attachment 231249 [details] [diff] [review]
Patch with patch in it

a=drivers, please land this on the MOZILLA_1_8_BRANCH.
Attachment #231249 - Flags: approval1.8.1? → approval1.8.1+
Fixed on branch.
Status: NEW → RESOLVED
Closed: 15 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Whiteboard: [needs approval]
Not sure if this is the place to complain about more missing words (if not, please let me know where the right place is).

The dictionary is missing the word "programmatically", although my dictionary tells me it's a real word. (http://www.answers.com/main/ntquery?s=programmatically)
(In reply to comment #48)
> Not sure if this is the place to complain about more missing words (if not,
> please let me know where the right place is).

No. Perhaps you should file a new bug for keeping track of new words? Adding words is a never-ending task and I'm *so* done worrying about this, so don't CC me on it :)
+ toolbar   ?

too late?
(In reply to comment #50)
> + toolbar   ?
> 
> too late?

Yes.
(In reply to comment #51)
> (In reply to comment #50)
> > + toolbar   ?
> > 
> > too late?
> 
> Yes.
> 

How are we going to maintain the dictionary in future? Are we forking from ooo, or are we going to track their releases and then apply the patch with mozilla/tech-related words every time?

Should we be suggesting our words to ooo perhaps, and jointly maintain it?
There is a bug on replacing MySpell with HunSpell. Hopefully we'll use that in Firefox 3 because it should be much better, so maybe worrying about OO is a waste of time. My goal for this release was to get the major words that were important (e.g. "Mozilla") and a few other random things that would be useful.

We don't have a policy right now. For strategy moving forward, I have no idea, and hopefully I won't be the person in charge of doing this forever.
Status: RESOLVED → VERIFIED
Keywords: verified1.8.1
The word gauge is misspelled in the spell checker/dictionary.  Gage is *not* a correct spelling, rather it was introduced into English by Toyota Motor Co. who misspelled the things on the dashboard.  Whether it shows up now in online dictionaries is not a good gauge of its correctness.

Regarding some of the other comments, focussed is correct, so is targetting, and a whole host of other words that have been re-written since MS Word decided to weigh in on spelling without knowing how to spell properly.  The rules are that if a vowel is short and precedes the last letter of a word, and leaving the last consonant single would normally be pronounced as a long vowel, you add a repeat of the last letter.  There are a lot of other rules that get ignored sometimes too, like adding a 'k' after a 'c' before 'ing' (ci is pronounced like in cigar), as in mimicking or mimicked.  Retaining an 'e' to avoid the wrong pronunciation of 'g' is also correct, as in manageable.  This business about "British variants" is malarky unless you are talking about colour or humour.
(In reply to comment #54)
> The word gauge is misspelled in the spell checker/dictionary.

The job of the spellchecking dictionary is not to implement somebody's idea of correctness. "Gage" is used quite commonly on the web, with 10s of millions of hits on Google, including many gauge-making companies. My American car also uses "gage".
Duplicate of this bug: 383970
(In reply to comment #25)
> The patch above is for en-US only (the one that is checked into the tree). We
> should apply this to the other English dictionaries. I think all the words
> apply to other variants, but I don't know where those live.

We might want to reopen this since this never happened.