Bug 339123 - Words that should or need to be added to the dictionary
Status: VERIFIED FIXED
Keywords: verified1.8.1
Product: Core
Classification: Components
Component: Spelling checker
Version: 1.8 Branch
Hardware: All All
Importance: -- normal with 1 vote
Target Milestone: mozilla1.8.1beta2
Assigned To: Brett Wilson
URL: http://lingucomponent.openoffice.org
Duplicates: 214519 223322 236757 259916 308744 340176 344372 383970
Depends on:
Blocks: SpellCheckTracking
Reported: 2006-05-24 10:32 PDT by u88484
Modified: 2007-11-14 10:08 PST
CC List: 31 users


Attachments
Words described above (6.22 KB, patch)
2006-07-17 15:23 PDT, Brett Wilson
no flags
A few more (7.62 KB, patch)
2006-07-25 09:27 PDT, Brett Wilson
no flags
Patch with patch in it (16.36 KB, patch)
2006-07-29 12:34 PDT, Brett Wilson
mscott: review+
mbeltzner: approval1.8.1+

Description u88484 2006-05-24 10:32:53 PDT
Tracking bug for all words that need to be added to the dictionary.
Comment 1 Joe Hughes 2006-06-07 13:56:58 PDT
"proven" doesn't seem to be in the dictionary
Comment 2 Brett Wilson 2006-06-12 13:01:22 PDT
+ add "online"
Comment 3 Frank 2006-06-13 03:51:07 PDT
Should words be added to this bug, or should a new bug be filed and added here?
Comment 4 Brett Wilson 2006-06-13 10:55:13 PDT
Add words to this bug. I'm closing all the other bugs and listing them here:

Linux distributions:
+ debian
+ ubuntu
+ slackware
+ gentoo
Rationale: Nerds will complain if we don't add them :) and they aren't similar to real words that would confuse "normal" people. I explicitly did NOT add "suse" because it is very likely normal people will type "sues" "sue" "use", etc. and we don't want to confuse them.
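
The concern about "suse" can be checked roughly by looking for everyday words that sit one typo away from a candidate. The following is a minimal sketch, not how MySpell actually generates suggestions: it assumes a single-edit model (one deletion, adjacent swap, substitution, or insertion), and `COMMON_WORDS` is a small hypothetical sample standing in for a real frequency list.

```python
import string


def edits1(word):
    """Return every string exactly one simple edit away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts) - {word}


# Hypothetical sample of everyday words; a real check would use a much larger list.
COMMON_WORDS = {"sue", "sues", "use", "used", "uses", "been", "bun", "gent", "slacker"}


def confusable_with(candidate, common_words=COMMON_WORDS):
    """List the common words that are a single edit away from `candidate`."""
    return sorted(edits1(candidate.lower()) & set(common_words))


print(confusable_with("suse"))    # ['sue', 'sues', 'use']
print(confusable_with("ubuntu"))  # []
```

Under this model "suse" collides with three everyday words, while the four distribution names listed above collide with none in the sample.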

Internet:
+ online (as above)
+ unsubscribe
+ blog
+ blogger
+ blogging
+ blogged
+ blogs
+ podcast
+ internet (currently it has only the capitalized version)

Random:
+ caffeinated
+ gauge

Mozilla:
+ Mozilla
+ Thunderbird
+ Firefox
+ Sunbird
+ Seamonkey

Companies and product names:
+ Google
+ eBay
+ PayPal
+ PowerPoint
Company names are a very slippery slope as there is no place to stop. These are the company names that I think are very likely to appear in web forms among the general internet population. "Microsoft" is already in the dictionary, and "Yahoo" is a real word. We can consider adding a FEW other names, but not many. If this is going to be contentious, I'd prefer adding none.
Comment 5 Brett Wilson 2006-06-13 11:53:31 PDT
*** Bug 223322 has been marked as a duplicate of this bug. ***
Comment 6 Brett Wilson 2006-06-13 11:53:59 PDT
*** Bug 259916 has been marked as a duplicate of this bug. ***
Comment 7 Brett Wilson 2006-06-13 11:54:22 PDT
*** Bug 236757 has been marked as a duplicate of this bug. ***
Comment 8 Brett Wilson 2006-06-13 11:54:37 PDT
*** Bug 214519 has been marked as a duplicate of this bug. ***
Comment 9 Brett Wilson 2006-06-13 11:55:14 PDT
*** Bug 308744 has been marked as a duplicate of this bug. ***
Comment 10 Brett Wilson 2006-06-13 11:55:23 PDT
*** Bug 340176 has been marked as a duplicate of this bug. ***
Comment 11 Brett Wilson 2006-06-13 11:57:40 PDT
+ webcast
Comment 12 Magnus Melin 2006-06-13 12:18:58 PDT
That would be SeaMonkey, with capital M, no?
And the linux distros, shouldn't they also be capitalized (or maybe both?). And why not add SUSE in all caps?
Comment 13 Brett Wilson 2006-06-14 13:16:36 PDT
- Seamonkey above
+ SeaMonkey
+ JavaScript
+ inline
+ http
+ ftp
+ https
Comment 14 Frank 2006-06-14 13:29:17 PDT
TCP is in the dictionary, but IP and UDP are not
Comment 15 Stewart Gordon 2006-06-18 05:26:15 PDT
By "the" dictionary, are we talking:
- English (Australia)?
- English (Canada)?
- English (United Kingdom)?
- English (US)?
- English (New Zealand)?
- something else entirely?

We should look at them all when deciding which words to add. OK, so I'd expect a lot of these computing neologisms to be more or less dialect-neutral, but there might be a few that aren't. For example, what about "online" vs. "on-line" and "inline" vs. "in-line"? And is "caffeinated" (which strikes me as possibly a back-formation) established in most English dialects?
Comment 16 tommyjb 2006-06-23 02:04:19 PDT
The dictionary allows (and suggests) some invalid words, such as "yous" and "thats".  Should I start a new bug about this?
Comment 17 tommyjb 2006-06-23 03:01:56 PDT
+ focussed
Comment 18 Brett Wilson 2006-06-23 07:42:31 PDT
I think focussed must be British spelling or something. I've never seen it and it looks very strange to me. MS Word marks it as misspelled. I wonder if there is a separate en-GB dictionary?
Comment 19 tommyjb 2006-06-23 09:07:42 PDT
I'm not sure, but both Answers.com ("v., -cused or -cussed") and Dictionary.com ("v. fo·cused, or fo·cussed") list it, and neither suggests that it's a British variant.
Comment 20 Stewart Gordon 2006-06-25 07:00:52 PDT
(In reply to comment #18)
> I think focussed must be British spelling or something. I've never seen it and
> it looks very strange to me.

A quick look through some BrE dictionaries shows that both spellings are valid.

> MS Word marks it as misspelled.

Regardless of which dictionary you use?  Here (MS Office X), only the US dictionary rejects "focussed" - for UK and Australia it accepts both "focused" and "focussed".

> I wonder if there is a separate en-GB dictionary?

Separate - from what?
en-GB dictionary - of course there is.  That's exactly what I was talking about in comment 15, as you could have seen for yourself by selecting "Download More" from the spellchecker UI.
Comment 21 Magnus Melin 2006-07-12 11:20:07 PDT
*** Bug 344372 has been marked as a duplicate of this bug. ***
Comment 22 tommyjb 2006-07-16 21:38:43 PDT
+ proven
Comment 23 tommyjb 2006-07-16 21:39:44 PDT
D'oh -- someone has already said "proven".  Apologies.
Comment 24 Brett Wilson 2006-07-17 15:23:45 PDT
Created attachment 229555
Words described above

I added all the words given above, focussed, and SUSE. SUSE looks OK because I was unable to get it to suggest SUSE for a misspelling of a real word.
Comment 25 Brett Wilson 2006-07-17 15:24:58 PDT
The patch above is for en-US only (the one that is checked into the tree). We should apply this to the other English dictionaries. I think all the words apply to other variants, but I don't know where those live.
Comment 26 Scott MacGregor 2006-07-17 15:27:03 PDT
Comment on attachment 229555
Words described above

If we add these words here, all of these words are going to get clobbered the next time we update the dictionary. This file gets copied over from openoffice.org every time we update the spell checker. At least that's the idea anyway :)
Comment 27 u88484 2006-07-17 15:32:40 PDT
(In reply to comment #26)
> (From update of attachment 229555)
> If we add these words here, all of these words are going to get clobbered the
> next time we update the dictionary. This file gets copied over from
> openoffice.org every time we update the spell checker. At least that's the idea
> anyway :)

Heh, mid-air collision with basically the same comment. Some dictionaries on openoffice.org haven't been updated in years though, so I would think that the clobber wouldn't happen too often, but it's still a pain to make sure it doesn't before each release.

Via http://lingucomponent.openoffice.org/spell_dic.html

English (United States) en_US 2004-06-23
Comment 28 Brett Wilson 2006-07-17 15:42:58 PDT
(In reply to comment #26)
> (From update of attachment 229555)
> If we add these words here, all of these words are going to get clobbered the
> next time we update the dictionary. This file gets copied over from
> openoffice.org every time we update the spell checker.

I thought OO wasn't updating MySpell anymore and had switched to Hunspell.

We've never updated this file before.

Even if you are right, what would you suggest doing? We really need some words added to the dictionary, and I wouldn't expect MySpell/HunSpell to add all of them (for example, Mozilla, SeaMonkey, etc). I don't see any way around it.

We could check the patch in to the directory like we do with sqlite if you are really worried.
Comment 29 Mike Connor [:mconnor] 2006-07-18 10:27:03 PDT
Comment on attachment 229555
Words described above

Clearing approval request until reviewed
Comment 30 Frank 2006-07-20 10:39:34 PDT
Don't know if it is too late, but "spam" is also missing
Comment 31 Brett Wilson 2006-07-20 10:43:49 PDT
I'll do a new patch with
+ spam
+ cafe
+ webmaster

I'm less sure about:
+ phishing (and variants)
Comment 32 tommyjb 2006-07-20 10:47:49 PDT
FWIW, it seems that "proven" is missing from the current patch.
Comment 33 Frank 2006-07-20 10:54:22 PDT
Phishing protection is one of the new features of Firefox, so why not add it?

Also saw that "spammer" and "spammers" are missing.
Comment 34 u88484 2006-07-24 15:11:55 PDT
+ uninstall/uninstalling 
Comment 35 Brett Wilson 2006-07-25 09:27:14 PDT
Created attachment 230586
A few more
Comment 36 Brett Wilson 2006-07-25 09:30:04 PDT
Some of these words should probably be added to the source dictionary, although I'm not sure if it's still being maintained.

In any case, many of the other words should not be added there (like "Mozilla"), but we need them in our dictionary. How about if I check this patch in to the dictionary directory so we can apply it if we ever update the dictionary? We do this successfully with sqlite.
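
The idea of keeping local additions next to the dictionary and re-applying them after an upstream update could also be scripted. The sketch below is an illustration only, not the patch-file approach proposed here: it assumes the usual MySpell `.dic` layout (first line holds the entry count, then one entry per line with optional `/FLAGS`), and `merge_words` plus the sample data are hypothetical.

```python
def merge_words(dic_text, extra_words):
    """Append local words to MySpell-style .dic text and refresh the count line."""
    lines = dic_text.splitlines()
    entries = lines[1:]  # the first line is the entry count, not a word
    # Compare on the bare word so an entry like "blog/S" counts as "blog".
    known = {entry.split("/", 1)[0] for entry in entries}
    for word in extra_words:
        if word not in known:
            entries.append(word)
            known.add(word)
    return "\n".join([str(len(entries))] + entries) + "\n"


# Hypothetical fresh upstream file and local word list.
upstream = "3\nMicrosoft\nInternet\nblog/S\n"
local_additions = ["Mozilla", "blog", "online"]

print(merge_words(upstream, local_additions), end="")
```

Words already present upstream are skipped, so the same local list can be re-applied after every dictionary update without creating duplicates.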
Comment 37 Dão Gottwald [:dao] 2006-07-25 13:37:22 PDT
viewport
preload
preloading
JavaScript
CSS
XHTML
Comment 38 Michiel van Leeuwen (email: mvl+moz@) 2006-07-25 13:39:31 PDT
Do we really want to add every tech word we can come up with? If so, do we also want to add words from other jargons? If so, where do we stop?
Comment 39 Dão Gottwald [:dao] 2006-07-25 13:40:18 PDT
(In reply to comment #35)
> Created an attachment (id=230586)
> A few more

Why -HTTP?
Comment 40 Brett Wilson 2006-07-25 13:45:59 PDT
The criteria are words that are likely to be typed into webmail or other forms by a large number of our target audience. We're not adding HTML tag names, and I would also argue against "viewport" for this reason. But JavaScript is OK, as well as some words like "unsubscribe" that are often found in email.

The Linux names are a bit silly and maybe we shouldn't add them, but I checked to make sure they won't get suggested by other words, so it doesn't really hurt.
Comment 41 Brett Wilson 2006-07-25 13:47:26 PDT
Because I added "http" which is not case sensitive. If we don't get URL identification working, this will help in URLs.
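
Why a lowercase "http" makes a separate "HTTP" entry redundant can be shown with a simplified case-matching model. This is a sketch under the assumption that the checker follows the usual MySpell-style rule: a lowercase entry also covers its capitalized and all-caps forms, while a capitalized entry does not cover the lowercase form. The `accepts` function is hypothetical and ignores affix flags entirely.

```python
def accepts(word, entries):
    """Simplified case rule: lowercase entries also match Capitalized and ALLCAPS input."""
    if word in entries:
        return True
    if word.isupper():
        # All-caps input may match a lowercase or a capitalized entry.
        return word.lower() in entries or word.capitalize() in entries
    if word[:1].isupper() and word[1:].islower():
        # Capitalized input (e.g. at the start of a sentence) matches a lowercase entry.
        return word.lower() in entries
    return False


entries = {"http", "Internet"}

print(accepts("HTTP", entries))      # True: covered by the lowercase entry
print(accepts("Http", entries))      # True
print(accepts("internet", entries))  # False: only the capitalized form is listed
```

The last line mirrors the earlier note that "internet" had to be added because only the capitalized version was in the dictionary.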
Comment 42 Scott MacGregor 2006-07-27 18:00:12 PDT
(In reply to comment #28)
> We've never updated this file before.

I've updated the en-US dictionary files several times over the years. But they have moved into the locale-specific directory without preserving CVS history, which is why you can't see that from the log.

> We could check the patch in the directory like we do with sqlite if you are
> really worried.

I was just going to suggest adding a README comment, like we do for the myspell changes, that can sit alongside the dictionary, but your suggestion of checking in the actual patch alongside the dictionary is probably an even better idea. I like it.

Comment 43 Brett Wilson 2006-07-29 12:34:23 PDT
Created attachment 231249
Patch with patch in it

This implements the requirements of the previous discussion, checking in the patch alongside the dictionary. I also added a small readme.

There's no reason that we need to let this bake on trunk before checking into branch, so I'm requesting approval.
Comment 44 Scott MacGregor 2006-07-30 15:37:26 PDT
Comment on attachment 231249
Patch with patch in it

thanks for adding the readme and the patch files Brett.
Comment 45 Brett Wilson 2006-07-31 11:11:19 PDT
Fixed on trunk, leaving open for branch checkin.
Comment 46 Mike Beltzner [:beltzner, not reading bugmail] 2006-07-31 20:41:31 PDT
Comment on attachment 231249
Patch with patch in it

a=drivers, please land this on the MOZILLA_1_8_BRANCH.
Comment 47 Brett Wilson 2006-08-01 09:17:11 PDT
Fixed on branch.
Comment 48 Uri Bernstein (Google) 2006-08-02 06:11:46 PDT
Not sure if this is the place to complain about more missing words (if not, please let me know where the right place is).

The dictionary is missing the word "programmatically", although my dictionary tells me it's a real word. (http://www.answers.com/main/ntquery?s=programmatically)
Comment 49 Brett Wilson 2006-08-02 08:49:43 PDT
(In reply to comment #48)
> Not sure if this is the place to complain about more missing words (if not,
> please let me know where the right place is).

No. Perhaps you should file a new bug for keeping track of new words? Adding words is a never-ending task and I'm *so* done worrying about this, so don't CC me on it :)
Comment 50 Declan Naughton 2006-08-08 20:38:49 PDT
+ toolbar   ?

too late?
Comment 51 Brett Wilson 2006-08-08 20:53:14 PDT
(In reply to comment #50)
> + toolbar   ?
> 
> too late?

Yes.
Comment 52 Cameron 2006-08-08 21:28:17 PDT
(In reply to comment #51)
> (In reply to comment #50)
> > + toolbar   ?
> > 
> > too late?
> 
> Yes.
> 

How are we going to maintain the dictionary in future? Are we forking from ooo, or are we going to track their releases and then apply the patch with mozilla/tech-related words every time?

Should we be suggesting our words to ooo perhaps, and jointly maintain it?
Comment 53 Brett Wilson 2006-08-08 21:34:05 PDT
There is a bug on replacing MySpell with HunSpell. Hopefully we'll use that in Firefox 3 because it should be much better, so maybe worrying about OO is a waste of time. My goal for this release was to get the major words that were important (e.g. "Mozilla") and a few other random things that would be useful.

We don't have a policy right now. For strategy moving forward, I have no idea, and hopefully I won't be the person in charge of doing this forever.
Comment 54 Hawley Rising 2006-11-30 17:47:04 PST
The word gauge is misspelled in the spell checker/dictionary.  Gage is *not* a correct spelling, rather it was introduced into English by Toyota Motor Co. who misspelled the things on the dashboard.  Whether it shows up now in online dictionaries is not a good gauge of its correctness.

Regarding some of the other comments, focussed is correct, so is targetting, and a whole host of other words that have been re-written since MS Word decided to weigh in on spelling without knowing how to spell properly.  The rules are that if a vowel is short and precedes the last letter of a word, and leaving the last consonant single would normally be pronounced as a long vowel, you add a repeat of the last letter.  There are a lot of other rules that get ignored sometimes too, like adding a 'k' after a 'c' before 'ing' (ci is pronounced like in cigar), as in mimicking or mimicked.  Retaining an 'e' to avoid the wrong pronunciation of 'g' is also correct, as in manageable.  This business about "British variants" is malarky unless you are talking about colour or humour.
Comment 55 Brett Wilson 2006-11-30 18:06:22 PST
(In reply to comment #54)
> The word gauge is misspelled in the spell checker/dictionary.

The job of the spellchecking dictionary is not to implement somebody's idea of correctness. "Gage" is used quite commonly on the web, with 10s of millions of hits on Google, including many gauge-making companies. My American car also uses "gage".
Comment 56 Dave Townsend [:mossop] 2007-06-10 16:33:11 PDT
*** Bug 383970 has been marked as a duplicate of this bug. ***
Comment 57 Dave Townsend [:mossop] 2007-06-10 16:35:07 PDT
(In reply to comment #25)
> The patch above is for en-US only (the one that is checked into the tree). We
> should apply this to the other English dictionaries. I think all the words
> apply to other variants, but I don't know where those live.

We might want to reopen this since this never happened.
