Closed Bug 358255 Opened 18 years ago Closed 17 years ago

The us-en dictionary is too permissive for effective spell checking

Categories

(Core :: Spelling checker, defect)

defect
Not set
normal

RESOLVED WONTFIX

People

(Reporter: VanillaMozilla, Assigned: mscott)

Details

The us-en dictionary distributed with Firefox 2.0 allows a lot of strings that are valid letter combinations according to some scheme or other, but which in reality are probably just typos or spelling errors.  A few false positives are probably preferable to false negatives.

*** T here a re n o spelling errors e en thees message. ***

here a re some example s:

almost a n y letter of the alphabet ca n b considered a word. Unfortunately, ac cording to the dictionary, the re are n o spelling error s  i n th es e thr33 sentence s.  i thin k thats 2permissive, and a bi g mis take.

In particular:

1. The dictionary includes abbreviations.

2. It includes many two-letter combinations of dubious significance:  cs, es, gs, ks, cc, mm, ff, kl, and nm.

3. All single letters of the alphabet are allowed as "words", while only "a" and "I" are actual words.

4. Many dubious plurals or third-person singulars.  For example: mis, reds, purples, thees, thats, whats, thous, withs, withouts, withins, ands.

5. Archaic words, or incorrectly spelled abbreviations.  Example:  ac
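
To make the request above concrete, here is a minimal sketch of the kind of pruning being asked for. It is not how Firefox or the OpenOffice dictionary works; `prune_wordlist`, `misspelled`, and `DUBIOUS_ENTRIES` are hypothetical names, the deny-list holds only examples quoted in this report, and the check is case-insensitive and splits on whitespace only, which is a simplification.

```python
# Hypothetical sketch, not Firefox/Hunspell code: prune a word list so that
# stray single letters and dubious entries are flagged instead of accepted.

ONE_LETTER_WORDS = {"a", "i"}  # "a" and "I", compared case-insensitively

# Assumption: a hand-maintained deny-list, seeded with examples from this report.
DUBIOUS_ENTRIES = {"cs", "es", "gs", "ks", "mis", "thees", "thats", "withs"}


def prune_wordlist(words, denylist=DUBIOUS_ENTRIES):
    """Return the word list without stray single letters or deny-listed entries."""
    kept = []
    for word in words:
        lowered = word.lower()
        if len(lowered) == 1 and lowered not in ONE_LETTER_WORDS:
            continue  # a single letter other than "a" or "I"
        if lowered in denylist:
            continue  # an entry judged more likely a typo than a word
        kept.append(word)
    return kept


def misspelled(text, wordlist):
    """Return the whitespace-separated tokens of text that are not in wordlist."""
    known = {word.lower() for word in wordlist}
    return [token for token in text.split() if token.lower() not in known]


permissive = ["a", "I", "n", "o", "t", "cs", "there", "are", "no", "errors"]
strict = prune_wordlist(permissive)

print(misspelled("there are n o errors", permissive))  # nothing is flagged
print(misspelled("there are n o errors", strict))      # "n" and "o" are flagged
```

A writer who really does want a single letter or an abbreviation can still ignore the flag or add the word to a personal dictionary.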
Indeed I often see errors after posting messages that the spell checker didn't catch. Mostly typos. Like where-were and your-you. For instance it does not recognize "its you business" as incorrect. Should be more grammar-aware.
That's a good point, but this bug report is not about grammar, which is a much more difficult problem than spelling.
No longer blocks: 119232
I think allowing single letters is the right thing, as well as allowing words with letters in them. MS Word also does this. A few of those words do look like problems.
(In reply to comment #3)
> I think allowing single letters is the right thing,...

Why?  Most of the time it is just a mistake.  For the rare use of a single letter as a word, the user can easily ignore it or add it to the dictionary.

> ... as well as allowing words
> with letters in them.

I certainly hope so.  You must have intended to say something else.
I meant digits instead of letters.

So what is this bug about? Are you just complaining about single letters? Your only other examples are grammar and less common words.

If we do any of this, I promise that there will be 10 bug reports from people complaining that it rejects valid words, or should ignore single digit words. There are already many such reports, which contradict your bug, and which I also generally WONTFIX.

There is no way to make everybody happy here (except by adding a pref, which would be ridiculous and which we're not going to do), so I'm staying out of it. You can file a bug on OpenOffice which does the dictionary maintenance. The extent of our dictionary maintenance was adding a few tech-related words that Firefox users are likely to use a lot.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WONTFIX
(In reply to comment #5)
> So what is this bug about? Are you just complaining about single letters? Your
> only other examples are grammar and less common words.

What kind of grammatical error do the following "words" represent?  What are the definitions of the following "words"?

gs, ks, mm, kl, t

I think they're just spelling errors.  I enumerated exactly the kinds of errors that this bug is about, and it is not about grammar and less common words.


> If we do any of this, I promise that there will be 10 bug reports from people
> complaining that it rejects valid words, or should ignore single digit words.

I don't think you'd get many bug reports for flagging words like "yous" and "t".  The noisiness of a few bug reports notwithstanding, it's probably a mistake to weight false positives more than false negatives.  False positives are easily ignored (or added to the dictionary), while false negatives make the writer look bad.

It's not much of a spell checker if it allows too many genuine errors to slip through.


> You
> can file a bug on OpenOffice which does the dictionary maintenance.

Good point.  Does that mean that we will not be substituting HunSpell?
mm = millimeter
(In reply to comment #6)
> Good point.  Does that mean that we will not be substituting HunSpell?

Hunspell is also from OpenOffice and as far as I can tell, they use the same English dictionary. I know the old dictionary continues to work in Hunspell, at least. Please correct me if there is a new dictionary somewhere.

(In reply to comment #7)
> mm = millimeter

Yes, drat, and cc, mm, ff, kl, nm are all abbreviations according to one scheme or another.  I should have checked all the 2-letter combinations more carefully.

However, the only dictionary justification I can find for cs, es, gs, and ks is that they are said to be plurals (in the same warped sense that "warpeds" is the plural of "warped" and "unfortunatelys" is the plural of "unfortunately" -- i.e., there are two warpeds in this sentence).

They've stretched the concept of a word so far it encompasses more typing errors than words.