Open Bug 294077 Opened 19 years ago Updated 2 years ago

use spelling checker as input to bayesian spam filter

Categories

(MailNews Core :: Filters, enhancement)

enhancement

Tracking

(Not tracked)

People

(Reporter: danm.moz, Unassigned)

Details

Suggestion:

I think you should hook up the spelling checker to the Bayesian filter. When more
than a small, configurable proportion of a message's words are unrecognized, that
should weigh heavily toward the Junk category. I believe that rule alone would
have caught 3/4 of the last 20 messages currently in my Junk folder. It's even
adaptable in its own way: if I find messages from colleagues that were wrongly
junked because of red-underlined words, I can add those words to my dictionary,
assuming my colleagues can spell.
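A minimal sketch of the proposed rule, assuming a plain set of lowercased dictionary words and a naive letters-and-digits tokenizer (real mail needs HTML, header, and charset handling that is omitted here). The names `unknown_word_ratio` and `adjusted_junk_score`, and the `threshold` and `weight` settings, are hypothetical illustrations, not Thunderbird preferences or APIs. Likewise, the way the result is blended into the Bayesian probability is an assumption, not how Thunderbird's classifier actually combines evidence.

```python
import re

# Hypothetical tokenizer: runs of ASCII letters, digits, and apostrophes.
# Obfuscated tokens such as "ch3ap" or "n0w" stay whole, so they fail the lookup.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def unknown_word_ratio(text: str, dictionary: set[str]) -> float:
    """Return the fraction of word tokens not found in the (lowercased) dictionary."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0  # empty body: no evidence either way
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)


def adjusted_junk_score(bayes_score: float, ratio: float,
                        threshold: float = 0.25, weight: float = 0.5) -> float:
    """Push a Bayesian junk probability toward 1.0 when the share of
    unrecognized words exceeds a user-configurable threshold.

    threshold and weight are hypothetical settings chosen for illustration.
    """
    if ratio <= threshold:
        return bayes_score  # few misspellings: leave the Bayesian verdict alone
    # Move the score a fixed fraction of the remaining distance toward "junk".
    return bayes_score + weight * (1.0 - bayes_score)


if __name__ == "__main__":
    words = {"meeting", "moved", "to", "thursday", "see", "you", "there"}
    ham = "Meeting moved to Thursday, see you there"
    spam = "V1agra ch3ap pr1ces, b uy n0w"
    print(unknown_word_ratio(ham, words))   # 0.0
    print(unknown_word_ratio(spam, words))  # 1.0
    print(adjusted_junk_score(0.5, unknown_word_ratio(spam, words)))  # 0.75
```

Because the boost only moves the score partway toward junk, a message that the Bayesian filter already trusts can survive a few unknown words. Adding a colleague's jargon to the dictionary lowers the ratio for later messages, which is the "adaptable in its own way" behaviour described above.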

Supporting material:

I have the impression that Thunderbird's junk filter doesn't work terribly well
any more, because spammers have caught on. One spam source sends me a lot of
junk consisting of two and three letter word fragments that get built into human
words by the HTML display engine. The fragments are short and somewhat random,
limiting the usefulness of that part of the content to the current filter. A lot
of junk uses l33t-speak and randomly inserted digits to conceal the junk words. I get
a lot of junk in languages and character sets I can't even read. In the past six
months, at least half of the junk I've been sent contains a large block of
random words, or stuff that looks to have been plagiarized from pulp lit.

That last point is a spammer's tool designed to confuse filters, and to trick
people into reducing the future effectiveness of their filter by clogging it
with random input. The spelling checker won't help with that scheme, but the
other schemes in the above paragraph are all susceptible. I include the last
example only to further illustrate that dishonest businessmen are intentionally
fighting spam filters, and that they have techniques reasonably effective against
Thunderbird's.

I think the judgment of a spelling checker has a lot of potential to help, at
least for people with friends and business associates who can spell. Its use
should, of course, be a configurable option. Maybe all by itself it'll encourage
people to learn how to spell. To get past this defense, spammers would have to
use actual recognizable words instead of fragment subterfuge. That would force
more conformity, aiding the normal operation of the filter. And once the
spammers have caught on, at least the junk I get will be a little less
offensive to the eye.

A nice grammar checker would make the mail I read even less offensive, and
should also catch the messages spammers craft with blocks of random words
specifically to trip up naive Bayesian filters. What awesome excuses I could
have for not going to meetings! "I'm sorry boss. I didn't get your memo because
according to Thunderbird you write at a third grade level."

|============== Adaptive Filter Controls ==============|
|                         ----                         |
|-------------- I wish to read mail from --------------|
|                                                      |
|       ^                                              |
| |----|O|-------------------------------------------| |
|       v                                              |
|                                                      |
|       |         |         |         |         |      |
|  Philosophers   |    Half-witted    |      Spammers  |
|                 |       Poets       |                |
|            Illiterate           University           |
|            Court Fops              Boys              |
|                                                      |
|======================================================|

Maybe just the spelling checker for now.
Related to bug 222014?
QA Contact: filters
Product: Core → MailNews Core
Severity: normal → S3