Open Bug 183171 Opened 22 years ago Updated 2 years ago

Junk Mail (Spam) filters should have a message size limit

Categories

(MailNews Core :: Filters, enhancement)

x86
Windows 2000
enhancement

Tracking

(Not tracked)

People

(Reporter: jah, Unassigned)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3a) Gecko/20021202
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3a) Gecko/20021202

The Bayesian junk mail (Spam) filters should have a user-changeable message size
limit. The reasons for this:
- Most spams are relatively small messages. Otherwise the spammers would have
trouble sending millions of them.
- Scanning large messages takes up a lot of time and system resources. And often
this is done for nothing, as large messages are typically not spam.
- With IMAP every new message has to be downloaded just to evaluate it for spam
status. This is particularly bad for slow connections and/or lots of large messages.

By having an option for skipping messages larger than x kbytes the spam
classification process would be quicker and would not use so much network
bandwidth with IMAP.
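
The proposed size check can be sketched as follows. This is only an illustration of the idea, not Mozilla code: the function name `should_scan`, the constant `DEFAULT_LIMIT_KB`, and the choice of 100 KB as the default are assumptions made for the example.

```python
# Hypothetical sketch of the proposed size limit; names and default are assumptions.
from typing import Optional

DEFAULT_LIMIT_KB = 100  # assumed default, taken from the 50/100k range suggested in this report


def should_scan(message_size_bytes: int, limit_kb: Optional[int] = DEFAULT_LIMIT_KB) -> bool:
    """Return True if the junk filter should classify a message of this size.

    limit_kb=None means the user has disabled the limit, so every message is scanned.
    """
    if limit_kb is None:
        return True
    return message_size_bytes <= limit_kb * 1024
```

With this shape, a message over the limit is never handed to the Bayesian classifier and keeps its default "not junk" status, and on IMAP the body would not need to be downloaded for classification at all.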


Reproducible: Always

Steps to Reproduce:



Expected Results:  
Skip scanning messages over x kbytes and just classify them as "not junk".
Make this an option that the user can enable and tune themselves, but provide a
reasonable default value (a 50k or 100k size limit is probably a good start).

It would be a bad idea to skip messages over a certain size. Messages with
embedded images or (worse) viruses would then not be classified.

A preferable idea (for me) would be to have a user configurable size that will
limit the amount of data in a message that is to be downloaded - e.g. "Download
first xxx K of message."

With a default value of 10k, Mozilla will have enough data to accurately classify
the message while not wasting time downloading the entire attachment.

I like the partial message download idea as well, and it would work nicely with
IMAP. I assume that the junk filter now does an IMAP FETCH BODY[] when it
downloads the message for classification. The desired size limit would be
implemented by changing this to FETCH BODY[]<0.n> where n is the desired size limit.
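
A minimal sketch of building that partial fetch item, assuming a Python client rather than the actual Mozilla IMAP code. The helper name `partial_fetch_item` is hypothetical, and the use of `BODY.PEEK[]` by default (so the message is not marked as seen) is an assumption that goes beyond what the comment proposes.

```python
# Hypothetical helper; builds the IMAP fetch item for the first n octets of a message.
def partial_fetch_item(limit_bytes: int, peek: bool = True) -> str:
    """Return an IMAP fetch item such as "(BODY.PEEK[]<0.10240>)".

    The <0.n> suffix is the IMAP partial-fetch syntax: start at octet 0, return at most n octets.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    section = "BODY.PEEK[]" if peek else "BODY[]"
    return f"({section}<0.{limit_bytes}>)"


# Possible use with the standard imaplib module (not executed here):
#   conn = imaplib.IMAP4_SSL("imap.example.org")   # placeholder host
#   conn.fetch("1", partial_fetch_item(10 * 1024))
```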
Confirming as a valid RFE.
Status: UNCONFIRMED → NEW
Ever confirmed: true
mass re-assign.
Assignee: naving → sspitzer
*** Bug 219413 has been marked as a duplicate of this bug. ***
Actually, in bug 219413 I had suggested a different mechanism. Since most spams
are small in size, how about simply *not* running the spam filters on messages
that are larger than a (user configurable) size? This way, if there's a 1M
message, it's certainly not spam, and there's no point in downloading any part
of that message to run spam filters on.
> Actually, in bug 219413 I had suggested a different mechanism. Since most spams
> are small in size

This is not always true. Everyone gets spammed in his own way :-). About half
of my spam is in the range of 50-200 KB, and overnight I receive about 10-15
messages of that size. I use a dial-up connection, so it takes the mailer a long
time to check them all in the morning.

Moreover, it's becoming more and more common among spammers to send text
ads as a single image, precisely in order to fool Bayesian-like filters.

> This way, if there's a 1M
> message, it's certainly not spam

Why not? Though rarely, I actually have encountered Chinese spam of this size.

In regard to comments #6 and #7, and to some extent this bug itself, those
approaches will encourage spammers to send larger and larger spams to defeat
those filters.

What I'd like to see is simply the ability of the spam filter to ignore
everything except the text/plain and text/html sections of the email when
scanning. This would be a great win for those of us on IMAP, since it wouldn't
download the entire message just to scan a sub-1-kbyte text/html section (well,
plus headers). On POP you are probably stuck downloading the whole message,
but at least it won't encourage behavior that exacerbates the problem.
Maybe add the ability to include other attachment types (in case they send spam
as Word docs, PDFs, RTF, etc. in response).
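
The part-selection idea could look roughly like this, using Python's standard `email` package on an already downloaded message. It is a sketch under stated assumptions, not the filter's real implementation: the function name `scannable_text` and the `allowed` parameter are hypothetical, and on IMAP the same selection would have to be done from the server's body structure so that the other parts are never fetched.

```python
# Hypothetical sketch: hand only text/plain and text/html parts to the junk filter.
from email.message import EmailMessage, Message
from typing import Iterable, List


def scannable_text(msg: Message,
                   allowed: Iterable[str] = ("text/plain", "text/html")) -> List[str]:
    """Return the decoded bodies of the parts whose content type is in `allowed`."""
    allowed_types = set(allowed)
    texts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_type() not in allowed_types:
            continue  # images, documents, and other attachments are ignored
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to UTF-8 with replacement characters.
            texts.append(payload.decode("utf-8", errors="replace"))
    return texts


# Example message: a short text body plus a large binary attachment.
example = EmailMessage()
example.set_content("Buy now")
example.add_attachment(b"\x00" * 200_000, maintype="image", subtype="png",
                       filename="ad.png")
```

Passing extra types in `allowed` would cover the suggestion of letting users add further attachment types, although formats such as PDF would need text extraction before they could be scanned.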
Product: MailNews → Core
sorry for the spam.  making bugzilla reflect reality as I'm not working on these bugs.  filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
Filter on "Nobody_NScomTLD_20080620"
QA Contact: laurel → filters
Product: Core → MailNews Core
rkent, I've not tested a huge message to gauge the pain. But in the current scheme, which has certainly changed since 2002, would there be a solid rationale for an upper bound, even if it's outrageous, like 20-30 MB?
Yes, there is a rationale for having an upper limit. It probably would not be hard to do, either.
Severity: normal → S3