[RFE] filters that score messages (anti-spam)



MailNews Core
15 years ago
5 years ago


(Reporter: Kaitlin Duck Sherwood, Unassigned)






15 years ago
It would be useful for filters to be able to add and subtract points from a spam
score.  Ideally, I'd like to be able to take actions based on the score (e.g.
"delete all messages whose spam score is less than -100"), but that depends upon
multiple actions (bug 13145).  However, even just being able to sort the inbox
by score would be useful.  Or perhaps have a View that hides all messages with a
score lower than -100.

There are *LOTS* of things that *usually* indicate that something is spam, but
unfortunately, not always.  
+ Messages that don't have me in the To or CC line are usually spam.
+ Messages with embedded images are usually spam.
+ HTML messages are usually spam.
+ Messages with more than five people in the To or CC line are usually spam (see
bug 120606).
+ Messages that contain "enhance" in the subject line are usually spam.

However, none of these are failsafe:
+ I *want* to be in the BCC line for some things, e.g. party announcements.   I
don't want to get all the "Yeah, I'll be there!" (erroneous) reply-to-alls.
+ I have a friend who embeds a gif of his signature in all his messages.
+ See above.
+ See bug 120606.
+ The VP could send a message about needing to enhance revenue.

If you can add/subtract points, then you can get much more precise spam
filtering.  If you can easily whitelist at the same time (see bug 34340 and bug
120160), then you can get deadly.  I wrote a Visual Basic macro for OL2002 to do
this, and it was extremely effective (much more effective than Outlook's
built-in junk filters).  SpamAssassin scores messages in much the same way, and
it too is lethal.
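
A minimal sketch of the add/subtract idea in Python, using the heuristics listed above. The `Message` fields, the rule table, the point values, and the whitelist bonus are all assumptions made for illustration; none of this reflects an actual MailNews filter API. Negative scores mean "more likely spam", matching the -100 threshold mentioned earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical, simplified message record (not a real MailNews type)."""
    sender: str
    subject: str
    to_cc: list = field(default_factory=list)
    is_html: bool = False
    has_embedded_image: bool = False


ME = "me@example.com"                 # assumed address of the mailbox owner
WHITELIST = {"friend@example.com"}    # assumed set of trusted senders

# Each rule is (description, predicate, points). Point values are made up.
RULES = [
    ("not addressed to me", lambda m: ME not in m.to_cc, -40),
    ("embedded image", lambda m: m.has_embedded_image, -50),
    ("HTML body", lambda m: m.is_html, -20),
    ("more than five recipients", lambda m: len(m.to_cc) > 5, -30),
    ('"enhance" in subject', lambda m: "enhance" in m.subject.lower(), -60),
    ("sender is whitelisted", lambda m: m.sender in WHITELIST, +200),
]


def spam_score(message):
    """Sum the points of every rule that matches the message."""
    return sum(points for _, matches, points in RULES if matches(message))
```

With these made-up values, a whitelisted friend who embeds a GIF signature loses 50 points for the image but gains 200 from the whitelist, so the message stays well above a -100 cutoff.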

I hear someone off in the distance saying, "but people would never take the time
to tweak the scoring like that."  True, *most* people wouldn't, but some would
-- and if there is per-filter import/export (see bug 151612), then it gets easy
for people to pass the spam filters around.  Heck, post it on a Web page and you
get close to resolving bug #126688.  Or just use SpamAssassin's published penalties.

Now, this idea was discussed at length in npm.mail in Jan 99 (see
message ID <369A4CDC.5DE16593@netscape.com>).  It was discarded partially on the
grounds of inadequate resources for 1.0 but partially on the grounds that it
would be hard to explain to people.  I understand that resources are probably
still an issue (though I may be able to help).  I think, however, that if you
explain it in terms of an integer score, it will be much easier to understand
than if you talk about it in terms of a percentage.


15 years ago
Ever confirmed: true

Comment 1

15 years ago
What about a mix of:
1. Spam scoring (http://spamassassin.taint.org) for tagging mail as spam
2. A reporting community (http://www.cloudmark.com)
3. Right-click reporting integrated with http://spamcop.net

That would result in the most impressive spam-fighting solution ever seen.

Comment 2

15 years ago
Nico, those are good ideas, but really should be in separate RFEs.

This RFE requests the filter action
      add/subtract ___ points from the spam score

What you requested is really one new filter condition:
      when {external third party} says this is spam

and a UI feature that composes and sends a message.


Comment 3

15 years ago
Added dependency on multiple actions --  for scoring to work properly, you need
to be able to accumulate penalties.  For example, you might dock a message 20
points for being from an unknown address, 50 for having an embedded image, 1000
for having Viagra in the subject line, etc.  This means multiple actions, bug 13145.

There are really two sub-RFEs that I want:

+ Filter action
     [add/subtract] ____ points from spam score

+ Filter condition:
     if spam score is [greater than/less than] ____
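
The two sub-RFEs could be sketched roughly as follows in Python. `ScoreAction`, `ScoreCondition`, the dictionary-based message, the address book, and the -100 threshold are hypothetical names and values chosen for illustration; only the 20/50/1000 penalties come from the comment above.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScoreAction:
    """Hypothetical filter action: add `points` (negative to subtract) on a match."""
    matches: Callable[[dict], bool]
    points: int


@dataclass
class ScoreCondition:
    """Hypothetical filter condition: compare the accumulated score to a threshold."""
    compare: Callable[[int, int], bool]   # e.g. operator.lt or operator.gt
    threshold: int

    def holds(self, score):
        return self.compare(score, self.threshold)


KNOWN_SENDERS = {"boss@example.com"}      # assumed address book

ACTIONS = [
    ScoreAction(lambda m: m["sender"] not in KNOWN_SENDERS, -20),
    ScoreAction(lambda m: m["has_embedded_image"], -50),
    ScoreAction(lambda m: "viagra" in m["subject"].lower(), -1000),
]


def accumulate(message, actions=ACTIONS):
    """Run every action, so penalties from several filters add up."""
    return sum(a.points for a in actions if a.matches(message))


# Assumed cutoff: treat anything scoring below -100 as spam.
is_spam = ScoreCondition(operator.lt, -100)
```

Under these assumptions, an image-bearing message from an unknown sender accumulates -70 and passes, while the same message with "Viagra" in the subject reaches -1070 and satisfies the condition. This is why the scoring action only makes sense once multiple actions can run on one message.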
Depends on: 13145


15 years ago
Depends on: 169557
mass re-assign.
Assignee: naving → sspitzer

Comment 5

14 years ago
Now that we have Bayesian spam recognition, is this still needed?

Comment 6

14 years ago
> Now having the Bayesian spam recognition, is this still needed?

It's not only about having mails divided into spam/not spam, but also into
spam/maybe spam/maybe spam if I don't have time/maybe spam if sent before
x-mas/no spam and the like.

Comment 7

14 years ago
Anti-spam is the biggest reason for scoring, so yes, the high-order bit has been
cleared.  ;-)

There *is* a lesser advantage to having scoring: prioritization (assuming you
can sort by score).  For example, you might add 10 points if the sender is in
your address book, 10 points if you are on the TO line, 10 points if you are the
only person on the TO line, 10 points if it is from your boss, 200 points if it
is from your spouse, etc.
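
As a rough Python illustration of prioritization by score, using the point values from the example above. The addresses, the dictionary-based message shape, and the function names are assumptions; sorting the inbox by score is presumed to be available.

```python
ME = "me@example.com"              # assumed address of the mailbox owner
BOSS = "boss@example.com"          # hypothetical addresses for illustration
SPOUSE = "spouse@example.com"
ADDRESS_BOOK = {BOSS, SPOUSE, "friend@example.com"}


def priority_score(message):
    """Add up the bonus points described in the comment above."""
    score = 0
    if message["sender"] in ADDRESS_BOOK:
        score += 10
    if ME in message["to"]:
        score += 10
    if message["to"] == [ME]:      # I am the only person on the TO line
        score += 10
    if message["sender"] == BOSS:
        score += 10
    if message["sender"] == SPOUSE:
        score += 200
    return score


def sort_by_priority(inbox):
    """Return messages with the highest score first."""
    return sorted(inbox, key=priority_score, reverse=True)
```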

Comment 8

13 years ago
I'm not sure adding/subtracting is the point here, is it? Admittedly it's likely
part of the solution.

The point is that there should always be at least 3 categories of message:

- assumed spam
- assumed not spam
- possibly spam but the user needs to pass judgement as part of the training process

Otherwise I don't see myself trusting Mozilla to trash my spam. I'd like to be
able to.

The obvious way of achieving and controlling this is to give messages a score
and allow the above categories to be configured on that basis.
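
One way to picture configurable categories is a pair of thresholds on the score. This Python sketch is only an illustration: the function name, the category labels, and the default cutoffs of -100 and 0 are assumptions, with negative scores taken to mean "more spam-like".

```python
def categorize(score, spam_below=-100, not_spam_above=0):
    """Map a score onto three categories using two configurable thresholds."""
    if spam_below > not_spam_above:
        raise ValueError("spam_below must not exceed not_spam_above")
    if score < spam_below:
        return "assumed spam"
    if score > not_spam_above:
        return "assumed not spam"
    # Everything in between is left for the user to judge.
    return "possibly spam"
```

Widening the gap between the two thresholds sends more mail to the user for review; narrowing it trusts the score more.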

Comment 9

13 years ago
Jim: this is about custom message filters. It's not related to the Bayesian junk

Comment 10

13 years ago
(In reply to comment #5)
> Now having the Bayesian spam recognition, is this still needed?

Yes, I can think of cases where e-mail messages need to be basically immune from
being classified as junk mail.  In particular, e-mail that is generated by a
ticket tracking system (or other process flow system).  These messages typically
have a very specific subject line format, "[ticket: XXXX-YYYY-ZZZ]" and come
from a specific address "ticket-request@example.com".  So I should be able to
create a rule that simultaneously notifies me via a pop-up window and tags
those messages as being immune from being tagged as spam/junk.

Because until you get the bayesian filter properly trained, there's a high
probability that it will tag at least some of those messages as junk.

Other cases would be filters that move mailing list messages into a sub-folder
(a moderated list is highly unlikely to have spam, yet the junk message filter
will tag some messages as spam until it is properly trained).

The problem with having messages improperly tagged as spam by the Bayesian filter
is that they forget where they came from, and untagging them and moving them
back where they belong is a manual process.  (There's a bug 208197 which
addresses this.)  So when the filter screws up, it's the user who has to clean
up the mess.

Other thoughts: there are really three possible actions:

- "don't junk-filter this message" (which is an "ignore" action), it acts only
on messages that match the filter, but doesn't retrain the bayesian filter

- "always flag this as junk" (and train the bayesian filter to see this message
as junk).  Maybe call it "junk this and junk other messages like it".

- "always flag this as not-junk" (which differs from "ignore" in that it should
auto-train the bayesian filter on this message). Or "never junk this and never
junk other messages like it"

Comment 11

13 years ago
AIUI, you can actually do that right now, without needing a point-threshold
system. User-defined filters always run before the Bayesian filter. So you can
e.g. create a filter for "Sender: ticket-request@example.com" with a "Set junk
status to: not junk" action, and it should work.
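
The ordering described here can be pictured with a small Python sketch. It assumes, as the comment suggests but does not confirm, that a junk status set by a user-defined filter is left alone by the later Bayesian pass. The function names and the stand-in classifier are hypothetical.

```python
def user_filter_status(message):
    """Hypothetical user-defined filter: returns a junk status or None."""
    if message["sender"] == "ticket-request@example.com":
        return "not junk"          # the "Set junk status to: not junk" action
    return None                    # no user filter matched


def junk_status(message, bayesian_classifier):
    """Run user filters first; fall back to the Bayesian filter only if unset."""
    status = user_filter_status(message)
    if status is not None:
        return status
    return bayesian_classifier(message)
```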
Product: MailNews → Core

Comment 12

12 years ago
The other thing that would be nice along with this would be to be able to do
some recognition of numbers in a header.

For instance, CRM114 returns headers like this:
X-CRM114-Status: Good  ( pR: 2.1939 )

If I could write a filter that would say "if the X-CRM114-Status header has a
number less than -500, delete it summarily" (the value ranges from -999 to +999)
then that would be very, very nice.  Also the ability to sort things by
spamminess would be awfully nice.  (Perhaps the ability to "label" with a
gradated colour?)
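
A rough Python sketch of what "recognizing a number in a header" might look like for the header shown above. The regular expression is an assumption based solely on that one sample line, and the function names and -500 default are illustrative.

```python
import re

# Matches the number after "pR:" as seen in the sample header; the exact
# CRM114 header grammar is not specified here, so this pattern is a guess.
PR_PATTERN = re.compile(r"pR:\s*([-+]?\d+(?:\.\d+)?)")


def crm114_score(header_value):
    """Return the pR value as a float, or None if no number is found."""
    match = PR_PATTERN.search(header_value)
    return float(match.group(1)) if match else None


def should_delete(header_value, threshold=-500.0):
    """True when the header carries a score below the threshold."""
    score = crm114_score(header_value)
    return score is not None and score < threshold
```

For the sample value `Good  ( pR: 2.1939 )` this yields 2.1939, which is not below -500, so the message would be kept. The same extracted number could serve as a sort key for ordering by spamminess.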
sorry for the spam.  making bugzilla reflect reality as I'm not working on these bugs.  filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
Filter on "Nobody_NScomTLD_20080620"
QA Contact: laurel → filters


9 years ago
Product: Core → MailNews Core