Closed Bug 34340 Opened 24 years ago Closed 22 years ago

Filter based on inclusion in specific address book

Categories

(MailNews Core :: Backend, enhancement, P5)

enhancement

Tracking

(Not tracked)

VERIFIED DUPLICATE of bug 162789
Future

People

(Reporter: BenB, Assigned: mscott)

References

Details

>------- Additional Comments From selmer@netscape.com 2000-04-03 13:12 -------
>David, this sounds similar to something I'd like.  How hard is it to be able to
>filter based on whether an address is in one of my address books?  It'd be
>really handy to be able to say "If [Sender [v]] is in [Personal AB [v]] then do
>something".
it would be cool, but won't make 6.0
Status: NEW → ASSIGNED
Priority: P3 → P5
Target Milestone: --- → M30
moving to future milestone.
Target Milestone: M30 → Future
QA Contact: lchiang → laurel
Very similar but not a dupe of bug 34340 or vice-versa.
err woops, that was bug 62687
Actually, I think that bug is a dup, but it needs its elements included in this 
bug.
Depends on: 59368
Blocks: 66425
*** Bug 62687 has been marked as a duplicate of this bug. ***
Bug 62687 is a dup of this bug if we make the logical operator a menu too, so:

If [Sender [v]] [is [v]] in [Personal AB [v]] 

where the operators are "is" and "is not"
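For illustration only, the proposed condition could be sketched in Python roughly as follows. This is not Mozilla code; the function name, the `negate` flag (standing in for the "is" / "is not" menu) and the sample addresses are all assumptions made up for the example.

```python
from email.utils import parseaddr


def sender_in_address_book(from_header, address_book, negate=False):
    """Evaluate 'Sender [is | is not] in <address book>'.

    from_header  -- raw From: header value
    address_book -- iterable of e-mail addresses (hypothetical stand-in
                    for a real address book)
    negate       -- False for the "is" operator, True for "is not"
    """
    _, addr = parseaddr(from_header)
    known = {a.lower() for a in address_book}
    found = addr.lower() in known
    return not found if negate else found


# Hypothetical sample data
personal_ab = {"alice@example.com", "bob@example.org"}

is_known = sender_in_address_book("Alice <ALICE@example.com>", personal_ab)
is_stranger = sender_in_address_book("x@spam.test", personal_ab, negate=True)
```

Comparing lowercased addresses is a simplification; a real implementation would have to decide how to treat case and multiple addresses per contact.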
Please read the latest comments in Bug 71413. A great argument for implementing
this feature is developing.
*** Bug 116931 has been marked as a duplicate of this bug. ***
*** Bug 120523 has been marked as a duplicate of this bug. ***
*** Bug 120511 has been marked as a duplicate of this bug. ***
OK, how about being able to filter based upon "Not in any address book"?

Mail could be moved aside -- if you get mail from someone you trust, you'll move
their post to the appropriate folder and add their name to the AB.
yes, both would be useful.
adding self to cc list
Blocks: 134603
*** Bug 143871 has been marked as a duplicate of this bug. ***
*** Bug 148883 has been marked as a duplicate of this bug. ***
Exactly.  Chasing spam via blocked senders is useless.  They will always be new.

Being able to move messages from only known addresses proves to be a really good
first cut at de-spam.

I've also found that getting rid of the content-type ks... to eliminate korean
junk email works very well.

Sure would be nice, to have some kind of documentation on the matching rules for
filters.  I've not been able to find it, in 1.1a.

Don
Blocks: 120160
I agree.  A major tool in the spam battle would be the filter rule:

    if [sender] [is not in address book] then _________________

A refinement is the ability to filter based on address book subgroups.
Adding self as a cc
over to mscott.
Assignee: bienvenu → mscott
Status: ASSIGNED → NEW
The ability to filter based on
  "if <sender> is not in my personal address book" move to folder "spam"

as a last filter rule (after the mailing-list filters have run)
gets rid, in my experiments, of almost 100% of all spam,
except the messages that put your own address as <sender>.

PS: the address book MUST exclude the automatically "Collected Addresses"
for this to work of course!
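A rough Python sketch of the ordering described above: mailing-list rules first, the "unknown sender" rule last, with the "Collected Addresses" book left out of the lookup. The data layout, the function name and the folder names other than "spam" are assumptions for illustration, not how Mozilla Mail stores filters.

```python
from email.utils import parseaddr


def pick_folder(headers, address_books, list_rules):
    """Return the folder a message would be filed into.

    headers       -- dict of header name -> value
    address_books -- dict of book name -> iterable of addresses
    list_rules    -- list of (header name, substring, folder) tuples,
                     checked in order before the address-book rule
    """
    # 1. Mailing-list rules run first so list traffic is never treated as spam.
    for header_name, needle, folder in list_rules:
        if needle.lower() in headers.get(header_name, "").lower():
            return folder

    # 2. Build the set of known senders, skipping auto-collected addresses.
    known = set()
    for book_name, addresses in address_books.items():
        if book_name == "Collected Addresses":
            continue
        known.update(a.lower() for a in addresses)

    # 3. Last rule: anything from an unknown sender goes to "spam".
    _, sender = parseaddr(headers.get("From", ""))
    return "Inbox" if sender.lower() in known else "spam"


# Hypothetical sample data
books = {
    "Personal AB": ["alice@example.com"],
    "Collected Addresses": ["spammer@example.net"],
}
rules = [("List-Id", "dev.lists.example.org", "Lists/dev")]

friend = pick_folder({"From": "Alice <alice@example.com>"}, books, rules)
collected = pick_folder({"From": "spammer@example.net"}, books, rules)
list_mail = pick_folder(
    {"From": "someone@example.net", "List-Id": "<dev.lists.example.org>"},
    books,
    rules,
)
```

Here `collected` ends up in "spam" even though the address is in a book, which is the point of excluding the auto-collected entries.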

PPS: This feature is implemented into a massively popular mail client
that RTM's in about 30 days, not as a filter but as a DISPLAY RULE:
  "Show only messages in this folder from... Unknown Senders"

Maybe this would be even better as a feature (Display rule instead of filter)
given that current Mozilla Mail cannot re-apply filters to mail that has
already been received :-/
While it is true that a message from a stranger is more likely to be spam, it is
NOT guaranteed. 

I get a *lot* of legitimate messages from strangers asking me questions about my
area of expertise.  

Journalists get lots of legitimate messages from strangers with story leads.

Any "info" email account that a business runs will get almost exclusively
messages from strangers.


This is why I think that a "scoring" system (see bug 151622) is important.
"Show messages from... Unknown Senders" does select almost all of the spam.

I didn't mean it wasn't including valid messages... only that it was likely to
include almost all spams at a very cheap cost in terms of implementation.

Also it has a very easy fine-tuning: when you get a "false positive"
(getting a valid email from someone who never wrote to you before)
you simply need to add that person in your address book (one click) and his/her
emails will never show up in the "Unknown Senders" again.

Looks like an easy implementation PLUS an easy concept for novice mail users
to understand, in terms of filtering strategy = win across the board.
(no wonder this idea is soon to be released in a mail program designed for
novice users)

Your scoring suggestion, on the other hand, is more for advanced mail
power users (such as both of us, admittedly).

However you'll never beat using Procmail scoring with over 700 KB of
SpamBouncer rules (what I use right now :-)   http://www.spambouncer.org/

So it's kind of a half-way in-between solution, not for newbie mail users
(fine-tuning the scoring system is too complex for them)
and not for the ultimate power-user, who will always use procmail...

PS: you will also get false positives with a scoring system...
with any NON-RBL/DNS based system you will get false positives, and will have
to manually mine the "spam" folder to rescue valid, legitimate emails.

Anyone suggesting an RBL feature for Mozilla ? Again, this may be more
efficiently done at the MTA level, or if you can't via Procmail with
SpamBouncer rules.
Yes, almost all spam will end up in the "strangers" bucket.  

And yes, it takes only one click to rescue someone from the "strangers" bucket. 

However, I worry that a "strangers" bucket by itself will be either dangerous or
ineffective.

+ If I get lots of legitimate messages from strangers, I'll have to constantly
look through my strangers bucket.  This means looking at a lot of spam, which
was exactly what I didn't want to do!

+ If I get almost no legitimate messages from strangers, then I'll eventually
start ignoring the strangers bucket -- and so miss the occasional legit one that
comes through.

****

Real-time blacklists give *lots* of false positives!  My husband had a lot of
his sent messages blocked by ORBS clients because his ISP ended up on the ORBS
list.  I guarantee that my husband's messages were not spam.

****

I believe that scoring systems can give much better accuracy than simple
pass/fail filters.  I recently tested SpamAssassin Pro on a test corpus.  Of 829
spam messages, it found 810 of them.  Of roughly 200 non-spam messages, it
flagged about 10.  
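Taking the rough figures above at face value (the non-spam counts are only approximate), the rates work out as in this small Python calculation. The variable names are made up for the example.

```python
# Figures quoted in the comment above; the non-spam numbers are approximate.
spam_total, spam_caught = 829, 810
ham_total, ham_flagged = 200, 10

detection_rate = spam_caught / spam_total        # share of spam that was found
false_positive_rate = ham_flagged / ham_total    # share of non-spam flagged

print(f"detection rate:      {detection_rate:.1%}")       # 97.7%
print(f"false positive rate: {false_positive_rate:.1%}")  # 5.0%
```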

Note, however, that those ten-ish that were flagged as spam were ambiguous even
to me: they were semi-commercial mailing lists that I cared about mildly at the
time that I set up my test corpus.   In fact, I *now* consider them spam. 
Whitelisting them would  have been easy, so the second time around, I would have
had zero false positives.

I have never seen such accuracy with strict pass/fail filters.
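To make the difference between scoring and pass/fail concrete, here is a minimal Python sketch. The rules, weights and threshold are invented for the example and are not taken from SpamAssassin or any other real rule set.

```python
def spam_score(message, rules):
    """Add up the weights of every rule whose test matches the message."""
    return sum(weight for test, weight in rules if test(message))


# Hypothetical rules: (test function, weight). No single rule reaches the
# threshold alone, so an unknown sender by itself does not condemn a message.
RULES = [
    (lambda m: not m["sender_known"], 2.0),
    (lambda m: "free" in m["subject"].lower(), 2.0),
    (lambda m: m["subject"].isupper(), 1.5),
]
THRESHOLD = 3.0  # assumed cut-off for this sketch


def is_spam(message):
    return spam_score(message, RULES) >= THRESHOLD


# Hypothetical sample messages
stranger_question = {"sender_known": False, "subject": "Question about your talk"}
shouting_stranger = {"sender_known": False, "subject": "FREE MONEY"}
friend_lunch = {"sender_known": True, "subject": "free lunch friday"}
```

With these made-up weights, `stranger_question` scores 2.0 and stays below the threshold, `shouting_stranger` scores 5.5 and is flagged, and `friend_lunch` scores 2.0 and passes. A strict "unknown sender" pass/fail rule would have caught the first message as well.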


**********

Yes, scoring filters are probably too complicated for Joe Random User to  set
up.  This is why I think it is critical that users be able to import/export sets
of filters easily, as I mentioned in bug 151612.
The tone of the 9/9/02 comments worries me a bit.  There is a lot of valid discussion of what power users versus new users need, as well as discussion on the merits of various spam filtering methodologies.

I think these valid comments are overlooking a key fact.  All mail users are different.  We don't need to try and implement the "best" solution.  Instead let's implement a series of solutions.  The end-user can pick the one they want/need.  If programming resources are stretched so thin that we need to prioritize, then let's do so.  If not, let's implement as many of these ideas as we can when the resources become available and the blocking bugs are eliminated.

A discussion on what, if anything, should be on by default however is very appropriate, but not necessarily relevant to this forum.
Brian, thanks for your calming words.  I'm actually in violent agreement with
you.  ;-)    

Yes, every user is different.  That was actually why I jumped in: while some
people will benefit from a "strangers" bucket, I was trying to point out that
this will be dangerous for a lot of users.

(I apologize if my previous comments appeared unfriendly; that was not my
intent.  Please read my comments with an emotional tone of passionate concern
and a little bit of worry.)
Kaitlin,

The calming effect of my words was actually secondary.  I was trying to emphasize that, whether it is right for everyone or occasionally "dangerous", the functionality has validity and should be implemented.

So once again it seems we are mostly in "violent" agreement.  Question is can this get coded?

bex
it has been coded. We're working on a plan for checking it in sometime after 1.2
is  done.
this is a dup of a bug which is almost finished. 

*** This bug has been marked as a duplicate of 162789 ***
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
shouldn't blocked bugs get moved to bug 162789 too then?
This bug was filed first, has more votes, has a longer CC list, and has a more
complete list of blocked bugs than bug 162789, but the other bug has a patch
attached.

I'd suggest reading http://www.paulgraham.com/spam.html

as a starting point to fight spam.
There are many interesting ideas in there that could be picked up.

Among them:

Mail User Agents could have two buttons, "delete" and "delete as spam"
that could in turn be coupled into training/feedback on the anti-spam filters.

Note that Yahoo!Mail already has this feature ("This is spam!" button
when reading email)

Another suggestion is:

Content-based spam filtering is often combined with a whitelist, a list of
senders whose mail can be accepted with no filtering. One easy way to build such
a whitelist is to keep a list of every address the user has ever sent mail to.
If a mail reader has a delete-as-spam button then you could also add the from
address of every email the user has deleted as ordinary trash.
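The "every address the user has ever sent mail to" idea from the quoted passage could be sketched in Python like this. The message representation (a dict of headers per sent message) and the function name are assumptions for illustration only.

```python
from email.utils import getaddresses


def build_whitelist(sent_messages):
    """Collect every recipient address found in the user's sent mail.

    sent_messages -- iterable of dicts mapping header name -> value
                     (hypothetical stand-in for a Sent folder)
    """
    whitelist = set()
    for headers in sent_messages:
        # Only pass along recipient headers that are present and non-empty.
        fields = [headers[name] for name in ("To", "Cc", "Bcc") if headers.get(name)]
        for _, addr in getaddresses(fields):
            if addr:
                whitelist.add(addr.lower())
    return whitelist


# Hypothetical sample data
sent = [
    {"To": "Alice <alice@example.com>, bob@example.org"},
    {"To": "carol@example.net", "Cc": "ALICE@example.com"},
]
whitelist = build_whitelist(sent)
```

Mail whose sender is in `whitelist` would then skip content filtering; the second half of the suggestion (adding senders of mail deleted as ordinary trash) could feed the same set.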

overall there are many lessons to be learnt from this article.

PS: one program already implementing the bayesian filter is bogofilter,
http://www.tuxedo.org/~esr/bogofilter/

I would suggest reading http://lwn.net/Articles/9185/
for a comparison on how effective such "learning" approaches can be
long-term compared to hard-coded spam filters such as SpamAssassin
(or SpamBouncer)

However I'm not sure this is the best place to start this discussion
(should a separate bug be opened to discuss this?)

Thanks -- N.

marking verified as a duplicate
Status: RESOLVED → VERIFIED
No longer blocks: 66425
Product: MailNews → Core
Product: Core → MailNews Core