spam blocking filters features tracking [meta]

NEW
Unassigned

Status

MailNews Core
Backend
P1
normal
19 years ago
7 months ago

People

(Reporter: (not reading, please use seth@sspitzer.org instead), Unassigned)

Tracking

(Depends on: 10 bugs, {meta})

Attachments

(4 attachments)

Block sender
Algorithmic blocking
Integration with RBL or NoCEm
Whiteboard: HELP WANTED
Summary: spam blocking filters features → [HELP WANTED]spam blocking filters features
Target Milestone: M15
marking m15.

Updated

19 years ago
QA Contact: lchiang → laurel

Comment 2

19 years ago
*** Bug 10887 has been marked as a duplicate of this bug. ***

Comment 3

19 years ago
Bulk-resolving requests for enhancement as "later" to get them off the Seamonkey
bug tracking radar. Even though these bugs are not "open" in bugzilla, we
welcome fixes and improvements in these areas at any time. Mail/news RFEs
continue to be tracked on http://www.mozilla.org/mailnews/jobs.html

Comment 4

19 years ago
Reopen mail/news HELP WANTED bugs and reassign to nobody@mozilla.org

Updated

18 years ago
Keywords: helpwanted

Updated

18 years ago
Summary: [HELP WANTED]spam blocking filters features → spam blocking filters features
Whiteboard: HELP WANTED
Target Milestone: M15

Comment 5

18 years ago
moving out there.
Target Milestone: --- → Future

Comment 6

17 years ago

*** This bug has been marked as a duplicate of 71413 ***
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → DUPLICATE

Comment 7

17 years ago
Håkan,

I don't think this is a dup. This bug is talking about adding intelligent spam
filters to the product that would automatically add messages to a junk mail
folder.  The other bug is talking about adding a feature that lets the user
manually add a sender or domain to a block list.  They have similar results but
are different in how they do things.

I'm going to reopen.  If you feel differently, let's discuss.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---

Comment 8

17 years ago
Just reread the bug.  This does mention block sender, so that is a dup of the
other bug, but the other things mentioned like Algorithmic blocking
and Integration with RBL or NoCEm are not.

Comment 9

17 years ago
Isn't RBL a server-side solution?

Comment 10

17 years ago
Marking dependent on bug #73075 (NoCeM support). I suppose algorithmic blocking
should get its own bug too and be marked as a dependency, but I'm not quite sure
what is meant by "algorithmic blocking".
Depends on: 73075

Comment 11

17 years ago
This may actually be a dup of bug #66425.

Comment 12

17 years ago
right click popup menu should show these options:

  Move To
  Copy To
  Filter To <--

The "Filter To" option would filter all mail from this sender to the selected
folder (or trashcan).

Comment 13

17 years ago
In fact, it should be a big button, right next to the "SEND BUTTON": "BLOCK
FUTURE EMAILS BY THIS SPAMMER".

/iaw

Updated

16 years ago
Blocks: 101001
In response to comment 9, RBL is usually used by servers of ISPs to actually
block mail, however on their site I found a link to a perl script to be used
with procmail to block spam using their (and two other) services. So, it would
appear that they don't mind individual users using their service.

I'll attach the perl script just as a reference. Basically, I think one just
does a DNS lookup of the mail origination using a server from mail-abuse.org and
looks at the output.

 
Created attachment 60930 [details]
Perl script to check if sender is in RBL (a spammer)

This perl script by Bjarni R. Einarsson checks to see if the sender of a mail is
in the mail-abuse.org list or the orbz.org list. It also takes as input a file
of good, but normally blocked IPs.

Comment 16

16 years ago
In response to comment #13:
That would be it:
*) Analyse the "Received" header lines to find the originating IP.
*) Remove further mails with this IP in the "Received" header.
*) Do a whois lookup to find a responsible contact for this IP.
*) Send a spam complaint to that contact.
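
The first step in the list above could be sketched roughly as follows. This is only an illustration in Python, not code from any attachment on this bug: the function names, the sample message and the IP addresses are made up, and it assumes the relay writes the peer address as a bracketed IPv4 literal. Received lines added before the first trusted hop can be forged, so the "originating" IP is at best a guess.

```python
import re
from email import message_from_string

# Matches a bracketed IPv4 literal such as "[203.0.113.7]" in a Received line.
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def received_ips(raw_message):
    """Return bracketed IPv4 addresses from Received headers, newest hop first."""
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(IP_IN_BRACKETS.findall(header))
    return ips

def originating_ip(raw_message):
    """Guess the origin as the IP in the oldest (last-listed) Received header."""
    ips = received_ips(raw_message)
    return ips[-1] if ips else None

# Hypothetical message using documentation-only addresses.
SAMPLE = (
    "Received: from mx.example.net (mx.example.net [198.51.100.20])\n"
    "\tby inbox.example.org; Mon, 1 Jan 2001 00:00:02 +0000\n"
    "Received: from unknown (HELO sender) ([203.0.113.7])\n"
    "\tby mx.example.net; Mon, 1 Jan 2001 00:00:01 +0000\n"
    "From: someone@example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
```

Calling `originating_ip(SAMPLE)` picks the address from the bottom-most Received line; the whois lookup and complaint steps are left out on purpose.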

Updated

16 years ago
No longer blocks: 101001

Comment 17

16 years ago
In reply to comment #9: well RBL is based on DNS lookup (you do the lookup on
the DNS of the chosen RBL service.. e.g. relays.ordb.org or
relays.osirusoft.com)... DNS lookup can easily be done also by Mozilla itself..
why not?
Having a "move if" - "listed in RBL given:" - "relays.ordb.org" would be quite
useful.
Shouldn't be too difficult, but I should take a look at filter code before
judging on that...

Comment 18

16 years ago
DNS lookups are async and the filter code is synchronous, so there's a big
problem right there. Not unsolvable, but not easy either.
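
To make the mismatch concrete: a synchronous filter pass has to decide on the spot, while a lookup-based rule can only decide once the answer arrives. One possible shape, sketched in Python with asyncio purely as an illustration (this is not how the Mozilla filter code is structured, and the lookup function and its "listed" set are stand-ins):

```python
import asyncio

async def fake_rbl_lookup(ip):
    """Stand-in for an asynchronous DNS query; the listed set is made up."""
    await asyncio.sleep(0)          # yield control, as a real lookup would
    return ip in {"203.0.113.7"}

async def classify_later(messages, lookup=fake_rbl_lookup):
    """Start one lookup per message, then act once every answer has arrived."""
    answers = await asyncio.gather(*(lookup(ip) for _, ip in messages))
    return {msg_id: ("junk" if listed else "inbox")
            for (msg_id, _), listed in zip(messages, answers)}

# Each entry is (message id, originating IP); both are illustrative values.
result = asyncio.run(classify_later([("m1", "203.0.113.7"), ("m2", "198.51.100.20")]))
```

The point is only that the move/keep decision is deferred until all lookups have completed, which is the part the existing synchronous filter pass has no place for.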
mscott's been expressing interest in improving our spam fighting features.
Assignee: nobody → mscott
Status: REOPENED → NEW

Comment 20

16 years ago
Cloudmark offers a spam-filtering plugin for m$ outlook. the plugin relies on a
central database maintained by the plugin's users.
the plugin adds a 'block' and 'unblock' button to the email client.

might be interesting to integrate support in mozilla or 'evangelize' cloudmark
to do it themselves.

check out http://theregus.com/content/6/25317.html for more technical details or
visit http://www.cloudmark.com/

also, is there a spam filtering tracking bug in bugzilla? there are lots of bugs
& ideas but no coordination as it seems to me. i believe spam filtering is an
important feature of a modern email client, and a feature that would make me
switch email clients.

phil

Comment 21

16 years ago
I have put together a few filters which should block at least 80-95% of spam.  I
hope it BLOCKS all spam, but that's just hopefulness.  It uses Mozilla mail's
filters, and it has several actual filters, each with their own
subfilters/rules.  In this way it's "modular" (actually not really, just more
customizable) in that if you actually subscribe to, say, "porn mail" you can turn
off the set of filters that targets porn mail.  I was disappointed that the
filters did not allow me to scan the complete headers; however, I think my
"filter package" is ready for use.  I've named it "SpamSlayer", and it needs an
easy installer package.  Since it's a ruleset, it goes in the mail folder,
located in slightly different paths on different computers.  Usually it's
something like:
c:\windows\application data\mozilla/profiles/default user/(random
string).slt/mail/(server IP address (don't know if this is the pop3 or smtp,
since both of mine are the same))

It would be great if someone could set up an installer that could find this
dynamic location and replace the current ruleset with the SpamSlayer ruleset, or
even better, APPEND the SpamSlayer ruleset to the current one (so current
filters aren't overwritten and lost).

Comment 22

16 years ago
Forgot to say this, but if anyone can work with me to set up an installer (and
I'll be releasing new versions if necessary) please email me
6tsh7a001@sneakemail.com

Comment 23

16 years ago
OK everyone...here's the URL for the SpamSlayer project, which I described above:
http://spamslayer.mozdev.org

For right now, it should solve our spam problem.

Comment 24

16 years ago
Why not integrate with something like SpamAssassin? I believe SA is recognized
as the best tool out there today for this sort of thing. Some of it could be
made relevant on the client-side:

http://spamassassin.taint.org/

For example, one aspect that would be fairly obvious to integrate would be Razor:
http://razor.sourceforge.net/ 

SpamAssassin uses Razor and other rules to eliminate spam. It is the best out
there, open source, so maybe there is someone on their project who would be keen
to help integrate it with Mozilla in some fashion.
Spamassassin is a Perl module that one can very easily pass a message to and
receive its "opinion" on whether the mail is spam or not. In fact, I've done
this with a stand-alone IMAP client to deal with the mail as I see fit. 

Razor can be installed as one of the inputs into Spamassassin.

I don't know if it would be possible to integrate SA into Mozilla (maybe a plugin or
something) but I can certainly vouch for its effectiveness.

Comment 26

16 years ago
I definitely would like to see SpamAssassin (and/or Razor) integrated into
mozilla mail.

However, with the not-so-important rating that this bug has, I don't think it
will be getting any SpamAssassin or Razor integration anytime soon.

Comment 27

16 years ago
reassigning to dmose and raising priority.

we need to start working on anti-spam features for Mozilla.  Maybe someone cc'd
on this bug knows if there's a better bug out there to serve as a Meta bug.  If
not, let's make this one and start adding other bugs as dependencies.
Assignee: mscott → dmose
Keywords: helpwanted
Priority: P3 → P1
Target Milestone: Future → mozilla1.2beta

Comment 28

16 years ago
File a new bug on the SpamAssassin integration. That way, if someone implements
new spam blocking which DOESN'T use SpamAssassin, then this bug can be marked
fixed without causing a big brouhaha.
Created attachment 95328 [details] [diff] [review]
Checkpointing: msg filtering, spam-assassin: part 1: diffs

I'm just checkpointing here; this is far from being ready to land.  However, it
includes the beginning of a straw-man interface for more generic message filters
(lots of work still required on that), as well as the beginnings of an
implementation of a filtering plugin which can read and use spam-assassin
config files.
Created attachment 95330 [details] [diff] [review]
Checkpointing: part 2: non-cvs diffs

Comment 31

16 years ago
I'm not 100% sure how this fits into your filter plugin architecture, but here's
one thing I was thinking:

Right now there is only one type of filter, that uses the generic filter dialog
to match on headers, etc. It would be cool if you could register your own type
of filter, such that you could create one or more of these other types of
filters.. so in your filter list, you might see your standard list of filters,
but one of them might be "Spam Assassin" or something.

Since the filter plugin is merely a specific type of filter, you could have
multiple instances of the plugin, such as 2 SpamAssassin plugins - one that
moves high-threshold spam (i.e. stuff that SpamAssassin most certainly knows is
spam) to one folder (maybe the trash), and one that moves low-threshold spam
(i.e. stuff it's not sure about) to another (maybe a 'might be spam' folder).

Each filter could have its own settings stored in the filter rules.dat file, so
that you could store instance-specific data.

Then, you could do stuff like select the filter and edit it, but when you edit
it, chrome specific to that filter would appear - i.e. like the SpamAssassin dialog.

As for actually doing something with the filter, there should be lots of options
beyond just moving it to a folder, etc. It would be nice if there were some
generic interface like nsIMsgFilterSink where you could do macro-operations on
the message, such as forward, reply, maybe edit it and reply, etc. SpamAssassin
would mostly call things like sink.moveMessageTo(spamFolder) and so forth.

JavaScript filters could be implemented in very much the same way. You could
have one or more JS filters.. the JS filter would be a function that gets called
with the message headers, maybe other details about the message or message
parts, and the message sink object. Then the implementation of this function
would perform operations on the sink. 

Whitelist filters would also work this way - you could have one or more
whitelist filters that correspond to an entire address book, or maybe just a
mailing list. Some smart wizard could even set up your whitelist filters for
you. Each whitelist filter would correspond to a different set of people in your
address book, and each one could have a specific action associated with it.
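
A rough model of the idea above (several instances of one filter type, each with its own settings, all acting through a sink object) might look like this. It is a Python sketch for discussion only: the class and method names are hypothetical and are not the nsIMsgFilterSink or nsIMsgFilterPlugin interfaces, and the spam score is assumed to be computed elsewhere.

```python
class RecordingSink:
    """Hypothetical message sink; it only records the requested operations."""
    def __init__(self):
        self.moves = []

    def move_message_to(self, message_id, folder):
        self.moves.append((message_id, folder))

class ScoreFilter:
    """One filter instance: moves a message when its score reaches a threshold."""
    def __init__(self, threshold, folder):
        self.threshold = threshold
        self.folder = folder

    def apply(self, message, sink):
        if message["score"] >= self.threshold:
            sink.move_message_to(message["id"], self.folder)
            return True     # hit: stop running later filters
        return False

def run_filters(filters, message, sink):
    """Run filters in list order and stop at the first one that acts."""
    for f in filters:
        if f.apply(message, sink):
            break

# Two instances of the same filter type with different per-instance settings.
filters = [ScoreFilter(10.0, "Trash"), ScoreFilter(5.0, "Might be spam")]
sink = RecordingSink()
for message in ({"id": 1, "score": 12.5}, {"id": 2, "score": 6.0}, {"id": 3, "score": 1.0}):
    run_filters(filters, message, sink)
```

A whitelist or JS filter would fit the same shape: another class with an `apply(message, sink)` method and its own stored settings.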

Comment 32

16 years ago
I think this bug is missing the big issue:  we need to include the mozilla user
community in feedback.  This is so easy to do: a button next to the "STOP" sign
would send a message to a user-selectable anti-spam site, with the base
information of the particular email being flagged spam.

This way, the anti-spam site could much faster detect new spam schemes.

Implementation Cost: Low.
Potential Value: Very High.

/iaw

Comment 33

16 years ago
That's Yet Another Bug (and another filter type) and you should file a bug on
that. And your cost analysis seems quite weak. Sure, the client-side work
doesn't sound hard, but think about all the details beyond just adding the
button in the UI. I mean, who runs this service? Are there well-known anti-spam
services out there? How does the client know to block future spam? How does it
match current e-mail to spam on the service without downloading a whole bunch of
spam from the service and without compromising the user's privacy? How does one
handle failure (spam service unavailable, etc.)? However it's done, it sounds
expensive to me.

But wait! don't answer me here. File another bug, make it dependent on this one.
alecf: in fact, this is quite similar to what I had in mind, but hadn't yet written
down.  If you've no objections, I'd like to start with your text and whip it up
into a strawman proposal in HTML that we can go with.

The spam-assassin bits that I've got running so far are implemented exactly
along the lines you suggest.  In particular, I've made a simple
nsIMsgFilterPlugin interface, and modified the IMAP message header fetching code
to call out to it once for each message.  I've implemented this interface for
spam assassin as a JS component, because much of the spam-assassin stuff is
regexp based.  Once it decides that something is spam, it just uses the
existing nsIMsgFilterHitNotify::ApplyFilterHit info to tell the IMAP (or POP or
whatever) code how to deal with the hit.
Status: NEW → ASSIGNED

Comment 35

16 years ago
sure - go ahead and use whatever part of that you need!

Comment 36

16 years ago
Check out bug 163188 (Bayesian filtering - very cool!) - looks like this bug
should be dependent on it?

Comment 37

16 years ago
http://www.mozilla.org/mailnews/specs/filters/#Junk

After discussing w/putterman, idea of what UI might look like.
Maybe "Always accept messages from people in my AB" has a dropdown to select a 
specific AB. That AB, becomes a White List of sorts.

Comment 38

16 years ago
A thought re "Always accept messages from people in my address books":
1) At the very least, this should exclude the 'collected addresses' address book!
2) More flexibility would be possible if this was: "Always accept messages from
people in the following address book/group:" with a drop-down which enables
people to select a particular address book, or just a particular list within an
address book
3) Alternatively, this could be "Always accept messages from people in my 'white
list' address book" - this address book would be a new top-level book, at the
same level as 'Personal Address Book' and 'Collected addresses'.

Also, I think there's a bit too much granularity on the 'sensitivity' slider -
nobody is really going to spend enough time tweaking and analysing to see the
difference between e.g. 35 and 36. I suggest a 0-10 scale - enough to get
reasonably fine degree of control without taking too much trial and error to
find your 'optimum' level.

Other than that, looks pretty good.

Comment 39

16 years ago
>2) More flexibility would be possible if this was: "Always accept messages from
>people in the following address book/group:" with a drop-down which enables
>people to select a particular address book, or just a particular list within an
>address book

Agree a dropdown address book selector would be better.

>Also, I think there's a bit too much granularity on the 'sensitivity' slider.

Agree.

Comment 40

16 years ago
Making Mozilla intelligently block spam mail would be a good feature.
Blocks: 168902

Updated

16 years ago
Depends on: 169557

Updated

16 years ago
Depends on: 167561

Updated

16 years ago
Depends on: 169638

Comment 41

16 years ago
I suggest also adding a list of "non-spam domains" to the screen. This way
people can specify domains of sites / companies they work for never to be
filtered as spam.

Updated

16 years ago
Depends on: 156744

Comment 42

16 years ago
bug 156744 seems like an easy solution for this bug.  TMDA is an open source
project on sourceforge, and so all the code is there, you just would have to
"port" it probably.  It also is designed so that it absolutely blocks 100% of
spam with virtually no false positives.

Comment 43

16 years ago
Regarding Comment #38: How about turning the interface around - would it be so
much more difficult to add a "whitelist" or "always accept email" property in
individual addressbook cards, and in the top-level properties of an addressbook
- especially in ldap-based addressbooks that are used for corporate
addressbooks!  The idea of no-spam domains much like we currently have
uses-HTML-domains sounds like a good idea.

Speaking as a corporate site, I would like to make sure that a SpamAssassin-style
filter not be the ONLY filter available because, as someone else pointed out, it
could present significant privacy/security issues.  bug 163188 seems like a good
alternative method.

Also, does the SA software use any special ports that might get blocked by a
firewall?

Comment 44

16 years ago
we're adding a whitelist feature, based on the personal address book of your
choice, IIRC.

Comment 45

16 years ago
Bienvenu - what if you have multiple addressbooks that you want to function as
whitelists?  By setting a property in the addressbook, you have more flexibility
than just selecting one AB (which it appears is all the drop-down would allow).
With a single drop-down you wouldn't be able to (e.g.) set both your PAB and an
ldap-based AB as being on a whitelist.  So maybe instead of a drop-down menu,
you have an edit menu that allows you to check/uncheck your ABs, or even a
button (that greys out the edit menu) that says all ABs except collected are
whitelists.

Or are you saying that design has moved past the simple dropdown mentioned
above, and you can select _multiple_ "personal address book of your choice"?

Comment 46

16 years ago
I don't think domain-based whitelists would work very well because spammers
often fake email to you coming from your own domain or even from your own
address. (Obviously if you are sending a spam to john@foo.com it's not hard to
have your spambot mark it from jane@foo.com.) Address book-based whitelists are
much better because then the spammer needs to know both your address and the
address of a person on your whitelist--a much-harder (although not impossible)
combination.
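
The difference between the two kinds of whitelist can be shown in a few lines. This is an illustrative Python sketch with made-up addresses and hypothetical function names: a forged sender at a whitelisted domain passes the domain check but not the address-book check.

```python
from email.utils import parseaddr

def sender_address(from_header):
    """Extract the bare, lower-cased address from a From header value."""
    return parseaddr(from_header)[1].lower()

def on_address_whitelist(from_header, address_book):
    """True only if the exact sender address is in the address book."""
    return sender_address(from_header) in address_book

def on_domain_whitelist(from_header, domains):
    """True if the part after the last '@' is a whitelisted domain."""
    return sender_address(from_header).rpartition("@")[2] in domains

address_book = {"jane@foo.com"}      # illustrative address-book contents
domains = {"foo.com"}                # illustrative "non-spam domain" list
forged = "Great Offers <random@foo.com>"
```

Here `on_domain_whitelist(forged, domains)` accepts the forged sender while `on_address_whitelist(forged, address_book)` does not; the spammer would have to guess `jane@foo.com` specifically.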

Comment 47

16 years ago
Hi,
I'm currently using 1.1 so apologies if this has been dealt with in 1.2alpha...I
have various filters set up which look at the message body. One string I use is
'You received this email because you signed up with' but this wasn't caught
today as the message source is base64 encoded with content type of text/html -
mozilla seems to decode and display that fine but the filter is run on the
non-decoded version.
Thanks. 
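
What the report above is asking for amounts to undoing the Content-Transfer-Encoding before running a body match. A small Python sketch of that order of operations, using the standard email package; the function names and the sample message are made up, and this says nothing about how the Mozilla filter code is organised.

```python
import base64
from email import message_from_string

def decoded_text_parts(raw_message):
    """Yield each text/* part with its transfer encoding and charset undone."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)   # undoes base64 / quoted-printable
            charset = part.get_content_charset() or "utf-8"
            yield payload.decode(charset, errors="replace")

def body_contains(raw_message, needle):
    """Case-insensitive substring match against the decoded body text."""
    return any(needle.lower() in text.lower() for text in decoded_text_parts(raw_message))

PHRASE = "You received this email because you signed up with"
ENCODED = base64.b64encode(
    ("<html><body>" + PHRASE + " us.</body></html>").encode("ascii")
).decode("ascii")
SAMPLE = (
    "Subject: offer\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "Content-Transfer-Encoding: base64\n"
    "\n" + ENCODED + "\n"
)
```

The phrase does not appear anywhere in the raw `SAMPLE`, which is why a filter run on the undecoded source misses it, but `body_contains(SAMPLE, PHRASE)` finds it. HTML markup inside a phrase would still need separate handling.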

Updated

16 years ago
Depends on: 71413

Comment 48

16 years ago
Looks like we missed our milestone on this bug.  What kind of system are we
aiming at implementing?  I've scanned the comments and can't seem to see a
unified goal.

Updated

15 years ago
Depends on: 163188
Depends on: 179966
*** Bug 179984 has been marked as a duplicate of this bug. ***
Depends on: 179984
Depends on: 179997
Depends on: 179999

Updated

15 years ago
Depends on: 180004
Depends on: 180010
Depends on: 180029

Updated

15 years ago
Depends on: 179162
Depends on: 180153

Updated

15 years ago
Depends on: 180167
Depends on: 180215
Depends on: 180231

Updated

15 years ago
Depends on: 180119
Depends on: 180477
Depends on: 179012

Updated

15 years ago
Depends on: 120599

Updated

15 years ago
Depends on: 180857

Updated

15 years ago
Depends on: 181193

Updated

15 years ago
Depends on: 181394

Updated

15 years ago
Depends on: 181531

Updated

15 years ago
Depends on: 181953

Updated

15 years ago
Depends on: 182381
Depends on: 182386

Updated

15 years ago
Depends on: 183613

Updated

15 years ago
Depends on: 181534

Updated

15 years ago
Depends on: 182109

Comment 50

15 years ago
Comment 46 says that domain whitelists are bad because spammers fake the sender
as being from your domain, and says "specific email address" whitelists are 
better. 

However, the whitelist discussion ignores the fact that the most common address
that spammers spoof is the addressee's address (i.e. *you*), and that's perhaps
one of the most common ones that people would want to be in their whitelist (I
send mail to myself all the time). So something more complicated would seem to
be needed.

Also, this whole feature doesn't seem to be working for me in the 2002120604
build. Nothing is marked as spam (though I have logged several emails as spam,
and bunch of them as "non spam"). 

Also, I can't seem to find my junkmail.js file, there doesn't seem to be
anything in my junk filter log (though I did turn it on), and I don't see a file
anywhere that obviously contains the Bayesian parameters. Is there any
preliminary documentation/discussion on this kind of stuff I could look at?

I'm really happy to see this feature, BTW. 

Comment 51

15 years ago
Check out "training.dat".

Comment 52

15 years ago
Two more thoughts on filtering:
1)  Instead of/in addition to a white book, how about adding an 'automatically
accept email from this person' checkbox to each card in your address book?
2)  Allow filters to work on subfolders.  I just submitted this as bug 184080
before reading this thread.  In brief:  Filters A, B, C... filter email to a
quarantine folder, deleted and so on.  Filter Z searches through your
quarantine, deleted, ... and based on its criteria (such as friends' email
addresses) moves mail back to your inbox.  
It's another way of implementing a white list and catches mail you want to read
that may otherwise be deleted by your overzealous mail filters.

Comment 53

15 years ago
BTW, I don't know how feasible this would be with the current Bayesian filter
mechanism, but it would be really nice (for the curious among us if for no other
reason) if the filter log indicated *why* the spam was filtered rather than just
indicating that it was.

The simple fact of the mail being filtered seems adequately conveyed by the junk
mail icon and whether it's been moved into your spam box. So I'd say that the
current filter log is pretty useless. 
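
One way a log could say *why*: record the tokens whose trained probabilities pushed the message hardest in either direction. The sketch below is a generic naive-Bayes-style illustration in Python, not the actual Mozilla filter; the function names, the clamping constants and the training counts are all assumptions.

```python
def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Per-token spam probability from training counts, clamped away from 0 and 1."""
    s = spam_counts.get(token, 0) / n_spam
    h = ham_counts.get(token, 0) / n_ham
    if s + h == 0:
        return 0.5            # unseen token: no evidence either way
    return min(0.99, max(0.01, s / (s + h)))

def explain(tokens, spam_counts, ham_counts, n_spam, n_ham, top=3):
    """Return the tokens whose probability lies farthest from the neutral 0.5."""
    scored = {t: token_spam_probability(t, spam_counts, ham_counts, n_spam, n_ham)
              for t in set(tokens)}
    return sorted(scored.items(), key=lambda kv: (-abs(kv[1] - 0.5), kv[0]))[:top]

# Made-up training data: token counts over 100 spam and 100 non-spam messages.
spam_counts = {"loan": 40, "free": 30, "meeting": 1}
ham_counts = {"meeting": 50, "free": 10, "loan": 0}
reasons = explain(["free", "loan", "today", "meeting"],
                  spam_counts, ham_counts, 100, 100, top=2)
```

A log line built from `reasons` would list "loan" as the strongest spam indicator and "meeting" as the strongest non-spam indicator for this message.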

Comment 54

15 years ago
BTW, please see the related bug #187044, suggesting that it'd be nice to have a
challenge/response anti-spam mechanism.  See the bug report for more.

Updated

15 years ago
Depends on: 184948
taking.
Assignee: dmose → sspitzer
Status: ASSIGNED → NEW

Comment 56

15 years ago
I'm not sure whether this should go in this bug or another one, but speaking of
the "move to folder" feature, something I find annoying about my Junk mail
folder is that when I look at it, there's a (significant) delay while the spam
filter appears to try to recategorize all the email.

Maybe it's just parsing through the headers looking for some kind of "Mozilla
thinks this is spam" flag, but it seems to take too much time for that (although
I *do* have my email stored on another machine so it gets backed up automatically). 

Seems to me that mail could get marked "junk" in the index file, but perhaps
that doesn't get regen-ed until you open the folder either... don't know enough
about the internals...

Comment 57

15 years ago
I'm working on code that will prevent the re-classification of messages moved to
the junk folder, if your imap server supports user-defined keywords. The problem
is that when we move an imap message, we really don't know what the message will
be in the destination folder, due to the way imap works, so we can't "pre-mark"
it as junk, other than by using imap keywords on the imap servers that support
keywords.
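
For anyone wondering how a client can tell whether a server supports user-defined keywords: per RFC 3501, the server advertises `\*` in the PERMANENTFLAGS response code when clients may create new keywords in the selected mailbox. A minimal Python sketch of that check; the function names and the two sample response lines are illustrative, and this is not the code being worked on for this bug.

```python
import re

def permanent_flags(response_line):
    """Extract the flag list from an IMAP PERMANENTFLAGS response code."""
    match = re.search(r"\[PERMANENTFLAGS \(([^)]*)\)\]", response_line)
    return match.group(1).split() if match else []

def supports_user_keywords(response_line):
    """RFC 3501: a literal \\* in PERMANENTFLAGS means clients may create new keywords."""
    return "\\*" in permanent_flags(response_line)

# Sample untagged responses to SELECT (made up for the example).
WITH_KEYWORDS = r"* OK [PERMANENTFLAGS (\Deleted \Seen \*)] Limited"
WITHOUT_KEYWORDS = r"* OK [PERMANENTFLAGS (\Deleted \Seen)] Limited"
```

Only when `supports_user_keywords(...)` is true could a junk marker survive the move as a keyword on the message; otherwise the client has nothing to attach it to on the server side.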

Comment 58

15 years ago
Re: comment 57, how about just not reclassifying folders that aren't inboxes
(I'm presuming Moz knows which those are, because it does a "get new mail"
whenever I select my Inbox). It didn't occur to me to mention that it was an
IMAP folder... good catch. 

However, I can't see any real benefit (and quite a bit of annoyance/potential
lossage) to running spam filters on secondary folders. Either the user moved it
there (via a filter or by hand), or the Junk mail feature did, and in neither
case does it make sense to reclassify it. 

To clarify my "potential lossage" comment: perhaps there are some desirable
emails that look just like some class of spam, and a user would want to set up a
filter to move those to a "safe" folder before spam classification. If we then
reclassify those secondary folders, data loss occurs. 

The only good thing it would do is show off how good the filter is (or not :-)
by displaying the "junk" tag in the message summary. This doesn't seem that
useful to me unless requested by the user for some special purpose.

Also, would this problem be solved by moving my junk into a local folder instead
of an IMAP one?

Comment 59

15 years ago
The reason we run spam filters on secondary folders is that your mail
filters can filter mail to secondary folders, and this happens before the spam
filter runs (and thus before the message body is downloaded). For example, if
you have a filter that moves all messages addressed directly to you to a folder,
and you get spam sent directly to you, you want the spam filters to run on that
folder when you open it to catch the spam sent directly to you.

The alternative is to run the spam filters first, and we don't do that (we might
want to reconsider that, but not for this release)

Comment 60

15 years ago
The 20th - 25th comments of bug 181394 raise an extremely important point: we
have to do something about the "obviousness" of using the junk mail feature. If
people who are savvy enough to be downloading nightlies and entering comments in
b.m.o. have a hard time figuring out that you have to train ~50-100 emails as
spam before it starts working, how will any normal user have a hope?

Possible solution: pre-learn the training data (may be unpopular with this
crowd)... any others? Evangelization isn't likely enough, because no one reads
documentation...

Comment 61

15 years ago
RE comment #57:
> I'm working on code that will prevent the re-classification of messages 
> moved to the junk folder, 

How about not re-classifying *any* messages (no matter where they are moved to)?

> if your imap server supports user-defined keywords. 

Or perhaps by giving each message in mozilla a "JunkClassified(Y/N)" flag.

> The problem is that when we move an imap message, we really don't know 
> what the message will be in the destination folder, due to the way imap 
> works, so we can't "pre-mark" it as junk, other than by using imap keywords 
> on the imap servers that support key words.

So it is impossible to track a message's "JunkClassified(Y/N)" and "Label" state
when moved from IMAP to local? That would really put a damper on things.
Depends on: 188940

Comment 62

15 years ago
Is this bug going to push 1.2beta back, or are the dependencies of this bug
going to be changed so it can make it into 1.2beta?

Comment 63

15 years ago
it would be tricky to push 1.2beta anywhere, as it was released last October!

1.3beta won't be held back by general issues, but some of the individual bugs
may be blockers, I don't know. target milestone isn't really relevant for
tracking bugs like this anyway, so I hope Seth won't mind me taking the liberty
of resetting it...
Keywords: meta
Target Milestone: mozilla1.2beta → ---

Updated

15 years ago
Depends on: 191486

Updated

15 years ago
Depends on: 191723

Comment 64

15 years ago
Would it be possible to disable html in the Junk Mail folder only?

That way when someone does go through any messages that might not be spam they
don't have to worry about html loading that might report their address as active.

Comment 65

15 years ago
Created attachment 114505 [details]
Hard to filter spam concept

I was just thinking about possible ways that spammers could trick our filters,
and this one came to me. Basically, this HTML is "M a k e M o n e y F a s t. P
l e a s e t a k e o u t a l o a n f r o m u s.". It's just that the alpha
letters are in "a few times 'big'" font and most of the spaces are in "many
times 'small'" font, so it looks pretty much like normal text.

I guess eventually that uncommon standalone characters like "k" would get
trained as spam, but that seems dangerous in an engineering environment :-). 

But I can't think of a good way to avoid this problem except perhaps to include
the frequencies of some subset of HTML tags in the list of trained terms...
Maybe this kind of trick is covered by bug 181534, though. 

Anyway, this arms race promises to be an interesting one...
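
One cheap counter-measure to the spaced-letters trick is to collapse long runs of single characters before tokenizing. The Python sketch below assumes the HTML tags have already been stripped so only the text remains; the pattern, the threshold of five characters and the function name are arbitrary choices for illustration, not anything taken from SpamAssassin or Mozilla.

```python
import re

# Five or more single word-characters, each separated by whitespace.
GAPPY_RUN = re.compile(r"\b(?:\w\s+){4,}\w\b")

def collapse_gappy(text):
    """Join runs of spaced-out single characters back into one token."""
    return GAPPY_RUN.sub(lambda m: re.sub(r"\s+", "", m.group(0)), text)
```

For example, `collapse_gappy("M a k e M o n e y F a s t now")` joins the spaced letters into a single token and leaves "now" alone, while ordinary text such as "this is a test" is untouched. Word boundaries inside the run are lost, which may or may not matter to the trained tokens.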

Comment 66

15 years ago
There will ALWAYS be ways that spammers can trick our filters. I'd make
references to Bush's missile defense system, but they wouldn't really apply since
the Bayesian filters are still actually effective.

Comment 67

15 years ago
Anyone working on spam filters should really look at the SpamAssassin code,
since it has lots and lots of ideas to borrow; the trick referred to in comment
65 has the rule GAPPY_SUBJECT (it was found to not be worth looking for it in
the body of messages).  The rules/STATISTICS.txt file has information that can
tell you what rules are worth spending time trying to imitate.  It also has tools
to check rules against mail archives (assuming you also archive your spam), and
to give spam/not-spam ratios for the various rules; thus, SpamAssassin could
be useful for prototyping and analyzing rules in Perl before doing them in
C++/JavaScript.

Comment 68

15 years ago
is there a bug filed to have spam mail move to your junk folder automatically X
seconds after you toggle the junk status of a message (assuming you have that
pref set)?  That's the one last feature I really miss.  I hate having to train
it with the spam it missed, then setting the junk view to delete them all.  It
would be so much easier to have them disappear automatically once I tag them as
junk.

Updated

15 years ago
Depends on: 194273

Comment 69

15 years ago
Hello all-

(this seems to be the most appropriate bug for what I need to say, sorry if I 
bother you)

I have been testing the spam filter for quite a while now, and I must say it is 
of little use to me. This might be for some specific reasons: I receive emails 
in German as well as in English, I am on some mailing lists, and I receive 
emails from people that are not in my contacts list. I tried to train the spam 
filter in many different ways, that is, marking all emails either as spam or 
not, marking only spam mail as spam, marking only the most annoying spam mails 
as spam and so on. The results were just not really satisfying.

The best spam filter I have found at all is www.cloudmark.com spam fighter. See 
also bug 153522. I am using this spam filter for my business account and the 
results are great, that is, no real email was marked as spam mail! And that is 
what you need to rely on.

The current spam filter might be of great use for some, and I also see applications 
in areas other than spam, but I recommend considering cloudmark support as well.

Tobias

Comment 70

15 years ago
I just wanted to reply to comment 69, from a fellow user who has just been
trying out the spam filter for the past 2 months.  What I'd like to say is that
you have to be patient with the Mozilla filter.  I have had to build up my Junk
folder to 1000 messages of pure spam, until I really started to experience
near-perfect spam filtering.  And even now, once in a while, a spam gets
through.  The spam that gets through is sometimes in a foreign language, or it
is one of those Nigerian spam messages.  As it turns out, my Junk folder does
not contain many messages in foreign languages, and I have not received many of
those Nigerian spams, so it is not well-trained in these areas.  But I'm confident
that if I get a few more of them, they will start to be picked up for sure.

The fact that you get messages in German and English should not matter.  And why
did you try to test the spam filter by marking all messages as spam?  Or, by
only marking the most annoying messages as spam?  This will just make the filter
more inefficient, and you will be forced to have a larger data set, in order to
filter out the spams you don't want.  Like I said, I have 1000+ messages in my
Junk folder right now (I think it's 1300) and I mark practically ALL unsolicited
mail as junk/spam.  Yet still, maybe 5% of the spams can make it through every
day.  But that has been steadily dropping every week...

I think this brings up an important concern, and that is that this filter takes
a long time to become functional, in my opinion.  Is it possible to make its
effect non-linear?  i.e. make the spam filter weigh more heavily towards marking
messages as spam if the Junk folder contains fewer than 100 spams?  Many people
want a populated training.dat file to be shipped with Mozilla, but I don't think
that will ever be possible.  A sex therapist, someone in the porn industry, or
maybe even someone in the market for a penis enlargement may use Mozilla, and so
there are obvious problems with doing this.  I can foresee a lot of people
becoming frustrated with the Mozilla filter, as I did in the first week, when I
did not see instant results.  Is there a way around this?  Maybe not, but if
there is, then it should be looked into.  Just some random thoughts...
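
The idea of leaning towards "junk" while the training set is small could be sketched roughly as below. This is only an illustration of the suggestion, not how the Mozilla filter works: the function names, the 0.9 and 0.6 thresholds, and the 100-message warm-up are all assumed values, and the ramp shown is a simple one that stops once the warm-up count is reached.

```python
def spam_threshold(junk_count, full_threshold=0.9, floor=0.6, warmup=100):
    """Return the score above which a message is treated as junk.

    Hypothetical helper: with fewer than `warmup` trained junk messages
    the threshold is lowered towards `floor`, so the filter marks mail
    as junk more readily. All numbers are illustrative assumptions.
    """
    if junk_count >= warmup:
        return full_threshold
    fraction = junk_count / warmup
    return floor + (full_threshold - floor) * fraction


def is_junk(score, junk_count):
    """Classify a message given its junk score in the range 0..1."""
    return score > spam_threshold(junk_count)


# A borderline score of 0.7 is treated as junk early on,
# but not once the Junk folder is well populated.
early = is_junk(0.7, junk_count=10)
later = is_junk(0.7, junk_count=1000)
```

The obvious trade-off is that a lower threshold during the warm-up period would also produce more false positives, so something like this would probably need to be optional.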

Comment 71

15 years ago
Tobias: I have a similar situation receiving both spam and legitimate mail in
both English and Polish (plus a lot of Chinese spam - dunno why). After a few
weeks and maybe 1000 messages I have no problems in any language (except that
probably any message in Chinese will be marked as spam - but that does not
bother me). David is right that you need a lot of patience. But it is worth it!

Updated

15 years ago
Depends on: 200190

Comment 72

15 years ago
It would be great if you could set mozilla to block only images in junk mail.

Comment 73

15 years ago
Certainly not, Aaron. The junk filter doesn't always detect spam. I don't want my
email address to be confirmed on someone's spam list just because the spam filter
didn't detect it.
But it would be nice to have an option somewhere to download the images for the
selected message.

Comment 74

15 years ago
Well then, maybe it should be an option:
1 Show all images
2 Don't download/display images on junk mail
3 Don't download/display any images

If you select 2 or 3, then there should be a button on the toolbar to download
images that were blocked for a given message.
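
The three options plus the toolbar button could be modelled along these lines. This is a minimal sketch with made-up names (ImagePolicy, should_load_images, user_requested); it does not correspond to any existing Mozilla preference.

```python
from enum import Enum


class ImagePolicy(Enum):
    """Hypothetical setting mirroring the three options listed above."""
    SHOW_ALL = 1        # 1 Show all images
    BLOCK_IN_JUNK = 2   # 2 Don't download/display images on junk mail
    BLOCK_ALL = 3       # 3 Don't download/display any images


def should_load_images(policy, is_junk, user_requested=False):
    """Decide whether remote images are fetched for one message.

    `user_requested` stands for the proposed toolbar button: pressing it
    loads the images for the selected message regardless of the policy.
    """
    if user_requested:
        return True
    if policy is ImagePolicy.SHOW_ALL:
        return True
    if policy is ImagePolicy.BLOCK_IN_JUNK:
        return not is_junk
    return False  # BLOCK_ALL
```

With option 2, a newsletter that is not marked as junk shows its images as usual, while a suspected spam stays image-free until the button is pressed.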

I use a lot of email with images (Netflix and REI for example), but obviously I
don't want to download the images of suspected junk mail.

-Aaron

Comment 75

15 years ago
I have a concern about Mozilla's spam filter and I was wondering if there was a
bug associated with it.  I have enabled the "purge junk mail after x days" feature.
 Imagine the following scenario:

1) I get a message from my friend
2) I accidentally mark it as Junk.
3) I accidentally move it to my Junk folder.
4) 10 days later, it gets "purged" from my Junk folder
5) On the same day, it is automatically deleted from my Trash folder when I
close Mozilla
6) Now, some messages from my friend get mistakenly marked as spam, because some
keywords in his previous email were marked as "bad" words.  But how can I
reverse the process, and "unmark" my friend's original message if it no longer
exists anywhere on my hard drive?

Possible solution: delete the training.dat file, and start over by marking
messages in the Junk folder as spam, and going from there.  However, if you
purge the Junk folder every 10 days, then there will only be around 200 spams
(for me) and this will not create a large enough training.dat file for effective
spam filtering.
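
To illustrate why the original message matters in step 6, here is a rough sketch of a token-count store. It assumes the filter keeps simple per-word counts; the real training.dat format is not described in this bug, and TokenStore, train and untrain are hypothetical names.

```python
from collections import Counter


class TokenStore:
    """Toy stand-in for the training data: word counts for junk and good mail."""

    def __init__(self):
        self.junk = Counter()
        self.good = Counter()

    def train(self, text, is_junk):
        """Add the words of a message to the junk or good counts."""
        target = self.junk if is_junk else self.good
        target.update(text.lower().split())

    def untrain(self, text, is_junk):
        """Reverse an earlier train() call; this needs the same message text."""
        target = self.junk if is_junk else self.good
        target.subtract(text.lower().split())
        # Drop words whose count fell to zero or below.
        for token in [t for t, n in target.items() if n <= 0]:
            del target[token]


store = TokenStore()
store.train("lunch on friday", is_junk=True)    # the accidental marking
store.untrain("lunch on friday", is_junk=True)  # only possible with the text
```

In a scheme like this, once the message has been purged there is nothing left to pass to untrain, which is exactly the problem described above.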

So what is being done about this?  Is there a bug relating to this issue that I
describe?  Thanks.

Comment 76

15 years ago
All of these are optional and off by default: emptying the trash on exit, purging
the junk folder, and moving messages marked as junk to the junk folder. And
furthermore, you can use whitelisting to prevent messages from your friend from
being automatically marked as junk, no matter what words he uses. If you turn on
all of those things, you really need to look at your junk folder occasionally to
make sure it doesn't have any messages you want - it could have non-junk
messages that were mis-categorized, without any errors on your part.

Comment 77

15 years ago
To rephrase the most important piece of advice from bienvenu:

Add your friend to your personal addressbook and switch on the "Do not mark as
junk if sender is in my address book" setting.

As simple as that.
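
In rough terms, the address book whitelist works like the sketch below. The function and parameter names are invented for illustration, and the address book is assumed to be a set of lowercase addresses; the actual Mozilla code may differ.

```python
def classify(sender, filter_says_junk, address_book, whitelist_enabled=True):
    """Return True if the message should be marked as junk.

    If whitelisting is on and the sender is in the address book, the
    filter's verdict is ignored and the message is never marked as junk.
    `address_book` is assumed to hold lowercase email addresses.
    """
    if whitelist_enabled and sender.strip().lower() in address_book:
        return False
    return filter_says_junk


address_book = {"friend@example.com"}
verdict = classify("Friend@Example.com", True, address_book)
```

Here the verdict for the friend's message is "not junk" even though the filter flagged it, which is why a bad training entry stops mattering for senders in the address book.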

Updated

15 years ago
Depends on: 208197

Updated

15 years ago
Depends on: 212671

Updated

14 years ago
Blocks: 163993

Updated

14 years ago
Depends on: 243430
Product: MailNews → Core

Updated

13 years ago
No longer blocks: 163993

Updated

12 years ago
Blocks: 66425
sorry for the spam.  making bugzilla reflect reality as I'm not working on these bugs.  filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
Filter on "Nobody_NScomTLD_20080620"
QA Contact: laurel → backend

Updated

10 years ago
Product: Core → MailNews Core

Updated

4 years ago
Summary: spam blocking filters features → spam blocking filters features tracking [meta]

Updated

7 months ago
Depends on: 223716