Open Bug 11035 Opened 23 years ago Updated 2 years ago
[meta] spam blocking filters features tracking
Block sender
Algorithmic blocking
Integration with RBL or NoCEm
23 years ago
Whiteboard: HELP WANTED
23 years ago
Summary: spam blocking filters features → [HELP WANTED]spam blocking filters features
Target Milestone: M15
*** Bug 10887 has been marked as a duplicate of this bug. ***
Bulk-resolving requests for enhancement as "later" to get them off the Seamonkey bug tracking radar. Even though these bugs are not "open" in bugzilla, we welcome fixes and improvements in these areas at any time. Mail/news RFEs continue to be tracked on http://www.mozilla.org/mailnews/jobs.html
Reopen mail/news HELP WANTED bugs and reassign to firstname.lastname@example.org
Summary: [HELP WANTED]spam blocking filters features → spam blocking filters features
Whiteboard: HELP WANTED
Target Milestone: M15
moving out there.
Target Milestone: --- → Future
*** This bug has been marked as a duplicate of 71413 ***
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE
Håkan, I don't think this is a dup. This bug is talking about adding intelligent spam filters to the product that would automatically add messages to a junk mail folder. The other bug is talking about adding a feature that lets the user manually add a sender or domain to a block list. They have similar results but are different in how they do things. I'm going to reopen. If you feel differently, let's discuss.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Just reread the bug. This does mention block sender, so that is a dup of the other bug, but the other things mentioned like Algorithmic blocking and Integration with RBL or NoCEm are not.
Isn't RBL a server-side solution?
Marking dependent on bug #73075 (NoCeM support). I suppose algorithmic blocking should get its own bug too and be marked as a dependency, but I'm not quite sure what is meant by "algorithmic blocking".
Depends on: 73075
This may actually be a dup of bug #66425.
The right-click popup menu should show these options:
Move To
Copy To
Filter To <-- The "Filter To" option would filter all mail from this sender to the selected folder (or trashcan).
In fact, it should be a big button, right next to the "SEND BUTTON": "BLOCK FUTURE EMAILS BY THIS SPAMMER". /iaw
In response to comment 9, RBL is usually used by servers of ISPs to actually block mail, however on their site I found a link to a perl script to be used with procmail to block spam using their (and two other) services. So, it would appear that they don't mind individual users using their service. I'll attach the perl script just as a reference. Basically, I think one just does a DNS lookup of the mail origination using a server from mail-abuse.org and looks at the output.
This Perl script by Bjarni R. Einarsson checks to see if the sender of a mail is in the mail-abuse.org list or the orbz.org list. It also takes as input a file of good, but normally blocked IPs.
In response to comment #13: That would be it:
*) Analyse the "Received" header lines to find the originating IP.
*) Remove further mails with this IP in the "Received" header.
*) Do a whois lookup to find a responsible contact for this IP.
*) Send a spam complaint to that contact.
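The first step in that list can be sketched roughly as follows. This is only an illustration in Python, not code from any attachment on this bug: the function name `received_ips` and the bracketed-IPv4 regular expression are assumptions, and "Received" lines below the first hop you trust can be forged, so the result is a hint rather than proof of origin.

```python
import re
from email import message_from_string
from typing import List

# Assumption: relays record the peer address as "[a.b.c.d]" in Received lines.
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def received_ips(raw_message: str) -> List[str]:
    """Return bracketed IPv4 addresses from Received headers, top to bottom.

    Each relay prepends its own header, so the last entry is the earliest
    hop recorded in the message (which a spammer may have forged).
    """
    msg = message_from_string(raw_message)
    ips: List[str] = []
    for header in msg.get_all("Received", []):
        ips.extend(IPV4_IN_BRACKETS.findall(header))
    return ips
```

The whois lookup and complaint steps are left out here because they depend on external services.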
In reply to comment #9: well, RBL is based on DNS lookup (you do the lookup on the DNS of the chosen RBL service, e.g. relays.ordb.org or relays.osirusoft.com). DNS lookup can easily be done by Mozilla itself, so why not? Having a "move if" - "listed in RBL given:" - "relays.ordb.org" filter would be quite useful. Shouldn't be too difficult, but I should take a look at the filter code before judging that.
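For reference, the DNS-based check described in that comment can be sketched like this. It is a minimal Python illustration, assuming the usual DNS blocklist convention of reversing the IPv4 octets and appending the list's zone; the function names are made up for this sketch, and the zones named in this bug may no longer be in service, so the zone is a parameter.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name to look up: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip: str, zone: str) -> bool:
    """Return True if the list answers with an address record for this IP.

    This is a blocking call, so a mail client would want to keep it off
    the UI thread.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # No such name (not listed) or the lookup itself failed.
        return False
```

Note that `is_listed` cannot tell "not listed" apart from "list unreachable"; a real filter would probably want to treat those cases differently.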
DNS lookups are async and the filter code is synchronous, so there's a big problem right there. Not unsolvable, but not easy either.
mscott's been expressing interest in improving our spam fighting features.
Assignee: nobody → mscott
Status: REOPENED → NEW
Cloudmark offers a spam-filtering plugin for m$ outlook. the plugin relies on a central database maintained by the plugin's users. the plugin adds a 'block' and 'unblock' button to the email client. might be interesting to integrate support in mozilla or 'evangelize' cloudmark to do it themselves. check out http://theregus.com/content/6/25317.html for more technical details or visit http://www.cloudmark.com/ also, is there a spam filtering tracking bug in bugzilla? there are lots of bugs & ideas but no coordination as it seems to me. i believe spam filtering is an important feature of a modern email client, and a feature that would make me switch email clients. phil
I have put together a few filters which should block at least 80-95% of spam. I hope it BLOCKS all spam, but that's just hopefulness. It uses Mozilla mail's filters, and it has several actual filters, each with their own subfilters/rules. In this way it's "modular" (actually not really, just more customizable) in that if you actually subscribe to, say, "porn mail" you can turn off the set of filters that targets porn mail. I was disappointed that the filters did not allow me to scan the complete headers; however, I think my "filter package" is ready for use. I've named it "SpamSlayer", and it needs an easy installer package. Since it's a ruleset, it goes in the mail folder, located in slightly different paths on different computers. Usually it's something like: c:\windows\application data\mozilla/profiles/default user/(random string).slt/mail/(server IP address (don't know if this is the pop3 or smtp, since both of mine are the same)) It would help if someone could set up an installer that could find this dynamic location and replace the current ruleset with the SpamSlayer ruleset, or even better, APPEND the SpamSlayer ruleset to the current one (so current filters aren't overwritten and lost).
Forgot to say this, but if anyone can work with me to set up an installer (and I'll be releasing new versions if necessary) please email me email@example.com
OK everyone...here's the URL for the SpamSlayer project, which I described above: http://spamslayer.mozdev.org For right now, it should solve our spam problem.
why not integrate with something like a SpamAssassin? I believe SA is recognized as the best tool out there today for this sort of thing. Some of it could be made relevant on the client-side: http://spamassassin.taint.org/ For example, one aspect that would be fairly obvious to integrate would be Razor: http://razor.sourceforge.net/ SpamAssassin uses Razor and other rules to eliminate spam. It is the best out there, open source, so maybe there is someone on their project who would be keen to help integrate it with Mozilla in some fashion.
SpamAssassin is a Perl module that one can very easily pass a message to and receive its "opinion" on whether the mail is spam or not. In fact, I've done this with a stand-alone IMAP client to deal with the mail as I see fit. Razor can be installed as one of the inputs into SpamAssassin. I don't know if it would be possible to integrate SA into Mozilla (maybe a plugin or something) but I can certainly vouch for its effectiveness.
I definitely would like to see SpamAssassin (and/or Razor) integrated into mozilla mail. However, with the not-so-important rating that this bug has, I don't think it will be getting any SpamAssassin or Razor integration anytime soon.
reassigning to dmose and raising priority. we need to start working on anti-spam features for Mozilla. Maybe someone cc'd on this bug knows if there's a better bug out there to serve as a Meta bug. If not, let's make this one and start adding other bugs as dependencies.
Assignee: mscott → dmose
Priority: P3 → P1
Target Milestone: Future → mozilla1.2beta
file a new bug on the SpamAssassin integration.. that way if someone implements new spam blocking which DOESN'T use SpamAssassin, then this bug can be marked fixed without causing a big brouhaha
I'm just checkpointing here, this is far from being ready to land. However, it includes the beginning of a straw-man interface for a more generic message filter (lots of work still required on that), as well as the beginnings of an implementation of a filtering plugin which can read and use spam-assassin config files.
I think this bug is missing the big issue: we need to include the mozilla user community in feedback. This is so easy to do: a button next to the "STOP" sign would send a message to a user-selectable anti-spam site, with the base information of the particular email being flagged spam. This way, the anti-spam site could much faster detect new spam schemes. Implementation Cost: Low. Potential Value: Very High. /iaw
That's Yet Another Bug (and another filter type) and you should file a bug on that.. and your cost analysis seems quite weak.. sure, the client-side work doesn't sound hard, but think about all the details beyond just adding the button in the UI. I mean, who runs this service? are there well-known anti-SPAM services out there? how does the client know to block future SPAM? how does it match current e-mail to spam on the service without downloading a whole bunch of spam from the service and without compromising the users privacy? How does one handle failure (spam service unavailable, etc) However its done, it sounds expensive to me. But wait! don't answer me here. File another bug, make it dependent on this one.
alecf: in fact, this is quite similar to what I had in mind, but hadn't yet written down. If you've no objections, I'd like to start with your text and whip it up into a strawman proposal in HTML that we can go with. The spam-assassin bits that I've got running so far are implemented exactly along the lines you suggest. In particular, I've made a simple nsIMsgFilterPlugin interface, and modified the IMAP message header fetching code to call out to it once for each message. I've implemented this interface for spam assassin as a JS component, because much of the spam-assassin stuff is regexp based. Once it decides that something is spam, it just uses the existing nsIMsgFilterHitNotify::ApplyFilterHit info to tell the IMAP (or POP or whatever) code how to deal with the hit.
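To make the "regexp based" scoring idea concrete, here is a small Python sketch of a rule-scoring filter applied to message headers. It is not the nsIMsgFilterPlugin interface or the JS component described above, whose details are not given in this bug; `Rule`, `score_headers`, `is_spam`, the rule names, the scores and the threshold are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Rule:
    name: str            # hypothetical rule identifier
    header: str          # header the rule inspects, e.g. "Subject"
    pattern: re.Pattern  # compiled regular expression
    score: float         # points added when the pattern matches

def score_headers(headers: Dict[str, str], rules: List[Rule]) -> Tuple[float, List[str]]:
    """Sum the scores of all matching rules and report which ones hit."""
    total = 0.0
    hits: List[str] = []
    for rule in rules:
        if rule.pattern.search(headers.get(rule.header, "")):
            total += rule.score
            hits.append(rule.name)
    return total, hits

def is_spam(headers: Dict[str, str], rules: List[Rule], threshold: float = 5.0) -> bool:
    """Treat the message as spam once the summed score reaches the threshold."""
    total, _ = score_headers(headers, rules)
    return total >= threshold

# Illustrative rules only; neither the patterns nor the scores come from SpamAssassin.
EXAMPLE_RULES = [
    Rule("SUBJ_ALL_CAPS", "Subject", re.compile(r"^[^a-z]*[A-Z][^a-z]*$"), 3.0),
    Rule("SUBJ_MONEY", "Subject", re.compile(r"money", re.IGNORECASE), 2.5),
]
```

Returning the list of matching rule names alongside the score keeps the decision explainable, which is useful when a filter log is wanted.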
Status: NEW → ASSIGNED
sure - go ahead and use whatever part of that you need!
Check out bug 163188 (Bayesian filtering - very cool!) - looks like this bug should be dependent on it?
http://www.mozilla.org/mailnews/specs/filters/#Junk After discussing w/putterman, idea of what UI might look like. Maybe "Always accept messages from people in my AB" has a dropdown to select a specific AB. That AB, becomes a White List of sorts.
A thought re "Always accept messages from people in my address books": 1) At the very least, this should exclude the 'collected addresses' address book! 2) More flexibility would be possible if this was: "Always accept messages from people in the following address book/group:" with a drop-down which enables people to select a particular address book, or just a particular list within an address book 3) Alternatively, this could be "Always accept messages from people in my 'white list' address book" - this address book would be a new top-level book, at the same level as 'Personal Address Book' and 'Collected addresses'. Also, I think there's a bit too much granularity on the 'sensitivity' slider - nobody is really going to spend enough time tweaking and analysing to see the difference between e.g. 35 and 36. I suggest a 0-10 scale - enough to get reasonably fine degree of control without taking too much trial and error to find your 'optimum' level. Other than that, looks pretty good.
>2) More flexibility would be possible if this was: "Always accept messages from
>people in the following address book/group:" with a drop-down which enables
>people to select a particular address book, or just a particular list within an
>address book
Agree a dropdown address book selector would be better.
>Also, I think there's a bit too much granularity on the 'sensitivity' slider.
Agree.
Making Mozilla intelligently block spam mail would be a good feature.
I suggest also adding a list of "non-spam domains" to the screen. This way people can specify domains of sites / companies they work for never to be filtered as spam.
bug 156744 seems like an easy solution for this bug. TMDA is an open source project on sourceforge, and so all the code is there, you just would have to "port" it probably. It also is designed so that it absolutely blocks 100% of spam with virtually no false positives.
Regarding Comment #38: How about turning the interface around - would it be so much more difficult to add a "whitelist" or "always accept email" property in individual addressbook cards, and in the top-level properties of an addressbook - especially in ldap-based addressbooks that are used for corporate addressbooks! The idea of no-spam domains, much like we currently have uses-HTML-domains, sounds like a good idea. Speaking as a corporate site, I would like to make sure that a SpamAssassin-style filter not be the ONLY filter available because, as someone else pointed out, it could present significant privacy/security issues. bug 163188 seems like a good alternative method. Also, does the SA software use any special ports that might get blocked by a firewall?
we're adding a whitelist feature, based on the personal address book of your choice, IIRC.
Bienvenu - what if you have multiple addressbooks that you want to function as whitelists? By setting a property in the addressbook, you have more flexibility than just selecting one AB (which it appears is all the drop-down would allow). For that reason you wouldn't be able to (e.g.) set both your PAB and an ldap-based AB as being on a whitelist. So maybe instead of a drop-down menu, you have an edit menu that allows you to check/uncheck your ABs, or even a button (that greys out the edit menu) that says all ABs except collected are whitelists. Or are you saying that design has moved past the simple dropdown mentioned above, and you can select _multiple_ "personal address book of your choice"?
I don't think domain-based whitelists would work very well because spammers often fake email to you coming from your own domain or even from your own address. (Obviously if you are sending a spam to firstname.lastname@example.org it's not hard to have your spambot mark it from email@example.com.) Address book-based whitelists are much better because then the spammer needs to know both your address and the address of a person on your whitelist--a much-harder (although not impossible) combination.
Hi, I'm currently using 1.1 so apologies if this has been dealt with in 1.2alpha...I have various filters set up which look at the message body. One string I use is 'You received this email because you signed up with' but this wasn't caught today as the message source is base64 encoded with content type of text/html - mozilla seems to decode and display that fine but the filter is run on the non-decoded version. Thanks.
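The mismatch reported above (the filter sees the encoded source, the reader sees the decoded text) can be illustrated with a short Python sketch that decodes each text part before searching it. This is not how Mozilla's filter code works; `decoded_text_parts` and `body_contains` are names invented for the illustration.

```python
from email import message_from_string
from typing import Iterator

def decoded_text_parts(raw_message: str) -> Iterator[str]:
    """Yield each text/* part with its transfer encoding undone."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and attachments
        payload = part.get_payload(decode=True)  # handles base64 and quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to a byte-preserving decode.
            yield payload.decode("latin-1")

def body_contains(raw_message: str, needle: str) -> bool:
    """Case-insensitive substring search over the decoded text parts."""
    needle = needle.lower()
    return any(needle in text.lower() for text in decoded_text_parts(raw_message))
```

For text/html parts a phrase can still be split by markup, so a fuller version would strip tags before matching.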
Looks like we missed our milestone on this bug. What kind of system are we aiming at implementing? I've scanned the comments and can't seem to see a unified goal.
19 years ago
Depends on: 179503
19 years ago
Depends on: 179504
Depends on: 179966
*** Bug 179984 has been marked as a duplicate of this bug. ***
Depends on: 179984
Depends on: 179997
Depends on: 179999
Depends on: 180010
Depends on: 180029
Depends on: 180153
Depends on: 180215
Depends on: 180231
Depends on: 180477
Depends on: 179012
Depends on: 182386
Comment 46 says that domain whitelists are bad because spammers fake the sender as being from your domain, and says "specific email address" whitelists are better. However, the whitelist discussion ignores the fact that the most common address that spammers spoof is the addressee's address (i.e. *you*), and that's perhaps one of the most common ones that people would want to be in their whitelist (I send mail to myself all the time). So something more complicated would seem to be needed. Also, this whole feature doesn't seem to be working for me in the 2002120604 build. Nothing is marked as spam (though I have logged several emails as spam, and bunch of them as "non spam"). Also, I can't seem to find my junkmail.js file, there doesn't seem to be anything in my junk filter log (though I did turn it on), and I don't see a file anywhere that obviously contains the Bayesian parameters. Is there any preliminary documentation/discussion on this kind of stuff I could look at? I'm really happy to see this feature, BTW.
Check out "training.dat".
Two more thoughts on filtering: 1) Instead of/in addition to a white book, how about adding an 'automatically accept email from this person' checkbox to each card in your address book? 2) Allow filters to work on subfolders. I just submitted this as bug 184080 before reading this thread. In brief: Filters A, B, C... filter email to a quarantine folder, deleted and so on. Filter Z searches through your quarantine, deleted, ... and based on its criteria (such as friends' email addresses) moves mail back to your inbox. It's another way of implementing a white list and catches mail you want to read that may otherwise be deleted by your overzealous mail filters.
BTW, I don't know how feasible this would be with the current Bayesian filter mechanism, but it would be really nice (for the curious among us if for no other reason) if the filter log indicated *why* the spam was filtered rather than just indicating that it was. The simple fact of the mail being filtered seems adequately conveyed by the junk mail icon and whether it's been moved into your spam box. So I'd say that the current filter log is pretty useless.
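As an illustration of what a "why" in the filter log could look like, here is a toy token-based Bayesian scorer in Python that reports the tokens contributing most to its verdict. It is a generic naive Bayes sketch with add-one smoothing and equal priors, not the algorithm or data format Mozilla's junk filter uses; the class and method names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Set, Tuple

TOKEN = re.compile(r"[a-z0-9$'-]+")

def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its distinct tokens."""
    return set(TOKEN.findall(text.lower()))

class TokenStats:
    def __init__(self) -> None:
        self.spam: Counter = Counter()  # messages containing token, spam side
        self.ham: Counter = Counter()   # messages containing token, non-spam side
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        tokens = tokenize(text)
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def token_log_odds(self, token: str) -> float:
        """Positive values lean spam, negative lean non-spam (add-one smoothing)."""
        p_spam = (self.spam[token] + 1) / (self.n_spam + 2)
        p_ham = (self.ham[token] + 1) / (self.n_ham + 2)
        return math.log(p_spam / p_ham)

    def explain(self, text: str, top: int = 3) -> Tuple[float, List[Tuple[str, float]]]:
        """Return the summed log-odds and the strongest spam-leaning tokens."""
        scored = sorted(((self.token_log_odds(t), t) for t in tokenize(text)), reverse=True)
        total = sum(score for score, _ in scored)
        return total, [(token, score) for score, token in scored[:top]]
```

A log line could then list the top tokens next to the verdict, for example the three strongest spam-leaning words in the message.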
BTW, please see the related bug #187044, suggesting that it'd be nice to have a challenge/response anti-spam mechanism. See the bug report for more.
Assignee: dmose → sspitzer
Status: ASSIGNED → NEW
I'm not sure whether this should go in this bug or another one, but speaking of the "move to folder" feature, something I find annoying about my Junk mail folder is that when I look at it, there's a (significant) delay while the spam filter appears to try to recategorize all the email. Maybe it's just parsing through the headers looking for some kind of "Mozilla thinks this is spam" flag, but it seems to take too much time for that (although I *do* have my email stored on another machine so it gets backed up automatically). Seems to me that mail could get marked "junk" in the index file, but perhaps that doesn't get regen-ed until you open the folder either... don't know enough about the internals...
I'm working on code that will prevent the re-classification of messages moved to the junk folder, if your imap server supports user-defined keywords. The problem is that when we move an imap message, we really don't know what the message will be in the destination folder, due to the way imap works, so we can't "pre-mark" it as junk, other than by using imap keywords on the imap servers that support key words.
Re: comment 57, how about just not reclassifying folders that aren't inboxes (I'm presuming Moz knows which those are, because it does a "get new mail" whenever I select my Inbox)? It didn't occur to me to mention that it was an IMAP folder... good catch. However, I can't see any real benefit (and quite a bit of annoyance/potential lossage) to running spam filters on secondary folders. Either the user moved it there (via a filter or by hand), or the Junk mail feature did, and in neither case does it make sense to reclassify it. To clarify my "potential lossage" comment: perhaps there are some desirable emails that look just like some class of spam, and a user would want to set up a filter to move those to a "safe" folder before spam classification. If we then reclassify those secondary folders, data loss occurs. The only good thing it would do is show off how good the filter is (or not :-) by displaying the "junk" tag in the message summary. This doesn't seem that useful to me unless requested by the user for some special purpose. Also, would this problem be solved by moving my junk into a local folder instead of an IMAP one?
the reason we run spam filters on secondary folders is that your mail filters can filter mail to secondary folders, and this happens before the spam filter runs (and thus before the message body is downloaded). For example, if you have a filter that moves all messages addressed directly to you to a folder, and you get spam sent directly to you, you want the spam filters to run on that folder when you open it to catch the spam sent directly to you. The alternative is to run the spam filters first, and we don't do that (we might want to reconsider that, but not for this release)
The 20th - 25th comments of bug 181394 raise an extremely important point: we have to do something about the "obviousness" of using the junk mail feature. If people who are savvy enough to be downloading nightlies and entering comments in b.m.o. have a hard time figuring out that you have to train ~50-100 emails as spam before it starts working, how will any normal user have a hope? Possible solution: pre-learn the training data (may be unpopular with this crowd)... any others? Evangelization isn't likely enough, because no one reads documentation...
RE comment #57:
> I'm working on code that will prevent the re-classification of messages
> moved to the junk folder,
How about not re-classifying *any* messages (no matter where they are moved to)?
> if your imap server supports user-defined keywords.
Or perhaps by giving each message in mozilla a "JunkClassified(Y/N)" flag.
> The problem is that when we move an imap message, we really don't know
> what the message will be in the destination folder, due to the way imap
> works, so we can't "pre-mark" it as junk, other than by using imap keywords
> on the imap servers that support key words.
So it is impossible to track a message's "JunkClassified(Y/N)" and "Label" state when moved from IMAP to local? That would really put a damper on things.
Depends on: 188940
Is this bug going to push 1.2beta back, or are the dependencies of this bug going to be changed so it can make it into 1.2beta?
it would be tricky to push 1.2beta anywhere, as it was released last October! 1.3beta won't be held back by general issues, but some of the individual bugs may be blockers, I don't know. target milestone isn't really relevant for tracking bugs like this anyway, so I hope Seth won't mind me taking the liberty of resetting it...
Target Milestone: mozilla1.2beta → ---
Would it be possible to disable html in the Junk Mail folder only? That way when someone does go through any messages that might not be spam they don't have to worry about html loading that might report their address as active.
I was just thinking about possible ways that spammers could trick our filters, and this one came to me. Basically, this HTML is "M a k e M o n e y F a s t. P l e a s e t a k e o u t a l o a n f r o m u s.". It's just that the alpha letters are in "a few times 'big'" font and most of the spaces are in "many times 'small'" font, so it looks pretty much like normal text. I guess eventually that uncommon standalone characters like "k" would get trained as spam, but that seems dangerous in an engineering environment :-). But I can't think of a good way to avoid this problem except perhaps to include the frequencies of some subset of HTML tags in the list of trained terms... Maybe this kind of trick is covered by bug 181534, though. Anyway, this arms race promises to be an interesting one...
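One possible counter to the spaced-letter trick is to normalise the text before tokenising it. The Python sketch below collapses runs of three or more single letters separated by single spaces; the regular expression and the function name are assumptions made for illustration, and, as the comment itself warns, text full of standalone letters (engineering mail, for instance) would be mangled by it.

```python
import re

# Three or more single letters, each separated by exactly one space.
SPACED_RUN = re.compile(r"\b(?:[A-Za-z] ){2,}[A-Za-z]\b")

def collapse_spaced_letters(text: str) -> str:
    """Join runs like "M a k e" into "Make" so they tokenise as words."""
    return SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)
```

Because the word gaps look the same as the letter gaps once the font sizes are stripped, a whole phrase collapses into one long token (for example "MakeMoneyFast"); that token can still be learned by a trained filter even though the word boundaries are lost.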
there will ALWAYS be ways that spammers can trick our filters. I'd make references to bush's missile defense system, but they wouldn't really apply since the Bayesian filters are still actually effective.
is there a bug filed to have spam mail move to your junk folder automatically X seconds after you toggle the junk status of a message (assuming you have that pref set)? That's the one last feature I really miss. I hate having to train it with the spam it missed, then setting the junk view to delete them all. It would be so much easier to have them disappear automatically once I tag them as junk.
Hello all- (this seems to be the most appropriate bug for what I need to say, sorry if I bother you) I have been testing the spam filter for quite a while now, and I must say it is of little use to me. This might be for some specific reasons: I receive emails in German as well as in English, I am on some mailing lists, and I receive emails from people who are not in my contacts list. I tried to train the spam filter in many different ways, that is, marking all emails either as spam or not, marking only spam mail as spam, marking only the most annoying spam mails as spam and so on. It just does not give really satisfying results. The best spam filter I have found at all is www.cloudmark.com spam fighter. See also bug 153522. I am using this spam filter for my business account and the results are great, that is, no real email was marked as spam mail! And that is what you need to rely on. The current spam filter might be of great use for some, and I also see applications in areas other than spam, but I recommend considering cloudmark support as well. Tobias
I just wanted to reply to comment 69, from a fellow user who has just been trying out the spam filter for the past 2 months. What I'd like to say is that you have to be patient with the Mozilla filter. I have had to build up my Junk folder to 1000 messages of pure spam, until I really started to experience near-perfect spam filtering. And even now, once in a while, a spam gets through. The spam that gets through is sometimes in a foreign language, or it is one of those Nigerian spam messages. As it turns out, my Junk folder does not contain many messages in foreign languages, and I have not received many of those Nigerian spams, so it is not well-trained in these areas. But I'm confident that if I get a few more of them, they will start to be picked up for sure. The fact that you get messages in German and English should not matter. And why did you try to test the spam filter by marking all messages as spam? Or, by only marking the most annoying messages as spam? This will just make the filter more inefficient, and you will be forced to have a larger data set, in order to filter out the spams you don't want. Like I said, I have 1000+ messages in my Junk folder right now (I think it's 1300) and I mark practically ALL unsolicited mail as junk/spam. Yet still, maybe 5% of the spams can make it through every day. But that has been steadily dropping every week... I think this brings up an important concern, and that is that this filter takes a long time to become functional in my opinion. Is it possible to make its effect non-linear? i.e. make the spam filter weigh more heavily towards marking messages as spam if the Junk folder contains less than 100 spams? Many people want a populated training.dat file to be shipped with Mozilla, but I don't think that will ever be possible. A sex therapist, someone in the porn industry, or maybe even someone in the market for a penis enlargement may use Mozilla, and so there are obvious problems with doing this.
I can foresee a lot of people becoming frustrated with the Mozilla filter, as I did in the first week, when I did not see instant results. Is there a way around this? Maybe not, but if there is, then it should be looked into. Just some random thoughts...
Tobias: I have a similar situation receiving both spam and legitimate mail in both English and Polish (plus a lot of Chinese spam - dunno why). After a few weeks and maybe 1000 messages I have no problems in any language (except that probably any message in Chinese will be marked as spam - but that does not bother me). David is right that you need a lot of patience. But it is worth it!
It would be great if you could set mozilla to block only images in junk mail.
certainly not, Aaron. Junk doesn't always detect a spam. I don't want my email to be confirmed on someone's spam list just because the spam filter didn't detect it. But it would be nice to have an option somewhere to download the images for the selected message.
Well then, maybe it should be an option:
1 Show all images
2 Don't download/display images on junk mail
3 Don't download/display any images
If you select 2 or 3, then there should be a button on the toolbar to download images that were blocked for a given message. I use a lot of email with images (Netflix and REI for example), but obviously I don't want to download the images of suspected junk mail. -Aaron
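The three-way option proposed above, plus the per-message "download images" button, amounts to a small decision function. A Python sketch follows; `ImagePolicy`, `should_load_images` and the `user_override` flag are hypothetical names, not existing Mozilla preferences.

```python
from enum import Enum

class ImagePolicy(Enum):
    SHOW_ALL = 1        # option 1: always load images
    BLOCK_IN_JUNK = 2   # option 2: block only in messages marked as junk
    BLOCK_ALL = 3       # option 3: never load images automatically

def should_load_images(policy: ImagePolicy, is_junk: bool, user_override: bool = False) -> bool:
    """Decide whether to fetch remote images for one message.

    user_override models the proposed toolbar button that loads the
    blocked images for the currently selected message.
    """
    if user_override:
        return True
    if policy is ImagePolicy.SHOW_ALL:
        return True
    if policy is ImagePolicy.BLOCK_IN_JUNK:
        return not is_junk
    return False
```

With option 2, a spam message the filter failed to flag would still load its images, which is the concern raised in the previous comment.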
I have concern about Mozilla's spam filter and I was wondering if there was a bug associated with this. I enable the "purge junk mail after x days feature". Imagine the following scenario: 1) I get a message from my friend 2) I accidentally mark it as Junk. 3) I accidentally move it to my Junk folder. 4) 10 days later, it gets "purged" from my Junk folder 5) On the same day, it is automatically deleted from my Trash folder when I close Mozilla 6) Now, some messages from my friend get mistakenly marked as spam, because some keywords in his previous email were marked as "bad" words. But how can I reverse the process, and "unmark" my friend's original message if it no longer exists anywhere on my hard drive? Possible solution: delete the training.dat file, and start over by marking messages in the Junk folder as spam, and going from there. However, if you purge the Junk folder every 10 days, then there will only be around 200 spams (for me) and this will not create a large enough training.dat file for effective spam filtering. So what is being done about this? Is there a bug relating to this issue that I describe? Thanks.
all of these are optional and off by default - empty trash on exit, purging of the junk folder, and moving messages marked as junk to the junk folder. And furthermore, you can use whitelisting to prevent messages from your friend from being automatically marked as junk, no matter what words he uses. If you turn on all of those things, you really need to look at your junk folder occasionally to make sure it doesn't have any messages you want - it could have non-junk messages that were mis-categorized, without any errors on your part.
To rephrase the most important piece of advice from bienvenu: Add your friend to your personal addressbook and switch on the "Do not mark as junk if sender is in my address book" setting. As simple as that.
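In code terms, the address-book whitelist is a check made before the junk classifier runs. The Python sketch below is an illustration under assumptions: the function name is invented, the address book is modelled as a set of lower-cased addresses, and the optional `own_addresses` exclusion reflects the concern raised earlier in this bug that spammers often forge the recipient's own address.

```python
from email.utils import parseaddr
from typing import AbstractSet

def is_whitelisted(
    from_header: str,
    address_book: AbstractSet[str],
    own_addresses: AbstractSet[str] = frozenset(),
) -> bool:
    """Return True if the sender should bypass junk classification.

    address_book and own_addresses are expected to hold lower-cased addresses.
    """
    _, addr = parseaddr(from_header)
    addr = addr.lower()
    if not addr or addr in own_addresses:
        return False  # unparseable sender, or a possibly forged "from myself"
    return addr in address_book
```

The From header is trivially forgeable, so this only helps as long as the spammer does not know who is in the address book.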
sorry for the spam. making bugzilla reflect reality as I'm not working on these bugs. filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
Filter on "Nobody_NScomTLD_20080620"
QA Contact: laurel → backend
Summary: spam blocking filters features → spam blocking filters features tracking [meta]
Priority: P1 → --
Summary: spam blocking filters features tracking [meta] → [meta] spam blocking filters features tracking