Closed
Bug 224318
Opened 21 years ago
Closed 21 years ago
Bayes filtering should learn through use of external/serverside filters
Categories
(MailNews Core :: Filters, enhancement)
MailNews Core
Filters
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: raccettura, Assigned: Bienvenu)
References
(Blocks 1 open bug)
Details
(Keywords: fixed1.7, late-l10n)
Attachments
(11 files, 5 obsolete files)
7.56 KB, text/plain
1.59 KB, text/plain
1.22 KB, text/plain
17.23 KB, patch | mscott: superreview+, asa: approval1.7+
2.10 KB, patch | mscott: superreview+, chofmann: approval1.7+
15.43 KB, patch | mscott: superreview+
1.75 KB, patch | mscott: superreview+
3.89 KB, patch | mscott: superreview+
3.26 KB, patch | Stefan.Borggraefe: review+, Bienvenu: superreview+, chofmann: approval1.7+
818 bytes, patch | Bienvenu: superreview+, chofmann: approval1.7+
752 bytes, patch | mscott: superreview+, asa: approval1.7+
Bayes filtering should be aware of the increasingly popular X-Spam headers.
Products such as SpamAssassin use them to mark suspected spam emails.
Ideally, Bayes should ignore the top message that SpamAssassin attaches to
suspected spam.
I invite discussion on how to deal with Bayes and other spam filtering software.
I'm attaching two emails, one spam and one ham, both filtered through
SpamAssassin. The spam is very distinctive, as it's always attached to a spam
notice email. Both have X-Spam headers.
There is also the possibility of an option to utilize external spam filters
with, or instead of, Bayes: for example, training on SpamAssassin's spam/ham
results (as SA's own Bayes filtering does), or allowing Mozilla to simply
accept SA's decision as what the email is, rather than having SA do the scan
and then Mozilla do it again. This would in essence give third-party filters
the ability to use Mozilla's spam UI, or to work in conjunction with Mozilla.
Reporter | ||
Comment 1•21 years ago
|
||
Spam Sample (note attached original email, and headers).
Reporter | ||
Comment 2•21 years ago
|
||
Ham. Note difference from Spam.
Reporter | ||
Comment 3•21 years ago
|
||
Note that messages *can* be inline, though that is no longer the default format
in SA versions later than 2.50; attachments are now the default behavior.
Severity: normal → enhancement
OS: Windows XP → All
Hardware: PC → All
Assignee | ||
Comment 4•21 years ago
|
||
Scott and I had some ideas about what else we could do with this. We're thinking
a new tab on the spam settings window (which we're proposing to have tabs when
we add more options like this) with the following choices about what to do with
the x-spam-status header:
1. Ignore
2. Trust positives
3. Trust negatives (can trust both pos and neg)
4. Give Weight to x-spam-status (somehow combine x-spam-status result with
bayesian score)
If the user chooses to trust both positives and negatives, then we don't need to
run the bayesian filter.
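The four choices above could be sketched roughly as follows. This is only an illustration in Python, not the Mozilla implementation: the `HeaderPolicy` names, the `is_junk` function, the 0.5 threshold and the linear blend used for "give weight" are all assumptions, since the comment only says the two results would be combined "somehow". The Bayesian filter is passed in as a callable so that it is skipped whenever a trusted header already decides the outcome.

```python
from enum import Enum


class HeaderPolicy(Enum):
    # Hypothetical names for the options listed in the comment.
    IGNORE = 1
    TRUST_POSITIVES = 2
    TRUST_NEGATIVES = 3
    TRUST_BOTH = 4
    GIVE_WEIGHT = 5


def is_junk(policy, server_says_spam, run_bayes, threshold=0.5, header_weight=0.5):
    """Decide whether a message is junk.

    server_says_spam: True, False, or None when no x-spam-status header exists.
    run_bayes: callable returning a score in [0, 1]; only called when needed.
    """
    trust_pos = policy in (HeaderPolicy.TRUST_POSITIVES, HeaderPolicy.TRUST_BOTH)
    trust_neg = policy in (HeaderPolicy.TRUST_NEGATIVES, HeaderPolicy.TRUST_BOTH)

    # A trusted header verdict short-circuits the Bayesian filter entirely.
    if server_says_spam is True and trust_pos:
        return True
    if server_says_spam is False and trust_neg:
        return False

    score = run_bayes()
    if policy is HeaderPolicy.GIVE_WEIGHT and server_says_spam is not None:
        # Assumed combination rule: a simple weighted average.
        header_score = 1.0 if server_says_spam else 0.0
        score = header_weight * header_score + (1 - header_weight) * score
    return score >= threshold
```

With `TRUST_BOTH` and a header present, `run_bayes` is never invoked, which is the CPU saving the comment points at.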
Reporter | ||
Comment 5•21 years ago
|
||
I think an option to "use X-Spam" would be good too: rather than use Mozilla's
Bayes filtering, honor the spam filter on the server, and use Mozilla's UI and
spam handling with the server's decision. That would cut down on CPU for those
users.
I like "give weight"
Would be nice if we can have a checkbox to "feed" the bayes filter and train
with the results from X-Spam.
Since SpamAssassin as well as other products are relatively accurate, without
user interaction, Bayes could be trained quite quickly in those situations, as
noted here (among other places):
http://www.eweek.com/article2/0,4149,1366242,00.asp
There's a ton of potential. That little tag can really do a lot to enhance Mozilla.
Comment 6•21 years ago
|
||
There is one problem: There are different headers with different products.
I use spampal (Mail-Proxy freeware for win32) and it adds different headers.
Reporter | ||
Comment 7•21 years ago
|
||
*Detection Checklist*
SpamAssassin:
Ham X-Spam-Status: No,
Spam X-Spam-Status: Yes,
Spam X-Spam-Flag: YES
(x-Spam-Status can have data after Yes, No)
I believe X-Spam-Flag was added in later versions.
SpamPal:
Ham X-SpamPal: PASS
Spam X-SpamPal: SPAM
(spam can have data after the word spam)
SpamCatcher:
Ham X-SpamCatcher-Flag: No
Spam X-SpamCatcher-Flag: Yes
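As a rough illustration of the checklist above, here is a minimal Python sketch that reads those headers with the standard `email` module. The table contents come from the checklist; the `server_verdict` name, the case-insensitive prefix matching, and the rule that any positive header wins over a negative one are assumptions made for the example.

```python
from email import message_from_string

# (header name, value prefix meaning ham, value prefix meaning spam),
# taken from the checklist. X-Spam-Flag is only listed for spam.
CHECKLIST = [
    ("X-Spam-Status", "no", "yes"),
    ("X-Spam-Flag", None, "yes"),
    ("X-SpamPal", "pass", "spam"),
    ("X-SpamCatcher-Flag", "no", "yes"),
]


def server_verdict(raw_message):
    """Return "spam", "ham", or None if no known header is present."""
    msg = message_from_string(raw_message)
    verdict = None
    for name, ham_prefix, spam_prefix in CHECKLIST:
        value = msg.get(name)  # header name lookup is case-insensitive
        if value is None:
            continue
        value = value.strip().lower()
        if value.startswith(spam_prefix):
            return "spam"  # assumption: any positive header wins
        if ham_prefix and value.startswith(ham_prefix):
            verdict = "ham"
    return verdict
```

Prefix matching is used because, as noted, both X-Spam-Status and X-SpamPal can carry extra data after the verdict word.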
Assignee | ||
Comment 8•21 years ago
|
||
It occurs to me that we need this to be extensible, and that we need pattern
matching of some sort. So, I'm thinking I should add a filter action to set the
junk score. Then, users can write their own filters to handle some of the
server-side spam products. Then, integrating with these products becomes a
matter of defining some filters. We could do like the MDN code does and define
the filters internally on the fly, invisible to the user. So I think I'll add a
filter action that sets a junk score.
Reporter | ||
Comment 9•21 years ago
|
||
I agree, there should be a way through filters.
But I'm thinking in Junk Mail Controls, there should be a new tab, with
checkboxes for:
[ ] Enable Support for External Mail Filters (Select..)
[ ] Enable Habeas Support
[ ] More soon...
and a mention they can define their own filters to customize this further.
In the first one, have a button, that brings a popup asking "Spam Assassin",
"SpamPal", "SpamCatcher". Able to check multiples.
Turn them on by default (if the user doesn't have spamassassin, it just will
never fire, no real harm done). If it does, it automatically kicks in. The
other option is to have Thunderbird detect the first instance.
By having the filter, and the tab in junk mail controls, the user can not only
define their own rules with a filter (power user), but the basics are within
easy reach for the general user.
As more products emerge, we could easily add UI options for a few of the most
popular options. I think the 3 mentioned are the most popular right now.
Only doing filter rules would put this feature beyond the casual user, who may
want filtering but is not geeky enough to make a filter. Besides, I think the
above 3 will apply to most of those who want the feature anyway, so it would
work out of the box for most people, and the rest can adapt it to their needs.
As a sidenote, bug 11040 is indirectly related to this bug. I wouldn't say
blocking, but a definite influence.
Assignee | ||
Comment 10•21 years ago
|
||
sorry if I wasn't clear - that's what I meant by "Then, integrating with these
products becomes a matter of defining some filters. We could do like the MDN
code does and define the filters internally on the fly, invisible to the user."
So the implementation of that UI would internally be some filters.
One issue I need to deal with is to propagate the junk status set by filters to
the imap server so that if the message gets moved to another folder, the junk
status is also moved.
Reporter | ||
Comment 11•21 years ago
|
||
David: Sounds good to me.
Changing the summary of this bug a little, since it's about more than X-Spam
headers now.
Summary: Bayes filtering should be aware of X-Spam Headers → Bayes filtering should learn through use of external/serverside filters
Assignee | ||
Comment 12•21 years ago
|
||
One drawback of this filter approach, as opposed to putting some code in the
code that parses mail headers, is that filters only run on new mail downloaded
in the inbox. If there are server-side filters that classify messages *and* move
them to other imap folders on the server, the client-side spam header detection
filters won't detect them. Not sure if this is an important issue...if it turns
out to be, we could run the internal filters on folders other than the inbox, I
guess. The advantage of using filters is that they're extensible.
Assignee | ||
Comment 13•21 years ago
|
||
this patch makes it so the user can set a message as junk or not junk through a
mail filter.
I think I'm going to make this three separate bugs.
1. this one - UI and backend for filters to set junk score.
2. adding hidden custom filters for various well-known server-side plugins.
3. Adding ability to train bayesian filter on server-data
Assignee | ||
Updated•21 years ago
|
Attachment #144072 -
Flags: superreview?(mscott)
Reporter | ||
Comment 14•21 years ago
|
||
(In reply to comment #13)
> 2. adding hidden custom filters for various well-known server-side plugins.
>
I think I might take that one when I get a few free cycles.
Assignee | ||
Comment 15•21 years ago
|
||
Robert, I'll get you started by doing one of them, when I get a few cycles :-)
Reporter | ||
Comment 16•21 years ago
|
||
(In reply to comment #15)
> Robert, I'll get you started by doing one of them, when I get a few cycles :-)
If you're doing these in separate bugs, CC me on them, so I can keep track. Thanks.
Comment 17•21 years ago
|
||
Comment on attachment 144072 [details] [diff] [review]
support for filters setting junk score
awesome!
Attachment #144072 -
Flags: superreview?(mscott) → superreview+
Comment 18•21 years ago
|
||
I think Habeas should write their own as an XPI, personally. Lazy so-and-so's ;-)
Gerv
Assignee | ||
Comment 19•21 years ago
|
||
Habeas headers - they suggest filtering on #3...
http://www.habeas.com/configurationPages/headers.htm
I'm thinking the way this will work is that we'll add the ability to load this
kind of spam filter from disk, so we'll store the individual filters on disk.
That way dropping in new kinds of filters won't involve changing the code so much.
Assignee | ||
Comment 20•21 years ago
|
||
turns out custom headers were somewhat broken, in terms of what the UI allowed
you to set.
Assignee | ||
Updated•21 years ago
|
Attachment #144171 -
Flags: superreview?(mscott)
Updated•21 years ago
|
Attachment #144171 -
Flags: superreview?(mscott) → superreview+
Assignee | ||
Comment 21•21 years ago
|
||
Assignee | ||
Comment 22•21 years ago
|
||
Assignee | ||
Comment 23•21 years ago
|
||
Assignee | ||
Comment 24•21 years ago
|
||
Assignee | ||
Comment 25•21 years ago
|
||
I'm thinking the way this might work is we add some attributes to
nsISpamSettings for handling server-side spam filters:
1. ServerSpamFilterName
2. ServerSpamAction - trust yes, trust no, trust both
Then, when we're starting up a server, if the spam filter name is set, we load
the correspondingly named filter file, and enable the Yes and/or No filters,
according to what the user has specified.
As far as the UI for picking the server side spam filter to incorporate is
concerned, I imagine it'll just be a drop down where you can pick from the list
of server-side spam filters we know about (maybe with a default choice of None,
or a checkbox to turn off this behaviour). It would be cool to populate this
list from the .dat files on disk, so that dropping in a new one adds it
automatically to the list, but we might not get there...
Assignee | ||
Comment 26•21 years ago
|
||
Comment on attachment 144072 [details] [diff] [review]
support for filters setting junk score
this would involve an exception for the localization freeze (it adds a few
strings) but we'd really like to get this into tbird .6 and Moz 1.7 - the fix
is fairly safe, and allows you to make filters set a junk score.
Attachment #144072 -
Flags: approval1.7?
Assignee | ||
Comment 27•21 years ago
|
||
Comment on attachment 144171 [details] [diff] [review]
fix for custom headers
this is needed because the custom headers stuff was always slightly broken...
Attachment #144171 -
Flags: approval1.7?
Comment 28•21 years ago
|
||
Comment on attachment 144171 [details] [diff] [review]
fix for custom headers
a=chofmann for 1.7
Attachment #144171 -
Flags: approval1.7? → approval1.7+
Assignee | ||
Comment 29•21 years ago
|
||
Reporter | ||
Comment 30•21 years ago
|
||
David:
If we are learning from positive marks from external spam filters, isn't it
necessary to learn from negatives as well? Otherwise we are essentially
tainting the built in bayesian filters with one sided results.
Just thinking out loud really.
Assignee | ||
Comment 31•21 years ago
|
||
Not sure what you mean - I've added settings to trust both positive and negative
results in my patch, and in the filters (except for Habeas). But I haven't done
anything about actually feeding the data into the spam filter to train it...I'm
probably going to leave that to you or someone else.
Reporter | ||
Comment 32•21 years ago
|
||
Hmm.. I retract my last comment.
I apparently have some networking issues: when I was looking at the filter for
SpamAssassin I saw:
>name="SpamAssasinYes"
>enabled="yes"
>type="1"
>action="JunkScore"
>actionValue="100"
>condition="OR (\"X-Spam-Status\",begins with,Yes) OR (\"x-Spam-Flag\",begins
with,YES)"
and that was it... hence my question.
But now I see the rest. I've also been double posting on at least one forum,
and having connections time out. So I think I have some networking problem here
at the minute, though my MRTG graph barely shows a change in ping time.
Anyway. Disregard my last comment.
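For readers unfamiliar with the record quoted above, the following Python sketch parses it and evaluates its condition against a set of headers. It is written only against the six lines shown in this comment (with the wrapped condition line rejoined); the real filter file format may have more to it. The `parse_record` and `matches` names, the case-insensitive comparison, and the handling of AND are assumptions.

```python
import re

# The record exactly as quoted in the comment, condition line rejoined.
RECORD = r'''name="SpamAssasinYes"
enabled="yes"
type="1"
action="JunkScore"
actionValue="100"
condition="OR (\"X-Spam-Status\",begins with,Yes) OR (\"x-Spam-Flag\",begins with,YES)"
'''

LINE_RE = re.compile(r'^(\w+)="(.*)"$')
TERM_RE = re.compile(r'(AND|OR) \("?([^",]+)"?,([^,]+),([^)]*)\)')


def parse_record(text):
    """Turn key="value" lines into a dict, unescaping embedded quotes."""
    fields = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            fields[m.group(1)] = m.group(2).replace('\\"', '"')
    return fields


def matches(condition, headers):
    """Evaluate 'begins with' terms only; comparison rules are a guess."""
    lowered = {k.lower(): v for k, v in headers.items()}
    ops, hits = set(), []
    for op, header, test, value in TERM_RE.findall(condition):
        if test != "begins with":
            raise ValueError("unsupported test: " + test)
        actual = lowered.get(header.lower(), "")
        ops.add(op)
        hits.append(actual.lower().startswith(value.lower()))
    if not hits:
        return False
    return all(hits) if "AND" in ops else any(hits)
```

The "Yes" record alone only ever marks mail as junk, which is why seeing it without its "No" counterpart prompted the question in the previous comment.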
Assignee | ||
Comment 33•21 years ago
|
||
This handles automatically creating hidden filters for a given server-side
filter, if the per-server pref serverFilterName and serverFilterTrustFlags are
set appropriately.
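A hedged sketch of what that selection might look like, in Python rather than the actual C++: the two pref names come from this comment and the four filter names from the thread, but the bit values of the trust flags, the `hidden_filters` function, and the "<name>Yes"/"<name>No" naming of the hidden filters are assumptions made for illustration.

```python
# Hypothetical bit values for the serverFilterTrustFlags pref.
TRUST_POSITIVES = 0x1
TRUST_NEGATIVES = 0x2

# Server-side filters named in this bug.
KNOWN_FILTERS = ("Habeas", "SpamAssassin", "SpamCatcher", "SpamPal")


def hidden_filters(server_filter_name, trust_flags):
    """Return (description file, names of hidden filters to enable).

    Returns (None, []) when no server-side filter is configured.
    """
    if not server_filter_name:
        return None, []
    if server_filter_name not in KNOWN_FILTERS:
        raise ValueError("unknown server-side filter: " + server_filter_name)

    enabled = []
    if trust_flags & TRUST_POSITIVES:
        enabled.append(server_filter_name + "Yes")  # assumed naming
    if trust_flags & TRUST_NEGATIVES:
        enabled.append(server_filter_name + "No")   # assumed naming
    return server_filter_name + ".sfd", enabled
```

For example, `hidden_filters("SpamAssassin", TRUST_POSITIVES)` would select only the positive filter from SpamAssassin.sfd under these assumptions.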
Assignee | ||
Updated•21 years ago
|
Attachment #144590 -
Flags: superreview?(mscott)
Assignee | ||
Comment 34•21 years ago
|
||
diff for filter description files (includes a typo fix in SpamAssassin.sfd)
Attachment #144230 -
Attachment is obsolete: true
Attachment #144231 -
Attachment is obsolete: true
Attachment #144232 -
Attachment is obsolete: true
Attachment #144233 -
Attachment is obsolete: true
Assignee | ||
Updated•21 years ago
|
Attachment #144591 -
Flags: superreview?(mscott)
Assignee | ||
Comment 35•21 years ago
|
||
Assignee | ||
Updated•21 years ago
|
Attachment #144592 -
Flags: superreview?(mscott)
Updated•21 years ago
|
Attachment #144592 -
Flags: superreview?(mscott) → superreview+
Updated•21 years ago
|
Attachment #144591 -
Flags: superreview?(mscott) → superreview+
Comment 36•21 years ago
|
||
Comment on attachment 144590 [details] [diff] [review]
backend support for automatic server spam filter filters
looks great.
Attachment #144590 -
Flags: superreview?(mscott) → superreview+
Comment 37•21 years ago
|
||
Comment on attachment 144072 [details] [diff] [review]
support for filters setting junk score
a=asa (on behalf of drivers) for checkin to 1.7
Attachment #144072 -
Flags: approval1.7? → approval1.7+
Assignee | ||
Comment 38•21 years ago
|
||
front and backend support for filters setting junk score checked in.
Comment 39•21 years ago
|
||
Bug 181631 was already about having Mark as Junk/Not Junk in the message filter
actions; I've marked it Fixed with an xref to this bug.
I've opened bug 238816 about adding those enhancements for custom-header
matching to MailViews and Search.
Comment 40•21 years ago
|
||
I think there are some small issues with the strings that were checked in:
> +<!ENTITY setJunkScore.label "Set Junk Status">
The other filter actions that use a combobox all end with a colon. Also I think
this filter action should end with a "to" to be consistent with "Change message
priority to:".
I'm not sure whether "Junk Status" should be upper case or not.
> +<!ENTITY notJunk.label "NotJunk">
There should be a blank between Not and Junk.
Comment 41•21 years ago
|
||
I agree with Stefan about the language changes he suggested. This patch does
just that.
1) It adds a space between Not and Junk
2) It adds a colon to the phrase "Set Junk Status to", to be consistent with
setting the priority
3) I also moved the junk status action in the dialog so it is grouped with the
rest of the combo box driven actions, such as setting the priority, labeling
the message, etc. Don't let the weird way cvs diff generated the patch for that
change fool you. It was just moving a few lines of xul higher up in the file.
Still have one remaining problem...whenever we read the filter in from disk,
this action always resets to Not Junk even if you had it set to Junk.
Comment 42•21 years ago
|
||
(In reply to comment #41)
> Created an attachment (id=145191)
>
> 1) It adds a space beteen Not and Junk
This is not included in the patch. :-(
Comment 43•21 years ago
|
||
Actually it is, but cvs diff -uw ignores whitespace, and it views that change as
whitespace, so it didn't show up. Weird
:)
Updated•21 years ago
|
Attachment #145191 -
Flags: superreview?(bienvenu)
Attachment #145191 -
Flags: review?(Stefan.Borggraefe)
Updated•21 years ago
|
Attachment #145191 -
Flags: review?(Stefan.Borggraefe) → review+
Assignee | ||
Updated•21 years ago
|
Attachment #145191 -
Flags: superreview?(bienvenu) → superreview+
Assignee | ||
Comment 44•21 years ago
|
||
I'm not able to reproduce the filter returning to non-junk problem, even with a
fresh tree from CVS. Maybe it's a release build only issue...
Comment 45•21 years ago
|
||
Comment on attachment 145191 [details] [diff] [review]
following up on Stefan's suggestions to the filter UI
asking for 1.7 status for this polish
Attachment #145191 -
Flags: approval1.7?
Comment 46•21 years ago
|
||
Comment on attachment 145191 [details] [diff] [review]
following up on Stefan's suggestions to the filter UI
a=chofmann for 1.7
Attachment #145191 -
Flags: approval1.7? → approval1.7+
Comment 47•21 years ago
|
||
Comment on attachment 145191 [details] [diff] [review]
following up on Stefan's suggestions to the filter UI
this patch has been checked in for 1.7 final
Comment 48•21 years ago
|
||
(In reply to comment #41)
> Still have one remaining problem...whenever we read the filter in from disk,
> this action always resets to Not Junk even if you had it set to Junk.
I see this too. The actionValue contains a random number instead of 0 or 100
when the FilterListDialog is opened for the first time after mozilla is started.
When you just open the FilterListDialog and close it immediately without opening
the FilterEditor, this value is written to msgFilterRules.dat.
Also in FilterEditor.js sometimes gJunkScoreCheckbox and sometimes
gChangeJunkScoreCheckbox is used for something that looks like it should be just
one variable instead. But this is unrelated to the random number problem.
Comment 49•21 years ago
|
||
this patch fixes one of the issues with this bug fix:
"whenever we read the filter in from disk,
this action always resets to Not Junk even if you had it set to Junk."
We were never reading in the junk mail action value when reading the filter
from disk. Hence, the action value was garbage, causing it to sometimes get set
to mark as junk and sometimes as not junk.
However, there is still another really nasty issue out there. Any filter that
fires has the random potential of marking mail as junk, even if the filter does
not have the junk status action checked. See Bug #239349 for information about
that issue.
Comment 50•21 years ago
|
||
Comment on attachment 145363 [details] [diff] [review]
fixes a bug where the junk action value never gets initialized
david see my comment above that explains this patch.
Attachment #145363 -
Flags: superreview?(bienvenu)
Comment 51•21 years ago
|
||
Comment on attachment 145363 [details] [diff] [review]
fixes a bug where the junk action value never gets initialized
uninitialized variables leading to random behavior == good candidate for 1.7
final :)
Attachment #145363 -
Flags: approval1.7?
Comment 52•21 years ago
|
||
I just found the cause of Bug #239349 which caused messages to get randomly
marked as junk or not junk if you had a filter rule that set a label action.
That fix should also go into 1.7
Comment 53•21 years ago
|
||
Comment on attachment 145363 [details] [diff] [review]
fixes a bug where the junk action value never gets initialized
a=chofmann for 1.7
Attachment #145363 -
Flags: approval1.7? → approval1.7+
Assignee | ||
Comment 54•21 years ago
|
||
Comment on attachment 145363 [details] [diff] [review]
fixes a bug where the junk action value never gets initialized
I swear I wrote that code...
Attachment #145363 -
Flags: superreview?(bienvenu) → superreview+
Comment 55•21 years ago
|
||
Comment on attachment 145363 [details] [diff] [review]
fixes a bug where the junk action value never gets initialized
this patch has been checked in for 1.7
Assignee | ||
Comment 56•21 years ago
|
||
Backend support is checked in. I still need to write some front end code to
allow the user to set this up (though for now you can just set a hidden pref on
the server, serverFilterName, to the appropriate server side filter name:
Habeas, SpamAssassin, SpamCatcher, or SpamPal).
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment 57•21 years ago
|
||
this busted balsa tinderbox (gcc3.4):
/builds/tinderbox/SeaMonkey-gcc3.4/Linux_2.4.7-10_Depend/mozilla/mailnews/base/src/nsSpamSettings.cpp:458:
error: extra `;'
Reporter | ||
Comment 58•21 years ago
|
||
Reporter | ||
Updated•21 years ago
|
Attachment #146072 -
Flags: review?(bienvenu)
Assignee | ||
Comment 59•21 years ago
|
||
Comment on attachment 146072 [details] [diff] [review]
Fix Bustage
I actually already checked in the same fix
Attachment #146072 -
Attachment is obsolete: true
Attachment #146072 -
Flags: review?(bienvenu)
Comment 60•21 years ago
|
||
What about forget headers? Is this code immune to that? E.g. takes only the
later added headers? The spammers could insert their own headers saying
spam-status: 0.
And how are new spam filters added to this? My server has some new stuff, it
inserts a score into the header and even the cause for this score - what was
suspicious in the mail. Something like this:
X-Spam-Status: No, hits=0.1 required=5.0
X-Spam-Level: HTML_MAIL, NO_SENDER
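To make the question about richer headers concrete, here is a small Python sketch that pulls the verdict and the numeric fields out of an X-Spam-Status value like the one quoted above. The `parse_spam_status` name and the returned dict layout are assumptions, and the sketch says nothing about whether a header is genuine or forged; it only shows that a malformed value such as a bare "0" would not parse as a verdict.

```python
import re

# Verdict word first, then optional key=value pairs such as hits= and required=.
STATUS_RE = re.compile(r'^\s*(Yes|No)\b[,\s]*(.*)$', re.IGNORECASE | re.DOTALL)
PAIR_RE = re.compile(r'(\w+)=(\S+)')


def parse_spam_status(value):
    """Parse an X-Spam-Status value; return None if it has no Yes/No verdict."""
    m = STATUS_RE.match(value)
    if not m:
        return None
    details = dict(PAIR_RE.findall(m.group(2)))
    result = {"is_spam": m.group(1).lower() == "yes"}
    for key in ("hits", "required"):
        if key in details:
            try:
                result[key] = float(details[key])
            except ValueError:
                pass  # leave out fields that are not plain numbers
    return result
```

A client that wanted to "give weight" to the server result could use the parsed hits and required values instead of only the Yes/No prefix.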
Comment 61•21 years ago
|
||
(In reply to comment #60)
> What about forget headers? Is this code immune to that? E.g. takes only the
> later added headers? The spammers could insert their own headers saying
> spam-status: 0.
Good point.
Some of my mail gets filtered two or more times and ends up with several
different X-Spam headers. If not all of them mark it as spam, that might be a
problem.
Comment 62•21 years ago
|
||
I meant forged, sorry for the typo.
Assignee | ||
Comment 63•21 years ago
|
||
Assignee | ||
Updated•21 years ago
|
Attachment #147413 -
Flags: superreview?(mscott)
Updated•21 years ago
|
Attachment #147413 -
Flags: superreview?(mscott) → superreview+
Assignee | ||
Comment 64•21 years ago
|
||
Comment on attachment 147413 [details] [diff] [review]
fix for pop3 filter junk score action
very safe fix, only affecting setting junk score with pop3 filters...
Attachment #147413 -
Flags: approval1.7?
Comment 65•21 years ago
|
||
Comment on attachment 147413 [details] [diff] [review]
fix for pop3 filter junk score action
a=asa (on behalf of drivers) for checkin to 1.7
Attachment #147413 -
Flags: approval1.7? → approval1.7+
Comment 66•20 years ago
|
||
*** Bug 243049 has been marked as a duplicate of this bug. ***
Comment 67•20 years ago
|
||
My mail server sends x-junkmail-status headers, an example value is
"score=150/50, host=mx01.versatel.de". There also is a header X-Junkmail-Whitelist.
Updated•20 years ago
|
Product: MailNews → Core
Comment 68•20 years ago
|
||
I noticed that the sfd files were added to packages-os2 (and others), but when
I build mailnews, these files never get exported into my dist.
It appears that the Makefile never gets hit?
Comment 69•18 years ago
|
||
(In reply to comment #68)
> I noticed that the sfd files were added to packages-os2 (and others), but I
> build mailnews, these files never get exported into my dist.
>
> It appears that the Makefile never gets hit?
Makefile.in (in mailnews/base/search/src/) has only added SpamAssassin and SpamPal, leaving out the other two, even though the packager scripts try to install them (see attachment 144592 [details] [diff] [review]). Legal issues, or were they simply forgotten?
Updated•16 years ago
|
Product: Core → MailNews Core