Closed
Bug 361144
Opened 18 years ago
Closed 17 years ago
Junk filter only 10% effective and not improving
Categories
(MailNews Core :: Filters, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: ireneshusband, Unassigned)
Details
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20060910 SeaMonkey/1.0.5
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20060910 SeaMonkey/1.0.5
I have been using SeaMonkey as my mail client for about 4 weeks now, but, despite getting several dozen junk mails a day and diligently marking them as such, the junk filter still only catches a tiny fraction of them - around 10% or so.
I haven't done any detailed statistical breakdown, but the junk mail I get is (as it was 4 weeks ago) mainly in the following categories:
1. enhancements to the functioning of male genitalia with misspelled words
2. Canadian prescriptions
3. share price tips (especially ones with an Israeli connection)
4. completely incomprehensible with a gif attached
5. Nigerian "Spanish prisoner" scams
6. a few odd mails in Portuguese and Turkish about something or other through a particular mailing list.
I haven't done a statistical analysis of the mails that do get properly filtered.
The filter has done much better at learning from false positives. I only ever got a few, but they have become much more rare.
Reproducible: Always
Steps to Reproduce:
1. Collect mail
2. Mark unmarked spam and unmark false positives (if any)
3. Keep doing this over several weeks
Actual Results:
Filter never learns to catch more than around 10% (if that) of junk mail.
Expected Results:
The spam filter should learn to catch a much larger proportion of the spam to be useful. 50% would be a start.
I know there are immense technical and even metaphysical obstacles to be overcome in trying to get machines to outsmart malicious humans. I don't want to ask the impossible. I am simply reporting the fact that a feature of SeaMonkey doesn't work very well.
If the answer is simply that if I persevere for another month or two I should see significant improvement, then please consider this a documentation bug. If it takes several months for spam filtering to take full effect then the user should be informed of this explicitly the first time the junk filter is run.
Updated•18 years ago
Assignee: mail → nobody
Component: MailNews: Main Mail Window → MailNews: Filters
Product: Mozilla Application Suite → Core
QA Contact: filters
Version: unspecified → 1.8 Branch
Comment 1•18 years ago
How many mails have you received so far to train with? Normally the junk filter learns quite fast...
You could also install the extension Mnenhy (http://mnenhy.mozdev.org/) and then look at the Junk Filter statistics under Tools → Junk Filter Statistics. There you can see how much the filter has already "learnt".
Comment 2•18 years ago (Reporter)
"The junk mail filter has been trained by 1309 messages, whereof 35 (3%) have been rated as solicited and 1274 (97%) as junk. This resulted in a total of 50003 tokens being read, 10452 (20%) being rated as good and 39551 (80%) as evil; the number of different tokens is 44965.
The following table will show the 1 most common tokens, hiding 44964 tokens below the threshold of 1177 appearances."
And that token is "mime-version:1.0", with scores good:24, evil:1237 and junk probability:58.61%.
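For what it's worth, the 58.61% figure can be reproduced from the numbers quoted above with a simple per-token estimate: the share of junk messages containing the token, divided by the sum of that share and the corresponding share of good messages. This is only a sketch. It assumes the good/evil scores are per-message counts, and that Mnenhy or SeaMonkey computes the value exactly this way is also an assumption; `token_junk_probability` is a hypothetical name.

```python
def token_junk_probability(good_hits, junk_hits, good_msgs, junk_msgs):
    """Estimate how strongly one token indicates junk, as a value in [0, 1].

    Assumption: a plain ratio of per-class rates, not necessarily the
    exact formula used by the mail client or by Mnenhy.
    """
    good_rate = good_hits / good_msgs  # fraction of good mail with the token
    junk_rate = junk_hits / junk_msgs  # fraction of junk mail with the token
    return junk_rate / (good_rate + junk_rate)


# Figures from the statistics quoted above for "mime-version:1.0".
p = token_junk_probability(good_hits=24, junk_hits=1237,
                           good_msgs=35, junk_msgs=1274)
print(f"{p:.2%}")  # prints 58.61%
```

With only 35 good messages against 1274 junk ones, a token that shows up in most mail of either kind lands close to neutral, which fits a header token scoring just above 50%.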
Comment 3•18 years ago (Reporter)
Since I installed Mnenhy things seem to have improved somewhat. I haven't been keeping meticulous statistics, but it's starting to look like somewhere between 20% and 40% success at identifying spam with no false positives. I don't know if Mnenhy could have had anything to do with it or whether this is just a coincidence.
Comment 4•18 years ago
My junk filter was working pretty well using Mozilla. When I installed SeaMonkey, all the previous junk mail training was lost. Now, with SeaMonkey, in spite of repeated training over many weeks, it catches very, very little junk mail (I doubt it is even 10%). It can't even catch exact repeats -- I mark an email as junk, and then run junk control on the folder, and it fails to mark a duplicate email as junk. I have confirmed that mail marked as junk is staying marked and is going into the trash.
Clearly, junk mail control has changed in SeaMonkey -- for the worse.
Comment 5•18 years ago
In desperation I deleted the old training.dat file, which had been used effectively with Mozilla. After a few junk emails the junk filter is now working quite well. So, if anyone upgraded to SeaMonkey and the junk filter seems anemic, try starting from scratch.
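For anyone who would rather keep the old file around than delete it outright, here is a minimal sketch that moves training.dat aside. The `profile_dir` argument is a placeholder for wherever your profile keeps training.dat, `reset_training_data` is a hypothetical helper name, and it is assumed that the mail client is closed while this runs.

```python
from pathlib import Path


def reset_training_data(profile_dir):
    """Move training.dat aside so the junk filter starts from scratch.

    Returns the backup path, or None if no training.dat was found.
    Assumption: profile_dir is the directory that holds training.dat.
    """
    training = Path(profile_dir) / "training.dat"
    if not training.is_file():
        return None
    backup = training.with_name("training.dat.bak")
    # Path.replace overwrites an existing backup of the same name.
    training.replace(backup)
    return backup
```

Renaming the backup to training.dat again restores the old training if the fresh start turns out worse.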
Comment 6•18 years ago (Reporter)
I've just trashed my training.dat and now the filter is getting every piece of spam and only needs training not to catch non-spam.
As far as I remember, my old training.dat was created by SeaMonkey and not by Thunderbird or a previous incarnation of Mozilla Suite.
I suspect from this that this particular problem is just one of those things that happens occasionally. I imagine that it would be harder to change the algorithm than to have an option in SeaMonkey preferences to trash the old spam training file (followed by a sternly worded confirmation dialogue of course), along with a brief note about why trashing this file is occasionally a good idea.
Comment 7•18 years ago (Reporter)
I take back some of my last comment. The junk filter is back to being not all that effective. The new training.dat is only a couple of days old, which means it is too early to say that it won't work well in the future. I'll keep you posted.
Comment 8•18 years ago (Reporter)
Since my last post the spam filter has been reasonably effective. I would say it is at least 80% effective, although I haven't kept an exact tally.
My suggestion therefore, as I said before, is to include the option in Junk Mail Controls to delete a poorly performing training file and start again.
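An exact tally of the kind mentioned above could be kept with something as small as the sketch below. The class and field names are hypothetical, and the example numbers are made up for illustration; they are not measurements from this bug.

```python
from dataclasses import dataclass


@dataclass
class JunkTally:
    caught: int = 0           # junk the filter flagged on its own
    missed: int = 0           # junk that had to be marked by hand
    false_positives: int = 0  # good mail wrongly flagged as junk

    def catch_rate(self):
        """Fraction of all junk that the filter caught unaided."""
        total = self.caught + self.missed
        return self.caught / total if total else 0.0


# Made-up example: 40 junk mails caught, 10 missed.
tally = JunkTally(caught=40, missed=10)
print(f"{tally.catch_rate():.0%}")  # prints 80%
```

Counting missed junk separately from false positives keeps the two kinds of error apart, since the thread reports them behaving quite differently.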
Comment 9•18 years ago
Is this now working for you?
Comment 10•18 years ago (Reporter)
For some reason I haven't been getting a huge lot of spam in recent months so it's hard to come up with a meaningful statistic. All I can say is that I haven't been annoyed by the amount of spam that's been getting through recently. That said, I still think this bug should not be marked as fixed until the option of deleting a poorly performing training.dat is included with the spam filter preferences.
Comment 11•18 years ago
That option exists now (reset training data).
Comment 12•17 years ago
Admittedly my training.dat is quite old but quite effective from what I can tell. Can we mark this as works for me and move on?
Comment 13•17 years ago
WFM based on last 3 comments.
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME
Updated•17 years ago (Assignee)
Product: Core → MailNews Core