Closed Bug 245168 Opened 21 years ago Closed 16 years ago

Junk mail controls in 1.8 builds allow more junk mail than in 1.7

Categories

(MailNews Core :: Filters, defect)

Version: 1.8 Branch
Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: astrojny, Unassigned)

References

Details

(Whiteboard: needs testcase)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a1) Gecko/20040520
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a1) Gecko/20040520
When I had Mozilla 1.7rc2, of the 100+ emails I received, the junk mail controls would get rid of about 95%. Now with Mozilla 1.8a1 the junk mail controls only remove about 60%. Something in the junk mail filter appears to have changed to allow much more junk mail to get through.
Reproducible: Always
Steps to Reproduce:
1. Junk mail control activated
2. Get mail
3. Much junk mail that was formerly filtered by junk mail controls now gets through
Actual Results: Get much more junk mail, i.e., mail not labeled as junk, than I did using Mozilla 1.7rc2.
Expected Results: Junk mail controls should be at least as effective as in Mozilla 1.7rc2.
Reporter, the Bayes algorithm has been improved (see bug 181534). This might require that you retrain Mozilla a bit (mark more mails as junk, or as non-junk). Note that the Junk Mail Algorithm is not automatic; it requires help from the user!
*** This bug has been marked as a duplicate of 243680 ***
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE
I have done considerable retraining using junk mail filter, but it appears to have had little effect.
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
Note I have spent considerable time retraining and reidentifying junk mail. To date, it appears to have had little effect.
(In reply to comment #1) > Reporter, the Bayes algorithm has been improved (see bug 181534). This might > require that you retrain Mozilla a bit (mark more mails as junk, or as non-junk). > Note that the Junk Mail Algorithm is not automatic, it requires help from the user ! Note I have done considerable retraining and reidentifying of junk mail to little noticeable effect.
The only way to prove this symptom is to set up two side-by-side installs of 1.7 and 1.8, give each the same training.dat file, set each up to access the same account and not delete mails from server, and allow both installs to download the same set of mail. See bug 181534, bug 224318, bug 230093, bug 231873 (all of which *did* change the Bayes algorithm, but should be in 1.7RCx as well as 1.8)
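The side-by-side comparison described above comes down to counting, for each install, how many of the same known-junk messages were flagged. A minimal Python sketch of that bookkeeping follows. It assumes you have noted by hand which message IDs are junk and which ones each install marked; the function name `junk_rates` and all message IDs are hypothetical, and nothing here reads Mozilla's own files.

```python
def junk_rates(truth_junk, flagged, all_ids):
    """Return (catch_rate, false_positive_rate) for one install.

    truth_junk: set of message IDs a human judged to be junk
    flagged:    set of message IDs the install marked as junk
    all_ids:    set of every message ID downloaded by both installs
    """
    ham = all_ids - truth_junk
    caught = len(flagged & truth_junk)      # junk correctly flagged
    false_pos = len(flagged & ham)          # good mail wrongly flagged
    catch_rate = caught / len(truth_junk) if truth_junk else 0.0
    fp_rate = false_pos / len(ham) if ham else 0.0
    return catch_rate, fp_rate


# Hypothetical example data: ten messages, eight of them junk.
all_ids = {f"msg{i}" for i in range(10)}
truth_junk = {f"msg{i}" for i in range(8)}
flagged_17 = {f"msg{i}" for i in range(7)}      # 1.7 install: 7 junk flagged
flagged_18 = {"msg0", "msg1", "msg2", "msg9"}   # 1.8 install: 3 junk + 1 good

for name, flagged in (("1.7", flagged_17), ("1.8", flagged_18)):
    catch, fp = junk_rates(truth_junk, flagged, all_ids)
    print(f"{name}: caught {catch:.0%} of junk, {fp:.0%} false positives")
```

Because both installs see the same mail and start from the same training.dat, any difference in the two pairs of numbers can be attributed to the build rather than to the mail or the training.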
Oh -- and see bug 245176, which has a new patch; check whether today's nightly build works better.
Installed Mozilla 1.8a2, the June 1, 2004 nightly build, and now less than 50% of my junk mail is filtered out. :=(
Installed the Mozilla 1.8a2 June 3rd nightly build and junk mail filtering still is not as good as in the 1.6 release. Don't know what the problem is, but one of the great things about Mozilla was its junk mail filtering. Can't say that anymore. At least so far for the 1.8a2 version.
FWIW, after upgrading my installed nightly build from 0522 to 0602, I'm noticing more false positives -- in particular, Bugzilla mails are getting flagged as junk where that never happened before. The two times I've seen this, there were ten or more bugmails in the queue and a few of them (two or four) were junked.
I have downloaded the June 7th build of Mozilla 1.8a2. I uninstalled the previous install, including the installation folder, before installing the June 7th build. I had not downloaded email all weekend and had over 450 mails, 99% of which were junk. The June 7th build managed to determine that 50 were junk. This is after much training using the prior builds. What has happened since version 1.6, where I had a similar experience and all but 29 emails were determined to be junk?
Build: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a1) Gecko/20040520
I have been looking for a bug with symptoms similar to mine on which to comment. I have "upgraded" my version of Mozilla from 1.7 RC2 to 1.8 Alpha 1 (2004052009) and have found that the Junk Mail classification barely works at all now. Here is what I did:
1. Backed up Mozilla (thank ye gods)
2. Uninstalled Mozilla 1.7 RC2
3. Installed Mozilla 1.8 Alpha 1 (2004052009)
Junk Mail Controls Checklist:
1. Junk mail controls ENABLED
2. Do not mark messages as junk mail if the sender is in my address book: (Personal Address Book) ENABLED
3. Move incoming messages determined to be junk mail to: "Junk" folder on (address) SELECTED
4. Automatically delete junk messages older than 1 days from this folder ENABLED
5. All other options disabled
Mozilla 1.8a1 picked up my old profile and everything looked OK. However, the Junk mail classification stopped classifying new junk mail as junk. Before I upgraded, junk mail was being classified with 95% accuracy, with only 3 or 4 junk mails getting through in a day. Now the accuracy is about 1-5%, with virtually all my junk mail getting through. There are however NO false positives (no non-junk mail is being classified as junk). It should be noted that I classify my junk mail using the "Unread" view for performance reasons.
When I try to manually train against the mail that passed the junk filter, the filter runs very noticeably faster than it did before in 1.7 RC2 and no junk mail is classified. When I classified and re-classified the same junk mail 20 times in a row, I managed to get 2 of the junk mails classified as junk and moved to the junk folder. There were 16 junk mails at the time I did that spot of training. From my perspective it looks like the junk mail filtering isn't running at all; however, given the very rare event of a junk mail being classified, it does appear to be running. It's that super fast now!
My training.dat file certainly exists and has grown to 2.84 MB (2,984,960 bytes) in size, and I have been training with it since junk filtering was first provided in Mozilla way back when.
June 8th build doesn't improve junk mail filter performance. :=(
I have given up on Mozilla 1.8a2. I downloaded Mozilla 1.7rc3 and junk mail filtering seems to be back to its old self and works well. Don't know what happened to 1.8a2, but 1.7 is fine, or so it seems so far.
(In reply to comment #15) Yes I saw the news on a new 1.7 RC, and am downloading it right now to install. Sorry 1.8a1, I tried to love you but just couldn't, I love my Junk mail filtering more. *sniff*
I thought that I should probably say something useful instead. I have installed 1.7rc3 and with training on 12 SPAM messages, another 22 remaining SPAM messages got classified as junk and moved to the junk folder. Old school junk mail filtering is back and working nicely. If I had a choice I would say, stick with the 1.7 stream of junk filtering. Personally I don't want to throw away all the training I have accumulated, but it's the masses that need to be satisfied I suppose.
(In reply to comment #17) > I thought that I should probably say something useful instead. I have installed > 1.7rc3 and with training on 12 SPAM messages, another 22 remaining SPAM messages > got classified as junk and moved to the junk folder. > > Old school junk mail filtering is back and working nicely. If I had a choice I > would say, stick with the 1.7 stream of junk filtering. Personally I don't want > to throw away all my training I have accumulated, but its the masses that need > to be satisfied I suppose. For me the problem with the 1.8a2 junk filter was not the training; I spent over a week training it. The problem was that the training didn't seem to do any good.
Confirmed here with Mozilla 1.8a1: the junk mail filters don't appear to be working at all. Tools > run junk mail filter takes no action. I get on average 300 mails a day, about 170 of them junk. Earlier Moz used to get 90% of it. Now I'm looking at LOTS of junk. After a week of training nothing has changed.
Tried Mozilla 1.8a2 June 21st build. Junk mail controls still do not seem effective.
Just tried the Mozilla build for July 10th. It still does not filter junk mail as well as Mozilla 1.7.1. I access 2 accounts through Mozilla. After a time the junk mail move function ceases to work on the 2nd account, i.e., not the default account. The mail gets marked as junk but does not automatically get moved to the junk mail folder, even though the junk mail control settings are set to automatically move junk mail to the junk mail folder. Deleting the account and recreating it allows the junk mail move function to work again, but after a few days the same problem recurs.
Tried Mozilla 1.8 alpha 2. The junk mail filter STILL does not appear to be working very well. I tried Mozilla 1.8 alpha 2 after using Mozilla 1.7. Version 1.7 does a very nice job of filtering, generally filtering around 90% of my mail, i.e., if I get 190 emails, all but 10 are labeled junk and moved to the junk mail folder. (A check of that folder indicates mail labeled junk was indeed junk.) With Mozilla 1.8 alpha 2 the results were almost the opposite. Of 150 emails received, all but 90 were labeled junk. Obviously quite a difference. Most of the 90, upon inspection, should have been labeled junk. After about 4 days I uninstalled Mozilla 1.8 alpha 2 and went to Mozilla 1.7.1, because Mozilla 1.8 alpha 2 did not appear to be learning and I spent considerable time trying to teach it.
Andy Strojny: Until such point as someone posts a patch, it is pointless for you to continue reporting that things are still not working for you. Bug 230093 / bug 181534 comment 72 would appear to be the major junk-related change since 1.7. I again point to bug 245439, which may or may not be a dupe of this one -- or, it might be a dupe of the problems about failing to update training.dat (bug 243680, bug 245499). A couple more dupes to this bug forthcoming shortly.
Summary: Junk mail controls in 1,8a1 allow more junk mail than in 1.7rc2 → Junk mail controls in 1.8 builds allow more junk mail than in 1.7
*** Bug 256366 has been marked as a duplicate of this bug. ***
*** Bug 256219 has been marked as a duplicate of this bug. ***
Sorry for the confusion; I thought I needed to report a new bug for the junk mail problem in the new version of Mozilla, 1.8alpha3. Same problem that was in the other 1.8alpha builds I've tried. Here is my experience: when I used Mozilla 1.7.2, of 200+ emails downloaded, over 176 were correctly identified as junk. With Mozilla 1.8alpha3, of 221 emails downloaded, only some 80 were identified as junk. Over 125 that were in fact junk were not identified as junk. This was basically the same mail dump, as I examined the email using both browsers, first 1.8alpha2 and then uninstalling it and installing 1.7.2. Mozilla 1.7.2 identified much more junk mail. Clearly Mozilla 1.7.2's junk mail filter is superior. I'm back to using Mozilla 1.7.2. I'm afraid to try Thunderbird, as I understand it uses the same junk mail filter as the Mozilla 1.8 alphas.
(In reply to comment #23) > Andy Strojny: Until such point as someone posts a patch, it is pointless for you > to continue reporting that things are still not working for you. > > Bug 230093 / bug 181534 comment 72 would appear to be the major junk-related > change since 1.7. > > I again point to bug 245439, which may or may not be a dupe of this one -- or, > it might be a dupe of the problems about failing to update training.dat > (bug 243680, bug 245499). > > A couple more dupes to this bug forthcoming shortly. Again sorry for the confusion. Thought you need new bug if same problem in a new Mozilla build. How does one determine if a patch has been posted for a reported bug?
The release notes for Mozilla 1.8 alpha 4 do not indicate that anything was done concerning the junk mail filter. Has anything happened?
I see this also on 1.8a4 Solaris Sparc build. As far as I remember, it regressed in 1.8a1, was fixed in 1.8a3 (don't remember the bug number), and regressed again in 1.8a4. Junk detection is more or less random now.
Product: MailNews → Core
Flags: blocking1.8a6?
Flags: blocking1.8a6? → blocking1.8a6-
More slipping through than ever now. Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4 ID:2006050805
It is starting to catch a few junk emails again, but not like it used to (on branch: change Version to All or Other). Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1a3) Gecko/20060629 Thunderbird/2.0a1 ID:2006062906
*** Bug 283021 has been marked as a duplicate of this bug. ***
I have seen poor junk filtering in 1.8x builds too, as reported in bug #283021. A recent Seamonkey 1.5 nightly filtered about 1/3 of the junk, while Mozilla 1.7x typically filters 2/3 to 3/4 of my junk mail using the same profile and training.dat, on Win98 and now on Linux. I found the problem in the oldest 1.8a builds I could find, dating back to about when this report was written. I suggested possible reasons in my many reports, but if the reason isn't easy to see, replacing the 1.8 filtering code with 1.7x code would be the simplest fix.
2.0 basically does not filter junk any longer. This should block a release of 2.0 completely.
As commented in bug #283021, the simplest fix might be to cut and paste---cut out the "new and degraded" junk-filtering code in 1.8/1.9 and paste in the code that works from Mozilla 1.7.13.
I pretty much solved the problem by letting my service provider, COX, identify spam, and I created a filter to place it in a separate mailbox. I then mark the mail in it as junk (over 99% of it is junk), and slowly Thunderbird learns. So the COX junk filter has solved my problem.
Andy, leaving it to Cox is not a fix for the problem with Mozilla software spam-filtering. As long as there's code taking up space in Seamonkey and Thunderbird that purports to filter spam, it ought to do it well. If not, it's hardly better than no filter at all. I'd rather receive and filter my own spam than let my ISP or another firm do it, because it's a known fact that filtering upstream from your inbox may not be honest. AOL has been caught filtering messages critical of its company policies, for example, though I'd have to search for the details.
I don't disagree. But COX gives me the option of having mail marked by it as spam but still delivered. I set up a filter to dump email labeled by COX into a separate COX SPAM mailbox. It is dumped there and I go through it periodically and either label it as spam through Thunderbird or move it into my inbox. Agreed, this is not an ideal situation, but the Thunderbird spam filter just was not doing it for me. Hopefully it will be fixed in version 2.
What's needed, *from one of you who is seeing these problems*, is a testcase from 1.5, 2.0 or trunk, and the relevant files (training.dat, sample emails, etc.). Also see comment 6.
Worcester12345 in comment #34:
> 2.0 basically does not filter junk any longer. This should block a release of 2.0 completely.
If junk is totally not working then it deserves a new bug, or escalating one of the several existing bugs where junk does not work. This bug is about junk not working well (but it is working). FWIW I see no such issue running trunk and 2.0, but I do have SpamAssassin as a first line of defense.
OS: Windows XP → All
Hardware: PC → All
Whiteboard: needs testcase
Version: Trunk → 1.8 Branch
I just had my first junk email message in the past week or two blocked. I continue marking, but it just doesn't work. Nightly 2.0 builds.
Flags: blocking-thunderbird2?
Flags: blocking-thunderbird2? → blocking-thunderbird2-
My ISP changed my domain name and I'm getting almost no junk to test on. In another report on this problem, I suggested that maybe the tail end of the training.dat file isn't getting read to process junk. The end of the file is where the "blacklist" is located and probably where new junk info is appended. If only the front part of the file is being used, that means only the whitelist and the old blacklist data are being used. Old blacklist data may not be a good match for current junk mail. If that's not the problem, then there's a defect in the revised algorithms for processing junk mail. The differences between the new and old code need to be examined for flaws.
(In reply to comment #39) > what's needed, *from one of you who is seeing these problems*, is a testcase > from 1.5, 2.0 or trunk, and the relevant files (training.dat, sample emails, > etc). also see comment 6 > > > Worcester12345 in comment #34 > > 2.0 basically does not filter junk any longer. This should block a release of > > 2.0 completely. > > If junk is totally not working then it deserves a new bug or escalating one of > the several existing bugs where junk does not work. This is about junk not > working well (but it is working). FWIW I see no such issue running trunk and > 2.0 - but I do have spamassassin as a first line of defense. I also use SpamAssassin, and have the settings set up to honor the warning from SpamAssassin, yet it does not. This is with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3pre) Gecko/20070308 Thunderbird/2.0pre ID:2007030803. Thunderbird catches maybe 1-5% of incoming junk mail.
sorry for the spam. making bugzilla reflect reality as I'm not working on these bugs. filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
(In reply to comment #40) > I just had my first junk email message in the past week or two blocked. I > continue marking, but it just doesn't work. Nightly 2.0 builds. I've seen comments where people have been helped by resetting the training data and starting from scratch in marking some messages as junk and some as not junk (In reply to comment #42) > I also use SpamAssassin, and have the settings set up to honor the warning from > SpamAssassin, yet it does not. This is with Mozilla/5.0 (Windows; U; Windows NT > 5.1; en-US; rv:1.8.1.3pre) Gecko/20070308 Thunderbird/2.0pre ID:2007030803 "Trust" isn't working in 2.0, at least for some people - bug 381589
(In reply to comment #44) > (In reply to comment #40) > > I just had my first junk email message in the past week or two blocked. I > > continue marking, but it just doesn't work. Nightly 2.0 builds. > > I've seen comments where people have been helped by resetting the training data > and starting from scratch in marking some messages as junk and some as not junk What's the use, then? I thought the effects were cumulative. (In reply to comment #44) > (In reply to comment #42) > > I also use SpamAssassin, and have the settings set up to honor the warning from > > SpamAssassin, yet it does not. This is with Mozilla/5.0 (Windows; U; Windows NT > > 5.1; en-US; rv:1.8.1.3pre) Gecko/20070308 Thunderbird/2.0pre ID:2007030803 > > "Trust" isn't working in 2.0, at least for some people - bug 381589 Hmmm. Sounds more and more like the junk mail controls are pretty much entirely broken.
The file training.dat holds both a whitelist and a blacklist, I understand. Presumably blacklist data is appended. If the process that reads the blacklist has an arbitrary limit on the size of the blacklist, appended data beyond that size would never be read. This would make the training of the junk controls appear not to work when the training.dat file or the blacklist it contains gets larger than whatever size limit may exist. I would suggest that records at the front of the blacklist be dropped off after the blacklist reaches a "mature" size, which would eliminate old and often useless spam data, and make sure that the filtering process reads the entire blacklist as well as the whitelist. If the problem is in the statistical methods instead of simple file processing, I would have no suggestions.
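The hypothesis above can be made concrete with a toy model. The Python sketch below is purely illustrative: it does not reflect the real training.dat format or any Mozilla code, and the read cap `READ_LIMIT`, the `score` function, and the token lists are all assumptions invented for the example. It shows how a reader that stops after a fixed number of entries would ignore newly appended tokens, and how keeping only the newest entries (the suggestion above) would behave instead.

```python
from collections import deque


def score(message_tokens, junk_tokens):
    """Fraction of a message's tokens that appear in the junk list."""
    if not message_tokens:
        return 0.0
    known = set(junk_tokens)
    return sum(t in known for t in message_tokens) / len(message_tokens)


# Hypothetical blacklist: old tokens first, newly trained tokens appended.
blacklist = ["viagra", "lottery", "refinance"] + ["rolex", "pharmacy"]

READ_LIMIT = 3  # assumed cap, for illustration only

# A size-limited reader sees only the oldest entries.
truncated = blacklist[:READ_LIMIT]

# The suggested alternative: drop old entries, keep the newest ones.
recent = deque(blacklist, maxlen=READ_LIMIT)

new_spam = ["cheap", "rolex", "pharmacy", "deals"]
score_truncated = score(new_spam, truncated)  # new training is invisible
score_full = score(new_spam, blacklist)       # full read sees new tokens
score_recent = score(new_spam, recent)        # newest entries retained

print(score_truncated, score_full, score_recent)
```

Whether any such limit exists in the actual reader is not established anywhere in this bug; the sketch only shows what the symptom would look like if it did.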
That certainly sounds like something to explore. But why not just expand the size of the blacklist data file or make it user configurable. Hope you had a Merry Christmas and best wishes for a Happy New Year
I have been studying the spam processing in some detail for the last few weeks. I have an extension that I can use to read in an extensive spam corpus (TREC 2005) and run it through the TB spam filters. In that environment, which is "controlled" in the sense that I have precise control of what I train and analyze, the spam filter works quite well (meaning 0.05% false positives with 5% false negatives). These tests have used recent trunk builds, or TB 2.0. Also, the more I train the better it works, though the improvement is very slow.
I am not aware of any limit to the size of the training database. My largest trials trained on 73,000 emails, generating over 300,000 tokens for both junk and good emails. I think that the reason retraining is occasionally necessary is that mistakes creep into the database, that is, good emails that were trained as junk, and vice versa. I have not tested the sensitivity to that yet.
That doesn't mean that the algorithms are ideal. In particular, there is no pruning of old tokens, so obviously obsolete material (like old dates and message IDs) ends up cluttering the database. But pruning will result in a big increase in speed, with a modest decrease in effectiveness. (It's amazing how those rare tokens are often the ones that tip the balance.) It probably won't help the effectiveness (unless there has been mistraining or corruption in training.dat), at least in cases where emails are selected at random. In the real case, of course, both spam and ham drift over time, and perhaps retraining would help (or some sort of token pruning, which I am working on).
There are a number of improvements that the spam filter needs, but still the basic algorithm is sound. However, the user interface does not provide much information, so if spam emails slip through (like the rash of big penis emails I've received in the last few days) then you are really quite helpless. Or if you get little feedback, it's hard to tell if it is working at all, and if not then why not.
So, I don't believe there are any bugs or problems with the spam filtering, just a constant need for improvement to keep up with the increased skills of the spammers. I suppose I could run the tests requested in this bug, checking for regressions against 1.7 (what Thunderbird version is that?). But I've tested TB 2.0 and 3.0, and it works pretty well in a controlled environment. So I'm not sure I see the value of testing against an older version, which would take me hours or days to accomplish. If there is still someone convinced this is worth doing, then please make your case to me. I have the tools to do spam testing.
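To illustrate the trade-off described above between token pruning and effectiveness, here is a toy Bayesian token filter in Python. It is a generic naive-Bayes sketch with add-one smoothing, not Thunderbird's actual algorithm; the class `TokenFilter`, its methods, and the sample tokens are all hypothetical.

```python
import math
from collections import Counter


class TokenFilter:
    """Toy Bayesian token filter; NOT Thunderbird's implementation."""

    def __init__(self):
        self.junk = Counter()  # token -> number of junk messages containing it
        self.good = Counter()  # token -> number of good messages containing it
        self.n_junk = 0
        self.n_good = 0

    def train(self, tokens, is_junk):
        (self.junk if is_junk else self.good).update(set(tokens))
        if is_junk:
            self.n_junk += 1
        else:
            self.n_good += 1

    def junk_probability(self, tokens):
        # Sum log-likelihood ratios, using add-one smoothing per token.
        log_odds = math.log((self.n_junk + 1) / (self.n_good + 1))
        for t in set(tokens):
            p_junk = (self.junk[t] + 1) / (self.n_junk + 2)
            p_good = (self.good[t] + 1) / (self.n_good + 2)
            log_odds += math.log(p_junk / p_good)
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)

    def prune(self, min_count):
        """Drop tokens seen in fewer than min_count trained messages."""
        for t in list(self.junk.keys() | self.good.keys()):
            if self.junk[t] + self.good[t] < min_count:
                self.junk.pop(t, None)
                self.good.pop(t, None)


f = TokenFilter()
f.train(["cheap", "pills", "offer"], is_junk=True)
f.train(["cheap", "watches", "offer"], is_junk=True)
f.train(["meeting", "agenda", "monday"], is_junk=False)
f.train(["bug", "patch", "review"], is_junk=False)

good_mail = ["patch", "review", "agenda"]
before = f.junk_probability(good_mail)  # rare tokens mark it as good
f.prune(min_count=2)                    # removes every token seen only once
after = f.junk_probability(good_mail)   # no evidence left for this message

print(before, after)
```

In this tiny example the tokens that identified the good message were each seen only once, so pruning them leaves the filter with no opinion about it. That is the "rare tokens tip the balance" effect in miniature; how large it is on a real corpus is an empirical question the sketch does not answer.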
QA Contact: filters
Product: Core → MailNews Core
Flags: wanted1.9.2?
In >5 years there is no testcase here, so I think this bug should be closed and issues that individuals have be pursued in individual bugs, e.g. comment 42. If Andy's issue still exists for him and a testcase can be developed then perhaps leave the bug open, but in its present state this bug is going nowhere.
WFM seems inappropriate, so => incomplete. If you are seeing a problem, please create a specific bug for a specific issue based on using version 3, and after checking Bugzilla.
Status: NEW → RESOLVED
Closed: 21 years ago → 16 years ago
Resolution: --- → INCOMPLETE
removing wanted-1.9.2? because bug is resolved
Flags: wanted1.9.2?