Closed
Bug 283021
Opened 20 years ago
Closed 19 years ago
Junk filtering in Seamonkey misses a lot of junk
Categories
(SeaMonkey :: MailNews: Message Display, defect)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 245168
People
(Reporter: jvidal, Unassigned)
Details
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050220
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050220
I have junk mail controls enabled, set to move junk mail to a "Junk" folder, but
mail that was being correctly identified as junk with Mozilla 1.8a5 is now
not detected and just sits in my inbox.
It's just as if junk mail controls were disabled (could this be related to the
bug in which the default mail client status doesn't stick?)
Reproducible: Always
Steps to Reproduce:
1. Enable junk mail controls and set them to move junk to a "Junk" folder
2. Use mail/news
Actual Results:
junk mail just sits in my inbox.
Expected Results:
junk mail should be moved to the junk folder
Reporter
Updated•20 years ago
Summary: JUnk mail controls do not work → Junk mail controls do not work
Comment 1•20 years ago
Do you know when this bug started to appear?
I found a similar problem with a nightly build (2-20-2005) that I used for about
a week. The filter seemed to stop learning, and didn't accurately process mail
based on the previously learned data. Some junk mail was not marked and moved to
the junk folder that should have been, and there was some good mail that started
to be marked as junk.
The nightly build kept asking to be the default mail application, as if the
"yes" response I gave it was not being saved. Perhaps the junk data was not
being saved either, or saved in the right directory.
I'm using Win98SE with updates, and using Mozilla 1.7.3 again.
I downloaded the current nightly (3-2-2005), installed it, ran it, got a little
mail including two junk messages, and found that the junk filter didn't mark
them junk. I removed the nightly, re-installed v. 1.7.3, started it, and
manually ran the junk filter. This time, the two junk messages were marked and
moved to the junk folder. It appears the junk detecting and processing has a
regression that hasn't been fixed. At least, there weren't any good messages
marked junk by the current nightly build.
After re-installing v. 1.7.3, I found that the ID cookie for a website I've used
for years was no longer working. There could be more general issues in 1.8b
versions saving user data correctly.
Using Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8b2) Gecko/20050411 and other
very recent nightlies, I haven't noticed any junk mail problems. In some
previous nightlies, it appeared as if junk mail control was being run
concurrently with downloading of POP mail. A few messages would get marked junk
while the downloading was in progress, and then not moved to the junk folder.
It's simpler to do those steps sequentially instead of concurrently.
Updated•20 years ago
Assignee: sspitzer → mail
Maybe one problem. 1.8b2 doesn't seem to filter junk as well as 1.7.x did.
On trying Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.7.8) Gecko/20050428 after
some problems with 1.8b email, I've found that 1.7.8 filters junk mail far
better than recent 1.8b builds. It's fair to say that 1.8b has become defective
in junk mail filtering.
This is still not working in the nightly 1.8b2 I downloaded on June 8 2005. It
got only about 5 of the 20-25 junk items I got this morning. By reinstalling and
running the junk filter in Mozilla 1.7.8, the rest of them were marked and
moved. The 1.8b junk filter is broken.
Reporter
Comment 9•20 years ago
You know what? I migrated to Thunderbird and Firefox (instead of the Mozilla
suite) and everything works better in both programs... I still like the concept
of the all-in-one suite better, but it was getting too buggy...
bye!
Comment 10•20 years ago
People who have problems with junk mail filtering seemingly not happening: do
you have the junk mail log enabled? If so, does clearing the log and disabling
it fix the problem?
See bug 200594
Comment 11•20 years ago
I don't think the size of the junk logfile should be a problem, as the log
should simply be appended to, right? It's not loaded and processed in any way,
is it? I just checked my junk log with the junk controls menu, and found that
I've been logging with 1.7x with no problems.
Is there a difference in handling junk logs between 1.7x and 1.8x? 1.7x works
well, and 1.8x did up until 1.8a6.
I would look at how 1.8x reads and writes to the filtering-data file. If the
reading isn't correct, then the filtering will be incorrect. If the writing
isn't correct, then the Bayesian learning will not work. In going from 1.8a5 to
1.8a6, something in this process changed, I suspect. This might happen if
there's a path or filename change in the junk processing.
Comment 12•20 years ago
There's poor junk-mail processing in Mozilla/5.0 (Windows; U; Win98; en-US;
rv:1.9a1) Gecko/20050823 SeaMonkey/1.0a
In addition, selecting the menu item "run junk mail controls on folder" produces
seemingly endless disk activity that apparently does nothing, instead of marking
the junk in the selected folder (inbox) as junk. I also found an action that
causes junk mail processing to start and end before all the mail has been
fetched; I will put this into a new bug report.
Comment 13•20 years ago
Junk mail processing is messed up in Mozilla/5.0 (Windows; U; Win98; en-US;
rv:1.9a1) Gecko/20050929 SeaMonkey/1.1a. I got my mail with it today, receiving
about 275 msgs including junk, resulting in 4168 unread msgs [I'm not keeping up
with my lists :-(] and 218 unread items in the junk folder. Quitting Seamonkey
and running Mozilla 1.7.12, I started with 4168 unread inbox msgs and 217 unread
junk msgs, ran junk mail controls and the new numbers were about 4020 unread
inbox msgs and 330 unread junk msgs.
In other words, Seamonkey FAILED TO IDENTIFY about 100 junk messages that were
identified by Mozilla 1.7.12 processing. Most of the missed messages were new
ones as I clean my inbox by hand after the junk control does its thing. This is
BAD performance by Seamonkey (and its predecessor, Mozilla 1.8b).
Comment 14•20 years ago
Here's more evidence that Seamonkey isn't filtering spam well.
Using the 10-8 Win32 build, I fetched the morning mail, 220 msgs - 46 spams =
174 messages after filtering.
On starting Mozilla 1.7.12, I got 44 msgs - 20 spams = 24 msgs, but the older
msgs weren't re-filtered. So I selected "Run junk mail controls" in the mail
tools menu.
54 more spams were filtered out, most of which were from the 220 fetched with
Seamonkey. (I try to keep my Inbox cleaned).
Of about 100 spams detectable by junk filtering this morning, Seamonkey missed
about 54 spams which were filtered by Mozilla 1.7.12.
Comment 15•20 years ago
On Oct 26, I downloaded about 470 emails with either the 10-19 or 10-22 build of Seamonkey 1.1a. About 80 messages were filtered into the junk folder. Then I closed Seamonkey, started Mozilla 1.7.12, ran junk mail controls on the inbox and 86 more messages were filtered from the otherwise clean mailbox. This means that Seamonkey is missing about half of the messages that Mozilla 1.7x is correctly identifying as junk.
Comment 16•20 years ago
Using the 10-29-2005 nightly Seamonkey, I downloaded my POP mail and Seamonkey filtered 92 spams. I quit Seamonkey, started Mozilla 1.7.12, ran the junk filter on my otherwise clean Inbox and it filtered another 115 spams. By hand, I then removed another 22 spams.
Seamonkey filtered 92, while Mozilla could have filtered 92+115 or 207. Seamonkey filtering is substantially worse than Mozilla's.
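For anyone who wants to redo this arithmetic on their own counts, here is a minimal sketch. The names FilterRun, totalJunk, and catchRate are hypothetical, and the 207 figure for Mozilla is an upper bound that assumes Mozilla 1.7.12 would also have caught the 92 messages Seamonkey caught; that was not measured separately.

```cpp
#include <cassert>

// Counts reported in the comment above (hypothetical struct, for illustration).
struct FilterRun {
    int caughtBySeamonkey;     // junk moved by Seamonkey on download
    int caughtLaterByMozilla;  // extra junk moved by Mozilla 1.7.12 afterwards
    int removedByHand;         // junk that neither filter caught
};

// Total junk in the batch, as far as the commenter could tell.
int totalJunk(const FilterRun& run) {
    return run.caughtBySeamonkey + run.caughtLaterByMozilla + run.removedByHand;
}

// Fraction of the known junk that a filter caught.
double catchRate(int caught, int total) {
    assert(total > 0 && caught >= 0 && caught <= total);
    return static_cast<double>(caught) / total;
}
```

With the figures from this comment (92, 115, and 22), the total is 229 junk messages, so catchRate(92, 229) is roughly 0.40 for Seamonkey and catchRate(207, 229) is roughly 0.90 for Mozilla under the assumption above.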
Comment 17•20 years ago
Using the 2005-11-22 Seamonkey nightly, I downloaded the mail received since yesterday and Seamonkey reported 123 new messages. Then I exited, ran Mozilla 1.7.12 and its junk filter, and it removed 71 more messages.
This bug was reported in Feb 2005 against Mozilla 1.8b2. I think I first noticed a drop in filtering quality in 1.8a6, i.e., as the reporter said, 1.8a5 was OK. So a change between 1.8a5 and 1.8a6 caused the decline in filtering performance.
Comment 18•20 years ago
I'm not quite sure if your comparisons are very scientific or at least useful, given your somewhat cloudy remarks. ;-)
A Bayesian spam filter filters spam based on past experience, i.e. to compare spam filtering results between Mozilla and SeaMonkey, you'd need to:
- have the exact same training.dat in both _different_ test profiles, and
- feed the exact same messages in the exact same order into it.
Does that hold for your comparisons?
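To illustrate why identical training data matters for such a comparison, here is a toy Bayesian-style scorer. It is a generic sketch, not the algorithm in nsBayesianFilter.cpp: TrainingData, train, tokenProbability, and junkScore are hypothetical names, and the in-memory maps merely stand in for training.dat, whose real format is not shown in this bug.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the learned data (not the real training.dat format).
struct TrainingData {
    std::map<std::string, int> junkCounts;  // token -> occurrences in junk mail
    std::map<std::string, int> goodCounts;  // token -> occurrences in good mail
    int junkMessages = 0;
    int goodMessages = 0;
};

// Record one message's tokens as junk or as good mail.
void train(TrainingData& data, const std::vector<std::string>& tokens, bool junk) {
    for (const std::string& token : tokens) {
        if (junk) ++data.junkCounts[token]; else ++data.goodCounts[token];
    }
    if (junk) ++data.junkMessages; else ++data.goodMessages;
}

// Per-token junk probability, clamped to [0.01, 0.99]; unseen tokens are neutral.
double tokenProbability(const TrainingData& data, const std::string& token) {
    auto j = data.junkCounts.find(token);
    auto g = data.goodCounts.find(token);
    double junkFreq = 0.0, goodFreq = 0.0;
    if (j != data.junkCounts.end() && data.junkMessages > 0)
        junkFreq = static_cast<double>(j->second) / data.junkMessages;
    if (g != data.goodCounts.end() && data.goodMessages > 0)
        goodFreq = static_cast<double>(g->second) / data.goodMessages;
    if (junkFreq + goodFreq == 0.0) return 0.5;
    double p = junkFreq / (junkFreq + goodFreq);
    return std::min(0.99, std::max(0.01, p));
}

// Combine token probabilities: prod(p) / (prod(p) + prod(1 - p)).
double junkScore(const TrainingData& data, const std::vector<std::string>& tokens) {
    double pJunk = 1.0, pGood = 1.0;
    for (const std::string& token : tokens) {
        double p = tokenProbability(data, token);
        pJunk *= p;
        pGood *= 1.0 - p;
    }
    assert(pJunk + pGood > 0.0);  // could only fail through underflow on huge messages
    return pJunk / (pJunk + pGood);
}
```

In this sketch the same message scores very differently depending on what the data has learned: a message made of tokens previously trained as junk scores close to 1, while the same message scored against empty training data comes out at a neutral 0.5. That is why a fair comparison needs the same training data and the same messages for both programs.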
Comment 19•20 years ago
Yes. I use the same profile and the same training.dat file. After Seamonkey filters the new mail as it's stored into an otherwise clean inbox, I close Seamonkey and run Mozilla 1.7.12 and clean out a lot more messages. Seamonkey filters very poorly. It evidently doesn't use the training.dat file correctly.
And to be very clear, releases up through 1.8a5 worked fine. I first saw the poor filtering in version 1.8a6, and all 1.8 and 1.9 versions since then have filtered badly. I've tested filtering in nightlies for a number of months.
Also, I would like to change the bug title to something like "Junk filtering in Seamonkey misses a lot of junk" or something like that, as some junk is filtered but quite a bit isn't. I think the reporter indicated by private correspondence that he switched to a commercial email program called The Bat and may not be following these remarks anymore.
Comment 20•20 years ago
OK, my remarks are not based on using different profiles with the same training.dat or the same mail being downloaded into different profiles. But the same mail files are being processed and the differences have been clear for months.
Comment 21•20 years ago
To clarify, it's the same profile and same training.dat.
Comment 22•19 years ago
Having heard of changes in filtering incorporated in 1.8a1, I am starting a new training.dat and testing only on Seamonkey. It would have been nice to have been informed of the changes earlier.
Comment 23•19 years ago
I heard by IRC that Moz 1.8a1 had some improved Bayesian filtering included, so it would work differently than the 1.7 branch. I moved the old training.dat file from my profile and started testing junk filtering exclusively with Seamonkey builds. I found that the allegedly improved filtering only removes about half of my junk mail after three weeks of training and a training.dat file of more than 600 Kb, while the filtering in the 1.7 branch removed 75-90% of my junk mail. The ONLY improvement is in the lower rate of removing good mail. Otherwise, the 1.8/1.9/trunk filtering is much worse, much less satisfactory to mail users.
This finding is consistent with earlier reports I filed while using the old 6 Mb training.dat file.
Comment 24•19 years ago
In trying to determine the function of the hidden-preference variable "junk_threshold", I found what looks like a typo in the filtering code, in a statement that looks like a variable assignment (sorry I don't know C++). Here's the link:
http://lxr.mozilla.org/seamonkey/source/mailnews/extensions/bayesian-spam-filter/src/nsBayesianFilter.cpp#942
According to the comments, Seth Spitzer was one of the authors and would be a good person to say whether it's a typo or not.
Comment 25•19 years ago
(In reply to comment #24)
> In trying to determine the function of the hidden-preference variable
> "junk_threshold", I found what looks like a typo in the filtering code, in a
> statement that looks like a variable assignment (sorry I don't know C++).
That does indeed look like a mistake, but looks to me like that pref isn't present by default, so it should not affect the common case. The only thing it would "break" would be people who've tried manually adding that pref, by editing prefs.js.
Comment 26•19 years ago
Not so. It was added to my prefs programmatically. I never touched it. My junk_threshold was set at 90.
Comment 27•19 years ago
The value is set in the default mailnews.js, in line 478. The link is http://lxr.mozilla.org/seamonkey/source/mailnews/mailnews.js#478
The text in that part of the script is:
475 // the probablilty threshold over which messages are classified as junk
476 // this number is divided by 100 before it is used. The classifier can be fine tuned
477 // by changing this pref. Typical values are .99, .95, .90, .5, etc.
478 pref("mail.adaptivefilters.junk_threshold", 90);
The above lines indicate that the value is intended to be user-settable. Here's where the source code uses that value to assign a value to mJunkProbabilityThreshold:
http://lxr.mozilla.org/seamonkey/source/mailnews/extensions/bayesian-spam-filter/src/nsBayesianFilter.cpp#915
911 PRInt32 junkThreshold = 0;
912 nsresult rv;
913 nsCOMPtr<nsIPrefBranch> pPrefBranch(do_GetService(NS_PREFSERVICE_CONTRACTID, &rv));
914 if (pPrefBranch)
915 pPrefBranch->GetIntPref("mail.adaptivefilters.junk_threshold", &junkThreshold);
916
917 mJunkProbabilityThreshold = ((double) junkThreshold) / 100;
918 if (mJunkProbabilityThreshold == 0 || mJunkProbabilityThreshold >= 1)
919 mJunkProbabilityThreshold = kDefaultJunkThreshold;
920
921 PR_LOG(BayesianFilterLogModule, PR_LOG_ALWAYS, ("junk probabilty threshold: %f", mJunkProbabilityThreshold));
I don't know C++ and can't guess whether the typo in line 942 of this file affects how the module works. But I didn't find the "adaptivefilters" string showing up except in mailnews.js and this cpp file, so it's unclear how the program may use it.
This setting certainly could affect how sensitive the junk filter is to junk, however the program scores a message.
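As a reading aid for the code quoted above, here is a minimal standalone sketch of what lines 917-919 appear to do with the pref value. The names effectiveThreshold, isJunk, and kAssumedDefaultThreshold are hypothetical. The 0.99 fallback is an assumption, since the value of kDefaultJunkThreshold is not shown in this bug, and the >= comparison in isJunk is a guess at how the threshold is applied, not something taken from the source.

```cpp
#include <cassert>

// ASSUMPTION: stand-in for kDefaultJunkThreshold, whose value is not quoted here.
const double kAssumedDefaultThreshold = 0.99;

// Mirrors quoted lines 917-919: the pref is a percentage that is divided by 100,
// and a result of 0 or of 1 and above falls back to the default.
double effectiveThreshold(int prefValue) {
    double threshold = static_cast<double>(prefValue) / 100;
    if (threshold == 0 || threshold >= 1)
        threshold = kAssumedDefaultThreshold;
    return threshold;
}

// ASSUMPTION: a message is junk when its score reaches the threshold.
bool isJunk(double score, int prefValue) {
    assert(score >= 0.0 && score <= 1.0);  // score is treated as a probability
    return score >= effectiveThreshold(prefValue);
}
```

Under these assumptions the shipped pref value of 90 gives a threshold of 0.90, while 0 or 100 falls back to the default. The quoted check does not reject negative pref values, so in this sketch a negative value would pass through and mark every message as junk.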
Comment 28•19 years ago
Another possible clue re: poor filtering. I had a memory card problem a few days ago resulting in memory dropping from 256mb to 128mb. Win98SE and all the usual utilities ran, so did Mozilla, but the junk filtering became awful. When running in 128mb, almost all the memory is allocated. Today, I found that reseating the memory card fixed the problem.
Yet the filtering remained very bad. Only about 10 out of the 200 or so junk msgs were filtered, while usually most of the junk is removed.
After clearing the log file (no doubt very large) and the junk folder, I re-ran the junk mail filter and got the normal removal of junk.
These behaviors suggest a couple of things to me. 1) A silent out-of-memory condition may occur that doesn't allow training.dat to be fully loaded, just a designated buffer-full. The back end of the file containing recent blacklist data isn't used in junk filtering, it appears. 2) The system memory data may not be sufficiently dynamic. I.e, with 256mb working, Mozilla may have been thinking that only 128mb was available based on data saved from the previous session, and didn't "malloc" memory that was available for training.dat, etc.
A similar failure to use available memory might explain the bad filtering I've reported with Seamonkey.
Instead of just loading a piece of the training.dat file into a buffer and leaving it, the program should read the entire file as needed, and rewind to the start as needed, to filter all the new mail against the entire training.dat file. What's virtual memory for if not to page pieces of a file in and out of main memory as needed?
Comment 29•19 years ago
(In reply to comment #28)
>
> These behaviors suggest a couple of things to me. 1) A silent out-of-memory
> condition may occur that doesn't allow training.dat to be fully loaded, just a
> designated buffer-full. The back end of the file containing recent blacklist
> data isn't used in junk filtering, it appears. 2) The system memory data may
> not be sufficiently dynamic. I.e, with 256mb working, Mozilla may have been
> thinking that only 128mb was available based on data saved from the previous
> session, and didn't "malloc" memory that was available for training.dat, etc.
>
> A similar failure to use available memory might explain the bad filtering I've
> reported with Seamonkey.
>
> Instead of just loading a piece of the training.dat file into a buffer and
> leaving it, the program should read the entire file as needed, and rewind to
> the start as needed, to filter all the new mail against the entire training.dat
> file. What's virtual memory for if not to page pieces of a file in and out of
> main memory as needed?
There are many core and Thunderbird bugs (some of which are surely dupes of each other): bug 228675, bug 229002, and bug 236842, to name a few (plus a nondescript bug 181328).
As for the log issues mentioned, sadly it doesn't look like Bug 200594 has any suggested solutions.
Comment 30•19 years ago
With the bad filtering in Seamonkey, I can't use it for serious email, only for testing. Other users very well may have the same opinion. If the filtering isn't fixed, how can Seamonkey be any more than a hobby for serious email users?
Comment 31•19 years ago
I checked the performance of Seamonkey 1.0.1 and the version 1.1 nightly of 2006-4-16 and found that both appear to have the junk filtering problem. Here's a short list of checkins that I suspect includes the cause of the filtering problem. Many junk checkins before and since were mainly about the user interface, but these sound like they may affect training.dat file processing.
2004-03-19 13:43 bsmedberg%covad.net mozilla/lib/libi18n/net_junk.h 0/0 Bug 180555 - remove unused parts of mozilla/lib (not part of build)
2004-03-19 13:43 bsmedberg%covad.net mozilla/lib/libi18n/net_junk.c 0/0
2004-03-12 11:29 scott%scott-macgregor.org mozilla/mail/base/locale/junkMail.dtd 1.4 1/1 add a colon to the end of the reset training data string
2004-03-12 11:17 scott%scott-macgregor.org mozilla/mail/base/locale/junkMail.dtd 1.3 6/2 Bug #237151 --> Add UI to the junk control dialog for resetting the training data
2004-03-12 11:17 scott%scott-macgregor.org mozilla/mail/base/content/junkMail.xul 1.6 8/0
2004-03-12 11:17 scott%scott-macgregor.org mozilla/mail/base/content/junkMail.js 1.2 19/0
Comment 32•19 years ago
The 1.8a1 release couldn't access my mailboxes so I couldn't test it. The problem exists in the earliest 1.8a build I found and could use, Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8a2) Gecko/20040625. It filtered 24 out of 56 spams, much worse than any 1.7 build performs. As noted above, 1.8a1 had some Bayesian filtering changes, and I noted some checkins that may be related to the problem too.
Seamonkey 1.0.1 filtered 50 spams out of a batch of mail, and then 1.7.13 filtered about 70 more out, all newly arrived mail that Seamonkey missed. This is a noticeable issue that will turn people off on Seamonkey until it's fixed. It's broken. A defect was introduced very early on in going from 1.7 to 1.8a.
Comment 33•19 years ago
The junk filtering problem exists with Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a1) Gecko/20060722 SeaMonkey/1.5a. It isn't just a Windows problem. I'm running Mandrake Linux 10.1 which uses a 2.6.8 kernel if I recall correctly.
Was there any change in file and buffer handling when the 1.8 builds began that would make virtual-memory file reading and writing not work right? Someone reported a loss of most of his bookmarks using Firefox 1.5 in a Linux site, and a few other junk processing problems might be explained by incorrect virtual memory handling. One example is a bug I reported about junk processing being incomplete if I switched mail folders while the inbox was being de-junked. If not virtual memory, then maybe buffer handling that applies to both Windows and Linux?
Comment 34•19 years ago
I tried Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a1) Gecko/20060921 SeaMonkey/1.5a and found that the junk filtering problem discussed here still exists in current nightlies, is not confined to Windows 98, and is a substantial decline in performance from Mozilla 1.7x. I made this assessment using the same profile, training.dat, etc for both programs running in Mandrake Linux 10.1.
The difference: Mozilla filters about 2/3 of the junk, Seamonkey filters about 1/3 of the junk. How can anyone miss this defect and not find Seamonkey unsatisfactory? I should add that my ISP uses a non-interactive filter that junked some good mail, so I asked them to turn it off. They set the threshold to the max so I get all my spam for Mozilla (or Seamonkey) to handle. Mozilla does fine. Seamonkey mail isn't really usable.
This problem appeared in the earliest days of Mozilla 1.8. I have no idea if a threshold variable was changed, but if the problem is in the algorithm, the code in Seamonkey needs to be replaced with the code in Mozilla 1.7x.
Comment 35•19 years ago
This is a good match to bug 245168, so duping. John T, you've done much research and the effort is great, but the volume of comments is quite high. If you follow up in bug 245168, I suggest you summarize your findings of fact in 3-4 lean sentences, perhaps in areas that others haven't commented on, to help narrow the focus of the search.
Have you cleared your training data and started from scratch?
*** This bug has been marked as a duplicate of 245168 ***
Status: UNCONFIRMED → RESOLVED
Closed: 19 years ago
Resolution: --- → DUPLICATE
Summary: Junk mail controls do not work → Junk filtering in Seamonkey misses a lot of junk