Closed Bug 194238 Opened 22 years ago Closed 22 years ago

junk mail controls won't analyze a message as junk until you mark a message as "not junk"

Categories

(SeaMonkey :: MailNews: Message Display, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: VERIFIED FIXED
Target Milestone: mozilla1.4alpha

People

(Reporter: sspitzer, Assigned: sspitzer)

References

Details

(Keywords: regression, Whiteboard: [adt2])

Attachments

(1 file, 1 obsolete file)

junk mail controls stopped working for mac os x around 2/17? more details coming...
Hardware: PC → Macintosh
I trained the build from 02/17 all week. No messages were ever filtered to the junk folder as I had set up. esther was seeing a problem where turning on the option to delete messages after 1 day caused filtering to not work. I had also set up my account to delete junk-marked messages older than 2 days. I've since turned that off and am waiting for incoming junk. Note: going back to M1.3b (ns) works.
morphing the bug based on comments from twalker and esther. From twalker: "I think the delete messages issue that esther pointed out is the problem. I turned it off... and an incoming message was filtered to the junk folder." From esther: "I think I noticed it stopped working on my WinXP system yesterday, on a profile with a very well trained JMC (Junk Mail Control), when I selected the option to delete junk messages after 1 day. I have since disabled that option, and this morning I saw the JMC work again on that profile. Still investigating."
OS: MacOS X → All
Hardware: Macintosh → All
Summary: junk mail controls stopped working for mac os x → enabling delete junk messages after [n] days causes junk filter to stop working.
Target Milestone: --- → mozilla1.4alpha
I'm sorry, but after more investigation on my profile mentioned in comment 2, I am not seeing the problem when I enable the automatic delete option again with the same profile and the same build. So this option is working OK on my WinXP system.
morphing this bug back, based on comments from esther.
Keywords: regression
OS: All → MacOS X
Summary: enabling delete junk messages after [n] days causes junk filter to stop working. → junk mail controls not work on OS X
I can no longer reproduce this bug. I've been running the build from 2003-02-20; it is learning and filtering fine (for this early stage in its learning curve) with "delete junk mail after (n) days" on or off. Has anyone else besides esther and me seen this bug?
I think I found the problem. As mentioned before, my newly created POP and IMAP profiles on Mac OS X were not evaluating incoming junk mail either (the training file was being built and was growing in size as I marked more messages as junk). In both cases I found that after I marked a message as Not Junk, the evaluation and move started working. This is why those who have been using their profiles and JMC on the Mac for a while did not run into this problem. Tracy, on the other hand, added a new profile to his Mac with no training file, so messages were not being marked as junk and he had no reason to mark any as Not Junk. Not sure if this is cross-platform, or if it's a regression, since we all have had junk mail working for a while.
morphing, based on comments from esther. esther, does this happen on win32 or linux?
Summary: junk mail controls not work on OS X → junk mail controls not work on OS X until you mark a message as "not junk"
Yes, it happens on WinXP and Linux. Not sure if this is a regression; we may not have realized it happened in our early testing because we weren't as familiar with the feature as we are now. We may have expected that the JMC wasn't robust enough to catch the junk, so we were marking messages ourselves. Once a user mistakenly marked a message as junk, they would unmark it, and from that point on JMC worked.
Note: The envelope feature should make it so the user marks messages as Not Junk right from the start so they probably won't run into this.
Flags: blocking1.3?
Bug 188232 may be the first report of this bug, which goes back to the 12-12-2002 build.
OS: MacOS X → All
Summary: junk mail controls not work on OS X until you mark a message as "not junk" → junk mail controls not work until you mark a message as "not junk"
accepting. I hope to fix this soon.
Status: NEW → ASSIGNED
Keywords: nsbeta1
Flags: blocking1.3? → blocking1.3-
Mail triage team: nsbeta1+/adt2
Keywords: nsbeta1 → nsbeta1+
Whiteboard: [adt2]
If the number of good tokens is 0, there's a possible division by 0 in mailnews/extensions/bayesian-spam-filter/src/nsBayesianFilter.cpp near line #625, in the function void nsBayesianFilter::classifyMessage(Tokenizer& tokenizer, const char* messageURI, nsIJunkMailClassificationListener* listener). I'm not sure what 'min (1, g / ngood)' returns when ngood = 0, so this may not be that important.
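For illustration, the quoted expression resembles the per-token probability from Paul Graham's "A Plan for Spam", where g and b are a token's occurrence counts in good and junk training mail and ngood/nbad are the numbers of trained messages (whether Mozilla's ngood counts messages or tokens is not settled in this thread). The sketch below is a hypothetical stand-alone version with an explicit guard for ngood = 0 and nbad = 0; the function name, the neutral value 0.5, and the omission of Graham's doubling of g and his 0.01/0.99 clamp are assumptions, not taken from nsBayesianFilter.cpp.

```cpp
#include <algorithm>

// Hypothetical helper, not the actual nsBayesianFilter code.
// g, b:        occurrences of the token in good / junk training messages
// nGood, nBad: number of messages trained as good / junk
double tokenJunkProbability(double g, double b, double nGood, double nBad) {
    // Only divide when the denominator is non-zero; a class with no
    // training messages contributes a frequency of 0.
    const double goodFreq = nGood > 0 ? std::min(1.0, g / nGood) : 0.0;
    const double badFreq  = nBad  > 0 ? std::min(1.0, b / nBad)  : 0.0;

    // Neither class provides any evidence for this token: return a
    // neutral probability instead of computing 0 / 0.
    if (goodFreq + badFreq == 0.0) {
        return 0.5;
    }
    return badFreq / (goodFreq + badFreq);
}
```

For example, with nGood = 0 a token that has appeared in junk gets probability 1.0, and a token seen in neither class gets 0.5.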
Not sure if this is the right bug, but... I had junk mail controls working great on Win2k and WinXP installs of 1.3b against the same IMAP account. Both installs had been trained pretty well and were working with high accuracy. About a week ago, the Win2k install of 1.3b stopped marking incoming junk mail (previously it had near 100% accuracy). The WinXP install continued marking incoming mail as spam for about 4 days longer than the Win2k machine, until two days ago a friend sent me a false positive (it had "sexy" in the subject). I marked it as not junk, and from that point on the WinXP install failed to classify junk at all. Upgrading both installs to 1.3 final hasn't fixed the problem. After reading comments in Bugzilla, I've marked a few e-mails as "Not Junk" and will update this bug if that fixes the problem. Apologies for having few specifics on how the Win2k install broke; let me know if anyone wants more details about my setup.
Attachment #117370 - Attachment is obsolete: true
fixed. the patch has r/sr=bienvenu. this should improve the initial experience, and encourage users to train both good and bad, which is important for the algorithm to work.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
*** Bug 188232 has been marked as a duplicate of this bug. ***
I'm experiencing symptoms from this bug, but have marked many messages as non-junk. Specifically, the symptom I'm seeing is that if I run the junkmail controls on a spam message, moz marks it as non-junk. If I mark it as junk manually, and rerun the control, it still marks it as non-junk. Is there a tool to analyze the training.dat file? I'd like to provide more feedback but I don't know how to analyze the training.dat. I've trained mozilla on all my spam and non-spam since 2000 (yes, I even keep all my spam :). The training.dat file is about 1.6Mb. There are a zillion words in the file, but how can I recognize which tokens it considers good and which ones it considers bad?
Summary: junk mail controls not work until you mark a message as "not junk" → junk mail controls won't analyze a message as junk until you mark a message as "not junk"
This seems to be a pretty important issue in making JMC work perfectly! Now, there seems to be a 1.3.1 release in the works (bug 197105 and bug 185169), so how about seeing if it's possible to get this into the 1.3.1 build too?
FYI - marking messages as non-junk fixed my problem reported in comment #14 on the WinXP box. Haven't tested the Win2k box but assume that it will fix the problem there, too.
here's how the code works now, for a first-time user of this feature:
1) no training data: all incoming mail will be determined to be junk (except for whitelisting).
2) user marks a message as not junk: now the training data has only "good" tokens.
3) all incoming messages will be determined to be not junk.
4) user marks a message as junk: now the training data has both "good" and "bad" tokens.
5) all incoming messages are properly analyzed (except for whitelisting).
this is as desired, since it forces users to train both junk and not junk (which is needed for the JMC to work properly).
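The five steps above amount to a small decision rule applied before the Bayesian analysis. The sketch below is a hypothetical restatement of that rule, not the code from the patch: the names Verdict and initialVerdict are invented, and the case where only junk has been trained (nGood == 0, nBad > 0) is not covered by the steps, so treating it like step 1 is an assumption.

```cpp
// Hypothetical outcome of the pre-check; Analyze means "run the full
// Bayesian classification on the message's tokens".
enum class Verdict { Junk, NotJunk, Analyze };

// nGood / nBad: number of messages the user has marked "not junk" / "junk".
// whitelisted:  the sender passed the whitelist check.
Verdict initialVerdict(unsigned nGood, unsigned nBad, bool whitelisted) {
    if (whitelisted) {
        return Verdict::NotJunk;   // whitelisting overrides the training state
    }
    if (nGood == 0) {
        return Verdict::Junk;      // step 1: no "good" data yet (nBad > 0 assumed to behave the same)
    }
    if (nBad == 0) {
        return Verdict::NotJunk;   // step 3: only "good" tokens trained
    }
    return Verdict::Analyze;       // step 5: both kinds trained, analyze normally
}
```

Ordering the checks this way means the filter only starts making real judgments once it has both good and junk examples, which is the behaviour the comment describes as desired.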
Using trunk build 20030318 on WinXP and Linux, this is fixed as described in comment #22. Using trunk build 20030324 on Mac OS X, this is also fixed per comment 22. Verified.
Status: RESOLVED → VERIFIED
*** Bug 198762 has been marked as a duplicate of this bug. ***
*** Bug 197801 has been marked as a duplicate of this bug. ***
I am running 2003052908 (1.4b). This bug still appears to be rearing its ugly head in one form. I have no whitelist that I am aware of, and no address book entries. If I delete training.dat and then run the JMC on my 7200-message non-junk corpus, it marks about 99% of them as junk. The ones not marked as junk have no common features except that they really aren't junk. I have been told this is not proper, and that is backed up by previous comments: it should mark them all as junk, but doesn't, indicating a possible problem related to this bug.

And, slightly off topic (not really commenting on the bug, more using this as a message board to get an answer from people who know how the JMC works): in a proper Bayesian filter you can start with a corpus of already sorted mail, junk and non-junk, and create a dictionary of tokens ('words') from their occurrences in each half of the corpus, which can then be used as a starting point for filtering. Is there any way to do this in Mozilla? So far the best results I have had, after multiple failed training attempts, came from marking each half INCORRECTLY, then deleting training.dat, then marking them correctly, thus having artificially 'received' and marked all the mails into their proper category and theoretically producing a proper starting dictionary.

However, my results after using this method, as measured by false positives and false negatives on new incoming mail, are FAR below those I would expect and have seen with other Bayesian filters (including one I wrote myself for IRC). My initial corpus consists of 7200 non-junk emails and 1200 junk emails, which I am aware is slightly imbalanced. Out of the 87 emails I have received since I finished the training, Mozilla has gotten 39 proper junk positives, 11 false negatives (junk that didn't get marked), 31 proper negatives, and 6 false positives (non-junk marked as junk).
This is far worse than I have learned to expect from a Bayesian filter with this level of training: 17 of the 87 messages were misclassified, whereas my expectations are more along the lines of 50, 0, 36 and 1 respectively. If anyone could shed some light on this I would be grateful.
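To put the reported figures in the usual terms, the sketch below computes accuracy, junk recall and false-positive rate from a confusion matrix. The struct and function names are hypothetical; the counts are the ones given in the comment (39 junk caught, 11 junk missed, 31 good kept, 6 good flagged), alongside the expected 50/0/36/1.

```cpp
// Counts from a junk-filter evaluation (hypothetical helper type).
struct Confusion {
    int junkCaught;   // junk marked as junk (true positives)
    int junkMissed;   // junk not marked (false negatives)
    int goodKept;     // non-junk left alone (true negatives)
    int goodFlagged;  // non-junk marked as junk (false positives)
};

int total(const Confusion& c) {
    return c.junkCaught + c.junkMissed + c.goodKept + c.goodFlagged;
}

// Fraction of all messages classified correctly.
double accuracy(const Confusion& c) {
    return static_cast<double>(c.junkCaught + c.goodKept) / total(c);
}

// Fraction of junk that was caught.
double junkRecall(const Confusion& c) {
    return static_cast<double>(c.junkCaught) / (c.junkCaught + c.junkMissed);
}

// Fraction of non-junk wrongly marked as junk.
double falsePositiveRate(const Confusion& c) {
    return static_cast<double>(c.goodFlagged) / (c.goodFlagged + c.goodKept);
}
```

For the observed counts this gives 70/87 (about 80%) accuracy, 39/50 = 78% junk recall and 6/37 (about 16%) false positives; the expected counts would give 86/87 (about 99%), 100% and 1/37 (about 2.7%).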
Product: Browser → Seamonkey