194238 - junk mail controls won't analyze a message as junk until you mark a message as "not junk"

(not reading, please use seth@sspitzer.org instead)

Assignee

Description

•

22 years ago

junk mail controls stopped working for mac os x around 2/17? more details coming...

sairuh (rarely reading bugmail)

Updated

•

22 years ago

Hardware: PC → Macintosh

Tracy Walker [:tracy]

Comment 1

•

22 years ago

I trained the build from 02/17 all week. No messages were ever filtered to the junk folder as I had set up. esther was seeing a problem where turning on delete message after _1__ days caused filtering to not work. I had also set up my account to delete 2-day-old messages marked as junk. I've since turned that off and am waiting for incoming junk. Note: going back to M1.3b (ns) works.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 2

•

22 years ago

morphing the bug based on comments from twalker and esther:
from twalker: "I think the delete messages issue that esther pointed out is the problem. I turned it off... and an incoming message was filtered to the junk folder."
from esther: "I think I noticed it stopped working on my Winxp system yesterday, on a Profile with a very well trained JMC (Junk Mail Control), when I selected the delete junk messages after 1 day option. I have since disabled that option and this morning I saw the JMC work again on that profile. Still investigating."

OS: MacOS X → All

Hardware: Macintosh → All

Summary: junk mail controls stopped working for mac os x → enabling delete junk messages after [n] days causes junk filter to stop working.

Target Milestone: --- → mozilla1.4alpha

esther

Comment 3

•

22 years ago

I'm sorry but after more investigation on my Profile mentioned in comment 2, I am not seeing the problem when I enable the automatic delete option again using the same profile and same build. So this option is working OK on my winxp system.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 4

•

22 years ago

morphing this bug back, based on comments from esther.

Keywords: regression

OS: All → MacOS X

Summary: enabling delete junk messages after [n] days causes junk filter to stop working. → junk mail controls not work on OS X

Tracy Walker [:tracy]

Comment 5

•

22 years ago

I can no longer reproduce this bug. I've been running the build from 2003-02-20. It is learning and filtering fine (for this early stage in its learning curve) with the delete junkmail after (n) days option on or off. Has anyone else besides esther and me seen this bug?

esther

Comment 6

•

22 years ago

I think I found the problem. As mentioned before, my newly created POP and IMAP profiles on Mac OS X were not evaluating incoming junk mail either (the training file was being built and growing in size as I marked more messages as junk). I found in both cases that after I marked a message Not Junk, the evaluation and move started working. This is why those who have been using existing profiles and JMC on Mac did not run into this problem. Tracy added this as a new profile to his Mac system with no training file, so messages were not getting marked as Junk and he had no need to mark any as Not Junk. Not sure if this is across platforms or if it's a regression, since we all have had Junk mail working for a while.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 7

•

22 years ago

morphing, based on comments from esther. esther, does this happen on win32 or linux?

Summary: junk mail controls not work on OS X → junk mail controls not work on OS X until you mark a message as "not junk"

esther

Comment 8

•

22 years ago

Yes, it happens on winxp and linux. Not sure if this is a regression; we may not have realized this happened in our early testing because we weren't as familiar with the feature as we are now. Our expectation may have been that the JMC wasn't robust enough to catch the junk, so we were marking messages ourselves. Once a user mistakenly marks a message as junk and then unmarks it, JMC works from that point on.

esther

Comment 9

•

22 years ago

Note: The envelope feature should make it so the user marks messages as Not Junk right from the start so they probably won't run into this.

Asa Dotzler [:asa]

Updated

•

22 years ago

Flags: blocking1.3?

esther

Comment 10

•

22 years ago

Bug 188232 may be the first report of this bug which goes back to the 12-12-2002 build.

hacker formerly known as seawood@netscape.com

Updated

•

22 years ago

OS: MacOS X → All

Summary: junk mail controls not work on OS X until you mark a message as "not junk" → junk mail controls not work until you mark a message as "not junk"

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 11

•

22 years ago

accepting. I hope to fix this soon.

Status: NEW → ASSIGNED

Keywords: nsbeta1

Asa Dotzler [:asa]

Updated

•

22 years ago

Flags: blocking1.3? → blocking1.3-

Samir Gehani

Comment 12

•

22 years ago

Mail triage team: nsbeta1+/adt2

Keywords: nsbeta1 → nsbeta1+

Whiteboard: [adt2]

mozilla.gv6r

Comment 13

•

22 years ago

If the number of good tokens is 0, there's a possible division by 0 in mailnews/extensions/bayesian-spam-filter/src/nsBayesianFilter.cpp near line 625, in the function
void nsBayesianFilter::classifyMessage(Tokenizer& tokenizer, const char* messageURI, nsIJunkMailClassificationListener* listener)
I'm not sure what 'min (1, g / ngood)' returns when ngood=0, so this may not be that important.
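A minimal sketch of the concern raised above, assuming the counts are held as doubles and that the per-token probability follows a simplified Graham-style ratio. The helper names cappedRatio and tokenJunkProbability are hypothetical and are not taken from nsBayesianFilter.cpp. On IEEE 754 platforms, a double division such as 0.0 / 0.0 yields NaN rather than trapping (an integer division by zero would be undefined behavior), so one defensive option is to never evaluate the division when nothing of that class has been trained:

```cpp
#include <algorithm>

// Hypothetical helper, not the Mozilla source: fraction of trained messages
// of one class that contain a token, capped at 1.
// Returns 0 when no message of that class has been trained, so the
// division is never evaluated with total == 0.
double cappedRatio(double count, double total) {
    if (total <= 0.0) {
        return 0.0;
    }
    return std::min(1.0, count / total);
}

// Simplified Graham-style per-token junk probability built on the guarded
// ratio. g and b are the token's counts in good and bad mail; ngood and
// nbad are the numbers of good and bad messages trained.
// Returns 0.5 (neutral) when the token carries no evidence either way.
double tokenJunkProbability(double g, double b, double ngood, double nbad) {
    const double goodRatio = cappedRatio(g, ngood);
    const double badRatio = cappedRatio(b, nbad);
    const double denom = goodRatio + badRatio;
    if (denom == 0.0) {
        return 0.5;
    }
    return badRatio / denom;
}
```

With this guard, a token seen only in junk while ngood is 0 scores as fully junk, which illustrates why a filter trained on only one class cannot discriminate.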

Ben Galbraith

Comment 14

•

22 years ago

Not sure if this is the right bug, but... I had junk mail controls working great on Win2k and WinXP installs of 1.3b against the same IMAP account. Both installs had been trained pretty well and were working with high levels of accuracy. Sometime about a week ago, the Win2k install of 1.3b stopped marking incoming junk mail (previously it had near 100% accuracy). The WinXP install continued marking incoming mail as spam for about 4 days longer than the Win2k machine, until two days ago, when a friend sent me a message that was a false positive (it had "sexy" in the subject). I marked it as not junk, and from that point on, the WinXP install failed to classify junk at all. Upgrading both installs to 1.3final hasn't fixed the problem. After reading comments in Bugzilla, I've marked a few e-mails as "Not Junk" and will update comments if that fixes the problem. Apologies for having few specifics on the Win2k install breaking; let me know if anyone wants more specifics about my setup.

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 15

•

22 years ago

Attached patch: patch, testing this now... (obsolete)

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 16

•

22 years ago

Attached patch: a patch the works

Attachment #117370 - Attachment is obsolete: true

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 17

•

22 years ago

fixed. the patch has r/sr=bienvenu. this should improve the initial experience, and encourage users to train both good and bad, which is important for the algorithm to work.

Status: ASSIGNED → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

(not reading, please use seth@sspitzer.org instead)

Assignee

Comment 18

•

22 years ago

*** Bug 188232 has been marked as a duplicate of this bug. ***

Ivo Jansch

Comment 19

•

22 years ago

I'm experiencing symptoms from this bug, but have marked many messages as non-junk. Specifically, the symptom I'm seeing is that if I run the junkmail controls on a spam message, moz marks it as non-junk. If I mark it as junk manually, and rerun the control, it still marks it as non-junk. Is there a tool to analyze the training.dat file? I'd like to provide more feedback but I don't know how to analyze the training.dat. I've trained mozilla on all my spam and non-spam since 2000 (yes, I even keep all my spam :). The training.dat file is about 1.6Mb. There are a zillion words in the file, but how can I recognize which tokens it considers good and which ones it considers bad?

(not reading, please use seth@sspitzer.org instead)

Assignee

Updated

•

22 years ago

Summary: junk mail controls not work until you mark a message as "not junk" → junk mail controls won't analyze a message as junk until you mark a message as "not junk"

Bjarne D Mathiesen

Comment 20

•

22 years ago

This seems to be a pretty important issue in making JMC work perfectly! Now, there seems to be a 1.3.1 release in the works (bug 197105 and bug 185169), so how about seeing if it's possible to get this into the 1.3.1 build too?

Ben Galbraith

Comment 21

•

22 years ago

FYI - marking messages as non-junk fixed my problem reported in comment #14 on the WinXP box. Haven't tested the Win2k box but assume that it will fix the problem there, too.

esther

Comment 22

•

22 years ago

here's how the code works now, for a first time user of this feature:
1) no training data, all incoming will be determined to be junk (except for whitelisting).
2) user marks message as not junk, now training data has only "good" tokens.
3) all incoming messages will be determined to be not junk.
4) user marks message as junk, now training data has both "good" and "bad" tokens.
5) all incoming messages properly analyzed (except for whitelisting).
this is as desired, since it forces users to train both junk and not junk (which is needed for the JMC to work properly.)
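The steps above can be sketched as a small decision function that runs before any Bayesian scoring. This is only an illustration under stated assumptions: the types JunkStatus and TrainingCounts and the function preClassify are hypothetical names, not the checked-in patch, and treating a whitelisted sender as always "good" is an assumption based on the "except for whitelisting" remarks.

```cpp
#include <cstdint>

// Outcome of the pre-check; NeedsScoring means "run the real analysis".
enum class JunkStatus { Good, Junk, NeedsScoring };

// Hypothetical summary of the training data: how many messages the user
// has marked as not junk and as junk so far.
struct TrainingCounts {
    std::uint32_t goodMessages;
    std::uint32_t badMessages;
};

// Decide what to do with an incoming message based only on how much
// training exists, following the five steps described in the comment.
JunkStatus preClassify(const TrainingCounts& t, bool senderWhitelisted) {
    if (senderWhitelisted) {
        return JunkStatus::Good;          // assumption: whitelisting wins
    }
    if (t.goodMessages == 0) {
        return JunkStatus::Junk;          // step 1: no "good" tokens yet
    }
    if (t.badMessages == 0) {
        return JunkStatus::Good;          // steps 2-3: only "good" tokens
    }
    return JunkStatus::NeedsScoring;      // steps 4-5: both classes trained
}
```

A first-time user therefore sees everything flagged until one message is marked as not junk, then nothing flagged until one message is marked as junk, after which normal scoring applies.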

esther

Comment 23

•

22 years ago

Using trunk builds 20030318 on winxp and linux this is fixed per how it should work in comment #22. Using trunk build 20030324 on mac osx this is fixed per comment 22 also. Verified.

Status: RESOLVED → VERIFIED

Andreas Höfler

Comment 24

•

22 years ago

*** Bug 198762 has been marked as a duplicate of this bug. ***

Robert Pollak

Comment 25

•

22 years ago

*** Bug 197801 has been marked as a duplicate of this bug. ***

Clarence Risher

Comment 26

•

22 years ago

I am running 2003052908 (1.4b). This bug appears to still be rearing its ugly head in one form. I have no whitelist that I am aware of, and no address book entries. If I delete training.dat and then run the JMC on my 7200-message nonjunk corpus, it marks about 99% of them as junk. The ones not marked as junk have no common features except that they really aren't junk. I have been told this is not proper, and that is backed up by previous comments. It should mark them all as junk, but doesn't, indicating a possible problem related to this bug.
And, slightly off topic, not really commenting on the bug, more using this as a message board to get an answer from people who know how the JMC works: in a proper bayesian filter you can start with a corpus of already sorted mail, junk and non-junk, and create a dictionary of tokens ('words') from their occurrences in each half of the corpus, which can then be used as a starting point for filtering. Is there any way to do this in Mozilla? So far the best results I have had, after multiple failed training attempts, come from marking each half INCORRECTLY, then deleting the training.dat, then marking them correctly, thus having artificially 'received' and marked all the mails into their proper category and theoretically producing a proper starting dictionary. However, my results after using this method, as measured by false positive and false negative results on new incoming mail, are FAR below those I would expect and have seen with other bayesian filters (including one I wrote myself for IRC).
My initial corpus is comprised of 7200 non-junk emails and 1200 junk emails, which I am aware is slightly imbalanced. Out of the 87 emails I have received since I finished the training, Mozilla has gotten 39 proper junk positives, 11 false negatives (junk that didn't get marked), 31 proper negatives, and 6 false positives (nonjunk marked as junk).
This is far worse than I have learned to expect from a bayesian filter with this level of training, my expectations being more along the lines of 50 0 36 1 respectively. If anyone could shed some light on this I would be grateful.

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: Browser → Seamonkey

Attachments:
patch, testing this now... (22 years ago, from "not reading, please use seth@sspitzer.org instead", 1.40 KB, patch)
a patch the works (22 years ago, from "not reading, please use seth@sspitzer.org instead", 1.45 KB, patch)