Closed Bug 209074 Opened 22 years ago Closed 20 years ago

Junk Mail Controls fail on spam using multipart/alternative but lacking plain text part

Categories

(MailNews Core :: Filters, defect)

x86
Windows 98
defect
Not set
major

Tracking

(Not tracked)

RESOLVED EXPIRED

People

(Reporter: bah_pop3, Assigned: sspitzer)

Details

Attachments

(7 files)

User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.5a) Gecko/20030605
Build Identifier: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.5a) Gecko/20030605

Junk Mail Controls have failed massively for me since late April/early May. I'm not entirely sure why, and hence the info I'm giving here may be completely misdirected. Nevertheless, there are several people who have reported similar kinds of failings with JMC.

A look at six recently received spam messages reveals that all have the plain text portion excised, altho' they claim to be multipart/alternative. Several insert 'garbage' following </html> and before the ending boundary delimiter. Several also insert commented-out random words in the body of the text, presumably to subvert Bayesian filtering.

While I realise there may be up to three distinct bugs here, I have chosen to report them omnibus; separate bugs can be spun off if and as necessary. (Again, I have no proof that what I'm reporting is in fact what's causing the behaviour.)

I will create attachments giving the bodies of these half-dozen spam messages. I will likely just replace '@' with '%' in the spam messages, for whatever slight preventive help it might provide.

A quick look at an additional 55 spam showed that 43 claimed to be in multipart/alternative but had no plain text part (a couple even lacked the HTML part!), 23 had 'garbage' after </html>, and 16 had <!-- random words --> (only two made use of both techniques). There were several messages that used base64. So it is not always reproducible.

(BTW, there is no actual component for JMC, so I've used Mail Window Front End.)

Reproducible: Sometimes

Steps to Reproduce:
1. Enable Junk Mail Controls;
2. wait for spam to arrive;
3. watch as JMC fail on most messages.

Actual Results: I had manually to mark the messages as spam (at which point they were moved to the Junk folder).

Expected Results: JMC should have marked the messages as spam and moved them to the Junk folder.
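The three traits described in the report (a multipart/alternative message with no text/plain part, text after the closing </html>, and HTML comments in the body) can be checked mechanically. The following is a minimal sketch using Python's standard `email` module, not a description of how JMC itself inspects mail. The function name `spam_traits`, the helper `_text`, the trait names, and the sample message are all hypothetical and made up for illustration.

```python
import re
from email import message_from_string

HTML_END = re.compile(r"</html\s*>", re.IGNORECASE)
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def _text(part):
    """Decode a leaf MIME part to text (also undoes base64/quoted-printable)."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "latin-1"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset label
        return payload.decode("latin-1")


def spam_traits(raw_message):
    """Report which of the three traits from the bug report a message shows."""
    msg = message_from_string(raw_message)
    leaves = [p for p in msg.walk() if not p.is_multipart()]
    types = [p.get_content_type() for p in leaves]
    html_bodies = [_text(p) for p in leaves if p.get_content_type() == "text/html"]

    def has_trailing_garbage(body):
        # Anything other than whitespace after the last </html> counts.
        ends = list(HTML_END.finditer(body))
        return bool(ends) and bool(body[ends[-1].end():].strip())

    return {
        "missing_plain": msg.get_content_type() == "multipart/alternative"
        and "text/plain" not in types,
        "trailing_garbage": any(has_trailing_garbage(b) for b in html_bodies),
        "hidden_comments": any(HTML_COMMENT.search(b) for b in html_bodies),
    }


# Hypothetical sample, not one of the attached messages.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "\n"
    "<html><body>Buy <!-- walrus pencil -->now</body></html>\n"
    "qwerty zxcvb\n"
    "--XYZ--\n"
)

print(spam_traits(SAMPLE))
```

Running a check like this over a saved folder of spam would reproduce the kind of tally given above (how many messages show each trait), though it says nothing about whether those traits are what defeats the filter.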
Will create attachments of recently received spam that escaped JMC. I am suggesting a severity of Major on the basis that this problem seems to exist in the 1.4 branch builds as well as the trunk builds I've been using; it would be unfortunate were 1.4 final to go out the door with this feature -- a pretty good selling point -- broken. NOTE: The account that is primarily affected is an IMAP account, and I have set JMC to move items manually marked as junk to the Junk folder. This /did/ work for some time; the problems began shortly after I enabled the latter setting, but did work for a while (a few days? a week or two? -- I'm afraid I don't recall) before failing.
BTW, let me just add that, having gone thro' all of the hassle to file the bug and create the attachments, I'm well aware that this might just prove to be a big wild goose chase. However, since the JMC aren't catching the spam, the obvious first thing to look at would be whether there is something in the messages themselves that thwarts JMC before proceeding to specifics of the set-up that I and others experiencing this problem might share in common (/e.g./, IMAP, setting JMC to move manually marked spam to the Junk folder, no auto-delete). The other alternative would seem to be corruption in training.dat itself, but a recent message on one of the n.m.u.* newsgroups suggests that deleting the existing training.dat does not correct the problem. Note that this bug does not seem to have anything directly to do with training.dat 'forgetting' about spam upon deletion: I purge my spam every week or so, with no noticeable change in behaviour.
Is this a bug at all? Since the Junk Mail Filter has to be 'taught' what is and what isn't spam, there is no guarantee that any spam will be detected as such from the get-go. It could be that your specific JMC wasn't taught that such messages were spam. By marking them as junk, you have taught the program. It is difficult to judge, even with the attachments, whether the contents are relevant to whether or not JMC would have picked them up in any case. Since, as you note, it is NOT always repeatable, it may simply have been that your specific JMC had not encountered such types before, and therefore was not detecting them as spam. It should in future, if you marked them as Junk.
Brian,

Using 20030610, a 1.4 branch build:

1.) I saved all of your attached messages to my system and put them in a Local folder in an existing profile that has my IMAP account set up.
2.) I then created a new profile, added an IMAP account, and retrieved messages (note I now have a new profile with a new JMC training file).
3.) I trained the Junk control for a couple of hours until I was satisfied it was doing the right thing, then exited this profile.
4.) I then went into the other profile, selected the folder with your 7 messages (with them in an unread state), selected all of the messages, copied them to my IMAP Inbox, and exited the profile.
5.) I launched the new profile with the new training file and marked all 7 of the messages as Junk (they moved to the IMAP Junk folder as I had expected, since I set all the JMC preferences to ON).
6.) I exited and repeated steps 4 & 5. Result: 5 of the 7 were analyzed as Junk and moved (as expected).
7.) I repeated steps 4 & 5 again. Result: all 7 of the messages were analyzed as Junk and moved.

For me, in a new profile, these messages were correctly analyzed as Junk when expected. You may have a different problem. If possible, could you try this test?
It would seem to be one of two things: /either/ a bug /or/ some kind of corruption w/i training.dat. I should perhaps clarify: JMC had been working pretty much flawlessly for me from the time of their introduction till late April/early May; they were correctly pegging what I would estimate to be > 90 per cent. of spam (with no false positives). Then, suddenly, they stopped working. I've not seen a message correctly identified as spam on this particular account in weeks, and have seen very few messages (perhaps a handful) correctly identified as spam at all (over several accounts); I've even had the occasional false positive. (Unfortunately, I do not know precisely when this failure occurred, which makes matters more difficult. :-( ) Training.dat currently sits at 513 kB. It's not like I'm some clueless n00b in dealing with this, y'know? :-\

While you are correct in saying there is nothing to guarantee that JMC had encountered these files before, there are instances in which I have had to mark an e-mail as junk even tho' an identical e-mail is in the Junk folder marked as junk (http://bugzilla.mozilla.org/attachment.cgi?id=125438&action=view). Further, the fact that this failure is so massive is in and of itself suspicious. I don't as a rule pore over my spam, but I have seen enough spam to realise that -- excepting the <!-- random words --> -- there would appear to be nothing here that I've not seen before. I do realise that <!-- random words --> are part of the problem here (and I'd never seen them before till recently), but they only affected ~1/3 of spam messages.

Again, as I pointed out in comment #8, part of the rationale for filing this bug is to try to pin-point where the failure is coming about; it may not in fact be related to the nature of the spam itself -- but at the very least that needs to be eliminated as a possible cause.

/b.
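To illustrate why <!-- random words --> might matter to a word-based filter, here is a minimal sketch of how the token set for one HTML body changes depending on whether comments are discarded before tokenizing. Nothing in this report establishes how JMC actually tokenizes HTML; the `tokens` function, its regular expressions, and the sample body are assumptions made up for this example.

```python
import re

HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
TAG = re.compile(r"<[^>]+>")
WORD = re.compile(r"[a-z']+")


def tokens(html, strip_comments):
    """Return the set of lowercase word tokens a naive tokenizer would see."""
    if strip_comments:
        # Drop each comment together with the words inside it.
        html = HTML_COMMENT.sub(" ", html)
    else:
        # Keep the commented-out words; remove only the comment delimiters.
        html = html.replace("<!--", " ").replace("-->", " ")
    text = TAG.sub(" ", html)
    return set(WORD.findall(text.lower()))


# Hypothetical body, not taken from the attachments.
BODY = "<p>Cheap pills <!-- walrus pencil --> today</p>"

visible = tokens(BODY, strip_comments=True)
everything = tokens(BODY, strip_comments=False)
print(sorted(everything - visible))  # words that exist only inside comments
```

If a filter counts the commented-out words, each message contributes tokens the reader never sees, which is one plausible way such padding could dilute a spam score. Whether that is what happened here is exactly what this bug leaves open.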
In re. comment #9: Esther: Thanks for the clarification. I will try to create a new profile in the next day or two (time permitting :-( ) and look at it again. Assuming that the problem isn't inherent to the spam I attached and is unrelated to the use of IMAP and moving manually marked messages to the Junk folder, then the problem would seem to involve training.dat itself. If so, it would appear from various comments regarding problems with JMC that have been made on n.m.u.* and n.p.m.* that training.dat seems rather too susceptible to some kind of corruption. I will post back once I have gone thro' the whole profile creation rigmarole (I've got it down to an art, I swear; I can do it in 45 minutes or less, and most of that time is spent setting up my various e-mail/news acc'ts again); I'll keep the spam I have not yet deleted to hand in order to use it to train (excepting (if I remember to do so) the messages that I've posted here).
I submit that this entire 'bug' has served only to waste time. Brian is now claiming elsewhere that Junk Mail has quit working entirely, completely! Yet he posts this bug with specific details and examples that, after many hours of work on several people's parts, are proved to be false. In fact his entire bug should have been "Junk Mail is no longer working", PERIOD (for him at least). Yet he carries on in this bug report as if it was something in the spam messages themselves. And now offers his conclusion that it has to be something in the training.dat that 'suddenly broke'.
In re. comment #13: I'm not sure from whence this sudden antagonism derives. I will, as Esther has suggested, build a new profile tomorrow and see if that doesn't take care of the matter.

I did say that one of three things could cause the behaviour I am experiencing:

1) the nature of the spam itself (an obvious first conclusion and the basis on which this bug was filed); Esther's experience suggests that this is not the case;
2) something in the nature of how I deal with e-mail; again, Esther's experience suggests that this is not the case;
3) something in training.dat or otherwise profile related.

If, after having built a new profile, spam recognition returns roughly to what it was before I encountered this issue (and it was recognising most of the spam I received, with no false positives, from the time JMC were implemented late last year till late April/early May), then it would seem that the third point would be what would need to be looked into. If this is the case, this bug would, I suppose, then have to be INVALIDed and another bug spun off from it.

It is primarily my time that is being spent on this issue (filing the bug and collecting whatever data I could on the basis of spam messages available to me; I will spend time in the morning building a new profile); there is a reason why I very rarely report bugs. I am spending/wasting this time because I believe there is a problem with JMC. I am grateful for the time Esther has spent to try to eliminate possible causes of what I am seeing.

That said, were I the only one experiencing these problems, the bug might well be a waste of time for all concerned -- including myself. However, there are others who have experienced problems with JMC. This is troubling, since they are potentially a killer feature in the way that tabs were when 1.0 came out. If there is a problem here, it needs to be traced down and dealt with.
As for the rest, see comment #8: I was aware from the outset that the cause of what I was seeing might not be inherent in the spam itself.
Brian Heinrich, is this bug still a problem for you? Of the junk in my Junk and Trash folders, only a small amount is multipart/alternative at all; all have a text/plain part, altho that is empty in some and filled with decoy words in others.
Component: Mail Window Front End → Filters
Oh, ****. I've been meaning to get back to this bug for a while now. :-( I never was able properly to track down what was going on. I tried editing training.dat; I tried deleting training.dat and starting from scratch; I tried to see if there was a difference between IMAP and POP3 accounts. It eventually got to the point where I simply closed down two of the four accounts that were giving me the most grief (in one case, over 5 200 pieces of spam in an eight- or 10-week period). I have two accounts that still receive substantial amounts of spam. I currently access them /via/ the Web interface. What I'll do is set them up in Mozilla today, one as IMAP the other as POP3, and test them for a few days. If the problem no longer persists, the bug can be resolved as WFM, I suppose. . . .
Product: MailNews → Core
This is an automated message, with ID "auto-resolve01". This bug has had no comments for a long time. Statistically, we have found that bug reports that have not been confirmed by a second user after three months are highly unlikely to be the source of a fix to the code. While your input is very important to us, our resources are limited and so we are asking for your help in focussing our efforts. If you can still reproduce this problem in the latest version of the product (see below for how to obtain a copy) or, for feature requests, if it's not present in the latest version and you still believe we should implement it, please visit the URL of this bug (given at the top of this mail) and add a comment to that effect, giving more reproduction information if you have it. If it is not a problem any longer, you need take no action. If this bug is not changed in any way in the next two weeks, it will be automatically resolved. Thank you for your help in this matter. The latest beta releases can be obtained from: Firefox: http://www.mozilla.org/projects/firefox/ Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html Seamonkey: http://www.mozilla.org/projects/seamonkey/
This bug has been automatically resolved after a period of inactivity (see above comment). If anyone thinks this is incorrect, they should feel free to reopen it.
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → EXPIRED
Product: Core → MailNews Core