User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:188.8.131.52) Gecko/20060508 Firefox/184.108.40.206
Build Identifier: Thunderbird version 1.0.6 (20050716)

The training.dat file leaks the interesting words in encrypted email messages as plain text.

I am engaged in an email conversation using encrypted messages, with the Enigmail 0.93.0 OpenPGP extension and Thunderbird. Suppose I enable Tools :: Junk Mail Controls :: Adaptive Filter :: Enable adaptive junk mail detection and then mark one of the encrypted messages as junk while I am reading it decoded within Thunderbird. In that case the significant words in that message are written in plain text to the Mozilla Thunderbird training.dat file.

I initially suspected that secret words still got written to training.dat even if I did -not- mark one of these secret messages as junk, but just read it and left it alone. However, my limited attempts just now to reproduce that failure don't result in anything being written to training.dat, so that probably doesn't happen. I'm not certain, but it appears I have to mark the message as junk to cause these secret words to get written to the training.dat file.

In my view this is an unfortunate security bug, though it has an obvious enough workaround: disable adaptive junk filtering and remove any compromised training.dat file. Encrypted message contents should not be written to disk files in plain text unless the user explicitly requests it.

(Thanks for an excellent mailer.)

Reproducible: Always

Steps to Reproduce:
1. Enable adaptive junk mail detection.
2. Remove the training.dat file.
3. Obtain an encrypted message.
4. Decode and read that message with Thunderbird.
5. While reading it, mark it as junk.
6. Look inside the newly recreated training.dat file.
7. Observe significant words from your message.

Actual Results:
Secret words are visible in plain text in the training.dat file.

Expected Results:
Such secret words should not be added to the training.dat file.
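Steps 6 and 7 amount to grepping training.dat for a word that only ever appeared in the encrypted message. Here is a minimal Python sketch of that check. It treats the file as an opaque blob and does not attempt to parse Thunderbird's training.dat format. The function name `find_plaintext_words` is hypothetical.

```python
from pathlib import Path


def find_plaintext_words(db_path, words):
    """Return the words whose UTF-8 bytes appear verbatim in the file at db_path.

    This is a blunt byte search (like grep); it does not understand the
    structure of training.dat, it only detects words stored in plain text.
    """
    data = Path(db_path).read_bytes()
    return [w for w in words if w.encode("utf-8") in data]
```

For example, `find_plaintext_words("training.dat", ["zorblax"])` returns `["zorblax"]` if that invented word was stored verbatim. Because the check is an exact byte match, it will miss a word if the filter stored it in a different case or form.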
I see this bug on both my SuSE 10.0 system, and on my Windows XP SP2 system. This may be a bug in the Enigmail extension, not in Thunderbird. I don't have any way that I know of to distinguish which is the case.
This does not appear to be limited to Enigmail: an invented word I inserted into an S/MIME-encrypted mail also ended up in my training.dat after I marked it "junk". I guess the options are:
1. don't use the Junk button on important mail (status quo)
2. disable the Junk button on encrypted mail
3. show a confirm dialog after hitting 'junk' on encrypted mail
4. don't train on encrypted mail, just do the associated moves or deletes

I'm not too unhappy with the status quo. Why in the world would anyone "junk" mail that has secret information in it anyway? Just delete it when it's done. The confirm idea is horrible. If we're going to do anything, I prefer 2, maybe with a hidden pref to turn it back on for people who know what they're doing (terms from junked encrypted mail end up in training). 4 would be OK, but it essentially makes the Junk button just another delete button of sorts, which is kind of misleading.
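Option 2 from the list above comes down to a small UI-side predicate. The sketch below is hypothetical Python, not Thunderbird code. `MessageView`, `junk_button_enabled` and the `allow_junk_on_encrypted` flag (a stand-in for the suggested hidden pref) are invented names.

```python
from dataclasses import dataclass


@dataclass
class MessageView:
    # Hypothetical stand-in for whatever object the UI holds for the open message.
    subject: str
    is_encrypted: bool


def junk_button_enabled(msg: MessageView, allow_junk_on_encrypted: bool = False) -> bool:
    """Option 2: disable the Junk button for encrypted mail.

    allow_junk_on_encrypted models the hidden pref (name made up) for users who
    accept that terms from junked encrypted mail end up in training.
    """
    if not msg.is_encrypted:
        return True  # ordinary mail: Junk button behaves as before
    return allow_junk_on_encrypted
```

Keeping the check this narrow leaves the Junk button unchanged for ordinary mail. The override belongs to the user, not to each message.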
Assignee: dveditz → mscott
Status: UNCONFIRMED → NEW
Ever confirmed: true
Your analysis seems reasonable. Thanks for the rapid reply. When I started to file this bug, it was because I saw these encrypted message words in the training.dat file. I was presuming (without actual knowledge) that they had gotten there as a consequence of my normal reading of these messages (no use of the 'Junk' button). However, I can only reproduce this using the 'Junk' button. I presume I must have marked one of these secret messages as 'Junk' sometime in the past couple of years and have since forgotten doing so. If encrypted message words had any way of ending up in the training.dat file without some action such as marking it 'Junk', then that would have been a more serious issue, of course. Disabling the Junk button on encrypted messages would seem like the best option to me. I'd happily accept options 3 or 4 as well. I leave that choice up to others who know this code better than I do (as in "know it at all" ;).
> If encrypted message words had any way of ending up in the
> training.dat file without some action such as marking it 'Junk',
> then that would have been a more serious issue, of course.

Marking a mis-classified message as "not junk" would also add terms to the training database. This might happen if a legit message had a bunch of spammy terms in it and you rescued it from the junk folder. We could of course disable the "un-junk" button, but you probably don't want your good mail to remain classed as junk. So we could skip spam-filtering any encrypted mail. Maybe we already do, since decrypting it would be expensive. I guess that gives the spammers a way in, though -- just encrypt the spam pitches. It'd be expensive to individually encrypt thousands of spam mails, but not inconceivable if you've got a zombie army doing your spamming.
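Skipping spam filtering (and training) for encrypted mail first requires recognising that mail. Below is a rough heuristic sketch using Python's standard `email` package. It is an assumption about how one might detect encryption from MIME structure, not a description of what Thunderbird or Enigmail actually do. The sketch checks for three things:

- PGP/MIME (`multipart/encrypted`)
- S/MIME (`application/pkcs7-mime`, ignoring the opaque `signed-data` variant, which is signed but not encrypted)
- inline PGP armor in text parts

`looks_encrypted` is a hypothetical name.

```python
from email.message import Message

PGP_MIME_TYPE = "multipart/encrypted"  # PGP/MIME, RFC 3156
SMIME_TYPES = {"application/pkcs7-mime", "application/x-pkcs7-mime"}  # S/MIME
PGP_ARMOR = b"-----BEGIN PGP MESSAGE-----"  # inline PGP


def looks_encrypted(msg: Message) -> bool:
    """Heuristic: True if any MIME part looks like PGP or S/MIME ciphertext."""
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == PGP_MIME_TYPE:
            return True
        if ctype in SMIME_TYPES:
            # Opaque-signed S/MIME also uses pkcs7-mime but is not encrypted.
            smime_type = part.get_param("smime-type")
            if smime_type is None or str(smime_type).lower() != "signed-data":
                return True
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            if PGP_ARMOR in payload:
                return True
    return False
```

A classifier gated on this check would never see encrypted mail at all. That is exactly the opening for encrypted spam raised in the comment.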
> I guess that gives the spammers a way in, though -- just encrypt the spam

Encrypted spam doesn't sound like a problem we have at present. Too few people use crypto for that to be attractive to spammers. So I like the idea of:
> So we could skip spam-filtering any encrypted mail ...
Unfortunately, spammers will be more than happy to move into any open avenue we provide, so that route is really unappealing. I think we're mostly screwed. The best path I can see is to create an encrypted junk db.
I must be missing something - how could a spammer send me any useful spam that was encrypted? They don't have my public key. Ah - one thing I should point out. I use encryption to send -myself- email. This is unusual. Most people send others email, and many of them publish their public key. That's the usual way of using encryption. So I am less worried about getting encrypted spam than the typical crypto user. And obviously you all would not be expected to base design decisions on the unusual details of my situation. I'd be happy with an encrypted junk folder, so long as I didn't have to deal much with extra steps in the user interface in the normal (normal, for me ;) use cases.
(In reply to comment #3)
> Marking a mis-classified message as "not junk" would also add terms to the
> training database. ... you probably don't want your good mail to remain
> classed as junk.

There are two separate actions which tend to be conflated in mailnews. "Marking" as junk just adds a flag to the message saying it is junk, and then takes the normal action (moving it to a junk folder, say). "Training" as junk adds the message's tokens to the database. So it seems to me that the logical thing to do is to mark encrypted messages as junk without training on them (same with good). However, that conflation is so deeply embedded that this is not necessarily a trivial option.
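The mark/train split described above can be sketched as follows. This is hypothetical Python, not mailnews code. `MailItem`, `JunkFilter` and `mark` are invented names, and the move logic is simplified: un-junking just moves the message back to Inbox.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MailItem:
    tokens: list
    is_encrypted: bool
    is_junk: bool = False
    folder: str = "Inbox"


@dataclass
class JunkFilter:
    # Per-class token counts; in Thunderbird these end up in training.dat.
    good: Counter = field(default_factory=Counter)
    junk: Counter = field(default_factory=Counter)

    def train(self, item: MailItem, as_junk: bool) -> None:
        target = self.junk if as_junk else self.good
        target.update(item.tokens)


def mark(item: MailItem, as_junk: bool, filt: JunkFilter) -> None:
    """Always 'mark' (flag + move); only 'train' when the message is not encrypted."""
    item.is_junk = as_junk
    item.folder = "Junk" if as_junk else "Inbox"  # simplified move
    if not item.is_encrypted:
        filt.train(item, as_junk)
```

The flag and the move still happen for encrypted mail, so the Junk and "not junk" buttons keep working as folder actions. The training database, however, never receives the decrypted tokens.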
Simply reading a message doesn't cause anything to get written to training.dat. I don't think this needs to remain closed for security purposes at all. Opening.