Open Bug 342612 Opened 18 years ago Updated 2 years ago

training.dat leaks words in encrypted email

Categories

(Thunderbird :: Security, defect)

x86
All
defect

Tracking

(Not tracked)

People

(Reporter: pj, Unassigned)

Details

(Keywords: privacy, sec-other, Whiteboard: [sg:nse])

User-Agent:       Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4
Build Identifier: Thunderbird version 1.0.6 (20050716)

The training.dat file leaks as plain text the interesting words
in encrypted email messages.

I am engaged in an email conversation using encrypted messages,
using the Enigmail 0.93.0 OpenPGP extension and Thunderbird.

If I enable Tools :: Junk Mail Controls :: Adaptive Filter ::
Enable adaptive junk mail detection, and if I then mark one
of the encrypted messages as junk while I am reading it decoded
within Thunderbird, the significant words in that message are
written in plain text to the Mozilla Thunderbird training.dat
file.

I initially suspected that even if I did -not- mark one of these
secret messages junk, but rather just read it and left it be, I
still got secret words written to training.dat.  However, my
limited attempts just now to reproduce that failure didn't result
in anything being written to training.dat.  So that probably
doesn't happen.  I can't be sure, but I probably have to mark the
message Junk to cause these secret words to get written to the
training.dat file.

This is an unfortunate security bug, in my view.  Though it
has an obvious enough workaround - disable adaptive junk
filtering and remove any compromised training.dat file.

Encrypted message contents should not be written to disk files
in plain text, unless the user explicitly requests it.

(Thanks for an excellent mailer.)

Reproducible: Always

Steps to Reproduce:
1. Enable adaptive junk mail detection.
2. Remove the training.dat file.
3. Obtain an encrypted message.
4. Decode and read that message with Thunderbird.
5. While reading it, mark it as junk.
6. Look inside the newly recreated training.dat file.
7. Observe significant words from your message.

Actual Results:  
Secret words visible in plain text in the training.dat file.


Expected Results:  
Such secret words should not be added to the training.dat file.
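
Steps 6 and 7 above could be checked with a small script rather than by eye. This is only a minimal sketch: it assumes that training.dat stores tokens as raw bytes (consistent with the words being visible in plain text) and does not attempt to parse the file format. The function name find_leaked_words is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def find_leaked_words(training_dat: Path, words: Iterable[str]) -> List[str]:
    """Return the words whose bytes occur anywhere in the file.

    The file is treated as an opaque blob; this is a plain substring
    search, not a parser. Matching ignores ASCII case only.
    """
    data = training_dat.read_bytes().lower()
    return [w for w in words if w.encode("utf-8").lower() in data]


# Example call (the path is a placeholder, not a real profile location):
# find_leaked_words(Path("/path/to/profile/training.dat"), ["secretword"])
```

Using an invented word in the test message, as was done for the S/MIME check, keeps a match from being explained by ordinary mail.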

I see this bug on both my SuSE 10.0 system, and on my Windows
XP SP2 system.

This may be a bug in the Enigmail extension, not in Thunderbird.
I don't have any way that I know of to distinguish which is
the case.
This does not appear to be limited to Enigmail: an invented word I inserted into an S/MIME-encrypted mail also ended up in my training.dat after I marked it "junk".

I guess the options are:
1. don't use the Junk button on important mail (status quo)
2. disable the Junk button on encrypted mail
3. a confirm dialog after hitting 'junk' on encrypted mail
4. don't train on encrypted mail, just do associated moves or deletes.

I'm not too unhappy with the status quo. Why in the world would anyone "junk" mail that has secret information in it anyway? Just delete it when you're done with it.

The confirm idea is horrible. If we're going to do anything I prefer 2, with maybe a hidden pref to turn it back on for people who know what they're doing (and accept that terms from junked encrypted mail end up in the training data). 4 would be OK, but it essentially makes the junk button just another delete button of sorts, which is kind of misleading.
Assignee: dveditz → mscott
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: privacy
Whiteboard: [sg:nse]
Your analysis seems reasonable.  Thanks for the rapid reply.

When I started to file this bug, it was because I saw these
encrypted message words in the training.dat file, and I was
presuming (without actual knowledge) that they had gotten there
as a consequence of my normal reading of these messages (no use
of the 'Junk' button.)

However I can only reproduce this using the 'Junk' button, so I
presume that I must have marked one of these secret messages as 
'Junk' sometime in the past couple of years, and forgotten now 
that I did so.

If encrypted message words had any way of ending up in the
training.dat file without some action such as marking it 'Junk',
then that would have been a more serious issue, of course.

Disabling the Junk button on encrypted messages would seem
like the best option to me.  I'd happily accept the other options
3 or 4 as well.  I leave that choice up to others who know this
code better than I (as in "know it at all" ;).
> If encrypted message words had any way of ending up in the
> training.dat file without some action such as marking it 'Junk',
> then that would have been a more serious issue, of course.

Marking a mis-classified message as "not junk" would also add terms to the training database. This might happen if a legit message had a bunch of spammy terms in it and you rescued it from the junk folder. We could of course disable the "un-junk" button, but you probably don't want your good mail to remain classed as junk. So we could skip spam-filtering any encrypted mail; maybe we already do, since decrypting it would be expensive.

I guess that gives the spammers a way in, though -- just encrypt the spam pitches. It'd be expensive to individually encrypt thousands of spam mails, but not inconceivable if you've got a zombie army doing your spamming.
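
Skipping the filter for encrypted mail presupposes a way to tell that a message is encrypted. A rough sketch of such a check, using Python's standard email package rather than anything from the mailnews code: the function name looks_encrypted is hypothetical, and treating every S/MIME type other than signed-data as encrypted is a deliberately conservative assumption.

```python
from email import message_from_string
from email.message import Message

SMIME_TYPES = {"application/pkcs7-mime", "application/x-pkcs7-mime"}
PGP_ARMOR = b"-----BEGIN PGP MESSAGE-----"


def looks_encrypted(msg: Message) -> bool:
    """Heuristically decide whether a message carries encrypted content."""
    ctype = msg.get_content_type()
    # PGP/MIME wraps the ciphertext in multipart/encrypted.
    if ctype == "multipart/encrypted":
        return True
    # S/MIME: signed-only mail uses smime-type=signed-data; anything
    # else (including a missing parameter) is assumed to be encrypted.
    if ctype in SMIME_TYPES:
        smime_type = msg.get_param("smime-type", "")
        if isinstance(smime_type, str) and smime_type.lower() == "signed-data":
            return False
        return True
    # Inline PGP: look for the ASCII-armor header in any text part.
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            if PGP_ARMOR in payload:
                return True
    return False
```

A check like this only inspects headers and armor markers, so it is cheap compared with decrypting the message.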
> I guess that gives the spammers a way in, though -- just encrypt the spam

Encrypted spam doesn't sound like a problem we have at present.
Too few people use crypto for that to be attractive to spammers.

So I like the idea of:

> So we could skip spam-filtering any encrypted mail ...
Unfortunately, spammers will be more than happy to move into any open avenue we provide, so that route is really unappealing.

I think we're mostly screwed. The best path I can see is to create an encrypted junk db.
I must be missing something - how could a spammer send me any
useful spam that was encrypted?  They don't have my public key.

Ah - one thing I should point out.  I use encryption to send
-myself- email.  This is unusual.  Most people send others
email, and many of them publish their public key.  That's the
usual way of using encryption.  So I am less worried about getting
encrypted spam than the typical crypto user.  And obviously
you all would not be expected to base design decisions on the
unusual details of my situation.

I'd be happy with an encrypted junk folder, so long as I
didn't have to deal much with extra steps in the user interface
in the normal (normal, for me ;) use cases.
Assignee: mscott → nobody
(In reply to comment #3)

> Marking a mis-classified message as "not junk" would also add terms to the
> training database. ... you probably don't want your good mail to remain
> classed as junk.

There are two separate actions which tend to be conflated in mailnews. "Marking" as junk just adds a flag to the message saying it is junk, and then takes the normal action (moving it to a junk folder, say). "Training" as junk adds the message's tokens to the database. So it seems to me that the logical thing to do is to mark encrypted messages as junk without training on them (and likewise when marking them as good). That conflation is so deeply embedded in the code that this is not necessarily trivial to implement.
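
To make the mark/train distinction concrete, here is a toy model of it. It is not the mailnews implementation; every name in it (MailMessage, JunkFilter, mark, train) is invented for illustration, and the token counters merely stand in for training.dat.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MailMessage:
    tokens: List[str]
    encrypted: bool = False
    junk: Optional[bool] = None  # None means not yet classified


@dataclass
class JunkFilter:
    junk_counts: Counter = field(default_factory=Counter)
    good_counts: Counter = field(default_factory=Counter)

    def train(self, msg: MailMessage, is_junk: bool) -> None:
        """Add the message's tokens to the junk or good token counts."""
        target = self.junk_counts if is_junk else self.good_counts
        target.update(msg.tokens)

    def mark(self, msg: MailMessage, is_junk: bool) -> None:
        """Always set the flag; train only on unencrypted messages."""
        msg.junk = is_junk
        if not msg.encrypted:
            self.train(msg, is_junk)
```

With this split, marking an encrypted message still sets its flag (so the usual move or delete can follow), but none of its tokens reach the stored counts, in either the junk or the good direction.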
Simply reading a message doesn't cause anything to get written to training.dat. I don't think this needs to remain closed for security purposes at all. Opening.
Group: core-security
Severity: normal → S3