Closed Bug 1462488 Opened 3 years ago Closed 3 years ago

Sanitize HTML quotes before sending them to the composer when replying

Categories

(MailNews Core :: Composition, enhancement)

enhancement
Not set
critical

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: BenB, Assigned: BenB)

References

Details

Attachments

(4 files, 4 obsolete files)

In the past, including in the recent months, we had a number of attacks that require the user to reply to an email. They don't work in the normal viewer, but something happens when you reply. That's been a repeated theme.

We should sanitize the HTML that we quote from the original message and insert into the user's mail.

After all, this is just a quote/cite. We only need to preserve the essence of it, enough for a citation. The most important is the text. It doesn't need to have Amazon or eBay brand recognition in order to reply to the content of an email that a seller sent.

This should close a whole class of attacks that
a) attack our composer
b) attack the recipients of our user
   (which could lead to interesting constellations, esp. when it comes to decryption and trust)

So, I think this will be an important protection going forward. We don't know yet what the exact attacks will be, but we've had a enough in the past to think that there are more to come. This should stop them all.
This runs the full sanitizer over the message, including removing presentational HTML, colors, font sizes, etc. It looks different from the original message. It leaves only the text and basic structure.

Personally, I think this is good for quoting. But it's a matter of opinion.

I've tested this by
1. setting View | Message body as | Original HTML
2. opening an Amazon HTML email
-> It display with colors and <table> boxes and everything, in the viewer
3. Clicking Reply

Without the patch:
-> The inserted quote has colors, boxes etc.

With the patch:
-> The inserted quote is the sanitized HTML version,
   without colors, with broken remote images (as intended),
   basically just text, but with links etc..
   Same that you get with View | Message Body as | Simple HTML.

Disclaimer / need for help:
I am not completely sure that I found all the necessary places to insert the code. My goal is to sanitize all foreign HTML content that's inserted into the composer, whereby "foreign" means anything not created by our user, anything coming from the network. If anybody knows additional places to cover, please let me know.
If we want to preserve the original look of the HTML, this patch (or a followup) is needed.

Good news and bad news:

The good news is that if I enable presentational HTML and styles, the result looks the same to me as without sanitation. (There might be minor differences, but I haven't noticed any.) So, this does look like we could ship this as default for everybody. It would remove a whole class of attacks.

The bad news is that the Gecko HTML sanitizer is not able to remove URL references from CSS rules/styles. The sanitizer I had written back in 2004 could do that, but the rewrite by Henri cannot, unfortunately. He was aware of that, and that's why he has an |if| that disables all CSS (althogether, even the innocent rules), if I say that I don't want external resources.

We should later add support to remove external URLs from CSS. But right now, we have no sanitation at all and allow flat out everything, so this is no worse than before.

I've made a band-aid fix that removes that |if|. The caller can still control it, because the allowStyles has its own flag. Thunderbird and DirectorySericeProvider are the only users of this sanitizer, and all callers set AllowStyles properly, so removing that automatic is not a concrete problem, and fixes the issue The only problem is that I'd need to patch Gecko.

Anyway, the patch as attached works, and is a massive improvement over status quo. I don't see any drawbacks. I'll attach a screenshot.
This is a screenshot of:
* 1. Viewer, Original HTML
* 2. Composer, before (TB 58.7)
* 3. Composer, with patch v4, sanitizer, but allowing styles
* 4. Composer, with patch v2, full sanitation

As you can see, you see nothing :-) ...that is, no difference between 2. and 3.
That means that Sanitizer patch v4 makes no noticeable changes to the quote result, even though the sanitizer ran over the quote and cleaned it from most problematic HTML.
Attachment #8976733 - Attachment description: Screenshot of original, patch v2 and patch v4 → Screenshot of original, full sanitation (patch v2) and sanitation with styles preserved (patch v4)
Attachment #8976733 - Attachment description: Screenshot of original, full sanitation (patch v2) and sanitation with styles preserved (patch v4) → Screenshot of original, sanitation with styles preserved (patch v4) and full sanitation (patch v2)
Jens Müller, efail reporter, wrote in bug 1419417 comment #199:

S/MIME reply or forward exfiltration:

* S/MIME: decrypted MIME parts can be completely hidden with CSS; if user replies/forwards in HTML format the (non-visible) plaintext is leaked, otherwise (in text-only reply mode) the victim could be fooled by inserting newlines
---------------------------------------------------
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/html

Reply please to this email
<div style="color: red;">

--BOUNDARY
Content-Type: application/pkcs7-mime; name="smime.p7m"; smime-type=enveloped-data
Content-Transfer-Encoding: base64
...
--BOUNDARY--
---------------------------------------------------
This above attack is fixed by my patch here.

I would urge you to take this. I don't know anything with less impact that still fixes all possible angles and future variations of the basic issue that Jens mentioned.
Comment on attachment 8976732 [details] [diff] [review]
Sanitize HTML, v4 (for 52 and mozilla) - Sanitize the HTML, but keep styles and presentation

This less aggressive sanitation does *not* fix the problem Jens mentioned.
Attachment #8976732 - Flags: review-
Comment on attachment 8976730 [details] [diff] [review]
Sanitize HTML quotes, v2 (for 52) - full sanitation

This is the right fix here.

Good news is that it works with 52 and trunk. I had tested this with 52 and it works fine.

Yes, it changes the layout of messages. But after all, it's just a quote. The purpose of a quote is not to be pretty in presentation, but to allow the reader to know the context of the reply. So, only the text is important. So, I do think that removing styles is not necessarily a bad thing.
Attachment #8976730 - Flags: review?(jorgk)
I don't think this is something out users are going to like, and also it is not really fixing the attack, just making it harder. The content can still be semi-hidden using many newlines and whatever tricks you could think of, even with plaintext.

The correct fix to avoid it altogether is to either not decrypt when not appropriate, or mark it so that composition could remote the parts we don't want to send out.
(In reply to Ben Bucksch (:BenB) from comment #0)
> In the past, including in the recent months, we had a number of attacks that
> require the user to reply to an email. They don't work in the normal viewer,
> but something happens when you reply. That's been a repeated theme.

Don't know about that, on the top of my head efail is the only thing we've had since 52.
(In reply to Magnus Melin from comment #8)
> I don't think this is something out users are going to like, and also it is
> not really fixing the attack, just making it harder.
I was inclined to approve the less stringent sanitation, but since it somehow (how?) doesn't appear to address the problem and its own author gave it r-, there's no point looking into it further.

I agree with Magnus here, we can't make life miserable for 99.5% of our users. A quote is a quote and not a mutilated quote.

Also, I'm a little confused by all the security discussions going on. Which attack vector is this closing? Where is a test case?
> efail is the only thing we've had since 52.

That's a short timeframe :). I don't remember details, but I do remember other older attacks that worked only on reply.

> The content can still be semi-hidden using many newlines

Newlines do not hide it, because there's a scroll bar. But yes, we could remove newlines as well, and we probably should, in fact.

Remember, this is a quote. It is not supposed to be a pixel-perfect representation of the original email. It's supposed to give the reply enough context so that the respondent doesn't have to write "regarding what you said to... blabla [repeat what was written]", but simply quote the one sentence he replies to. That is the primary reason for quotes: to give context, not to be Amazon-branded.

> we can't make life miserable for 99.5% of our users

Cleaning quotes from colors is not exactly "making life miserable". I rather enjoy the cleaner looks. But that's surely a question of style and opinion.

> Which attack vector is this closing? Where is a test case?

Comment 4, among other things.
And yes, efail is 3-4 different classes of attacks, and each attack consisting of several more specific attack vectors, so it's hard to keep track.

That's exactly why I try to install broader security measures here, to avoid not just this attack, but future ones as well. I am convinced by experience that this is the only way to be secure.

I had made another patch v4 which is less drastic and tries to preserve style, and I thought we had found a nice compromise, just to be taught better by comment 4 from Jens Müller, in that v4 would not fix that attack, but my original idea with the more drastic v2 that removes style would fix it.

I understand that v2 is drastic. That's why I made v4, which doesn't change style, but also doesn't fix comment 4.
Simply put: If you reject v2, we can't do anything for TB 52 and TB 60.

v4 uses the sanitizer partially, but still allows style. It does not change the looks of the email, but still protects from *some* attacks, even if not from all. I think it would still be an improvement over status quo.
v4 needs a patch to Mozilla core, though, so that cannot happen in TB 52 and TB 60.
See bug 1464056 comment 5 for an alternative way to stop the attack in comment 4. That one would also stop the newlines "trick".

Even with that, I think the fix v4 here (which does not alter presentation) is still necessary, because it will stop other classes of attacks which are as of yet unknown.
Attachment #8976730 - Flags: review?(jorgk)
This is the same as the previous v4, just applied to trunk, and split into 2 patches for comm-central and mozilla-central, with a comment header, as Jörg requested.
A later patch for mozilla-central should also remove URLs from CSS. That shouldn't be too hard, we'd just need to look at <style> and style="" content and remove all "url(" and "@import" on a plain text level.
(In reply to Ben Bucksch (:BenB) from comment #11)
> > efail is the only thing we've had since 52.
> 
> That's a short timeframe :). 

I wrote 52 because there was one case for 52, but after the rework for that, compose doesn't have special privileges anymore. Since there's no special privileges anymore it's hard to make a case for how it would cause any risks.
Added HG header and fixed spelling ("intact").

Boris and Henri, we've come here from bug 1419417, a security bug which is part of the "efail report"; you might have heard about it in the press or seen it in the security advisories that came with TB 52.8.0:
https://www.mozilla.org/en-US/security/advisories/mfsa2018-13/
https://www.efail.de/

Could you please review this for us or find an appropriate reviewer. There is a large explanational comment in the patch, also, see bug 1419417 comment #92 (of 208 so far!).
Attachment #8980479 - Attachment is obsolete: true
Attachment #8980498 - Flags: review?(bzbarsky)
Attachment #8980498 - Flags: review?(hsivonen)
> it's hard to make a case for how it would cause any risks.

Comment 4 is one case :). I can think of others, but discussing that in detail would be distracting here.
(In reply to Ben Bucksch (:BenB) from comment #11)
> > Which attack vector is this closing? Where is a test case?
> Comment 4, among other things.
OK, my favourite test message is still attachment 8975458 [details] from bug 1419417. I've modified it like so:

--DELIMITER
Content-Type: text/html

Please answer to this email
(note that this only works if `compose messages in HTML format' is set)
<div style="color: red;">

--DELIMITER
Content-Type: application/pkcs7-mime; name="smime.p7m"; smime-type=enveloped-data
Content-Transfer-Encoding: base64

MIAGCSqGSIb3DQEHA6CAMIACAQAxggNvMIIBogIBADCBiTB1MQswCQYDVQQGEwJJTDEWMBQG
...

Viewing the message, I see no red since there is no div. The decrypted second part displays fine. 

If instead of |<div style="color: red;">| I use |<style>body { color: red; }</style>|, then I get the entire body red, since the CSS leaks into all the message parts.

If I use |<style>div { display: none; }</style>|, the the decrypted part doesn't show in the display, but it does show in the reply. If I use |<style>div { display: none; } pre { display: none; }</style>|, the decrypted doesn't show at all.
Comment on attachment 8980477 [details] [diff] [review]
Sanitize HTML, v4 (part for comm-central) - Sanitize the HTML, but keep styles and presentation

I'm cancelling the review until I get a test case. My test case from comment #20 is not fixed. That was already known, see comment #11, last sentence.

So I'm not sure what we're trying to fix here.
Attachment #8980477 - Flags: review?(jorgk)
Comment on attachment 8980498 [details] [diff] [review]
Sanitize HTML, v4 (part for mozilla-central) - Sanitize the HTML, but keep styles and presentation

Sorry about the noise :-(
Attachment #8980498 - Flags: review?(hsivonen)
Attachment #8980498 - Flags: review?(bzbarsky)
Comment on attachment 8980508 [details] [diff] [review]
Sanitize HTML, v5 (part for mozilla-central) - Sanitize the HTML, and remove CSS external resources, but keep styles and presentation

This implements sanitizing external URLs in CSS, by removing all properties that have "url(" in the value.
If I read the existing code right, it sanitizes both <style> tags and style="" attributes. Would be nice to have confirmation.

@import rules were already forbidden.

Is there anything else we should be removing from CSS? But I think this already takes care of most problematic CSS.
Attachment #8980508 - Attachment description: efail-composer-5-mozilla-trunk.diff → Sanitize HTML, v5 (part for mozilla-central) - Sanitize the HTML, and remove CSS external resources, but keep styles and presentation
It just came to mind. This vector would be closed by only putting attachments into an iframe. That could be done now, even with a fixed iframe height. That would fix a lot of other problems, too. WIP in bug 1297653. Maybe Ben with his better understanding of MIME can move this forward.

Note that that patch no longer applies, since I removed writing the <div class="moz-text-html" ...> in bug 1463133. We could put it back, but only for subordinate parts, not the main body.
> I'm not sure what we're trying to fix here.

This is sanitizing the HTML that goes into the reply, to protect against future exploits.

> If I use |<style>div { display: none; } pre { display: none; }</style>|, the decrypted doesn't show at all.

If you want to protect against exploits based on style, you have to take my patch v2. But then - obviously - the message doesn't look like before, and you rejected that patch. But it fixes that exploit.

Given that you rejected v4, I now created v4/5, which leaves style in fact, but that's of course vulnerable to attacks based on style, so we need other fixes for that, e.g. bug 1464056 and bug 1463360.

Patches v4/5 protect against completely different exploits that e.g. let external URLs leak from the attacker over our user's email to the recipients of our user's email and do harm there.
I don't have access to bug 1464056 and bug 1463360. How about my comment #25 and bug 1297653?
(In reply to Ben Bucksch (:BenB) from comment #26)
> Patches v4/5 protect against completely different exploits that e.g. let
> external URLs leak from the attacker over our user's email to the recipients
> of our user's email and do harm there.

And remote content is blocked in Thunderbird so there is not even any privacy issues there. It also has very little to do with efail.
(In reply to Ben Bucksch (:BenB) from comment #19)
> > it's hard to make a case for how it would cause any risks.
> 
> Comment 4 is one case :). I can think of others, but discussing that in
> detail would be distracting here.

Well, we already sanitizing is not really fixing the case from comment 4, right?
It's relevant because you seem to want sanitizing at all cost without giving more than hand-waiving about what issue it would fix.
we already *agreed* ...
> I'm cancelling the review until I get a test case.

Really? Do you want me to write exploits now?
> And remote content is blocked in Thunderbird so there is not even any privacy issues there.

The attacker attacks the recipients of our user to get the text from our user. The recipients don't use Thunderbird. And yes, it *is* our problem, because we created that HTML that contains the exploit. This patch stops it.

This is a *protective* measure to stop exploits that we do not know yet. But I know they will come, based on the exploits we saw so far.
We will never get the application secure, if we wait for exploits before we act.
(In reply to Ben Bucksch (:BenB) from comment #31)
> > I'm cancelling the review until I get a test case.
> Really? Do you want me to write exploits now?
Well, a test case would certainly be useful. In any case, let's get the M-C review first. The C-C part is straight forward.
> a test case would certainly be useful.

Agreed, but see comment 23. The very nature of creating a secure application is that you prevent exploits *before* they happen. You typically don't even know what exactly they could be. You just know that a certain thing is dangerous, and you forbid it.

> let's get the M-C review first. The C-C part is straight forward.

Alright.
(In reply to Ben Bucksch (:BenB) from comment #32)
> > And remote content is blocked in Thunderbird so there is not even any privacy issues there.
> 
> The attacker attacks the recipients of our user to get the text from our
> user. The recipients don't use Thunderbird. And yes, it *is* our problem,
> because we created that HTML that contains the exploit. This patch stops it.

You're confusing issues here. There being css with urls in them has nothing to do with getting text from our users. And I don't agree it's a problem that we send out messages that may include remote content. That's not an exploit.
Attachment #8980508 - Flags: review?(bzbarsky) → review-
Boris, can you please elaborate on the r- a bit?
Flags: needinfo?(bzbarsky)
Um...  I had a long review comment to go along with that r-.  What the heck?

I'm not going to spend the time reconstructing it all, but a brief summary:

1)  The commit message is very rambly and repetitive, says "we" when it means "Thunderbird", talks about "styles and CSS" as if they were different things and has a whole bunch of other issues.  I guess I can do another pass on the commit message once it gets rewritten some.

2)  The "delete while iterating" pattern will make you skip every property after one that gets deleted as far as I can see.  Was this code actually tested?  Please add tests with the patch that would have caught that.

3)  The effect of deletions on iteration order in general is not defined.

4)  This will delete innocuous things like |content: "url()"| but as long as the only API consumer (Thunderbird) is OK with this that's OK.

5)  Please use the two-string-arg form of FindInReadable, or even just use .Find().
Flags: needinfo?(bzbarsky)
Thanks, bz, for the review.
Quick update to address most review comments.
TODO:
* Avoid Remove in loop
* Add tests
Attachment #8980508 - Attachment is obsolete: true
> 4)  This will delete innocuous things like |content: "url()"| but as long as the only API consumer (Thunderbird) is OK with this that's OK.

Yes, this is OK. Idea here is "rather be safe than sorry".
> make you skip every property after one that gets deleted

That is, skip the next property after the one you delete, because the indexing changes.
bz, I've monkey-patched that with i--; , but is there a better pattern that I should use?
Gather up the names to remove into an array, then remove them?
OK. I was hoping for a simpler solution.
Is there a good reason to do string to string sanitizing and then reparse? As opposed to parsing to sanitized DOM fragments and inserting them into the composer doc?
I think you'd have to turn the whole thing upside down (or inside out) if you wanted to do that. That's my gut feel without analysing it. The same goes for bug 1419417 comment #209.
Marking this WONTFIX as I don't see us doing this.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Summary: [efail] Sanitize HTML quotes before sending them to the composer when replying → Sanitize HTML quotes before sending them to the composer when replying
You need to log in before you can comment on or make changes to this bug.