<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Ludovic Hirlimann [:Usul]

Updated

•

15 years ago

Summary: Please remove the dependency on mozSanitizingSerializer → Change mozSanitizingSerializer to not use old HTML parser

Comment 2

•

15 years ago

Isn't that a duplicate of 650787 ?

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 3

•

15 years ago

(In reply to comment #2) > Isn't that a duplicate of 650787 ? No. Bug 650787 is about nsPlainTextSerializer. mimemoz2.cpp seems to use both nsPlainTextSerializer and mozSanitizingSerializer.

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 4

•

14 years ago

BenB, is mozSanitizingSerializer used in your non-mailnews XULRunner apps? Could the functionality be moved to comm-central? It appears that Firefox itself doesn't use this functionality and doesn't expose it to JS-based extensions.

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Comment 5

•

14 years ago

> BenB, is mozSanitizingSerializer used in your non-mailnews XULRunner apps? Yes, it is. E.g. TomTom HOME 2 (several million installations) uses it, IIRC. It's also useful to others, as I think it's more paranoid than the other implementation.

Assignee

Comment 6

•

14 years ago

(In reply to Ben Bucksch (:BenB) from comment #5) > It's also useful to others, as I think it's more paranoid than the > other implementation. I had a closer look at mozSanitizingSerializer. The code is pretty bad. Many parts copied and pasted from an old state of nsPlainTextSerializer. Furthermore, the sanitizing functionality is so close to nsTreeSanitizer that from a maintenance perspective, I think it doesn't make sense to keep both in m-c. Could you, please, identify where mozSanitizingSerializer is more paranoid than nsTreeSanitizer in a way that's essential to current users of mozSanitizingSerializer?

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Comment 7

•

14 years ago

* based on whitelist of HTML tags *and* their attributes (only the whitelisted combinations are allowed) * the whitelist is configurable in prefs, and can be different for each use. For example, you probably want to be very restrictive in emails (spam, viruses, targetted attacks etc., you can't control what is coming in), but maybe you want to allow more in RSS (you subscribed yourself). * as a particularly strong instance of paranoia, mozSanitizingSerializer is also looking into the content textnodes and HTML attributes, whether it can find anything that might be or turn into a script, and eliminates it. I understand that Gecko has protections to not run scripts etc., and nsTreeSanitizer trusts on them. mozSanitizingSerializer is written with the assumption that Gecko regularly has bugs, and it's nevertheless no option to ever be compromised, so mozSanitizingSerializer tries to stop harmful things before they reach the renderer / JS engine. This is probably the most significant conceptual difference. It's vital for me that these guarantees continue to exist, because it's used in Thunderbird View | Message Body As | Simple HTML, which we also use for phishing and junk mail suspects. There are users of Thunderbird that have high security demands and Gecko just doesn't live up.

Assignee

Comment 8

•

14 years ago

(In reply to Ben Bucksch (:BenB) from comment #7) > * based on whitelist of HTML tags *and* their attributes (only the > whitelisted combinations are allowed) Why is this essential? Why would it be harmful to allow attributes that nsTreeSanitizer allows on any of the elements that nsTreeSanitizer allows? > * the whitelist is configurable in prefs, and can be different for each use. > For example, you probably want to be very restrictive in emails (spam, > viruses, targetted attacks etc., you can't control what is coming in), but > maybe you want to allow more in RSS (you subscribed yourself). I very much doubt that users change this in prefs. > * as a particularly strong instance of paranoia, mozSanitizingSerializer is > also looking into the content textnodes and HTML attributes, whether it can > find anything that might be or turn into a script, and eliminates it. How could something turn into a script? Concretely, what's the scenario that this protects against? As far as I can tell, the TEXT_REMOVED placeholder never gets written... This part of mozSanitizingSerializer seems unnecessary and bogus. If a text node is serialized with a regular serializer that always escapes < and >, there should be no risk of the text node becoming markup. > I understand that Gecko has protections to not run scripts etc., and > nsTreeSanitizer trusts on them. No. nsTreeSanitizer removes scripts, javascript: URLs and event handler attributes. The design goal of nsTreeSanitizer is that you can safely put the sanitized tree fragment into a scripting-enabled different-origin document. > It's vital for me that these guarantees continue to exist, because it's used > in Thunderbird View | Message Body As | Simple HTML, which we also use for > phishing and junk mail suspects. There are users of Thunderbird that have > high security demands and Gecko just doesn't live up. I'm a bit disappointed that something as vital as this hasn't been picked up since I filed this bug in April. :-(

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Comment 9

•

14 years ago

> I very much doubt that users change this in prefs. I think you missed the part "can be different per caller". Any code that uses the function can set different defaults. This is necessary, and I showed an example. > nsTreeSanitizer removes scripts, javascript: URLs and event handler attributes. Last time I checked, it found them in less places. Can you show that the nsTreeSanitizer finds strictly all scripts, no matter how obscure, that mozSanitizingSerializer finds? Ditto with remote content, BTW.

Assignee

Comment 10

•

14 years ago

(In reply to Ben Bucksch (:BenB) from comment #9) > > I very much doubt that users change this in prefs. > > I think you missed the part "can be different per caller". Any code that > uses the function can set different defaults. This is necessary, and I > showed an example. You have shown an example where the whitelist differs for different callers. You haven't shown that it's necessary for the whitelists to differ for RSS and email. > > nsTreeSanitizer removes scripts, javascript: URLs and event handler attributes. > > Last time I checked, it found them in less places. nsTreeSanitizer parses the values of attributes as URLs and checks them where the attributes are known to be processed as URL-containing attributes elsewhere in Gecko. mozSanitizingSerializer drops any attributes whose value contains "data:", "javascript:" or "base64". I think it's pretty bogus to drop e.g. the title attribute if it happens to contain "data:". To what extent this is unfounded paranoia and to what extent this actually addresses a real problem? > Can you show that the nsTreeSanitizer finds strictly all scripts, no matter > how obscure, that mozSanitizingSerializer finds? I think that sort of approach isn't particularly productive for discovering what the real requirements are (ones that have solid rationale--not just "it's more paranoid"). That said, would you be OK with using nsTreeSanitizer if it had a mode that dropped all attributes whose values have "javascript:", "data:" or "base64" as a substring? > Ditto with remote content, BTW. Are you referring to only allowing cid: URLs in <img src>? (In reply to Henri Sivonen (:hsivonen) from comment #8) > I'm a bit disappointed that something as vital as this hasn't been picked up > since I filed this bug in April. :-( This was out of line, sorry. I hadn't delivered the dependencies of this bug until now.

Comment 11

•

14 years ago

> You haven't shown that it's necessary for the whitelists to differ for RSS > and email. (FYI, with "caller", I mean email vs. RSS, these are different callers.) I wrote: For example, you probably want to be very restrictive in emails (spam, viruses, targetted attacks etc., you can't control what is coming in), but maybe you want to allow more in RSS (you subscribed yourself). More concretely: You may want to flash plugins in a certain blog that you subscribe to, but you definitely don't want that in email. Similarly, I don't want to see the presentational formatting of the email sender, given the nonsense that Outlook and some people send, but I may want to see colors and <table> formatting in RSS. And I want to see other things than you do. That's why email can have a different default set than RSS, and I can adapt them both to my needs. > nsTreeSanitizer parses the values of attributes as URLs and checks them where > the attributes are known to be processed as URL-containing attributes elsewhere > in Gecko. Yeah. That's what I meant with paranoid: I don't want to run the risk that some new attribute is added somewhere in Gecko and we forget to adapt the sanitizer. No scripts *anywhere* allowed. The current approach of Gecko to security doesn't work very well, so that's not an argument, but exactly the reason why this component is needed. It modifies the HTML on the source level and drops anything that remotely smells like it might be dangerous. Yes, that drops some legitimate stuff. False positives have never been the problem, though, but rather that the default rules also drop <font> and similar presentation stuff and some users want that while others don't. > I think that sort of approach isn't particularly productive for discovering > what the real requirements are The real requirement is "no security hole in 20 years". So far, as far as I am aware (!), we're at 10 years without a single hole that sneaked by this sanitizer, apart from image decoder bugs. Gecko has about 10 holes per month. > > Ditto with remote content, BTW. > Are you referring to only allowing cid: URLs in <img src>? Yes, but that is the exception. We're trying to throw away all remote URLs, be it image, stylesheet, <embed>, <object> or whatever.

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Comment 12

•

14 years ago

> No scripts *anywhere* allowed. Since there's no way to have javascript enabled in mail anymore, why is any of this an issue? For rss we allow what the browser allows (js/plugins and everything, unless disabled by pref), and i don't see why anyone would want that different.

Assignee

Comment 13

•

14 years ago

(In reply to Magnus Melin from comment #12) > For rss we allow what the browser allows (js/plugins and > everything, unless disabled by pref), and i don't see why anyone would want > that different. (In reply to Ben Bucksch (:BenB) from comment #11) > You may want to flash plugins in a certain blog that you subscribe to, but > you definitely don't want that in email. Firefox doesn't allow plug-ins or JS in RSS. Is allowing plug-ins and JS in RSS a requirement for Thunderbird? How does Thunderbird sandbox RSS posts? When JS and plug-ins are allowed in RSS in Thunderbird, is mozSanitizingSerializer involved at all? > That's why email can have a different default set than RSS, and I can adapt > them both to my needs. I think we shouldn't make decisions on what gets maintained in m-c (by me apparently) based on what you (or another power user) might want to adapt to your needs. Instead, we should consider what configurability Thunderbird really needs to expose to all users in UI outside about:config. > > nsTreeSanitizer parses the values of attributes as URLs and checks them where > > the attributes are known to be processed as URL-containing attributes elsewhere > > in Gecko. > > Yeah. That's what I meant with paranoid: I don't want to run the risk that > some new attribute is added somewhere in Gecko and we forget to adapt the > sanitizer. No scripts *anywhere* allowed. You didn't answer my question, though: Would you be OK with using nsTreeSanitizer if it had a mode that dropped all attributes whose values have "javascript:", "data:" or "base64" as a substring? > > I think that sort of approach isn't particularly productive for discovering > > what the real requirements are > > The real requirement is "no security hole in 20 years". So far, as far as I > am aware (!), we're at 10 years without a single hole that sneaked by this > sanitizer, apart from image decoder bugs. Gecko has about 10 holes per month. mozSanitizingSerializer doesn't contain anything that would make it special in a way that'd require m-c to continue to contain that particular piece of legacy code. It would really be more productive to help me identify which concrete features I need to add to nsTreeSanitizer to make it acceptable for Thunderbird than to keep implying that there will be security holes unless a particular piece of legacy code stays in use. > > > Ditto with remote content, BTW. > > Are you referring to only allowing cid: URLs in <img src>? > > Yes, but that is the exception. We're trying to throw away all remote URLs, > be it image, stylesheet, <embed>, <object> or whatever. nsTreeSanitizer throws away <link rel>, <embed> and <object>. <video> and <audio> are not thrown away, but their src is sanitized (more strictly than <img src>) and the controls attribute is added to make them playable without scripting. Whether <style> and style="" are dropped is configurable. If they are not dropped, they are sanitized for -moz-binding.

Comment 14

•

14 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #13) > (In reply to Magnus Melin from comment #12) > > For rss we allow what the browser allows (js/plugins and > > everything, unless disabled by pref), and i don't see why anyone would want > > that different. > > (In reply to Ben Bucksch (:BenB) from comment #11) > > You may want to flash plugins in a certain blog that you subscribe to, but > > you definitely don't want that in email. > > Firefox doesn't allow plug-ins or JS in RSS. Is allowing plug-ins and JS in > RSS a requirement for Thunderbird? How does Thunderbird sandbox RSS posts? > When JS and plug-ins are allowed in RSS in Thunderbird, is > mozSanitizingSerializer involved at all? With RSS i meant when displaying entries as web pages - this is done in an iframe. For summaries (where i guess sanitizing may apply at least if you choose View As | Simple HTML) the same policies as with mail apply. See bug 504965 comment 13.

Comment 15

•

14 years ago

> Firefox doesn't allow plug-ins or JS in RSS. That's not the point. You may want to allow it in the future. It's a different usecase and has different requirements, in general. > Would you be OK with using nsTreeSanitizer if it had a mode that dropped all attributes > whose values have "javascript:", "data:" or "base64" as a substring? I would be OK with using nsTreeSanitizer is and only if it can do everything that mozSanitizingSerializer can. That includes mostly 2 general points: * complete paradoia in what's allowed, *not* trusting Gecko protections even one bit. That includes any and all scripts and anything that might come in the future. * configurability of allowes tags and attributes, and different config per caller It's OK if HTML tags and attributes are not pairs, but managed separately, if necessary. But the above things are minimum requirements. Anything below doesn't go. > nsTreeSanitizer throws away <link rel>, <embed> and <object> Does it use a whitelist or blacklist? I need not worry about additions like that in mozSanitizingSerializer, because it's a whitelist. If a tag is added that is desirable, we only need to change the default pref, and if users disagree with our choice, they can do that themselves. > based on what you (or another power user) might want to adapt to your needs. I have explained extensively that the choice of which tags are allowed and which are not is *highly* subjective. By default, we don't allow "font", out of mere matters of taste, but many users like or need colored fonts and find that this makes the feature unusuable for them. They have the choice of adapting it, while with nsTreeSanitizer, every caller and every user is stuck with the same list. There are bound to be disagreements, and not making that configurable is a profoundly bad decision, IMHO. > mozSanitizingSerializer doesn't contain anything that would make it special in a way > that'd require m-c to continue to contain that particular piece of legacy code. That is factually untrue.

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Comment 16

•

14 years ago

> > Firefox doesn't allow plug-ins or JS in RSS. > That's not the point. (This wasn't about plugins specifically, either. I also mentioned colors and tables. I fear you're not making a honest discussion here.) This bug is not about getting rid of the component, but adapting it to the new API. Somebody changed the API, they also have to adapt the code that uses the API. That's normal.

Assignee

Comment 17

•

14 years ago

(In reply to Magnus Melin from comment #14) > With RSS i meant when displaying entries as web pages - this is done in an > iframe. OK. > For summaries (where i guess sanitizing may apply at least if you > choose View As | Simple HTML) the same policies as with mail apply. Indeed. Looking at the code, it seems to me that mailnews always uses the same whitelist for all sanitizer invocations regardless of email vs. feed context. This makes the case for an arbitrarily configurable white list look pretty weak. (In reply to Ben Bucksch (:BenB) from comment #15) > > Firefox doesn't allow plug-ins or JS in RSS. > > That's not the point. You may want to allow it in the future. It's a > different usecase and has different requirements, in general. Writing code for hypothetical things you might want to do in the future is the kind of architecture astronautics that begat the kind of overabstraction in Gecko that we've now been working to fix for a decade. > > Would you be OK with using nsTreeSanitizer if it had a mode that dropped all attributes > > whose values have "javascript:", "data:" or "base64" as a substring? > > I would be OK with using nsTreeSanitizer is and only if it can do everything > that mozSanitizingSerializer can. That includes mostly 2 general points: > * complete paradoia in what's allowed, *not* trusting Gecko protections even > one bit. > That includes any and all scripts and anything that might come in the > future. I can implement equivalent paranoia (i.e. drop even non-URL attributes whose value contains "javascript:", "data:" or "base64") in order to avoid arguing about this point. > * configurability of allowes tags and attributes, and different config per > caller Is this a Thunderbird product requirement, what you would as a user like to have in Thunderbird or you speaking as an author of non-comm-central XULRunner apps? Since, by code inspection, Thunderbird seems to always use the same whitelist and doesn't seem offer UI (apart from about:config) for changing it, I find it hard to believe that generic whitelist configurability was a Thunderbird product requirement. > > nsTreeSanitizer throws away <link rel>, <embed> and <object> > > Does it use a whitelist or blacklist? Whitelist. > If a tag is added that is desirable, we only need to change the default pref, If a tag is added, Gecko gets recompiled anyway, so it's OK for the whitelist to be baked into C++. > and if users disagree with our choice, they can do that themselves. The sanitizer is (apart from what you say about <font>) about script injection and privacy (no external loads when opening an email). Whether something enables script injection or causes external loads is not a matter of user opinion. We have zillions of security-oriented decisions baked in code without allowing users to second guess them from about:config. > > based on what you (or another power user) might want to adapt to your needs. > > I have explained extensively that the choice of which tags are allowed and > which are not is *highly* subjective. You have mentioned <font> as a subjective matter. You also mentioned plug-ins, but that example doesn't make sense, because if you allow plug-ins, you are fully XSS vulnerable, so might as well turn off the sanitizer altogether. > By default, we don't allow "font", out > of mere matters of taste, but many users like or need colored fonts and find > that this makes the feature unusuable for them. Currently, AFAICT, Thunderbird has no UI (apart from about:config) for configuring whether the user wants <font> or not. I don't find arguments from about:config particularly persuasive. If stuff is important for users, it should have UI. > They have the choice of > adapting it, while with nsTreeSanitizer, every caller and every user is > stuck with the same list. If Thunderbird wants to disallow <font> even though it's not a script injection or privacy risk, I can expose a flag in nsTreeSanitizer that Thunderbird can use to disallow <font>. > > mozSanitizingSerializer doesn't contain anything that would make it special in a way > > that'd require m-c to continue to contain that particular piece of legacy code. > > That is factually untrue. What I mean is this: mozSanitizingSerializer does not appear to embody hard-to-replicate subtleties or a large number of unique features. Therefore, replicating its salient features in nsTreeSanitizer is a reasonable option for providing the functionality comm-central needs instead of continuing to maintain another sanitizer in m-c. (In reply to Ben Bucksch (:BenB) from comment #16) > > > Firefox doesn't allow plug-ins or JS in RSS. > > That's not the point. > > (This wasn't about plugins specifically, either. I also mentioned colors and > tables. I fear you're not making a honest discussion here.) > > This bug is not about getting rid of the component, but adapting it to the > new API. Somebody changed the API, they also have to adapt the code that > uses the API. That's normal. After investigating the matter further than I did before I filed this bug, I have concluded that it makes sense to remove mozSanitizingSerializer and to add just enough functionality to nsTreeSanitizer to make it an acceptable replacement for mailnews purposes.

Comment 18

•

14 years ago

The ability to configure the whitelist is important for large installations to use it this feature, because they have users who will complain that they need this and that and with a compiled list, it's a matter of facing constant complaints of users or being insecure. I had this very concrete case where a large government installation wanted to enable it, but only after modifying the list of tags to allow more presentation. I fundamentally disagree with your stance that all prefs that have no UI can be removed. Free software is about choice and contributing. You're not at Oracle. If I create something, I have the reasonable expectation that there's nobody else coming later and goes and destroys it. > I have concluded OK, you choose the conflict. Sorry, but that is not your decision, this is *my* component. If you change APIs, you fix the fallout. Removing components is not the right way. This component is critical for ensuring the security of email, for people and large installations who need it. It has stood 10 years without a single exploit that I know of, apart from image decoders, while Gecko has exploits constantly. You are just not in a position to second-guess the design of a component that has faired so good.

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 19

•

14 years ago

Again, if you want to make nsTreeSanitizer do *everything* that mozSanitizingSerializer can, go ahead. I don't want to stand in the way of progress, but I do oppose steps back.

Comment 20

•

14 years ago

(In reply to Ben Bucksch (:BenB) from comment #18) > > I have concluded > > OK, you choose the conflict. Oh, come on. You've been at least as aggressive in this bug.

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 21

•

14 years ago

(In reply to Ben Bucksch (:BenB) from comment #18) > The ability to configure the whitelist is important for large installations > to use it this feature, because they have users who will complain that they > need this and that and with a compiled list, it's a matter of facing > constant complaints of users or being insecure. I had this very concrete > case where a large government installation wanted to enable it, but only > after modifying the list of tags to allow more presentation. Was this about something more than <font>? Could this "want more presentation" / "want less presentation" be captured in a boolean-ish pref instead of having user micromanage the whitelist? > I fundamentally disagree with your stance that all prefs that have no UI can > be removed. That's not my stance. Whether it makes sense to maintain UIless prefs depends on the impact of the UIless code paths. > If I create something, I have the reasonable expectation that there's nobody > else coming later and goes and destroys it. ... > Sorry, but that is not your decision, this is *my* component. If you change > APIs, you fix the fallout. Removing components is not the right way. I can see that perspective. Another perspective is that this is code in Gecko / mozilla-central that * isn't used by Firefox * isn't used by non-binary parts of AMO-hosted Firefox extensions as indexed on MXR * duplicates functionality of code that is used by Firefox * is expected to be maintained by Gecko developers * starts with a comment that says "I am not proud about the implementation here at all. Feel free to fix it :-)." and continues with similar comments. * is used in Thunderbird but could be replaced in Thunderbird by code that is used in Firefox (plus minor tweaks) without regressing the functionality that Thunderbird exposes in its UI I don't think it's a reasonable expectation that code landed in Gecko in the Netscape era will stay in Gecko and be maintained by Gecko developers in m-c when the code is unused by Firefox or the bulk of Firefox extensions. I'm trying to go for code reuse between Firefox and Thunderbird by using Firefox's sanitizer in Thunderbird with tweaks to avoid regressing functionality that Thunderbird exposes in its UI. If Thunderbird module owners are of the opinion that the functionality has to be delivered by the lines of code that are now in mozSanitizingSerializer and/or that arbitrary whitelist configurability is needed, maybe we should consider moving mozSanitizingSerializer over to comm-central, exposing an nsIDocumentEncoder contract id that picks up mozSanitizingSerializer as an nsIContentSerializer and hope that the bits copied and pasted from nsPlainTextSerializer still work. > This component is critical for ensuring the security of email, for people > and large installations who need it. It has stood 10 years without a single > exploit that I know of, apart from image decoders, while Gecko has exploits > constantly. You are just not in a position to second-guess the design of a > component that has faired so good. AFAICT, allowing users to configure the whitelist is a security risk--not a design that contributes positively to the security track record of the feature. Some of the security track record of the component is based on the merits of what it does. Other parts of the component are at best incidental to the security track record. (The code bears comments like "I don't really know, what these functions do".) I don't think it's reasonable position that the functionality has to be delivered by these exact lines of code.

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Comment 22

•

14 years ago

(In reply to Ben Bucksch (:BenB) from comment #18) > If I create something, I have the reasonable expectation that there's nobody > else coming later and goes and destroys it. You're certainly free to maintain it in your own repo. In any other interpretation, i can't really think you seriously mean that code that go in should never be allowed to change/get rewritten.

Assignee

Updated

•

14 years ago

Assignee: nobody → hsivonen

Status: NEW → ASSIGNED

Summary: Change mozSanitizingSerializer to not use old HTML parser → Use nsTreeSanitizer instead of mozSanitizingSerializer

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Comment 23

•

14 years ago

Henri, will you port over the features? If not, please leave the bug was it was.

Assignee

Comment 24

•

14 years ago

(In reply to Ben Bucksch (:BenB) from comment #23) > Henri, will you port over the features? I'm planning to * Add a capability for blocking external references in embedded content. * Add an option for removing or allowing presentational HTML and expose this as an about:config pref in Thunderbird. * Add a capability for dropping HTML forms. Unless Thunderbird module owners indicate that arbitrary whitelist configurability is a Thunderbird requirement, I'm not planning to port over controlling the whitelist with the current sort of pref. But see above for the presentational HTML removal/retention use case. I'm not planning to port over the attribute value paranoia for all attributes, because upon re-reading the code I noticed that the code doesn't actually defend against deliberate attacks. OTOH, there's no clear need for correctly-implemented paranoia of that nature.

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Updated

•

14 years ago

Blocks: 729050

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 25

•

14 years ago

Attached patch Part 1: Introduce the new API in Gecko (obsolete) — Details — Splinter Review

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Comment 26

•

14 years ago

> * Add an option for removing or allowing presentational HTML > and expose this as an about:config pref in Thunderbird. Sorry, but as stated often before, that's not good enough. The level of stripping in terms of tags is a policy thing and highly subjective. I mentioned several examples above. It should be configurable. It's not too hard to do either, as the current code can do it.

Assignee

Comment 27

•

14 years ago

Attached patch Part 1: Introduce the new API, v2 (obsolete) — Details — Splinter Review

Attachment #599174 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Comment 28

•

14 years ago

I could be wrong, but I think Florian had some issues with sanitizing html for IRC, and might be interested in this bug.

Assignee

Comment 29

•

14 years ago

Attached patch Part 2 WIP: Use the API from mailnews (obsolete) — Details — Splinter Review

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 30

•

14 years ago

Attached patch Part 1: Introduce the new API, v3 (obsolete) — Details — Splinter Review

Attachment #599939 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 31

•

14 years ago

Attached patch Part 2: Use the new API from mailnews — Details — Splinter Review

What would be a better place for the pref migration code?

Attachment #600379 - Attachment is obsolete: true

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 32

•

14 years ago

Attached patch Part 3: Remove mozSanitizingSerializer from mozilla-central — Details — Splinter Review

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 33

•

14 years ago

Attached patch Part 1: Introduce the new API, v4 — Details — Splinter Review

Attachment #600882 - Attachment is obsolete: true

neil@parkwaycc.co.uk

Comment 34

•

14 years ago

Comment on attachment 600884 [details] [diff] [review] Part 2: Use the new API from mailnews These prefs don't have UI so I would be tempted to relnote them instead. Is there a good reason not to have a single new flags preference?

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 35

•

14 years ago

(In reply to neil@parkwaycc.co.uk from comment #34) > Comment on attachment 600884 [details] [diff] [review] > Part 2: Use the new API from mailnews > > These prefs don't have UI so I would be tempted to relnote them instead. Doing the migration isn't that hard, so it seems polite to do it. If not doing the migration gets an r+ more easily, works for me. ;-) > Is there a good reason not to have a single new flags preference? I didn't really consider exposing the flags integer as a pref. Two booleans seem more self-documenting than a single int pref that takes mystery values.

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 36

•

14 years ago

Comment on attachment 600896 [details] [diff] [review] Part 1: Introduce the new API, v4 Dropping forms, dropping non-CSS presentation and limiting src to cid: only come from the behavior of mozSanitizingSerializer. The use case from dropping media comes from the tb-enterprise mailing list; University of Oslo reportedly configures Thunderbird to drop images.

Attachment #600896 - Flags: review?(bugs)

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 37

•

14 years ago

Comment on attachment 600886 [details] [diff] [review] Part 3: Remove mozSanitizingSerializer from mozilla-central Requesting removal review from jst, since he super-reviewed the original addition.

Attachment #600886 - Flags: review?(jst)

Comment 38

•

14 years ago

Summary comment 15: I would be OK with using nsTreeSanitizer if and only if it can do everything that mozSanitizingSerializer can. That includes mostly 2 general points: * complete paradoia in what's allowed, *not* trusting Gecko protections even one bit. That includes any and all scripts and anything that might come in the future. * configurability of allowed tags and attributes, and different config per caller For first point, paranoia: The idea is to have 2 enitrely different levels of defense: One on document source level (sanitizer), one on logic level (Gecko). Even if there is a hole in the sanitizer or in Gecko, hopefully there wouldn't be the same hole in both at the same time. One exception are image decoders. comment 11: The real requirement is "no security hole in 20 years". ... Gecko has about 10 holes per month. In some contexts like email in higher security areas, this is extremely important. For second point, configurability on tag level: comment 11: For example, you probably want to be very restrictive in emails (spam, viruses, targetted attacks etc., you can't control what is coming in), but maybe you want to allow more in RSS (you subscribed yourself). comment 18: The ability to configure the whitelist is important for large installations to use it this feature, because they have users who will complain that they need this and that and with a compiled list, it's a matter of facing constant complaints of users or being insecure. I had this very concrete case where a large government installation wanted to enable it, but only after modifying the list of tags to allow more presentation. --- Henri, your changes may or may not provide the first point, I would need to review it. (not smaug, not jst, but me. I don't know why you're not asking other people and not the original author for review. You can't pick and choose reviewers depending on the result you want. I said repeatedly that I am not opposed to the general idea of removing my own component.) Your changes definitely fail the second point. I consider them important for this feature to be used in practive.

Updated

•

14 years ago

Attachment #600886 - Flags: review?(ben.bucksch)

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Updated

•

14 years ago

Attachment #600886 - Flags: review?(ben.bucksch)

Assignee

Comment 39

•

14 years ago

(In reply to Ben Bucksch (:BenB) from comment #38) > I would be OK with using nsTreeSanitizer if and only if it can do everything > that mozSanitizingSerializer can. I understand and disagree. I think the merits of the replacements should be evaluated by seeing how well it addresses real-world use cases--not how well it replicates behaviors that might theoretically be used. > That includes mostly 2 general points: > * complete paradoia in what's allowed, *not* trusting Gecko protections even > one bit. The new parser outputs a DOM, so when the new parser is used for sanitizing content, you pretty much have to rely on Gecko features that make the DOM inert (specifically, having its docshell be null, having it principal set to a null principal, not giving it a script global object and having its "loaded as data" bit set to true). (There's no way I'm adding new non-DOM output to the new parser just to enable sanitization without trusting the Gecko DOM.) > That includes any and all scripts and anything that might come in the > future. > * configurability of allowed tags and attributes, and different config per > caller You are shifting the goalposts from what Thunderbird does currently: It doesn't have a different config per caller. It uses the same config for both the mimemoz2 caller and the message composition caller. In particular, there does not seem to exist a separate RSS caller. > For second point, configurability on tag level: > comment 11: > For example, you probably want to be very restrictive in emails (spam, > viruses, targetted attacks etc., you can't control what is coming in), but > maybe you want to allow more in RSS (you subscribed yourself). This is shifting the goalposts from what the current code does, because the RSS whitelist isn't actually used anywhere. > comment 18: > The ability to configure the whitelist is important for large installations > to use it this feature, because they have users who will complain that they > need this and that and with a compiled list, it's a matter of facing > constant complaints of users or being insecure. I have asked for use cases here and on the tb-enterprise list. I have added support for both actual use cases stated. I think it would be inappropriate to support abstract architectures in mozilla-central when simpler solutions address actual use cases. > Henri, your changes may or may not provide the first point, I would need to > review it. Please do take a look to see what the code provides. > (not smaug, not jst, but me. I don't know why you're not asking > other people and not the original author for review. You can't pick and > choose reviewers depending on the result you want. nsTreeSanitizer lives in content/base/src/, so additions to it require review by a "Document Object Model" module owner or peer. smaug is a module peer. I requested review from him, because he has recently reviewed patches related to the DOMParser work that this patch leverages and patches to parser/html/ that this patch also (lightly) adds stuff to. Alternatively, requesting review from bz would have been reasonable, since bz has previously reviewed nsTreeSanitizer code. mozSanitizingSerializer also lives in content/base/src/, so removing it requires review from a "Document Object Model" owner or peer. That the removal has been performed technically correctly is trivial to review, so this is more of a policy review. For policy review, it seems prudent to request review from an owner rather than a peer. This leaves jst and peterv. It makes sense to request review from jst, because http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/content/base/src/mozSanitizingSerializer.cpp&rev=1.1&root=/cvsroot says jst sr'ed the addition. As for why I'm not requesting you as the original author of mozSanitizingSerializer to review even though it often makes sense to request review from non-module owner code author: If I was making changes to code in mozSanitizingSerializer, it would make sense to ask you to review the changes since you are familiar with the code. However, removing the whole file doesn't involve familiarity on a code line level. Removing code that Firefox *does* *not* *use* (and whose functionality is duplicated by code the Firefox *does* use) from mozilla-central is a policy issue and policy matters need review from the relevant module owner (or perhaps peer). It's already pretty clear from the comments on this bug that you and I disagree on the policy level. Neither of us is a "Document Object Model" owner or peer, so neither of us gets to make the policy decision under the Mozilla module ownership system / review rules. When stuff in m-c that c-c depends on goes away, what the c-c callers do instead is a call for the relevant owner/peer of the c-c code. For part 2, I intend to request review from bienvenu, but I haven't done so yet, because my attempts to use Thunderbird Try have been unsuccessful and I was planning on deferring the review request until I've seen a green try run and it will be easier to get a green Thunderbird Try run when more of the m-c dependencies are in m-c instead of loose patches to be applied by the tryserver. > I said repeatedly that I am not opposed to the general idea of removing my own > component.) I wanted to avoid refuting the "my component" aspect, since I thought that doing so would be impolite, but since the notion keeps coming up: mozSanitizingSerializer isn't a module per https://wiki.mozilla.org/Modules/All . Therefore, it's not owned in the module ownership sense except as part of the larger "Document Object Model" module. You aren't an owner or peer of the relevant module under the module owner system (neither am I). mozSanitizingSerializer has had 43 CVS revisions and then 17 additional hg revisions. 60 revisions in total when including the initial landing and excluding the cvs to hg import. Of these, you have authored 3: The initial landing in 2002, a build bustage fix immediately afterwards and a fix for bug 243040 in 2004. The other 57 revisions have been people other than you bearing the burden of maintaining this code when the world around it has changed. This is massively collaborative project. I think it's unreasonable that if someone once manages to land some code and then fixes one bug in the code two years later, he gets to rule over what happens to the code a decade later, when other people have been bearing the maintenance burden and *the code isn't even using by other mozilla-central code*. (There are cases where someone is in charge of a piece of Gecko, but in such cases, the person is expected to do the work keeping the piece of Gecko afloat instead of just telling other people to keep it just the way it was and the piece is expected to be used by the rest of m-c.) I've been trying to be really nice about this. As far as I can tell, m-c tree rules allow changes that burn c-c. However, I'm trying to be nice and not break c-c. In principle, I could just try to leave my c-c changes at just supporting the features that Thunderbird exposes in its UI. Instead, I've been so nice about this that I've reached out to tb-enterprise and sought to address even hidden pref uses cases. But seeing how the maintenance burden of mozSanitizingSerializer is borne by other people than you and knowing that nsTreeSanitizer maintenance burnen will be borne by me regardless of what happens here, I don't think it makes sense to be so nice as to do everything you say on the authority of you landing the initial revision and put myself on the hook being yet another person bearing the mozSanitizingSerializer maintenance burden going forward. I suggest we proceed as follows: I land the patches attached here (m-c parts subject to DOM module owner/peer review; c-c part subject to MIME Parser and Mail and News Backend owner/peer review). This allows me to meet the goals for this quarter in a way that doesn't burn c-c, doesn't regress Thunderbird's default features or hidden features to the extent they are motivated by identified real-world use cases and in a way that doesn't depend on your schedule and allows other m-c work to proceed. Then, if you still feel the result is inadequate, I suggest you take the time to plug alternative sanitization code (not to be landed in m-c) in the point in mimemoz2.cpp where I placed a comment that you could add alternative code there (subject to relevant module owner review, of course). It might seem unfair that I'm shifting some work to you, but it's not reasonable that hidden features you want to keep around get a free ride forever (in the sense of no additional work required from you) just because you did the initial work in 2002.

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Updated

•

14 years ago

Blocks: 731162

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Updated

•

14 years ago

Attachment #600884 - Flags: review?(dbienvenu)

Johnny Stenback (:jst)

Comment 40

•

14 years ago

Comment on attachment 600886 [details] [diff] [review] Part 3: Remove mozSanitizingSerializer from mozilla-central r=jst for removing this. If this is deemed necessary from a Thunderbird point of view this code should be moved there (and adapted to the new parser), this serves no purpose in mozilla-central.

Attachment #600886 - Flags: review?(jst) → review+

Comment 41

•

14 years ago

Comment on attachment 600884 [details] [diff] [review] Part 2: Use the new API from mailnews this fails to apply in mimemoz2.cpp - I'll see what's up.

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Comment 42

•

14 years ago

mozilla fails to build, haven't investigated yet. nsTreeSanitizer.cpp c:/builds/tbirdhq/objdir-tb/mozilla/content/base/src/../../../../../mozilla/cont ent/base/src/nsTreeSanitizer.cpp(1040) : error C2039: 'keygen' : is not a member of 'nsGkAtoms' c:\builds\tbirdhq\objdir-tb\mozilla\dist\include\nsGkAtoms.h(49) : see d eclaration of 'nsGkAtoms' c:/builds/tbirdhq/objdir-tb/mozilla/content/base/src/../../../../../mozilla/cont ent/base/src/nsTreeSanitizer.cpp(1040) : error C2065: 'keygen' : undeclared iden tifier

Assignee

Comment 43

•

14 years ago

Oops. Sorry about the blank dependency field. This requires bug 650787 and the remaining parts of bug 650784 to be applied first.

Depends on: 650787

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Comment 44

•

14 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #43) > Oops. Sorry about the blank dependency field. This requires bug 650787 and > the remaining parts of bug 650784 to be applied first. ugh, I can't get any of the patches in bug 650784 to apply - have they bit-rotted, or can you tell me which order they need to be applied in? thx!

Assignee

Comment 45

•

14 years ago

(In reply to David :Bienvenu from comment #44) > ugh, I can't get any of the patches in bug 650784 to apply - have they > bit-rotted, or can you tell me which order they need to be applied in? thx! Hmm. Attachment 591720 [details] [diff] had rotted. Sorry. Unrotted as attachment 603648 [details] [diff] [review]. Patches for m-c: Attachment 603648 [details] [diff] Attachment 599068 [details] [diff] Attachment 600896 [details] [diff] Patches for c-c: Attachment 603650 [details] [diff] Attachment 600884 [details] [diff]

Comment 46

•

14 years ago

great, thx, I've got all the patches applied now. I'll build and test today.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 47

•

14 years ago

Comment on attachment 600884 [details] [diff] [review] Part 2: Use the new API from mailnews one nit - we don't tend to use the braces if they're not needed: + if (dropPresentational) { + flags |= nsIParserUtils::SanitizerDropNonCSSPresentation; + } + if (dropMedia) { + flags |= nsIParserUtils::SanitizerDropMedia; this all seems to work for mail. I couldn't get rss feeds to display plain text, but I couldn't get that to happen w/o these patches either.

Attachment #600884 - Flags: review?(dbienvenu) → review+

Comment 48

•

14 years ago

Comment on attachment 600896 [details] [diff] [review] Part 1: Introduce the new API, v4 >+ if (mCidEmbedsOnly && >+ NS_SUCCEEDED(rv) && >+ kNameSpaceID_None == aNamespace) { >+ if (nsGkAtoms::src == aLocalName || nsGkAtoms::background == aLocalName) { >+ nsCAutoString scheme; >+ attrURI->GetScheme(scheme); >+ if (!scheme.EqualsLiteral("cid")) { Why not attrURI->SchemeIs >-nsParserUtils::Unescape(const nsAString & aFromStr, >- nsAString & aToStr) >+nsParserUtils::Unescape(const nsAString& aFromStr, >+ nsAString& aToStr) Could you align the parameters >+nsParserUtils::Sanitize(const nsAString& aFromStr, >+ PRUint32 aFlags, >+ nsAString& aToStr) ditto

Attachment #600896 - Flags: review?(bugs) → review+

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 49

•

14 years ago

(In reply to Olli Pettay [:smaug] from comment #48) > Why not attrURI->SchemeIs Oops. I forgot we had that. Fixed. > Could you align the parameters > ditto Aligned. Thanks. https://hg.mozilla.org/integration/mozilla-inbound/rev/91750caaa3fe I'll land part 2 on c-c (with review comments addressed) and part 3 on m-c after part 1 has made it from inbound to central.

Whiteboard: [please leave open after inbound merge; see comment 49]

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Updated

•

14 years ago

Blocks: 737013

Matt Brubeck (:mbrubeck)

Comment 50

•

14 years ago

https://hg.mozilla.org/mozilla-central/rev/91750caaa3fe

Serge Gautherie (:sgautherie)

Updated

•

14 years ago

Depends on: 737314

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 51

•

14 years ago

Landing on c-c is blocked by bug 737314.

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 52

•

14 years ago

(In reply to David :Bienvenu from comment #47) > one nit - we don't tend to use the braces if they're not needed: Dropped. > this all seems to work for mail. I couldn't get rss feeds to display plain > text, but I couldn't get that to happen w/o these patches either. It seems to me that RSS hasn't really been using mozSanitizingSerializer in a while. Thank you for the r+ Landed: http://hg.mozilla.org/comm-central/rev/db3613baa9c0 (In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #40) > Comment on attachment 600886 [details] [diff] [review] > Part 3: Remove mozSanitizingSerializer from mozilla-central > > r=jst for removing this. Thank you. Landed: https://hg.mozilla.org/integration/mozilla-inbound/rev/390cc6053800

Whiteboard: [please leave open after inbound merge; see comment 49]

Comment 53

•

14 years ago

> pref("mailnews.display.html_sanitizer.drop_media", false); // whether to drop <img>, <video> and <audio> Sorry, but that's wrong. We most definitely don't want <video> in emails when the sanitizer is on.

:Ms2ger (he/him; ⌚ UTC+1/+2)

Updated

•

14 years ago

Attachment #600884 - Flags: review-

Comment 54

•

14 years ago

Comment on attachment 600884 [details] [diff] [review] Part 2: Use the new API from mailnews This patch has r+ from someone who actually has the authority to review it.

Attachment #600884 - Flags: review-

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 55

•

14 years ago

Comment on attachment 600884 [details] [diff] [review] Part 2: Use the new API from mailnews ms2ger, please leave my review markers (and those of other people, e.g. at bug 714408) alone. I didn't remove bienvenu's review+, I just added my own comment. This is code that I myself wrote and I have every right in the world to make review comments about it.

Attachment #600884 - Flags: review-

Comment 56

•

14 years ago

Comment on attachment 600884 [details] [diff] [review] Part 2: Use the new API from mailnews You have every right to make comments. You don't have the right to act like you own this code, and hence no right to set the review flag.

Attachment #600884 - Flags: review-

Henri Sivonen (:hsivonen) (not reading bugmail until 2026-07-27)

Assignee

Comment 57

•

14 years ago

dev-doc-needed for the new stuff in nsIParserUtils. addon-compat for potential binary add-ons only. mozISanitizingSerializer wasn't available to JS. (In reply to Ben Bucksch (:BenB) from comment #53) > > pref("mailnews.display.html_sanitizer.drop_media", false); // whether to drop <img>, <video> and <audio> > > Sorry, but that's wrong. We most definitely don't want <video> in emails > when the sanitizer is on. Let's handle follow-up issues in follow-up bugs. (<video> is subject to the same cid:-only policy as <img>, FWIW.)

Keywords: addon-compat, dev-doc-needed

https://hg.mozilla.org/mozilla-central/rev/390cc6053800

Comment 58

•

14 years ago

> Let's handle follow-up issues in follow-up bugs. OK > (<video> is subject to the same cid:-only policy as <img>, FWIW.) That's good and takes care of privacy issues. The security issues (attacks) remain, though.

alta88

Comment 59

•

14 years ago

some historical context - bug 438429 attempted to separate the (potentially) more sanitized presentation for mail, from that for feed summaries. so in anticipation of bug 458606, UI was added for feed messages (View->Feed Message Body As), but the change to actually read the pref was never implemented, and feeds continued using the mail pref despite the misleading UI. so the question remains, should there be separate handling for feed summaries?

Matt Brubeck (:mbrubeck)

Comment 60

•

14 years ago

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Target Milestone: --- → Thunderbird 14.0

Comment 61

•

14 years ago

(In reply to alta88 from comment #59) > some historical context - > bug 438429 attempted to separate the (potentially) more sanitized > presentation for mail, from that for feed summaries. so in anticipation of > bug 458606, UI was added for feed messages (View->Feed Message Body As), but > the change to actually read the pref was never implemented, and feeds > continued using the mail pref despite the misleading UI. > > so the question remains, should there be separate handling for feed > summaries? What do you think? It's true that rss feeds are perhaps a bit more trusted than random e-mails, but they could still be hacked. I.e., would anyone set these two prefs differently? Though, given the UI, the path of least resistance is probably to make the backend use the feed pref.

Comment 62

•

14 years ago

bienvenu, there are 2 fundamental differences between email and RSS: * selection: I subscribe RSS myself, and I'm usually quite selective, even more selective than the webpages I go to. With email OTOH, I have absolutely no control over what eends up in my mailbox. Spam, phishing are commonplace, and anybody could send me a targetted attack (which does happen). * media use: Email is very limited, partially by size limits and transmission times. Videos in email probably won't happen anytime soon because of that, and external files are a problem for privacy. Plugins are disabled for security reasons (see first point). RSS, OTOH, is webbish and commonly has big images or even video. Users will want to see that, with pleasure.

alta88

Comment 63

•

14 years ago

(In reply to David :Bienvenu from comment #61) > (In reply to alta88 from comment #59) > > some historical context - > > bug 438429 attempted to separate the (potentially) more sanitized > > presentation for mail, from that for feed summaries. so in anticipation of > > bug 458606, UI was added for feed messages (View->Feed Message Body As), but > > the change to actually read the pref was never implemented, and feeds > > continued using the mail pref despite the misleading UI. > > > > so the question remains, should there be separate handling for feed > > summaries? > > What do you think? It's true that rss feeds are perhaps a bit more trusted > than random e-mails, but they could still be hacked. I.e., would anyone set > these two prefs differently? Though, given the UI, the path of least > resistance is probably to make the backend use the feed pref. there is an issue if people commonly set mail message view prefs to be non html and use feed summaries expecting richer content. perhaps there's some data on that. independent of images/fonts/media tags etc which those prefs control, most embedded media also requires js to actually play. i believe there's something in the code to exempt feeds, but it doesn't seem to work for js in summaries. in any event, the UI is misleading, so should either be removed or supported on the backend. i was leaning toward removing it, mostly because i don't think (the majority of) people change the mail pref to non html. the valid reasons for excluding tags due to theoretical <script> and <image src=''> type exploits from 10 yrs ago may no longer be as valid, content scripts can't access the DOM etc etc. but also because since feed summary presentation is pretty important to me (lots of graphs and videos) i made TotalMessage to allow images/js on a per message basis on demand, so it works exactly how i like ;)