Closed Bug 750436 Opened 12 years ago Closed 11 years ago

SecReview: [HTML5] Re-implement HTML sanitizer using the HTML5 parser

Categories

(mozilla.org :: Security Assurance: Review Request, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: curtisk, Assigned: freddy)

References

Details

(Whiteboard: [completed secreview][fuzzing][score:0::Low][Fx])

Attachments

(1 file, 2 obsolete files)

SecReview tracking bug
Actions regarding the review of the dependent bug should be tracked here.
Risk/Priority Ranking Exercise https://wiki.mozilla.org/Security/RiskRatings

Priority: N/A

Operational: 0 - N/A
User: 0 - N/A
Privacy: 0 - N/A
Engineering: 2 - Normal
Reputational: 0 - N/A

Priority Score: 0
Whiteboard: [pending secreview][fuzzing] → [pending secreview][fuzzing][score:0::Low]
I plan to fuzz the HTML sanitizer for memory safety bugs, but it's also a security surface that needs manual review.
Assignee: jruderman → nobody
jesse, can you fuzz it to see which tags and JS code it lets through? It shouldn't let any JS code whatsoever through.
Flags: needinfo?(jruderman)
My experience so far is that fuzzing is not effective at finding policy bugs, because it's hard to describe the policy in a way that (1) the fuzzer can check and (2) makes sense for the kinds of invalid input that the fuzzer generates.

I can check for <*:script> tags, but there are so many other forms of scripts (javascript: URLs, data: URLs in some cases, on* attributes) and plugins...
Flags: needinfo?(jruderman)
I suggest this also get's a manual review from somebody websec :)
Jesse, are there a bunch of HTML test files for various means to run Javascript, preferably in a mean / non-obvious / attacking way? We could run all these and see whether the scripts run.
dan can you weigh in on who should do the manual review, would this be good for freddy?
Flags: needinfo?(dveditz)
Sure, he seems to be interested.
Assignee: nobody → fbraun
Flags: needinfo?(dveditz)
Because Freddy asked:
This is used e.g. by Thunderbird as security feature. e.g. many enterprise installations run old versions like thunderbird 3.1, and we don't want enterprises being commercially spied on by simply sending HTML emails that exploit known Gecko bugs. Similarly, I want to be protected against yet unknown Gecko bugs.

So, for the purpose of security, we should run with this assumption (note: assumption, not intended usecase, although some Firefox devs have done similar things with similar implementations, against my advise):
You get an HTML document from an attacker (say, an HTML email from a corporate spy), and you run it through the sanitizer, and then you insert it into chrome (!) HTML as-is. If there is any way for the attacker to do anything outside the document frame, this is a security bug.
Comment 4 wrote:
> I can check for <*:script> tags, but there are so many other forms of scripts
> (javascript: URLs, data: URLs in some cases, on* attributes) and plugins...

Indeed. The core difficulty in writing - and security-checking - a sanitizer is to *think* of all these forms and ways, and not to miss any.
I want to poke at the HTML sanitizer, but I am wondering how to actually use it. I suppose I would need to build a testbed that allows me to supply some input files and then returns some (hopefully) sanitized output files. Well and I know very little to no C++ :)
Any suggestions?
(In reply to Frederik Braun [:freddyb] from comment #11)
> I want to poke at the HTML sanitizer, but I am wondering how to actually use
> it. I suppose I would need to build a testbed that allows me to supply some
> input files and then returns some (hopefully) sanitized output files. Well
> and I know very little to no C++ :)
> Any suggestions?

Wayne, do you have some idea on who can help Freddy out here?
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #12)
> Wayne, do you have some idea on who can help Freddy out here?
Joshua is probably closest to this code at the moment.
Flags: needinfo?(Pidgeot18)
Going from how mime calls the HTML sanitizer, it looks like a simple xpcshell script can be written to load a file, call nsIParserUtils::sanitize, and write the result to a file.
Flags: needinfo?(Pidgeot18)
Whiteboard: [pending secreview][fuzzing][score:0::Low] → [pending secreview][fuzzing][score:0::Low][Fx]
Depends on: 892473
No longer depends on: 892473
I have set up a testbed to perform xpcshell tests against the nsIParserUtils.sanitize() function.

My current setup:
I have taken XSS vectors from html5sec.org (namely the payloads and items as json) and ran them across the existing sanitizer and gathered results (attack vector and sanitized version) in a JSON file as output.

My next step would be applying the new sanitizer patches against my comm-central and comparing sanitized results with those that I gathered from the old parser for each attack vector.
Applying the patches fails. Can you help me set this up *or* look into why applying your patches against comm-central fails, Henri? I get quite a few failed hunks..
Flags: needinfo?(hsivonen)
Attached file xpcshell test to work with sanitizer (obsolete) —
Attaching current xpcshell test environment (the test doesn't return true or false, I just used it to generate the sanitizer output data).
Oh wait...the patch has already been commited.
I'll try assembling a few other test vectors and then make the outcome of this secreview a real xpcshell test that can be commited as well...
Flags: needinfo?(hsivonen)
I want to make this a self-contained xpcshell test and wonder what the best practice is for carrying data along. Can I put arbitrary json files along the test in the same directory? I'd like to avoid carrying a huge chunk of data within the test file itself (huge meaning about 3KB of JSON, e.g. http://www.pastebin.mozilla.org/2644563)
Attached patch test case for html sanitizer (obsolete) — Splinter Review
This is a working patch that adds a test for the sanitizer component. Feedback highly appreciated :)
Attachment #775642 - Attachment is obsolete: true
Attachment #779159 - Flags: review?
Attachment #779159 - Flags: review? → review?(hsivonen)
Comment on attachment 779159 [details] [diff] [review]
test case for html sanitizer

r+ as far as the technical stuff goes. I don't know how the html5sec.org vectors are licensed. Please make sure that this patch is OK license-wise.
Attachment #779159 - Flags: review?(hsivonen) → review+
The content is CC 3.0 BY licensed. I think we're fine if I add a line that points to the repo (https://code.google.com/p/html5security/). Should we ask a legal person for that?
Gerv, do we allow CC-by 3.0 for third-party-originating test data in m-c? We'd need a licensing legend somewhere in any case, right?
Flags: needinfo?(gerv)
For code not shipped in our products, anything open source will do, and I'd say that includes CC-BY.

If it doesn't ship with Firefox, no need for a licensing legend anywhere except the directory in question.

Please add a LICENSE file to the directory with a copy of CC-BY 3.0.

Gerv
Flags: needinfo?(gerv)
Could we reformat the JSON so that it's readable, please?
Carrying over Henri's r+.
I have pretty-printed the JSON as Ben suggested and added a file containing the CC-BY 3.0 license + proper attribution to the html5 security project.
Attachment #779159 - Attachment is obsolete: true
Attachment #789463 - Flags: review+
Keywords: checkin-needed
Whiteboard: [pending secreview][fuzzing][score:0::Low][Fx] → [completed secreview][fuzzing][score:0::Low][Fx]
https://hg.mozilla.org/integration/mozilla-inbound/rev/5f7da9fe6cf5

In the future, please make sure your patches have a commit message suitable for landing before requesting checkin.
Keywords: checkin-needed
And because we like it when things actually build after pushing a patch...
https://hg.mozilla.org/integration/mozilla-inbound/rev/4dceda951fba

Which also leaves me feeling really warm and fuzzy about how thoroughly this was tested prior to requesting checkin.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: