Open Bug 652818 Opened 13 years ago Updated 2 years ago

Science says we should do a security audit of nsContentUtils.cpp

Categories

(Core :: DOM: Core & HTML, defect, P5)

Other Branch
defect

Tracking

()

People

(Reporter: jorendorff, Unassigned)

References

Details

(Keywords: sec-audit, Whiteboard: [sg:audit])

Christian Holler was part of a team that used science to estimate the exploitability of various components of Mozilla.
  http://www.st.cs.uni-saarland.de/softevo/vulnerabilities.php

In brief the researchers used the dependency graph of the source tree (what calls what) and historical data about which files have contained security bugs (MFSAs -> bug numbers -> hg log) as input to mathematical wizardry. It turned out to be fairly accurate. This was in 2007.

I asked him to re-run the analysis. The results are here:
  http://users.own-hero.net/~decoder/mozilla-svm.html
It fingers content/base/src/nsContentUtils.cpp, and the score it assigns is disturbingly high.

It is debatable whether this estimate means anything; due to the kind of math used, it's very hard to know what's behind this, just what pattern the math is seeing in the data, etc. I am pretty skeptical myself. We could discuss these imponderables ad nauseum in this bug. I'd rather try something and see what happens. I am no expert in this code but my plan is to stare at it for 8 hours next Wednesday (4 May) (assuming nobody takes it first).

(This bug is closed for now. I don't know how security-sensitive the data at that URL is.)
Whiteboard: [sg:audit]
I compared the prediction with the most recent MFSAs (MFSA 2011-12 to 2011-18, released in April) and got two hits (same bug id):

#7 layout/base/nsLayoutUtils MSFA2011-12 619021
#9 layout/generic/nsFrame MSFA2011-12 619021

When I get some time I plan to investigate in how far a clustering algorithm could help to improve the reasoning of the predictions (i.e. hint on why this is component is considered problematic).
I forgot about this. Rescheduling for *next* Wednesday, 18 May.
Christian ran a more fine-grained version of the same analysis (2 weeks ago) and was able to offer this:

    Lines 1 to 999:       0.04126131
    Lines 1000 to 1999:   8.341698
    Lines 2000 to 2999:   1.187755
    Lines 3000 to 3999:   0.5436279
    Lines 4000 to 4999:   1.920721
    Lines 5000 to 5999:   4.188863
    Lines 6000 to 6522:   0.1392063 

So I will start by reading lines 1000-1999 or thereabouts.

Narrowing it too far, I gather, would eventually make the results meaningless.
Depends on: 658037
nsContentUtils::Shutdown is probably the source of most of the "smell" Christian's math is detecting in lines 1000-1999 of this file. It calls into everything. There's no way I can realistically audit that -- at least, not today.
OK, done with my first 8 hours. I read lines 1000-1999 and found one
bug, bug 658037. Here are some general observations:

There are several XXXbz comments that could be revisited, including
none-too-reassuring stuff like

>    // XXX We don't need to use index if name is there
>    // XXXbz We don't?  Why not?  I don't follow

Throughout this code return values don't get checked. This is probably
safe in most cases because of special knowledge this code has of
implementation details of the callee... or even the caller... but that
kind of tight coupling is inherently brittle, and even if you did buy
it, you wouldn't just throw the return value away. You'd assert.

All this seems kind of crazy, when a bug could so easily lead to
undefined behavior, and it would be so easy to just check the stupid
returns.

Maybe I'm like Joel Spolskly in
http://www.joelonsoftware.com/articles/Wrong.html
and I just don't know what "clean" is supposed to look like in this
code. I beg to be reassured on that point, if so, because to me checking
return values is C++ 101.
I would like bz or mrbkap to look closely at this code in
nsContentUtils::ReparentContentWrappersInScope:

>  // Try really hard to find a context to work on.
>  nsIScriptContext *context = aOldScope->GetContext();
>  if (context) {
>    cx = static_cast<JSContext *>(context->GetNativeContext());
>  }
>
>  if (!cx) {
>    context = aNewScope->GetContext();
>    if (context) {
>      cx = static_cast<JSContext *>(context->GetNativeContext());
>    }
>
>    if (!cx) {
>      sThreadJSContextStack->Peek(&cx);
>
>      if (!cx) {
>        sThreadJSContextStack->GetSafeJSContext(&cx);

There's similar code in a few other places. A few thoughts:

- This is about as sketchy as it gets. Why not just add:

         if (!cx) {
           sIKnowAGuy->HeKnowAGuy()->GetYouSafeJSContextRealCheap(&cx);

- nsHTMLDocument::OpenCommon can only be successfully called from
  script, right?  If so, since that's our only caller,
  sThreadJSContextStack->Peek here must succeed. So the GetSafeJSContext
  part at least should be dead code; I say dike it out.

- For that matter, why are we trying aOldScope and aNewScope first?

- If OpenCommon really can only be called from script, via XPConnect,
  then we should use explicit_jscontext and delete all of this.
The rest are nits.

In nsContentUtils::ParseIntMarginValue:
>  marginStr.CompressWhitespace(PR_TRUE, PR_TRUE);

No bug, but this method looks like it accepts a pretty wide range of
stupid bogus input. (Callers are XUL-only, looks like, so not exposed to
the web.)

> void
> nsContentUtils::GetOfflineAppManifest(nsIDocument *aDocument, nsIURI **aURI)

No bug, just very strange style choices. This is fallible but void. On
failure it just returns, leaving aURI (hopefully?) untouched. Also:

>  nsAutoString manifestSpec;
>  docElement->GetAttr(kNameSpaceID_None, nsGkAtoms::manifest, manifestSpec);
>
>  // Manifest URIs can't have fragment identifiers.
>  if (manifestSpec.IsEmpty() ||

Do COM objects always leave the out parameter alone on failure?  I
thought they generally did but that callers weren't supposed to
assume.

>  nsContentUtils::NewURIWithDocumentCharset(aURI, manifestSpec,
>                                            aDocument,
>                                            aDocument->GetDocBaseURI());
>}

Same thing here. All it would take to make me happy is change this to
return an nsresult and do normal error checking.

Comment on nsContentUtils::CheckSameOrigin:
> * Checks whether two nodes come from the same origin. aTrustedNode is
> * considered 'safe' in that a user can operate on it and that it isn't
> * a js-object that implements nsIDOMNode.
> * Never call this function with the first node provided by script, it
> * must always be known to be a 'real' node!

Working in js/src has convinced me that information like this doesn't belong
in a comment. XPConnect should provide a way to check this and we should
assert. Same thing in several other places.

>  // XXXbz should we actually have a Subsumes() check here instead?  Or perhaps
>  // a separate method for that, with callers using one or the other?
>  if (NS_FAILED(trustedPrincipal->Equals(unTrustedPrincipal, &equal)) ||

No bug. If anything we're too strict here. The comment could stand to be
replaced with one explaining why this is fine, though.

In nsContentUtils::CanCallerAccess:
>  isSystem = NS_FAILED(rv) || isSystem;
>  const char* capability =
>    NS_FAILED(rv) || isSystem ? "UniversalXPConnect" : "UniversalBrowserRead";

The `isSystem = ` line is redundant, since we then do the
`NS_FAILED(rv) ||` bit again anyway.

In nsContentUtils::GetDocumentFromCaller:
>  JSAutoEnterCompartment ac;
>  if (!ac.enter(cx, obj)) {
>    return nsnull;
>  }

I wonder why GetDocumentFromCaller needs one of these but
GetDocumentFromContext doesn't.

In IsCharInSet:
>  PRUnichar ch;
>  while ((ch = *aSet)) {
>    if (aChar == PRUnichar(ch)) {

The cast is redundant; if we want a cast here at all it belongs on the
previous line, where the 8-bit *aSet is being assigned to 16-bit ch.
> including none-too-reassuring stuff like

Yeah, that form control state restoration code is sorta broken by design....  Shouldn't be a security issue, though.

> nsContentUtils::ReparentContentWrappersInScope:

peterv's your man.

> nsHTMLDocument::OpenCommon can only be successfully called from script, right?

Hmm... more or less, yes.  This didn't use to be the case back in the day.

> then we should use explicit_jscontext and delete all of this.

Sold. Want to file?

> Do COM objects always leave the out parameter alone on failure?

No.  However GetAttr happens to not be fallible.

I think the point is that if manifest processing fails, we want to just do nothing, so things are ok-ish...  Or something.  I agree being more explicit here would be nice.

> XPConnect should provide a way to check this

XPConnect should provide a way to _enforce_ this (i.e. passing random crud as nsIDOMNode should throw).  There are bugs on this.

> I wonder why GetDocumentFromCaller needs one of these but
> GetDocumentFromContext doesn't.

I _think_ the latter doesn't use any nontrivial JSAPI, unlike GetDocumentFromCaller... but in general when to enter compartments seems like a deep mystery.  :(
(In reply to comment #8)
> > nsContentUtils::ReparentContentWrappersInScope:
> 
> peterv's your man.

Not really, but I'll bite. That code was added in bug 321299, so about 5 years ago. There might have been more callers and we didn't have anything like explicit_jscontext, so I'm not really sure what the problem is. The code was written to make sure that we always rewrap, even if getting the context of the document failed. Not rewrapping would have been a security hole. If there's one caller now and we can switch it to explicit_jscontext then by all means let's just file a bug and patch.

(In reply to comment #6)
> - This is about as sketchy as it gets.

I don't think adding a safety net is sketchy. Rewrapping has always been brittle and a source of security bugs. Rather than leave another potential hole based on conditions that we think shouldn't happen (but aren't sure) we added fallback code. Over the years we've done various changes to make it all more robust, if we now have more tools to do that then let's just fix it.

> - For that matter, why are we trying aOldScope and aNewScope first?

I didn't write this, but my guess is performance. Using the context stack used to be slow. That might also have changed in the meantime.
(In reply to comment #9)
> > - This is about as sketchy as it gets.
> 
> I don't think adding a safety net is sketchy. Rewrapping has always been
> brittle and a source of security bugs.

What bothers me is just that different cx's can have different subject principals and different scope chains (and can be in different compartments), so there's ordinarily just the one correct cx to use for a given task.

> > - For that matter, why are we trying aOldScope and aNewScope first?
>
> I didn't write this, but my guess is performance.

Oh, ok. explicit_jscontext is quick. Filed bug 658213.
If this audit is complete can we open this up?
Yep. Let's open it up.
Group: core-security
Keywords: sec-audit
https://bugzilla.mozilla.org/show_bug.cgi?id=1472046

Move all DOM bugs that haven’t been updated in more than 3 years and has no one currently assigned to P5.

If you have questions, please contact :mdaly.
Priority: -- → P5
Component: DOM → DOM: Core & HTML
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.