Closed Bug 401155 Opened 17 years ago Closed 16 years ago

Content XBL can exploit adblock (or any other JS nsIContentPolicy impl)

Categories

(Core :: XBL, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9beta5

People

(Reporter: bzbarsky, Assigned: sicking)

References

Details

(Whiteboard: [sg:critical])

Attachments

(1 file, 7 obsolete files)

The problem is that we call into content policy at various unsafe times, ranging from "under BindToTree" to "under style resolution, which happens during frame construction" in terms of unsafe-ness.

Now normally this is not _all_ that bad because we document in nsIContentPolicy.idl that the implementation must not do various things that could trigger layout or whatnot.

The problem arises because to pass a context node (e.g. an <img> for an image load) to a JS content policy implementation we have to WrapNative it.  That, of course, calls PostCreate, which will run XBL constructors "as needed".  This means that a web page could exploit users of adblock or other such extensions simply by attaching some XBL to an <img> and doing malicious enough things in the constructor.  We do avoid running the constructor if it's not safe to flush, but during BindToTree it is in fact "safe" to flush (it's not, really, but the presshell doesn't know that).  Note that there is nothing adblock can do to defend against this short of reimplementing its content policy in C++, which is rather undesirable.

I think it may be worth revisiting the idea of running constructors from PostCreate.  We definitely need to do that for nodes which have XBL attachment forced (clones of XUL elements), but do we need to do it for anything else?  It's a scary change, but might be worth it...
Flags: blocking1.9?
OS: Linux → All
Hardware: PC → All
Whiteboard: [sg:critical]
I still think we need to force it in the general case. It also happens when accessing nodes that have an ancestor with display:none.

What we could maybe do is if we're currently at a place where it's not safe to run script (such as while in layout), then but the ctor on the PAQ queue instead. The problem is identifying when it's unsafe to run script.
> It also happens when accessing nodes that have an ancestor with display:none.

Yes.  But do people rely on that?

> if we're currently at a place where it's not safe to run script (such as while
> in layout)

We check for that already...

> The problem is identifying when it's unsafe to run script.

I would posit "any time we're in an update".  
(In reply to comment #2)
> > It also happens when accessing nodes that have an ancestor with 
> > display:none.
> 
> Yes.  But do people rely on that?

Impossible to say of course, but I would kind'a be surprised if not.

> > The problem is identifying when it's unsafe to run script.
> 
> I would posit "any time we're in an update".  

Does layout always happen in EndUpdate/BeginUpdate? Or do we need to check layout and update?
The latter.  Ideally, IsSafeToFlush() on the presshell would be false inside an update... e.g. if there were a non-virtual (inline) nsIDocument method to check whether we're in an update.
(In reply to comment #0)
>The problem arises because to pass a context node (e.g. an <img> for an image
>load) to a JS content policy implementation we have to WrapNative it.  That, of
>course, calls PostCreate, which will run XBL constructors "as needed".
This is a content node, so surely the JS content policy implementation actually sees an XPCNativeWrapper, which can't actually access any of the XBL properties? Can we not call PostCreate unless the privileged script calls .wrappedJSObject on the wrapper specifically to access XBL?

(In reply to comment #2)
>>It also happens when accessing nodes that have an ancestor with display:none.
>Yes.  But do people rely on that?
All the time. When opening a popup window the chrome is hidden using display: none but we still need to be able to access properties on the URLbar etc.
> sees an XPCNativeWrapper,

Yes, but to create one of those you first need an XPCWrappedNative.  And creating one of those calls PostCreate.

Not calling PostCreate there is a non-starter, since it's needed in some cases to set things up right.

> All the time. 

Ok.  That's a little unfortunate.  :(

In that case we raelly do need to just not do the ctor/attachment sync in the middle of an update.
Wrapping a node using XPConnect can run arbitrary JS more or less any time (through us setting up the prototype chain by accessing window.<class parent> among other things), so wrapping should really only be done when it's safe to run JS.
Hmm.. Then we have a serious problem.  We either need to change the timing of our content policy calls (that is, not start any network activity of any sort without going out to the event loop first), stop passing nodes as context to such calls, or forbid implementing this API in JS.

For that matter, there are probably other APIs that we need to forbid implementing in JS.  Like anything that we use internally....

Do we really allow content to effectively override window.<class parent>?  That seems odd...
(In reply to comment #6)
>>sees an XPCNativeWrapper,
>Yes, but to create one of those you first need an XPCWrappedNative.  And
>creating one of those calls PostCreate.
Maybe for Mozilla 2.0 when accessing content nodes from chrome we'll treat it as a non-DOM object and only fully wrap it if you invokde .wrappedJSObject?
Maybe, if we completely redefine what it means to reflect a node into JS.
Assignee: nobody → jonas
Flags: blocking1.9? → blocking1.9+
Jonas/JST can we get something in here for b2?
Moving this to P2 since I'm not gonna be able to fix this for b2
Priority: P1 → P2
how aobut b3?   we should shoot for any bugs that could affect compatability by then.
Priority: P2 → P1
Attached patch Patch to fix (obsolete) — Splinter Review
This adds a service that keeps track of when it's safe to run script. It also lets you register nsIRunnables that are will be run as soon as it's safe to run script again, with the idea that that will happen before scripts keep running.

I moved a few things over to using this new system. There's lots more that could be moved but I didn't want to do too much in a single patch.

The tricky parts here is what sections needs to be marked as unsafe to run script. We also need to be prepared that scripts can run at the end of all those sections. A lot of this happens in the PresShell which is code that I don't really know that well so I definitely need input there.

Note that there is one exception to script that can run while we thing it's unsafe to run script. This exception is mutation events. This isn't really a security problem, it just means that mutation events might fire before plugins are fully set up, or xbl ctors have run. That should be fine though as mutation events aren't that much used, and very unlikely to depend on these edgecases.
Attached patch Patch v1 -w (obsolete) — Splinter Review
Oh, I should note that I didn't really need to move the scriptloader over to using this. Mostly did it to get testing on the new code. I also wanted to test that the API was useful enough.
Attachment #302107 - Flags: superreview?(jst)
Attachment #302107 - Flags: review?(bzbarsky)
Attachment #302106 - Attachment is patch: true
Attachment #302106 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 302107 [details] [diff] [review]
Patch v1 -w

- In nsCxPusher::Push(JSContext *cx):

   if (cx) {
+    mScx = GetScriptContextFromJSContext(cx);
+
+

Loose one of the two empty lines here?

     if (!mStack) {
       mStack = do_GetService(kJSStackContractID);
     }
 
     if (mStack) {
       if (IsContextOnStack(mStack, cx)) {
         // If the context is on the stack, that means that a script
         // is running at the moment in the context.
         mScriptIsRunning = PR_TRUE;
       }
 
       mStack->Push(cx);
     }

Here we can push cx with a null mScx, which means Pop() won't pop what we just pushed here. It's very critical to have this stack balanced, so we need to sort this out before this can go in.

- In nsObjectLoadingContent::GetPluginInstance(nsIPluginInstance** aInstance):

Seems like this function could reuse most of GetFrame() if you'd add another argument to it, i.e. don't-flush-anything or whatnot.

- In nsDOMClassInfo.cpp:

+// Note that not only XPConnect calls this PostCreate() method when
+// it creates wrappers, nsObjectFrame also calls this method when a
+// plugin is loaded if the embed/object element is already wrapped to
+// get the scriptable plugin inserted into the embed/object's proto
+// chain.

That comment is no longer true is it?

- In nsDOMClassInfo.h:

InstallProtoChain() should be named SetupProtoChain() or somesuch, the prototype chain is already there, we're just inserting into it.

sr- based on the nsCxPusher issue, the rest looks good to me. I don't know details about the layout code though, so rely more on bz's input on that part :)
Attachment #302107 - Flags: superreview?(jst) → superreview-
Attached patch Patch v2 (obsolete) — Splinter Review
Fixes those comments
Attachment #302106 - Attachment is obsolete: true
Attachment #302107 - Attachment is obsolete: true
Attachment #302107 - Flags: review?(bzbarsky)
Attached patch Patch v2 -w (obsolete) — Splinter Review
Attachment #302847 - Flags: superreview?(jst)
Attachment #302847 - Flags: review?(bzbarsky)
Attachment #302847 - Flags: superreview?(jst) → superreview+
Attached patch Sync to tip (obsolete) — Splinter Review
Trivial sync to tip.
Attachment #302846 - Attachment is obsolete: true
Comment on attachment 302847 [details] [diff] [review]
Patch v2 -w

>Index: content/base/public/nsIObjectLoadingContent.idl
>+   * Returns the plugin instance if it already has been instantiated. This

"it it has already been"

>Index: content/base/src/nsContentUtils.cpp
>@@ -818,16 +826,21 @@ nsContentUtils::Shutdown()
>+  NS_ASSERTION(!sBlockedScriptRunners || !sBlockedScriptRunners->Count(),

I think comparing Count() to 0 would be clearer.

>Index: content/base/src/nsScriptLoader.h

Why are the changes here to never execute sync on unblock ok?  Because either wehad pending requests or we posted that nsContentUtils event?  Please double-check that this isn't regressing any of those <script> ordering bugs...  We do have all those tests in mochitest, right?  

>Index: content/xbl/src/nsBindingManager.cpp

Hmm.  Why are these changes safe?  We're going to be defining properties, hitting classinfo, etc, right?  I don't think that's safe to do from nsXBLService::LoadBindings.  I seem to recall jst saying that just JS-wrapping an object can cause script execution as properties are looked up on the global object, etc.

>Index: layout/base/nsPresShell.cpp
>+  friend class nsAutoCauseReflowNotifier;

I seem to recall that as written this will fail to compile on some of our platforms.  You need to forward-declare |class nsAutoCauseReflowNotifier| before you start |class nsPresShell|.

>+class nsAutoCauseReflowNotifier

You're sure that this class never lives across presshell destruction, right?  If it does, bad things will happen...

>@@ -2374,40 +2396,42 @@ PresShell::InitialReflow(nscoord aWidth,
>-    // Something in mFrameConstructor->ContentInserted may have caused
>-    // Destroy() to get called, bug 337586.
>-    NS_ENSURE_STATE(!mHaveShutDown);
>+      NS_ASSERTION(!mHaveShutDown, "shouldn't happen");

Uh....  Why is this change ok?  Is the point that we've now fixed all the ways that could happen?

>@@ -2497,46 +2521,52 @@ PresShell::ResizeReflow(nscoord aWidth, 
>+      // XXX can this be moved to before the DidCauseReflow?

As things stand, no.  It cannot.

> PresShell::IsSafeToFlush(PRBool& aIsSafeToFlush)
>+  // XXX Should we return false if nsContentUtils::IsSafeToRunScript returns
>+  // false?

If you do, you'll need to change the ordering of some of the code you're adding; there are places where you create a script blocker, then flush...  So "not as things stand".

>Index: dom/src/base/nsDOMClassInfo.cpp
>+class nsBindingAttachedHandlerRunner : public nsIRunnable

Is there a reason this can't just be a nsRunnableMethod?

The rest looks good.

r- because I'm not sure I buy the XBL changes.  Unfortunately, my net connection (and free time) is spotty today and tomorrow, but I should be around tomorrow night.  If you're absolutely sure about the XBL changes and I'm just missing something, go ahead and land this with the other (all minor) issues fixed, and make sure to catch me tomorrow night (before freeze) and convince me that I'm wrong?
Attachment #302847 - Flags: review?(bzbarsky) → review-
> >Index: content/base/src/nsScriptLoader.h
> 
> Why are the changes here to never execute sync on unblock ok?  Because either
> wehad pending requests or we posted that nsContentUtils event?  Please
> double-check that this isn't regressing any of those <script> ordering  
> bugs... We do have all those tests in mochitest, right?  

The only place which now uses block/unblock is CSS loading which I see no reason to need synchronous unblocking. So basically I decided to redefine the API to always do async execution since that generally seems safer. If you need synchronous blocking/unblocking you should use the new nsContentUtils API.

> >Index: content/xbl/src/nsBindingManager.cpp
> 
> Hmm.  Why are these changes safe?  We're going to be defining properties,
> hitting classinfo, etc, right?  I don't think that's safe to do from
> nsXBLService::LoadBindings.  I seem to recall jst saying that just JS-wrapping
> an object can cause script execution as properties are looked up on the global
> object, etc.

That's what this patch aims to change. Wrapping an object should not execute script.

The known places in classinfo that executes script is XBL and plugins, both now checks with nsContentUtils::IsSafeToRunScripts and posts an event to do what they need to if it's not safe.

The issue of getting global properties in order to set up the protochain does remain though, but that's going to get fixed before release. I wanted to get this patch in ASAP though since it contains all the scary changes.

> >+class nsAutoCauseReflowNotifier
> 
> You're sure that this class never lives across presshell destruction, right? 
> If it does, bad things will happen...

It does prevent scripts from executing during its lifetime. Not sure if there are other things that can cause the presshell to go away? If there are we would already have a problem, right?

> >@@ -2374,40 +2396,42 @@ PresShell::InitialReflow(nscoord aWidth,
> >-    // Something in mFrameConstructor->ContentInserted may have caused
> >-    // Destroy() to get called, bug 337586.
> >-    NS_ENSURE_STATE(!mHaveShutDown);
> >+      NS_ASSERTION(!mHaveShutDown, "shouldn't happen");
> 
> Uh....  Why is this change ok?  Is the point that we've now fixed all the ways
> that could happen?

Hmm.. that was the idea yes. But looking at the referenced bug it doesn't involve any of the code that i've blocked. The crasher involved "beforeunload", but how could that execute from the code here? I'll investigate.

> > PresShell::IsSafeToFlush(PRBool& aIsSafeToFlush)
> >+  // XXX Should we return false if nsContentUtils::IsSafeToRunScript returns
> >+  // false?
> 
> If you do, you'll need to change the ordering of some of the code you're
> adding; there are places where you create a script blocker, then flush...  So
> "not as things stand".

Will said flushing cause scripts to execute? If so I do want to change the ordering of that code. Generally I really want us to be able to rely on that content scripts don't execute while we have blockers on the stack.

With mutation events being the only exception that I hope to fix down the line (by putting unblock/reblock calls around them).

> If you're absolutely sure about the XBL changes and I'm just
> missing something, go ahead and land this with the other (all minor) issues
> fixed, and make sure to catch me tomorrow night (before freeze) and convince 
> me that I'm wrong?

The XBL changes are the whole point of this patch, so they are "right". I.e. it should be safe to wrap an XPCOM object at any time. My question is if we need to make further other changes in order for that to be true.
> Wrapping an object should not execute script.

I see.  Could we possibly add asserts to that effect in the wrapping code somehow?  I'm not sure we can, but if it's at all possible, please do.

> Not sure if there are other things that can cause the presshell to go away?

Sure there are.  Flushes, for example.  And "go away" in this case means "Destroy() called".  Presumably someone is holding a strong ref to the presshell during the whole thing, but various things the presshell expects to never be null will be null after Destroy().

> Will said flushing cause scripts to execute?

I think it can trigger resize events, possibly.  roc or smaug would know for sure.

The reason this affects the ordering is that the flushing code checks IsSafeToFlush() and does nothing if it's not.  So if you made this code return false when script is blocked, that would be equivalent to removing those flushes unless the ordering changes.

I think that other than the classinfo lookup thing, wrapping is safe with this patch, yes.
I'm not sure about resize events --- I'd hope they're asynchronous --- but I bet Olli knows :-)
And there aren't any post-reflow callbacks that can run script (e.g. via mutation events; I'm looking at scrollbars here)?
So flushing will definitely run scripts. The first thing it does is flush the contentsink, which I think can cause scripts to execute (at the very least there's nothing that prevents it). Then it does frameconstruction and explicitly runs any newly queued XBL constructors.

So I do think we need to make IsSafeToFlush check with nsContentUtils::IsSafeToRunScript.

Another thing I need to look into is the beforeunload event. Apparently that can fire at scary times. See bug 337586.
Flags: tracking1.9+ → blocking1.9+
(In reply to comment #21)
> (From update of attachment 302847 [details] [diff] [review])
> >Index: content/base/public/nsIObjectLoadingContent.idl
> >+   * Returns the plugin instance if it already has been instantiated. This
> 
> "it it has already been"

It says "if it" :)
Yeah, I meant "if it".  The part I wanted changed there is the ordering of "has" and "already".
> >@@ -2374,40 +2396,42 @@ PresShell::InitialReflow(nscoord aWidth,
> >-    // Something in mFrameConstructor->ContentInserted may have caused
> >-    // Destroy() to get called, bug 337586.
> >-    NS_ENSURE_STATE(!mHaveShutDown);
> >+      NS_ASSERTION(!mHaveShutDown, "shouldn't happen");
> 
> Uh....  Why is this change ok?  Is the point that we've now fixed all the ways
> that could happen?

Reverted this. Though I would really like to know why this is needed. Seems very scary that the PresShell can get destroyed during frame construction. I added an assertion to the code that fires beforeunload events (which is what bug 337586 is about) and couldn't get it to fire.

Possibly this has been fixed in better ways now, though I would really like to know what stack caused the event to fire at unsafe times.

Smaug, do you remember?

> > PresShell::IsSafeToFlush(PRBool& aIsSafeToFlush)
> >+  // XXX Should we return false if nsContentUtils::IsSafeToRunScript returns
> >+  // false?
> 
> If you do, you'll need to change the ordering of some of the code you're
> adding; there are places where you create a script blocker, then flush...  So
> "not as things stand".

Made this change and fixed scoping appropriately, I think. There calls to VERIFY_STYLE_TREE already lives in sections where flushes are blocked so I'm shouldn't be changing anything there at least.

(In reply to comment #23)
> > Wrapping an object should not execute script.
> 
> I see.  Could we possibly add asserts to that effect in the wrapping code
> somehow?  I'm not sure we can, but if it's at all possible, please do.

Not sure if there's a reasonable way of doing this. I'll check with jst. In theory I guess we'd be fine with chrome script running, but if that doesn't happen currently we should just assert.

> > Not sure if there are other things that can cause the presshell to go away?
> 
> Sure there are.  Flushes, for example.  And "go away" in this case means
> "Destroy() called".  Presumably someone is holding a strong ref to the
> presshell during the whole thing, but various things the presshell expects to
> never be null will be null after Destroy().

nsAutoCauseReflowNotifier prevents flushing during its lifetime. And it should only been marking off sections that before were surrounded by WillCauseReflow/DidCauseReflow during which time flushing was (and is) forbidden.

The only place where the current code breaks out between an WillCauseReflow/DidCauseReflow pair is the code that deals with bug 337586. In general that code is making me nervous and I'd ideally like to see it removed. But for now I added an mHaveShutDown check before calling DidCauseReflow.

> > Will said flushing cause scripts to execute?

We concluded that flushing can run all sorts of scripts. So added a call to nsContentUtils::IsSafeToRunScript in IsSafeToFlush.
Attached patch Patch v3 -w (obsolete) — Splinter Review
Attachment #302847 - Attachment is obsolete: true
Attachment #302957 - Attachment is obsolete: true
Attachment #306696 - Flags: review?(bzbarsky)
Attachment #306696 - Attachment description: Patch v3 → Patch v3 -w
Attached patch Patch v3 (obsolete) — Splinter Review
With proper whitespace
Attachment #306698 - Attachment is patch: true
Attachment #306698 - Attachment mime type: application/octet-stream → text/plain
Realized that I need to add

else { nsContentUtils::RemoveScriptBlocker() }

when we're not going to call DidCauseReflow in nsAutoCauseReflowNotifier.

However I'm tempted to revert back to not calling NS_ENSURE_STATE(!mHaveShutDown) after calling mFrameConstructor->ContentInserted in InitialReflow. It just seems like so many other things have gone horribly wrong at that point and I wasn't able to reproduce the crash that that was needed for.


Also filed bug 420603 on the remaining way that scripts can run while wrapping.
Comment on attachment 306696 [details] [diff] [review]
Patch v3 -w

>Index: content/base/src/nsObjectLoadingContent.cpp
>@@ -1504,24 +1518,24 @@ nsObjectLoadingContent::GetFrame(PRBool 
>+    if (flushed || aFlushType == eDontFlush) {

It just occured to me that we could eliminate the |flushed| boolean altogether and just reset aFlushType to eDontFlush after flushing.  If you want to do that, go for it.  Otherwise, this is fine.

>Index: layout/base/nsPresShell.cpp
>+  ~nsAutoCauseReflowNotifier()
>+    // This check should not be needed. Currently only the place that seem

s/only the place that seem/the only place that seems/

>+    if (!mShell->mHaveShutDown) {
>+      mShell->DidCauseReflow();
>+    }

As you noted, there should be an else here that removes the script blocker.

> PresShell::IsSafeToFlush(PRBool& aIsSafeToFlush)
>+  // XXX technically we don't need to check anything but
>+  // nsContentUtils::IsSafeToRunScript here since that should be false
>+  // if any of the other flags are set.

Make that an assert, please.  Probably in addition to the comment.  If IsSafeToRunScript is ever true when mIsReflowing or mChangeNestCount, we want to know, right?

>@@ -6278,16 +6310,17 @@ PresShell::ProcessReflowCommands(PRBool 
>+      nsAutoScriptBlocker scriptBlocker;

So when this goes out of scope, we can destroy ourselves.  In particular, we could have a null frame constructor when we call DidDoReflow.  You probably want to check before calling DidDoReflow and make sure that we're not dead yet.

File a followup bug, perhaps, to get rid of mChangeNestCount if it's no longer needed?

With those nits (the else clause and the check to see whether we died) fixed, r=bzbarsky
Attachment #306696 - Flags: review?(bzbarsky) → review+
I also wonder whether we can at least get in tests for the things we found in review and patch iteration, if not for the original bug...  Certainly the constructor timing should be easy to test.
(In reply to comment #34)
> (From update of attachment 306696 [details] [diff] [review])
> >Index: content/base/src/nsObjectLoadingContent.cpp
> >@@ -1504,24 +1518,24 @@ nsObjectLoadingContent::GetFrame(PRBool 
> >+    if (flushed || aFlushType == eDontFlush) {
> 
> It just occured to me that we could eliminate the |flushed| boolean altogether
> and just reset aFlushType to eDontFlush after flushing.  If you want to do
> that, go for it.  Otherwise, this is fine.

Btw, this function is very confusing to me. Seems very counter intuitive that we flush if oen exists, but don't flush if none has been created. I do suspect that a lot of callers that used to be GetFrame(PR_FALSE) actually never wanted a flush to happen.

> > PresShell::IsSafeToFlush(PRBool& aIsSafeToFlush)
> >+  // XXX technically we don't need to check anything but
> >+  // nsContentUtils::IsSafeToRunScript here since that should be false
> >+  // if any of the other flags are set.
> 
> Make that an assert, please.  Probably in addition to the comment.  If
> IsSafeToRunScript is ever true when mIsReflowing or mChangeNestCount, we want
> to know, right?

I added an assertion at the end checking that the current algorithm is equal to simply calling nsContentUtils::IsSafeToRunScript.
> Seems very counter intuitive that we flush if oen exists

We flush if we're going to try setting up the plug-in (which we won't if there is no frame).  See bug 377070.  Maybe with your changes we don't even need that anymore?

> I do suspect that a lot of callers that used to be GetFrame(PR_FALSE) actually
> never wanted a flush to happen.

That's pretty possible, yeah... Want to file a followup to investigate this?
Blocks: 421997
Attached patch Latest versionSplinter Review
Attachment #306696 - Attachment is obsolete: true
Attachment #306698 - Attachment is obsolete: true
Depends on: 423269
Blocks: 423180
No longer blocks: 423180
Depends on: 423180
Those links don't show a regression, to me?  Even trying to squint through the noise, Ts looks like it's down, for example.  What were the regression amounts when you saw them (and what's the confidence on the measurements)?
You have to increase the number of days shown to see it. Increase the number of days until you see the marker.

That said, I don't see anything in Tdhtml other than a one-run spike. Txul and Ts does show a regression though.
Depends on: 423355
Target Milestone: --- → mozilla1.9beta5
Jonas/BZ we still need this for b5?
This one is actually FIXED. Have kept it open in case it needs to be backed out. But I think I can handle the regressions without need for that. And I really think we need beta testing so i'd rather live with the regressions during the beta than land this during RC.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Depends on: 427691
Depends on: 429780
Are there any testcases that can be used to demonstrate the vulnerbility and verify that it's fixed?

I don't see why this wouldn't be a problem on the 1.8 branch, too, so a testcase would be nice to test that too.
Flags: wanted1.8.1.x?
Blocks: abp
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: