Created attachment 564067 [details]
testcase (requires extension for GC)
1. Install https://www.squarefree.com/extensions/domFuzzLite2.xpi
2. Load the testcase
###!!! ASSERTION: should have a JS object by this point: 'win->GetOuterWindowInternal()->IsCreatingInnerWindow()', file dom/base/nsDOMClassInfo.cpp, line 4916
###!!! ASSERTION: Non-global object has the wrong flags: '!(jsclazz->flags & JSCLASS_IS_GLOBAL)', file js/src/xpconnect/src/xpcwrappednative.cpp, line 1146
Created attachment 564068 [details]
Blake, can you suggest a severity for this?
Bobby, can you dig in here? It's unclear what the severity of this is, so as soon as you get an understanding of what's going on here we can put the appropriate rating on this.
Bobby, when do you think you'll get to investigating here?
(In reply to Johnny Stenback (:jst, firstname.lastname@example.org) from comment #4)
> Bobby, when do you think you'll get to investigating here?
Not totally sure. Ideally sometime in the next few weeks, but I have a pretty full plate at the moment. :-(
Hm, I can't reproduce this (using a recent nightly debug build with domFuzzLite3, OSX Lion, x86_64). Jesse, can you still reproduce?
I can still reproduce. Apparently the testcase has to be local.
Still can't reproduce on macosx, but I can on my 32-bit linux VM. I'll take a look.
Bobby, is this still behaving the same after your cpg related xpconnect changes?
I've been looking into this today. I don't have a handle on the problem yet, but my sense thus far is that it's unaffected by c-p-g. I'll have more info shortly.
Ok, so here's what's going on here:
1 - We make an iframe. We give it a url, and an onload handler P. We then insert the iframe into the dom to kick off the load.
2 - The iframe loads, and onload is fired, which triggers P. P simply does setTimeout(j, 0), sticking j at the end of the event loop.
3 - Once onload has been fired, pageshow fires. This triggers a chrome event listener Q, defined here: http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#1533
4 - Q is called. It creates a closure R that holds onto the event (!), and posts that closure to the end of the event queue.
5 - We return to the event loop and process the next thing in line. j from step (2) is called. It sets the hash on the iframe window to something arbitrary. This is necessary for the testcase, because it appears to keep the native window alive across the next GC, but _not_ the window JS Object. I think this has to do with how the docshell handles hash loads (something about creating an asynchronous load event, but it actually just reusing the same document), but I'm not totally sure.
6 - Next, j does a redundant appendChild to insert the iframe into the master document. Because it's already there, this first removes it, which causes the frameloader and docshell to be destroyed. This tears down the document and window, and makes them GC-able (for the window, this happens when we destroy the DocShell and call nsGlobalWindow::CleanUp, where we null out mInnerWindowHolder). The window loses its pointer to the document, and the document loses mScriptGlobalObject. But the document still has the weak mScopeObject pointer, so it can find the window as long as it's alive.
6 - Finally, j forces a GC. This collects the window object, eventually invoking nsGlobalWindow::OnFinalize, which nulls out the JSObject pointer.
7 - j returns, and then R rears its ugly head. The first thing it does is to access originalTarget on the event, which corresponds to the (now dead) document. We try to create a wrapper for it, and call nsNodeSH::PreCreate. The native window is still alive and accessible by the weak pointer, so we proceed in trying to wrap it. But the underlying JS Object is long gone, so win->GetFlatJSObject() is null. This confuses us in nsWindowSH::PreCreate, because it looks like the window is not-yet-born, but it's actually long-dead. So we assert.
So in a nutshell, the issue is the native document stays alive (via the Event), and the native window stays alive (via the short-circuit hashtag docshell load), but they're both torn down, and their wrappers are gone.
I think the best solution here is just to throw when we try to wrap a window that's been closed. This will presumably also throw for torn-down documents, because they need to wrap their window before wrapping themselves, assuming we always tear windows and documents down at the same time. Alternatively, we might not care about wrapping torn-down documents at all.
I'm going to write up this fix and attach it.
> I think the best solution here is just to throw when we try to wrap a window that's
> been closed.
With this fix, will the web-page-visible behavior still depend on GC timing?
Created attachment 598447 [details] [diff] [review]
Refuse to wrap windows that are on the way out. v1
Attaching the fix proposed in the previous comment. With this fix, the testcase throws in browser.js:pageShowEventHandlers, which is (I think) what we want.
Flagging jst for review.
What do we think about the security rating here? I'd like to turn the testcase into a crashtest, which is why I switched it to a fatal assert. I'll let jst make the call here.
> I'd like to turn the testcase into a crashtest, which is why I switched it to a
> fatal assert.
Crashtest considers unexpected NS_ASSERTIONs to be test failures, fwiw.
(In reply to Jesse Ruderman from comment #14)
> > I'd like to turn the testcase into a crashtest, which is why I switched it to a
> > fatal assert.
> Crashtest considers unexpected NS_ASSERTIONs to be test failures, fwiw.
Oh, that's interesting. Is there any way to trigger a GC in crashtests? I was under the impression that there wasn't any way to access SpecialPowers and all that in that context.
I guess not, which means you'll be using mochitest, which doesn't yet consider NS_ASSERTION to be a failure (bug 404077).
(In reply to Jesse Ruderman from comment #12)
> > I think the best solution here is just to throw when we try to wrap a window that's
> > been closed.
> With this fix, will the web-page-visible behavior still depend on GC timing?
It does, I think. But I'm not really sure how to change that. We'd either need to find a way to trace all the events in the event queue, or alter the ownership/teardown model for windows I guess.
Comment on attachment 598447 [details] [diff] [review]
Refuse to wrap windows that are on the way out. v1
+ // See bug 691178 comment 11 for why this is necessary.
+ if (!win->IsClosedOrClosing())
+ return NS_ERROR_FAILURE;
I would say this check should be inside the if !winObj check below, it's fine to reach into another window after it's been closed, which could trigger this PreCreate() hook being called. I think we should throw here only in the case where the window has been collected already, and we're attempting to re-wrap it.
r=jst with that changed.
Created attachment 600547 [details] [diff] [review]
Refuse to wrap windows that are on the way out. v2
It looks like I actually had the previous check backwards. :-( Reflagging jst on that account.
I also gave up trying to make a mochitest in favor of working on other stuff. So I've also reverted to an NS_ASSERTION.
Pushed this to try: https://tbpl.mozilla.org/?tree=Try&rev=67d964886265
Pushed to m-i: http://hg.mozilla.org/integration/mozilla-inbound/rev/782603d06047
Marking at mak's request:
merged to m-c:
Given the understanding in comment 11 is this an exploitable condition? What happened after nsWindowSH::PreCreate returned after firing the assertion? R has an empty originalTarget, then what? Other than the assertions I don't see any obvious evidence of bad memory use in a debug build.
Or should we just fix this on ESR and not worry about it.
Hm. Well, the parent object will remain unchanged, so we'll end up wrapping the DOM object in a scope different from its old window. I don't see any glaring security issues with this, though.
Given that the bug is hard enough to trigger as is, I'd be tempted to just let it ride. We could also land the fix though, since it's super low-risk. That might be a better idea.
I have verified this on:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:13.0) Gecko/20120517 Firefox/13.0 debug beta build
I've followed the steps from the description and no assertion is shown in the terminal.
Marking this as Verified Fixed.