Closed
Bug 752764
Opened 12 years ago
Closed 12 years ago
document.write after page with <video> causes a CC crash [@ nsXPConnect::Traverse]
Categories
(Core :: XPConnect, defect)
Core
XPConnect
Tracking
()
VERIFIED
FIXED
mozilla17
Tracking | Status | |
---|---|---|
firefox15 | --- | verified |
firefox16 | --- | verified |
firefox17 | --- | verified |
firefox-esr10 | --- | unaffected |
People
(Reporter: jruderman, Assigned: mccr8)
References
Details
(Keywords: crash, sec-moderate, testcase, Whiteboard: [advisory-tracking-])
Crash Data
Attachments
(4 files, 4 obsolete files)
261 bytes,
text/html
|
Details | |
4.58 KB,
text/plain
|
Details | |
4.31 KB,
text/plain
|
Details | |
4.99 KB,
patch
|
mccr8
:
review+
lsblakk
:
approval-mozilla-aurora+
lsblakk
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
No description provided.
Reporter | ||
Comment 1•12 years ago
|
||
(Breakpad opt: bp-2a98b4e2-3875-4377-b2ae-b61ae2120506)
Assignee | ||
Comment 2•12 years ago
|
||
Somehow the GetJSPrivate slot of a wrapper is null.
Comment 3•12 years ago
|
||
On Windows 7: bp-1fdd18ff-448a-4267-8704-e5aed2120508.
Crash Signature: [@ nsXPConnect::Traverse] → [@ nsXPConnect::Traverse]
[@ nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)]
OS: Mac OS X → All
Hardware: x86_64 → All
Assignee | ||
Comment 4•12 years ago
|
||
Similar test case from mw22: https://bugzilla.mozilla.org/attachment.cgi?id=625297
Assignee | ||
Updated•12 years ago
|
Assignee | ||
Comment 5•12 years ago
|
||
With the test case in comment 4, I had trouble reproducing it unless I opened and closed empty tabs. One odd thing is that when I attempted to save the file, I ended up with this crash instead, inside nsWebBrowserPersist::SaveDocumentInternal: https://crash-stats.mozilla.com/report/index/bp-78647826-b87a-4f42-97e6-3fce92120523
Assignee | ||
Updated•12 years ago
|
Group: core-security
Assignee | ||
Comment 7•12 years ago
|
||
I investigated a bit, and then went through my findings a bit with bholley. He said we should mark it s-s just in case, because "this is kind of a rabbit hole". What I found by looking at JS heap dumps is that there were a bunch of XULElement wrappers with null privates. They have things looking like this: // legit-looking reflector, where the underlying XPCWN's wrapper is a proxy // (instead of to itself) 0x127a4ca00 G XULElement 11a8f1e10 > 0x127a48640 G type > 0x127a67fb0 G shape > 0x127a4c400 B XPCWrappedNativeProto::mJSProtoObject > 0x127a50d40 G XPCWrappedNative::mWrapper > 0x127a4b060 B XPCWrappedNativeScope::mGlobalJSObject > 0x127a4d020 B XPCWrappedNativeScope::mPrototypeJSObject // the proxy in question, which has a target object of the broken XUL element 0x127a50d40 G Proxy <no private> > 0x127a487c0 G type > 0x127a67830 G shape > 0x118416a60 G targetObject > 0x118416a60 G private // the XUL element reflector with private 0 0x118416a60 G XULElement 0 > 0x127a1d700 G type > 0x127a36998 G shape Bobby looked into the code a bit. Here's my summary. Blame any errors on me. This XUL stuff is a video control, and is thus some kind of XBL thing, subject to same-compartment security checks ("SOW"). This means it is subject to the "special object-plus-wrapper transplant". Here's the code from XPCWrappedNative::ReparentWrapperIfFound: JSObject *wwsaved = ww; wrapper->SetWrapper(newwrapper); ww = js_TransplantObjectWithWrapper(ccx, flat, ww, newobj, newwrapper); What is probably happening is that the wrapper field of the new copy of the element XPCWN (0x127a4ca00 in the example above) gets set to a wrapper for the element (0x127a50d40), that is created for use as a fallback in the transplant (0x118416a60). But the transplant finds an existing wrapper it can use instead, so the fallback 0x118416a60 is left behind in a zombie state. Because it is still reachable, it remains alive, then the CC finds it and blows up.
Comment 8•12 years ago
|
||
comment 7 is exactly right. In plainer terms, js_TransplantObjectWithWrapper doesn't promise to use the object you give it - you have to check the return value, which we don't. We don't because we're already holding our breath with respect to the comment preceding that bit of code. The real answer here is bug 758415.
Assignee | ||
Comment 9•12 years ago
|
||
This still crashes on a recent-ish tree with the patches in bug 758415 and bug 751995 applied.
Comment 10•12 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #9) > This still crashes on a recent-ish tree with the patches in bug 758415 and > bug 751995 applied. Huh, really? It seemed like we'd pretty decisively figured out the problem, and the backout of the 'pretty serious brain surgery' code should have fixed it. What seems to be happening? Is it still Location?
Assignee | ||
Comment 11•12 years ago
|
||
This wasn't Location before, it was XBL SOW gunk related to video controls. But yeah, same part of the code seemed to be the problem. I'll look into it.
Assignee | ||
Comment 12•12 years ago
|
||
The good news is that this is better than before. Previously on this example there were 3 or 4 XULElement wrappers with null privates, but now there's only one.
0x1188456d0 G XULElement 0
> 0x118841880 G type
> 0x1188b0d30 G shape
> 0x118895880 G Utils
The zombie XULElement is still created in XPCWrappedNative::ReparentWrapperIfFound.
However, it isn't being held alive by a proxy any more. So I guess that is progress, potentially.
Instead, it is the fun_callscope of a ton of functions that appear to be implementing various video control stuff, and the parent of a bunch of other functions. It is also the "item" field of one object, the "videocontrols" field of another.
The object that has the videocontrols field is a root in the GC graph:
0x118895880 G nsXPCWrappedJS[nsIDOMEventListener,0x1263abb80:0x125dbee80].mJSObj
So it looks like a JS object implementing a listener is keeping alive the zombie object. (Possibly other things are, too.)
Assignee | ||
Comment 13•12 years ago
|
||
Here's what find_roots shows is keeping alive the zombie XULElement. The roots are all various nsXPCWrappedJS[nsIDOMEventListener]. There's this one: via nsXPCWrappedJS[nsIDOMEventListener,0x1263abb80:0x125dbee80].mJSObj : 0x118895880 [Object <no private>] --[onVolumeMouseInOut]-> 0x118891a40 [Function] --[fun_callscope]-> 0x1188456d0 [XULElement 0] ...then a bunch of mostly parent chain things through shapes.
Assignee | ||
Comment 14•12 years ago
|
||
I accidentally chopped the last line off the file: --[parent]-> 0x1188456d0 [XULElement 0] but it isn't very interesting...
Assignee | ||
Comment 15•12 years ago
|
||
So, there are live functions sitting around with an environment that is a transplanted XUL object. It seems a little weird to me, knowing little about JS functions, that a XULElement wrapper would be an environment. But maybe that's normal? Assuming that is okay, then I have two guesses as to what is happening. Luke told me that the environment for a function should be in the same compartment as it. Thus, I see two possibilities: a) The function has been transplanted already, but something should have removed the original copy from mWrappedJSRoots. b) Something should have updated the environment of the function, but didn't. I don't see anything that look like copies of either the XULElement or the the function, judging by field names, so I'm not sure what may be the case. I can try to log the heap contents in a more fine grained way.
Comment 16•12 years ago
|
||
If the function is using the XULElement as the environment, and we transplant the XULElement, it seems likely that the function is getting left behind with the cross-compartment wrapper as its parent. I know nothing about environments though. How do they relate to parents? Are they only for scripted functions? I know that we parent all XUL elements directly to their XUL parent, so if that also determines the environment then this would make sense...
Assignee | ||
Comment 17•12 years ago
|
||
I think the problem must be that there are direct references to the XPCWN wrapper, via the parent and environment of the functions, but js_TransplantObjectWithWrapper assumes there are no direct references ("Because there are no direct references to the location object itself, we don't want the old obj (|origobj| here) to become the new wrapper but the wrapper itself instead."). We swap out the guts of the wrapper wrapper to turn it into a CCW, but the old references to the XPCWN wrapper are still there. One solution to this would be to somehow make sure that the environment and parent we use ends up with the wrapper wrapper. I don't know if making the function go through that extra layer of indirection is a problem or not. Another solution would be to turn the XPCWN wrapper into some kind of pass-forward wrapper that just says "hey! go to the CCW over here!" but I'm not sure if that's a thing or not.
Assignee | ||
Comment 18•12 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #17) > One solution to this would be to somehow make sure that the environment and > parent we use ends up with the wrapper wrapper. I don't know if making the > function go through that extra layer of indirection is a problem or not. Err, this is somewhat unclear. What I mean is that the environment/parent would point to the wrapper wrapper from the very beginning, so you wouldn't have a dead reference later. This would involve tracking down where those env/parents are set, and adding some kind of awful check like "Hey, are you a wrappered wrapper? If so, use the wrapper wrapper instead!".
Comment 19•12 years ago
|
||
Ugh yeah. I _knew_ the assumption that there should be no intra-compartment references to mFlatJSObject sounded too good to be true. I think the easiest solution is to update js_TransplantObjectWithWrapper to swap in a transparent same-compartment wrapper into |origObj| that points to |origWrapper| (which becomes the new intra-compartment identity for the object after the transplant). Then, we'll probably need to fix up GetWrappedNativeOfJSObject (and anywhere else we call GetObjectParent), and uses of environment in the JS engine, to make sure they handle that case properly. This might also fix bug 760887.
Assignee | ||
Comment 20•12 years ago
|
||
Yeah, that's what I was thinking with my second suggestion there. It does seem like a better approach, in that we won't have to chase down ever possible way something might refer to the wrapper. How do you make a "transparent same-compartment wrapper"?
Comment 21•12 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #20) > How do you make a "transparent same-compartment wrapper"? Just do Wrapper::New and use Wrapper::singleton as the handler. See WrapperFactory::WrapSOWObject for an example of creating a wrapper.
Comment 22•12 years ago
|
||
> I know nothing about environments though. How do they relate to parents?
Same thing.
Assignee | ||
Comment 23•12 years ago
|
||
Doesn't crash with test case, and I'm able to browse around a little without anything too obviously awful happening. This just implements the first part of Bobby's suggestion, which is to turn the reflector into a DirectWrapper around the new CCW. I'll look into the GetObjectParent stuff now.
Assignee | ||
Comment 24•12 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #19) > Then, we'll probably need to fix up GetWrappedNativeOfJSObject (and anywhere else > we call GetObjectParent), and uses of environment in the JS engine, to make sure > they handle that case properly. At least one use in GWNOJSO looks okay to me, as it immediately unwraps what it gets: JSObject* funObjParent = js::UnwrapObject(js::GetObjectParent(funobj)); I'll try looking around at other uses and see how they seem.
Assignee | ||
Comment 26•12 years ago
|
||
dbaron appears to have hit this in the wild, in bug 766624.
Reporter | ||
Updated•12 years ago
|
Crash Signature: [@ nsXPConnect::Traverse]
[@ nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)] → [@ nsXPConnect::Traverse]
[@ nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)]
[@ nsXPConnectParticipant::TraverseImpl]
Reporter | ||
Comment 27•12 years ago
|
||
The crash looks slightly different now: bp-e38552ec-b8f0-4269-a867-a03bb2120621
Assignee | ||
Comment 28•12 years ago
|
||
So, my patch in bug 763773 gets rid of this crash, but only because it rips out the check that is actually crashing in this test case. The zombie objects must still be around, so that's a little disturbing. I'm going to wait to land that until this is fixed, but it would still be nice to have a way to check for this. Ideally we'd have some way to check that all of these objects we expect to die when transplanting actually go away at the next GC. My concern is that if somehow one of these things is kept alive for a few CCs due to being held onto by something else that we could get false positives, but maybe that shouldn't happen anyways?
Assignee | ||
Comment 29•12 years ago
|
||
It is possible that with the patch in bug 763773 this isn't actually a problem. When I run the test case locally, the bogus XULElement gets cleaned up in the first CC, which suggests that it is garbage, as it should be, but is being held alive by an XPConnect root, so it can only be cleaned up with the CC. It looks like what is happening is that a number of nsXPCWrappedJS(nsIDOMEventListener) are keeping alive the zombie XULElement. The listeners get cleaned up in the next CC. I don't know enough about how they work to know that this is guaranteed to happen, or if that's just particular to this test case.
Assignee | ||
Comment 30•12 years ago
|
||
Bobby is going to figure out how transplanting of XBL and plugins and other stuff is supposed to work, so I'm going to put this on hold until then. One hacky fix (that is slightly less hacky than just adding a null check to the CC...) would be to nuke the reflector object with one of khuey's proxies that throw when you try to do things with them.
Assignee | ||
Comment 31•12 years ago
|
||
Here's a simpler approach I discussed with Bobby. We create a DeadObjectProxy, then swap it in for the wrapper. That fixes the crash, and will hopefully result in a more controlled situation if we end up exposing the wrapper or holding onto it somehow. (Bill suggested a plain object, but unfortunately it has the wrong background finalization kind, so it fails an assertion on the swap...)
Attachment #632422 -
Attachment is obsolete: true
Assignee | ||
Comment 32•12 years ago
|
||
The previous version of the patch passed a try run on Linux debug. I'll do a full try run before landing. This is just cleaned up a bit.
Attachment #647369 -
Attachment is obsolete: true
Attachment #647631 -
Flags: review?(bobbyholley+bmo)
Attachment #647631 -
Flags: feedback?(wmccloskey)
Comment on attachment 647631 [details] [diff] [review] nuke wrapped reflector during transplant Review of attachment 647631 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsapi.cpp @@ +1671,5 @@ > /* > + * The location object and SOWs are special. There is the reflector object |origobj| > + * and the reflector wrapper |origwrapper|. We turn the reflector wrapper into the > + * new CCW because there are (probably) not any live references to the reflector. > + * This leads to subtle differences with JS_TransplantObject. Could you maybe expand on this comment? Ideally someone without knowledge of how xpconnect works should be able to understand it. Lots of people who read this won't know what an SOW is, or even how location objects differ from normal objects. I think you could explain that some C++ objects have an xpconnect reflector as well as a security wrapper, and since the reflector isn't expected to have any references, we turn the security wrapper into a cross-compartment wrapper.
Attachment #647631 -
Flags: feedback?(wmccloskey) → feedback+
Comment 34•12 years ago
|
||
Comment on attachment 647631 [details] [diff] [review] nuke wrapped reflector during transplant > { > AutoCompartment ac(cx, origobj); >- JSObject *wrapperGuts = targetobj; >- if (!ac.enter() || !JS_WrapObject(cx, &wrapperGuts)) >+ if (!ac.enter()) >+ return NULL; >+ >+ // Render the reflector inert, in case something gets ahold of it. Please explain this a little bit more. >+ JSObject *wrapperGuts = NewDeadProxyObject(cx, JS_GetGlobalForObject(cx, origobj)); >+ if (!wrapperGuts || !origobj->swap(cx, wrapperGuts)) >+ return NULL; >+ >+ // Turn origwrapper into a CCW to the new object. >+ wrapperGuts = targetobj; I might prefer to use a different variable for these two operations to avoid confusion while reading. Looks great! r=bholley
Attachment #647631 -
Flags: review?(bobbyholley+bmo) → review+
Assignee | ||
Comment 35•12 years ago
|
||
Carrying forward bholley's review. Thanks for the feedback! I updated the comments and split out the initial usage of wrapperGuts into a new variable reflectorGuts. If either of you have further suggestions for the comments, let me know. I'm going to do a try run here before I land it, as I don't remember if I did a full one before or not. https://tbpl.mozilla.org/?tree=Try&rev=c39f48fa3305
Attachment #647631 -
Attachment is obsolete: true
Attachment #648754 -
Flags: review+
Assignee | ||
Comment 36•12 years ago
|
||
Hopefully I have not traded clarity for wit with this comment: "After the swap we have a possibly-live object that isn't dangerous, and a possibly-dangerous object that isn't live."
Assignee | ||
Comment 37•12 years ago
|
||
I updated the patch a little to root reflectorGuts. https://hg.mozilla.org/integration/mozilla-inbound/rev/61ad814fab44
Comment 38•12 years ago
|
||
Backed out as part of the mass tree revert due to bustage caused by other landings: https://hg.mozilla.org/integration/mozilla-inbound/rev/c801b99d726f Once the tree is open again, this can reland :-)
Assignee | ||
Comment 39•12 years ago
|
||
http://hg.mozilla.org/integration/mozilla-inbound/rev/0a48f2d0e2a0
Comment 40•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/0a48f2d0e2a0 Should this have a test?
Status: NEW → RESOLVED
Closed: 12 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Not yet.
Assignee | ||
Comment 42•12 years ago
|
||
There are a handful of null deref crashes on Nightly in WrapperIsNotMainThreadOnly(), just like the test case here, so it seems like this could be an actual problem in the wild. Of course, I have deleted this code in 17 anyways in bug 763773, so the crash at least is fixed either way.
Assignee | ||
Comment 43•12 years ago
|
||
In the last 28 days, there were about 10 of these WrapperIsNotMainThreadOnly() crashes, but none since 8/4, which is the build before this patch landed, so it seems likely this patch fixed those crashes. Before my other patch fixed it permanently. ;)
Assignee | ||
Comment 44•12 years ago
|
||
Comment on attachment 648754 [details] [diff] [review] nuke wrapped reflector during transplant [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 650353 User impact if declined: some crashes, possible security problems Testing completed (on m-c, etc.): it has been on m-c for a few days Risk to taking this patch (and alternatives if risky): should be pretty low. We replace an object in a weird scary state with one in a known clear state, so we're probably only improving things. The crashes could be fixed instead by adding just a null check, but that wouldn't fix any lurking security issues we're not sure about. String or UUID changes made by this patch: none
Attachment #648754 -
Flags: approval-mozilla-beta?
Attachment #648754 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 45•12 years ago
|
||
I'm going to put this as sec-moderate. We don't know of anything particularly bad that results from this aside from crashes, but there could be.
status-firefox-esr10:
--- → unaffected
status-firefox15:
--- → affected
status-firefox16:
--- → affected
status-firefox17:
--- → fixed
Keywords: sec-moderate
Comment 46•12 years ago
|
||
Comment on attachment 648754 [details] [diff] [review] nuke wrapped reflector during transplant Not a high volume of crashes, but seems low risk enough that we can take this on Beta.
Attachment #648754 -
Flags: approval-mozilla-beta?
Attachment #648754 -
Flags: approval-mozilla-beta+
Attachment #648754 -
Flags: approval-mozilla-aurora?
Attachment #648754 -
Flags: approval-mozilla-aurora+
Assignee | ||
Comment 47•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/10de2817a14e
Assignee | ||
Comment 48•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/bc0b53d05e68
Updated•12 years ago
|
Whiteboard: [advisory-tracking-]
Comment 49•12 years ago
|
||
I'm unable to reproduce this crash with the attached testcase using 2012-05-07 mozilla-central builds (tried opt and debug). Can someone give me some tips as to how I might reproduce this?
Keywords: verifyme
Whiteboard: [advisory-tracking-] → [advisory-tracking-][qa?]
Assignee | ||
Comment 50•12 years ago
|
||
I was able to reproduce it on the 5-7 build on OSX by reloading the page three or four times.
Comment 51•12 years ago
|
||
Thanks Andrews, I got it that time. Confirmed reproducible with Firefox 15.0a1 2012-05-06. Verified fixed with: * 2012-08-24 Firefox 15 Beta Debug * 2012-08-24 Firefox 16 Aurora Debug * 2012-08-24 Firefox 17 Nightly Debug
Status: RESOLVED → VERIFIED
QA Contact: anthony.s.hughes
Whiteboard: [advisory-tracking-][qa?] → [advisory-tracking-]
Updated•12 years ago
|
Group: core-security
Comment 52•12 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #19) > Ugh yeah. I _knew_ the assumption that there should be no intra-compartment > references to mFlatJSObject sounded too good to be true. FWIW, this entire situation is now much clearer to me. we put SOWs (same-compartment security wrappers) around native anonymous content, which generally involve XUL elements in a content HTML document. XUL elements are parented directly to their parent element (as opposed to their document, as is the case for HTML), so the parent chain of the anonymous content eventually hits the SOW-ed object, which violates the (faulty) assumption that there are no direct intra-compartment references to SCSW-ed objects.
Comment 53•12 years ago
|
||
So the issue here is that JS_DescribeScriptedCaller returns false when it finds no JS code on the stack, which we were incorrectly interpretting as an exception. The interesting thing here though is that there _should_ be JS code on the stack. What's actually happening is that GetContextFromStack skips over the top JS context (the sandbox context) because it isn't associated with a DOM window. This seems totally wrong IMO, and I'm going to file another bug to fix it.
Attachment #671423 -
Flags: review?(bzbarsky)
Comment 54•12 years ago
|
||
Comment on attachment 671423 [details] [diff] [review] Properly handle |false| return false from JS_DescribeScriptedCaller. v1 Arg, wrong bug number. Please ignore.
Attachment #671423 -
Attachment is obsolete: true
Attachment #671423 -
Flags: review?(bzbarsky)
You need to log in
before you can comment on or make changes to this bug.
Description
•