Last Comment Bug 761422 - Handle failures in ReparentWrapperIfFound more effectively
: Handle failures in ReparentWrapperIfFound more effectively
Status: VERIFIED FIXED
[advisory-tracking-]
: assertion, crash, regression, sec-critical, testcase
Product: Core
Classification: Components
Component: XPConnect (show other bugs)
: Trunk
: x86_64 Mac OS X
: -- critical (vote)
: mozilla17
Assigned To: Andrew McCreight [:mccr8]
: Anthony Hughes (:ashughes) [GFX][QA][Mentor]
:
Mentors:
Depends on: 765890
Blocks: 326633 594645 cpg
  Show dependency treegraph
 
Reported: 2012-06-04 15:53 PDT by Jesse Ruderman
Modified: 2012-09-24 10:49 PDT (History)
9 users (show)
continuation: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
unaffected
+
verified
+
verified
+
verified
unaffected


Attachments
testcase (crashes Firefox on next CC) (285 bytes, text/html)
2012-06-04 15:53 PDT, Jesse Ruderman
no flags Details
stack traces (5.15 KB, text/plain)
2012-06-04 15:53 PDT, Jesse Ruderman
no flags Details
testcase 2 (385 bytes, text/html)
2012-07-12 20:05 PDT, Jesse Ruderman
no flags Details
allocate the new obj before we mess with maps, catch double reflectors, WIP (8.28 KB, patch)
2012-07-27 17:24 PDT, Andrew McCreight [:mccr8]
no flags Details | Diff | Splinter Review
part 1: clone before we adjust XPC maps (4.29 KB, patch)
2012-07-29 09:33 PDT, Andrew McCreight [:mccr8]
bobbyholley: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review
part 2: guard against double reflectors on failure (4.79 KB, patch)
2012-07-29 09:52 PDT, Andrew McCreight [:mccr8]
bobbyholley: review+
Details | Diff | Splinter Review
part 3: don't try to be a hero (6.06 KB, patch)
2012-07-29 10:02 PDT, Andrew McCreight [:mccr8]
bobbyholley: review+
Details | Diff | Splinter Review
part 2: guard against double reflectors on failure (6.13 KB, patch)
2012-07-30 11:32 PDT, Andrew McCreight [:mccr8]
continuation: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review
part 3: create sec wrappers before adjusting maps (5.12 KB, patch)
2012-07-30 11:35 PDT, Andrew McCreight [:mccr8]
bobbyholley: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review
part 4: don't try to be a hero (4.85 KB, patch)
2012-07-30 11:38 PDT, Andrew McCreight [:mccr8]
continuation: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Jesse Ruderman 2012-06-04 15:53:12 PDT
Created attachment 629977 [details]
testcase (crashes Firefox on next CC)

When loading the testcase:

* Component returned failure code: 0x8007000e (NS_ERROR_OUT_OF_MEMORY) [nsIDOMHTMLDocument.adoptNode]

During the next cycle collection:

* ###!!! ASSERTION: Clearing a preserved wrapper!: '!PreservingWrapper()', file dom/base/nsWrapperCache.h, line 113

* Crash at nsXPCOMCycleCollectionParticipant::CheckForRightISupports.  The crash address is often 0x0 but sometimes "random" (0x10584745, 0x10be15bf).
Comment 1 Jesse Ruderman 2012-06-04 15:53:26 PDT
Created attachment 629978 [details]
stack traces
Comment 2 Jesse Ruderman 2012-06-04 15:55:41 PDT
In a nightly, the crash looks different: bp-fe82b53d-61ef-47e9-bb33-4ccef2120604
Comment 3 Andrew McCreight [:mccr8] 2012-06-04 18:04:15 PDT
I can take a look.

That stack in comm 2 looks bogus.
Comment 4 Bobby Holley (:bholley) (busy with Stylo) 2012-06-05 01:24:12 PDT
There seem to be a few other adoptNode crashes as well: bug 761499, and bug 761507 (as well as bug 751995, whose fix just landed).

adoptNode means brain transplants, and we already know from bug 752764 that that's busted. So it's worth checking to see if the patches from bug 758415 fix these.
Comment 5 Andrew McCreight [:mccr8] 2012-06-05 14:03:39 PDT
So bug 758415 alone should fix bug 752764 and possibly this?  I'll check it out.
Comment 6 Andrew McCreight [:mccr8] 2012-06-05 15:22:17 PDT
This still crashes on a recent-ish tree with the patches in bug 758415 and bug 751995 applied.
Comment 7 Andrew McCreight [:mccr8] 2012-06-05 17:40:14 PDT
Unsurprisingly, the !PreservingWrapper() assertion is happening during finalization:

0 XPCWrappedNative::FlatJSObjectFinalized() + 335 (nsWrapperCache.h:113)
1 WrappedNativeFinalize(js::FreeOp*, JSObject*, WNHelperType) + 291 (XPCWrappedNativeJSOps.cpp:618)
2 bool js::gc::Arena::finalize<JSObject>(js::FreeOp*, js::gc::AllocKind, unsigned long) + 837
3 js::gc::FinalizeArenas(js::FreeOp*, js::gc::ArenaLists::ArenaList*, js::gc::AllocKind) + 455
4 js::gc::ArenaLists::finalizeObjects(js::FreeOp*) + 36
5 SweepPhase(JSRuntime*, js::JSGCInvocationKind, bool*) + 2016

The NS_ERROR_OUT_OF_MEMORY is coming from the call to ReparentWrappedNativeIfFound in nsNodeUtils::CloneAndAdopt.

It gets to ReparentWrappedNativeIfFound from a call to XPCWrappedNative::ReparentWrapperIfFound, where it comes from this particular point:
            JSObject *propertyHolder =
                JS_NewObjectWithGivenProto(ccx, NULL, NULL, aNewParent);
            if (!propertyHolder || !JS_CopyPropertiesFrom(ccx, propertyHolder, flat))
                return NS_ERROR_OUT_OF_MEMORY;

JS_CopyPropertiesFrom is returning false, because this compartment->wrap check is failing:
        Value v = shape->hasSlot() ? obj->getSlot(shape->slot()) : UndefinedValue();
        if (!cx->compartment->wrap(cx, &v))
            return false;

No idea what any of that means, though. ;)
Comment 8 Bobby Holley (:bholley) (busy with Stylo) 2012-06-06 02:16:40 PDT
(In reply to Andrew McCreight [:mccr8] from comment #7)
> JS_CopyPropertiesFrom is returning false, because this compartment->wrap
> check is failing:
>         Value v = shape->hasSlot() ? obj->getSlot(shape->slot()) :
> UndefinedValue();
>         if (!cx->compartment->wrap(cx, &v))
>             return false;
> 
> No idea what any of that means, though. ;)

So, why is wrap() returning false? OOM?
Comment 9 Andrew McCreight [:mccr8] 2012-06-07 13:56:05 PDT
Doesn't seem to affect 14, so perhaps it is a regression from 15.
Comment 10 Al Billings [:abillings] 2012-06-07 14:08:20 PDT
Marking ESR as unaffected since this doesn't affect 14 or earlier.
Comment 11 Andrew McCreight [:mccr8] 2012-06-29 11:41:25 PDT
(In reply to Bobby Holley (:bholley) from comment #8)
> So, why is wrap() returning false? OOM?

I dug into this some more.  It looks like wrap is returning false because we're calling DirectWrapper::New on an XML object, which isn't supported.  That makes New return null, which makes WrapperFactory::Rewrap return null, which makes wrap() return false.  Here's what the stack trace looks like (line numbers are probably off a bit because I added a bunch of asserts):

#0  0x0000000103549429 in js::DirectWrapper::New (cx=<value temporarily unavailable, due to optimizations>, obj=<value temporarily unavailable, due to optimizations>, proto=<value temporarily unavailable, due to optimizations>, parent=<value temporarily unavailable, due to optimizations>, handler=<value temporarily unavailable, due to optimizations>) at /Users/amccreight/mz/cent3/js/src/jswrapper.cpp:315
#1  0x000000010265cf63 in xpc::WrapperFactory::Rewrap (cx=0x11bc06f60, obj=0x127edf0d0, wrappedProto=0x127eea240, parent=0x127ee5060, flags=0) at /Users/amccreight/mz/cent3/js/xpconnect/wrappers/WrapperFactory.cpp:448
#2  0x00000001033bc49d in JSCompartment::wrap (this=<value temporarily unavailable, due to optimizations>, cx=0x11bc06f60, vp=0x7fff5fbfa020) at /Users/amccreight/mz/cent3/js/src/jscompartment.cpp:275
#3  0x00000001033bce21 in JSCompartment::wrap (this=<value temporarily unavailable, due to optimizations>, cx=<value temporarily unavailable, due to optimizations>, objp=0x7fff5fbfa0a0) at /Users/amccreight/mz/cent3/js/src/jscompartment.cpp:327
#4  0x00000001033bc475 in JSCompartment::wrap (this=<value temporarily unavailable, due to optimizations>, cx=0x11bc06f60, vp=0x7fff5fbfa2b0) at /Users/amccreight/mz/cent3/js/src/jscompartment.cpp:265
#5  0x00000001034734b5 in JS_CopyPropertiesFrom (cx=0x11bc06f60, target=0x127ee7160, obj=0x127ed4460) at /Users/amccreight/mz/cent3/js/src/jsobj.cpp:3231

I guess that makes sense, given that the test case involves E4X.

I can't imagine that's very fixable, so maybe something along the error recovery path needs to be fixed.
Comment 12 Bobby Holley (:bholley) (busy with Stylo) 2012-06-29 11:59:59 PDT
Ah hm, right.

Note that this bug is effectively fixed on nightly, since e4x has been preffed off for content (on its way down Deprecation Drive to being removed completely).

http://mozillamemes.tumblr.com/post/20381316930/inspired-by-a-bugfix-which-caused-a

If we want to fix this on nightly, then we have to make the error recovery more graceful. This can hard, because we're kind of committed after doing the brain transplant. Is it the first call to JS_CopyPropertiesFrom that's failing, or the second?
Comment 13 Andrew McCreight [:mccr8] 2012-06-29 12:12:11 PDT
It isn't off yet (see bug 765890) but hopefully it will be soon.

This is after the first call XPCWrappedNative::ReparentWrapperIfFound:

            JSObject *propertyHolder =
                JS_NewObjectWithGivenProto(ccx, NULL, NULL, aNewParent);
            if (!propertyHolder || !JS_CopyPropertiesFrom(ccx, propertyHolder, flat))
                return NS_ERROR_OUT_OF_MEMORY;

(it reports OOM even though this isn't actually OOM)
Comment 14 Andrew McCreight [:mccr8] 2012-06-29 13:28:56 PDT
We could of course just abort the browser if this fails. All of these untested failure paths always make me nervous. ;)
Comment 15 Bobby Holley (:bholley) (busy with Stylo) 2012-06-29 13:36:18 PDT
(In reply to Andrew McCreight [:mccr8] from comment #14)
> We could of course just abort the browser if this fails. All of these
> untested failure paths always make me nervous. ;)

Yeah, though a very easy content-triggerable abort isn't great. Better than a security hole to be sure, but is such a hack worth backporting?
Comment 16 Jesse Ruderman 2012-07-12 20:05:03 PDT
Created attachment 641706 [details]
testcase 2

This variant doesn't require CC.

adoptNode threw: [Exception... "Component returned failure code: 0x8007000e (NS_ERROR_OUT_OF_MEMORY) [nsIDOMHTMLDocument.adoptNode]"  nsresult: "0x8007000e (NS_ERROR_OUT_OF_MEMORY)"  location: "JS frame :: file:///Users/jruderman/Desktop/o.html :: boom :: line 13"  data: no]

###!!! ABORT: bad scope for new JSObjects: 'js::IsObjectInContextCompartment(lccx.GetScopeForNewJSObjects(), cx)', file js/xpconnect/src/XPCConvert.cpp, line 846
Comment 17 Andrew McCreight [:mccr8] 2012-07-25 13:53:05 PDT
Yeah, the new test also hits the same failure, attempting to wrap an E4X object.

The function is now Wrapper::New instead of DirectWrapper::New.
Comment 18 Andrew McCreight [:mccr8] 2012-07-25 18:55:22 PDT
I messed around with this a little.  I moved the chunk of code in XPCWrappedNative::ReparentWrapperIfFound that sets up newobj and property holder before the chunk that does the newMap/oldMap stuff.  The idea is that we only mess around with global state once we've actually successfully created the new objects.

That makes Jesse's second example crash with a stack similar to the first, which may be an improvement.  I investigated a bit, and it looks like the problem is that mJSHolders contains an object that has already been freed.  It looks like that object was added to mJSHolders by AutoWrapperChanger.  I'm not sure why aCOMObject from ReparentWrapperIfFound is getting freed without being removed.
Comment 19 Andrew McCreight [:mccr8] 2012-07-26 11:33:52 PDT
After some more investigation (I guess that sounds more dignified than "messing around"), I've figured out what the other problem here is.  In XPCWrappedNative::ReparentWrapperIfFound, we do a JS_CloneObject of |flat|, creating |newobj|.  |flat| has the private |aCOMObj|, so |newobj| does, too.  Then we try to do a JS_CopyPropertiesFrom, but it fails, so we return.

|newobj| is garbage, so it is GCed.  When we finalize it, we call ClearWrapper() on |aCOMObj|, which has |flat| as its reflector.  It is still actually alive, so it is still preserved, thus resulting in our first assertion, "Clearing a preserved wrapper".  I'm guessing what happens there is that we are tearing down ClearWrapper() without the CC, so we don't call Unlink(), so we never call ReleaseWrapper() on it, even though it is dead, so we get a dead point in the XPC data structure, causing the CC crash.

The fix for this precise test case is simple: just NULL out the private before we return.  I'm not really sure what the best way to fix this along all paths is.  Maybe just null out the private immediately, and set it when we're ready to put a good value in.
Comment 20 Bobby Holley (:bholley) (busy with Stylo) 2012-07-26 13:29:19 PDT
Sigh. We're totally not set up to deal with failures in ReparentWrapperIfFound - we're changing a dizzying amount of intricate state. Can we just MOZ_CRASH on the failure points we don't expect to hit, and very explicitly handle the ones that can happen?

mccr8++ for digging into this mess.
Comment 21 Bobby Holley (:bholley) (busy with Stylo) 2012-07-27 01:59:00 PDT
Um, I don't think we need qa or a regression window. This is a regression from CPG.
Comment 22 Andrew McCreight [:mccr8] 2012-07-27 17:24:30 PDT
Created attachment 646774 [details] [diff] [review]
allocate the new obj before we mess with maps, catch double reflectors, WIP
Comment 23 Andrew McCreight [:mccr8] 2012-07-29 08:32:48 PDT
Try run looked good https://tbpl.mozilla.org/?tree=Try&rev=164ce213b6bf
Comment 24 Andrew McCreight [:mccr8] 2012-07-29 09:33:29 PDT
Created attachment 646977 [details] [diff] [review]
part 1: clone before we adjust XPC maps

Perform the cloning before we adjust the global maps.  This turns the failure of testcase 2 into the failure of testcase 1.  I also moved the declarations of oldMap and newMap inside the scope for the lock, to make their extent more clear.  I also changed the CopyPropertiesFrom failure to return NS_ERROR_FAILURE instead of NS_ERROR_OUT_OF_MEMORY.
Comment 25 Andrew McCreight [:mccr8] 2012-07-29 09:52:08 PDT
Created attachment 646980 [details] [diff] [review]
part 2: guard against double reflectors on failure

This patch fixes the problem where after we clone, but before we clear the old private, we have two reflectors pointing to a single native. My solution is to use an RAII class that nulls out the private of the clone if we haven't nulled out the private of the new class, because there are a lot of possible error paths. My initial version didn't work, because it checked the private of the original object, but on successful exit the original object has been turned into something else so reading the private fails. Instead, here we have a boolean flag to indicate what state the class is in.

This fixes the two test cases. Unfortunately, I don't have a test, because E4X is going to be turned off soon. It might be a good idea to create a test-only way to create an unwrappable object, which would allow us to test and fuzz wrapping failures.
Comment 26 Andrew McCreight [:mccr8] 2012-07-29 10:02:25 PDT
Created attachment 646981 [details] [diff] [review]
part 3: don't try to be a hero

Bobby recommended that we just make all failures fatal that we can't handle, so that's what this patch does.  It turns all failures in the latter half of the function fatal. We're probably in a totally messed up state that is going to crash anyways. We could probably disentangle a few of these failures if we find some way to actually trigger them, but they don't seem particularly likely to fail outside of an OOM scenario.
Comment 27 Bobby Holley (:bholley) (busy with Stylo) 2012-07-30 01:44:54 PDT
Comment on attachment 646981 [details] [diff] [review]
part 3: don't try to be a hero

># HG >             // Before proceeding, eagerly create any same-compartment security wrappers
>             // that the object might have. This forces us to take the 'WithWrapper' path
>-            // while transplanting that handles this stuff correctly.
>+            // while transplanting that handles this stuff correctly. This could probably
>+            // be moved before updating the scope maps, then made to have a non-fatal
>+            // failure.

Can we just move it, possibly in another patch? There's really no reason at all it has to be here.

>             SetSlimWrapperProto(flat, newProto.get());
>-            if (!JS_SetPrototype(ccx, flat, newProto->GetJSProtoObject())) {
>-                // this is bad, very bad
>-                SetSlimWrapperProto(flat, nsnull);
>-                NS_ERROR("JS_SetPrototype failed");
>-                return NS_ERROR_FAILURE;
>-            }
>+            if (!JS_SetPrototype(ccx, flat, newProto->GetJSProtoObject()))
>+                MOZ_CRASH(); // this is bad, very bad

We should be careful here about interactions with bug 776019. I think crashing here is the right call, but that has implications about how we approach the other bug.
Comment 28 Bobby Holley (:bholley) (busy with Stylo) 2012-07-30 01:50:10 PDT
Comment on attachment 646980 [details] [diff] [review]
part 2: guard against double reflectors on failure

Given that in the next patch we start fatally aborting after the ExpandoObject stuff, I don't think we need to wait so long before nulling out the private. So I think we should move that stuff earlier so that the length of this double jeopardy is minimized.

I would also prefer to avoid using the guard to do the actual private clearing. I'd rather the guard just be a (scoped?) guard, so that we don't have to think about it for non-error cases. After it goes away, we can manually null out the private of the other object.
Comment 29 Andrew McCreight [:mccr8] 2012-07-30 07:04:52 PDT
(In reply to Bobby Holley (:bholley) from comment #27)
> Can we just move it, possibly in another patch? There's really no reason at
> all it has to be here.
Sure, I can just manually test failures in there to see if anything bad happens.

> We should be careful here about interactions with bug 776019. I think
> crashing here is the right call, but that has implications about how we
> approach the other bug.

Okay, but I don't understand that other bug at all, so you'll have to be more explicit if there's something different I should do here.

Your suggestions for part 2 sound like a good idea, I'll try that.
Comment 30 Bobby Holley (:bholley) (busy with Stylo) 2012-07-30 07:36:58 PDT
(In reply to Andrew McCreight [:mccr8] from comment #29)
> (In reply to Bobby Holley (:bholley) from comment #27)
> > Can we just move it, possibly in another patch? There's really no reason at
> > all it has to be here.
> Sure, I can just manually test failures in there to see if anything bad
> happens.
> 
> > We should be careful here about interactions with bug 776019. I think
> > crashing here is the right call, but that has implications about how we
> > approach the other bug.
> 
> Okay, but I don't understand that other bug at all, so you'll have to be
> more explicit if there's something different I should do here.

It's just a question of whether we allow script to do things that make JS_SetPrototype fail. I advocate not, for reasons like this one.
Comment 31 Andrew McCreight [:mccr8] 2012-07-30 11:32:49 PDT
Created attachment 647225 [details] [diff] [review]
part 2: guard against double reflectors on failure

In this new version of the patch, I move back the setting of the private, and limit the scope to include only the part where the private is non-null.  The guard now just checks if the private of the old object is null on exit.  Carrying forward bholley's r+.
Comment 32 Andrew McCreight [:mccr8] 2012-07-30 11:35:44 PDT
Created attachment 647227 [details] [diff] [review]
part 3: create sec wrappers before adjusting maps

This is a pretty bad diff, but what it does is move the same compartment wrapper stuff before the global XPC maps are adjusted.  I tried making the same-compartment test always fail, and ran a few test cases that do adoptNode, and it seemed to run without causing any crashes or assertions.
Comment 33 Andrew McCreight [:mccr8] 2012-07-30 11:38:28 PDT
Created attachment 647230 [details] [diff] [review]
part 4: don't try to be a hero

Basically the same as before, except we don't convert the same compartment security wrapper failure into a crash, because we do it before we adjust the global map.  Carrying forward bholley's r+.
Comment 34 Bobby Holley (:bholley) (busy with Stylo) 2012-07-30 12:47:31 PDT
Comment on attachment 647227 [details] [diff] [review]
part 3: create sec wrappers before adjusting maps

r=bholley on the understanding that there are no snakes in that diff. :-)
Comment 37 Andrew McCreight [:mccr8] 2012-08-06 13:25:59 PDT
Comment on attachment 646977 [details] [diff] [review]
part 1: clone before we adjust XPC maps

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 650353
User impact if declined: security vulnerabilities
Testing completed (on m-c, etc.): It has been on m-c for almost a week. I don't see any crashes that look like they are from this.
String or UUID changes made by this patch: none
Risk to taking this patch (and alternatives if risky): I don't think it should be too bad. This series of patches is pretty huge, but is mostly just moving around blocks of code that should be independent, and handling an error that is likely to be rare. The final part of the patch makes a number of errors that should be rare into fatal crashes (and thus will show up on crash stats as being due to this patch). I'll probably let this sit on aurora for a few days before putting it to beta.
Comment 38 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-06 14:44:53 PDT
Comment on attachment 647230 [details] [diff] [review]
part 4: don't try to be a hero

Good idea to let this bake a bit, so approving Aurora for starters.
Comment 40 Lukas Blakk [:lsblakk] use ?needinfo 2012-08-09 10:05:26 PDT
Comment on attachment 647230 [details] [diff] [review]
part 4: don't try to be a hero

It's been on Aurora a few days now, no fallout reported in this bug so approving for Beta.
Comment 41 Andrew McCreight [:mccr8] 2012-08-09 14:23:06 PDT
Thanks. Yeah, I looked just now, and I don't see any aborts triggering in XPConnect, nor any XPCWrappedNative related crashes.
https://hg.mozilla.org/releases/mozilla-beta/rev/d5fe000f51bf
https://hg.mozilla.org/releases/mozilla-beta/rev/c6e67e5626e2
https://hg.mozilla.org/releases/mozilla-beta/rev/57841381e779
https://hg.mozilla.org/releases/mozilla-beta/rev/757d9c0a8d0a
Comment 42 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-08-21 15:36:51 PDT
Confirmed both testcases crash 2012-06-04 Firefox 15.0a1 Nightly Debug. 

Does not crash:
2012-08-01 Firefox 17.0a1 Nightly Debug
2012-08-06 Firefox 16.0a2 Aurora Debug
2012-08-09 Firefox 15.0 Beta Debug
Comment 43 Andrew McCreight [:mccr8] 2012-09-24 10:49:53 PDT
The test cases here require E4X, so they aren't going to work any more on trunk. Having some kind of debug-only "unwrappable thing" could possibly be useful for ferreting out these kinds of errors, but I'm not sure how bad it would be to add something like that.

Note You need to log in before you can comment on or make changes to this bug.