Closed Bug 761422 Opened 8 years ago Closed 7 years ago

Handle failures in ReparentWrapperIfFound more effectively

Categories

(Core :: XPConnect, defect, critical)

x86_64
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla17
Tracking Status
firefox14 --- unaffected
firefox15 + verified
firefox16 + verified
firefox17 + verified
firefox-esr10 --- unaffected

People

(Reporter: jruderman, Assigned: mccr8)

References

(Blocks 2 open bugs)

Details

(5 keywords, Whiteboard: [advisory-tracking-])

Attachments

(7 files, 3 obsolete files)

285 bytes, text/html
Details
5.15 KB, text/plain
Details
385 bytes, text/html
Details
4.29 KB, patch
bholley
: review+
Details | Diff | Splinter Review
6.13 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
5.12 KB, patch
bholley
: review+
Details | Diff | Splinter Review
4.85 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
When loading the testcase:

* Component returned failure code: 0x8007000e (NS_ERROR_OUT_OF_MEMORY) [nsIDOMHTMLDocument.adoptNode]

During the next cycle collection:

* ###!!! ASSERTION: Clearing a preserved wrapper!: '!PreservingWrapper()', file dom/base/nsWrapperCache.h, line 113

* Crash at nsXPCOMCycleCollectionParticipant::CheckForRightISupports.  The crash address is often 0x0 but sometimes "random" (0x10584745, 0x10be15bf).
Attached file stack traces
In a nightly, the crash looks different: bp-fe82b53d-61ef-47e9-bb33-4ccef2120604
Keywords: sec-critical
I can take a look.

That stack in comm 2 looks bogus.
Assignee: nobody → continuation
There seem to be a few other adoptNode crashes as well: bug 761499, and bug 761507 (as well as bug 751995, whose fix just landed).

adoptNode means brain transplants, and we already know from bug 752764 that that's busted. So it's worth checking to see if the patches from bug 758415 fix these.
So bug 758415 alone should fix bug 752764 and possibly this?  I'll check it out.
This still crashes on a recent-ish tree with the patches in bug 758415 and bug 751995 applied.
Unsurprisingly, the !PreservingWrapper() assertion is happening during finalization:

0 XPCWrappedNative::FlatJSObjectFinalized() + 335 (nsWrapperCache.h:113)
1 WrappedNativeFinalize(js::FreeOp*, JSObject*, WNHelperType) + 291 (XPCWrappedNativeJSOps.cpp:618)
2 bool js::gc::Arena::finalize<JSObject>(js::FreeOp*, js::gc::AllocKind, unsigned long) + 837
3 js::gc::FinalizeArenas(js::FreeOp*, js::gc::ArenaLists::ArenaList*, js::gc::AllocKind) + 455
4 js::gc::ArenaLists::finalizeObjects(js::FreeOp*) + 36
5 SweepPhase(JSRuntime*, js::JSGCInvocationKind, bool*) + 2016

The NS_ERROR_OUT_OF_MEMORY is coming from the call to ReparentWrappedNativeIfFound in nsNodeUtils::CloneAndAdopt.

It gets to ReparentWrappedNativeIfFound from a call to XPCWrappedNative::ReparentWrapperIfFound, where it comes from this particular point:
            JSObject *propertyHolder =
                JS_NewObjectWithGivenProto(ccx, NULL, NULL, aNewParent);
            if (!propertyHolder || !JS_CopyPropertiesFrom(ccx, propertyHolder, flat))
                return NS_ERROR_OUT_OF_MEMORY;

JS_CopyPropertiesFrom is returning false, because this compartment->wrap check is failing:
        Value v = shape->hasSlot() ? obj->getSlot(shape->slot()) : UndefinedValue();
        if (!cx->compartment->wrap(cx, &v))
            return false;

No idea what any of that means, though. ;)
(In reply to Andrew McCreight [:mccr8] from comment #7)
> JS_CopyPropertiesFrom is returning false, because this compartment->wrap
> check is failing:
>         Value v = shape->hasSlot() ? obj->getSlot(shape->slot()) :
> UndefinedValue();
>         if (!cx->compartment->wrap(cx, &v))
>             return false;
> 
> No idea what any of that means, though. ;)

So, why is wrap() returning false? OOM?
Doesn't seem to affect 14, so perhaps it is a regression from 15.
Marking ESR as unaffected since this doesn't affect 14 or earlier.
(In reply to Bobby Holley (:bholley) from comment #8)
> So, why is wrap() returning false? OOM?

I dug into this some more.  It looks like wrap is returning false because we're calling DirectWrapper::New on an XML object, which isn't supported.  That makes New return null, which makes WrapperFactory::Rewrap return null, which makes wrap() return false.  Here's what the stack trace looks like (line numbers are probably off a bit because I added a bunch of asserts):

#0  0x0000000103549429 in js::DirectWrapper::New (cx=<value temporarily unavailable, due to optimizations>, obj=<value temporarily unavailable, due to optimizations>, proto=<value temporarily unavailable, due to optimizations>, parent=<value temporarily unavailable, due to optimizations>, handler=<value temporarily unavailable, due to optimizations>) at /Users/amccreight/mz/cent3/js/src/jswrapper.cpp:315
#1  0x000000010265cf63 in xpc::WrapperFactory::Rewrap (cx=0x11bc06f60, obj=0x127edf0d0, wrappedProto=0x127eea240, parent=0x127ee5060, flags=0) at /Users/amccreight/mz/cent3/js/xpconnect/wrappers/WrapperFactory.cpp:448
#2  0x00000001033bc49d in JSCompartment::wrap (this=<value temporarily unavailable, due to optimizations>, cx=0x11bc06f60, vp=0x7fff5fbfa020) at /Users/amccreight/mz/cent3/js/src/jscompartment.cpp:275
#3  0x00000001033bce21 in JSCompartment::wrap (this=<value temporarily unavailable, due to optimizations>, cx=<value temporarily unavailable, due to optimizations>, objp=0x7fff5fbfa0a0) at /Users/amccreight/mz/cent3/js/src/jscompartment.cpp:327
#4  0x00000001033bc475 in JSCompartment::wrap (this=<value temporarily unavailable, due to optimizations>, cx=0x11bc06f60, vp=0x7fff5fbfa2b0) at /Users/amccreight/mz/cent3/js/src/jscompartment.cpp:265
#5  0x00000001034734b5 in JS_CopyPropertiesFrom (cx=0x11bc06f60, target=0x127ee7160, obj=0x127ed4460) at /Users/amccreight/mz/cent3/js/src/jsobj.cpp:3231

I guess that makes sense, given that the test case involves E4X.

I can't imagine that's very fixable, so maybe something along the error recovery path needs to be fixed.
Ah hm, right.

Note that this bug is effectively fixed on nightly, since e4x has been preffed off for content (on its way down Deprecation Drive to being removed completely).

http://mozillamemes.tumblr.com/post/20381316930/inspired-by-a-bugfix-which-caused-a

If we want to fix this on nightly, then we have to make the error recovery more graceful. This can hard, because we're kind of committed after doing the brain transplant. Is it the first call to JS_CopyPropertiesFrom that's failing, or the second?
It isn't off yet (see bug 765890) but hopefully it will be soon.

This is after the first call XPCWrappedNative::ReparentWrapperIfFound:

            JSObject *propertyHolder =
                JS_NewObjectWithGivenProto(ccx, NULL, NULL, aNewParent);
            if (!propertyHolder || !JS_CopyPropertiesFrom(ccx, propertyHolder, flat))
                return NS_ERROR_OUT_OF_MEMORY;

(it reports OOM even though this isn't actually OOM)
Depends on: 765890
We could of course just abort the browser if this fails. All of these untested failure paths always make me nervous. ;)
(In reply to Andrew McCreight [:mccr8] from comment #14)
> We could of course just abort the browser if this fails. All of these
> untested failure paths always make me nervous. ;)

Yeah, though a very easy content-triggerable abort isn't great. Better than a security hole to be sure, but is such a hack worth backporting?
Attached file testcase 2
This variant doesn't require CC.

adoptNode threw: [Exception... "Component returned failure code: 0x8007000e (NS_ERROR_OUT_OF_MEMORY) [nsIDOMHTMLDocument.adoptNode]"  nsresult: "0x8007000e (NS_ERROR_OUT_OF_MEMORY)"  location: "JS frame :: file:///Users/jruderman/Desktop/o.html :: boom :: line 13"  data: no]

###!!! ABORT: bad scope for new JSObjects: 'js::IsObjectInContextCompartment(lccx.GetScopeForNewJSObjects(), cx)', file js/xpconnect/src/XPCConvert.cpp, line 846
Yeah, the new test also hits the same failure, attempting to wrap an E4X object.

The function is now Wrapper::New instead of DirectWrapper::New.
I messed around with this a little.  I moved the chunk of code in XPCWrappedNative::ReparentWrapperIfFound that sets up newobj and property holder before the chunk that does the newMap/oldMap stuff.  The idea is that we only mess around with global state once we've actually successfully created the new objects.

That makes Jesse's second example crash with a stack similar to the first, which may be an improvement.  I investigated a bit, and it looks like the problem is that mJSHolders contains an object that has already been freed.  It looks like that object was added to mJSHolders by AutoWrapperChanger.  I'm not sure why aCOMObject from ReparentWrapperIfFound is getting freed without being removed.
After some more investigation (I guess that sounds more dignified than "messing around"), I've figured out what the other problem here is.  In XPCWrappedNative::ReparentWrapperIfFound, we do a JS_CloneObject of |flat|, creating |newobj|.  |flat| has the private |aCOMObj|, so |newobj| does, too.  Then we try to do a JS_CopyPropertiesFrom, but it fails, so we return.

|newobj| is garbage, so it is GCed.  When we finalize it, we call ClearWrapper() on |aCOMObj|, which has |flat| as its reflector.  It is still actually alive, so it is still preserved, thus resulting in our first assertion, "Clearing a preserved wrapper".  I'm guessing what happens there is that we are tearing down ClearWrapper() without the CC, so we don't call Unlink(), so we never call ReleaseWrapper() on it, even though it is dead, so we get a dead point in the XPC data structure, causing the CC crash.

The fix for this precise test case is simple: just NULL out the private before we return.  I'm not really sure what the best way to fix this along all paths is.  Maybe just null out the private immediately, and set it when we're ready to put a good value in.
Sigh. We're totally not set up to deal with failures in ReparentWrapperIfFound - we're changing a dizzying amount of intricate state. Can we just MOZ_CRASH on the failure points we don't expect to hit, and very explicitly handle the ones that can happen?

mccr8++ for digging into this mess.
Keywords: regression
Um, I don't think we need qa or a regression window. This is a regression from CPG.
Blocks: cpg
Perform the cloning before we adjust the global maps.  This turns the failure of testcase 2 into the failure of testcase 1.  I also moved the declarations of oldMap and newMap inside the scope for the lock, to make their extent more clear.  I also changed the CopyPropertiesFrom failure to return NS_ERROR_FAILURE instead of NS_ERROR_OUT_OF_MEMORY.
Attachment #646774 - Attachment is obsolete: true
Attachment #646977 - Attachment description: clone before we adjust XPC maps → part 1: clone before we adjust XPC maps
This patch fixes the problem where after we clone, but before we clear the old private, we have two reflectors pointing to a single native. My solution is to use an RAII class that nulls out the private of the clone if we haven't nulled out the private of the new class, because there are a lot of possible error paths. My initial version didn't work, because it checked the private of the original object, but on successful exit the original object has been turned into something else so reading the private fails. Instead, here we have a boolean flag to indicate what state the class is in.

This fixes the two test cases. Unfortunately, I don't have a test, because E4X is going to be turned off soon. It might be a good idea to create a test-only way to create an unwrappable object, which would allow us to test and fuzz wrapping failures.
Attached patch part 3: don't try to be a hero (obsolete) — Splinter Review
Bobby recommended that we just make all failures fatal that we can't handle, so that's what this patch does.  It turns all failures in the latter half of the function fatal. We're probably in a totally messed up state that is going to crash anyways. We could probably disentangle a few of these failures if we find some way to actually trigger them, but they don't seem particularly likely to fail outside of an OOM scenario.
Attachment #646977 - Flags: review?(bobbyholley+bmo)
Attachment #646980 - Flags: review?(bobbyholley+bmo)
Attachment #646981 - Flags: review?(bobbyholley+bmo)
Attachment #646977 - Flags: review?(bobbyholley+bmo) → review+
Comment on attachment 646981 [details] [diff] [review]
part 3: don't try to be a hero

># HG >             // Before proceeding, eagerly create any same-compartment security wrappers
>             // that the object might have. This forces us to take the 'WithWrapper' path
>-            // while transplanting that handles this stuff correctly.
>+            // while transplanting that handles this stuff correctly. This could probably
>+            // be moved before updating the scope maps, then made to have a non-fatal
>+            // failure.

Can we just move it, possibly in another patch? There's really no reason at all it has to be here.

>             SetSlimWrapperProto(flat, newProto.get());
>-            if (!JS_SetPrototype(ccx, flat, newProto->GetJSProtoObject())) {
>-                // this is bad, very bad
>-                SetSlimWrapperProto(flat, nsnull);
>-                NS_ERROR("JS_SetPrototype failed");
>-                return NS_ERROR_FAILURE;
>-            }
>+            if (!JS_SetPrototype(ccx, flat, newProto->GetJSProtoObject()))
>+                MOZ_CRASH(); // this is bad, very bad

We should be careful here about interactions with bug 776019. I think crashing here is the right call, but that has implications about how we approach the other bug.
Attachment #646981 - Flags: review?(bobbyholley+bmo) → review+
Comment on attachment 646980 [details] [diff] [review]
part 2: guard against double reflectors on failure

Given that in the next patch we start fatally aborting after the ExpandoObject stuff, I don't think we need to wait so long before nulling out the private. So I think we should move that stuff earlier so that the length of this double jeopardy is minimized.

I would also prefer to avoid using the guard to do the actual private clearing. I'd rather the guard just be a (scoped?) guard, so that we don't have to think about it for non-error cases. After it goes away, we can manually null out the private of the other object.
Attachment #646980 - Flags: review?(bobbyholley+bmo) → review+
(In reply to Bobby Holley (:bholley) from comment #27)
> Can we just move it, possibly in another patch? There's really no reason at
> all it has to be here.
Sure, I can just manually test failures in there to see if anything bad happens.

> We should be careful here about interactions with bug 776019. I think
> crashing here is the right call, but that has implications about how we
> approach the other bug.

Okay, but I don't understand that other bug at all, so you'll have to be more explicit if there's something different I should do here.

Your suggestions for part 2 sound like a good idea, I'll try that.
(In reply to Andrew McCreight [:mccr8] from comment #29)
> (In reply to Bobby Holley (:bholley) from comment #27)
> > Can we just move it, possibly in another patch? There's really no reason at
> > all it has to be here.
> Sure, I can just manually test failures in there to see if anything bad
> happens.
> 
> > We should be careful here about interactions with bug 776019. I think
> > crashing here is the right call, but that has implications about how we
> > approach the other bug.
> 
> Okay, but I don't understand that other bug at all, so you'll have to be
> more explicit if there's something different I should do here.

It's just a question of whether we allow script to do things that make JS_SetPrototype fail. I advocate not, for reasons like this one.
In this new version of the patch, I move back the setting of the private, and limit the scope to include only the part where the private is non-null.  The guard now just checks if the private of the old object is null on exit.  Carrying forward bholley's r+.
Attachment #646980 - Attachment is obsolete: true
Attachment #647225 - Flags: review+
This is a pretty bad diff, but what it does is move the same compartment wrapper stuff before the global XPC maps are adjusted.  I tried making the same-compartment test always fail, and ran a few test cases that do adoptNode, and it seemed to run without causing any crashes or assertions.
Attachment #647227 - Flags: review?(bobbyholley+bmo)
Basically the same as before, except we don't convert the same compartment security wrapper failure into a crash, because we do it before we adjust the global map.  Carrying forward bholley's r+.
Attachment #646981 - Attachment is obsolete: true
Attachment #647230 - Flags: review+
Comment on attachment 647227 [details] [diff] [review]
part 3: create sec wrappers before adjusting maps

r=bholley on the understanding that there are no snakes in that diff. :-)
Attachment #647227 - Flags: review?(bobbyholley+bmo) → review+
Summary: Crash with adoptNode, E4X → Handle failures in ReparentWrapperIfFound more effectively
Comment on attachment 646977 [details] [diff] [review]
part 1: clone before we adjust XPC maps

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 650353
User impact if declined: security vulnerabilities
Testing completed (on m-c, etc.): It has been on m-c for almost a week. I don't see any crashes that look like they are from this.
String or UUID changes made by this patch: none
Risk to taking this patch (and alternatives if risky): I don't think it should be too bad. This series of patches is pretty huge, but is mostly just moving around blocks of code that should be independent, and handling an error that is likely to be rare. The final part of the patch makes a number of errors that should be rare into fatal crashes (and thus will show up on crash stats as being due to this patch). I'll probably let this sit on aurora for a few days before putting it to beta.
Attachment #646977 - Flags: approval-mozilla-beta?
Attachment #646977 - Flags: approval-mozilla-aurora?
Attachment #647225 - Flags: approval-mozilla-beta?
Attachment #647225 - Flags: approval-mozilla-aurora?
Attachment #647227 - Flags: approval-mozilla-beta?
Attachment #647227 - Flags: approval-mozilla-aurora?
Attachment #647230 - Flags: approval-mozilla-beta?
Attachment #647230 - Flags: approval-mozilla-aurora?
Attachment #646977 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #647225 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #647227 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 647230 [details] [diff] [review]
part 4: don't try to be a hero

Good idea to let this bake a bit, so approving Aurora for starters.
Attachment #647230 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #646977 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #647225 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #647227 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 647230 [details] [diff] [review]
part 4: don't try to be a hero

It's been on Aurora a few days now, no fallout reported in this bug so approving for Beta.
Attachment #647230 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Confirmed both testcases crash 2012-06-04 Firefox 15.0a1 Nightly Debug. 

Does not crash:
2012-08-01 Firefox 17.0a1 Nightly Debug
2012-08-06 Firefox 16.0a2 Aurora Debug
2012-08-09 Firefox 15.0 Beta Debug
Status: RESOLVED → VERIFIED
Keywords: verifyme
QA Contact: anthony.s.hughes
Whiteboard: [advisory-tracking-]
Group: core-security
The test cases here require E4X, so they aren't going to work any more on trunk. Having some kind of debug-only "unwrappable thing" could possibly be useful for ferreting out these kinds of errors, but I'm not sure how bad it would be to add something like that.
Flags: in-testsuite-
You need to log in before you can comment on or make changes to this bug.