Closed Bug 791896 Opened 12 years ago Closed 7 years ago

Cross compartment pointers in ReparentWrapperIfFound

Categories

(Core :: XPConnect, defect)

x86_64
macOS
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox15 --- wontfix
firefox16 - wontfix
firefox17 - wontfix
firefox18 - wontfix
firefox19 - wontfix
firefox20 - wontfix
firefox21 - wontfix
firefox-esr10 --- unaffected

People

(Reporter: jruderman, Unassigned)

References

Details

(Keywords: assertion, sec-other, testcase)

Attachments

(4 files, 5 obsolete files)

Attached file testcase
1. Install https://www.squarefree.com/extensions/domFuzzLite3.xpi 2. Load the testcase 3. Wait a minute while it seems to be hanging Result: Assertion failure: thing->compartment() == trc->compartment || thing->compartment() == trc->runtime->atomsCompartment || (trc->srcKind == JSTRACE_OBJECT && InCrossCompartmentMap((JSObject *)trc->src, thing, kind)), at js/src/jsgc.cpp:3316
This assertion was added in bug 790865.
Attached file stack
Group: mozilla-corporation-confidential
Group: mozilla-corporation-confidential → core-security
Probably good to file these assertion failures as core-security for now.
It is plausible that I may be able to figure this out, so I'll take a look-see.
Assignee: nobody → continuation
What is happening here is that we are in the middle of ReparentWrapperIfFound, at the point where we have two reflectors, |flat| and |newobj|, for a single XPCWN, |wrapper|. |flat| is in the old compartment, |newobj| is in the new compartment. |wrapper|'s proto is still the old one, so it is in the old compartment. We call JS_NewObjectWithGivenProto, which ends up creating a new shape, which triggers a GC. When we trace |newobj|, which is in the new compartment, we end up tracing |wrapper|'s proto in the old compartment, which causes the cross-compartment assertion. I'm not sure what the right fix for this is. We're in this bad state from the JS_CloneObject to the wrapper->SetProto(newProto) down below. Maybe we could carefully avoid allocations in between those calls, perhaps by delicately reordering things. Prior to bug 761422, we did the JS_CloneObject after the SetProto, but I think that just meant that |flat| had the sketchy cross compartment reference rather than |newobj| when we call JS_NewObjectWithGivenProto, so it wasn't any better.
Component: DOM → XPConnect
Keywords: sec-critical
I'm marking this sec-critical because the possibility of the GC encountering cross-compartment edges where it wasn't expecting them seems bad. It is probably much harder to exploit without gcZeal, as there is a fairly narrow danger zone, but you never know.
Is there some way we can just forbid GC with a scoped stack class?
(In reply to Bobby Holley (:bholley) from comment #7) > Is there some way we can just forbid GC with a scoped stack class? Something could probably be added, but you'd have to just crash if you tried to GC if there wasn't any space, which is probably bad. There is, however, JS::AutoAssertNoGC, which is an RAII class that asserts in a debug build if it gets to a point where there is the possibility of GC, so if we carefully juggle things then we can at least check that our assumptions are right. This patch has one possible solution, and has a few parts. 1) Hoist the allocation of the property holder above the clone. This eliminates the possibility of GC between the clone and when we leave the double-reflector state with the JS_SetPrivate on |flat|. 2) Set the proto of wrapper before we transition from double reflector to reflector. This ensures that once we are in a single reflector state the proto pointer is no longer cross compartment. 3) Add a AutoAssertNoGC for the period we are in the double-reflector state, to ensure we don't GC while we're in a dangerous state. This seems to work and passes Jesse's test case, but I don't know if it is completely right. Terrence said the GC checking stuff is new, and not all GC points may be hooked up yet. I have no idea if, for instance, JS_SetReservedSlot actually can GC under some circumstances. So, remaining questions here are: 1. Does this actually fix the problem for real? 2. Are we actually in a safe for GC state after the SetPrivate? I can try adding an explicit GC there to see what happens. This is probably too risky to land in 16, given that we're on the last beta.
Nice! I just checked: JS_SetReservedSlot will never GC.
This patch is incomplete. The XPCWN's wrapper is also a cross-compartment edge until we update it later. Getting that wrapper involves all sorts of hideousness, so I don't think it is really possible to move it. Bill suggested nulling out the proto until we set it. Maybe I can do the same for the XPCWN's wrapper. I also hit the "not running GC" assertion while running XPConnect crash tests, but I haven't managed to reproduce that.
Summary: With gczeal, adoptNode trips CheckForCompartmentMismatches → Cross compartment pointers in ReparentWrapperIfFound
Blocks: 765416
Sigh. Can we just remove support for adoptNode? It's nothing but trouble.
I suggested that earlier in #content but I had no takers. bholley has some plan for greatly simplifying this code with the new DOM bindings.
I've come to understand that basically anything that takes a context argument can GC, so there's a bunch more cleanup to do. JS_CopyPropertiesFrom should probably be hoisted above the JS_CloneObject. XrayUtils::CloneExpandoChain probably needs to be after the JS_SetPrivate, BUT we still want to null out the expando chain before the JS_SetPrivate, so we don't get a cross compartment pointer (I'm assuming that the expando chain pointer is a JS pointer, but I should double check). I looked into the HasProto-SetProto stuff, but you can't really null out the proto, because of the write barrier in SetProto does a call on the proto, so you'll get a null-deref when you try to set it to a real value again, though that could be worked around. Aside from that, I still need to deal with the other fields in XPCWrappedNative::TraceInside... bholley, would it be totally hopeless to build up a new XPCWN from scratch, rather than dealing with this double reflector problem? Though maybe that would only solve some of our problems here...
If it helps, JS_SetPrivate will never GC.
(In reply to Andrew McCreight [:mccr8] from comment #13) > bholley, would it be totally hopeless to build up a new XPCWN from scratch, > rather than dealing with this double reflector problem? Though maybe that > would only solve some of our problems here... It still seems to me like we should be able to do the following: 1 - Ask at the top of the function "are we anywhere close to possibly needing a GC in these compartments? If so, let's just GC now". 2 - Forbid GCing, and crash safely if we OOM in the JS engine. The only place here where I think the user really has arbitrary control over the size of the allocations is with expandos (i.e. the JS_CopyPropertiesFrom). So maybe we could just copy those properties up front before doing any of the other stuff?
Attached patch mash up the code (obsolete) — Splinter Review
I don't see how this handles GetWrapper() or mFlatJSObject potentially causing cross-compartment edges, but it seems to at least pass XPConnect mochitests.
Attachment #663579 - Attachment is obsolete: true
Attached patch handle wrapper wrapper (obsolete) — Splinter Review
This handles the cross-compartment pointer from GetWrapper() by saving the wrapper to a local variable, then nulling it out on the XPCWN itself while in the double-wrapper state. I also hoisted the call to GetSameCompartmentSecurityWrapper before the double wrapper, because otherwise we'll end up regenerating it incorrectly. I think mFlatJSObject is the only remaining possible problem. I haven't seen it cause failures yet, but I'm not sure why.
Attachment #676419 - Attachment is obsolete: true
The latest version of the patch has a timeout and a crash I need to investigate. This seems a little scary to land right at the end of beta, but maybe it isn't too bad. It seems like it would be difficult to exploit in practice, as it requires hitting a GC right at a certain point in the code. Jesse's test case works because it relies on special functions to control the GC that are not exposed to web content. Maybe I am wrong though. I'm not sure how you really exploit this kind of problem, either, but these kinds of errors are not awesome.
Nothing ready for 17 yet means that at this stage in the beta cycle we're wontfixing this for 17 as well.
Attached patch try to check more (obsolete) — Splinter Review
Here's an attempt to make a lighter-weight check than the a full GC. The Check method is passed the compartment we expect the object to be in, and then checks all of the outgoing pointers to make sure they are in that compartment. As expected, it fails once we expect the object to be transitioned to the new compartment, because the proto is still in the old compartment. It only is seeing XPCWrappedNativeProto::mJSProtoObject right now, which isn't quite what I would expect. There's also the problem that it doesn't have any way to check the double reflector situation: we should fail earlier because there's a reflector in the new compartment that holds onto the XPCWN.
It's also worth looking at what's about to land in bug 820577, because it copy-pastes a bunch of wrapper reparenting junk. :-(
Attached patch check more (obsolete) — Splinter Review
This slightly updated version of the patch checks that the XPCWN is valid in the old compartment at the GC point before we go double reflector, that it is valid in both the old and new compartments at the GC points during double reflector, and that it is valid in the new compartment after the double reflector goes away. As expected, it fails immediately on the first double reflector check. Without the new compartment check during the double reflector phase, it fails right after it, because the proto hasn't been updated yet. The only outgoing JS edge at this point appears to be mJSProtoObject. In some sense, the simplest fix (which I think billm has suggested before) is to do SetProto(NULL). The problem is that I'll have to add at least one additional null check, because the trace code does not expect it. Hopefully it won't subtly break other things; I'm not sure what exactly the proto is used for in this section of the code, hopefully nothing.
Attachment #692480 - Attachment is obsolete: true
Attached patch check moreSplinter Review
Attachment #693459 - Attachment is obsolete: true
This patch eliminates the cross-compartment pointer by NULLing out the proto. With that in place, we don't trip the assertions I added, but unfortunately if you GC during the double reflector phase, you end up crashing on a NULL deref, because none of the code that does HasProto()/GetProto() expects it to return NULL. I've tried a few hack arounds, but nothing is particularly great. I'd like to avoid putting null checks everywhere just so we can handle GC during this tiny segment of code. On the other hand, it is a smaller change than my previous attempt that was stirring up the soup.
Attachment #677236 - Attachment is obsolete: true
I thought of another way we could fix the security problem here, at least until we have moving collection: we could explicitly root both protos on the stack. That would keep the proto from going away, even though it may not be traced through the one XPCWN.
Actually, I'm just going to go ahead and lower this to moderate. Maybe it is possible for mMaybeProto to have a dead pointer, but I don't think the code ever uses the proto during that part of the code, so that should prevent UAF. Of course, this is still a sketchy situation, but this whole method is sketchy.
I think this is moderate now, so it probably doesn't need to be tracked for beta, and at this point, I'm going to prioritize fixing other higher things for beta.
Setting it to sec-other because it seems like this isn't a security problem as is, it is just delicate code that could break so we should make it better.
Keywords: sec-moderatesec-other
We'll stop tracking this then. Uplift nominations welcome when someone does get around to this.
Assignee: continuation → nobody
Group: core-security → dom-core-security
Is this still an issue?
Flags: needinfo?(continuation)
I don't know. Somebody could try the test case.
Flags: needinfo?(continuation)
This doesn't reproduce for me on either Win10 or Ubuntu 17.04 with debug builds from as far back as a year ago (which is the furthest back mozregression can go). Not sure if it's worth trying to land the testcase or not, will leave that decision to others more knowledgeable about all of this.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Group: dom-core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: