cycle collector can be made to Unlink live objects (by using weak maps or watchpoints)

RESOLVED FIXED in Firefox 19

Status

()

defect
RESOLVED FIXED
8 years ago
6 years ago

People

(Reporter: mccr8, Assigned: mccr8)

Tracking

({sec-high})

Trunk
mozilla21
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:tef+, firefox17 wontfix, firefox18 wontfix, firefox19+ fixed, firefox20+ fixed, firefox21 fixed, firefox-esr10 wontfix, firefox-esr1719+ fixed, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)

Details

(Whiteboard: [adv-main19+][adv-esr1703+])

Attachments

(4 attachments, 16 obsolete attachments)

3.43 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
1.87 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
8.60 KB, patch
mccr8
: review+
abillings
: sec-approval+
Details | Diff | Splinter Review
4.44 KB, patch
billm
: review+
Details | Diff | Splinter Review
Posted file test case (obsolete) —
As I reported earlier in Bug 667011, Igor figured out that my patch in Bug 653248 can cause the CC to incorrectly consider reachable objects to be garbage.  It turns out that this is actually also the case for trunk without that patch applied, too.  I discovered this after writing a test case for Bug 667011.

Anyways, so the result of this is that you can make the cycle collector call Unlink() on any object you can manipulate with Javascript, even if the object isn't actually dead.  In my particular test case, this is not a danger, because the Unlink just removes the child, which can be done anyways, but I'm not sure how safe this is in general.  Do you have any idea, Peter?

My test case is a mochitest chrome test, but the only high privilege thing I do is explicitly call the GC and CC.  You could probably just wait a minute for them to be called naturally.

Running the test case results in the following message:

mochitest-chrome failed:
6 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/js/src/xpconnect/tests/chrome/test_bug653248.xul | DOM element indirectly reachable via gray weak map entry was unlinked. - didn't expect null, but got it

val_elem gets Unlink() called on it, causing its child to be dropped, but we can still get to val_elem, because it is being held live by a JS wrapper, and is reachable by the weak map.  This weak map edge is invisible to the cycle collector because the map itself is black, and in the current code, the CC is not told about the connection from the key to the value.

In the previous bug, I thought this wouldn't be a problem on trunk, because the GC marks weak map entries black (which is what my patch was fixing).  But I wasn't thinking about the fact that the weak reference marking phase violates a very important GC invariant: Never Mark Black After Gray (we should enforce this more rigorously, or make it not a problem by making black overwrite gray).  If the gray marking gets to an object reachable from both black and grey before the black marking gets there, it will end up gray.  If an object is gray when it should be black, then the CC can end up freeing a reachable object.

In my example code, I create a cycle between a DOM element val_elem and the weak map entry value v, which means that the entry gets marked gray before weak map marking happens.  The CC then traces from val_elem and decides it is garbage.
I haven't test this on anything before the latest version, but I'm pretty sure that all versions of Firefox are vulnerable to this going back to the initial landing of WeakMaps in Firefox 6.
Bug 668855 should fix this, though I haven't actually tested it.

I'm having trouble thinking of a good simple fix that might be backportable, if we want to do that.  Changing black marking to overwrite gray is probably a pretty severe change.  Doing something like marking the values of all entries in all weak maps before the gray marking phase can cause leaks, even in pure JS weak maps.

A modified version of my example, in which the map is marked gray and the key is marked black, shouldn't be a problem, because the cycle collector will see the edge from the weak map to the value, and thus not unlink anything it shouldn't.  (I was thinking this would be problematic, but it should be okay.)
Seems similar/related to bug 684711 so giving the same rating, though with WeakMaps maybe this would be much more straightforward to exploit if it's indeed vulnerable.
Whiteboard: [sg:moderate]
This is easy to reliably trigger, but the actual consequence is probably not that bad.  You won't have the cycle collector walking over random memory.  Most of the time, Unlink() just nulls out some fields, so the worst consequence will probably be a null deref.  But there are probably at least a hundred different Unlink implementations in the code base, so it is possible one does something more dangerous.

I don't think that it is really a strongly enforced invariant in the codebase that an object is always unreachable after it is unlinked, so there may be other ways to trigger this.  With a certain warning mode on, the browser will produce errors when something is unlinked but not freed, and I've seen that with just regular browsing.
I think this can also be triggered using watchpoints.  My test case in Bug 693527 leaks because there's a cycle involving a watchpoint edge that the cycle collector doesn't see. But if you add a line like "d.loop = d;", then suddenly it stops leaking!  I haven't investigated it deeply, but I suspect this is the same basic problem as my test case in this bug: adding the loop makes the cycle collector think that the DOM object is part of a garbage cycle, so it unlinks it, which causes the whole cycle to fall apart, so there's no leak when there should be.

Note that this happened on top of my patches for bug 668855, so it looks like we'll need bug 693527, too, to be rid of this.  I should also audit the other weak references in the Debugger to make sure they don't also cause this...
Summary: cycle collector can be made to Unlink live objects (by using weak maps) → cycle collector can be made to Unlink live objects (by using weak maps or watchpoints)
I was talking with Bill today about weak maps, watch points, and unmark gray and it got me to thinking about a possible problem with my patch in Bug 668855 (well, not a problem with my patch per se, just an insufficiency).  If we have a black weak map, and a gray key, then when the GC runs it will make the value gray.  If the DOM side grabs the key, it will be turned black with unmark gray.  But the key doesn't know it is a key, so the value won't be ungrayed.  The DOM side can then give the key to JS. JS can then look up the key in the map, to get access to the still-gray value.

This would be another way to create black-gray edges, which can lead to Unlinking live objects.  There is going to be a similar problem with watchpoints and debugger foo, I would imagine.

The best way I've thought of to avoid this is to do unmarkGray in the weak map lookup function: if the map and key are both non-gray, and the value is gray, ungray it.  Kind of gross, as it adds ungraying logic to the JS engine, whereas right now it is in XPConnect.

I'm putting this in the closed bug in part because it is related and in part because I'm not sure if it affects trunk or not.  But I don't think it is anything worse than what I've already found, just another hole.
I was thinking that my weak map patch would fix this, because when you ungray a map, it will ungray all of the values in the map.  But if the map is initially black, then the map will never be ungrayed.  Oops.

Also, the fix is probably to just ungray any value you return when you look things up in a weak map/watchpoint/etc.  If something can get at a weak map entry, it is probably time to ungray it.  I'll talk to Bill today about how best to move ungraying into the JS engine.  (It probably makes sense to make the ungraying tracer not visit weak map values, once this change happens, which will be easy to do.)
I totally forgot about comment 6 and 7 here.  I guess I should think about that some more.
Depends on: 718029
Steve, here's another example of how we can end up with black-gray edges (weak edges).  I tend to file these as security bugs because they can make the CC do weird things.  Probably not a big deal, but it is hard to know...
Andrew: is this bug now fixed? You referenced bug 668855 as a fix, and then mentioned "an insufficiency" and filed bug 718129, also now fixed. Is there more here?
Depends on: 668855
Uh, we should fix this.  There are a lot of classes that don't react well to multiple or untimely Unlinking.
Clearing rating so we can re-evaluate this one.
Keywords: sec-moderate
Whiteboard: [sg:moderate]
This isn't fixed.

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #11)
> Uh, we should fix this.  There are a lot of classes that don't react well to
> multiple or untimely Unlinking.

Double unlinking can always happen for anything that is wrappable when we CC twice in a row, so Unlink functions have to be hardened against it anyways.  This is worse, in that the Unlinked object is still live.  In general, this shouldn't matter much, but it could be nasty somewhere.
(In reply to Andrew McCreight [:mccr8] from comment #4)
> With a certain warning mode on, the
> browser will produce errors when something is unlinked but not freed, and
> I've seen that with just regular browsing.

Just curious, do you still see this with regular browsing?
(In reply to David Bolter [:davidb] (I'm back!) from comment #14)
> Just curious, do you still see this with regular browsing?
I think I removed that warning.
Marking sec-moderate during sec triage.
Keywords: sec-moderate
Attachment #563899 - Attachment mime type: application/vnd.mozilla.xul+xml → text/plain
This is worse than I thought. Classes that are wrapper cached (with preserved wrappers) and have their own separate JS object pointers will unroot themselves during Unlinking, but they still have JS pointers. This means that after we do the Unlink and a GC, we'll have a dangling pointer in the object. If the native is still alive, through the technique described in this bug, then we can have a UAF.

I haven't looked at this bug in a long time, so I need to refresh myself.
Assignee: nobody → continuation
Keywords: sec-moderatesec-high
(This has affected us since at least Firefox 7, but I'm not going to bother setting the old flags.)
The initial problem reported here arose from the CC not understanding weak maps, but that has been fixed.

However, there's a similar problem with UnmarkGray (see comment 6 and comment 7). If a weak map is black, and it has an entry that is gray, then if we look up the key and hand out the value, I believe the value will not get marked black, though the key will be. This creates a black-gray edge.

The best solution is probably to add a read barrier on weak map lookup that does UnmarkGray on the value. This requires moving UnmarkGray into the JS engine. I'll check how bad that is.
Bill said it makes sense to just leave UnmarkGray in XPConnect, and add a callback, so here is that.
Attachment #563899 - Attachment is obsolete: true
Still to come: need a similar read barrier on watchpoints, also need to scan all watchpoints and weak map entries before we start the CC to unmark gray any values with black keys and maps. The latter will probably require some more XPConnect plumbing. Yay.
Attachment #681315 - Attachment description: add unmarkgray callback → part 1: add unmarkgray callback
Attachment #681316 - Attachment is obsolete: true
There's some weirdness where the key can get marked when we're currently executing the closure, but I'm assuming in that case it counts as being reachable from JS, and thus it will always end up being marked black, so we don't need to worry about it.

I still need to write a fourth part that scans all weak references, and does an unmark gray when we find violations of the black-gray invariant. The existing interface fortunately looks like it will be okay.

The fun part here is that we have to do it way, way early in the CC. Not only before the CC sees any JS object, but before we check whether we've had an overflow while doing unmark gray: one of the unmark grays we trigger to do the fixups at the start of the CC could blow its stack, requiring a GC we wouldn't otherwise need.

I should also try to write a test, which will be tricky.
This version does all of the incremental GC barrier stuff on the XPConnect side, per billm's advice.
Attachment #681315 - Attachment is obsolete: true
Attachment #682213 - Attachment is obsolete: true
The final piece of the puzzle is that we have to ensure that the weak mappings the CC sees don't violate the black-gray invariant, if the keys were unmarkGrayed but the mappings were never looked up.

To do this, we add a new phase that ensures that any of the edges implied by weak mappings (from key+map to value and from delegate to key) aren't black-gray. This phase performs UnmarkGray when violations are detected, which implies a few things:
a) It must be done before we begin constructing the CC graph.
b) An unmarkGray may fail due to overflow, so we must do it before we check NeedCollect()
c) It must be done iteratively, as doing UnmarkGray on one value may result in a gray key becoming black.

To this end, I renamed nsCycleCollector::GCIfNeeded to nsCycleCollector::FixGrayBits, as it is really a generic place for fixups of gray bits. If we're not forcing a GC, but before we check if we need to GC from NeedCollect() we call into a new function in XPConnect that fixes the bits. This function is a slightly simplified version of TraceWeakMapping that checks the edges, and marks the gray target if the source is black.
Attachment #682661 - Attachment is patch: true
Attachment #682657 - Attachment is obsolete: true
Hmm. Hopefully the closure of a watchpoint can't be NULL or we're going to have problems.
Apparently it can be NULL.

I was concerned we'd need some weird read barrier for the wrappee-wrapper edge, but billm convinced me it is okay, because there's no actual edge to follow there.
Attachment #682661 - Attachment is obsolete: true
My patches here leave add a hook for the JS engine to call back into XPConnect to call UnmarkGray, but Bill needed to move UnmarkGray to the JS engine anyways to fix an IGC bug, so I'll rebase this patch on top of that.
Depends on: 817218
I'll have to backport bits of bug 815999 for ExposeGCThingToActiveJS and some other utilities if I backport the new version of these patches.
Depends on: 815999
Attachment #682655 - Attachment is obsolete: true
Attachment #682658 - Attachment is obsolete: true
Attachment #683694 - Attachment is obsolete: true
Attachment #684236 - Attachment is obsolete: true
Comment on attachment 688445 [details] [diff] [review]
part 1: unmark gray read barrier for weak maps

These first two patches are a lot nicer now that you've built out the JS-side unmark gray infrastructure a bit.
Attachment #688445 - Flags: review?(wmccloskey)
Attachment #688446 - Flags: review?(wmccloskey)
Comment on attachment 688445 [details] [diff] [review]
part 1: unmark gray read barrier for weak maps

Review of attachment 688445 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/public/HeapAPI.h
@@ +94,5 @@
> +ExposeValueToActiveJS(const Value &v)
> +{
> +    if (v.isMarkable()) {
> +        void *vc = v.toGCThing();
> +        ExposeGCThingToActiveJS(vc, js::GCThingTraceKind(vc));

Instead of js::GCThingTraceKind, please use v.gcKind().

::: js/src/jsweakmap.cpp
@@ +186,5 @@
>          return false;
>  
>      if (ObjectValueMap *map = GetObjectMap(&args.thisv().toObject())) {
>          if (ObjectValueMap::Ptr ptr = map->lookup(key)) {
> +            ExposeValueToActiveJS(ptr->value.get());

Please put a comment here explaining this.
Attachment #688445 - Flags: review?(wmccloskey) → review+
Comment on attachment 688446 [details] [diff] [review]
part 2: unmark gray read barrier for watchpoints

Review of attachment 688446 [details] [diff] [review]:
-----------------------------------------------------------------

Both of these calls should have comments too. Maybe just refer people to the one in jsweakmap.cpp?
Attachment #688446 - Flags: review?(wmccloskey) → review+
I expanded the comment on UnmarkGrayChildren to discuss these implicit edges, and fixed up the definition of ExposeValueToActiveJS. I'll add a similar explanation to the two read barriers on watchpoints as the one I added here for weak maps.
Attachment #688445 - Attachment is obsolete: true
Attachment #689257 - Flags: review?(wmccloskey)
Comment on attachment 689257 [details] [diff] [review]
part 1: unmark gray read barrier for weak maps

Review of attachment 689257 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsweakmap.cpp
@@ +187,5 @@
>  
>      if (ObjectValueMap *map = GetObjectMap(&args.thisv().toObject())) {
>          if (ObjectValueMap::Ptr ptr = map->lookup(key)) {
> +
> +            // Read barrier to prevent an incorrectly gray value from escaping

No blank line here.

@@ +188,5 @@
>      if (ObjectValueMap *map = GetObjectMap(&args.thisv().toObject())) {
>          if (ObjectValueMap::Ptr ptr = map->lookup(key)) {
> +
> +            // Read barrier to prevent an incorrectly gray value from escaping
> +            // the weak map. See the comment before UnmarkGrayChildren.

...in gc/Marking.cpp.
Attachment #689257 - Flags: review?(wmccloskey) → review+
Addressed review comments. Carrying forward billm's r+.
Attachment #688446 - Attachment is obsolete: true
Attachment #688453 - Attachment is obsolete: true
Attachment #689257 - Attachment is obsolete: true
Attachment #692378 - Flags: review+
Addressed review comments. Carrying forward billm's r+.
Attachment #692379 - Flags: review+
This patch has two major parts. First, it mirrors the logic of TraceWeakMapping to fix all implicit black-gray edges in FixWeakMappingGrayBitsTracer.

For the supporting part in the CC itself, I rename GCIfNeeded to FixGrayBits, as this is really the general purpose of this function. Then I add a call to the gray bits fixer I just implemented.
Attachment #692385 - Flags: review?(wmccloskey)
Comment on attachment 692385 [details] [diff] [review]
part 3: fix implicit black-gray edges at start of CC

Review of attachment 692385 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/nsXPConnect.cpp
@@ +488,5 @@
> +            js::TraceWeakMaps(this);
> +        } while (mAnyMarked);
> +    }
> +
> +  private:

I don't know anything about Gecko style, but I think they usually don't indent these.

@@ +502,5 @@
> +        FixWeakMappingGrayBitsTracer *tracer = static_cast<FixWeakMappingGrayBitsTracer*>(trc);
> +
> +        // If nothing that could be held alive by this entry is marked gray, return.
> +        if (!k || !xpc_IsGrayGCThing(k)) {
> +            if (!v || !xpc_IsGrayGCThing(v) || vkind == JSTRACE_STRING)

I think this would be clearer:

bool delegateMightNeedMarking = k && xpc_IsGrayGCThing(k);
bool valueMightNeedMarking = v && xpc_IsGrayGCThing(v) && vkind != JSTRACE_STRING;
if (!delegateMightNeedMarking && !valueMightNeedMarking)
    return;

@@ +512,5 @@
> +
> +        if (k && kkind == JSTRACE_OBJECT) {
> +            JSObject *kdelegate = js::GetWeakmapKeyDelegate((JSObject *)k);
> +            if (kdelegate && !xpc_IsGrayGCThing(kdelegate) &&
> +                k && xpc_IsGrayGCThing(k))

You can remove the redundant |k| test and move the |xpc_IsGrayGCThing| test to the preceding if statement.
Attachment #692385 - Flags: review?(wmccloskey) → review+
Good suggestions. I addressed the review comments and also CSE'd delegateMightNeedMarking in the later test. carrying forward Bill's r+.

[Security approval request comment]
How easily could an exploit be constructed based on the patch? Probably not too easily. You'd need to find an Unlink function that is buggy, and Olli recently audited these for the most likely cause of trouble.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? The comments talk about how this is an unmarkgray problem, but that's obvious enough already from the code itself. There are public bugs about unmarkgray problems, but I don't know how widely known they are.

Which older supported branches are affected by this flaw? Probably everything back to the initial shipping of WeakMaps in Firefox 7.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? I don't think it will be that bad.

How likely is this patch to cause regressions; how much testing does it need? This code is kind of wonky, but it should be okay. This isn't a super heavily used feature. Though we do use it in the browser itself now.
Attachment #692385 - Attachment is obsolete: true
Attachment #693633 - Flags: sec-approval?
Attachment #693633 - Flags: review+
Adding Release Management so we can have a discussion on when we want to sec-approval+ this.
(sec-approval is for all three patches, I just only marked the third...)
Comment on attachment 693633 [details] [diff] [review]
part 3: fix implicit black-gray edges at start of CC

sec-approval+ but please check in on or after January 8 to catch the next cycle. Also, please nominate for branches as well.
Attachment #693633 - Flags: sec-approval? → sec-approval+
We're now past the merge, and ready to take a fix on branches.
Please nominate for uplift when ready.
Depends on: 829430
Comment on attachment 693633 [details] [diff] [review]
part 3: fix implicit black-gray edges at start of CC

This approval is for all three patches in this bug, plus the patch in bug 829430, which only fixes a single assertion I added here so I'm not going to put it up separately.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): something in 2011
User impact if declined: possible security problems
Testing completed: it has been on m-c for a week
Risk to taking this patch (and alternatives if risky): There's some risk, because this feature isn't that heavily tested, but it also isn't used that widely, so problems with it won't matter much.  We've had this problem long enough that not landing on beta probably isn't the end of the world.
String or UUID changes made by this patch: none
Attachment #693633 - Flags: approval-mozilla-esr17?
Attachment #693633 - Flags: approval-mozilla-beta?
Attachment #693633 - Flags: approval-mozilla-b2g18?
Attachment #693633 - Flags: approval-mozilla-aurora?
Comment on attachment 693633 [details] [diff] [review]
part 3: fix implicit black-gray edges at start of CC

Approved for all branches, we're early enough in beta to have time to watch for any fallout.
Attachment #693633 - Flags: approval-mozilla-esr17?
Attachment #693633 - Flags: approval-mozilla-esr17+
Attachment #693633 - Flags: approval-mozilla-beta?
Attachment #693633 - Flags: approval-mozilla-beta+
Attachment #693633 - Flags: approval-mozilla-b2g18?
Attachment #693633 - Flags: approval-mozilla-b2g18+
Attachment #693633 - Flags: approval-mozilla-aurora?
Attachment #693633 - Flags: approval-mozilla-aurora+
Urk, I forgot that this is going to require backporting Bill's patch to move unmark gray stuff into the JS engine. I'll think about how to best go about that.
Andrew, comment 0 implies that this issue was surfaced and is verified by an existing mochitest chrome test. Is that correct?
Matt: no, I wrote a new test to demonstrate the issue, which was attached to this bug.  A patch I landed after that, in 11/2011, made it so that test started to pass, but didn't fix the underlying issue.  I think.  So there's no test, unfortunately.
Thanks Andrew. 

So, in the absence of a test, is there a way for someone to verify the work you've done?
No, sorry.  This is all very theoretical. ;)
I'm requesting review for this because it shuffles code around a bit.  This is needed for branches before the Great UnmarkGray Inlining in Fx 20.  IsIncrementalBarrierNeededOnGCThing isn't inline, because jsgc.h doesn't include jscompartment.h

I didn't change the comment or the jsweakmap part, so you can ignore those.
Attachment #707261 - Flags: review?(wmccloskey)
(part 2 and 3 didn't need to change in any important way)
Attachment #707261 - Flags: review?(wmccloskey) → review+
I'm going to wait until I get approval to land bug 812380 on b2g18 before I land this there, as this patch mirrors the logic there and I want to keep them in sync.
Depends on: 812380
Whiteboard: [adv-main19+][adv-esr1703+]
This happened very close to a b2g18 branch point - we should double check that this made it into v1.0.1, and uplift if need be.
blocking-b2g: --- → tef+
I poked around in the 1.0.1 hg repo, and it looks like it is there.
Group: core-security
You need to log in before you can comment on or make changes to this bug.