Closed Bug 730208 Opened 8 years ago Closed 8 years ago

Hunt down black -> gray edges in the JS heap

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla15
Tracking Status
firefox15 --- fixed
firefox-esr10 --- wontfix

People

(Reporter: sfink, Assigned: sfink)

References

Details

(Keywords: sec-moderate, Whiteboard: [sg:moderate][advisory-tracking+])

Attachments

(7 files, 11 obsolete files)

5.21 KB, patch
Details | Diff | Splinter Review
13.77 KB, patch
billm
: review+
Details | Diff | Splinter Review
950 bytes, patch
smaug
: review+
Details | Diff | Splinter Review
12.29 KB, patch
smaug
: review+
Details | Diff | Splinter Review
1.40 KB, patch
billm
: review+
Details | Diff | Splinter Review
10.53 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
2.08 KB, patch
billm
: review+
Details | Diff | Splinter Review
billm, was there a bug for this? I can't remember one, so I'm making one to attach my current WIP.
This is billm's original patch that tries to find places where C++ code passes gray JS objects back into JSAPI without unmarking them. I added some additional (and excessive) checks while trying to track some of these to their sources.
Attached patch Various UnmarkGray fixes (obsolete) — Splinter Review
Can you take a quick look at this set? This seems to catch most of them, but they may be in the wrong places and/or excessive.

I'm not really asking for review yet, more of a sanity check, and I'll split up the patch before landing so I can assign appropriate reviewers for different areas.

I wonder if I should make XPConnect accessors like xpc_GetGlobalObject(JSContext *)?
Attachment #600318 - Flags: review?(wmccloskey)
I have a couple of these bugs filed I should CC you on.
Comment on attachment 600318 [details] [diff] [review]
Various UnmarkGray fixes

Review of attachment 600318 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/nsXPConnect.cpp
@@ +735,5 @@
>          xpc->EnsureGCBeforeCC();
>          return;
>      }
>  
> +    if (!xpc_IsGrayGCThing(thing))

I wouldn't make this change lightly.  It could cause stack overflows in the CC, which won't cause crashes, but will make the CC invoke the GC, which will make it take longer.  I don't know how common that is in practice.
Olli might want to take a look at this too, as it is mostly DOM code.  Or another DOM person.
Comment on attachment 600318 [details] [diff] [review]
Various UnmarkGray fixes


>+  xpc_UnmarkGrayObject(JS_GetGlobalForScopeChain(mContext));
This looks like a missing or buggy JSAPI


>+static void
>+UnmarkGrayValue(JS::Value &v)
>+{
>+  if (!JSVAL_IS_PRIMITIVE(v))
>+    xpc_UnmarkGrayObject(JSVAL_TO_OBJECT(v));
>+}
Could you move this to xpcpublic.h


>@@ -1942,16 +1968,18 @@ nsJSContext::CallEventHandler(nsISupport
> 
>     // Use |target| as the scope for wrapping the arguments, since aScope is
>     // the safe scope in many cases, which isn't very useful.  Wrapping aTarget
>     // was OK because those typically have PreCreate methods that give them the
>     // right scope anyway, and we want to make sure that the arguments end up
>     // in the same scope as aTarget.
>     rv = ConvertSupportsTojsvals(aargv, target, &argc, &argv, tempStorage);
>     NS_ENSURE_SUCCESS(rv, rv);
>+    for (uint32_t i = 0; i < argc; i++)
>+      UnmarkGrayValue(argv[i]);
Coding style is 
for (;;) {
  stmt;
}



>       nsCOMPtr<nsIXPConnectJSObjectHolder> wrapper;
>+      JSObject *global = xpc_UnmarkGrayObject(::JS_GetGlobalObject(cx));
Again, why is JSAPI giving us a gray object.
If it does that, it is very error prone API

>+    JSObject*
>+    GetPrototypeJSObject() const
>+        {if (mPrototypeJSObject != INVALID_OBJECT)
>+             xpc_UnmarkGrayObject(mPrototypeJSObject);
>+         return mPrototypeJSObject;}
Ok, this is xpconnect so horrible coding style is expected :/


r- because of JSAPI problems. Either fix the existing JSAPI methods or add something new for
preservecolor, IMO. (Not that I know much about JSAPI internals)
Attachment #600318 - Flags: review-
Though, perhaps I'm wrong, because this is about CC <-> GC setup.
Don't take this patch too seriously yet; I fully expect some of these to be in absolutely the wrong place.

In general, though, the JSAPI absolutely doesn't mind handing out gray objects. Other than the more incestuous aspects of the CC <-> GC setup, the JSAPI doesn't pay any attention to the gray mark bits.

But it's still a good point. I suspect the "proper" fix would be to add XPConnect accessor APIs for all/most of these.

The nastiest pieces are probably the

  xpc_UnmarkGrayObject(::JS_GetGlobalObject(cx))

and

  xpc_UnmarkGrayObject(::JS_GetGlobalForScopeChain(mContext))

ones, because you can't hide those with an accessor -- you're not even really aware that you might be accessing them. (The danger is that you compile a script using a context whose global is gray, and that entrains the global or its gray properties.) Perhaps an xpc_PrepareContext or somesuch? (I'm not really familiar with any of this code, so I don't know what is "standard".)

I just don't want to wrap the whole JSAPI with an xpc slime layer.
(In reply to Andrew McCreight [:mccr8] from comment #4)
> Comment on attachment 600318 [details] [diff] [review]
> Various UnmarkGray fixes
> 
> Review of attachment 600318 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/xpconnect/src/nsXPConnect.cpp
> @@ +735,5 @@
> >          xpc->EnsureGCBeforeCC();
> >          return;
> >      }
> >  
> > +    if (!xpc_IsGrayGCThing(thing))
> 
> I wouldn't make this change lightly.  It could cause stack overflows in the
> CC, which won't cause crashes, but will make the CC invoke the GC, which
> will make it take longer.  I don't know how common that is in practice.

That's actually an independent fix. billm explained it to me and it made sense at the time, but it fell out of my brain.

And I think you're talking about

-    // If this thing is not a CC-kind or already non-gray then we're done.
-    if (!AddToCCKind(kind) || !xpc_IsGrayGCThing(thing))
+    if (!xpc_IsGrayGCThing(thing))
         return;

right? (Must be splinter being silly again.)
(In reply to Steve Fink [:sfink] from comment #9)
> And I think you're talking about
> 
> -    // If this thing is not a CC-kind or already non-gray then we're done.
> -    if (!AddToCCKind(kind) || !xpc_IsGrayGCThing(thing))
> +    if (!xpc_IsGrayGCThing(thing))
>          return;
> 
> right? (Must be splinter being silly again.)

Yes, that.  It might be my fault, I was only sort of paying attention when I decided where to click.
Marking this s-s just in case.
Group: core-security
Whiteboard: [sg:moderate]
It seems like dealing with global objects is the main problem here. Everything else looks basically fine. Steve, in some cases you might want to use JSVAL_IS_TRACEABLE and JSVAL_TO_TRACEABLE when you don't know something is a JSObject*.

Olli, this isn't really a problem with JSAPI. The JS engine isn't even supposed to know what gray means; clients are supposed to be able to use the extra mark colors however they wish.

Normally, the JS engine would mark JSContext::globalObject black for every JS context. However, XPConnect sets a special flag on all contexts specifying that their globalObject field should not be marked. Then XPConnect marks all the globalObject fields gray itself. The JS engine has no knowledge of this. In theory, XPConnect should ensure that the globalObject is marked black before you actually use a context. But it totally fails to do this.

Luke told me that the globalObject field will be removed from JSContext in the big compartment-per-global patch. This means that nsJSContext will be responsible for keeping track of the global object. Anyone else holding a JSContext will have to do their own thing.

Does anyone know why we mark cx->globalObject gray anyway? I would be surprised if we ever expect to collect the globalObject of a JSContext attached to an nsJSContext. So is it for other JSContexts? Waldo mentioned web workers to me as a possible reason.

If this is true, then once c-p-g lands, it seems feasible that the globalObject connected to an nsJSContext could be marked black. And if any other contexts want to mark their global objects gray, then they would be responsible for calling UnmarkGray before ever using the context.

Olli, Ben, does this seem reasonable? I have to admit that I don't understand these issues very well.
That sounds about right with respect to cx->globalObject, but I saw several cases where it was JS_GetGlobalForScopeChain that was the problematic gray object. I guess I didn't check to see whether it was ever different from cx->globalObject in those cases. Let's see... when would that arise? You'd probably have to have

  C++ ->
    JS ->
      boundary crossing, causing a dummy frame to be pushed ->
        back into C++ ->
          JS that entrains its scope somehow

Or maybe that's impossible? The "back into C++" layer has to be something that goes back to nsJSEnvironment or nsXBLDocument or something, not just any C++, and perhaps you'd get a JS_SaveFrameChain in that case?

(In reply to Bill McCloskey (:billm) from comment #12)
> no knowledge of this. In theory, XPConnect should ensure that the
> globalObject is marked black before you actually use a context. But it
> totally fails to do this.

I think there are some cases where XPConnect is skipped, and the code directly uses the JSAPI. Oh! They probably don't store their JSContexts directly. Maybe ContextStack::Peek is a good interception point.

billm, one thing I don't understand about all this -- couldn't this happen:

1. GC runs, marks stuff black
2. XPConnect marks things gray
3. nsJSEnvironment marks an object black, then accesses it
4. During the access operation, another GC runs and marks stuff black
5. XPConnect marks things gray, including that same object
6. The access operation continues, and points a black object at the again-gray object
(In reply to Steve Fink [:sfink] from comment #13)
> That sounds about right with respect to cx->globalObject, but I saw several
> cases where it was JS_GetGlobalForScopeChain that was the problematic gray
> object. I guess I didn't check to see whether it was ever different from
> cx->globalObject in those cases. Let's see... when would that arise? You'd
> probably have to have
> 
>   C++ ->
>     JS ->
>       boundary crossing, causing a dummy frame to be pushed ->
>         back into C++ ->
>           JS that entrains its scope somehow
> 
> Or maybe that's impossible? The "back into C++" layer has to be something
> that goes back to nsJSEnvironment or nsXBLDocument or something, not just
> any C++, and perhaps you'd get a JS_SaveFrameChain in that case?

I didn't understand the last part. However, I think in the future we would want to maintain the invariant that anything on the stack should be black. And I think we can do that by doing UnmarkGray when we push. Right now we can't do that, because cx->globalObject always serves as the backstop for JS_GetGlobalForScopeChain.

However, it would be interesting to make sure that all stack pushes maintain this invariant. We should be able to assert that now.

> (In reply to Bill McCloskey (:billm) from comment #12)
> > no knowledge of this. In theory, XPConnect should ensure that the
> > globalObject is marked black before you actually use a context. But it
> > totally fails to do this.
> 
> I think there are some cases where XPConnect is skipped, and the code
> directly uses the JSAPI. Oh! They probably don't store their JSContexts
> directly. Maybe ContextStack::Peek is a good interception point.
> 
> billm, one thing I don't understand about all this -- couldn't this happen:
> 
> 1. GC runs, marks stuff black
> 2. XPConnect marks things gray
> 3. nsJSEnvironment marks an object black, then accesses it
> 4. During the access operation, another GC runs and marks stuff black
> 5. XPConnect marks things gray, including that same object
> 6. The access operation continues, and points a black object at the
> again-gray object

I guess the idea is that when the GC runs the second time, it's not supposed to mark the object gray. It's supposed to realize that the object is currently being used, and so it should mark it black. I don't know if that's actually happening right now, though.

Luckily, outside the JS engine, there aren't as many places where GC is going to run. It's really only going to happen during JSAPI calls. And once you pass something into JSAPI, it should get rooted in a way that it will be marked black if any GCs happen there.
(In reply to Bill McCloskey (:billm) from comment #12)
> Normally, the JS engine would mark JSContext::globalObject black for every
> JS context. However, XPConnect sets a special flag on all contexts
> specifying that their globalObject field should not be marked. Then
> XPConnect marks all the globalObject fields gray itself. The JS engine has
> no knowledge of this.

It seems to me that the root of the problem is that we unmark gray when someone accesses an object. Ideally we should unmark gray when we add an edge from a black object to a gray object. I have no idea if it's possible to do that, but it would solve the problem with globals and it seems more correct to me. It would also avoid chasing down random code that hands out a gray object.
(In reply to Bill McCloskey (:billm) from comment #14)
> (In reply to Steve Fink [:sfink] from comment #13)
> > billm, one thing I don't understand about all this -- couldn't this happen:
> > 
> > 1. GC runs, marks stuff black
> > 2. XPConnect marks things gray
> > 3. nsJSEnvironment marks an object black, then accesses it
> > 4. During the access operation, another GC runs and marks stuff black
> > 5. XPConnect marks things gray, including that same object
> > 6. The access operation continues, and points a black object at the
> > again-gray object
> 
> I guess the idea is that when the GC runs the second time, it's not supposed
> to mark the object gray. It's supposed to realize that the object is
> currently being used, and so it should mark it black. I don't know if that's
> actually happening right now, though.

Oh, right, sorry. Step 5 wouldn't happen, because it would get marked black in step 4.

(In reply to Peter Van der Beken [:peterv] from comment #15)
> It seems to me that the root of the problem is that we unmark gray when
> someone accesses an object. Ideally we should unmark gray when we add an
> edge from a black object to a gray object. I have no idea if it's possible
> to do that, but it would solve the problem with globals and it seems more
> correct to me. It would also avoid chasing down random code that hands out a
> gray object.

Makes sense. I would think that this would be possible now with the barriers, but the GC is currently unaware of the gray mark bits and their semantics. To keep that separation, we'd need a callback to be invoked on a given pattern of color changes or something. And I have no idea how much you can do from a barrier without needing to worry about interactions with other stuff.
(In reply to Peter Van der Beken [:peterv] from comment #15)
> It seems to me that the root of the problem is that we unmark gray when
> someone accesses an object. Ideally we should unmark gray when we add an
> edge from a black object to a gray object. I have no idea if it's possible
> to do that, but it would solve the problem with globals and it seems more
> correct to me. It would also avoid chasing down random code that hands out a
> gray object.

If we allow arbitrary objects to be gray, then that doesn't seem possible. We'd have to instrument every SETPROP operation to check if the object we're assigning is gray. SETPROP is really hot, and checking the mark color isn't all that fast.

I talked to Johnny about this yesterday, and it sounds like maybe we could take a two-pronged approach:

1. If an nsJSContext is tied to a window that's actually being displayed, then we can mark its global black. This would also speed up the cycle collector, and it ties in well with bug 717500.

2. For other contexts, maybe we could find some convenient bottlenecks for when their contexts are used. Once example Johnny mentioned was that nsJSEventListener can keep a context alive even after window is closed. We could add UnmarkGray to its GetEventContext method. Hopefully we could do similar things in other places where nsJSContext is kept alive.

Peter, does that sound at all feasible?
Ok, here's a nasty one:

 0  linux-gate.so + 0x424
 1  firefox-bin!MOZ_Crash [Assertions.cpp : 79 + 0x6]
 2  firefox-bin!MOZ_Assert [Assertions.cpp : 88 + 0x4]
 3  libxul.so!js::ContextStack::pushDummyFrame [Stack.cpp : 843 + 0x1f]
 4  libxul.so!js::AutoCompartment::enter [jswrapper.cpp : 474 + 0x19]
 5  libxul.so!JS_TransplantObject [jsapi.cpp : 1580 + 0x4]
 6  libxul.so!nsGlobalWindow::SetNewDocument [nsGlobalWindow.cpp : 2087 + 0x1a]

I've already added UnmarkGray to all arguments to JS_TransplantObject.

What is happening, assuming I am interpreting things correctly: you are calling JS_TransplantObject(orig, target). That means you want to eat orig's brain with a spoon, then replace it with target. That means you also need to re-point any wrapper in any other compartment to the target object.

During that process, you find a wrapper[orig] in some random compartment. You create a new wrapper[target] in that same compartment, do a brain transplant with the original wrapper[orig], and replace wrapper[orig] in the crossCompartmentWrapperMap with wrapper[target].

Except that wrapper[orig]'s global object is gray, so before you get a chance to create wrapper[target] in this random compartment, you hit the assert.

So the question is, is the assert valid in this case? If the assert was not hit, the new wrapper would point to the same gray global, and that wrapper would point to the black target. So if the global can reach the wrapper, we have a problem. And I don't know of an existing way to unmark gray for an object and all wrappers that point to the object. Obviously, I could add one, but it seems pretty crazy. Though it's not surprising that object transplantation is a special case.

Does my analysis sound correct?
(In reply to Steve Fink [:sfink] from comment #18)
> Does my analysis sound correct?

Answer: no, it does not.

Well, I still think it's fine up until "if the global can reach the wrapper, we have a problem". That's not a problem; that's just a gray->black edge (or perhaps gray->white->black, but whatever.)

Assume the global is gray. Then before the transplant operation, the wrapper could not have been black, because the wrapper points to its global (via parent links) and we (theoretically) don't have black->gray edges. The transplant operation creates a new wrapper, which is white and therefore contains a scary but temporary white->gray edge, then it transplants the new wrapper's guts into the old wrapper and sanity is restored -- well, other than having a white zombie wrapper lying around with a pointer to a gray global. But that's actually a good thing, because it serves as a cool band name. Except you'd probably want to spell it White Zombie Rappers.

Anyway, nothing should be able to reach that white zombie wrapper, so it seems like the assertion is just noise. It does technically violate the invariant, though, so it would be nice if we could explicitly erase (exorcise?) the white zombie wrapper.

I'm about to post a patch that fixes the assertion by brute force: it iterates through all compartments' wrapper tables to find all wrappers[orig], and UnmarkGray's them. But now I'm thinking that maybe that's just extra work to shut up an unnecessary assertion, especially since it (incorrectly) ends up unmarking a global which is still unreachable from black and so ought to be gray.

I suppose I ought to at least make it DEBUG-only or something. But exorcism seems cleaner. I'll wait for an r-.
Version: unspecified → Trunk
Attachment #605871 - Flags: review?(wmccloskey)
A common source of potential black -> gray edges is JSAPI calls made on objects with gray globals or contexts holding gray globals. (The call could potentially update a black object with a pointer to that global.) This patch mostly traps places where contexts are used, and unmarks their globals. It also includes some more global unmarking.
Attachment #605872 - Flags: review?(wmccloskey)
Nasty case. JS_TransplantObject swaps the brains of two JSObjects to maintain pointer identity. Unfortunately, that means updating all cross-compartment wrappers as well, and if some random compartment contains a wrapper for the target object, then a global from that compartment will be used to construct a new wrapper (which will then be transplanted into the old wrapper.) If that global is gray, then we end up with a temporary new wrapper (marked white), which points to the gray global, which will trigger an assertion failure even though it isn't a real problem.
Attachment #605874 - Flags: review?(wmccloskey)
I wonder if this will fix a problem I'm seeing in the CC, where we're needlessly visiting a gray JS Object (ChromeWindow) (and the stuff it holds onto) that is held alive by a currently displayed window.
Attachment #600318 - Attachment is obsolete: true
Attachment #600318 - Flags: review?(wmccloskey)
Oops, sorry, I should've split up that first patch because it touches some random files in the tree well away from JS/xpconnect.
Also note: this patch series is still not a complete solution. My current try push has 6 greens out of 10 total tests with the assertion patch applied. (I think it's 0/10 or 1/10 without these patches.)
(In reply to Steve Fink [:sfink] from comment #25)
> Oops, sorry, I should've split up that first patch because it touches some
> random files in the tree well away from JS/xpconnect.

Yeah, I don't really feel comfortable reviewing the browser changes. Maybe ask peterv or smaug or bent? My main fear is that we'll suddenly be spending a lot of time in UnmarkGray when we start calling it on all these globals.
What holds these globals? I've been thinking for CC speed reasons we may want to mark black all globals held by live windows, or some such thing.

The thing to remember is that the size of the CC graph is an upper bound on the number of gray objects, which is an upper bound on how many objects we have to ungray, and the CC graph has been getting smaller and smaller. Though the first CC cleanup phase can be pretty long, so maybe it would be bad if you end up doing all that work right after the GC and before that cleanup phase, at some random point in time where it is bad.
(In reply to Andrew McCreight [:mccr8] from comment #28)
> What holds these globals? I've been thinking for CC speed reasons we may
> want to mark black all globals held by live windows, or some such thing.

This would be fantastic, and it would really help this bug.

> The thing to remember is that the size of the CC graph is an upper bound on
> the number of gray objects, which is an upper bound on how many objects we
> have to ungray, and the CC graph has been getting smaller and smaller.
> Though the first CC cleanup phase can be pretty long, so maybe it would be
> bad if you end up doing all that work right after the GC and before that
> cleanup phase, at some random point in time where it is bad.

Yeah, it's the time right after GC that I'm worried about.
(In reply to Bill McCloskey (:billm) from comment #29)
> (In reply to Andrew McCreight [:mccr8] from comment #28)
> > What holds these globals? I've been thinking for CC speed reasons we may
> > want to mark black all globals held by live windows, or some such thing.
> 
> This would be fantastic, and it would really help this bug.
> 
> > The thing to remember is that the size of the CC graph is an upper bound on
> > the number of gray objects, which is an upper bound on how many objects we
> > have to ungray, and the CC graph has been getting smaller and smaller.
> > Though the first CC cleanup phase can be pretty long, so maybe it would be
> > bad if you end up doing all that work right after the GC and before that
> > cleanup phase, at some random point in time where it is bad.
> 
> Yeah, it's the time right after GC that I'm worried about.

I hesitate to suggest anything that further complicates things, but could we finesse this? As in, if you UnmarkGray something after a GC and before a CC, it does a DeferredUnmarkGray on them, adding them (if non-black) to a list that is processed during the CC? You'd do it after marking live windows' globals black, so that most of them wouldn't do anything.

Hm, maybe that isn't entirely sound. You could (1) do a GC, (2) UnmarkGray a global, (3) reach through it to grab onto a gray object, (4) mutate the graph to remove all edges from the global to that gray object, then (5) run the CC. The recursive UnmarkGray would then fail to blacken the gray object.

Alternatively, what if live windows just rooted their globals via AddNamedRoot, and unrooted them in their destructors (or whatever finalizers are called in refcount-land)?
(In reply to Steve Fink [:sfink] from comment #30)
> Alternatively, what if live windows just rooted their globals via
> AddNamedRoot, and unrooted them in their destructors (or whatever finalizers
> are called in refcount-land)?

Oh. I think the answer is because the global could keep those windows alive. Unless there's some other way to tell when a live window dies that does not involve the object graph? If so, that's when we'd unroot.
Yes, the black-gray problem is at its root just like incremental GC, and we need to have barriers in place. (read instead of write)

We know when a window is currently being displayed. We can use this information to mark everything reachable from it black. This does not give us 100% soundness, but it probably covers enough of the cases to reduce the number of gray objects in the common case, and reduce the cost of the read barriers.
Whoops, this is the same component as the other patch. I could've merged them. Oh well.
Attachment #606460 - Flags: review?(bugs)
Sorry, this could've been merged into the other patch. I didn't realize it was the same component.
Attachment #606464 - Flags: review?(bugs)
There aren't really any active JSD reviewers, but this is only relevant to GC anyway.
Attachment #606468 - Flags: review?(wmccloskey)
Attachment #605870 - Attachment is obsolete: true
Attachment #605870 - Flags: review?(wmccloskey)
Attachment #606460 - Attachment is obsolete: true
Attachment #606460 - Flags: review?(bugs)
Attachment #606459 - Flags: review?(bugs) → review+
Attachment #606464 - Flags: review?(bugs) → review+
Attachment #605871 - Attachment is obsolete: true
Attachment #605871 - Flags: review?(wmccloskey)
Attachment #606464 - Attachment description: UnmarkGray various JS objects to prevent them from being used to create black -> gray edges → patch3: UnmarkGray various JS objects to prevent them from being used to create black -> gray edges
Attachment #606468 - Attachment description: Root JSD's pet global so it can't participate in CC cycles → patch5: Root JSD's pet global so it can't participate in CC cycles
Attachment #605872 - Attachment description: UnmarkGray fixups for globals and contexts → patch6: UnmarkGray fixups for globals and contexts
Attachment #605874 - Attachment description: UnmarkGray for JS_TransplantObject → patch7: UnmarkGray for JS_TransplantObject
Attachment #606459 - Attachment description: UnmarkGray the global object held by nsXULPDGlobalObject to avoid it being used to create black -> gray edges → patch4: UnmarkGray the global object held by nsXULPDGlobalObject to avoid it being used to create black -> gray edges
Attachment #606465 - Attachment description: XPConnect changes to UnmarkGray some more objects and return them for convenience → patch2: XPConnect changes to UnmarkGray some more objects and return them for convenience
Comment on attachment 606468 [details] [diff] [review]
patch5: Root JSD's pet global so it can't participate in CC cycles

Review of attachment 606468 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/jsd/jsd_high.c
@@ +170,5 @@
>      return jsdc;
>  
>  label_newJSDContext_failure:
>      if( jsdc ) {
> +        JS_RemoveObjectRoot(jsdc->dumbContext, &jsdc->glob);

I think this should be guarded by |if (jsdc->dumbContext)|.
Attachment #606468 - Flags: review?(wmccloskey) → review+
Utility function needed for UnmarkGray stuff.
Attachment #606661 - Flags: review?(wmccloskey)
Attachment #606661 - Attachment description: Implement JS_GetGCThingRuntime → patch1: Implement JS_GetGCThingRuntime
Attachment #606661 - Attachment filename: bug-730208-p1-unmark-gray-fixups → bug-730208-p1-getgcthingrt
Comment on attachment 605872 [details] [diff] [review]
patch6: UnmarkGray fixups for globals and contexts

This seems fine to me, but I'd feel more comfortable if we wait to land it until bug 736563 (just filed) is fixed.
Attachment #605872 - Flags: review?(wmccloskey) → review+
Comment on attachment 606661 [details] [diff] [review]
patch1: Implement JS_GetGCThingRuntime

I don't think we need this. You should be able to do something like:
  nsXPConnect::GetXPConnect()->GetRuntime()->GetJSRuntime()
(modulo some annoying null checks). This is actually pretty fast, so we don't have to worry about performance.
Attachment #606661 - Flags: review?(wmccloskey)
Comment on attachment 606465 [details] [diff] [review]
patch2: XPConnect changes to UnmarkGray some more objects and return them for convenience

I think mccr8 would be a better reviewer here. I don't know a ton about the CC, and I don't have time at the moment to really do so. Hopefully I'll do that soon.
Attachment #606465 - Flags: review?(bobbyholley+bmo) → review?(continuation)
Comment on attachment 606465 [details] [diff] [review]
patch2: XPConnect changes to UnmarkGray some more objects and return them for convenience

Review of attachment 606465 [details] [diff] [review]:
-----------------------------------------------------------------

Which places that are using the functions in XPCLazyCallContext, XPCWrappedNativeScope and XPCWrappedNativeProto have had their behavior changed, and why is that okay?  I don't really know anything about those classes.  r- because I'd like to know more about that.

The places where you change the function look reasonable enough to me.

::: js/xpconnect/src/XPCWrappedNativeJSOps.cpp
@@ +651,4 @@
>      NS_ASSERTION(obj, "bad scope JSObject");
>      JS_CALL_OBJECT_TRACER(trc, obj, "XPCWrappedNativeScope::mGlobalJSObject");
>  
> +    obj = scope->GetPrototypeJSObjectPreserveColor();

You don't unmark gray here because you are just tracing through things and not returning anything?

::: js/xpconnect/src/XPCWrappedNativeScope.cpp
@@ +151,5 @@
>      {   // scoped lock
>          XPCAutoLock lock(mRuntime->GetMapLock());
>  
>  #ifdef DEBUG
>          for (XPCWrappedNativeScope* cur = gScopes; cur; cur = cur->mNext)

You don't unmark gray here because the object isn't being returned, just examined for an internal property?

@@ +746,5 @@
>  
>          DEBUG_TrackScopeTraversal();
>  
>          for (XPCWrappedNativeScope* cur = gScopes; cur; cur = cur->mNext) {
> +            if (obj == cur->GetGlobalJSObjectPreserveColor()) {

You don't unmark gray here because you aren't actually returning the object itself?

::: js/xpconnect/src/nsXPConnect.cpp
@@ +775,5 @@
>      } while (thing);
>  }
>  
>  void
> +xpc_UnmarkGrayObjectRecursive(void *thing, JSGCTraceKind kind)

please rename to xpc_UnmarkGrayGCThingRecursive

@@ +780,2 @@
>  {
> +    NS_ASSERTION(thing, "Don't pass me null!");

Probably should change this to MOZ_ASSERT(thing) while you are here.

@@ +785,3 @@
>  
>      // Trace children.
>      UnmarkGrayTracer trc;

This is a bit unfortunate.  UnmarkGrayTracer is not really set up to trace shapes at the top level.  It is probably okay to do, but I'm not 100% sure.  But it looks like you never call it with a shape, so if you could add an assertion that kind != JSTRACE_SHAPE, with a comment that UnmarkGrayTracer may not be safe for top level use with a shape, or something to that effect, that's good enough I think.  If somebody wants to actually do that, Bill or I can think about it some more.
Attachment #606465 - Flags: review?(continuation) → review-
(In reply to Andrew McCreight [:mccr8] from comment #43)
> Comment on attachment 606465 [details] [diff] [review]
> patch2: XPConnect changes to UnmarkGray some more objects and return them
> for convenience
> 
> Review of attachment 606465 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Which places that are using the functions in XPCLazyCallContext,
> XPCWrappedNativeScope and XPCWrappedNativeProto have had their behavior
> changed, and why is that okay?  I don't really know anything about those
> classes.  r- because I'd like to know more about that.

What do you mean by having their behavior changed? Other than the unmarking, they get the same object back as always. So the only behavior change possible would be not freeing a cycle because a JS object in the cycle got unmarked. (But if they're part of a garbage cycle, nothing should be looking at them, I would think.)

Or at least, that was my logic. Maybe I'm misunderstanding?

> The places where you change the function look reasonable enough to me.
> 
> ::: js/xpconnect/src/XPCWrappedNativeJSOps.cpp
> @@ +651,4 @@
> >      NS_ASSERTION(obj, "bad scope JSObject");
> >      JS_CALL_OBJECT_TRACER(trc, obj, "XPCWrappedNativeScope::mGlobalJSObject");
> >  
> > +    obj = scope->GetPrototypeJSObjectPreserveColor();
> 
> You don't unmark gray here because you are just tracing through things and
> not returning anything?

Don't mistake me for someone who knows what he's doing. :)

My thinking was mostly that mutating mark bits during tracing is a little funky, unless it's directly related to the traversal you're doing. Especially since any unnecessary unmarking could be slow. And it shouldn't be needed because the whole point here is to avoid getting a gray object and pointing a black one at it, and I assumed that you wouldn't be mutating the graph while tracing it.

> 
> ::: js/xpconnect/src/XPCWrappedNativeScope.cpp
> @@ +151,5 @@
> >      {   // scoped lock
> >          XPCAutoLock lock(mRuntime->GetMapLock());
> >  
> >  #ifdef DEBUG
> >          for (XPCWrappedNativeScope* cur = gScopes; cur; cur = cur->mNext)
> 
> You don't unmark gray here because the object isn't being returned, just
> examined for an internal property?

If it were returning an internal property that could be used in a graph mutation, we'd want to unmark it. Given that this is DEBUG-only, it seems safe to assume that we don't need to unmark something that wouldn't be unmarked non-DEBUG. If you'll excuse the triple? negative.

> 
> @@ +746,5 @@
> >  
> >          DEBUG_TrackScopeTraversal();
> >  
> >          for (XPCWrappedNativeScope* cur = gScopes; cur; cur = cur->mNext) {
> > +            if (obj == cur->GetGlobalJSObjectPreserveColor()) {
> 
> You don't unmark gray here because you aren't actually returning the object
> itself?

Yes. No possibility of problematic graph mutation if the caller isn't handed back a gray object to point something at.

> 
> ::: js/xpconnect/src/nsXPConnect.cpp
> @@ +775,5 @@
> >      } while (thing);
> >  }
> >  
> >  void
> > +xpc_UnmarkGrayObjectRecursive(void *thing, JSGCTraceKind kind)
> 
> please rename to xpc_UnmarkGrayGCThingRecursive

Ok.

> @@ +780,2 @@
> >  {
> > +    NS_ASSERTION(thing, "Don't pass me null!");
> 
> Probably should change this to MOZ_ASSERT(thing) while you are here.

Ok

> @@ +785,3 @@
> >  
> >      // Trace children.
> >      UnmarkGrayTracer trc;
> 
> This is a bit unfortunate.  UnmarkGrayTracer is not really set up to trace
> shapes at the top level.  It is probably okay to do, but I'm not 100% sure. 
> But it looks like you never call it with a shape, so if you could add an
> assertion that kind != JSTRACE_SHAPE, with a comment that UnmarkGrayTracer
> may not be safe for top level use with a shape, or something to that effect,
> that's good enough I think.  If somebody wants to actually do that, Bill or
> I can think about it some more.

I don't know why shapes are special, but I don't know many things. Added an assert.
The question of behavior changes is still open, but I made the other changes. (Given that the r- was for the behavior changes, that doesn't mean much. But I'm happy to look into it if you can clarify what you're asking.)
Attachment #606465 - Attachment is obsolete: true
Attachment #606661 - Attachment is obsolete: true
Attachment #607849 - Flags: review?(continuation)
(In reply to Steve Fink [:sfink] from comment #44)
> Or at least, that was my logic. Maybe I'm misunderstanding?

I guess that's true.  Though I guess if you unmark during tracing that would be bad.

> My thinking was mostly that mutating mark bits during tracing is a little
> funky, unless it's directly related to the traversal you're doing.
> Especially since any unnecessary unmarking could be slow. And it shouldn't
> be needed because the whole point here is to avoid getting a gray object and
> pointing a black one at it, and I assumed that you wouldn't be mutating the
> graph while tracing it.

Yeah, I think you should never unmarkGray during tracing.  I wonder if we could/should enforce that somehow.

> I don't know why shapes are special, but I don't know many things. Added an
> assert.

There are some optimizations in place for tracing shapes to avoid blowing up the stack that make things a little weird.
Comment on attachment 607849 [details] [diff] [review]
patch2: XPConnect changes to UnmarkGray some more objects and return them for convenience

Review of attachment 607849 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  I'm going to look over the callers of the functions that are now unmark-graying, hopefully tomorrow morning, before I r+ it.

::: js/xpconnect/src/nsXPConnect.cpp
@@ +780,3 @@
>  {
> +    MOZ_ASSERT(thing, "Don't pass me null!");
> +    MOZ_ASSERT(kind != JSTRACE_SHAPE, "UnmarkGrayTracer not intended for Shapes");

I think xpc_UnmarkGrayGCThingRecursing instead of UnmarkGrayTracer.  It is okay to use UnmarkGrayTracer for shapes, just under more restricted circumstances. ;)

::: js/xpconnect/src/xpcpublic.h
@@ +194,5 @@
> +    return obj;
> +}
> +
> +inline JSScript *
> +xpc_UnmarkGrayScript(JSScript *script)

You should check with Bill if this new function also needs the IncrementalReferenceBarrier thing, which I don't know much about.
Comment on attachment 605874 [details] [diff] [review]
patch7: UnmarkGray for JS_TransplantObject

This seems like a pretty hard problem. I was hoping that compartment-per-global could somehow fix this. However, JS_TransplantObject creates new wrappers (that will be black) and their global object will potentially be gray. I don't think this patch fixes that problem, and neither will c-p-g.

Rather than worry about the assertion, I think we should consider what new edges are actually being created by this function. As long as it doesn't create any black-to-gray edges, we can disable the grayness assertion locally (possibly with a check after the function runs that ensures that no wrappers have black-to-gray edges).

This function basically switches some wrapper pointers from orig to target. As long as orig and target are black (which can be ensured with UnmarkGray), this can't be bad. However, there are a few corner cases.

1. In some cases, rather than redirecting the pointers to orig or target, we redirect them to the existing wrapper for orig that is in the target compartment. Perhaps it makes sense for the JS_TransplantObject callers to UnmarkGray this object as well if it exists. If we do this, we probably should have an XPConnect version of JS_TransplantObject that safely handles the UnmarkGray calls. Everyone would just call that.

2. When we redirect wrappers, we do so by creating a new wrapper and then swapping it with the old wrapper. The new wrapper object will be black. After the swap it will point to whatever the old wrapper pointed to, including the global object, which could be gray. So this seems like a problem. One idea would be to color the new wrapper the same color as the existing wrapper. This kinda makes sense, and I think it solves the problem.

So this looks feasible to me. My recommendation would be to implement (1) and (2). Then we'll need to disable the grayness checker while JS_TransplantObject is running. Maybe we should just have a counter in the runtime for how many invocations of JS_TransplantObject are on the stack. We wouldn't assert if this is non-zero.

Finally, I think we should add a pretty heavyweight assertion at the end of JS_TransplantObject. It would accumulate the set of all wrappers and objects touched, and then it would use a non-recursive JS_TraceChildren call to look for black-to-gray edges. I won't be at all surprised if this finds some bugs since JS_TransplantObject is pretty complicated.
Attachment #605874 - Flags: review?(wmccloskey) → review-
Comment on attachment 607849 [details] [diff] [review]
patch2: XPConnect changes to UnmarkGray some more objects and return them for convenience

Bill, does this patch need an IncrementalReferenceBarrier sort of thing for xpc_UnmarkGrayScript in the same way that xpc_UnmarkGrayObject has?
Attachment #607849 - Flags: feedback?(wmccloskey)
(In reply to Andrew McCreight [:mccr8] from comment #49)
> Comment on attachment 607849 [details] [diff] [review]
> patch2: XPConnect changes to UnmarkGray some more objects and return them
> for convenience
> 
> Bill, does this patch need an IncrementalReferenceBarrier sort of thing for
> xpc_UnmarkGrayScript in the same way that xpc_UnmarkGrayObject has?

Yes it does. Any UnmarkGray call should also do the incremental check.
Comment on attachment 607849 [details] [diff] [review]
patch2: XPConnect changes to UnmarkGray some more objects and return them for convenience

Review of attachment 607849 [details] [diff] [review]:
-----------------------------------------------------------------

r- due to the IncrementalReferenceBarrier thing that needs to be fixed.  Otherwise, it looked fine to me.  I glanced over the various places that the functions that now ungray are being called, and it didn't look like any of it was inside of any GC or CC related thing so they should be okay.
Attachment #607849 - Flags: review?(continuation)
Attachment #607849 - Flags: review-
Attachment #607849 - Flags: feedback?(wmccloskey)
This is needed for xpc_UnmarkGrayScript in the patch2 that I'm about to update, which is itself needed for patch3.
Attachment #616338 - Flags: review?(wmccloskey)
Attachment #607849 - Attachment is obsolete: true
Attachment #616339 - Flags: review?(continuation)
Attachment #616338 - Attachment description: Implement IsIncrementalBarrierNeededOnScript → patch1: Implement IsIncrementalBarrierNeededOnScript
Un-inlining xpc_UnmarkGrayObject seems bad, especially since Boris is already complaining about how slow it is (see bug 747066).  Can you grab the runtime in *Recursive and leave it inlined?
Oops, was un-inlining xpc_UnmarkGrayObject to grab the runtime.
Attachment #616339 - Attachment is obsolete: true
Attachment #616693 - Flags: review?(continuation)
Attachment #616339 - Flags: review?(continuation)
Attachment #616338 - Flags: review?(wmccloskey) → review+
Comment on attachment 616693 [details] [diff] [review]
patch2: XPConnect changes to UnmarkGray some more objects and return them for convenience

Review of attachment 616693 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/nsXPConnect.cpp
@@ +784,4 @@
>  
>      // Trace children.
>      UnmarkGrayTracer trc;
> +    JSRuntime *rt = nsXPConnect::GetXPConnect()->GetRuntime()->GetJSRuntime();

I think you can use nsXPConnect::GetRuntimeInstance()->GetJSRuntime() here, which looks like it is the same, but slightly shorter.  No big deal.

::: js/xpconnect/src/xpcpublic.h
@@ +185,5 @@
>  xpc_UnmarkGrayObject(JSObject *obj)
>  {
>      if (obj) {
> +        if (xpc_IsGrayGCThing(obj)) {
> +            xpc_UnmarkGrayGCThingRecursive(obj, JSTRACE_OBJECT);

nit: You don't need to brace this if/then/else.

@@ +198,5 @@
> +xpc_UnmarkGrayScript(JSScript *script)
> +{
> +    if (script) {
> +        if (xpc_IsGrayGCThing(script)) {
> +            xpc_UnmarkGrayGCThingRecursive(script, JSTRACE_SCRIPT);

nit: You don't need to brace this if/then/else.
Attachment #616693 - Flags: review?(continuation) → review+
Oops. Except I didn't notice that this got backed out:

https://hg.mozilla.org/integration/mozilla-inbound/rev/cecc9064c6cb

for breaking xpcshell tests in debug builds, which looks like it must have been a PITA to track down given all of the intervening failures and lack of test runs. Sorry about that; I thought I'd run everything in the final push through try.
Target Milestone: mozilla15 → ---
Backed out: https://hg.mozilla.org/mozilla-central/rev/cecc9064c6cb
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Request re-review because removing the root could be done outside of a request.
Attachment #606468 - Attachment is obsolete: true
Attachment #619112 - Flags: review?(wmccloskey)
Comment on attachment 619112 [details] [diff] [review]
patch5: Root JSD's pet global so it can't participate in gray CC cycles

Assuming you can easily get your hands on a runtime, it's probably better to use JS_RemoveObjectRootRT here. Then you wouldn't need a request. The other rooting functions could be changed too, although they don't have to be.
Attachment #619112 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #62)
> Comment on attachment 619112 [details] [diff] [review]
> patch5: Root JSD's pet global so it can't participate in gray CC cycles
> 
> Assuming you can easily get your hands on a runtime, it's probably better to
> use JS_RemoveObjectRootRT here. Then you wouldn't need a request. The other
> rooting functions could be changed too, although they don't have to be.

Oh, great, thanks! I didn't know about that one. I have a context, so a runtime is just a JS_GetRuntime away.

I'll change the other one too, even though at the moment it's guaranteed to be in a request. But that's brittle wrt future changes.
As long as I can't commit due to the tree closure, I may as well pile on one more of these patches. This fixes mozJSComponentLoader.
Attachment #605874 - Attachment is obsolete: true
Attachment #620517 - Flags: review?(continuation)
Comment on attachment 620517 [details] [diff] [review]
path7: UnmarkGray for component loader

Review of attachment 620517 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/loader/mozJSComponentLoader.h
@@ +134,5 @@
>  
>                  JSAutoEnterCompartment ac;
>                  ac.enterAndIgnoreErrors(sSelf->mContext, global);
>  
> +                JS_ClearScope(sSelf->mContext, xpc_UnmarkGrayObject(global));

Does anything "escape" from clear scope?  I thought it would just delete some properties, not copy them out.
(In reply to Andrew McCreight [:mccr8] from comment #65)
> Comment on attachment 620517 [details] [diff] [review]
> path7: UnmarkGray for component loader
> 
> Review of attachment 620517 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/xpconnect/loader/mozJSComponentLoader.h
> @@ +134,5 @@
> >  
> >                  JSAutoEnterCompartment ac;
> >                  ac.enterAndIgnoreErrors(sSelf->mContext, global);
> >  
> > +                JS_ClearScope(sSelf->mContext, xpc_UnmarkGrayObject(global));
> 
> Does anything "escape" from clear scope?  I thought it would just delete
> some properties, not copy them out.

Um... er... well, ok, you have a point. I don't see how this could ever cause black->gray.

On the other hand, this trips the assertion, and I'm not sure if I want to come up with a way of conditionally disabling the assert. Maybe I'll make this DEBUG-only.

What if an exception was triggered? I wonder if the exception could hold onto gray objects. In this case, it still wouldn't matter, because of the enterAndIgnoreErrors which I assume does a clearPendingException.
What assertion specifically is it triggering and where?  I'm not really that familiar with JS_ClearScope.
(In reply to Andrew McCreight [:mccr8] from comment #67)
> What assertion specifically is it triggering and where?  I'm not really that
> familiar with JS_ClearScope.

Huh? No, that was just me speculating on tricky ways that things can escape, and it's not even relevant here because we nuke the exception. But I assume you could get eg an OOM exception. I don't know how much stuff exceptions can point to, but come to think of it the exception object itself would point to the global, and that's enough.

Anyway, I'm going to just delete this patch from the attachments list, because it's only relevant to detecting the leaks. I'll merge it with the patch that adds the asserts.
Attachment #620517 - Attachment is obsolete: true
Attachment #620517 - Flags: review?(continuation)
backporting this to ESR (or even Aurora) doesn't seem sane or worth the risk of regression. If we find a specific abuse of this problem we can figure out how to deal with it then.
Depends on: 752081
Changeset https://hg.mozilla.org/mozilla-central/rev/2e603a571520 caused Thunderbird to start printing many warnings:

JavaScript error: , line 0: nothing active on context

See https://bugzilla.mozilla.org/show_bug.cgi?id=752250

At first glance, it looks like the global object in the Javascript context isn't set up properly.

Any advice on how to resolve this in Thunderbird?
See Also: → 752250
Now that CPG has landed, we should think about doing the UnmarkGray inside AutoCompartment::enter. I talked this over with Luke, and I think the way we want to do it is as follows:

bool
AutoCompartment::enter()
{
    JS_ASSERT(!entered);
    if (origin != destination) {
        JSObject &scopeChain = target->global();
        // ...
        if (scopeChain.isMarked(GRAY) ||
            scopeChain.compartment()->rt->gcIncrementalState == MARK)
        {
            (*scopeChain.compartment()->rt->enterGrayGlobal)(&scopeChain);
        }
    }
    entered = true;
    return true;
}

The enterGrayGlobal would be a callback implemented by XPConnect that would call xpc_UnmarkGray on the global. It's a little gross, but it seems like a fairly complete solution to our problems.
Depends on: 752250
I assume that there is nothing to do done here to specifically verify this fix?
Whiteboard: [sg:moderate] → [sg:moderate][advisory-tracking+]
Group: core-security
You need to log in before you can comment on or make changes to this bug.