1285355 - [meta] Make it possible to do only zone GCs

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Description

•

8 years ago

This bug is about finding all the stuff that only gets cleaned up by full (i.e., non-zone) GCs and figure out how to clean that stuff up in a zone GC. This is a prerequisite for getting rid of full GCs entirely.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Comment 1

•

8 years ago

How will we collect cycles across zones? Is the assumption that we will always be nuking compartments?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 2

•

8 years ago

If there are cycles across zones, we can detect them and collect all the zones containing the cycle at once. However, in an e10s content process, it's very unlikely that we'll see cycles across zones (except for zones for tabs linked via window.opener, which probably should go in the same zone anyway).

I'll post the scripts I used to analyze this in a moment.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 3

•

8 years ago

Attached patch annotate object compartments in heap dumps — Details — Splinter Review

It would be nice to have something like this in general, but this patch is not for checkin.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 4

•

8 years ago

Attached file cross-zone.py — Details

This script relies in the previous patch to print out cross-zone edges in a heap dump. Generally I see a few things from the system zone to content compartments. I don't think add-ons should affect this much.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Comment 5

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #2)
> If there are cycles across zones, we can detect them and collect all the
> zones containing the cycle at once.

Makes sense.

> However, in an e10s content process,
> it's very unlikely that we'll see cycles across zones (except for zones for
> tabs linked via window.opener, which probably should go in the same zone
> anyway).

Agreed that they should be in the same zone, if they're not then we should make it so.

I think cycles likely occur with the devtools server and the debuggees. Although Debugger -> debuggee edges should mostly no one-way, some of the devtools do things like monkey patch various DOM methods to add instrumentation or tracing and I bet this creates cycles. Just thinking out loud.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 6

•

8 years ago

Jon, is there any chance you could fill in some dependencies here before the meeting? Thanks.

Flags: needinfo?(jcoppeard)

Jon Coppeard (:jonco)

Comment 7

•

8 years ago

We talked about this a little at the GC meeting yesterday but didn't go so far as to agree exactly what the dependencies are.

From looking at the code, I can see the following:

1) The atoms table

I don't know how we can solve this other than moving non-permanent atoms to the zone.

2) XPConnect

The GC finalize callback takes a boolean indicating full/compartment GC.  It seems we can only sweep certain things in a full GC (NativeInterfaces and mClassInfo2NativeSetMap).

We were already talking about making XPConnect more zone-aware and I'm sure this would help here.  More investigation is required to know how to adapt XPConnect to zone-only GC though.

3) Gray bits are marked valid at the end of a full GC only

We would need to make the cycle collector zone-aware too (if that isn't covered by XPConnect above).

4) SharedScriptData table

This contains bytecode and is shared between all zones.  We were talking about making the entries GC things in their own right.  On the other hand, it might make sense to reference count these (they can't be part of a cycle).

There are probably others too.

Flags: needinfo?(jcoppeard)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 8

•

8 years ago

Thanks Jon!

To remember: the XPConnect stuff is related to bug 1207321.

Flags: needinfo?(wmccloskey)

Andrew McCreight [:mccr8]

Comment 9

•

8 years ago

To summarize what we talked about in the meeting, for gray bits (3) there are the following issues, at least:
1. Preserve gray bits for compartments we're not collecting.
2. Treat incoming gray edges as gray roots, not black roots.
3. Ensure that outgoing black edges don't touch gray objects.

The last one may be the hardest, as it isn't obvious how to do that without marking other zones. We can't just turn the other zone black because then we won't be able to cycle collect huge multizone cycles.

Andrew McCreight [:mccr8]

Comment 10

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #8)
> To remember: the XPConnect stuff is related to bug 1207321.

The main idea here is that if we can ensure that all of the weird XPConnect wrappers are only going to be used from the system principal, then they will only be used from the system zone, making it easier to move the XPConnect wrappers to per-zone data structures. Terrence has done some previous work with splitting tables into one per zone, plus one for multi-zone objects, which may need to be done in the interim for other tables.

Andrew McCreight [:mccr8]

Comment 11

•

8 years ago

Bug 1110928 will eliminate one common source of full GCs from the browser. This may depend on other work  if eliminating too many full GCs ends up causing OOM issues.

Depends on: 1110928

Andrew McCreight [:mccr8]

Updated

•

8 years ago

Depends on: 1288177

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 12

•

8 years ago

(In reply to Andrew McCreight [:mccr8] from comment #9)
> To summarize what we talked about in the meeting, for gray bits (3) there
> are the following issues, at least:
> 1. Preserve gray bits for compartments we're not collecting.
> 2. Treat incoming gray edges as gray roots, not black roots.
> 3. Ensure that outgoing black edges don't touch gray objects.
> 
> The last one may be the hardest, as it isn't obvious how to do that without
> marking other zones. We can't just turn the other zone black because then we
> won't be able to cycle collect huge multizone cycles.

I was thinking some more about this. Perhaps I'm wrong, but I thought that the only way an object could transition from gray to black is via UnmarkGray. That is, if an object was marked gray by a previous GC, then a new GC should never mark it black AFAIK. If it did, then that would mean that an intervening CC might have seen the wrong color.

So it seems like we could just leave the mark bits the same for zones we're not going to collect. If an object was marked gray, then the only way it could become black is through UnmarkGray, which doesn't respect zone boundaries. Does that sound right Andrew?

Flags: needinfo?(wmccloskey) → needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 13

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #12)
> I was thinking some more about this. Perhaps I'm wrong, but I thought that
> the only way an object could transition from gray to black is via
> UnmarkGray.

Hmm, I guess that's true for direct manipulating of JS, but I'm not sure it holds up 100% of the time for manipulating C++. Most CC optimizations work by calling UnmarkGray, but there are a few in mozilla::dom::TraceBlackJS() that mark reflectors black directly, for instance the globals of windows being displayed. Thus if you had a window that wasn't being displayed, and somehow you made it being displayed (maybe via a weak ref or something?) then the reflector would go from being gray to being black when you do a GC. In a world with only full GCs, this is okay for the CC because it never creates a black-gray edge. (Refcounted object to gray edges are okay, as they are visible to the CC, in contrast to black-gray JS edges.)

Another problem is where UnmarkGray has failed and we need to recompute all gray bits. This is triggered from nsCycleCollector::FixGrayBits(), and the easiest thing to do is to just ensure it is always a full GC. They should be rare enough that it is okay to do full GCs. (As we do them right now for this anyways.)

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 14

•

8 years ago

I guess we could call UnmarkGray on any JS held by a C++ object that starts being eligible to being marked during the black-marking phase. It seems a little fragile but I suppose there aren't too many such cases.

Another potentially dangerous case would be rooting. I could imagine us rooting a gray object (this is okay from a CC perspective if we never read or write from it directly). Though maybe we want/have barriers that would unmarkgray in that case anyways.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 15

•

8 years ago

Hi Olli,
Do you know if it's possible for objects that are marked in mozilla::dom::TraceBlackJS to have previously been gray? The main stuff we seem to mark is window objects that have a docshell. Is it possible for a window to be created, marked, and then later get a docshell?

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 16

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #15)
> Do you know if it's possible for objects that are marked in
> mozilla::dom::TraceBlackJS to have previously been gray?
XULProtoCache is actually quite large part too, and yes, that would be gray usually.
And why wouldn't the windows be gray?
(the code was added in bug 736563)

> Is it possible for a
> window to be created, marked, and then later get a docshell?
To re-get a docshell, no. Windows in bfcache should have a docshell still.
But there are Window objects without docshell around in the web pages, like
var i = document.createElement("iframe");
document.body.appendChild(i);
var win = i.contentWindow;
i.remove();
// win stays alive

Flags: needinfo?(bugs)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 17

•

8 years ago

Talked about this on IRC with Andrew and Olli. Here's a summary.

Let's say we GC zone A and not zone B. We want to avoid a situation where there's a gray edge from A to B, and the GC marks the A side of the edge black and the B side of the edge is still gray, since that would break our "no black to gray edges" invariant.

Ideally, this shouldn't happen because the GC will never mark a node black if it was gray before the GC started. The gray->black transition should only happen in between GCs, by UnmarkGray. That's because:

- If a gray object becomes a stack root (through Rooted), it should go through UnmarkGray first. Otherwise the JS engine would have direct access to a gray object, which isn't supposed to happen.

- If the object is not a root, then a black object must point to it in order for it to be marked black. However, whoever created that edge was implicitly exposing the gray object to running JS code, which isn't supposed to happen.

Andrew pointed out that dom::TraceBlackJS is another root tracing function that could, in theory, turn objects black when they used to be gray. Mostly it seems like that won't happen.

- Scripts in the XUL prototype cache are always freshly created when they're added to the cache, so they'll never be gray. Once they're in the cache, they'll always be marked black.

- XUL document prototypes are also freshly parsed when they're added to the document's proto list, so they'll never be gray.

- Event listeners will have been added through addEventListener, and they should be black when that happens (since they're accessible from JS). After that, they'll always be marked black by TraceBlackJS.

There is a problem with window globals, however. Suppose a top-level window is in the bfcache. It will not be marked by TraceBlackJS because only outer windows are marked, and the outer window will refer to the currently active window. So it could be gray. Now the user hits the back button and makes that window current. At this time TraceBlackJS will trace it.

It seems like we can work around this problem by changing TraceBlackJS to mark all windows in the history black.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 18

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #17)
> Ideally, this shouldn't happen because the GC will never mark a node black
> if it was gray before the GC started. The gray->black transition should only
> happen in between GCs, by UnmarkGray.
Or during incremental GC when running some JS, but do we consider that happening between GCs?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 19

•

8 years ago

(In reply to Olli Pettay [:smaug] from comment #18)
> (In reply to Bill McCloskey (:billm) from comment #17)
> > Ideally, this shouldn't happen because the GC will never mark a node black
> > if it was gray before the GC started. The gray->black transition should only
> > happen in between GCs, by UnmarkGray.
> Or during incremental GC when running some JS, but do we consider that
> happening between GCs?

That's a very good point. When the GC is running, we don't do UnmarkGray, we use a barrier. And the barrier will stop at zone boundaries.
http://searchfox.org/mozilla-central/rev/496904277ce0143bc1a952f2eb2c7e6a81aa3d4d/js/public/GCAPI.h#610-613

I wonder how we can handle that.

Andrew McCreight [:mccr8]

Updated

•

8 years ago

Depends on: 1288817

Andrew McCreight [:mccr8]

Comment 20

•

8 years ago

There are three classes that only have the weird sweeping stuff done in full GCs: XPCNativeScriptableShared, XPCNativeSet, and XPCNativeInterface. It looks like the first one could just be made refcounted, and I have some patches to do that.

XPCNativeSet is a list of XPCNativeInterface. XPCNativeInterface has a bunch of jsid, but it never tells the GC about them (there's some comment on XPCNativeMember about the jsid being pinned which is presumably why it works). So I guess it could potentially be done, too.

Andrew McCreight [:mccr8]

Updated

•

8 years ago

Depends on: 1288870

Andrew McCreight [:mccr8]

Updated

•

8 years ago

Depends on: 1288909

Terrence Cole [:terrence]

Comment 21

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #19)
> (In reply to Olli Pettay [:smaug] from comment #18)
> > (In reply to Bill McCloskey (:billm) from comment #17)
> > > Ideally, this shouldn't happen because the GC will never mark a node black
> > > if it was gray before the GC started. The gray->black transition should only
> > > happen in between GCs, by UnmarkGray.
> > Or during incremental GC when running some JS, but do we consider that
> > happening between GCs?
> 
> That's a very good point. When the GC is running, we don't do UnmarkGray, we
> use a barrier. And the barrier will stop at zone boundaries.
> http://searchfox.org/mozilla-central/rev/
> 496904277ce0143bc1a952f2eb2c7e6a81aa3d4d/js/public/GCAPI.h#610-613
> 
> I wonder how we can handle that.

Uh oh! In the automatic read barrier at [1], we treat both sides of that branch independently, I think for this very reason. I thought ExposeToActiveJS was supposed to have the same semantics.

http://searchfox.org/mozilla-central/source/js/src/gc/Heap.h#1293-1300

(In reply to Andrew McCreight [:mccr8] from comment #20)
> There are three classes that only have the weird sweeping stuff done in full
> GCs: XPCNativeScriptableShared, XPCNativeSet, and XPCNativeInterface. It
> looks like the first one could just be made refcounted, and I have some
> patches to do that.
> 
> XPCNativeSet is a list of XPCNativeInterface. XPCNativeInterface has a bunch
> of jsid, but it never tells the GC about them (there's some comment on
> XPCNativeMember about the jsid being pinned which is presumably why it
> works). So I guess it could potentially be done, too.

IIRC, that code is the only thing keeping atom pinning alive, so tracing those normally would allow us to drop that annoying sub-system too.

Jon Coppeard (:jonco)

Updated

•

8 years ago

Depends on: 1290108

Andrew McCreight [:mccr8]

Updated

•

8 years ago

Depends on: 1296639

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 22

•

8 years ago

(In reply to Terrence Cole [:terrence] from comment #21)
> (In reply to Bill McCloskey (:billm) from comment #19)
> > (In reply to Olli Pettay [:smaug] from comment #18)
> > > (In reply to Bill McCloskey (:billm) from comment #17)
> > > > Ideally, this shouldn't happen because the GC will never mark a node black
> > > > if it was gray before the GC started. The gray->black transition should only
> > > > happen in between GCs, by UnmarkGray.
> > > Or during incremental GC when running some JS, but do we consider that
> > > happening between GCs?
> > 
> > That's a very good point. When the GC is running, we don't do UnmarkGray, we
> > use a barrier. And the barrier will stop at zone boundaries.
> > http://searchfox.org/mozilla-central/rev/
> > 496904277ce0143bc1a952f2eb2c7e6a81aa3d4d/js/public/GCAPI.h#610-613
> > 
> > I wonder how we can handle that.
> 
> Uh oh! In the automatic read barrier at [1], we treat both sides of that
> branch independently, I think for this very reason. I thought
> ExposeToActiveJS was supposed to have the same semantics.
> 
> http://searchfox.org/mozilla-central/source/js/src/gc/Heap.h#1293-1300

Does that actually work? needsIncrementalBarrier() is only true if the gray bits have been cleared out, as far as I know. So if needsIncrementalBarrier() is true, isMarked(GRAY) should always be false IIUC. So I don't think that moving the check like that solves the problem.

> (In reply to Andrew McCreight [:mccr8] from comment #20)
> > There are three classes that only have the weird sweeping stuff done in full
> > GCs: XPCNativeScriptableShared, XPCNativeSet, and XPCNativeInterface. It
> > looks like the first one could just be made refcounted, and I have some
> > patches to do that.
> > 
> > XPCNativeSet is a list of XPCNativeInterface. XPCNativeInterface has a bunch
> > of jsid, but it never tells the GC about them (there's some comment on
> > XPCNativeMember about the jsid being pinned which is presumably why it
> > works). So I guess it could potentially be done, too.
> 
> IIRC, that code is the only thing keeping atom pinning alive, so tracing
> those normally would allow us to drop that annoying sub-system too.

It looks like there are lots of other places too:
http://searchfox.org/mozilla-central/search?q=symbol:_Z22JS_AtomizeAndPinStringP9JSContextPKc&redirect=false
Those are just some examples.

Andrew McCreight [:mccr8]

Updated

•

7 years ago

Depends on: 1324002

Jon Coppeard (:jonco)

Comment 23

•

7 years ago

(In reply to Bill McCloskey (:billm) from comment #19)
> That's a very good point. When the GC is running, we don't do UnmarkGray, we
> use a barrier. And the barrier will stop at zone boundaries.

The barrier will ensure that the target of cross-zone edges into non-marking zones have UnmarkGray called on them (eventually).

Jon Coppeard (:jonco)

Updated

•

7 years ago

Whiteboard: [qf:p1]

Jean Gong :jgong

Updated

•

7 years ago

Whiteboard: [qf:p1] → [qf:p2]

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 24

•

7 years ago

Jon and I talked about this a bit today. Since 2012, we have cleared the mark bits only for zones being collected. And we explicitly search for black-gray edges across zones and call UnmarkGray on them here:
http://searchfox.org/mozilla-central/rev/fdb811340ac4e6b93703d72ee99217f3b1d250ac/js/src/jsgc.cpp#6561

So I think this bug is fixed!

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Jon Coppeard (:jonco)

Updated

•

7 years ago

Depends on: 1376901

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Updated

•

7 years ago

Blocks: 1376904

Jon Coppeard (:jonco)

Updated

•

7 years ago

No longer depends on: 1376901

Andrew McCreight [:mccr8]

Updated

•

7 years ago

No longer depends on: 1288177

BugBot [:suhaib / :marco/ :calixte]

Updated

•

5 years ago

Keywords: meta

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

2 years ago

Performance Impact: --- → P2

Whiteboard: [qf:p2]

annotate object compartments in heap dumps 8 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 2.03 KB, patch		Details \| Diff \| Splinter Review
cross-zone.py 8 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 1.41 KB, text/x-python		Details