Simplify/fix "dead compartment" logic

RESOLVED FIXED in mozilla34

Status

()

Core
JavaScript Engine
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: billm, Assigned: billm)

Tracking

(Blocks: 2 bugs)

unspecified
mozilla34
x86_64
Linux
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox32 wontfix, firefox33 affected)

Details

Attachments

(2 attachments, 2 obsolete attachments)

Created attachment 8429697 [details] [diff] [review]
dead-compartments

The dead compartment/zone code has been polluting our codebase for a while. It's really weird and hard to understand. Eric ran into a problem with it in bug 966518. Originally, I thought we could just work around the problem, but I think it's better to simplify the whole setup.

The purpose of the code is to collect objects even if they get "revived" by wrapper remapping during incremental GC. This revival can happen because we do the remapping to all wrappers, not just ones that are alive. If we have a compartment that's completely dead, then remapping a wrapper inside it might cause us to trigger a barrier or allocate. Either of these operations will likely cause the global to be marked, which will then cause lots of other stuff to be marked.

This is sort of a big deal because it happens consistently during tests, where IGCs span over many tests, and each test causes a new navigation (and consequent wrapper remapping) when we load it.

The idea I came up to simplify this is that we can detect, at the end of every GC, whether a dead compartment was revived. If it was, we can do a non-incremental collection of just the zones that we want to collect. In theory, that collection should be really fast. We won't need to do any marking, just finalization. Previously, we were doing a non-incremental GC of all zones in that case. Since the cost of this collection is no longer so expensive, I think it's fine to remove all the weird guards and stuff that were intended to ensure that these collections where rare. So the patch eliminates AutoMaybeTouchDeadZones and AutoMarkInDeadZone.

While testing the patch, I ran into a further issue. Apparently the existing mechanism hasn't really been working since the zones patch landed. My original testcase for all this was to run reftests and watch compartments accumulate. With zones, the maybeAlive and scheduledForDestruction flags moved to the zone. However, reftests uses a single zone for all the tests. So the flags aren't much use--the zone is alive, but many of its compartments are dead. As such, the number of compartments can grow really big while running reftests. I guess we were just sort of getting away with this somehow.

Anyway, I moved these flags back to the compartment. Since we can't always get to the compartment from a given GC thing, I had to make an approximation that a compartment is alive only if it has live objects or scripts. I think that's pretty much true. Even if the approximation is wrong, the only bad thing that will happen is that we'll try to do one of these non-incremental GCs on the given zone.

With these changes, the number of compartments seems to stay fairly steady during the first few hundred reftests.
Created attachment 8430208 [details] [diff] [review]
dead-compartments v2

This seems pretty green on try. Luke, you reviewed the previous changes related to this stuff. Jon, maybe you should look at it too.
Attachment #8429697 - Attachment is obsolete: true
Attachment #8430208 - Flags: review?(luke)
Attachment #8430208 - Flags: review?(jcoppeard)
Comment on attachment 8430208 [details] [diff] [review]
dead-compartments v2

Review of attachment 8430208 [details] [diff] [review]:
-----------------------------------------------------------------

It's nice to see AutoMaybeTouchDeadZones and AutoMarkInDeadZone go.

::: js/src/gc/Tracer.cpp
@@ +639,5 @@
>      Zone *zone = static_cast<Cell *>(thing)->tenuredZone();
>      if (zone->isCollecting()) {
> +        // See the comment on SetMaybeAliveFlag to see why we only do this for
> +        // objects and scripts.
> +        switch (kind) {

It might be worth adding a comment GCMarker::appendGrayRoot() only gets called on incremental GC (we don't buffer gray roots for non-incremental GC).  I don't think this matters because dead compartments never have their contents inadvertently marked live during non-incremental GC.
Attachment #8430208 - Flags: review?(jcoppeard) → review+

Comment 3

4 years ago
Comment on attachment 8430208 [details] [diff] [review]
dead-compartments v2

Review of attachment 8430208 [details] [diff] [review]:
-----------------------------------------------------------------

Much nicer

::: js/src/jsgc.cpp
@@ +3023,5 @@
>      for (CompartmentsIter c(rt, SkipAtoms); !c.done(); c.next()) {
>          for (JSCompartment::WrapperEnum e(c); !e.empty(); e.popFront()) {
> +            const CrossCompartmentKey &key = e.front().key();
> +            if (key.kind == CrossCompartmentKey::ObjectWrapper ||
> +                key.kind == CrossCompartmentKey::DebuggerObject)

I think you should additionally be able to get to the compartment of DebuggerScript, DebuggerEnvironment (environment = scope object), and maybe DebuggerSource.

@@ +4903,1 @@
>      } while (repeat);

Unrelated, but: it seems 'repeat' is only used for the while condition, so I think it'd be a tiny bit simpler to put the condition in the while() directly.

::: js/src/jswrapper.h
@@ -310,5 @@
> - * because they operate on all compartments, whether live or dead. A brain
> - * transplant can cause a formerly dead object to be "reanimated" by causing a
> - * read or write barrier to be invoked on it during the transplant. In this way,
> - * a zone becomes a zombie, kept alive by repeatedly consuming
> - * (transplanted) brains.

I will miss this comment.
Attachment #8430208 - Flags: review?(luke) → review+
https://hg.mozilla.org/mozilla-central/rev/313dee1cd228
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32

Updated

4 years ago
Depends on: 1022509
I had to back this out because of some WebRTC crashes.

https://hg.mozilla.org/integration/mozilla-inbound/rev/7831f5db9e25
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Once we verify that this makes the crashes go away, we'll want to uplift this backout to Aurora as well.
status-firefox32: --- → fixed
status-firefox33: --- → affected
Target Milestone: mozilla32 → ---
No crashes on inbound since this landed. Please request Aurora approval at your earliest convenience.
Comment on attachment 8430208 [details] [diff] [review]
dead-compartments v2

Are we supposed to request approval for backouts?

[Approval Request Comment]
Bug caused by (feature/regressing bug #): this one
User impact if declined: unknown
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): none
String or IDL/UUID changes made by this patch: none
Attachment #8430208 - Flags: approval-mozilla-aurora?
Comment on attachment 8430208 [details] [diff] [review]
dead-compartments v2

Fair point.
Attachment #8430208 - Flags: approval-mozilla-aurora?
Bug 1022235 is now fixed, which was the thing most heavily triggered by this change.  Note however that when this change landed, we started getting oranges without the load manager turned on; turning it on in bug 1022212 triggered bug 1022235.  So it must have been tripping up something else - possibly another incorrect lifetime, or a bug in this patch.

With bug 1022235 landed (and on Aurora), it should be easier to look into this.
In the hopes that this is all fixed now, I've relanded this.
https://hg.mozilla.org/integration/mozilla-inbound/rev/aca2ec2457ec
Sorry, looks like they're still there. Backed out.
https://hg.mozilla.org/integration/mozilla-inbound/rev/3124e2c41202
https://hg.mozilla.org/mozilla-central/rev/aca2ec2457ec
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
when this was backed out we had a perf win on android 4.0 tsvgx tests.  Something to keep an eye out for when we land it again.
Blocks: 1008341
Terrence mentioned on IRC the other day that a GC bug got fixed that might have impacted this patch. I pushed it to try again and the results look better. I no longer see the nuking crashes that I remember from before (I think they were in mochitest-1). However, it looks like there are still some test failures that need addressing.

I need some advice on what to focus on. There's an OSX 10.8 opt xpcshell test failure that clearly needs to be fixed. Other than that, I'm a little unsure. There's one Linux x64 debug crashtest failure that looks like one of the WebRTC issues we ran into before (initially starred in bug 1018372). But it seems pretty rare. There's also some weird stuff that I'm not sure about.

Ryan, can you offer any help here? What failures need to be addressed before landing and what do you think we can live with? Despite the multiple backouts we've had on this bug, I still haven't found a bug in the patch. It just seems to be really good at turning up bugs elsewhere.

Here are my try pushes. Please feel free to retrigger as you see fit.
https://tbpl.mozilla.org/?tree=Try&rev=ebce8337a8ed
https://tbpl.mozilla.org/?tree=Try&rev=f7fb02279093
https://tbpl.mozilla.org/?tree=Try&rev=b38987fbab99
Flags: needinfo?(ryanvm)
My recollection was the issues mainly manifested in the first DOM crashtest after WebRTC (i.e. things going down the tubes during cleanup). I'll retrigger more crashtest runs, but if things look good there, I don't see any reason to hold back on this.

That said, this looks exactly like the kind of thing we were hitting before:
https://tbpl.mozilla.org/php/getParsedLog.php?id=45153067&tree=Try

Seems like if we think this is an issue with WebRTC shutdown, we need more debugging done with this patch applied.
Flags: needinfo?(ryanvm)
Target Milestone: mozilla33 → ---
Created attachment 8467996 [details] [diff] [review]
zombie-zones

Rebased version.
Attachment #8430208 - Attachment is obsolete: true
Attachment #8467996 - Flags: review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/906c4230aec8

The WebRTC stuff should be fixed. This landing may lead to an increase in a pre-existing (and already very frequent) 10.8 xpcshell failure. RyanVM told me that was okay. Hopefully, if it happens, it will make it easier to track the issue down.
Created attachment 8468360 [details] [diff] [review]
1016738-warning.patch

Introduced a build warning:

In file included from /home/ben/code/moz/inbound/js/src/jit/Ion.h:13:0,
                 from /home/ben/code/moz/inbound/js/src/jit/IonAllocPolicy.h:17,
                 from /home/ben/code/moz/inbound/js/src/jit/LIR.h:17,
                 from /home/ben/code/moz/inbound/js/src/jit/shared/Lowering-shared.h:13,
                 from /home/ben/code/moz/inbound/js/src/jit/shared/Lowering-x86-shared.h:10,
                 from /home/ben/code/moz/inbound/js/src/jit/x64/Lowering-x64.h:10,
                 from /home/ben/code/moz/inbound/js/src/jit/x64/Lowering-x64.cpp:7,
                 from /home/ben/code/moz/builds/d64/js/src/Unified_cpp_js_src7.cpp:2:
/home/ben/code/moz/inbound/js/src/jscompartment.h: In constructor ‘JSCompartment::JSCompartment(JS::Zone*, const JS::CompartmentOptions&)’:
/home/ben/code/moz/inbound/js/src/jscompartment.h:460:30: warning: ‘JSCompartment::jitCompartment_’ will be initialized after [-Wreorder]
     js::jit::JitCompartment *jitCompartment_;
                              ^
/home/ben/code/moz/inbound/js/src/jscompartment.h:456:10: warning:   ‘bool JSCompartment::scheduledForDestruction’ [-Wreorder]
     bool scheduledForDestruction;
          ^
In file included from /home/ben/code/moz/builds/d64/js/src/Unified_cpp_js_src7.cpp:93:0:
/home/ben/code/moz/inbound/js/src/jscompartment.cpp:37:1: warning:   when initialized here [-Wreorder]
 JSCompartment::JSCompartment(Zone *zone, const JS::CompartmentOptions &options = JS::CompartmentOptions())
 ^


I'll check in once inbound reopens.
Attachment #8468360 - Flags: review+
(forgot to say that r+ is from efaust on irc)

https://hg.mozilla.org/integration/mozilla-inbound/rev/01d12ccde2bb
(In reply to Bill McCloskey (:billm) from comment #23)
> The WebRTC stuff should be fixed.

The final patch didn't land yet? Oh well, guess we'll fast-track it at this point.
https://hg.mozilla.org/mozilla-central/rev/906c4230aec8
https://hg.mozilla.org/mozilla-central/rev/01d12ccde2bb
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.