Closed Bug 1318384 Opened 8 years ago Closed 8 years ago

Improve dead compartment collection

Categories

(Core :: JavaScript: GC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla53

People

(Reporter: jonco, Assigned: jonco)

References

Details

Attachments

(8 files, 2 obsolete files)

At the end of an incremental GC we can run a new non-incremental GC to collect uncollected compartments that we think are really dead.  This can lead to slow GCs (bug 1311734).

There are a few things we can do to increase the accuracy of our prediction that a compartment is dead and to avoid keeping dead compartments alive unnecessarily.
Putting a WatchKey on the stack in markAll() results in its pre-write barrier being triggered in the destructor because its members are both Prebarriered<> wrappers.

This is called when tracing the runtime in a minor GC which can result in all these GC things getting marked if this happens during incremental marking.  This is unnecessary since we don't change the GC graph at all here.

The patch uses unbarriered JSObject* and jsid on the stack instead.
Attachment #8811824 - Flags: review?(jdemooij)
Attached patch ipc-changesSplinter Review
WrapperAnswer::RecvDropObject() will currently call ExposeObjectToActiveJS on the JSObject being dropped due to automatic barriers in JS::Heap<T>.  This can result in objects being marked in dead compartments.

The patch add IdToObjectMap::findPreserveColor that doesn't trigger the barrier and uses that in RecvDropObject.  I adopted the 'preserve color' terminology used in the rest of the browser.
Attachment #8811826 - Flags: review?(wmccloskey)
Attachment #8811826 - Flags: review?(wmccloskey) → review+
Attachment #8811824 - Flags: review?(jdemooij) → review+
Keywords: leave-open
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8ed8747b33e0
Don't trigger pre barrier in WatchpointMap::markAll r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/50b7f70fbd5e
Don't trigger pre barrier in WrapperAnswer::RecvDropObject r=billm
Attached patch use-fresh-arenas-while-marking (obsolete) — Splinter Review
At the moment there is the following comment in beginMarkPhase():

     /*
      * At the end of each incremental slice, we call prepareForIncrementalGC,
      * which marks objects in all arenas that we're currently allocating
      * into. This can cause leaks if unreachable objects are in these
      * arenas. This purge call ensures that we only mark arenas that have had
      * allocations after the incremental GC started.
      */

So the purge() call means that the we don't do any more allocation into the current arena for each thing kind in this GC, to avoid marking all objects in that arena.  We call delayMarkingArena() to mark the whole arena in prepareForIncrementalGC() but also when we allocate a new arena.  What this means is that if we start allocating into any partially full arena during an incremental GC we mark all the existing cells, which can cause the leak described above and also mark things in compartments that we think are dead.  So the purge() call above fixes this for this current arena but not for any subsequently allocated ones.

This is a patch to stop allocation in to any existing partially full arena during incremental marking.  At the start of GC we advance the arena list cursor to the end of the list so that all existing arenas are considered full and any subsequent allocation will allocate fresh arenas.  This way we don't have to mark any existing things.

I used the name prepareForIncrementalGC() for this because I thought it made more sense, and made the existing use of this just call purge() which is all that is required there.

The drawback of this is that it will lead to more fragmentation, but that should be handled by compacting.

I tested the JS browser ref tests and found this results in fewer windows being kept alive and also fewer dead compartment GCs.
Attachment #8812231 - Flags: review?(wmccloskey)
Attached patch refactor-repeat-for-dead-zone (obsolete) — Splinter Review
This patch factors out the check for whether we should re-run the GC, and also makes sure only collect those zones containing dead compartments.
Attachment #8812236 - Flags: review?(wmccloskey)
Comment on attachment 8812231 [details] [diff] [review]
use-fresh-arenas-while-marking

Review of attachment 8812231 [details] [diff] [review]:
-----------------------------------------------------------------

Can you remind me what prepareForIncrementalGC currently does? If we purge at the start of marking and we always set allocatedDuringIncremental for arenas we allocate into after that, then why is it needed?

::: js/src/jsgc.cpp
@@ +5456,5 @@
>  
>      gcstats::AutoPhase ap(stats, gcstats::PHASE_COMPACT);
>  
> +    for (CompartmentsIter c(rt, WithAtoms); !c.done(); c.next())
> +        c->scheduledForDestruction = false;

What's this for? It seems like it disables the COMPARTMENT_REVIVED thing when we do a compacting GC.
Comment on attachment 8812236 [details] [diff] [review]
refactor-repeat-for-dead-zone

Review of attachment 8812236 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.cpp
@@ +5492,5 @@
>      MOZ_ASSERT(startedCompacting);
>  
>      gcstats::AutoPhase ap(stats, gcstats::PHASE_COMPACT);
>  
> +    // This data is out of date and we don't recompute it here.

I'm afraid I still don't understand this. Won't this clear the flag before we have a chance to look at it in shouldRepeatForDeadZone?

@@ +6314,5 @@
> +    if (!isIncremental || isIncrementalGCInProgress())
> +        return false;
> +
> +    bool shouldRepeat = false;
> +    for (ZonesIter zone(rt, SkipAtoms); !zone.done(); zone.next()) {

Can you just use a CompartmentIter here?

@@ +6328,5 @@
> +        return false;
> +
> +    // Schedule only zones containing dead compartments for collection.
> +    for (ZonesIter zone(rt, SkipAtoms); !zone.done(); zone.next()) {
> +        zone->unscheduleGC();

If the GC was reset, could we end up here and see scheduledForDestruction compartments and then unschedule most of the zones for GC?
> Can you remind me what prepareForIncrementalGC currently does? If we purge at the start of marking and we always set allocatedDuringIncremental for arenas we allocate into after that, then why is it needed?

I don't think it is needed.  As you say, already handle this by setting allocatedDuringIncremental.  I think this may have been required when we had a separate copy of the free list that need to be synced at this point.

> > +    for (CompartmentsIter c(rt, WithAtoms); !c.done(); c.next())
> > +        c->scheduledForDestruction = false;
>
> What's this for? It seems like it disables the COMPARTMENT_REVIVED thing when we do a compacting GC.

Oh I didn't mean to leave that in.  Removed.

BTW I'm testing this patch on AWSY to see if it has any negative effects on memory consumption.
Attachment #8812231 - Attachment is obsolete: true
Attachment #8812231 - Flags: review?(wmccloskey)
Attachment #8812855 - Flags: review?(wmccloskey)
You're right.  It's probably not a good idea to mess with the scheduled compartments so I changed the way this worked to only select zones with dead compartments in when we start the GC, if the reason is COMPARTMENT_REVIVED.
Attachment #8812858 - Flags: review?(wmccloskey)
Attachment #8812236 - Attachment is obsolete: true
Attachment #8812236 - Flags: review?(wmccloskey)
This is a patch to add a new AbortReason for these compartment revived GCs.  This will make them show up as separate from NonIncrementalRequested GCs in the GC_NON_INCREMENTAL_REASON telemetry.
Attachment #8812859 - Flags: review?(wmccloskey)
Attachment #8812855 - Flags: review?(wmccloskey) → review+
Comment on attachment 8812858 [details] [diff] [review]
refactor-repeat-for-dead-zone

Review of attachment 8812858 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.cpp
@@ +6347,2 @@
>              JS::PrepareForFullGC(rt->contextFromMainThread());
> +        } else if (shouldRepeatForDeadZone()) {

I still have the same essential concern as the last patch. It would allay my concern if you added |&& !wasReset| here.
Attachment #8812858 - Flags: review?(wmccloskey) → review+
Comment on attachment 8812859 [details] [diff] [review]
add-nonincremental-reason

Review of attachment 8812859 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.h
@@ +60,5 @@
>      D(ModeChange) \
>      D(MallocBytesTrigger) \
>      D(GCBytesTrigger) \
> +    D(ZoneChange) \
> +    D(CompartmentRevived)

Can you also update the docs here?
http://searchfox.org/mozilla-central/source/toolkit/components/telemetry/docs/data/main-ping.rst#403
Attachment #8812859 - Flags: review?(wmccloskey) → review+
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7cd44d064d92
Always allocate into fresh arenas while marking r=billm
https://hg.mozilla.org/integration/mozilla-inbound/rev/fdc0f9d0f338
Don't collect all scheduled zones when repeating a GC to collect dead compartments r=billm
https://hg.mozilla.org/integration/mozilla-inbound/rev/45b237733615
Add AbortReason::CompartmentRevived so repeated GCs don't get counted under NonIncrementalRequested r=billm
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e453f2aaef94
Update telemetry docs with new reason code DONTBUILD
Thanks for all the reviews.  Just a few more patches.

This one changes the way cross compartment edges are used to calculate maybeAlive, by propagating the flag transitively from the initial set of maybeAlive compartments, so cross compartment edges only mark the target compartment live if the source compartment is considered live.

This should mean we will find more dead compartments to collect.
Attachment #8813320 - Flags: review?(wmccloskey)
Another source of revived compartments I found was nuking proxies.  This happens because this clears their extra slots which triggers the pre barrier if we are currently marking.  I think we should probably just not do this - the important thing about nuking is that the link to the proxy target is cut, and nuked proxies are probably dead anyway.

This patch just the clearing of extra slots.  They are inaccessible in the nuked proxy, but still traced.
Attachment #8813324 - Flags: review?(wmccloskey)
The final problem I found was this typeDescrObjects set in the zone.  This is a weakly marked set of typed object descriptor objects (compacting GC needs to know about them).  Currently this uses HeapPtr which has pre- and post-barries.  What's happening is that this table can be resized while we're marking which then triggers barriers when copying the table contents.

It would be great if we could somehow not trigger pre-barriers when resizing hash tables since this doesn't change the GC graph at all, but I couldn't figure out any nice way of doing that.

But these barriers are unnecessary for this set since the objects are always tenured and the pointers are weak so it seemed simpler just to remove them.
Attachment #8813325 - Flags: review?(wmccloskey)
Comment on attachment 8813320 [details] [diff] [review]
improve-mark-compartments

Review of attachment 8813320 [details] [diff] [review]:
-----------------------------------------------------------------

r+, but I'm worried about the non-incremental question below, so please don't that part land unless I'm being stupid.

::: js/src/gc/RootMarking.cpp
@@ +474,5 @@
>      MOZ_ASSERT(grayBufferState == GrayBufferState::Unused);
>      for (GCZonesIter zone(rt); !zone.done(); zone.next())
>          MOZ_ASSERT(zone->gcGrayRoots.empty());
>  
> +    gcstats::AutoPhase ap3(stats, gcstats::PHASE_BUFFER_GRAY_ROOTS);

Maybe just call it |ap|.

::: js/src/jsgc.cpp
@@ +3916,2 @@
>          bufferGrayRoots();
> +        markCompartments();

Why do this for non-incremental GCs? If we do a non-incremental GC and still fail to collect a "dead" compartment, then another non-incremental COMPARTMENT_REVIVED GC seems unlikely to fix that. We could even end up GCing in an endless loop.

@@ -3963,5 @@
>      }
>  
> -    /*
> -     * For black roots, code in gc/Marking.cpp will already have set maybeAlive
> -     * during MarkRuntime.

Instead of this, could put something at the top of the function that says maybeAlive is set either in beginMarkPhase or in RootMarking.cpp?
Attachment #8813320 - Flags: review?(wmccloskey) → review+
Comment on attachment 8813324 [details] [diff] [review]
dont-clear-proxy-extras

Review of attachment 8813324 [details] [diff] [review]:
-----------------------------------------------------------------

Perhaps mention something in the comment about the extra slot not being able to cross compartments, so we can't cause the target compartment to leak because we failed to sever this edge.
Attachment #8813324 - Flags: review?(wmccloskey) → review+
Comment on attachment 8813325 [details] [diff] [review]
remove-type-descr-barriers

Review of attachment 8813325 [details] [diff] [review]:
-----------------------------------------------------------------

I'm going to trust you on this one since I don't feel like wading through all these templates :-). Sounds good though.

::: js/src/gc/Zone.h
@@ +356,5 @@
>      // This is used by the GC to trace them all first when compacting, since the
>      // TypedObject trace hook may access these objects.
> +    //
> +    // There are no barriers here - the set contains only tenured objects so no
> +    // post-barrier is requried, and these are weak references so no pre-barrier

missing a letter in required
Attachment #8813325 - Flags: review?(wmccloskey) → review+
(In reply to Bill McCloskey (:billm) from comment #20)

> Why do this for non-incremental GCs? If we do a non-incremental GC and still
> fail to collect a "dead" compartment, then another non-incremental
> COMPARTMENT_REVIVED GC seems unlikely to fix that. We could even end up
> GCing in an endless loop.

You're right, and we don't do this for non-incremental GCs.  Currently we call markCompartments() for all GCs but with the patch we only do it for incremental GCs.  We don't do COMPARTMENT_REVIVED GCs for non-incremental GCs because we check whether we're at the end of an incremental GC in shouldRepeatForDeadZone().

> Instead of this, could put something at the top of the function that says
> maybeAlive is set either in beginMarkPhase or in RootMarking.cpp?

Good idea, I'll update the comments.
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/dd83ff2a1d85
Improve accuracy of code to find dead compartments r=billm
https://hg.mozilla.org/integration/mozilla-inbound/rev/ace95a181f29
Don't clear extra slots when nuking a proxy to avoid reviving dead compartments r=billm
https://hg.mozilla.org/integration/mozilla-inbound/rev/b1a5a845e294
Remove barriers on Zone typed object descriptors set to avoid reviving compartments r=billm
(In reply to Jon Coppeard (:jonco) from comment #24)
> (In reply to Bill McCloskey (:billm) from comment #20)
> 
> > Why do this for non-incremental GCs? If we do a non-incremental GC and still
> > fail to collect a "dead" compartment, then another non-incremental
> > COMPARTMENT_REVIVED GC seems unlikely to fix that. We could even end up
> > GCing in an endless loop.
> 
> You're right, and we don't do this for non-incremental GCs.  Currently we
> call markCompartments() for all GCs but with the patch we only do it for
> incremental GCs.  We don't do COMPARTMENT_REVIVED GCs for non-incremental
> GCs because we check whether we're at the end of an incremental GC in
> shouldRepeatForDeadZone().

Sorry, I somehow read things backwards. You're right of course.
Status: NEW → RESOLVED
Closed: 8 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Here's some GC_MAX_PAUSE data that shows the effect of this change:
https://mzl.la/2gEUX7C
The mean doesn't change much, but the 95th percent drops from around 250..300 to around 180.
Comment on attachment 8813324 [details] [diff] [review]
dont-clear-proxy-extras

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: Uplift bug 1354294 depends on this patch, and *that* patch is causing major crashes.
Fix Landed on Version: m-c
Risk to taking this patch (and alternatives if risky): I think low. I'm not the author but the patch is a self-contained change to GC behavior of nuked wrappers -- no observable behavior of the wrappers are changed.
String or UUID changes made by this patch: None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8813324 - Flags: approval-mozilla-esr52?
Comment on attachment 8813324 [details] [diff] [review]
dont-clear-proxy-extras

ok let's take this in 52esr so we can fix the lastpass crash in 1354294
Attachment #8813324 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Comment on attachment 8813324 [details] [diff] [review]
dont-clear-proxy-extras

This one patch landed in esr52, but we don't want to call this bug fixed there, so I'm clearing the approval flag so this doesn't show up on "uplift approved but not landed" queries.
Attachment #8813324 - Flags: approval-mozilla-esr52+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: