Open Bug 810022 Opened 12 years ago Updated 1 year ago

Make nsCycleCollector::GCIfNeeded incremental somehow

Categories

(Core :: Cycle Collector, defect)

x86
macOS
defect

Tracking

()

People

(Reporter: BenWa, Unassigned)

Details

(Whiteboard: [Snappy], sps)

Here's a profile:
http://people.mozilla.com/~bgirard/cleopatra/#report=a3fd87e4be28d74eef47c24de494c83988e7b700

This is taken on a profile with about 2 minutes of age and less then 5 tabs after running a JS heavy add-on+page. I was testing the profiler add-on on a fresh profile to be specific.

In the profile we spend about 100ms doing the full non-incremental GC, then 17 ms waiting on the CC CondVar.

Is it possible to run an IGC and run CC after instead?
Whiteboard: [Snappy], sps
We should look at telemetry to see how common this is (CYCLE_COLLECTOR_NEED_GC).
Assignee: general → nobody
Component: JavaScript Engine → XPCOM
Looks like about 0.001. I'm guessing that's 0.1%? Vladan's got some interesting chromehang data he's compiling regarding this.
When I look at telemetry, it says CYCLE_COLLECTOR_NEED_GC is 0.0463%. If we do a CC every 10 seconds, then that means they happen every 6 hours on average. Of course, they might not be evenly distributed, which could make it a worse problem.

It probably wouldn't be hard to make an effort to schedule an IGC when UnmarkGray runs out of stack. We'd just have to ensure that we eventually make progress and run the cycle collector.
For completeness, what happens is that the GC marks objects that are reachable only from XPConnect roots gray. The CC only looks at gray objects. If a gray object is handed out to C++ or JS, then we don't know for sure that it will remain only reachable from XPConnect roots, so we mark it, and any gray object reachable from it, black.

This black marking process, known as UnmarkGray, is implemented recursively, so it can run out of stack space and gives up.  If it does, then we can't trust the gray bits any more, and need to run the GC again to reestablish the bits.
By contrast, TRANSPLANT GCs, which are being addressed in bug 803376, are 0.318% of all GCs.  Bill described TRANSPLANT GCs as being around 75% of the worst GC pauses. So these are about 10 times rarer.
GC invoked by CC is the most common main-thread jank involving GCs over the past 3 months, according to chrome-hang data:

JSScript::markChildren(JSTracer *) (in mozjs.dll)
 -> js::gc::MarkUnbarriered<JSScript> (in mozjs.dll)
 -> fun_trace (in mozjs.dll)
 -> js::GCMarker::processMarkStackTop(js::SliceBudget &) (in mozjs.dll)
 -> js::GCMarker::drainMarkStack(js::SliceBudget &) (in mozjs.dll)
 -> GCCycle (in mozjs.dll)
 -> Collect (in mozjs.dll)
 -> js::GCFinalSlice(JSRuntime *,js::JSGCInvocationKind,js::gcreason::Reason) (in mozjs.dll)
 -> CCTimerFired (in xul.dll)

It's also the 12th most common chrome hang signature after 1) plugin hangs, 2) bug 790370, and 3) reflows.
I've noticed TRANSPLANT GCs in my profile as well. I can't say which is worse but the chromehang data above suggests that they're probably both worth solving.
Ah, okay, that's a slightly different thing than what is tracked by CYCLE_COLLECTOR_NEED_GC, so UnmarkGray isn't involved here. :)  Looking at GC_REASON_2, I think that is reason 15, which according to telemetry is 0.0207% of GCs (well, that includes both kinds of forced GCs).
(In reply to Andrew McCreight [:mccr8] from comment #8)
> Ah, okay, that's a slightly different thing than what is tracked by
> CYCLE_COLLECTOR_NEED_GC, so UnmarkGray isn't involved here. :)  Looking at
> GC_REASON_2, I think that is reason 15, which according to telemetry is
> 0.0207% of GCs (well, that includes both kinds of forced GCs).

I'm not sure what you mean. Since the GC in Benoit's profile had only one slice, I think it had to be caused by the UnmarkGray thing.
(In reply to Bill McCloskey (:billm) from comment #9)
> I'm not sure what you mean. Since the GC in Benoit's profile had only one
> slice, I think it had to be caused by the UnmarkGray thing.

I was referring to the hang detector stack data from comment 6, which seems more troubling than a pause from a single profile.

Vladan says the chrome hang detector triggers on hangs of 5 seconds or more.

I guess I could imagine that these anti-lockout GCs are going to happen more often when the system is overloaded already, so we're falling behind, resulting in long pauses, compared to Transplant GCs which probably aren't going to be more frequent when the system is loaded, so we won't get one that are particularly long.
Could we fix unmarkgray? When we're running out of stack space, store the currently-being-unmarked
js objects to an array and unwind. Then restart unmarking from those objects.
I think we should split this into two bugs, for the two kinds of GC activity the CC can force, as this is getting a little confusing.

1. Due to unmarkGray overflow, the cycle collector can force a full GC in nsCycleCollector.cpp. This appears to be what BenWa was seeing in his profile, because that's slice 0 of a GC. http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsCycleCollector.cpp#2702

2. If the CC timer fires when the CC is locked out, then the CC forces the GC to finish its current GC, so it can work. This is apparently one of the top sources of 5+ second hangs in the browser, based on Vladan's data.

http://mxr.mozilla.org/mozilla-central/source/dom/base/nsJSEnvironment.cpp#3252

(There's also another place in CycleCollectNow that is similar to 2: http://mxr.mozilla.org/mozilla-central/source/dom/base/nsJSEnvironment.cpp#3056 )

These are two completely separate problems, so we should have one bug for each. Since this bug has mostly been about problem (1), I'll leave this as that, and file a new one for problem (2).
Filed bug 810127 for problem 2.
Summary: Don't force a Full (non-incremental) GC before CC → Make nsCycleCollector::GCIfNeeded incremental somehow
Anyways, the "right" solution to UnmarkGray overflow is probably making it use the GC's marking path, though I seem to recall that would be annoying for some reason.
I think an easy way to fix this would be to cancel any upcoming CC and schedule a full, incremental GC when we overflow the stack. We could keep a counter to ensure that we only do this once or something.
Severity: normal → S3
Component: XPCOM → Cycle Collector
You need to log in before you can comment on or make changes to this bug.