810022 - Make nsCycleCollector::GCIfNeeded incremental somehow

Reporter

Description

•

12 years ago

Here's a profile:
http://people.mozilla.com/~bgirard/cleopatra/#report=a3fd87e4be28d74eef47c24de494c83988e7b700

This is taken on a profile with about 2 minutes of age and less then 5 tabs after running a JS heavy add-on+page. I was testing the profiler add-on on a fresh profile to be specific.

In the profile we spend about 100ms doing the full non-incremental GC, then 17 ms waiting on the CC CondVar.

Is it possible to run an IGC and run CC after instead?

Benoit Girard (:BenWa)

Reporter

Updated

•

12 years ago

Whiteboard: [Snappy], sps

Andrew McCreight [:mccr8]

Comment 1

•

12 years ago

We should look at telemetry to see how common this is (CYCLE_COLLECTOR_NEED_GC).

Assignee: general → nobody

Component: JavaScript Engine → XPCOM

Benoit Girard (:BenWa)

Reporter

Comment 2

•

12 years ago

Looks like about 0.001. I'm guessing that's 0.1%? Vladan's got some interesting chromehang data he's compiling regarding this.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 3

•

12 years ago

When I look at telemetry, it says CYCLE_COLLECTOR_NEED_GC is 0.0463%. If we do a CC every 10 seconds, then that means they happen every 6 hours on average. Of course, they might not be evenly distributed, which could make it a worse problem.

It probably wouldn't be hard to make an effort to schedule an IGC when UnmarkGray runs out of stack. We'd just have to ensure that we eventually make progress and run the cycle collector.

Andrew McCreight [:mccr8]

Comment 4

•

12 years ago

For completeness, what happens is that the GC marks objects that are reachable only from XPConnect roots gray. The CC only looks at gray objects. If a gray object is handed out to C++ or JS, then we don't know for sure that it will remain only reachable from XPConnect roots, so we mark it, and any gray object reachable from it, black.

This black marking process, known as UnmarkGray, is implemented recursively, so it can run out of stack space and gives up.  If it does, then we can't trust the gray bits any more, and need to run the GC again to reestablish the bits.

Andrew McCreight [:mccr8]

Comment 5

•

12 years ago

By contrast, TRANSPLANT GCs, which are being addressed in bug 803376, are 0.318% of all GCs.  Bill described TRANSPLANT GCs as being around 75% of the worst GC pauses. So these are about 10 times rarer.

Vladan Djeric (:vladan)

Comment 6

•

12 years ago

GC invoked by CC is the most common main-thread jank involving GCs over the past 3 months, according to chrome-hang data:

JSScript::markChildren(JSTracer *) (in mozjs.dll)
 -> js::gc::MarkUnbarriered<JSScript> (in mozjs.dll)
 -> fun_trace (in mozjs.dll)
 -> js::GCMarker::processMarkStackTop(js::SliceBudget &) (in mozjs.dll)
 -> js::GCMarker::drainMarkStack(js::SliceBudget &) (in mozjs.dll)
 -> GCCycle (in mozjs.dll)
 -> Collect (in mozjs.dll)
 -> js::GCFinalSlice(JSRuntime *,js::JSGCInvocationKind,js::gcreason::Reason) (in mozjs.dll)
 -> CCTimerFired (in xul.dll)

It's also the 12th most common chrome hang signature after 1) plugin hangs, 2) bug 790370, and 3) reflows.

Benoit Girard (:BenWa)

Reporter

Comment 7

•

12 years ago

I've noticed TRANSPLANT GCs in my profile as well. I can't say which is worse but the chromehang data above suggests that they're probably both worth solving.

Andrew McCreight [:mccr8]

Comment 8

•

12 years ago

Ah, okay, that's a slightly different thing than what is tracked by CYCLE_COLLECTOR_NEED_GC, so UnmarkGray isn't involved here. :)  Looking at GC_REASON_2, I think that is reason 15, which according to telemetry is 0.0207% of GCs (well, that includes both kinds of forced GCs).

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 9

•

12 years ago

(In reply to Andrew McCreight [:mccr8] from comment #8)
> Ah, okay, that's a slightly different thing than what is tracked by
> CYCLE_COLLECTOR_NEED_GC, so UnmarkGray isn't involved here. :)  Looking at
> GC_REASON_2, I think that is reason 15, which according to telemetry is
> 0.0207% of GCs (well, that includes both kinds of forced GCs).

I'm not sure what you mean. Since the GC in Benoit's profile had only one slice, I think it had to be caused by the UnmarkGray thing.

Andrew McCreight [:mccr8]

Comment 10

•

12 years ago

(In reply to Bill McCloskey (:billm) from comment #9)
> I'm not sure what you mean. Since the GC in Benoit's profile had only one
> slice, I think it had to be caused by the UnmarkGray thing.

I was referring to the hang detector stack data from comment 6, which seems more troubling than a pause from a single profile.

Vladan says the chrome hang detector triggers on hangs of 5 seconds or more.

I guess I could imagine that these anti-lockout GCs are going to happen more often when the system is overloaded already, so we're falling behind, resulting in long pauses, compared to Transplant GCs which probably aren't going to be more frequent when the system is loaded, so we won't get one that are particularly long.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 11

•

12 years ago

Could we fix unmarkgray? When we're running out of stack space, store the currently-being-unmarked
js objects to an array and unwind. Then restart unmarking from those objects.

Andrew McCreight [:mccr8]

Comment 12

•

12 years ago

I think we should split this into two bugs, for the two kinds of GC activity the CC can force, as this is getting a little confusing.

1. Due to unmarkGray overflow, the cycle collector can force a full GC in nsCycleCollector.cpp. This appears to be what BenWa was seeing in his profile, because that's slice 0 of a GC. http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsCycleCollector.cpp#2702

2. If the CC timer fires when the CC is locked out, then the CC forces the GC to finish its current GC, so it can work. This is apparently one of the top sources of 5+ second hangs in the browser, based on Vladan's data.

http://mxr.mozilla.org/mozilla-central/source/dom/base/nsJSEnvironment.cpp#3252

(There's also another place in CycleCollectNow that is similar to 2: http://mxr.mozilla.org/mozilla-central/source/dom/base/nsJSEnvironment.cpp#3056 )

These are two completely separate problems, so we should have one bug for each. Since this bug has mostly been about problem (1), I'll leave this as that, and file a new one for problem (2).

Andrew McCreight [:mccr8]

Comment 13

•

12 years ago

Filed bug 810127 for problem 2.

Summary: Don't force a Full (non-incremental) GC before CC → Make nsCycleCollector::GCIfNeeded incremental somehow

Andrew McCreight [:mccr8]

Comment 14

•

12 years ago

Anyways, the "right" solution to UnmarkGray overflow is probably making it use the GC's marking path, though I seem to recall that would be annoying for some reason.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 15

•

12 years ago

I think an easy way to fix this would be to cancel any upcoming CC and schedule a full, incremental GC when we overflow the stack. We could keep a counter to ensure that we only do this once or something.

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Andrew McCreight [:mccr8]

Updated

•

1 year ago

Component: XPCOM → Cycle Collector

Bugzilla

Quick Search

Make nsCycleCollector::GCIfNeeded incremental somehow

Categories

(Core :: Cycle Collector, defect)

Tracking

()

People

(Reporter: BenWa, Unassigned)

References

Details

(Whiteboard: [Snappy], sps)

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Updated

Updated