Closed Bug 1616018 Opened 4 years ago Closed 3 years ago

Avoid running the garbage collector in content processes that are shutting down

Categories

(Core :: DOM: Content Processes, enhancement, P2)

enhancement

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: gsvelto, Unassigned)

References

(Blocks 2 open bugs)

Details

Crash Data

While looking at various content child shutdown kills in bug 1279293 I noticed that quite a few of them were running garbage-collection cycles. Since content process shutdown has to be as fast as possible I was wondering if it's possible to reduce the number of GCs.

Aborting current iterations via JS::AbortIncrementalGC() might also be a good idea.

Content process shutdown can be detected by listening to the "content-child-will-shutdown" notification.

Could you give some examples of these crashes? I'm curious what kinds of GCs are being run.

Flags: needinfo?(gsvelto)

So I guess https://searchfox.org/mozilla-central/rev/df94cd5ba431234bc220ac081def0801fe44b89e/dom/base/nsJSEnvironment.cpp#2663
could observe also content-child-will-shutdown and abort incremental GC in https://searchfox.org/mozilla-central/rev/df94cd5ba431234bc220ac081def0801fe44b89e/dom/base/nsJSEnvironment.cpp#382
(Though I don't quite understand why, since I assume there can be other GCs during shutdown, and they can be slow if there is lots of garbage to collect.)

Yeah, I'm wondering how many of these GCs are long running GCs that happen to be in progress (or maybe somebody closes a tab that is hanging because of a GC) compared to how many might be happening because we do some weird GC during shutdown for some reason. There's a lot of the latter, but I don't know how much of an issue they cause.

Here's two signatures where we're doing a GC during shutdown:

Generally speaking I was just looking for ways to minimize GC activity during shutdown knowing that we're going to kill the process anyway if shutdown takes too long.

It could also help with responsiveness on machines that don't have much memory: over 20% of the shutdown kill reports have less than 512MiB of free physical memory so the process that's being shut down might have been partially paged out. The less memory we touch in that process the less likely we page-in stuff that we know we won't use anymore (and possibly page-out stuff that we still need in other processes).

Flags: needinfo?(gsvelto)

The stacks for the first signature look like regular GC work happening off an idle callback. The second one looks interesting because we're waiting for the GC helper threads to come back. I think we've had issues with that taking a while in the past.

It looks like Jon added the mechanism to interrupt GCs in bug 1149752. He might have some ideas about how to improve this.

Flags: needinfo?(jcoppeard)

Should this bug move to the JavaScript: GC component? Does the GC know when a process is shutting down? Or is this bug about adding such a notification?

Priority P2 because IPCError_ShutDownKill meta bug is P2.

Priority: -- → P2

Maybe we could make nsJSEnvironment listen for content-child-will-shutdown and kill the GC timers? I think that will keep us from running the idle task stuff. Maybe we could even stop running any idle tasks once we get that. If we're getting ready to shut down, arguably we're never really idle!

Depends on: 1619455
See Also: → 1619213
Crash Signature: [@ IPCError-browser | ShutDownKill | js::GCMarker::processMarkStackTop ] [@ IPCError-browser | ShutDownKill | js::GCParallelTask::joinRunningOrFinishedTask ] [@ IPCError-browser | ShutDownKill | js::BaseScript::traceChildren ]
Depends on: 1620623
Crash Signature: [@ IPCError-browser | ShutDownKill | js::GCMarker::processMarkStackTop ] [@ IPCError-browser | ShutDownKill | js::GCParallelTask::joinRunningOrFinishedTask ] [@ IPCError-browser | ShutDownKill | js::BaseScript::traceChildren ] → [@ IPCError-browser | ShutDownKill | js::GCMarker::processMarkStackTop] [@ IPCError-browser | ShutDownKill | js::GCParallelTask::joinRunningOrFinishedTask] [@ IPCError-browser | ShutDownKill | js::BaseScript::traceChildren] [@ IPCError-browser | ShutDo…

CCGraphBuilder::NoteJSChild is the cycle collector, not the garbage collector. I'm not sure if we want to bundle them together.

Crash Signature: [@ IPCError-browser | ShutDownKill | js::GCMarker::processMarkStackTop] [@ IPCError-browser | ShutDownKill | js::GCParallelTask::joinRunningOrFinishedTask] [@ IPCError-browser | ShutDownKill | js::BaseScript::traceChildren] [@ IPCError-browser | ShutDo… → [@ IPCError-browser | ShutDownKill | js::GCMarker::processMarkStackTop] [@ IPCError-browser | ShutDownKill | js::GCParallelTask::joinRunningOrFinishedTask] [@ IPCError-browser | ShutDownKill | js::BaseScript::traceChildren]

Sorry, I assumed the two would always run together.

On the main thread, running one will typically schedule running the other, so they are linked, but they don't usually run one right after the other. (On workers, they do run together.) The changes I made in nsJSEnvironment to not schedule stuff when we're shutting down will affect both GCs and CCs, but other work, like interrupting a GC in progress, will be specific to one or the other.

Added another signature, maybe it would be worth duplicating bug 1619213 against this because it also seems to track GC-related operations during shutdown.

Crash Signature: [@ IPCError-browser | ShutDownKill | js::GCMarker::processMarkStackTop] [@ IPCError-browser | ShutDownKill | js::GCParallelTask::joinRunningOrFinishedTask] [@ IPCError-browser | ShutDownKill | js::BaseScript::traceChildren] → [@ IPCError-browser | ShutDownKill | js::GCMarker::eagerlyMarkChildren] [@ IPCError-browser | ShutDownKill | js::GCMarker::processMarkStackTop] [@ IPCError-browser | ShutDownKill | js::GCParallelTask::joinRunningOrFinishedTask] [@ IPCError-browser | Sh…

Yes, it looks like the same thing (though the recent crashes with signatures in that bug didn't look GC-related...)

Crash Signature: ShutDownKill | js::BaseScript::traceChildren] → ShutDownKill | js::BaseScript::traceChildren] [@ IPCError-browser | ShutDownKill | js::GCMarker::processMarkStackTop]

Added a signature.

Crash Signature: [@ IPCError-browser | ShutDownKill | js::GCMarker::eagerlyMarkChildren] [@ IPCError-browser | ShutDownKill | js::GCMarker::processMarkStackTop] [@ IPCError-browser | ShutDownKill | js::GCParallelTask::joinRunningOrFinishedTask] [@ IPCError-browser | Sh… → [@ IPCError-browser | ShutDownKill | js::BaseScript::traceChildren] [@ IPCError-browser | ShutDownKill | js::GCMarker::eagerlyMarkChildren] [@ IPCError-browser | ShutDownKill | js::GCMarker::processMarkStackTop] [@ IPCError-browser | ShutDownKill | js:…

Added another signature.

Crash Signature: js::GCParallelTask::joinRunningOrFinishedTask] [@ IPCError-browser | ShutDownKill | js::TraceManuallyBarrieredGenericPointerEdge] → js::GCParallelTask::joinRunningOrFinishedTask] [@ IPCError-browser | ShutDownKill | js::GlobalHelperThreadState::wait] [@ IPCError-browser | ShutDownKill | js::TraceManuallyBarrieredGenericPointerEdge]

Did this get sufficiently better since Bug 1619455 and Bug 1620623 were resolved? Or is it worth while raising a flag with the JS engine itself to tell it not to bother GCing? Since it has internal triggers based on how much has been allocated.

Flags: needinfo?(gsvelto)

I looked at some of the residual crashes with these signatures before, and if I recall correctly there were no IPC shutdown state annotations, so I think the content process was too busy to even notice that it was told to shut down (which would happen when we run ContentChild::RecvShutdown()).

For instance, in this one I checked just now it looks like it is busy running a GC in ContentChild::RecvFlushMemory bp-f6486beb-bcd1-42aa-8b56-2faeb0210101

I second Andrew's comment, the vast majority of the remaining crashes have no IPCShutdownState annotation so the affected content process didn't even notice it was being shutdown. I think we can mark it as resolved, the remaining volume is very small anyway.

Flags: needinfo?(gsvelto)
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(jcoppeard)
Resolution: --- → WORKSFORME

I could imagine a world where we use the background hang monitor to tell a child process to shut down even when the main thread is hanging, by interrupting the GC or JS or something, but that's a bunch of different work I think.

You need to log in before you can comment on or make changes to this bug.