Open Bug 1966165 Opened 1 year ago Updated 1 year ago

Check if the CCGCScheduler could have some more shutdown checks

Categories

(Core :: DOM: Core & HTML, task)

task

Tracking

()

People

(Reporter: jstutte, Unassigned)

References

(Blocks 1 open bug)

Details

We probably do not need to GC/CC during shutdown in content processes. The scheduler could avoid to schedule new IdleTaskRunner instances once we entered shutdown and/or do a shutdown check in its payload before running it.

(The component might be wrong, maybe it is just DOM Core ?)

Either way is probably okay.

Component: Cycle Collector → DOM: Core & HTML

There is a non-trivial amount of ~5% of shutdown kills in bug 1279293 happening while things run by IdleTaskRunner are running which seem to all be CC/GC things.

It looks like we already call sScheduler->Shutdown(); on "content-child-will-shutdown".

(In reply to Andrew McCreight [:mccr8] from comment #4)

It looks like we already call sScheduler->Shutdown(); on "content-child-will-shutdown".

Yeah, I was looking at that, too. And KillAllTimersAndRunners is supposed to cancel also already scheduled IdleTaskRunner. That could mean that some IdleTaskRunner did already fire when we entered shutdown and it is that very same run that blocks the main thread, preventing us from getting "content-child-will-shutdown" ? Not sure if we can do much more then (if we do not want to add some AppShutdown::IsShutdownImpending checks inside some loop to bail out).

It feels a bit surprising that we would see +20s GC/CC, but maybe we want to traverse memory that has been swapped into the pagefile? FWIW, at least in one of the Windows reports I see from the annotations that on a machine with 16GB RAM and a ~46GB page file only we have only 10GB page file and 1,5GB RAM left, which sounds like a bad situation...

Summary: The CCGCScheduler could have some more shutdown checks → Check if the CCGCScheduler could have some more shutdown checks

In terms of shutdown checks things are probably fine as is. I wonder though, if there should be more checks for the given CC/GC budget somewhere inside nested loops, it feels wrong in general that we can end up with such long times? Obviously on already completely clogged machines there is not much we can do, though...

(In reply to Jens Stutte [:jstutte] from comment #7)

In terms of shutdown checks things are probably fine as is. I wonder though, if there should be more checks for the given CC/GC budget somewhere inside nested loops, it feels wrong in general that we can end up with such long times? Obviously on already completely clogged machines there is not much we can do, though...

Summarizing from a chat with :smaug:

We cannot have budget checks everywhere as those aren't cheap enough for timestamp overhead. It might be easier (less invasive) and cheaper in terms of computational effort to add some AppShutdown::IsShutdownImpending here and there. That atomic flag is set on the IPC thread in content processes and thus is observable earliest possible.
In alternative, rather than avoiding the hang we might want to find ways to detect and kill earlier content processes that hang in such way, if that is ever feasible.

We do have frequent budget checks. They don't use timestamps. Not all GC/CCs are incremental, however.

At some point there was work on making GCs interruptible via something like an atomic flag, but it was causing too many soundness issues, IIRC, so I don't think we'd want to bring it back just for shutdown.

In alternative, rather than avoiding the hang we might want to find ways to detect and kill earlier content processes that hang in such way, if that is ever feasible.

That's basically what these crashes already are.

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #10)

I think we do have the interruptible thingie for GC
https://searchfox.org/mozilla-central/rev/3d294b119bf2add880f615a0fc61a5d54bcd6264/dom/base/CCGCScheduler.cpp#582,590

Interesting, though I wonder if a relaxed atomic flip from a different thread will immediately be observable by the main thread?

You need to log in before you can comment on or make changes to this bug.