Closed Bug 1648776 Opened 4 years ago Closed 3 years ago

HelperThreadPool should be able to process the tasks in shutdown

Categories

(Core :: XPCOM, task)

task

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox79 --- affected

People

(Reporter: allstars.chh, Unassigned)

References

Details

In ShutdownXPCOM, the shutdown sequence is:

  1. nsThreadManager::Shutdown 1
  2. nsCycleCollector_shutdown 2
  3. JS_Shutdown 3

and HelperThreadPool is shutdown before nsThreadManager::Shutdown 4 5, otherwise it won't have any thread to process the RunnableTask. 6

The problem is during CC shutdown (which will trigger GC) and JS shutdown, there will many off thread tasks needed to be executed, for example, GCParallelTask, and because HelperThread is already shutdown, those off thread tasks won't be handled.

So we'd like to have the HelperThreadPool could have some Threads to process the RunnableTask even after nsThreadManager::Shutdown (which also implies HelperThreadPool should shutdown after JS_Shutdown)

Kris has mentioned some 2-stage shutdown in https://bugzilla.mozilla.org/show_bug.cgi?id=1632825#c3

Could there be some way to allow only certain specific tasks to run by the time we get to the CC? If those tasks create anything cycle collected, they could cause intermittent leaks, which are a huge pain to figure out.

I fixed a similar issue in bug 1637890.

The flow was as follow:

mozilla::KillClearOnShutdown(ShutdownPhase::ShutdownThreads);
// Destroy nsThreadManager
mozilla::KillClearOnShutdown(ShutdownPhase::ShutdownFinal);
mozilla::dom::JSExecutionManager::Shutdown();
mozilla::AppShutdown::MaybeFastShutdown(mozilla::ShutdownPhase::ShutdownFinal); <-- App may exit here on some platforms
mozilla::services::Shutdown();
mozilla::KillClearOnShutdown(ShutdownPhase::ShutdownPostLastCycleCollection);
mozilla::AppShutdown::MaybeFastShutdown(mozilla::ShutdownPhase::ShutdownPostLastCycleCollection);

Quite a bit of code used ClearOnShutdown with default parameter (which is for ShutdownFinal) ; to clean up after themselves; which itself could cause tasks to be dispatched (such as if there are any open IPC communications).

Those tasks would not be run, and instead made them to leak (to avoid destroying them on the wrong thread).
To top it all, ClearOnShutdown would run the observers in the order they registered themselves. So the first component ever registered would be the first to be deallocated.

I modified things so that the thread manager isn't destroyed until ShutdownFinal has fully completed; to give a one last chance for all dispatch tasks to run, so it now looks like this:

mozilla::KillClearOnShutdown(ShutdownPhase::ShutdownThreads);
mozilla::KillClearOnShutdown(ShutdownPhase::ShutdownFinal);
mozilla::dom::JSExecutionManager::Shutdown();
mozilla::AppShutdown::MaybeFastShutdown(mozilla::ShutdownPhase::ShutdownFinal); <-- App may exit here on some platforms
// Destroy nsThreadManager and main thread
mozilla::services::Shutdown();
mozilla::KillClearOnShutdown(ShutdownPhase::ShutdownPostLastCycleCollection);
mozilla::AppShutdown::MaybeFastShutdown(mozilla::ShutdownPhase::ShutdownPostLastCycleCollection);

We still have the services shutdown happening after that; and that means many nsISupports object being destroyed after xpcom has shutdown (this is blocking me with bug 1648031). Those same objects were initialized
One core issue is that there's code calling GetCurrentSerialEventTarget() to know where to dispatch themselves; this method will return null after xpcom has shutdown ; there are no more running nsThread ; all we have left is a single PRThread

For the CC tasks; they are hopefully only running on the main thread so we could get away with that.

In all; it's a mess.

Ideally we would want to split the nsThreadManager from xpcom and be shutdown last. It doesn't need to be linked to xpcom it doesn't bear much relation to it.

I started this in https://phabricator.services.mozilla.com/D81084 ; where the nsThreadManager is initialised lazily (it is first used by the profiler; almost right at the start).
Shutdown is still happening within xpcom shutdown.

In all, this bug may be a dupe of bug 1648031 ; because as it comes now, the only way to fix bug 1648031 is to have the nsThreadManager exist from very early on, to the very last time.

See Also: → 1648031

(In reply to Andrew McCreight [:mccr8] from comment #1)

Could there be some way to allow only certain specific tasks to run by the time we get to the CC? If those tasks create anything cycle collected, they could cause intermittent leaks, which are a huge pain to figure out.

Leaking those on shutdown isn't too much of a worry.
And I think in some test conditions already, the AppShutdown::MaybeFastShutdown() will exit early.

And as I understand it, to save cost on the try run, the plan is to kill the binary even earlier.

(In reply to Jean-Yves Avenard [:jya] from comment #3)

Leaking those on shutdown isn't too much of a worry.
And I think in some test conditions already, the AppShutdown::MaybeFastShutdown() will exit early.

My concern is debug builds where we do leak checking, not regular browser behavior.

It doesn't seem like there's any one perfect solution to fix the shutdown ordering issues. Do we know just how often the thread pool tries to service tasks during shutdown? If it's not often would it be possible to perform shutdown work synchronously? I don't know a whole lot about the GC parallel jobs themselves, and looking at them it's not super clear to me how often they trigger during shutdown or if they could be done synchronously when the thread pool isn't available.

I agree with :jya that nsThreadManager could ultimately be shut down last. It could solve a few problems if we held off on shutting down threads as long as possible.

(In reply to Kris Wright :KrisWright from comment #5)

It doesn't seem like there's any one perfect solution to fix the shutdown ordering issues. Do we know just how often the thread pool tries to service tasks during shutdown? If it's not often would it be possible to perform shutdown work synchronously? I don't know a whole lot about the GC parallel jobs themselves, and looking at them it's not super clear to me how often they trigger during shutdown or if they could be done synchronously when the thread pool isn't available.

Most of the tasks dispatched during CC were related to the dom/media code using StateMirror to synchronise main thread attributes with off-mainthread mirrors. In the CC code, the attribute value is reset or clear ; which automatically dispatch a synchronisation task.

There was also some others closing their attached MessageChannel causing an IPC task to be dispatched.

I'm currently having a look at reworking the tail dispatcher in bug 1645785, could fix that as we go; like checking if we're in shutdown and give up early. AbstractThread::XPCOMThreadWrapper does that already.

(In reply to Andrew McCreight [:mccr8] from comment #4)

(In reply to Jean-Yves Avenard [:jya] from comment #3)

Leaking those on shutdown isn't too much of a worry.
And I think in some test conditions already, the AppShutdown::MaybeFastShutdown() will exit early.

My concern is debug builds where we do leak checking, not regular browser behavior.

That's what I was covering, under tests, MaybeFastShutdown will exit early, before the final CC ; so leak there won't be reported.

(In reply to Jean-Yves Avenard [:jya] from comment #7)

That's what I was covering, under tests, MaybeFastShutdown will exit early, before the final CC ; so leak there won't be reported.

Leak checking works by recording a log of the live objects that are late in shutdown, after the final CC, so exiting before the final CC will essentially disable leak checking, and the test harness will produce an error. It looks like fast shutdown is disabled in debug builds.

(In reply to Kris Wright :KrisWright from comment #5)

Do we know just how often the thread pool tries to service tasks during shutdown? If it's not often would it be possible to perform shutdown work synchronously? I don't know a whole lot about the GC parallel jobs themselves, and looking at them it's not super clear to me how often they trigger during shutdown

Yes, GC parallel tasks are used in shutdown GCs (and in all GCs). We've done a lot of work to make use of parallel threads where possible so I'd rather not do something that goes in the other direction and would increase the time taken by shutdown GCs.

(In reply to Andrew McCreight [:mccr8] from comment #1)

Could there be some way to allow only certain specific tasks to run by the time we get to the CC? If those tasks create anything cycle collected, they could cause intermittent leaks, which are a huge pain to figure out.

That sounds bad. Is there some way we can assert if we end up suspecting a cycle collected objects after the CC has been shut down?

We're trying to dispatch to TaskController now

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.