Closed Bug 1764251 Opened 2 years ago Closed 2 years ago

Avoid late creation of content processes

Categories

(Core :: DOM: Content Processes, task)

task

Tracking

()

RESOLVED FIXED
101 Branch
Tracking Status
firefox101 --- fixed

People

(Reporter: jstutte, Assigned: jstutte)

References

Details

Attachments

(3 files, 1 obsolete file)

We have a series of issues caused by:

  • the late creation of content processes (see bug 1632740)
  • inconsistencies in our shutdown blocker handling for launching processes (see bug 1762299)

In the time span between creating a new process and receiving the ok from the created process we might have entered shutdown in the meantime. We want to:

  • add the shutdown blockers early enough to avoid races
  • be more correct about until when we allow the shutdown blocker creation
Assignee: nobody → jstutte
Status: NEW → ASSIGNED
Blocks: 1632740
Blocks: 1762299
See Also: → 1764119, 1696771, 1761182
Blocks: 1764181
Attachment #9271854 - Attachment is obsolete: true

(In reply to Jens Stutte [:jstutte] from comment #4)

Bug 1764251: Avoid race between process creation callback and application shutdown. r?smaug

My strategy here would be to land first this patch (and the base patch) to see, what it helps with. My assumption is that this could already reduce the noise from late-launching processes significantly.

Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/62a79239983a
Ensure that ContentParent::BlockShutdown reacts well on different IPC channel states. r=smaug
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 101 Branch
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
See Also: → 1742677
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/95cee3ef708c
Avoid race between process creation callback and application shutdown. r=smaug

So actually the change that caused this was from https://phabricator.services.mozilla.com/D143710 adding the ShutDownProcess(CLOSE_CHANNEL); being called on the main thread, but this patch made it much more likely that this code path is actually ever hit, because we can get here during process launch now. It also shows that the case is not just theoretical...

I think that ContentParent::ShutDownProcess should be hardened against being called from the wrong thread such that it dispatches the CLOSE_CHANNEL to the right thread here? Any other suggestions?

Edit: Actually the mWorker the assertion is talking about is private to the MessageChannel, such that we cannot really access it for direct dispatch. We would probably need a CloseFromAnyThread()variant on MessageChannel itself ?

Flags: needinfo?(jstutte) → needinfo?(bugs)

(In reply to Jens Stutte [:jstutte] from comment #10)

Edit: Actually the mWorker the assertion is talking about is private to the MessageChannel, such that we cannot really access it for direct dispatch. We would probably need a CloseFromAnyThread()variant on MessageChannel itself ?

This is probably more a question to :nika.

Flags: needinfo?(nika)

I don't quite understand the assertion failure. We've called Close(CLOSE_CHANNEL) forever...
oh, ActorDestroy sets mCalledClose = true;
This setup is so fragile.

Flags: needinfo?(bugs)

(In reply to Jens Stutte [:jstutte] from comment #10)

I think that ContentParent::ShutDownProcess should be hardened against being called from the wrong thread such that it dispatches the CLOSE_CHANNEL to the right thread here? Any other suggestions?

All ContentParent instances are fundamentally bound to the main thread, and all methods on it should be assumed to be main thread only unless otherwise noted.

Edit: Actually the mWorker the assertion is talking about is private to the MessageChannel, such that we cannot really access it for direct dispatch. We would probably need a CloseFromAnyThread()variant on MessageChannel itself ?

All actors are fundamentally thread-unsafe so any use of them from a thread other than the thread they were bound to is incorrect. Adding a CloseFromAnyThread() wouldn't make sense, because you can't actually close a MessageChannel from any thread.

The fact that MessageChannel internally contains locks is mostly an implementation detail. Unless you're part of the implementation you shouldn't be holding a reference to one from a thread other than the bound thread.

The error here is actually because you're calling Close() on a channel which was never actually opened, because process launch never finished. You'll need to make the code able to handle failing before being opened properly, because you can't rely on things like ActorDestroy being called if the actor is never actually opened.

Flags: needinfo?(nika)

OK, it makes sense I cannot destroy something that never fully initialized... And looking a bit closer I discovered, that we already have mLifecycleState that can tell us, in which state we actually are.

My guess how to solve this would then be:

  • If we cannot send in BlockShutdown, check if we are IsLaunching(). In case directly move to state LifecycleState::DEAD.
    • This will make us bail out with ShutDownProcess(SEND_SHUTDOWN_MESSAGE); immediately in LaunchSubprocessResolve...
    • ...and kick off all our normal cleanup, including removal of shutdown blockers.
  • If instead we are already dead in BlockShutdown, just assert and let go (assuming that we already kicked off the shutdown sequence), as I would not expect this to ever happen.

I'll prepare an additional patch for this. - Edit: Maybe I can just incorporate this in the backed out patch.

(In reply to Jens Stutte [:jstutte] from comment #14)

  • If instead we are already dead in BlockShutdown, just assert and let go (assuming that we already kicked off the shutdown sequence), as I would not expect this to ever happen.

Thinking a bit more about this case, I think we should just ignore it, assuming that we race with an already ongoing shutdown of the child.

See Also: → 1765732
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/10519640b4b3
Avoid race between process creation callback and application shutdown. r=smaug
Regressions: 1765445
No longer regressions: 1765445
Regressions: 1765822
Regressions: 1766016
Regressions: 1765956
Keywords: leave-open
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ac88a47b070f
Substitute sCanLaunchSubprocesses with AppShutdown::IsInOrBeyond and add shutdown checks to BeginSubprocessLaunch and ContentProcessManager singleton creation. r=smaug,jesup
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
See Also: → 1766572
See Also: → 1845778
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: