Closed Bug 1764251 Opened 2 years ago Closed 2 years ago

Avoid late creation of content processes

Status:

RESOLVED FIXED

Milestone:

101 Branch

Tracking Flags:

Tracking

Status

firefox101

---

fixed

People

(Reporter: jstutte, Assigned: jstutte)

Attachments

(3 files, 1 obsolete file)

Bug 1764251: Avoid race between process creation callback and application shutdown. r?smaug (obsolete) 2 years ago Jens Stutte [:jstutte] 48 bytes, text/x-phabricator-request
Bug 1764251: Substitute sCanLaunchSubprocesses with AppShutdown::IsInOrBeyond and add shutdown checks to BeginSubprocessLaunch and ContentProcessManager singleton creation. r=smaug 2 years ago Jens Stutte [:jstutte] 48 bytes, text/x-phabricator-request
Bug 1764251: Ensure that ContentParent::BlockShutdown reacts well on different IPC channel states. r?smaug 2 years ago Jens Stutte [:jstutte] 48 bytes, text/x-phabricator-request
Bug 1764251: Avoid race between process creation callback and application shutdown. r?smaug 2 years ago Jens Stutte [:jstutte] 48 bytes, text/x-phabricator-request

Jens Stutte [:jstutte]

Assignee

Description

•

2 years ago

We have a series of issues caused by:

- the late creation of content processes (see bug 1632740)
- inconsistencies in our shutdown blocker handling for launching processes (see bug 1762299)

Jens Stutte [:jstutte]

Assignee

Comment 1

•

2 years ago

Attached file Bug 1764251: Avoid race between process creation callback and application shutdown. r?smaug (obsolete) — Details

Between creating a new process and receiving the OK from that process, we might have entered shutdown. We want to:

- add the shutdown blockers early enough to avoid races
- be more precise about how long we allow shutdown blockers to be created
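
As a rough illustration of the idea, here is a minimal, self-contained C++ sketch. It is not the Gecko code: `ShutdownModel`, `Launcher`, and all of their members are hypothetical names invented for this example. It only shows why registering the blocker before the asynchronous launch begins closes the window in which shutdown could slip past a half-launched process.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the application's shutdown state (not Gecko's API).
struct ShutdownModel {
  bool started = false;
  std::set<std::string> blockers;

  // A blocker may only be added while shutdown has not started yet.
  bool AddBlocker(const std::string& aName) {
    if (started) return false;
    blockers.insert(aName);
    return true;
  }
  void RemoveBlocker(const std::string& aName) { blockers.erase(aName); }

  // Shutdown can complete only once every blocker has been removed.
  bool CanFinishShutdown() const { return started && blockers.empty(); }
};

// Hypothetical launcher: it registers its blocker *before* the asynchronous
// launch begins, so a shutdown entered in the meantime has to wait for it.
class Launcher {
 public:
  Launcher(ShutdownModel& aShutdown, std::string aName)
      : mShutdown(aShutdown), mName(std::move(aName)) {}

  // Returns false if it is already too late to launch.
  bool BeginLaunch() { return mShutdown.AddBlocker(mName); }

  // Called when the child reports back. Returns true if the process may be
  // used, false if shutdown started meanwhile and it must be torn down.
  bool OnLaunchResolved() {
    if (mShutdown.started) {
      mShutdown.RemoveBlocker(mName);
      return false;
    }
    return true;
  }

 private:
  ShutdownModel& mShutdown;
  std::string mName;
};
```

In this model, a launch that began before shutdown keeps shutdown from finishing until its callback has run, and a launch requested after shutdown started is refused outright.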

Phabricator Automation

Updated

•

2 years ago

Assignee: nobody → jstutte

Status: NEW → ASSIGNED

Jens Stutte [:jstutte]

Assignee

Comment 2

•

2 years ago

Attached file Bug 1764251: Substitute sCanLaunchSubprocesses with AppShutdown::IsInOrBeyond and add shutdown checks to BeginSubprocessLaunch and ContentProcessManager singleton creation. r=smaug — Details
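
The patch title describes replacing a dedicated flag with a query against the shutdown phase. A minimal sketch of that pattern, assuming a simplified and hypothetical phase list (`Phase`, `ShutdownState`, and `CanBeginSubprocessLaunch` are invented for this example and are not the real Gecko types), might look like this:

```cpp
#include <cassert>

// Hypothetical, simplified phase list; the real enum has more entries.
enum class Phase { NotInShutdown = 0, Confirmed, Teardown, Final };

class ShutdownState {
 public:
  // Phases only ever move forward; a request to go backwards is ignored.
  void AdvanceTo(Phase aPhase) {
    if (aPhase > mCurrent) mCurrent = aPhase;
  }

  // True once the given phase has been reached or passed.
  bool IsInOrBeyond(Phase aPhase) const { return mCurrent >= aPhase; }

 private:
  Phase mCurrent = Phase::NotInShutdown;
};

// Launch gate that asks the single source of truth about shutdown progress
// instead of consulting a separately maintained boolean flag.
inline bool CanBeginSubprocessLaunch(const ShutdownState& aState) {
  return !aState.IsInOrBeyond(Phase::Confirmed);
}
```

The design point is that a separate flag can drift out of sync with the actual shutdown progress, while a phase query cannot.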

Jens Stutte [:jstutte]

Assignee

Updated

•

2 years ago

Blocks: 1632740

Jens Stutte [:jstutte]

Assignee

Updated

•

2 years ago

Blocks: 1762299

Jens Stutte [:jstutte]

Assignee

Updated

•

2 years ago

Updated

•

2 years ago

Blocks: 1764181

Phabricator Automation

Updated

•

2 years ago

Attachment #9271854 - Attachment is obsolete: true

Jens Stutte [:jstutte]

Assignee

Comment 3

•

2 years ago

Attached file Bug 1764251: Ensure that ContentParent::BlockShutdown reacts well on different IPC channel states. r?smaug — Details

Jens Stutte [:jstutte]

Assignee

Comment 4

•

2 years ago

Attached file Bug 1764251: Avoid race between process creation callback and application shutdown. r?smaug — Details

Depends on D143710

Jens Stutte [:jstutte]

Assignee

Comment 5

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #4)

Bug 1764251: Avoid race between process creation callback and application shutdown. r?smaug

My strategy here would be to land this patch (and the base patch) first to see what it helps with. My assumption is that this could already significantly reduce the noise from late-launching processes.

Pulsebot

Comment 6

•

2 years ago

Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/62a79239983a
Ensure that ContentParent::BlockShutdown reacts well on different IPC channel states. r=smaug

Sandor Molnar[:smolnar]

Comment 7

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/62a79239983a

Status: ASSIGNED → RESOLVED

Closed: 2 years ago

status-firefox101: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 101 Branch

Jens Stutte [:jstutte]

Assignee

Updated

•

2 years ago

Status: RESOLVED → REOPENED

Keywords: leave-open

Resolution: FIXED → ---

BugBot [:suhaib / :marco/ :calixte]

Updated

•

2 years ago

status-firefox101: fixed → affected

Jens Stutte [:jstutte]

Assignee

Updated

•

2 years ago

Comment 8

•

2 years ago

Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/95cee3ef708c
Avoid race between process creation callback and application shutdown. r=smaug

Atila Butkovits

Comment 9

•

2 years ago

Backed out for causing assertion failures at ipc/MessageChannel.h.

Backout link: https://hg.mozilla.org/integration/autoland/rev/d581fde398bab8a4e9c1de008134d8aeeef7ebd3

Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&selectedTaskRun=Jaq9VP5SRmSoa9Sqdr6gcA.0&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&searchStr=linux%2C18.04%2Cx64%2Cwebrender%2Casan%2Copt%2Cmochitests%2Cwith%2Csoftware%2Cwebrender%2Cwith%2Cfission%2Cenabled%2Ctest-linux1804-64-asan-qr%2Fopt-mochitest-browser-chrome-swr-fis-e10s%2Cbc10&revision=95cee3ef708c39d1ed725fb05766e03a38523278

Failure log: https://treeherder.mozilla.org/logviewer?job_id=375147155&repo=autoland&lineNumber=9406

Flags: needinfo?(jstutte)

Jens Stutte [:jstutte]

Assignee

Comment 10

•

2 years ago

•

Edited

So the change that actually caused this came from https://phabricator.services.mozilla.com/D143710, which added the ShutDownProcess(CLOSE_CHANNEL); call on the main thread. This patch, however, made it much more likely that this code path is ever hit, because we can now get here during process launch. It also shows that the case is not just theoretical...

I think that ContentParent::ShutDownProcess should be hardened against being called from the wrong thread such that it dispatches the CLOSE_CHANNEL to the right thread here? Any other suggestions?

Edit: Actually the mWorker the assertion is talking about is private to the MessageChannel, such that we cannot really access it for direct dispatch. We would probably need a CloseFromAnyThread() variant on MessageChannel itself?

Flags: needinfo?(jstutte) → needinfo?(bugs)

Jens Stutte [:jstutte]

Assignee

Comment 11

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #10)

Edit: Actually the mWorker the assertion is talking about is private to the MessageChannel, such that we cannot really access it for direct dispatch. We would probably need a CloseFromAnyThread() variant on MessageChannel itself?

This is probably more a question to :nika.

Flags: needinfo?(nika)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 12

•

2 years ago

I don't quite understand the assertion failure. We've called Close(CLOSE_CHANNEL) forever...
oh, ActorDestroy sets mCalledClose = true;
This setup is so fragile.

Flags: needinfo?(bugs)

Nika Layzell [:nika] (ni? for response)

Comment 13

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #10)

I think that ContentParent::ShutDownProcess should be hardened against being called from the wrong thread such that it dispatches the CLOSE_CHANNEL to the right thread here? Any other suggestions?

All ContentParent instances are fundamentally bound to the main thread, and all methods on it should be assumed to be main thread only unless otherwise noted.

Edit: Actually the mWorker the assertion is talking about is private to the MessageChannel, such that we cannot really access it for direct dispatch. We would probably need a CloseFromAnyThread() variant on MessageChannel itself?

All actors are fundamentally thread-unsafe so any use of them from a thread other than the thread they were bound to is incorrect. Adding a CloseFromAnyThread() wouldn't make sense, because you can't actually close a MessageChannel from any thread.

The fact that MessageChannel internally contains locks is mostly an implementation detail. Unless you're part of the implementation you shouldn't be holding a reference to one from a thread other than the bound thread.

The error here is actually because you're calling Close() on a channel which was never actually opened, because process launch never finished. You'll need to make the code able to handle failing before being opened properly, because you can't rely on things like ActorDestroy being called if the actor is never actually opened.
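
To make the last point concrete, here is a small self-contained C++ sketch of an owner that tolerates shutdown before its channel was ever opened. `ChannelModel` and `OwnerModel` are hypothetical names for this illustration only and do not mirror the real MessageChannel or ContentParent interfaces.

```cpp
#include <cassert>

// Hypothetical model of a channel that may never reach the open state.
class ChannelModel {
 public:
  enum class State { Unopened, Open, Closed };

  void Open() {
    if (mState == State::Unopened) mState = State::Open;
  }

  // Closes the channel only if it was opened. Returns true when a close
  // actually happened, i.e. when a destroy notification would follow.
  bool CloseIfOpen() {
    if (mState != State::Open) return false;
    mState = State::Closed;
    return true;
  }

 private:
  State mState = State::Unopened;
};

// Hypothetical owner of the channel, standing in for a parent-side actor.
class OwnerModel {
 public:
  void FinishLaunch() { mChannel.Open(); }

  void ShutDown() {
    if (mChannel.CloseIfOpen()) {
      // Normal path: closing an open channel triggers the destroy
      // notification, which performs the cleanup.
      OnActorDestroyed();
    } else {
      // Launch never finished (or we are already closed): no destroy
      // notification will arrive, so clean up directly.
      CleanUp();
    }
  }

  int CleanUpCount() const { return mCleanUpCount; }
  bool DestroyNotified() const { return mDestroyNotified; }

 private:
  void OnActorDestroyed() {
    mDestroyNotified = true;
    CleanUp();
  }

  void CleanUp() {
    // Idempotent: repeated shutdown requests clean up only once.
    if (mCleanedUp) return;
    mCleanedUp = true;
    ++mCleanUpCount;
  }

  ChannelModel mChannel;
  bool mCleanedUp = false;
  bool mDestroyNotified = false;
  int mCleanUpCount = 0;
};
```

The cleanup does not depend on the destroy notification, so an owner whose launch never completed still releases its resources exactly once.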

Flags: needinfo?(nika)

Jens Stutte [:jstutte]

Assignee

Comment 14

•

2 years ago

•

Edited

OK, it makes sense that I cannot destroy something that was never fully initialized... And looking a bit closer, I discovered that we already have mLifecycleState, which can tell us which state we are actually in.

My guess at how to solve this would then be:

- If we cannot send in BlockShutdown, check whether we are IsLaunching(). If so, move directly to state LifecycleState::DEAD.
  - This will make us bail out with ShutDownProcess(SEND_SHUTDOWN_MESSAGE); immediately in LaunchSubprocessResolve...
  - ...and kick off all our normal cleanup, including removal of shutdown blockers.
- If instead we are already dead in BlockShutdown, just assert and let go (assuming that we already kicked off the shutdown sequence), as I would not expect this to ever happen.
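
The plan above can be sketched as a small state machine. This is only a model under stated assumptions: `ParentModel`, its three-state `LifecycleState`, and `LaunchResolved` are simplified, hypothetical stand-ins, and the already-dead case simply returns here.

```cpp
#include <cassert>

// Simplified, hypothetical lifecycle; the real state set may differ.
enum class LifecycleState { LAUNCHING, ALIVE, DEAD };

class ParentModel {
 public:
  bool IsLaunching() const { return mState == LifecycleState::LAUNCHING; }
  bool IsDead() const { return mState == LifecycleState::DEAD; }
  bool HasBlocker() const { return mHasBlocker; }
  bool ShutdownMessageSent() const { return mShutdownMessageSent; }

  // Called when application shutdown wants this process gone.
  void BlockShutdown() {
    if (mState == LifecycleState::ALIVE) {
      // The channel is usable: ask the child to shut down.
      mShutdownMessageSent = true;
      return;
    }
    if (IsLaunching()) {
      // Cannot send yet: mark ourselves dead so that the launch callback
      // bails out and runs the normal cleanup.
      mState = LifecycleState::DEAD;
      return;
    }
    // Already dead: a shutdown of this child is under way, nothing to do.
  }

  // Launch callback. Returns false if the process must not be used.
  bool LaunchResolved() {
    if (IsDead()) {
      CleanUp();
      return false;
    }
    mState = LifecycleState::ALIVE;
    return true;
  }

 private:
  // Normal cleanup, including removal of the shutdown blocker.
  void CleanUp() { mHasBlocker = false; }

  LifecycleState mState = LifecycleState::LAUNCHING;
  bool mHasBlocker = true;  // assumed to be registered when launch began
  bool mShutdownMessageSent = false;
};
```

In this model, a shutdown request that arrives during launch never touches the unopened channel; the blocker is released later, from the launch callback.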

I'll prepare an additional patch for this. - Edit: Maybe I can just incorporate this in the backed out patch.

Jens Stutte [:jstutte]

Assignee

Comment 15

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #14)

If instead we are already dead in BlockShutdown, just assert and let go (assuming that we already kicked off the shutdown sequence), as I would not expect this to ever happen.

Thinking a bit more about this case, I think we should just ignore it, assuming that we race with an already ongoing shutdown of the child.

Jens Stutte [:jstutte]

Assignee

Updated

•

2 years ago

Comment 16

•

2 years ago

Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/10519640b4b3
Avoid race between process creation callback and application shutdown. r=smaug

Norisz Fay [:noriszfay]

Comment 17

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/10519640b4b3

Cosmin Sabou [:CosminS]

Updated

•

2 years ago

Regressions: 1765445

Cosmin Sabou [:CosminS]

Updated

•

2 years ago

No longer regressions: 1765445

Cosmin Sabou [:CosminS]

Updated

•

2 years ago

Regressions: 1765822

Cosmin Sabou [:CosminS]

Updated

•

2 years ago

Regressions: 1766016

Cosmin Sabou [:CosminS]

Updated

•

2 years ago

Regressions: 1765956

Jens Stutte [:jstutte]

Assignee

Updated

•

2 years ago

Keywords: leave-open

Pulsebot

Comment 18

•

2 years ago

Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ac88a47b070f
Substitute sCanLaunchSubprocesses with AppShutdown::IsInOrBeyond and add shutdown checks to BeginSubprocessLaunch and ContentProcessManager singleton creation. r=smaug,jesup

Cristian Tuns

Comment 19

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/ac88a47b070f

Status: REOPENED → RESOLVED

Closed: 2 years ago → 2 years ago

status-firefox101: affected → fixed

Resolution: --- → FIXED
