Open Bug 1845778 Opened 1 year ago Updated 2 months ago

ContentParent can spawn more processes than dom.ipc.processCount at shutdown

Status:

NEW

People

(Reporter: robwu, Unassigned)

Attachments

(2 files, 1 obsolete file)

- Bug 1845778 - Test case + debugging for further investigation (1 year ago, Rob Wu [:robwu], 48 bytes, text/x-phabricator-request)
- test_extension_process_alive.js-output.log (1 year ago, Rob Wu [:robwu], 41.72 KB, application/octet-stream)
- WIP: Bug 1845778 - Refactor point of no return sequence. (1 year ago, Jens Stutte [:jstutte], 48 bytes, text/x-phabricator-request)

Rob Wu [:robwu]

Reporter

Description

1 year ago

During my investigation of bug 1845352, I found that it is possible for the number of extension processes to become 2 despite dom.ipc.processCount.extension being at its default value of 1 (and dom.ipc.keepProcessesAlive.extension set to 0 or 1).

This can happen as follows:

1. An extension process is started (when an extension is loaded).
2. dom.ipc.keepProcessesAlive.extension=1
3. The "quit-application-granted" notification is triggered, which results in the invocation of ContentParent::BlockShutdown.
4. ContentParent::BlockShutdown calls SignalImpendingShutdownToContentJS().
5. (external) An attempt to open an extension page (a moz-extension: URL) happens. The desired process is looked up by ContentParent::GetUsedBrowserProcess.
6. Whatever process is selected is ignored because IsShuttingDown() is true due to step 4.
7. Because no process is found, a new process is created.
8. While step 7 features the comment "// Launch aborted because of shutdown. Bailout.", the actual implementation does not bail at this stage, and consequently there is an extra extension process.
   - ContentParent::BeginSubprocessLaunch seems to check ContentProcessManager::GetSingleton() to determine whether shutdown has commenced.
   - But ContentProcessManager::GetSingleton() only bails out past XPCOMShutdownFinal. This logic was added in bug 1764251.

I believe that ContentParent::GetNewOrUsedLaunchingBrowserProcess and/or ContentParent::BeginSubprocessLaunch should bail as soon as ShutdownPhase::AppShutdownConfirmed has been reached. And definitely not spawn another process.
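The proposed early bail-out can be illustrated with a minimal, self-contained sketch. This is not Gecko code: `LaunchGuard`, `TryLaunch` and the three-value `ShutdownPhase` enum are hypothetical stand-ins, and only the relative order of the phases is taken from the description above.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the real shutdown phases. Only the
// ordering matters here: AppShutdownConfirmed comes before XPCOMShutdownFinal.
enum class ShutdownPhase {
  NotInShutdown = 0,
  AppShutdownConfirmed,
  XPCOMShutdownFinal
};

// Hypothetical model of a launcher that refuses to spawn processes once the
// point of no return has been reached.
struct LaunchGuard {
  ShutdownPhase current = ShutdownPhase::NotInShutdown;
  int liveProcesses = 0;

  // True when the current phase is the given phase or a later one.
  bool IsInOrBeyond(ShutdownPhase phase) const { return current >= phase; }

  // Returns true if a process was launched, false if the launch was refused.
  bool TryLaunch() {
    if (IsInOrBeyond(ShutdownPhase::AppShutdownConfirmed)) {
      return false;  // Bail out early instead of waiting for a later phase.
    }
    ++liveProcesses;
    return true;
  }
};
```

With this guard, a launch before shutdown succeeds, while a launch attempted after `current` has been set to `AppShutdownConfirmed` is refused and `liveProcesses` stays unchanged. The guard only helps if the phase has actually been advanced by the time the launch is attempted.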

Rob Wu [:robwu]

Reporter

Updated

1 year ago

Comment 1

1 year ago

(In reply to Rob Wu [:robwu] from comment #0)

> I believe that ContentParent::GetNewOrUsedLaunchingBrowserProcess and/or ContentParent::BeginSubprocessLaunch should bail as soon as ShutdownPhase::AppShutdownConfirmed has been reached. And definitely not spawn another process.

It seems to me that we already have this check in ContentParent::GetNewOrUsedLaunchingBrowserProcess a few lines above, see bug 1811195. And I would hope that there is no way of calling things in parallel from different threads such that ShutdownPhase::AppShutdownConfirmed changes state in between. Are you able to reproduce this behavior and maybe record a Pernosco session?

Flags: needinfo?(rob)

Comment 2

1 year ago

Attached file Bug 1845778 - Test case + debugging for further investigation

Demonstrates bug 1845778: when "quit-application-granted" is triggered,
AppShutdown::IsInOrBeyond(ShutdownPhase::AppShutdownConfirmed) is
false.

The additional logging in ContentParent.cpp is not strictly required,
but does help with showing what's going on in more detail.

MOZ_LOG=Process:5,sync ./mach test toolkit/components/extensions/test/xpcshell/test_extension_process_alive.js --log-mach-verbose --verbose --setpref=toolkit.asyncshutdown.log=true

Depends on D184758

Rob Wu [:robwu]

Reporter

Comment 3

1 year ago

Attached file test_extension_process_alive.js-output.log

Output of test run from https://bugzilla.mozilla.org/show_bug.cgi?id=1845778#c2

To reproduce locally, apply https://phabricator.services.mozilla.com/D184758 and https://phabricator.services.mozilla.com/D185060, then run the command.

Tips to read the log:

- Highlight /Process
- Lines 136-141 show how the issue is triggered, i.e. "quit-application-granted" is triggered.
- Line 142 shows the "surprising" observation that !isInOrBeyondShutdownPhase(SHUTDOWN_PHASE_APPSHUTDOWNCONFIRMED)
- Lines 214-283 are the full log from the start of the specific test up to the test failure; the test failure is that the extension process is unexpectedly terminated.
- Line 253 is noteworthy; with the extra debug logging from the patch it shows that the number of extension processes is two.

Rob Wu [:robwu]

Reporter

Comment 4

1 year ago

I took a closer look and added two attachments: a patch to trigger the issue (comment 2), and a log file from a run of the test case, with some annotations in comment 3.

The reported issue only happens because, until bug 1845352 was fixed, a test could trigger "quit-application-granted" without a matching call to AppShutdown::AdvanceShutdownPhase(ShutdownPhase::AppShutdownConfirmed, ...).

The question would mainly be: how realistic is this scenario outside of unit tests?

nsAppStartup::Quit triggers "quit-application-granted" and has conditional logic to advance to AppShutdownConfirmed. This is the primary way in the wild to reach shutdown. (I'm not counting the other trigger of "quit-application-granted", since that is dead code per bug 1827807).
Because "quit-application-granted" is not immediately paired with AdvanceShutdownPhase(AppShutdownConfirmed, ...), I wonder whether there is a chance for logic to be run in between. In the context of this bug:
- ContentParent.cpp registers a shutdown blocker (ContentParent::BlockShutdown).
- nsAsyncShutdown.sys.mjs calls blockShutdown in response to "quit-application-granted".
  - "quit-application-granted" observer is registered here by AsyncShutdown.sys.mjs.
  - The event loop is spun until the shutdown blockers have responded: https://searchfox.org/mozilla-central/rev/4044c34031d035fadb588143297ba5421419d44b/toolkit/components/asyncshutdown/AsyncShutdown.sys.mjs#548-549,573-576
- Because the event loop is spun, the window for arbitrary logic running increases.

Given this rough analysis, I suppose that it is realistic for the issue in this bug to happen. Especially because extension shutdown does not immediately happen when quit-application-granted is triggered: https://searchfox.org/mozilla-central/rev/4044c34031d035fadb588143297ba5421419d44b/toolkit/mozapps/extensions/internal/XPIProvider.jsm#2625-2633,2667
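The window described above can be modelled in a few lines. This is a deliberately simplified sketch, not the real implementation: `ShutdownModel` and all of its members are hypothetical names, and the single boolean flags stand in for IsInOrBeyond(AppShutdownConfirmed) and IsShuttingDown(). It assumes, as in the analysis, that the existing process is marked as shutting down when "quit-application-granted" fires, and that the phase is advanced only after the event loop has been spun.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of the ordering discussed in this bug.
struct ShutdownModel {
  bool phaseConfirmed = false;        // models IsInOrBeyond(AppShutdownConfirmed)
  bool existingShuttingDown = false;  // models IsShuttingDown() on the one process
  int processCount = 1;               // one extension process is already running
  std::deque<std::function<void()>> pendingEvents;

  // Models a request for an extension process, e.g. opening an extension page.
  void RequestProcess() {
    if (phaseConfirmed) return;         // the existing shutdown check
    if (!existingShuttingDown) return;  // the running process can be reused
    ++processCount;                     // nothing reusable: a new one is launched
  }

  // Runs every queued task, as a spun event loop would.
  void SpinEventLoop() {
    while (!pendingEvents.empty()) {
      auto task = std::move(pendingEvents.front());
      pendingEvents.pop_front();
      task();
    }
  }

  void Quit() {
    existingShuttingDown = true;  // "quit-application-granted" reaches the blocker
    SpinEventLoop();              // waiting for blockers lets other tasks run
    phaseConfirmed = true;        // the phase is only advanced afterwards
  }
};
```

If a page-open request is queued before `Quit()` runs, it executes inside the spun loop, sees `phaseConfirmed == false` together with `existingShuttingDown == true`, and raises `processCount` to 2. A request made after `Quit()` returns is rejected by the first check.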

Flags: needinfo?(rob)

Jens Stutte [:jstutte]

Updated

1 year ago

Flags: needinfo?(jstutte)

Jens Stutte [:jstutte]

Comment 5

1 year ago

Edited

partially-incorrect

AppShutdownConfirmed aka "quit-application-granted" is indeed a bit special. When it runs, we have basically a 100% running browser and are informed that we should begin our shutdown sequence. It is the point of no return.

mozilla::AppShutdown::OnShutdownConfirmed() is called only once we had ferocity == eForceQuit. But as you pointed out, we already notified "quit-application-granted" observers well above, triggering all kinds of reactions.

We even have some special case inside AppShutdown::AdvanceShutdownPhaseInternal to avoid side effects from processing pending events before notifying the phase (again). I wonder if most of those side effects come exactly from the earlier notification.

I think we could try to just remove the first notification, as it happens again below at a presumably better moment in time. This also better matches being "the point of no return": we have some logic that appears to check whether we can really close, but it seems to have no effect other than returning an error after the shutdown has already been initiated, as we will always have set ferocity = eForceQuit; before we get there.

Edit: This is nonsense, obviously above we fire "quit-application-granted" and then AppShutdown fires "quit-application".

Please note that the modified test of course would still show the behavior you discovered as you manually call notifyObservers. If you want the correct shutdown behavior, you need to use advanceShutdownPhase. In fact I wonder if mocking AppShutdown in that test in general may have some unexpected side effects, too.

Edit (after chatting with :smaug): It seems we should overhaul the order here a bit. We should definitely be in the shutdown phase before/while sending the notification, and call CloseAllWindows only afterwards. We can then still check whether the closing was successful, but if not, that probably just merits a MOZ_CRASH, as unload handlers on child processes would not be able to create new processes anymore (once the IsInOrBeyond check works correctly), and having unload handlers for anything running in the parent process should probably be considered an error.
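The effect of the suggested reordering can be shown with a small sketch. It is only an illustration under stated assumptions: `ShutdownSequence`, `NotifyThenAdvance` and `AdvanceThenNotify` are hypothetical names, and the `confirmed` flag stands in for the result of the IsInOrBeyond check that observers would perform.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical model of the two possible orderings.
struct ShutdownSequence {
  bool confirmed = false;  // models IsInOrBeyond(AppShutdownConfirmed)
  std::vector<std::function<void()>> observers;  // notification observers

  // Ordering discussed in this bug: observers run before the phase changes.
  void NotifyThenAdvance() {
    for (auto& observer : observers) observer();
    confirmed = true;
  }

  // Suggested ordering: the phase changes first, then observers are notified.
  void AdvanceThenNotify() {
    confirmed = true;
    for (auto& observer : observers) observer();
  }
};
```

An observer that records `confirmed` while it runs sees `false` under `NotifyThenAdvance()` and `true` under `AdvanceThenNotify()`, so any shutdown check made from inside an observer only takes effect with the second ordering.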

Flags: needinfo?(jstutte)

Jens Stutte [:jstutte]

Comment 6

1 year ago

This check might even prevent us from doing silly things also in the parent process case?

Jens Stutte [:jstutte]

Comment 7

1 year ago

Attached file WIP: Bug 1845778 - Refactor point of no return sequence. (obsolete)

Comment hidden (obsolete)

Joshua Marshall

Updated

1 year ago

Assignee: nobody → jmarshall

Flags: needinfo?(jmarshall)

Jens Stutte [:jstutte]

Comment 9

1 year ago

(In reply to Jens Stutte [:jstutte] from comment #8)

I threw this together, but a first smoke test seems to indicate it has bad side effects. This needs more time than I currently have, probably. Joshua, would you mind taking this over?

I suspect we have too many checks for AppShutdownConfirmed in our code that might better be AppShutdown.

Jens Stutte [:jstutte]

Updated

1 year ago

Severity: -- → S3

Priority: -- → P3

Phabricator Automation

Updated

1 year ago

Attachment #9348409 - Attachment is obsolete: true

Jens Stutte [:jstutte]

Comment 10

1 year ago

Edited

(In reply to Jens Stutte [:jstutte] from comment #9)

> I suspect we have too many checks for AppShutdownConfirmed in our code that might better be AppShutdown.

The patch was based on confusing "quit-application-granted" with "quit-application". But the last statement might still be true, especially together with the weirdness that we have a shutdown barrier for quitApplicationGranted but apparently none for quitApplication. Joshua will continue the investigation.

Joshua Marshall

Updated

2 months ago

Assignee: jmarshall → nobody

Categories

(Core :: DOM: Content Processes, defect, P3)

Security

(public)
