Bug 1777198 Comment 3 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

What I learned so far:

- The testcase confirms, that an endlessly running  JS will prevent us from shutting down.
- We have some weirdness in our shutdown order, some necessary and some historically grown.
  - Extensions start to shutdown at "quit-application-granted" but their shutdown is not bound to a single phase and [can last until "profile-change-teardown"](https://searchfox.org/mozilla-central/rev/284187d0a7130d21042beeff0af0627c8e68cacc/toolkit/mozapps/extensions/internal/XPIProvider.jsm#2588,2624). Extensions [broadcast shutdown messages to all children]( https://searchfox.org/mozilla-central/rev/3e1a721bce1da3ae04675539b39a4e95b25a046d/toolkit/components/extensions/Extension.jsm#3249-3252) that the extension is going away and then [waits for all of them to acknowledge the message](https://searchfox.org/mozilla-central/rev/3e1a721bce1da3ae04675539b39a4e95b25a046d/toolkit/components/extensions/Extension.jsm#2717-2749). Thus a blocking JS in a content process will make timeout extensions shutdown.
  - `SessionStore` wants to flush all windows on "quit-application-granted". This [requires an interaction with content process](https://searchfox.org/mozilla-central/rev/284187d0a7130d21042beeff0af0627c8e68cacc/browser/components/sessionstore/TabStateFlusher.jsm#100-105), too. Again a blocking JS in a content process will make timeout `SessionStore` shutdown. It seems clear that we want this to succeed instead in order to ensure we can save our session state.
- The test needs some polishing
  - The arbitrary timeout should be replaced by some messaging/event logic
  - The test does not succeed (but also does not fail). Not sure how easy it is to make that happen on shutdown.

What the patch does:

- Add each `ContentParent` also as blocker to phase `quitApplicationGranted`
- On `quitApplicationGranted`, just do `NotifyImpendingShutdown`, remove that blocker (and thus wait for the next shutdown phase for the real shutdown)
- The child then receives `NotifiedImpendingShutdown` and [sets an atomic flag](https://searchfox.org/mozilla-central/rev/284187d0a7130d21042beeff0af0627c8e68cacc/ipc/glue/ProcessChild.cpp#74) accessible via `ProcessChild::ExpectingShutdown()`
- `XPCJSContext`'s `WatchdogMain` checks `ProcessChild::ExpectingShutdown()` on wakeup and in case checks each context if it is running JS for more than a second. If yes, it issues an interrupt.
- The then issued `InterruptCallback` checks again for `ProcessChild::ExpectingShutdown()` and in case signals an unconditional cancel.

What could be improved:

- There is no clear reason, why extensions should shutdown earlier than the content processes. IIUC, this creates only useless noise during extensions shutdown. We might want to think about changing that order. And having that timeout logic that spans over more than one phase might even have been a way to paper over unresponsive content processes?
- There might be nicer ways of canceling than squeezing this into the `WatchdogMain`, though it seemed to be the less invasive way to do this. If we keep this, we might want to tweak the timeouts a bit: With the current values we can end up waiting up to three seconds before a blocking JS is canceled.  This is an eternity on a modern computer (consider we are issuing a shutdown kill after only 5 seconds). So the watchdog could check more frequently and/or we could consider a shorter running time as a hang.
What I learned so far:

- The testcase confirms, that an endlessly running  JS will prevent us from shutting down.
- We have some weirdness in our shutdown order, some necessary and some historically grown.
  - Extensions start to shutdown at "quit-application-granted" but their shutdown is not bound to a single phase and [can last until "profile-change-teardown"](https://searchfox.org/mozilla-central/rev/284187d0a7130d21042beeff0af0627c8e68cacc/toolkit/mozapps/extensions/internal/XPIProvider.jsm#2588,2624). Extensions [broadcast shutdown messages to all children]( https://searchfox.org/mozilla-central/rev/3e1a721bce1da3ae04675539b39a4e95b25a046d/toolkit/components/extensions/Extension.jsm#3249-3252) that the extension is going away and then [waits for all of them to acknowledge the message](https://searchfox.org/mozilla-central/rev/3e1a721bce1da3ae04675539b39a4e95b25a046d/toolkit/components/extensions/Extension.jsm#2717-2749). Thus a blocking JS in a content process will make timeout extensions shutdown.
  - `SessionStore` wants to flush all windows on "quit-application-granted". This [requires an interaction with content process](https://searchfox.org/mozilla-central/rev/284187d0a7130d21042beeff0af0627c8e68cacc/browser/components/sessionstore/TabStateFlusher.jsm#100-105), too. Again a blocking JS in a content process will make timeout `SessionStore` shutdown. It seems clear that we want this to succeed instead in order to ensure we can save our session state.
- The test needs some polishing
  - The arbitrary timeout should be replaced by some messaging/event logic
  - The test does not succeed (but also does not fail). Not sure how easy it is to make that happen on shutdown.

What the patch does:

- Add each `ContentParent` also as blocker to phase `quitApplicationGranted`
- On `quitApplicationGranted`, just do `NotifyImpendingShutdown`, remove that blocker (and thus wait for the next shutdown phase for the real shutdown)
- The child then receives `NotifiedImpendingShutdown` and [sets an atomic flag](https://searchfox.org/mozilla-central/rev/284187d0a7130d21042beeff0af0627c8e68cacc/ipc/glue/ProcessChild.cpp#74) accessible via `ProcessChild::ExpectingShutdown()`
- `XPCJSContext`'s `WatchdogMain` checks `ProcessChild::ExpectingShutdown()` on wakeup and in case checks each context if it is running JS for more than a second. If yes, it issues an interrupt.
- The then issued `InterruptCallback` checks again for `ProcessChild::ExpectingShutdown()` and in case signals an unconditional cancel.

What could be improved:

- There is no clear reason, why extensions should shutdown earlier than the content processes. IIUC, this creates only useless noise during extensions shutdown. We might want to think about changing that order. And having that timeout logic that spans over more than one phase might even have been a way to paper over unresponsive content processes?
- There might be nicer ways of canceling than squeezing this into the `WatchdogMain`, though it seemed to be the less invasive way to do this. If we keep this, we might want to tweak the timeouts a bit: With the current values we can end up waiting up to three seconds before a blocking JS is canceled.  This is an eternity on a modern computer (consider we are issuing a shutdown kill after only 5 seconds). So the watchdog could check more frequently and/or we could consider a shorter running time as a hang.
- I am puzzled [by this false together with the previous true](https://searchfox.org/mozilla-central/rev/284187d0a7130d21042beeff0af0627c8e68cacc/js/xpconnect/src/XPCJSContext.cpp#526,531,533). IIUC `ForAllActiveContexts` breaks its loop on false, so if the first context in the list was not running for too long, we won't check the other contextes? I'd expect if ever to be it the other way round, such that we ask for one interrupt at a time but check all contextes.

Back to Bug 1777198 Comment 3