Parent process exits too early during Firefox shutdown, causing 'Exiting due to channel error' for content processes
Categories
(Testing :: Marionette Client and Harness, defect, P3)
Tracking
(Not tracked)
People
(Reporter: whimboo, Unassigned)
References
(Blocks 1 open bug)
Details
Follow-up from the discussion on bug 1906583: According to the log output, the parent process of Firefox exits before all content processes have terminated. This immediate process exit occurs as soon as Firefox shutdown is requested, and we do not see the quit-application
observer notification being triggered. As result Exiting due to channel error
failures are shown for each content process that is trying to terminate as well. Here an example of a Marionette unit test:
https://treeherder.mozilla.org/logviewer?job_id=465407101&repo=autoland&lineNumber=194367-194386
INFO - PID 22699 | 1720418048748 Marionette DEBUG 0 -> [0,46,"Marionette:Quit",{"flags":["eForceQuit"]}]
INFO - PID 22699 | 1720418048750 Marionette INFO Stopped listening on port 35322
INFO - PID 22699 | [Child 23092, IPC I/O Child] WARNING: [DD8138FB422205F3.96665EDE0126D631]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
INFO - PID 22699 | [Child 23069, IPC I/O Child] WARNING: [F56011DAFCE85B03.C674D9DC131A134C]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
INFO - PID 22699 | [Child 23022, IPC I/O Child] WARNING: [E35855B34FA21CFD.9E9B186565B45410]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
INFO - PID 22699 | [Utility 22955, IPC I/O Child] WARNING: [B921783F44010B5B.247B520DBFA9C8CB]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
INFO - PID 22699 | Exiting due to channel error.
INFO - PID 22699 | Exiting due to channel error.
INFO - PID 22699 | Exiting due to channel error.
INFO - PID 22699 | [Utility 22955, Main Thread] WARNING: Shutting down Utility process early due to a crash!: file /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessChild.cpp:358
INFO - PID 22699 | [Child 22809, IPC I/O Child] WARNING: [E23DCDDCE61710D3.194247E91B21DA52]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
INFO - PID 22699 | Exiting due to channel error.
INFO - PID 22699 | [Child 22915, IPC I/O Child] WARNING: [F87F755AC18D5FFE.7B9947573577C9BB]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
INFO - PID 22699 | Exiting due to channel error.
INFO - PID 22699 | [Child 22843, IPC I/O Child] WARNING: [96FE4553C6936EB4.60451CFC0AA73145]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
INFO - PID 22699 | Exiting due to channel error.
INFO - PID 22699 | [23008] Sandbox: Unexpected EOF, op 0 flags 00 path /proc/cpuinfo
INFO - PID 22699 | 1720418048786 webdriver::server DEBUG Teardown session
INFO - PID 22699 | 1720418048786 geckodriver::marionette ERROR Failed to close browser connection: unknown error
INFO - PID 22699 | 1720418048786 webdriver::server DEBUG <- 500 Internal Server Error {"value":{"error":"unknown error","message":"Failed to decode response from marionette","stacktrace":""}}
A normal shutdown as requested by Marionette:
INFO - PID 24782 | 1720418108424 Marionette DEBUG 0 -> [0,27,"Marionette:Quit",{"flags":["eForceQuit"]}]
INFO - PID 24782 | 1720418108425 Marionette INFO Stopped listening on port 37298
INFO - PID 24782 | 1720418108482 Marionette TRACE [12] MarionetteCommands actor destroyed for window id 6442450945
INFO - PID 24782 | 1720418108483 WebDriver BiDi DEBUG Unregistered session handler: /session/5948386c-95eb-49b5-b027-6e9b529a86d0
INFO - PID 24782 | 1720418108488 RemoteAgent TRACE MessageHandler ROOT for session 5948386c-95eb-49b5-b027-6e9b529a86d0 is being destroyed
INFO - PID 24782 | 1720418108488 RemoteAgent TRACE Unregistered MessageHandler ROOT for session 5948386c-95eb-49b5-b027-6e9b529a86d0
INFO - PID 24782 | 1720418108571 Marionette TRACE Received observer notification quit-application
INFO - PID 24782 | 1720418108581 Marionette DEBUG Marionette stopped listening
INFO - PID 24782 | 1720418108582 Marionette DEBUG 0 <- [1,27,null,{"cause":"shutdown","forced":false,"in_app":true}]
INFO - PID 24782 | 1720418108622 webdriver::server DEBUG Teardown session
So what is causing this early exit of the parent process not following the normal shutdown?
Not sure if that is the right component, so please adjust if needed.
Reporter | ||
Comment 1•1 year ago
|
||
The same can actually happen when we just try to close a browser window. See the test failures over on bug 1898472.
Comment 2•1 year ago
|
||
The IPC messages you're seeing her are just a symptom of the parent process exiting (similar to bug 1898472 comment 2), and there's not really anything from the log which makes it clear why this would be happening unfortunately. It's clear that the parent process exited too quickly without going through normal XPCOM shutdown here, and that lead to the child processes emitting these logs and exiting early. This isn't really an IPC issue, IPC is just showing the symptoms.
Because this is a debug build, we should always be going through the full shutdown codepath.
Reporter | ||
Comment 3•1 year ago
|
||
But this is also not an issue with Marionette or the client. Should it be then living in Core :: General
or something similar?
(In reply to Nika Layzell [:nika] (ni? for response) from comment #2)
Because this is a debug build, we should always be going through the full shutdown codepath.
So what should happen when it's not a debug build? So far we always have seen the quit-application
observer notification when requesting a shutdown of Firefox and as such were able to successfully return the Marionette's response to the client. Why does that not happen in some rare cases?
Comment 4•1 year ago
|
||
I don't know what's happening in this case. I would expect in normal operation for the browser to exit like normal. It seems like there's probably a crash happening or similar, though I don't see anything in the logs for that. You could try adding logging during the Marionette quit codepath to see how far it gets before crashing.
Reporter | ||
Comment 5•1 year ago
|
||
Ok so the entry point for that is nsAppStartup::Quit
. But I actually cannot find the place where we call NotifyObservers()
for quit-application
to narrow down the range when this could happen. Maybe you have a pointer Nika?
Interesting is indeed that we call nsAppStartup::CloseAllWindows()
before the shutdown of Firefox. Given bug 1898472 and that we've also seen it when closing a tab or a window, this issue might actually not related to the shutdown code of Firefox.
[task 2024-05-23T09:17:19.506Z] 09:17:19 INFO - 1716455839506 Marionette DEBUG 6 -> [0,60,"WebDriver:CloseChromeWindow",{}]
[task 2024-05-23T09:17:19.578Z] 09:17:19 INFO - Exiting due to channel error.
In WebDriver:CloseChromeWindow
we call this.curBrowser.closeWindow()
which then forwards to windowManager.closeWindow()
and closes the window via window.close()
.
Marionette sets an environment variable to shutdown Firefox if a content process crashes. Usually this works fine and we see the related output in the log as well as the crash report. But all that is not visible here.
Reporter | ||
Updated•1 year ago
|
Reporter | ||
Comment 6•1 year ago
|
||
Nika, related to my last comment could you imagine that window.close()
could trigger this situation?
Comment 7•1 year ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #6)
Nika, related to my last comment could you imagine that
window.close()
could trigger this situation?
No, no action like window.close()
should lead to this situation. IIRC this should only ever happen in the case of an abnormal crash or exit.
Reporter | ||
Comment 8•1 year ago
|
||
I have seen such kind of aborts today locally when running web-platform tests. Firefox just stopped running and quit early during a test. The reason was that I forgot to rebuilt after running a git pull
. So not sure if the crashes we see in CI might be due to some broken binaries?
Description
•