[mozprocess] Improve the ProcessHandlerMix on Windows to handle detached processes
Categories
(Testing :: Mozbase, enhancement, P3)
Tracking
(Not tracked)
People
(Reporter: whimboo, Unassigned)
References
(Blocks 4 open bugs)
Details
Attachments
(1 file)
Reporter | ||
Comment 1•6 years ago
|
||
![]() |
||
Updated•6 years ago
|
Reporter | ||
Comment 2•6 years ago
|
||
Comment hidden (obsolete) |
Reporter | ||
Comment 4•6 years ago
|
||
Reporter | ||
Comment 5•6 years ago
|
||
Reporter | ||
Comment 6•6 years ago
|
||
Reporter | ||
Comment 7•6 years ago
|
||
Reporter | ||
Updated•6 years ago
|
Comment 9•6 years ago
|
||
bugherder |
Comment 10•6 years ago
|
||
Comment 11•6 years ago
|
||
Comment 13•6 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:whimboo, maybe it's time to close this bug?
Reporter | ||
Comment 14•6 years ago
|
||
No, there is still more work to do here, but I don't have the time doing it at the moment. If anyone has interest feel free to do it.
Updated•2 years ago
|
Reporter | ||
Comment 15•2 years ago
|
||
I started some work to improve the stability of wdspec tests on bug 1824841. When pushing to try I saw a high frequent failures on Windows only when mozprocess is handling the shutdown of the geckodriver binary, which look like:
https://treeherder.mozilla.org/logviewer?job_id=410631876&repo=try&lineNumber=109643-109648
[task 2023-03-29T09:20:31.185Z] 09:20:31 INFO - STDOUT: Exiting with code: 64
[task 2023-03-29T09:20:31.187Z] 09:20:31 INFO - STDOUT: DBG::MOZPROC PID:9140 (Thread-9) | process id 9140 exited normally
[task 2023-03-29T09:20:31.187Z] 09:20:31 INFO - STDOUT: DBG::MOZPROC PID:9140 (Thread-9) | job object msg active processes zero
[task 2023-03-29T09:20:31.188Z] 09:20:31 INFO - STDOUT: DBG::MOZPROC ProcessReader | _read loop exited
[task 2023-03-29T09:20:31.188Z] 09:20:31 INFO - STDOUT: DBG::MOZPROC ProcessReader | _read exited
[task 2023-03-29T09:20:33.201Z] 09:20:33 INFO - STDOUT: DBG::MOZPROC PID:9140 (Thread-7) | No process handle
[task 2023-03-29T09:20:33.203Z] 09:20:33 INFO - STDOUT: ** returncode: None 1680081633.2012012
As it can be seen the geckodriver process itself quits with an exit code of 64, but mozprocess continues to report None
as exit code. The problem here is actually part of the Process.poll()
handling. We are on Windows and the handle is gone, but the method returns still None
and not the expected exit code, which means that mozprocess thinks that the process is still running.
I then remembered my patch from 5 years ago which landed on bug 1433905, but then got backed out here. Applying it again (after fixing the merge conflicts) actually makes it work for the above case as this try build shows. But sadly now there is a perma failure for a Marionette test, where an environment variable as set before a restart is not available after the restart anymore. Investigating this clearly showed me that mozprocess is failing to handle in-application restart scenarios as only done via Marionette. And this is clearly something that we have to get fixed to improve the reliability of tests but also the Marionette client itself.
Here some observations:
- On Windows the Firefox binary is launched via a launcher process (2008), which means that the real parent process (4284) is hold within a wrapper process. The client which starts Firefox via mozprocess uses the wrapper process as known PID, and not the real binary. Also all child processes (eg. web content) are attached to the inner PID (4284) and not the launcher.
When Firefox is restarted from within the application itself a new launcher process (7864) is forked/spawned under the existing parent process (4284), which itself then spawns the new Firefox parent process (6392). Right after both the former parent process (4284) and launcher process (2008) exit, as well the new launcher process (7864) so that the new parent process (6392) is now made the top-level one:
0:00.03 INFO Using workspace for temporary data: "C:\mozilla\mozilla-unified"
0:00.05 mozversion INFO application_buildid: 20230413093523
0:00.05 mozversion INFO application_changeset: c35b4a88139530b8c2f2e58e0fededacfe7a8571
0:00.05 mozversion INFO application_display_name: Nightly
0:00.05 mozversion INFO application_id: {ec8030f7-c20a-464f-9b0e-13a3a9e97384}
0:00.05 mozversion INFO application_name: Firefox
0:00.05 mozversion INFO application_remotingname: firefox-default
0:00.05 mozversion INFO application_vendor: Mozilla
0:00.05 mozversion INFO application_version: 114.0a1
0:00.05 mozversion INFO platform_buildid: 20230413093523
0:00.06 mozversion INFO platform_changeset: c35b4a88139530b8c2f2e58e0fededacfe7a8571
0:00.06 mozversion INFO platform_version: 114.0a1
0:00.06 INFO Application command: C:/mozilla/mozilla-unified/obj-x86_64-pc-windows-msvc\dist\bin\firefox.exe -no-remote -marionette --wait-for-browser -profile C:\Users\henrik\AppData\Local\Temp\tmpldsuf5ca.mozrunner
*** initial pid: 2008
*** new process - psutils details: {'ppid': 2008, 'name': 'firefox.exe', 'pid': 4284}
*** new process - psutils details: {'ppid': 4284, 'name': 'firefox.exe', 'pid': 11160}
*** new process - psutils details: {'ppid': 4284, 'name': 'firefox.exe', 'pid': 13644}
*** new process - psutils details: {'ppid': 4284, 'name': 'firefox.exe', 'pid': 3896}
*** new process - psutils details: {'ppid': 4284, 'name': 'firefox.exe', 'pid': 9568}
*** new process - psutils details: {'ppid': 4284, 'name': 'firefox.exe', 'pid': 13348}
*** new process - psutils details: {'ppid': 4284, 'name': 'firefox.exe', 'pid': 6536}
*** new process - psutils details: {'ppid': 4284, 'name': 'firefox.exe', 'pid': 13436}
*** new process - psutils details: {'ppid': 4284, 'name': 'firefox.exe', 'pid': 12996}
1681828049924 Marionette DEBUG 3 -> [0,23,"Marionette:Quit",{"flags":["eRestart"]}]
1681828050065 Marionette DEBUG Marionette stopped listening
1681828050067 Marionette DEBUG 3 <- [1,23,null,{"cause":"restart","forced":false,"in_app":true}]
DBG::MOZPROC PID:2008 (Thread-3) | process id 9568 exited normally
DBG::MOZPROC PID:2008 (Thread-3) | process id 3896 exited normally
DBG::MOZPROC PID:2008 (Thread-3) | process id 6536 exited normally
DBG::MOZPROC PID:2008 (Thread-3) | process id 12996 exited normally
DBG::MOZPROC PID:2008 (Thread-3) | process id 13436 exited normally
DBG::MOZPROC PID:2008 (Thread-3) | process id 13348 exited normally
*** new process - psutils details: {'ppid': 4284, 'name': 'firefox.exe', 'pid': 7864}
DBG::MOZPROC PID:2008 (Thread-3) | process id 13644 exited normally
DBG::MOZPROC PID:2008 (Thread-3) | process id 11160 exited normally
*** new process - psutils details: {'ppid': 7864, 'name': 'firefox.exe', 'pid': 6392}
*** new process - psutils details: {'ppid': 6392, 'name': 'firefox.exe', 'pid': 11432}
DBG::MOZPROC PID:2008 (Thread-3) | process id 7864 exited normally
DBG::MOZPROC PID:2008 (Thread-3) | process id 4284 exited normally
*** new process - psutils details: {'ppid': 6392, 'name': 'firefox.exe', 'pid': 2444}
DBG::MOZPROC PID:2008 (Thread-3) | process id 2008 exited normally
*** new process - psutils details: {'ppid': 6392, 'name': 'firefox.exe', 'pid': 6228}
*** new process - psutils details: {'ppid': 6392, 'name': 'firefox.exe', 'pid': 12620}
*** new process - psutils details: {'ppid': 6392, 'name': 'firefox.exe', 'pid': 8744}
*** new process - psutils details: {'ppid': 6392, 'name': 'firefox.exe', 'pid': 14456}
*** new process - psutils details: {'ppid': 6392, 'name': 'firefox.exe', 'pid': 6928}
*** new process - psutils details: {'ppid': 6392, 'name': 'firefox.exe', 'pid': 996}
*** new process - psutils details: {'ppid': 6392, 'name': 'firefox.exe', 'pid': 7088}
Given that we are working within a job object I wonder if mozprocess should not track the launcher process but the real firefox binary process. As least that would make it easier when the process gets restarted. Also it probably shouldn't matter if we kill the launcher process or the real parent process. The poll logic on Windows uses the Job Object and we exit when all processes are gone, and the launcher should be gone as well when the parent process exited.
- I tried to add some code via
ctypes
which can retrieve the pid of the parent process for a new spawned process. But sadly this doesn't work yet, and I'm currently experimenting by usingpsutil
. Hereby I'm not sure if we can actually use that package viamach
or in CI. Once I have it all working I might find a way to retrieve the information via the WinAPI.
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Comment 16•2 years ago
|
||
I've pushed a try build for my proposed changes but it turns out that quite a lot of test failures started to happen that need investigation:
https://treeherder.mozilla.org/jobs?repo=try&revision=3675ba08ec911eed3a34a79135a2db1836e4db5e
I'll deprioritize this bug a bit would would still check why there are such failures.
Reporter | ||
Comment 17•1 year ago
|
||
We are going forward to reduce the amount of usage of mozprocess in our code. As such I don't think that any time spent on this bug will be useful. I'm going to unassign myself for now at least.
Description
•