Use "psutil" as process handler for mozrunner / mozprocess on MacOS to support the new application restart mechanism in Firefox (bug 1827651)
Categories
(Testing :: Marionette Client and Harness, defect, P2)
Tracking
(firefox127 fixed)
Tracking | Status | |
---|---|---|
firefox127 | --- | fixed |
People
(Reporter: spohl, Assigned: whimboo)
References
(Blocks 4 open bugs)
Details
(Keywords: perf-alert, Whiteboard: [webdriver:m11])
Attachments
(2 files, 1 obsolete file)
23.47 KB,
patch
|
Details | Diff | Splinter Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
We are reworking the way we launch and relaunch applications on macOS in bug 1827651. We are blocked from landing the code because we are running into Marionette test failures that we can't quite explain yet (https://treeherder.mozilla.org/jobs?repo=try&revision=e73afb0ca12508a43f65b1853c478116b44eee6c). In my conversations with Henrik on Matrix it became clear that this might require a change to Marionette, rather than changes to the patch in bug 1827651.
The easiest way that I found to reproduce the issue is by applying the patch in bug 1827651 (moz-phab patch D186701
) and only running test_quit_restart.py
(see attached diff). Marionette expects the returncode of the first process to still be None
(which means "still running"). However, with the patch in bug 1827651 applied, the returncode is actually 0
, the restarted process just sits around (which ultimately causes the timeouts on try) and Marionette even starts a new session because it didn't expect a returncode of 0
.
Assignee | ||
Updated•2 months ago
|
Assignee | ||
Updated•2 months ago
|
Assignee | ||
Updated•2 months ago
|
Assignee | ||
Comment 1•2 months ago
|
||
After some investigation today it turns out to be actually a bigger issue. Here some details:
-
The current method for managing restarts in Firefox relies on the assumption that the exit code of the original process remains as
None
. This behavior is a result of the existing implementation and usage of mozrunner and mozprocess, where the reader thread of the process handler never terminates, leaving thereturncode
asNone
. -
With this behavior in mind, we (unintentionally) internally update the detached process ID (pid) and manage that instead of the original one. This allows us to continue correctly piping
stdout
andstderr
to our logger. -
With the implementation of the new mechanism for shutting down Firefox (as addressed in bug 1827651), there's a significant change. Now, during a restart, the reader thread exits once the process has been closed, resulting in a
returncode
of0
. Consequently, mozrunner and mozprofile are no longer able to function with the new process. Internally, this means they cannot attach to the new process and assume that it has quit. -
Additionally, the reader threads are no longer active, meaning any output to
stdout
andstderr
is lost. While I've encountered this in the past on occasion, it's now a permanent situation with this new mechanism.
Considering the situation described above, a potential fix for mozrunner might involve moving away from mozprocess and utilizing psutil
instead. However, this would be quite an invasive change.
Geoff or Joel, has there been any consideration or discussion about porting mozrunner to utilize psutil instead of mozprocess?
Stephen, would you happen to know what internal changes are involved with the new shutdown mechanism for the application? Could it possibly include the proactive closing of the stdout
and stderr
streams, a behavior that wasn't present before?
Comment 2•2 months ago
|
||
I haven't had any discussions (formal or casual) about changing any mozbase modules (except manifestparser) in the last 5 years. I recall :gbrown doing some changes to reduce the scope of some modules. I am game for reducing our reliance on mozprocess, I also suspect modern python and newer OS versions allow for less hacking.
As for bandwidth, there is no "A-team" that works on mozprocess anymore, we all have different day jobs!
Comment 3•2 months ago
|
||
Bug 1843347 is my now long-neglected attempt to reduce use of mozprocess. I haven't looked at mozrunner lately. I think it would be great if it no longer used mozprocess.
Reporter | ||
Comment 4•2 months ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #1)
Stephen, would you happen to know what internal changes are involved with the new shutdown mechanism for the application? Could it possibly include the proactive closing of the
stdout
andstderr
streams, a behavior that wasn't present before?
I can't say what the internal differences are between the NSTask
API that was used to launch applications before and the new NSWorkspace
route. Does Firefox somehow have awareness that it was launched by Marionette? We have replaced the legacy NSTask
-way of starting applications with two new ways, one using a more recent NSTask
API and one using NSWorkspace
. One possible workaround could be to use the NSTask
-route when launched by Marionette. While this doesn't test the NSWorkspace
-route, it might be a temporary solution to land bug 1827651 for Firefox and enable macOS session resume while Marionette is being switched from mozprocess to psutil.
Assignee | ||
Comment 5•2 months ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #3)
Bug 1843347 is my now long-neglected attempt to reduce use of mozprocess. I haven't looked at mozrunner lately. I think it would be great if it no longer used mozprocess.
Yeah, touching mozrunner in this regard might be a challenging task and not easy to get done in a short time frame.
(In reply to Stephen A Pohl [:spohl] from comment #4)
route. Does Firefox somehow have awareness that it was launched by Marionette? We have replaced the legacy NSTask
-way of starting applications with two new ways, one using a more recent NSTask
API and one using NSWorkspace
. One possible workaround could be to use the NSTask
-route when launched by Marionette. While this doesn't test the NSWorkspace
-route, it might be a temporary solution to land bug 1827651 for Firefox and enable macOS session resume while Marionette is being switched from mozprocess to psutil.
It's not registered in the platform which tool launched the application, and this includes Marionette. However, Marionette is quite unique in that it has to support application restarts, a requirement that no other test suite or harness has. Besides the Python client we've discussed, Firefox has the Marionette and Remote Agent (WebDriver BiDi) components. Both of these could be queried via XPConnect to check if they are enabled and running. You can find an example of this in the code here.
Nonetheless, if we were to add a switch for now, I might lean towards using a preference. This approach would allow external tools to control the shutdown behavior in Firefox, making it easier for us to implement support for psutil (or another tool) as a replacement for the processhandler
Python package.
What do you think?
Assignee | ||
Comment 6•2 months ago
|
||
I've done some more investigation yesterday:
-
Using
psutil
for just Marionette to start the process and close it works fine. The benefit is that by usingpsutil.Popen()
we can start our own binary so it's mostly behaving similar with processhandler except the stream reader code doesn't work - but maybe we won't need that or could do it in a different way. But what's most helpful is that withpsutil.Process()
we can attach to a different process that was not launched by us. All of the members of this class are also mapped intopsutil.Popen()
so we have a common set of helpers. -
Using
psutil.Process()
to attach to a process I was not able to get thestdout
andstderr
output handling working. Both streams only work for processes that we start withpsutil.Popen()
by using the appropriate arguments for the constructor, or by accessing the properties afterward. But note that these are not present onpsutil.Process()
. Maybe someone has an idea how to get this accomplished? -
psutil.Process
instances have a lot of very helpful members which would allow us to give better error messages in case something doesn't work correctly with the managed process. -
With the new restart mechanism for Firefox on MacOS the new process is started in a way that no terminal is attached anymore. That results in no output of any log details in my terminal anymore. It also means that we can no longer capture this output which would be crucial for our testing.
As conclusion we could try to remove the custom Process
implementation in mozprocess and replace it with using psutil.Popen
which would fit most of our use cases for mozprocess. For Marionette it's more complicated given that we must support the restart feature and I cannot find a way to connect to the stdout
and stderr
streams.
Assignee | ||
Comment 7•2 months ago
|
||
I've spent some more time by end of last week for using psutil
only for Marionette given that we cannot that easily replace the current mozprocess implementation with it. Especially for MacOS where we definitely need updated code for process handling it works pretty fine and (maybe unrelated) some wpt tests now pass permanently:
https://treeherder.mozilla.org/jobs?repo=try&revision=c2b72576572ef16b9e45b2f321ebe10961d83400
There are issues still for Linux and Windows which so far I cannot reproduce locally but which seem to fail permanently in CI.
As such we may consider using or own psutil process handler class for mozprocess just for MacOS for now and replace it for other platforms later.
Note that there is also still an issue to retrieve the exit code of a restarted process given that it's not under our own control (via POpen
) no returncode
property is available even not with psutil. That means that with a restarted process we don't know what caused the application to quit and if that was a crash (we would have to check for the minidump folder). Happy to hear feedback in case there is some other way to retrieve that information.
We should discuss if such a replacement for Marionette makes sense or if there are other methods that would allow to handle non-child processes that elegantly as psutil.
Assignee | ||
Comment 8•2 months ago
|
||
As discussed with the team we will go ahead by using psutil for Marionette via a drop-in replacement in mozrunner. That means all other consumers are not affected! As first step we will also only do it for MacOS and follow-up with other platforms and features later.
Assignee | ||
Comment 9•2 months ago
|
||
Assignee | ||
Comment 10•2 months ago
|
||
Assignee | ||
Updated•2 months ago
|
Updated•2 months ago
|
Updated•2 months ago
|
Updated•2 months ago
|
Updated•2 months ago
|
Comment 11•2 months ago
|
||
The severity field is not set for this bug.
:whimboo, could you have a look please?
For more information, please visit BugBot documentation.
Assignee | ||
Updated•2 months ago
|
Updated•2 months ago
|
Updated•1 month ago
|
Comment 12•1 month ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ad37a0c06526 [marionette] Use psutil on MacOS as custom process handler drop-in for mozrunner. r=webdriver-reviewers,spohl,jdescottes
Comment 13•1 month ago
•
|
||
Backed out for causing py3 unit test failures
Backout link: https://hg.mozilla.org/integration/autoland/rev/b7c51567545cbceac3798fa536245baf600b18d4
L.E
Also these perftest failures
Updated•1 month ago
|
Assignee | ||
Comment 14•1 month ago
|
||
These are the Python Mochitest harness tests that are failing. I'm not sure why that is happening and I cannot reproduce locally. I've pushed a try build with Trace logs enabled. Maybe it will give a bit more details:
https://treeherder.mozilla.org/jobs?repo=try&revision=416f55f3f2d2924854169cf9d72a0bf6fa648978
Assignee | ||
Comment 15•1 month ago
|
||
The problem with the patch is actually that there is no conditional import of the new processhandler
class for MacOS only. Given that on other platforms we do not require psutil
the related import fails. Conditionally importing fixes the issue:
https://treeherder.mozilla.org/jobs?repo=try&revision=133ffa960cf1cdd58cebd4a2e612dcfc0b14e66c
I'll add the extra line to the patch and reland.
Updated•1 month ago
|
Updated•1 month ago
|
Comment 16•1 month ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e96fce323a1b [marionette] Use psutil on MacOS as custom process handler drop-in for mozrunner. r=webdriver-reviewers,spohl,jdescottes
Comment 17•1 month ago
|
||
bugherder |
Comment 18•28 days ago
|
||
(In reply to Sandor Molnar[:smolnar] from comment #13)
Backed out for causing py3 unit test failures
Backout link: https://hg.mozilla.org/integration/autoland/rev/b7c51567545cbceac3798fa536245baf600b18d4
L.E
Also these perftest failures
Perfherder has detected a build_metrics performance change from push b7c51567545cbceac3798fa536245baf600b18d4.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
1% | installer size | osx-cross | 94,985,376.58 -> 94,154,010.92 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 42263
For more information on performance sheriffing please see our FAQ.
Description
•