Disable the background QoS manager on MacOS and IPC priority manager on other platforms by default
Categories
(Remote Protocol :: Agent, task, P2)
Tracking
(firefox139 fixed)
People
(Reporter: whimboo, Assigned: whimboo)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [webdriver:m16][webdriver:relnote])
Attachments
(2 files)
As discovered in bug 1791951 comment 157, the background QoS manager on macOS causes hangs in WebDriver tests driven by our Marionette and WebDriver BiDi implementations. This is easy to see when running a test that opens a new tab in the background while all available CPUs are under high load. In that case the navigation hangs because the web content process of the background tab has a very low priority and is basically stalled (see this example Gecko profile).
Steps to reproduce on macOS:
- Check how many CPUs you have.
- Run the following that many times in your terminal to put all available CPUs under 100% load:
    yes > /dev/null &
- Run the following and observe the delays when a new background tab is opened and Marionette waits for the initial navigation to be done:
    mach marionette-test -vv --gecko-log - --setpref="remote.log.truncate=true" testing/marionette/harness/marionette_harness/tests/unit/unit-tests.toml
- Alternatively, modify an existing Marionette test file and replace every test with just:
    def test(self):
        result = self.marionette.open(type="tab", focus=False)
The load of the initial about:blank will be heavily delayed or will stall completely. Disabling threads.lower_mainthread_priority_in_background.enabled makes it work. Given that this preference only applies to macOS, we should probably also disable dom.ipc.processPriorityManager.enabled for Linux, Windows, and Android.
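The load-generation step above can also be scripted. The following is a minimal sketch, not part of the original report: it swaps the `yes > /dev/null &` shell loop for Python busy-loop child processes, one per CPU, and the helper names `start_cpu_load` and `stop_cpu_load` are hypothetical.

```python
import os
import subprocess
import sys


def start_cpu_load(count=None):
    """Start `count` busy-loop child processes (default: one per CPU).

    Each child runs an endless Python loop, which stands in for the
    `yes > /dev/null &` command from the steps to reproduce.
    """
    count = count or os.cpu_count() or 1
    busy_loop = "while True: pass"
    return [
        subprocess.Popen([sys.executable, "-c", busy_loop])
        for _ in range(count)
    ]


def stop_cpu_load(procs):
    """Terminate all busy-loop children and wait for them to exit."""
    for proc in procs:
        proc.terminate()
    for proc in procs:
        proc.wait()
```

Call `start_cpu_load()` before running the Marionette test and pass the returned list to `stop_cpu_load()` afterwards, so that no busy workers are left behind.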
Haik and Nika, do you foresee any issues with that? For automation it's basically not useful at all to stall processes when the system is under high load. I assume we see similar things in CI because we share workers between different jobs, so the system's load is likely to be higher than expected. I'm not sure what happens if multiple jobs run in parallel and each of them has the priority managers turned off.
Comment 1 (Assignee) • 2 months ago
I've pushed the pref additions to try to see what effect they actually have across the various test suites driven by Marionette and WebDriver BiDi.
https://treeherder.mozilla.org/jobs?repo=try&revision=80f6df59ca0bac21e247fccfdfeeff878d3e47b5
Comment 2 (Assignee) • 2 months ago
This might also be the underlying reason why we see a lot of hanging IPC calls from our JSWindowActors over on bug 1935939.
Comment 3 • 2 months ago
Are the tabs in these scenarios actually background tabs? If not, then we should fix the priority manager to not treat them as background tabs.
Comment 4 • 2 months ago
The answer appears to be yes: in self.marionette.open(type="tab", focus=False), the focus=False means "do not switch to the tab".
I think this means that the problem pointed out in this test is a real problem that can be encountered by users. I've filed bug 1960741 about it.
Comment 5 • 2 months ago
I'd be ok with turning off the pref in these tests, provided that a follow-up bug is filed to try to turn it back on once bug 1960741 is fixed.
Turning off the pref would be a return to the old behavior for these tests.
I believe turning off dom.ipc.processPriorityManager.enabled should work for all platforms - on macOS this will also stop the QoS behavior.
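As a rough illustration of turning the prefs off for a test run, the sketch below builds `--setpref` arguments in the same `name=value` form as the mach command in the description. The helper name `setpref_args` is hypothetical, and passing both prefs together is an assumption made for illustration.

```python
# Prefs named in this bug. Setting both to false is an assumption
# made for illustration.
PRIORITY_PREFS = {
    "threads.lower_mainthread_priority_in_background.enabled": False,
    "dom.ipc.processPriorityManager.enabled": False,
}


def setpref_args(prefs):
    """Return one --setpref=name=value argument per pref, sorted by name."""
    args = []
    for name, value in sorted(prefs.items()):
        if isinstance(value, bool):
            # Boolean prefs are written in lowercase, as in
            # remote.log.truncate=true.
            value = "true" if value else "false"
        args.append(f"--setpref={name}={value}")
    return args
```

The resulting list can be appended to the `mach marionette-test` invocation from the steps to reproduce.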
Comment 6 (Assignee) • 2 months ago
(In reply to Markus Stange [:mstange] from comment #5)
> I believe turning off dom.ipc.processPriorityManager.enabled should work for all platforms - on macOS this will also stop the QoS behavior.
I tried that, but this pref's value has no effect on the behavior. Even when it is turned off, the background tab doesn't load until I move the mouse over the tab.
Comment 7 • 2 months ago
I agree with Markus' comment 5. OK to turn it off to work around test problems, but it sounds like a bug that could affect users and needs to be fixed.
Comment 8 (Assignee) • 2 months ago
Comment 9 (Assignee) • 2 months ago
Comment 10 • 2 months ago
Comment 11 • 2 months ago
(In reply to Haik Aftandilian [:haik] from comment #7)
> I agree with Markus' comment 5. OK to turn it off to work around test problems, but it sounds like a bug that could affect users and needs to be fixed.
My understanding here is effectively:
- We're opening a background tab which doesn't have user focus
- The background tab is given background priority
- The user's computer is under heavy load, so the OS doesn't schedule threads with background priority
- The load in the background tab is delayed, due to us not prioritizing that load
This seems a bit like the expected behaviour to me? In general I would expect we would want to prioritize other things happening for the user (such as their currently active tab) over loading something in a background tab, and the load would begin responding and load as normal once it becomes foreground.
Is the idea that the user can perceive that a loading throbber is ongoing for the background tab, and thus would expect the page to be given foreground priority for that? I worry about giving any page which requests a navigation foreground priority even in the background, but perhaps we could get away with doing it for the initial load? I don't think we'd want to give foreground priority to background tabs when we're doing something like reloading all tabs, or the tab itself decides to navigate.
Perhaps there's some argument for something like allowing up to 1 actively navigating background tab to have foreground priority (with a time limit)? I'm not sure how we'd want to calibrate something like that, especially given that I am not aware of user complaints about our current behaviour.
Comment 12 • 2 months ago
(In reply to Nika Layzell [:nika] (ni? for response) from comment #11)
> (In reply to Haik Aftandilian [:haik] from comment #7)
> > I agree with Markus' comment 5. OK to turn it off to work around test problems, but it sounds like a bug that could affect users and needs to be fixed.
> My understanding here is effectively:
> - We're opening a background tab which doesn't have user focus
> - The background tab is given background priority
> - The user's computer is under heavy load, so the OS doesn't schedule threads with background priority
> - The load in the background tab is delayed, due to us not prioritizing that load
> This seems a bit like the expected behaviour to me? In general I would expect we would want to prioritize other things happening for the user (such as their currently active tab) over loading something in a background tab, and the load would begin responding and load as normal once it becomes foreground.
> Is the idea that the user can perceive that a loading throbber is ongoing for the background tab, and thus would expect the page to be given foreground priority for that? I worry about giving any page which requests a navigation foreground priority even in the background, but perhaps we could get away with doing it for the initial load? I don't think we'd want to give foreground priority to background tabs when we're doing something like reloading all tabs, or the tab itself decides to navigate.
The thinking is that it might be a better user experience to load the tab and then go into background QoS mode, so that when the user switches to it, it is already loaded. The jank/performance cost would be the same as without QoS support, and we'd still get the power savings over time.
> Perhaps there's some argument for something like allowing up to 1 actively navigating background tab to have foreground priority (with a time limit)? I'm not sure how we'd want to calibrate something like that, especially given that I am not aware of user complaints about our current behaviour.
That's the thinking behind the bug Markus filed: bug 1960741 "When opening a link in a new background tab, the background tab should have foreground priority for a few seconds while it loads". If we were to implement that as described, we wouldn't have to choose the foreground tab QoS level; we could use one of the levels between user interactive and background.
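To make the idea discussed above concrete, here is a minimal sketch of a policy that boosts at most one loading background tab for a limited time. It is purely illustrative and is not Gecko code: the class name, the method names, the "boosted" label (standing in for a level between user interactive and background), and the 5-second default are all assumptions.

```python
import time


class BackgroundLoadBoost:
    """Hypothetical policy: at most `max_boosted` loading background tabs
    get a temporary priority boost, each for at most `time_limit` seconds."""

    def __init__(self, max_boosted=1, time_limit=5.0, clock=time.monotonic):
        self.max_boosted = max_boosted
        self.time_limit = time_limit
        self.clock = clock  # injectable so the policy can be tested
        self._boosted = {}  # tab id -> time at which the boost expires

    def _expire(self):
        """Drop boosts whose time limit has passed."""
        now = self.clock()
        self._boosted = {
            tab: end for tab, end in self._boosted.items() if end > now
        }

    def on_load_start(self, tab_id):
        """Return True if the loading background tab gets the boost."""
        self._expire()
        if tab_id in self._boosted:
            return True
        if len(self._boosted) >= self.max_boosted:
            return False
        self._boosted[tab_id] = self.clock() + self.time_limit
        return True

    def on_load_end(self, tab_id):
        """Return the tab to background priority once its load finishes."""
        self._boosted.pop(tab_id, None)

    def priority(self, tab_id):
        """Return "boosted" while the boost is active, else "background"."""
        self._expire()
        return "boosted" if tab_id in self._boosted else "background"
```

In this sketch, a second background tab that starts loading while the first is still boosted stays at background priority, which reflects the concern about not boosting every navigating tab (for example, when reloading all tabs).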
Comment 13 • 2 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/3e94b2a4d88b
https://hg.mozilla.org/mozilla-central/rev/887466638354