macOS-specific part of HangMonitorChild::RecvSetMainThreadQoSPriority seems to hang frequently
Categories
(Core :: XPCOM, defect)
Tracking
Release | Tracking | Status |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox119 | --- | disabled |
firefox120 | --- | disabled |
firefox121 | --- | disabled |
firefox122 | --- | disabled |
People
(Reporter: jstutte, Unassigned)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(1 obsolete file)
In the recent ShutdownKill data, all macOS instances I clicked on were stuck inside HangMonitorChild::RecvSetMainThreadQoSPriority.
Reporter
Updated•1 year ago
Comment 1•1 year ago
Set release status flags based on info from the regressing bug 1834629
:KrisWright, since you are the author of the regressor, bug 1834629, could you take a look? Also, could you set the severity field?
For more information, please visit BugBot documentation.
Updated•1 year ago
Comment 2•1 year ago
Looks like something is causing a hang in the new code path. I'm curious whether it's related to some recent content process crashes that leave the main thread unable to change contexts, resulting in a hang. I'll look into this. As it stands, this code is still at the experiment stage and hasn't been enabled for anyone outside the Nightly experiment population.
Comment 3•1 year ago
Set release status flags based on info from the regressing bug 1834629
Comment 4•1 year ago
For posterity, this is gated on the threads.use_low_power.enabled pref.
Comment 5•1 year ago
pthread_override_qos_class_start_np can return NULL. In that case, if the dispatch fails, the current code calls pthread_override_qos_class_end_np(NULL), which might be what we're hanging on. I'll build a patch.
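The failure path described above can be sketched in C. This is a minimal illustration under stated assumptions, not the actual Firefox patch: fake_override_start, fake_override_end, try_dispatch and raise_priority_and_dispatch are hypothetical stand-ins for pthread_override_qos_class_start_np, pthread_override_qos_class_end_np and the dispatch step, so the sketch compiles off macOS. It models only what the comment states: start may return NULL, and end should never be handed NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the handle returned by
 * pthread_override_qos_class_start_np. */
typedef struct qos_override {
    int active;
} qos_override;

static qos_override g_slot;        /* storage for the single fake override */
static int g_active_overrides = 0; /* number of overrides currently held */

/* Hypothetical stand-in for pthread_override_qos_class_start_np:
 * returns NULL when should_fail is true. */
static qos_override *fake_override_start(bool should_fail) {
    if (should_fail) {
        return NULL;
    }
    g_slot.active = 1;
    g_active_overrides++;
    return &g_slot;
}

/* Hypothetical stand-in for pthread_override_qos_class_end_np. The assert
 * marks the suspected bug: ending an override that was never started. */
static int fake_override_end(qos_override *ov) {
    assert(ov != NULL);
    ov->active = 0;
    g_active_overrides--;
    return 0;
}

/* Hypothetical dispatch step; returns false when dispatch fails. */
static bool try_dispatch(bool dispatch_ok) {
    return dispatch_ok;
}

/* Boost the thread, dispatch, and undo the boost on failure only if a
 * boost was actually taken. */
static bool raise_priority_and_dispatch(bool start_fails, bool dispatch_ok) {
    qos_override *ov = fake_override_start(start_fails);
    if (!try_dispatch(dispatch_ok)) {
        if (ov != NULL) { /* guard: never call end with a NULL handle */
            fake_override_end(ov);
        }
        return false;
    }
    return true; /* on success the override is left in place */
}
```

The design choice is simply to treat a NULL handle as "no override was taken" and skip the matching end call, rather than pairing start and end unconditionally on the failure path.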
Comment 6•1 year ago
Comment 7•1 year ago
I'll take this bug since I can reproduce it fairly consistently, using these steps to reproduce:
1. Start building Firefox, so that clang drives CPU usage to 95% or higher. Other ways of raising CPU usage would likely work too, but I haven't been able to demonstrate that. In theory, running multiple instances of
   yes > /dev/null &
   will do this.
2. While CPU usage is still high, navigate to "https://www.polygon.com/archives".
3. Scroll up and down a few times, then click the "Next" button at the bottom. Switch to another window, then back to the browser window.
4. Repeat step 3 until the browser hangs.
This method reliably causes a hang for me, though it usually takes 5 minutes to happen. It's not quick to replicate, but it is consistent.
Comment 8•1 year ago
Here's one of my crash reports, generated when I force-quit the application after the hang occurred: https://crash-stats.mozilla.org/report/index/6bb3fa75-f252-4a4d-bff1-dce380231129
Updated•1 year ago
Comment 9•1 year ago
Just reproduced this again, this time while watching a Twitch video during background compilation.
https://crash-stats.mozilla.org/report/index/dcfae4fa-5cbd-4b12-ba85-677420240112
I don't think I'm equipped to solve this. Taking myself off the bug.
Updated•1 year ago
Comment 10•10 months ago
I had a similar hang today. I switched to a Phabricator tab I had open, and the content process was unresponsive for a long time. I captured a profile, and it spent the whole time in __bsdthread_ctl inside HangMonitorChild::RecvSetMainThreadQoSPriority: https://share.firefox.dev/4azCaTY
Comment 11•9 months ago
Nazim, do you remember whether you dragged this Phabricator tab into a different window? I just encountered a frozen foreground tab after dragging a tab into a different window, and I wonder if that's a code path where we're not sending the "force QoS change" signal.
Hmm, good question. I don't remember doing it, but I might have accidentally dragged it a bit while trying to select it. Even so, it should have stayed in the same window after the attempt, since I mostly use a single window.
Comment 13•3 months ago
It hasn't been confirmed, but we believe this was fixed by bug 1876306. (We kept this bug open while monitoring crash reports because we weren't sure the problem addressed in bug 1876306 caused the issues in this bug.) With bug 1876306 fixed, we no longer see instances of these ShutdownKill crashes in crash-stats.