macOS-specific part of HangMonitorChild::RecvSetMainThreadQoSPriority seems to hang frequently
Categories
(Core :: XPCOM, defect)
Tracking
Release | Tracking | Status |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox119 | --- | disabled |
firefox120 | --- | disabled |
firefox121 | --- | disabled |
firefox122 | --- | disabled |
People
(Reporter: jstutte, Unassigned)
References
(Blocks 2 open bugs, Regression)
Details
(Keywords: regression)
Attachments
(1 obsolete file)
In the recent ShutdownKill data, all macOS instances I clicked on were stuck inside HangMonitorChild::RecvSetMainThreadQoSPriority.
Updated•7 months ago
Comment 1•7 months ago
Set release status flags based on info from the regressing bug 1834629
:KrisWright, since you are the author of the regressor, bug 1834629, could you take a look? Also, could you set the severity field?
For more information, please visit BugBot documentation.
Updated•7 months ago
Comment 2•7 months ago
Looks like something is causing a hang in the new code path. I'm curious whether it's related to some recent content process crashes that leave the main thread unable to change contexts, resulting in a hang. I'll look into this. As it stands, this code is still in the experimental stage and hasn't been enabled outside of the Nightly population enrolled in the experiment.
Comment 3•7 months ago
Set release status flags based on info from the regressing bug 1834629
Comment 4•7 months ago
For posterity, this is gated on the threads.use_low_power.enabled pref.
Comment 5•6 months ago
pthread_override_qos_class_start_np can return NULL. In that case, with the current code, if the dispatch fails we'll call pthread_override_qos_class_end_np(NULL), which might be what we're hanging on. I'll build a patch.
Comment 6•6 months ago
Comment 7•6 months ago
I'll take the bug since I can reproduce it fairly consistently with these steps to reproduce:
1. Start building Firefox so that clang pushes CPU usage above 95%. Other ways of raising CPU usage would likely also work, but I haven't been able to demonstrate that. In theory, running several instances of "yes > /dev/null &" will do this.
2. While CPU usage is still high, navigate to "https://www.polygon.com/archives".
3. Scroll up and down a few times, then click the "Next" button at the bottom. Switch to another window, then back to the browser window.
4. Repeat step 3 until the browser hangs.
This reliably causes a hang for me, though it usually takes 5 minutes to happen. It's slow to reproduce, but it is consistent.
Comment 8•6 months ago
Here's one of my crash reports, generated when I force-quit the application after the hang occurred: https://crash-stats.mozilla.org/report/index/6bb3fa75-f252-4a4d-bff1-dce380231129
Updated•6 months ago
Comment 9•5 months ago
Just reproduced this again, this time while watching a Twitch video during background compilation.
https://crash-stats.mozilla.org/report/index/dcfae4fa-5cbd-4b12-ba85-677420240112
I don't think I'm equipped to solve this. Taking myself off the bug.
Updated•5 months ago
Comment 10•2 months ago
I had a similar hang today. I switched to a Phabricator tab I had open, and the content process was unresponsive for a long time. I captured a profile, and it was spending the whole time in __bsdthread_ctl inside HangMonitorChild::RecvSetMainThreadQoSPriority: https://share.firefox.dev/4azCaTY
Comment 11•1 month ago
Nazim, do you remember whether you dragged this Phabricator tab into a different window? I just encountered a frozen foreground tab after dragging a tab into a different window, and I wonder whether that's a code path where we're not sending the "force qos change" signal.
Hmm, good question. I don't remember doing it, but I might have accidentally dragged it a bit while trying to select it. Even so, it should have stayed in the same window, since I mostly use a single window.