Open Bug 1864641 Opened 7 months ago Updated 4 days ago

MacOS specific part of HangMonitorChild::RecvSetMainThreadQoSPriority seems to hang frequently

Categories

(Core :: XPCOM, defect)

Unspecified
macOS
defect

Tracking

()

Tracking Status
firefox-esr115 --- unaffected
firefox119 --- disabled
firefox120 --- disabled
firefox121 --- disabled
firefox122 --- disabled

People

(Reporter: jstutte, Unassigned)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: regression)

Attachments

(1 obsolete file)

In the recent ShutdownKill data all MacOS instances I clicked on were stuck inside HangMonitorChild::RecvSetMainThreadQoSPriority.

Component: DOM: Content Processes → XPCOM
Keywords: regression
Regressed by: 1834629

Set release status flags based on info from the regressing bug 1834629

:KrisWright, since you are the author of the regressor, bug 1834629, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Looks like something's causing a hang related to the new codepath. I'm curious if it's related to some recent content process crashes that make the main thread unable to change contexts, resulting in a hang. I'll look into this. As it stands, this code is still in the experiment stage and hasn't been enabled outside of Nightly, apart from the experiment population.

Assignee: nobody → kwright
Severity: -- → S3
Flags: needinfo?(kwright)

For posterity, this is gated on the threads.use_low_power.enabled pref.

pthread_override_qos_class_start_np can return NULL. In such a case, in the current code, if the dispatch fails, we'll call pthread_override_qos_class_end_np(NULL) which might be what we're hanging on. I'll build a patch.
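To illustrate the guard I have in mind, here is a minimal sketch, not the actual patch. StartOverride and EndOverride are hypothetical stand-ins for pthread_override_qos_class_start_np and pthread_override_qos_class_end_np so that the sketch builds on any platform. ScopedOverride and SetPriority are illustrative names too, and the real code path (which dispatches a runnable to the main thread) is reduced to a boolean.

```cpp
#include <cassert>

// Hypothetical stand-ins for the macOS override calls. They are NOT the real
// API; they only model "start may return null" and "end needs a valid token".
struct FakeOverride {
  int id;
};

static int gActiveOverrides = 0;  // overrides started and not yet ended
static int gEndCalls = 0;         // how many times EndOverride ran

// Returns nullptr on failure, as the real start call can.
FakeOverride* StartOverride(bool aSimulateFailure) {
  if (aSimulateFailure) {
    return nullptr;
  }
  ++gActiveOverrides;
  return new FakeOverride{1};
}

// Ending a null override is the suspected problem, so the stand-in rejects it.
void EndOverride(FakeOverride* aToken) {
  assert(aToken && "EndOverride called with a null token");
  ++gEndCalls;
  --gActiveOverrides;
  delete aToken;
}

// RAII guard: ends the override only if one was actually started.
class ScopedOverride {
 public:
  explicit ScopedOverride(bool aSimulateFailure)
      : mToken(StartOverride(aSimulateFailure)) {}
  ~ScopedOverride() {
    if (mToken) {
      EndOverride(mToken);
    }
  }
  ScopedOverride(const ScopedOverride&) = delete;
  ScopedOverride& operator=(const ScopedOverride&) = delete;

  bool Active() const { return mToken != nullptr; }

 private:
  FakeOverride* mToken;
};

// Simplified model of the priority change: returns true if the (simulated)
// dispatch succeeded. On a failed dispatch the guard still cleans up, but
// never calls EndOverride with a null token.
bool SetPriority(bool aOverrideFails, bool aDispatchFails) {
  ScopedOverride guard(aOverrideFails);
  if (aDispatchFails) {
    return false;
  }
  return true;
}
```

With this shape, SetPriority(true, true) (override failed and dispatch failed) returns false without ever reaching EndOverride, which is the case I suspect we are mishandling today.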

I'll take the Bug since I can reproduce it fairly consistently, using these Steps to Reproduce:

  1. Start building Firefox so that clang pushes CPU usage to 95% or higher. Other ways of raising CPU usage would likely also work, but I haven't been able to demonstrate that. In theory, multiple invocations of yes > /dev/null & will do this.
  2. While CPU usage is still high, navigate to "https://www.polygon.com/archives".
  3. Scroll up and down a few times, then click the "Next" button at the bottom. Switch to another window, then back to the browser window.
  4. Repeat Step 3 until the browser hangs.

This method causes a hang for me quite consistently, though it usually takes 5 minutes to happen. It's not easy to replicate, but it is consistent.

Assignee: kwright → bwerth

Here's one of my crash reports, generated by force-quitting the application once the hang had occurred. https://crash-stats.mozilla.org/report/index/6bb3fa75-f252-4a4d-bff1-dce380231129

Just reproduced this again, this time while watching a Twitch video during background compilation.

https://crash-stats.mozilla.org/report/index/dcfae4fa-5cbd-4b12-ba85-677420240112

I don't think I'm equipped to solve this. Taking myself off the Bug.

Assignee: bwerth → nobody
Attachment #9366929 - Attachment is obsolete: true
See Also: → 1872850
See Also: → 1876306

I had a similar hang today. I switched to a phabricator tab that I had, and the content process was unresponsive for a long time. I captured a profile and it was spending the whole time in __bsdthread_ctl that's inside HangMonitorChild::RecvSetMainThreadQoSPriority: https://share.firefox.dev/4azCaTY

Nazim, do you remember if you dragged this phabricator tab into a different window? I just encountered a frozen foreground tab after I had dragged a tab into a different window and I wonder if that's just a code path where we're not sending the "force qos change" signal.

Hmm, good question. I don't remember doing it but I might have mistakenly dragged it a bit while trying to select it. But it should still be in the same window after the attempt as I mostly use a single window.

Blocks: 1895985