Closed Bug 1798099 Opened 2 years ago Closed 2 years ago

Hung Firefox starting in 106 (main thread APZ waiting on GPU process)

Categories

(Core :: Graphics, defect, P1)

RESOLVED FIXED
Tracking Status
firefox106 + fixed
firefox107 + fixed
firefox108 + fixed

People

(Reporter: jrmuizel, Assigned: jrmuizel)

There are a number of reports of Firefox hanging, often associated with media playback:
https://www.reddit.com/r/firefox/comments/yfkplj/pages_with_media_freeze_firefox_after_update_to/

https://www.reddit.com/r/firefox/comments/yefgrr/still_experiencing_hanging_with_ff_10602/

and associated: https://crash-stats.mozilla.org/report/index/3b3df888-5d5f-479c-96c2-fce6d0221028

We don't have much information about what's going on yet, but all the reports I've seen have been on Intel Xe GPUs.

Severity: -- → S1
Priority: -- → P1
Summary: Hung Firefox starting in 106 → Hung Firefox starting in 106 (main thread APZ waiting on GPU process)

Similar reports in bug 1791938.

See Also: → 1791938
See Also: → 1798050

(In reply to Jeff Muizelaar [:jrmuizel] from comment #0)

We don't have much information about what's going on yet, but all the reports I've seen have been on Intel Xe GPUs.

As per my bug report (1798050), the bug also happens on integrated AMD graphics.

Another crash report showing us getting stuck waiting for the GPU process:
https://crash-stats.mozilla.org/report/index/500f0fee-442d-49eb-b98d-fb5400221029#allthreads

Can the tool https://github.com/b0bh00d/crash-firefox be made to crash the GPU process? If so, would it make sense to ask folks experiencing this issue on Reddit to crash the GPU process instead so we can see what it's doing at the time of the hang?

(In reply to Botond Ballo [:botond] from comment #5)

Can the tool https://github.com/b0bh00d/crash-firefox be made to crash the GPU process? If so, would it make sense to ask folks experiencing this issue on Reddit to crash the GPU process instead so we can see what it's doing at the time of the hang?

I think so (I haven't tried it) but (1) this would require some way to tell which PID is the GPU process while Firefox is hung and (2) I haven't run into this in just under 3 weeks.
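For point (1), one possible approach is to look at the command lines of the running Firefox processes, which an external process viewer can still show while the browser itself is hung. The sketch below is only an illustration: it assumes that child processes are started with `-contentproc` and that the process type (`gpu`) is the last argument, which should be verified against the build in question. The function name `find_gpu_pids` and the sample process list are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def find_gpu_pids(processes: Iterable[Tuple[int, Sequence[str]]]) -> List[int]:
    """Return PIDs whose argv looks like a Firefox GPU child process.

    ASSUMPTION: child processes carry "-contentproc" in their arguments and
    end with the process type (for example "tab" or "gpu"). Verify this on
    the Firefox build you are debugging before relying on it.
    """
    gpu_pids = []
    for pid, argv in processes:
        if not argv:
            continue  # no command line available for this process
        is_child = "-contentproc" in argv
        if is_child and argv[-1].lower() == "gpu":
            gpu_pids.append(pid)
    return gpu_pids


# Hypothetical snapshot of (pid, argv) pairs, e.g. copied from a process viewer.
sample = [
    (1000, ["firefox.exe"]),
    (1001, ["firefox.exe", "-contentproc", "--channel=...", "tab"]),
    (1002, ["firefox.exe", "-contentproc", "--channel=...", "gpu"]),
]

print(find_gpu_pids(sample))  # [1002]
```

The resulting PID could then be handed to whatever tool is used to force the crash, assuming that tool accepts a PID.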

Also (as the reporter of bug 1791938), this only occurs very intermittently for me, indicating maybe not the same cause as bug 1798050. (It is still possible they're related.)

The bug is marked as tracked for firefox106 (release), firefox107 (beta), and firefox108 (nightly). We have limited time to fix this: the soft freeze is in 10 days. However, the bug still isn't assigned.

:bhood, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit auto_nag documentation.

Flags: needinfo?(bhood)
Assignee: nobody → jmuizelaar
Flags: needinfo?(bhood)
Depends on: 1798357

This looks like a dupe of bug 1791938.

Here are two GPU process crashes:
https://crash-stats.mozilla.org/report/index/3553982f-3197-460d-bd80-bdb9e0221101
https://crash-stats.mozilla.org/report/index/eaeb8abf-7307-461a-8c9b-431650221101

Unfortunately, everything looks normal in those.

However, in https://crash-stats.mozilla.org/report/index/f398b407-4a11-4c90-83e3-54f290221102#allthreads we see that we're waiting in mozilla::gfx::DeviceManagerDx::CreateCompositorDevices().

See Also: → 1792115

Reviewing the reports/crashes, I am as certain as I can be, without reproducing it myself, that this would be fixed by bug 1792115.

Duplicate of this bug: 1791938

Here's another crash report that says the same thing: https://crash-stats.mozilla.org/report/index/4c036fd5-ce09-4fe4-b4fd-1c2440221102

I've been able to reproduce a similar hang using dxcap -forcetdr. I also confirmed that the hang does not happen in beta 107.

My history of hangs/no hangs also lines up perfectly with the timeline in bug 1792115.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #13)

https://crash-stats.mozilla.org/report/index/4c036fd5-ce09-4fe4-b4fd-1c2440221102
https://crash-stats.mozilla.org/report/index/f398b407-4a11-4c90-83e3-54f290221102

The driver in the first report is fairly old (30.0.101.1069, Nov 11, 2021), and the one in the second is from July 2022 (31.0.101.3251), so not very old. Updating to 31.0.101.3790 may not help here.

https://crash-stats.mozilla.org/report/index/3553982f-3197-460d-bd80-bdb9e0221101
https://crash-stats.mozilla.org/report/index/eaeb8abf-7307-461a-8c9b-431650221101

Both of these crashed on the latest AMD Adrenalin 22.10.3 driver from 10/28/2022, so unless that driver has a bug, an outdated driver is likely not the cause.

Looking at the telemetry more closely: in 105 we'd get about 1.5-2M device resets per day. In 106 that number has dropped to 0, presumably because we hang instead of recording the resets.

We have a point release out (106.0.4) with the fix in bug 1792115. Would appreciate it if people who run into this grab that update and confirm the issue is addressed. Thanks!

Duplicate of this bug: 1798050
Depends on: 1792115
Duplicate of this bug: 1795274
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED