Closed Bug 1757324 Opened 3 years ago Closed 3 years ago

Regular app-wide hangs on macOS

Categories

(Core :: Graphics: WebRender, defect)

Firefox 97
defect

RESOLVED INACTIVE

People

(Reporter: evanw, Unassigned, NeedInfo)

References

Details

(Whiteboard: [mac:hang])

Attachments

(1 file)

Attached file firefox-hang.txt

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:97.0) Gecko/20100101 Firefox/97.0

Steps to reproduce:

It's unclear exactly what triggers this. I'm just using the browser normally with a few tabs open. This particular case had two tabs, one of which uses some basic WebGL. Firefox always starts hanging when it's in the background. I switch to it later and find it hung.

Actual results:

Ever since Firefox 97, the whole app (including chrome) locks up and hangs around 30-90 minutes in. CPU usage is minimal when this happens. I get a beach ball and have to force quit the app. Apple's ignore/report dialog says the cause is "hang" so maybe there's some kind of deadlock? I'm attaching the report in the hope that it's useful.

I'd love to be able to revert to an earlier version of Firefox but I tried and it doesn't appear to be possible (I'd have to make a new profile and lose everything).

Expected results:

The browser shouldn't hang. I'm going to have to stop using Firefox at some point if this isn't fixed.

Actually never mind about hanging when in the background. I just had it start hanging when in the foreground too.

The Bugbug bot thinks this bug should belong to the 'Core::Canvas: WebGL' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Canvas: WebGL
Product: Firefox → Core

The severity field is not set for this bug.
:jgilbert, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jgilbert)
Depends on: gfx-triage
Flags: needinfo?(jgilbert)

This sounds more like webrender than webgl. @gw?

Flags: needinfo?(gwatson)

All the WR scene / backend / renderer / worker threads look like they are idle, waiting for work to do. From the trace information, it's not clear to me that any of them are stuck inside a loop or anything like that.

The heaviest stack in the trace includes CA::Transaction::commit(), but it looks like that might be coming from an event handler rather than from something explicitly called by the WR compositor code.

I'm not a mac expert at all, but it seems like this might be a more general mac event handling issue? Or maybe it could be something to do with how we alloc / free CALayers in the native compositor integration?

Markus, any ideas what this might be related to?

Flags: needinfo?(gwatson) → needinfo?(mstange.moz)

The main thread is inside a CoreAnimation transaction, and is blocked waiting on a lock. It is probably waiting for the WebRender "Renderer" thread to render something.
These main thread CA commits happen when the window is focused / unfocused or resized.

When we trigger these synchronous renders, we always send a new display list to WebRender, so WebRender should definitely have something it can render. It shouldn't just hang around and wait.
The main thread blocks in SendFlushRendering(), waiting for the compositor thread, which in turn blocks in WebRenderAPI::WaitFlushed, waiting for the renderer thread. At least that's the common case. It's not entirely clear whether this is exactly what's happening on Evan's machine: the Activity Monitor profile doesn't contain full stacks, because Firefox Release before 99 wasn't built with frame pointers. (I fixed this recently in bug 1451902.)
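To make the chain of waits described above concrete, here is a toy Python model of it. This is only a sketch and not Gecko or WebRender code: the function name `run_flush_chain`, the two events, and the timeouts are all hypothetical, and the timeouts exist only so the example terminates (the real calls block without one). It shows that if the "renderer" never signals, the wait propagates all the way back to the "main thread".

```python
import threading


def run_flush_chain(renderer_responds: bool, timeout: float = 0.2) -> str:
    """Toy model of the main -> compositor -> renderer synchronous waits.

    Returns "flushed" if the main thread was unblocked, or "hung" if its
    wait timed out. All names here are illustrative, not Gecko APIs.
    """
    frame_rendered = threading.Event()  # set by the "renderer" thread
    flush_done = threading.Event()      # set by the "compositor" thread

    def renderer() -> None:
        if renderer_responds:
            frame_rendered.set()
        # Otherwise the renderer sits idle and never signals anyone.

    def compositor() -> None:
        # Stand-in for WebRenderAPI::WaitFlushed: block on the renderer.
        if frame_rendered.wait(timeout):
            flush_done.set()

    threads = [
        threading.Thread(target=renderer),
        threading.Thread(target=compositor),
    ]
    for t in threads:
        t.start()

    # Stand-in for SendFlushRendering() on the main thread. The timeout is
    # an assumption of this sketch so that it can report a hang and exit.
    unblocked = flush_done.wait(timeout * 2)

    for t in threads:
        t.join()
    return "flushed" if unblocked else "hung"


if __name__ == "__main__":
    print(run_flush_chain(renderer_responds=True))
    print(run_flush_chain(renderer_responds=False))
```

In the second call every thread is idle and waiting, which is consistent with a trace that shows minimal CPU usage and no thread stuck in a loop.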

I haven't heard of this problem before. I'd love to debug it, but it sounds quite hard to reproduce :(

Flags: needinfo?(mstange.moz)

Evan, when you encounter this again, could you kill the Firefox process using kill -SIGABRT 12345 (replacing 12345 with the process ID of the hung Firefox process)? This should cause the Firefox crash reporter to come up. After submitting the report, you can find it on about:crashes and then post the link here. This will hopefully give us the full stacks of all threads.
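For anyone who prefers to script this, the following is a minimal sketch of the same step in Python. The helper names `find_pids` and `send_abort` are hypothetical, and the sketch assumes that `pgrep` is available and that the main process is named `firefox`; check the PID in Activity Monitor if that assumption does not hold on your system.

```python
import os
import signal
import subprocess


def find_pids(name: str = "firefox") -> list[int]:
    """Return PIDs whose process name exactly matches `name`.

    Assumes a POSIX system with `pgrep` installed. The default name
    "firefox" is an assumption about how the main process is named.
    """
    result = subprocess.run(
        ["pgrep", "-x", name], capture_output=True, text=True, check=False
    )
    # pgrep exits non-zero and prints nothing when no process matches.
    return [int(line) for line in result.stdout.split()]


def send_abort(pid: int) -> None:
    """Equivalent of `kill -SIGABRT <pid>` for a single process."""
    os.kill(pid, signal.SIGABRT)


if __name__ == "__main__":
    pids = find_pids()
    if not pids:
        print("No matching process found")
    else:
        # The lowest PID is only a guess at the parent process; verify it
        # before sending the signal.
        print(f"Candidate PIDs: {pids}")
        send_abort(min(pids))
```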

Flags: needinfo?(evan.exe)
Status: UNCONFIRMED → NEW
Component: Canvas: WebGL → Graphics: WebRender
Ever confirmed: true
Whiteboard: [mac:hang]
Blocks: gfx-triage
No longer depends on: gfx-triage
Severity: -- → S4
Priority: -- → P3
Severity: S4 → --
Priority: P3 → --
Severity: -- → S2
See Also: → 1760382

Evan, have you seen this again recently?

Without more info, we can't make progress here.

We suspect this might be related to bug 1670885, but there we generally expect to see a hang only while the system is under load, and Firefox should unhang once the load goes away and CPU scheduling frees up.
I also think we have primarily been seeing that on M1 Macs, while this report appears to be for an Intel Mac.
But I'll link to that bug just in case.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INACTIVE
See Also: → 1670885
No longer blocks: gfx-triage