Mac GPU process: Crash in [@ IPCError-browser | GPUProcessKill]
Categories: Core :: Graphics, defect, P2
People: Reporter: aleiserson, Assigned: bradwerth
Keywords: topcrash, topcrash-startup
It is tricky to capture the scope of this issue because GPUProcessKill is a common crash signature with multiple causes. Crash Stats shows 113 GPUProcessKill crashes on macOS in the past week (https://crash-stats.mozilla.org/search/?product=Firefox&platform=Mac%20OS%20X&process_type=gpu&date=>%3D2025-10-20T16%3A35%3A00.000Z&date=<2025-10-27T16%3A35%3A00.000Z&_facets=signature). Note that the big uptick in the crash data graph is on Android and is associated with bug 1900134 and bug 1908798.
Looking at a sampling of the reports (and clicking "Show other threads" to see all the threads), the Renderer thread appears to be actively working in most of them, with no consistency in exactly what it is doing. However, I did find one report where that is not the case: https://crash-stats.mozilla.org/report/index/a868da07-ebd8-4ce9-9674-f693c0251022
Comment 1 (Assignee) • 1 month ago
If I understand correctly, "GPUProcessKill" is only emitted when a minidump is generated. As best I can tell, there are only two callsites that request a minidump while killing the GPU process:
- CompositorManagerChild::ShouldContinueFromReplyTimeout()
- UiCompositorControllerChild::SetReplyTimeout()
I'll check the crash reports and see if there's some correlation to these callsites.
Comment 2 (Reporter) • 1 month ago
Here is another view that may be useful: https://crash-stats.mozilla.org/search/?signature=%3DIPCError-browser%20%7C%20GPUProcessKill&product=Firefox&platform=Mac%20OS%20X&process_type=gpu&date=%3E%3D2025-10-20T16%3A35%3A00.000Z&date=%3C2025-10-27T19%3A07%3A00.000Z&_facets=signature&_sort=-date&_columns=date&_columns=version&_columns=build_id&_columns=graphics_critical_error#crash-reports
The common thread seems to be "Killing GPU process due to IPC reply timeout". (I noted this earlier but then decided that maybe this was inherently associated with GPUProcessKill and didn't mention it in the description.) This is why I was trying to figure out where things got stuck and noted that the renderer thread seemed to still be working.
I don't know what the timeout is or what the considerations are in setting it. Possibly it could be adjusted? I figured that someone more knowledgeable about graphics crashes than I am might have better intuition about what could be going on here and what strategies make sense to narrow down the problem.
Comment 3 (Assignee) • 1 month ago
I see. I wonder if actor destruction can lead to an IPC timeout somehow. For example: one side of the bridge is alive and CanSend() is true, a sync message(?) is sent, and then the receiving side dies without processing it. Actually, that seems like the sort of thing we would be encountering broadly if it were possible. I'll see if I can come up with a better theory.
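A toy model may help frame this theory. The sketch below uses std::promise/std::future as a stand-in for a sync IPC reply; it is not Firefox IPC code, and WaitForReply, RunScenario, ReplyOutcome and ReceiverBehavior are hypothetical names. It assumes that a peer which dies while holding an unanswered request is reported to the sender right away (modelled here as std::future_error with broken_promise), so only a peer that stays alive but busy past the deadline yields a timeout. Whether real IPC channels behave that way is exactly the open question.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Toy model only: std::promise/std::future stand in for a sync IPC reply.
// None of these names are Firefox IPC APIs.
enum class ReplyOutcome { Replied, ReceiverGone, TimedOut };
enum class ReceiverBehavior { Reply, DieWithoutReply, Stall };

// Sender side: wait up to `timeout` for the reply. In this model, only a
// receiver that is still alive but not answering produces TimedOut.
ReplyOutcome WaitForReply(std::future<int>& reply,
                          std::chrono::milliseconds timeout) {
  if (reply.wait_for(timeout) == std::future_status::timeout) {
    return ReplyOutcome::TimedOut;  // analogous to an "IPC reply timeout"
  }
  try {
    reply.get();
    return ReplyOutcome::Replied;
  } catch (const std::future_error&) {
    // The promise was destroyed without a value (broken_promise), so the
    // sender learns immediately that the receiver is gone.
    return ReplyOutcome::ReceiverGone;
  }
}

// Runs one scenario with a receiver thread that behaves as requested.
ReplyOutcome RunScenario(ReceiverBehavior behavior,
                         std::chrono::milliseconds timeout) {
  std::promise<int> promise;
  std::future<int> reply = promise.get_future();

  std::thread receiver([p = std::move(promise), behavior, timeout]() mutable {
    switch (behavior) {
      case ReceiverBehavior::Reply:
        p.set_value(1);
        break;
      case ReceiverBehavior::DieWithoutReply: {
        // Dropping the promise unfulfilled marks the future broken_promise.
        std::promise<int> dropped = std::move(p);
        break;
      }
      case ReceiverBehavior::Stall:
        // Busy for longer than the timeout, then replies too late.
        std::this_thread::sleep_for(timeout * 4);
        p.set_value(0);
        break;
    }
  });

  ReplyOutcome outcome = WaitForReply(reply, timeout);
  receiver.join();
  return outcome;
}
```

If that assumption holds, a reply timeout points at a receiver that is alive but busy rather than destroyed, which would fit the reports where the Renderer thread still appears to be working.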
Comment 4 (Assignee) • 1 month ago
See also bug 1900134, the Android version of this crash.
Comment 5 • 1 month ago
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 10 desktop browser crashes on nightly (startup)
- Top 5 GPU process crashes on release (startup)
- Top 10 AArch64 and ARM crashes on nightly
- Top 10 AArch64 and ARM crashes on beta
- Top 10 AArch64 and ARM crashes on release
:bhood, could you consider increasing the severity of this top-crash bug?
For more information, please visit BugBot documentation.
Comment 6 (Reporter) • 1 month ago
Re: bugbot topcrash notice, this is a generic crash signature that is almost certainly being produced for multiple reasons. Most of the recent crash volume is on Android, so bug 1900134 or bug 1908798 should be the topcrash bug, if any.
Comment 7 • 1 month ago
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 10 desktop browser crashes on nightly (startup)
- Top 5 GPU process crashes on release (startup)
- Top 10 AArch64 and ARM crashes on nightly
- Top 10 AArch64 and ARM crashes on beta
- Top 10 AArch64 and ARM crashes on release
For more information, please visit BugBot documentation.
Comment 8 (Assignee) • 19 days ago
(In reply to Brad Werth [:bradwerth] from comment #3)
> I see. I wonder if actor destruction can lead to IPC timeout somehow. If one side of the bridge is alive and CanSend() is true, and then a sync message(?) is sent and the receiving side dies without processing it. Actually, that seems like the sort of thing we would be encountering broadly if it was possible. I'll see if I can come up with a better theory.
I will take this bug and try to make this crash signature less noisy, maybe by pursuing my theory above.