Crash in [@ gfxPlatform::FallbackFromAcceleration]
Categories
(Core :: Graphics: WebRender, defect)
Tracking
| Release | Tracking | Status |
|---|---|---|
| firefox-esr91 | --- | unaffected |
| firefox100 | --- | unaffected |
| firefox101 | + | wontfix |
| firefox102 | + | fixed |
People
(Reporter: amejia, Assigned: aosmond)
References
(Blocks 1 open bug)
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
Crash report: https://crash-stats.mozilla.org/report/index/564de69b-5ad6-40e7-8a5b-9179f0220506
MOZ_CRASH Reason: MOZ_CRASH(Fallback configurations exhausted)
Top 10 frames of crashing thread:
0 libxul.so gfxPlatform::FallbackFromAcceleration gfx/thebes/gfxPlatform.cpp:3420
1 libxul.so mozilla::gfx::GPUProcessManager::DisableWebRender gfx/ipc/GPUProcessManager.cpp:578
2 libxul.so mozilla::gfx::GPUProcessManager::NotifyWebRenderError gfx/ipc/GPUProcessManager.cpp:597
3 libxul.so mozilla::layers::CompositorManagerChild::RecvNotifyWebRenderError gfx/layers/ipc/CompositorManagerChild.cpp:257
4 libxul.so mozilla::layers::PCompositorManagerChild::OnMessageReceived ipc/ipdl/PCompositorManagerChild.cpp:567
5 libxul.so mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1535
6 libxul.so NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:465
7 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:85
8 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:355
9 libxul.so nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137
Comment 1 • 3 years ago
This is showing up in the Fenix Beta topcrash list. Can you please take a look, Andrew?
Comment 2 • 3 years ago
Sorry, this is probably more of a jnicol question.
Comment 3 • 3 years ago
Just a note: Jamie is on PTO until 25 May.
Comment 4 • 3 years ago
That'll be the middle of RC week. Is there someone else who can look into this in the meantime? Though now that I think of it, I wonder if this will go away with the GPU process being disabled by bug 1768674. We'll be shipping beta.4 tomorrow with that change, so maybe we can revisit on Monday once we've seen some incoming crash data for that release.
Comment 5 • 3 years ago
I'll check my team, but Jamie may have taken the bulk of the mobile expertise with him.
Comment 6 (Assignee) • 3 years ago
I will look.
Comment 7 (Assignee) • 3 years ago
So we only trigger this crash if a WebRenderError::NEW_SURFACE failure is issued after we have already tried full HW WebRender with EGL and then partial HW WebRender with SWGL drawing and GL compositing. This can happen when we fail to create EGL surfaces on Android with both the EGL and EGL+SWGL backends.
I haven't found any evidence that this crash happens much without the GPU process (GPUProcessStatus is always Running in CrashAnnotations), so now that we've disabled it, I expect the volume to go down. I could be wrong about that.
Assuming it is tied to the GPU process:
- Why does it happen in the first place?
- If it is intrinsic to the platform/device, should we try to fall back once more by disabling the GPU process? (See the sketch below for where such a step would slot into the cascade.)
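To make the cascade above concrete, here is a minimal C++ sketch. It is not the actual Gecko code: the enum, helper names, and structure are illustrative simplifications of what gfxPlatform::FallbackFromAcceleration does, and the GPU-process step is only the hypothetical raised in the second bullet.

```cpp
#include "mozilla/Assertions.h"  // for MOZ_CRASH

// Illustrative only: the real states and transitions live in
// gfxPlatform::FallbackFromAcceleration (gfx/thebes/gfxPlatform.cpp).
enum class AccelConfig {
  HardwareWebRender,    // full HW WebRender with EGL
  SoftwareWebRenderGL,  // SWGL drawing with GL compositing
  Exhausted,
};

static AccelConfig NextFallback(AccelConfig aCurrent) {
  switch (aCurrent) {
    case AccelConfig::HardwareWebRender:
      return AccelConfig::SoftwareWebRenderGL;
    case AccelConfig::SoftwareWebRenderGL:
      // Hypothetical extra step from the bullet above: disabling the
      // GPU process could be attempted here before giving up entirely.
      return AccelConfig::Exhausted;
    default:
      return AccelConfig::Exhausted;
  }
}

// Called when a WebRenderError::NEW_SURFACE failure arrives.
static void FallbackFromAccelerationSketch(AccelConfig& aConfig) {
  aConfig = NextFallback(aConfig);
  if (aConfig == AccelConfig::Exhausted) {
    MOZ_CRASH("Fallback configurations exhausted");  // the crash in this report
  }
}
```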
Comment 9 (Assignee) • 3 years ago
Thinking on it further, the uptime suggests that a user can encounter the NEW_SURFACE error during regular use; the original report, for example, came after 44 minutes. Crashing the parent process is problematic in that case: the user might be in a generally stable state, yet cannot continue with the current GL context situation. We should consider either tearing down the GPU process and restarting it, or tearing down the compositor sessions and treating it like a device reset.
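For concreteness, a hedged sketch of the decision this proposal implies. None of these types or helpers exist in Gecko; RestartGPUProcess and SimulateDeviceReset stand in for "tear down and restart the GPU process" and "treat it like a device reset", and the retry limit of 3 is an arbitrary placeholder, not a Gecko constant.

```cpp
#include <cstdio>

// Hypothetical recovery options for a NEW_SURFACE error that arrives
// during regular use; only CrashParentProcess reflects current behavior.
enum class Recovery {
  RestartGPUProcess,    // tear down the GPU process and relaunch it
  SimulateDeviceReset,  // tear down and rebuild the compositor sessions
  CrashParentProcess,   // today's behavior: MOZ_CRASH the parent
};

// Prefer the least destructive option; only crash if repeated attempts
// suggest the GL context situation genuinely cannot recover.
static Recovery ChooseRecovery(bool aGpuProcessEnabled, int aPriorAttempts) {
  if (aPriorAttempts >= 3) {
    return Recovery::CrashParentProcess;
  }
  return aGpuProcessEnabled ? Recovery::RestartGPUProcess
                            : Recovery::SimulateDeviceReset;
}

int main() {
  // First NEW_SURFACE error with the GPU process running: restart it
  // rather than take down the whole browser.
  std::printf("%d\n", static_cast<int>(ChooseRecovery(true, 0)));
  return 0;
}
```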
Comment 10 • 3 years ago
Still seeing crashes with 101.0.0-beta.4 :(
Comment 14 • 3 years ago
The patch landed in nightly and beta is affected.
:aosmond, is this bug important enough to require an uplift?
If not, please set status_beta to wontfix.
For more information, please visit the auto_nag documentation.
Comment 15 • 3 years ago
It's not clear to me that the patch has had any significant effect on the crash volume.
Comment 16 • 3 years ago
Jamie will be returning from PTO soon (in a little more than 12 hours), and I will have him focus on this immediately.
Comment 17 • 3 years ago
Thanks for picking this up in my absence, Andrew. I think the landed patch makes a lot of sense, as an OOM could certainly cause this to occur in theory.
I agree it doesn't appear to have had a significant effect on the crash volume. Part of this is because I landed bug 1768925 just before I left, to prevent users from running into bug 1767128. The effect is that whenever the GPU process crashes on Android 12 (and it is still enabled on nightly), we run into this issue. If we break down the crash stats by Android version, we can see that the crash rate spikes on Android 12 (SDK level 31) following this, as expected.
Also, looking at the breakdown by Android version, SDK level 28 (Android 9) is significantly higher than the others. I'm guessing we therefore have a bona fide issue on Android 9 causing us to run into this bug. If we ignore the Android 12 and Android 9 crashes, I think the crash rate is acceptably low (and Andrew's patch hopefully makes it even lower).
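For readers cross-checking crash-stats, a tiny reference sketch mapping the SDK levels mentioned above to Android versions; only the two levels relevant to this bug are mapped, everything else is left unknown.

```cpp
#include <string_view>

// Socorro reports Android API ("SDK") levels; this maps the two levels
// discussed in this bug back to their marketing versions.
constexpr std::string_view AndroidVersionForSdk(int aSdkLevel) {
  switch (aSdkLevel) {
    case 28:
      return "Android 9";   // the suspected bona fide NEW_SURFACE issue
    case 31:
      return "Android 12";  // spikes after bug 1768925 landed
    default:
      return "unknown";
  }
}

static_assert(AndroidVersionForSdk(31) == "Android 12");
static_assert(AndroidVersionForSdk(28) == "Android 9");
```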
In bug 1767128 we will solve the Android 12 issue. We can reopen this bug or file a new one to investigate the Android 9 issue. In the interim, I think we still want to force this crash: the browser would be in an unusable state otherwise, and the crash helps us detect potential issues like the suspected Android 9 one.
Comment 18 • 3 years ago
Do you think this patch is worth taking on 101 by itself?
Comment 19 • 3 years ago
I'm inclined to say no. Android 9 accounts for 78% of the crashes on Beta, and I suspect this patch will not help those. The GPU process / WR fallback logic is complex, and I don't think the risks of changing it are worth slightly reducing the remaining 22%.
Comment 20 • 3 years ago
OK, let's move the follow-up work to a new bug for better tracking.
Comment 21 • 3 years ago
This is currently the #3 Fenix 101 topcrash since release. I think we need to spend more time investigating here :(