Crash in [@ gfxPlatform::FallbackFromAcceleration]
Categories (Core :: Graphics: WebRender, defect)
People (Reporter: gsvelto, Assigned: jnicol)
Details (Keywords: crash)
Attachments (4 files)
Crash report: https://crash-stats.mozilla.org/report/index/f3ec70ee-db85-483f-b60e-dd7550230323
MOZ_CRASH Reason: MOZ_CRASH(Fallback configurations exhausted)
Top 10 frames of crashing thread:
0 libxul.so gfxPlatform::FallbackFromAcceleration gfx/thebes/gfxPlatform.cpp:3710
1 libxul.so mozilla::gfx::GPUProcessManager::FallbackFromAcceleration gfx/ipc/GPUProcessManager.cpp
2 libxul.so mozilla::gfx::GPUProcessManager::DisableWebRenderConfig gfx/ipc/GPUProcessManager.cpp:616
2 libxul.so mozilla::gfx::GPUProcessManager::DisableWebRender gfx/ipc/GPUProcessManager.cpp:634
3 libxul.so mozilla::gfx::GPUProcessManager::NotifyWebRenderError gfx/ipc/GPUProcessManager.cpp:654
4 libxul.so mozilla::layers::CompositorManagerChild::RecvNotifyWebRenderError gfx/layers/ipc/CompositorManagerChild.cpp:257
5 libxul.so mozilla::layers::PCompositorManagerChild::OnMessageReceived ipc/ipdl/PCompositorManagerChild.cpp:580
6 libxul.so mozilla::ipc::MessageChannel::DispatchAsyncMessage ipc/glue/MessageChannel.cpp:1800
6 libxul.so mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:1725
6 libxul.so mozilla::ipc::MessageChannel::RunMessage ipc/glue/MessageChannel.cpp:1525
I see there have already been two bugs about this, but it still seems to be a problem, at least on Android. The vast majority of crashes in recent versions are from users with Mali-G71 GPUs on Samsung devices. The adapter driver version is always OpenGL ES 3.2 v1.r16p0-01rel0.###other-sha0123456789ABCDEF0### for the affected devices.
Comment 1•1 year ago (Assignee)
See bug 1796947 too.
I can see in the graphics critical error annotation in the crash reports that we have 0 active renderers at the time of the crash, so it is not because we are already attached to the Surface. The Surface we are being given by the OS must be in a broken state for some reason. In bug 1772839 we found this was because the "BufferQueue has been abandoned". We added code to detect this and attempt to work around it in some cases (and to explicitly crash when that fails; see bug 1801524).
The remaining crashes must be due to the Surface being in a different broken state, which we are unable to detect using the same trick. I will look to see if there are any other ways of detecting such a state. I'm unsure whether this is due to OS bugs, or something Fenix / geckoview is doing wrong.
Comment 2•1 year ago (Assignee)
Previous attempts to detect the invalid Surface state prior to resuming the compositor were clearly inadequate. Instead we should just try to resume the compositor, and if that fails then we can try to recover (e.g. toggle the SurfaceView's visibility to force the system to give us a new surface). If that still fails then we can crash as we currently do.
This could potentially have been Chromium experiencing the same bug: https://chromium-review.googlesource.com/c/chromium/src/+/1244106
It seems like they do roughly what I'm envisioning.
Comment 3•1 year ago (Assignee)
Not related to this bug, but it's a good opportunity for a small tidy up.
Comment 4•1 year ago (Assignee)
The detection is inadequate and the workaround does not work on all
versions of Android. Later patches in this series will replace this
with something better.
Depends on D176718
Comment 5•1 year ago (Assignee)
Adds a new interface, GeckoDisplay.NewSurfaceProvider, of which an
implementation can be provided in the GeckoDisplay.SurfaceInfo passed
to GeckoDisplay.surfaceChanged(). We include an implementation in
the GeckoView class, which works by toggling the SurfaceView's
visibility, causing a new surfaceChanged callback to be fired.
Depends on D176719
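As a rough illustration of the mechanism described in this commit message, the following is a minimal model in plain Python, not the actual GeckoView Java code. The names `request_new_surface`, `FakeSurfaceView`, and `ToggleVisibilityProvider` are hypothetical, and `SurfaceInfo` here is only a stand-in for `GeckoDisplay.SurfaceInfo`; the sketch assumes that hiding and re-showing the view causes the system to hand out a fresh surface.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class NewSurfaceProvider(Protocol):
    """Hypothetical stand-in for GeckoDisplay.NewSurfaceProvider."""

    def request_new_surface(self) -> None: ...


@dataclass
class SurfaceInfo:
    """Hypothetical stand-in for GeckoDisplay.SurfaceInfo."""

    surface_id: int
    new_surface_provider: Optional[NewSurfaceProvider] = None


class FakeSurfaceView:
    """Models a SurfaceView: each time it becomes visible, a new surface is reported."""

    def __init__(self, on_surface_changed: Callable[[SurfaceInfo], None]):
        self._on_surface_changed = on_surface_changed
        self._next_id = 0
        self.visible = False

    def set_visible(self, visible: bool) -> None:
        if visible == self.visible:
            return
        self.visible = visible
        if visible:
            # Assumption: becoming visible yields a brand-new surface.
            self._next_id += 1
            info = SurfaceInfo(self._next_id, ToggleVisibilityProvider(self))
            self._on_surface_changed(info)


class ToggleVisibilityProvider:
    """Requests a new surface by toggling the view's visibility."""

    def __init__(self, view: FakeSurfaceView):
        self._view = view

    def request_new_surface(self) -> None:
        self._view.set_visible(False)
        self._view.set_visible(True)


received = []
view = FakeSurfaceView(received.append)
view.set_visible(True)  # first surfaceChanged-style callback
received[-1].new_surface_provider.request_new_surface()  # forces a second one
```

In this model the consumer of `SurfaceInfo` never needs to know how the new surface is obtained; it only calls the provider it was handed, which mirrors the separation between Gecko and the embedding application described above.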
Comment 6•1 year ago (Assignee)
We see a fair number of crashes caused by failing to create an EGL
surface when resuming the compositor on Android. We believe that in
the vast majority of these cases the Surface we have been provided by
the OS is in an invalid state, and therefore we will never succeed in
creating an EGL surface from it.
Currently when creating the EGL surface fails we raise a NEW_SURFACE
webrender error. This causes us to fall back through webrender
configurations, reinitialize the compositors, and eventually crash
when we are still unable to resume. None of this will help when the
Android Surface we have been provided is in this invalid state.
This patch therefore avoids raising the webrender error initially, and
instead gives the widget an opportunity to handle the failure. The
widget uses the new GeckoView API added in the previous patch in this
series to request a new Surface from the application. This will cause
another resume event immediately afterwards with a new - and hopefully
valid - surface, allowing the EGL surface to be created and the
compositor to be successfully resumed. If we are still unable to
create an EGL surface after this, then we will raise the webrender
error as before, likely eventually resulting in a crash.
Depends on D176720
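The control flow described in this commit message can be sketched as follows. This is a simplified Python model under stated assumptions, not the actual patch: a surface is represented by a boolean saying whether an EGL surface can be created from it, `Compositor`, `WebRenderError`, and `simulate` are hypothetical names, and the follow-up resume is invoked synchronously here whereas in the real system it arrives as a separate event.

```python
class WebRenderError(Exception):
    """Stands in for raising the NEW_SURFACE webrender error."""


class Compositor:
    def __init__(self, create_egl_surface, request_new_surface):
        self._create_egl_surface = create_egl_surface  # returns True on success
        self._request_new_surface = request_new_surface
        self._already_requested = False
        self.resumed = False

    def resume(self, surface):
        if self._create_egl_surface(surface):
            self._already_requested = False
            self.resumed = True
            return
        if not self._already_requested:
            # First failure: let the widget ask the application for a new Surface.
            self._already_requested = True
            self._request_new_surface()
            return
        # Second failure in a row: fall back to the old behaviour.
        raise WebRenderError("NEW_SURFACE")


def simulate(valid_surfaces):
    """valid_surfaces: validity (bool) of each successive Surface from the OS."""
    surfaces = iter(valid_surfaces)
    log = []

    def request_new_surface():
        log.append("request_new_surface")
        # Assumption: the new Surface triggers another resume immediately.
        compositor.resume(next(surfaces))

    compositor = Compositor(lambda surface: surface, request_new_surface)
    try:
        compositor.resume(next(surfaces))
        log.append("resumed" if compositor.resumed else "not resumed")
    except WebRenderError:
        log.append("webrender_error")
    return log
```

Calling `simulate([False, True])` models the hoped-for case, where the first Surface is invalid and the replacement works, while `simulate([False, False])` models the case where the error is raised as before.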
Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/57ae9c87c467
Remove unused member and unnecessary ifdef. r=gfx-reviewers,ErichDonGubler
https://hg.mozilla.org/integration/autoland/rev/6fad491b84be
Remove previous attempts to detect and work around invalid Surface bug. r=gfx-reviewers,geckoview-reviewers,nical,m_kato
https://hg.mozilla.org/integration/autoland/rev/bf5c5929ef7b
Add API to allow Gecko to request a new Surface from the application. r=geckoview-reviewers,m_kato
https://hg.mozilla.org/integration/autoland/rev/7c1be037e345
Request new Surface from application when resuming compositor fails on Android. r=gfx-reviewers,geckoview-reviewers,nical,m_kato
Comment 8•1 year ago
Backed out for causing build bustages in include/mozilla/webrender/RenderCompositorOGLSWGL.h:
Backout link: https://hg.mozilla.org/integration/autoland/rev/5a4114777f84a2429c7f6afcb25f2d5fc8b38525
Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5e44a52f2989
Remove unused member and unnecessary ifdef. r=gfx-reviewers,ErichDonGubler
https://hg.mozilla.org/integration/autoland/rev/50b28cf6d6ca
Remove previous attempts to detect and work around invalid Surface bug. r=gfx-reviewers,geckoview-reviewers,nical,m_kato
https://hg.mozilla.org/integration/autoland/rev/e26fa9885b0e
Add API to allow Gecko to request a new Surface from the application. r=geckoview-reviewers,m_kato
https://hg.mozilla.org/integration/autoland/rev/feb23257a892
Request new Surface from application when resuming compositor fails on Android. r=gfx-reviewers,geckoview-reviewers,nical,m_kato
Comment 10•1 year ago (bugherder)
https://hg.mozilla.org/mozilla-central/rev/5e44a52f2989
https://hg.mozilla.org/mozilla-central/rev/50b28cf6d6ca
https://hg.mozilla.org/mozilla-central/rev/e26fa9885b0e
https://hg.mozilla.org/mozilla-central/rev/feb23257a892
Comment 11•11 months ago
The patch landed in nightly and beta is affected.
:jnicol, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox114 to wontfix.
For more information, please visit BugBot documentation.
Comment 12•11 months ago (Assignee)
These crashes have been around for a while, and the patch is not without risk, so I think it is better to let it bake on Nightly for a cycle.