Closed Bug 1824083 Opened 1 year ago Closed 1 year ago

Crash in [@ gfxPlatform::FallbackFromAcceleration]

Categories

(Core :: Graphics: WebRender, defect)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
115 Branch
Tracking Status
firefox-esr102 --- wontfix
firefox113 --- wontfix
firefox114 --- wontfix
firefox115 --- fixed

People

(Reporter: gsvelto, Assigned: jnicol)

References

Details

(Keywords: crash)

Crash Data

Attachments

(4 files)

Crash report: https://crash-stats.mozilla.org/report/index/f3ec70ee-db85-483f-b60e-dd7550230323

MOZ_CRASH Reason: MOZ_CRASH(Fallback configurations exhausted)

Top 10 frames of crashing thread:

0  libxul.so  gfxPlatform::FallbackFromAcceleration  gfx/thebes/gfxPlatform.cpp:3710
1  libxul.so  mozilla::gfx::GPUProcessManager::FallbackFromAcceleration  gfx/ipc/GPUProcessManager.cpp
2  libxul.so  mozilla::gfx::GPUProcessManager::DisableWebRenderConfig  gfx/ipc/GPUProcessManager.cpp:616
2  libxul.so  mozilla::gfx::GPUProcessManager::DisableWebRender  gfx/ipc/GPUProcessManager.cpp:634
3  libxul.so  mozilla::gfx::GPUProcessManager::NotifyWebRenderError  gfx/ipc/GPUProcessManager.cpp:654
4  libxul.so  mozilla::layers::CompositorManagerChild::RecvNotifyWebRenderError  gfx/layers/ipc/CompositorManagerChild.cpp:257
5  libxul.so  mozilla::layers::PCompositorManagerChild::OnMessageReceived  ipc/ipdl/PCompositorManagerChild.cpp:580
6  libxul.so  mozilla::ipc::MessageChannel::DispatchAsyncMessage  ipc/glue/MessageChannel.cpp:1800
6  libxul.so  mozilla::ipc::MessageChannel::DispatchMessage  ipc/glue/MessageChannel.cpp:1725
6  libxul.so  mozilla::ipc::MessageChannel::RunMessage  ipc/glue/MessageChannel.cpp:1525

I see there have already been two bugs about this, but it still seems to be a problem, at least on Android. The vast majority of crashes in recent versions are from users with Mali-G71 GPUs on Samsung devices. The adapter driver version is always OpenGL ES 3.2 v1.r16p0-01rel0.###other-sha0123456789ABCDEF0### for the affected devices.

Severity: -- → S3

See bug 1796947 too.

I can see in the graphics critical error annotation in the crash reports that we have 0 active renderers at the time of the crash, so it is not because we are already attached to the Surface. The Surface we are being given by the OS must be in a broken state for some reason. In bug 1772839 we found this was because the "BufferQueue has been abandoned". We added code to detect this and attempt to work around it in some cases (and explicitly crash when that fails - see bug 1801524).

The remaining crashes must be due to the Surface being in a different broken state, which we are unable to detect using the same trick. I will look to see if there are any other ways of detecting such a state. I'm unsure whether this is due to OS bugs, or something Fenix / geckoview is doing wrong.

Depends on: 1830026

Previous attempts to detect the invalid Surface state prior to resuming the compositor were clearly inadequate. Instead we should just try to resume the compositor, and if that fails then we can try to recover (e.g. toggle the SurfaceView's visibility to force the system to give us a new surface). And if that still fails then we can crash as we currently do.

Chromium may have been experiencing the same bug here: https://chromium-review.googlesource.com/c/chromium/src/+/1244106

Seems like they do roughly what I'm envisioning

Not related to this bug, but it's a good opportunity for a small tidy up.

Assignee: nobody → jnicol
Status: NEW → ASSIGNED

The detection is inadequate and the workaround does not work on all
versions of Android. Later patches in this series will replace this
with something better.

Depends on D176718

Adds a new interface, GeckoDisplay.NewSurfaceProvider, of which an
implementation can be provided in the GeckoDisplay.SurfaceInfo passed
to GeckoDisplay.surfaceChanged(). We include an implementation in
the GeckoView class, which works by toggling the SurfaceView's
visibility, causing a new surfaceChanged callback to be fired.
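
To illustrate the shape of this API, here is a minimal sketch that runs without the Android SDK. Only the name NewSurfaceProvider comes from the commit message above. The method name requestNewSurface, the FakeSurfaceView stand-in, and the providerFor helper are assumptions made for illustration and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class NewSurfaceProviderSketch {
    /** Stand-in for GeckoDisplay.NewSurfaceProvider; the method name is an assumption. */
    interface NewSurfaceProvider {
        void requestNewSurface();
    }

    /** Hypothetical stand-in for a SurfaceView that logs the callbacks it would fire. */
    static class FakeSurfaceView {
        final List<String> events = new ArrayList<>();
        private boolean visible = true;
        private int surfaceGeneration = 1;

        void setVisible(boolean newVisible) {
            if (newVisible == visible) {
                return; // no visibility change, so no surface callbacks
            }
            visible = newVisible;
            if (newVisible) {
                // Becoming visible again yields a fresh surface in this model.
                surfaceGeneration++;
                events.add("surfaceChanged#" + surfaceGeneration);
            } else {
                events.add("surfaceDestroyed");
            }
        }

        int surfaceGeneration() {
            return surfaceGeneration;
        }
    }

    /** Builds a provider that toggles visibility, as the commit message describes. */
    static NewSurfaceProvider providerFor(FakeSurfaceView view) {
        return () -> {
            view.setVisible(false);
            view.setVisible(true);
        };
    }

    public static void main(String[] args) {
        FakeSurfaceView view = new FakeSurfaceView();
        providerFor(view).requestNewSurface();
        System.out.println(view.events); // prints [surfaceDestroyed, surfaceChanged#2]
    }
}
```

The point of the indirection is that Gecko never touches the view itself: it only holds a callback, and the embedding application decides how a new surface is actually obtained.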

Depends on D176719

We see a fair number of crashes caused by failing to create an EGL
surface when resuming the compositor on Android. We believe that in
the vast majority of these cases the Surface we have been provided by
the OS is in an invalid state, and therefore we will never succeed in
creating an EGL surface from it.

Currently when creating the EGL surface fails we raise a NEW_SURFACE
webrender error. This causes us to fall back through webrender
configurations, reinitialize the compositors, and eventually crash
when we are still unable to resume. None of this will help when the
Android Surface we have been provided is in this invalid state.

This patch therefore avoids raising the webrender error initially, and
instead gives the widget an opportunity to handle the failure. The
widget uses the new GeckoView API added in the previous patch in this
series to request a new Surface from the application. This will cause
another resume event immediately afterwards with a new - and hopefully
valid - surface, allowing the EGL surface to be created and the
compositor to be successfully resumed. If we are still unable to
create an EGL surface after this, then we will raise the webrender
error as before, likely eventually resulting in a crash.
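
The control flow described above can be sketched as a small state machine. This is a simplified model and not the actual Gecko code: the Compositor class, the Outcome enum, and the alreadyRequestedNewSurface flag are hypothetical names, and resetting the flag after a successful resume is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ResumeRetrySketch {
    enum Outcome { RESUMED, REQUESTED_NEW_SURFACE, RAISED_WEBRENDER_ERROR }

    /** Hypothetical model of the resume path; not the real compositor code. */
    static class Compositor {
        private boolean alreadyRequestedNewSurface = false;

        /** eglSurfaceCreated models whether EGL surface creation succeeded on this resume. */
        Outcome resume(boolean eglSurfaceCreated) {
            if (eglSurfaceCreated) {
                alreadyRequestedNewSurface = false; // assumed: a success clears the retry state
                return Outcome.RESUMED;
            }
            if (!alreadyRequestedNewSurface) {
                // First failure: let the widget ask the application for a new Surface
                // instead of raising the WebRender error straight away.
                alreadyRequestedNewSurface = true;
                return Outcome.REQUESTED_NEW_SURFACE;
            }
            // Still failing with the replacement Surface: raise the error as before.
            return Outcome.RAISED_WEBRENDER_ERROR;
        }
    }

    /** Feeds a sequence of EGL creation results through one compositor. */
    static List<Outcome> run(boolean... eglResults) {
        Compositor compositor = new Compositor();
        List<Outcome> outcomes = new ArrayList<>();
        for (boolean ok : eglResults) {
            outcomes.add(compositor.resume(ok));
        }
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println(run(false, true));  // prints [REQUESTED_NEW_SURFACE, RESUMED]
        System.out.println(run(false, false)); // prints [REQUESTED_NEW_SURFACE, RAISED_WEBRENDER_ERROR]
    }
}
```

Limiting the recovery to a single attempt keeps the old behaviour as the final fallback, so a Surface that never becomes valid still ends in the existing error path rather than an endless retry loop.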

Depends on D176720

Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/57ae9c87c467
Remove unused member and unnecessary ifdef. r=gfx-reviewers,ErichDonGubler
https://hg.mozilla.org/integration/autoland/rev/6fad491b84be
Remove previous attempts to detect and work around invalid Surface bug. r=gfx-reviewers,geckoview-reviewers,nical,m_kato
https://hg.mozilla.org/integration/autoland/rev/bf5c5929ef7b
Add API to allow Gecko to request a new Surface from the application. r=geckoview-reviewers,m_kato
https://hg.mozilla.org/integration/autoland/rev/7c1be037e345
Request new Surface from application when resuming compositor fails on Android. r=gfx-reviewers,geckoview-reviewers,nical,m_kato
Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5e44a52f2989
Remove unused member and unnecessary ifdef. r=gfx-reviewers,ErichDonGubler
https://hg.mozilla.org/integration/autoland/rev/50b28cf6d6ca
Remove previous attempts to detect and work around invalid Surface bug. r=gfx-reviewers,geckoview-reviewers,nical,m_kato
https://hg.mozilla.org/integration/autoland/rev/e26fa9885b0e
Add API to allow Gecko to request a new Surface from the application. r=geckoview-reviewers,m_kato
https://hg.mozilla.org/integration/autoland/rev/feb23257a892
Request new Surface from application when resuming compositor fails on Android. r=gfx-reviewers,geckoview-reviewers,nical,m_kato
Flags: needinfo?(jnicol)

The patch landed in nightly and beta is affected.
:jnicol, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox114 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(jnicol)

These crashes have been around a while, and the patch is not without risk, so I think it is better to let it bake on nightly for a cycle.

Flags: needinfo?(jnicol)
See Also: → 1839239
See Also: 1839239