Fenix crashes when playing video on Samsung with WebRender enabled
Categories
(Core :: Graphics: WebRender, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox78 | --- | wontfix |
firefox87 | --- | wontfix |
firefox88 | --- | fixed |
firefox89 | --- | fixed |
People
(Reporter: ekager, Assigned: jnicol)
References
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-release-
|
Details | Review |
Originally reported as https://github.com/mozilla-mobile/fenix/issues/10655
Steps to reproduce
- Set gfx.webrender.all to true and restart browser
- Open https://dtf.ru/new
- Find any embedded youtube video
- Start playing video
- Crash
If browser not crashed, scroll for another youtube video or refresh page and try again
about:crashes leads me to this page:
https://crash-stats.mozilla.org/report/index/0ae9329c-13c1-4ae7-8f39-199730200514
Expected behavior
Video played without crashes
Actual behavior
Browser crashed
Device information
- Android device: Samsung S6 Edge / Android 7.0 (Exynos 7420 / Mali-T760)
- Fenix version: All Fenix versions affected
Updated•5 years ago
|
Comment 1•5 years ago
|
||
This sounds quite serious! We should fix this.
Jamie, could you repro?
Comment 2•5 years ago
|
||
Lowering severity to S3 as we aren't yet shipping WR on that platform by default (but we want to!).
Assignee | ||
Comment 3•5 years ago
|
||
As you say, we're not shipping to Mali yet. Mali-Gxx should be soon, but Txxx will take a bit longer.
FWIW I cannot reproduce on a Galaxy S9 (Mali-G72).
Updated•4 years ago
|
Assignee | ||
Comment 4•4 years ago
|
||
I cannot reproduce on a Galaxy A5 2017 (Mali-T830). All crashes matching that signature are on a T-760, so maybe it's very GPU-specific. I'd say proceed with enabling on nightly for all Mali-Ts, and keep an eye on this. If the numbers spike we can disable for affected devices, and try to get our hands on one so we can fix it.
Comment 5•4 years ago
|
||
From comment 0, I wonder if GL_OES_EGL_image_external_essl3 does not work well on Mali-T760.
Comment 6•4 years ago
|
||
I have got Samsung S6 Edge with Android 7.0 (Mali-T760). I could not reproduce the crash of comment 0. But I could reproduce the crash of the following STR
Comment 7•4 years ago
|
||
On latest m-c, crash happened with the STR of comment 6.
glDrawElementsInstanced 0x00000000bf3594e0
gleam::ffi_gles::Gles2::DrawElementsInstanced::h125c6ce4e56035a1 gles_bindings.rs:2261
_$LT$gleam..gl..GlesFns$u20$as$u20$gleam..gl..Gl$GT$::draw_elements_instanced::hab5c351f21ff4bb7 gles_fns.rs:1215
webrender::device::gl::Device::draw_indexed_triangles_instanced_u16::h495291381f2d1352 gl.rs:3500
webrender::renderer::Renderer::draw_instanced_batch::h706a197f30bde517 mod.rs:2705
webrender::renderer::Renderer::draw_blurs::hb4bb8972829eb3c8 mod.rs:3825
webrender::renderer::Renderer::draw_alpha_target::hf0e51b15df50cd87 mod.rs:3965
webrender::renderer::Renderer::draw_frame::h869ab55439a4a366 mod.rs:4674
webrender::renderer::Renderer::render_impl::he0ce2eec139c9826 mod.rs:2131
webrender::renderer::Renderer::update::hc0f12041d169ecc4 mod.rs:1447
wr_renderer_update bindings.rs:627
mozilla::wr::RendererOGL::UpdateAndRender(mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*, mozilla::wr::RendererStats*) RendererOGL.cpp:173
mozilla::wr::RenderThread::UpdateAndRender(mozilla::wr::WrWindowId, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, bool, mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*)
Comment 8•4 years ago
|
||
During the STR of comment 7, RenderAndroidSurfaceTextureHost was not used. Then video seemed not related to the crash. Instead, draw_instanced_batch() call within Renderer::draw_blurs() seemed to related to the crash.
Comment 9•4 years ago
|
||
Bug 1640960 comment 37 added non-instancing rendering. But enabling it also caused the crash like the following :(
glDrawElements 0x00000000b65e9f04
gleam::ffi_gles::Gles2::DrawElements::hdfbcc41d8584cb83 gles_bindings.rs:2257
_$LT$gleam..gl..GlesFns$u20$as$u20$gleam..gl..Gl$GT$::draw_elements::hcc601f4a57c4077e gles_fns.rs:1197
webrender::device::gl::Device::draw_indexed_triangles::h389c952e9bbe322f gl.rs:3487
webrender::renderer::Renderer::draw_instanced_batch::h706a197f30bde517 mod.rs:2710
webrender::renderer::Renderer::draw_blurs::hb4bb8972829eb3c8 mod.rs:3825
webrender::renderer::Renderer::draw_alpha_target::hf0e51b15df50cd87 mod.rs:3965
webrender::renderer::Renderer::draw_frame::h869ab55439a4a366 mod.rs:4674
webrender::renderer::Renderer::render_impl::he0ce2eec139c9826 mod.rs:2131
webrender::renderer::Renderer::update::hc0f12041d169ecc4 mod.rs:1447
wr_renderer_update bindings.rs:627
mozilla::wr::RendererOGL::UpdateAndRender(mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*, mozilla::wr::RendererStats*) RendererOGL.cpp:173
mozilla::wr::RenderThread::UpdateAndRender(mozilla::wr::WrWindowId, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, bool, mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > const&, mozilla::Maybe<mozilla::wr::ImageFormat> const&, mozilla::Maybe<mozilla::Range<unsigned char> > const&, bool*) RenderThread.cpp:480
Assignee | ||
Comment 10•4 years ago
|
||
Sotaro, did you manage to find out anything else about this? Are you able to make a testcase which reproduces the crash?
Assignee | ||
Comment 12•4 years ago
|
||
Closed bug 1696989 as a duplicate of this, and added its signature to this bug.
As I wrote in that bug, since bug 1701633 landed some crash reports are trickling in with the GraphicsDrawShader annotation. We appear to be crashing either during cs_clip_box_shadow_TEXTURE_2D or cs_blur_ALPHA_TARGET draw calls. The blur one matches Sotaro's findings in comment 8.
Also, if we search crash stats for libGLES_mali.so, these 2 signatures are the highest. But there are quite a few other signatures on various Mali-T8xx devices too which may be related. Some of these have not had recent enough crashes to contain the GraphicsDrawShader annotation, but some have and they contain the same 2 shaders. So it's likely that this bug applies to more Mali-T devices than just the T760.
I am able to reproduce the cs_blur case on the site f95zone.to/latest (NSFW), and I can reproduce the cs_clip_box_shadow crash on https://www.w3schools.com/cssref/tryit.asp?filename=trycss3_image_gallery
Assignee | ||
Comment 13•4 years ago
|
||
When investigating this I noticed that after adding enough logging to draw_alpha_target()
the crashes no longer reproduced. Specifically after adding a lot of logging between the glClear()
s for ones and zeroes, and the draw call which triggers the crash on the respective pages (either blurs or box shadows). Likewise, adding a glFlush()
after the clears avoids the crash, as does removing the clears altogether (though of course that breaks rendering).
This explains why cs_blur and cs_clip_box_shadow are affected - what those two shaders have in common is that we always clear the regions of the alpha target (with either ones or zeroes) prior to rendering with them.
So my theory is that there's some sort of timing issue between calling glClear on an alpha target and then doing a draw call. However, on an alternative line of investigation I was playing around with the shaders themselves. I tried removing the textureSize()
calls and hard coding the values instead, just in case this bug was in any way similar to bug 1692848. It seemed a long shot given that was a Mali driver regression in Android 11 on Mali-G devices, and was specific to YUV/texture external. However, to my surprise it did actually reliably prevent the crash. Furthermore, applying the same workaround as in bug 1692848 (ie continue to call textureSize, but also include a sample call in the vertex shader) did reliably prevent the cs_blur crash, but not the cs_clip_box_shadow crash.
The textureSize
line of investigation doesn't seem nearly as satisfying as the glClear
one to me. My hunch would be that the glClear is the real underlying issue, and that some shader changes can affect the timings in such a way that the crash may or may not occur.
So my proposed fix is to call glFlush after the clears in draw_alpha_target()
on Mali-T
Assignee | ||
Comment 14•4 years ago
|
||
On several Mali-T devices we have observed crashes during draw calls
when rendering to an alpha target immediately after clearing regions
of the target with glClear. This appears to be timing-related as it
seems more likely to occur the greater the number of regions that have
been cleared, and it no longer occurs after adding enough
logging. Explicitly calling glFlush after the clears prevents the
crash.
Updated•4 years ago
|
Assignee | ||
Comment 15•4 years ago
|
||
ARM have confirmed with me that this is a suitable fix.
Updated•4 years ago
|
Comment 16•4 years ago
|
||
Assignee | ||
Comment 17•4 years ago
|
||
Dzmitry pointed out in the review that we could use the ps_clear shader instead to perform the clears, like we currently do on some platforms for picture cache tiles due to bug. That way we don't need to flush. I tested that this still avoids the crash, and changed the patch to do that.
Assignee | ||
Comment 18•4 years ago
|
||
Comment on attachment 9214421 [details]
Bug 1638593 - Use shader to clear alpha targets on Mali-Txxx devices. r?gw,kvark
Beta/Release Uplift Approval Request
- User impact if declined: Crashes on some websites for users with Mali-T devices
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Very localized change which only affects devices which are affected by the crash. Replaces usage the glClear function that causes the crash with an alternative clearing method. This alternative clear is already used elsewhere in the code on other platforms, so is unlikely to cause issues
- String changes made/needed:
Assignee | ||
Comment 19•4 years ago
|
||
We should certainly uplift this to beta, but possibly release too given the crash volume and that it's a relatively low-risk fix?
Updated•4 years ago
|
Comment 20•4 years ago
|
||
Comment on attachment 9214421 [details]
Bug 1638593 - Use shader to clear alpha targets on Mali-Txxx devices. r?gw,kvark
Extremely unlikely that we're going to dot release 87 at this point, but approved for Fenix 88.0.0-beta.6.
Updated•4 years ago
|
Comment 21•4 years ago
|
||
bugherder uplift |
Comment 22•4 years ago
|
||
bugherder |
Comment 23•4 years ago
|
||
Comment on attachment 9214421 [details]
Bug 1638593 - Use shader to clear alpha targets on Mali-Txxx devices. r?gw,kvark
88 is in RC now and there are no plans for an 87 dot release.
Updated•4 years ago
|
Updated•4 years ago
|
Description
•