Webrender hard crashes Samsung Galaxy Tab 4 [Adreno 305] / Galaxy S4 [Adreno 330]
Categories
(Core :: Graphics: WebRender, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox90 | --- | verified |
firefox91 | --- | verified |
firefox92 | --- | verified |
People
(Reporter: ke5trel, Assigned: jnicol)
References
(Blocks 2 open bugs)
Details
(Keywords: crash, regression)
Attachments
(3 files)
40.36 KB,
image/png
|
Details | |
183.80 KB,
application/zip
|
Details | |
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-release+
|
Details | Review |
Samsung Galaxy Tab 4 (SM-T530)
Android 5.0.2
Qualcomm Adreno (TM) 305
OpenGL ES 3.0 V@84.0
Opening almost any site (eg https://mozilla.org) causes the content to turn black, followed by the screen turning black and the device rebooting. about:config
similarly crashes. about:support
works, though buttons are drawn with a diagonal artifact, rounded corners are missing and scrolling is choppy.
Disabling WebRender by installing Nightly 89 which uses OpenGL, changing gfx.webrender.force-disabled = true
then updating to the latest Nightly 92 works.
logcat
does not provide any information about the crash.
Comment 1•3 years ago
|
||
Same thing happens with:
Samsung Galaxy S4 (GT-I9506)
Android 5.0.1
Qualcomm Adreno (TM) 330
OpenGL ES 3.0 V@84.0
Assignee | ||
Comment 3•3 years ago
|
||
Looks like a pretty severe bug in Android 5 or old Adreno driver versions.
On my Nexus 5 running Android 5.0, V@95.0 I can also render about:support but it fails if I try to render a more complex page. However, it doesn't cause the device to reboot, it just causes Firefox to stop responding.
logcat does not provide any information about the crash.
Could you please attach a copy of the logcat anyway? There may be something in there that is helpful. Thanks
Sure, I also included OpenGL traces. The last libEGL entry is always glViewport(0, 0, 1024, 512);
.
Assignee | ||
Comment 5•3 years ago
|
||
Thanks. You're right, I can't see anything relating to the crash in there. Whereas on my Nexus 5 I see a lot of errors from the Adreno driver.
I'm hoping that we are indeed seeing the same issue, despite the different logcats. I have a fix for my issue, could you please test it and see if it works for yours too? https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QjY0FENQQa-buEB6Wu29bg/runs/0/artifacts/public/build/geckoview_example.apk
The test build works for me without crashing. 👍
Assignee | ||
Comment 7•3 years ago
|
||
That's brilliant, thanks for testing.
Here's what I've discovered about this bug:
- I did a
mozregression --good 2020-07-20 --app gve --pref gfx.webrender.all:true
to see if this was regressed by something prior to webrender being enabled on this device. Older builds rendered completely incorrectly due to bug 1630356, but importantly did not crash. This gave me this range. - By hand, I established that
Bug 1682365 - Pt 2 - Remove texture array usage from render task graph
was responsible, specifically just the changes to cs_blur.glsl - It doesn't make much sense why those changes to cs_blur (switching from sampler2DArray to sampler2D) would cause this bug, but there is clearly something in the shader the old adreno drivers does not like, and we were just getting lucky before.
- I reduced the current cs_blur shader by hand to see what would stop the crash, and it appears to be due to the unbounded for loop: https://searchfox.org/mozilla-central/rev/a14ecd829fdb1e9780b7c100016c6a3d951cf7da/gfx/wr/webrender/res/cs_blur.glsl#158
- If I limit the number of iterations, eg by looping to
min(vSupport, SOME_CONST)
then we avoid the crash.
Glenn, I don't really understand the blur code, but am I correct in thinking that we can safely use 300
as a limit here.
- We have a max blur radius of 100 here
- Then we calculate
vSupport
here as 3 times the blur radius. And 3*100 = 300.
Or am I misunderstanding?
Comment 8•3 years ago
|
||
Yes, it looks like we should be able to clamp that value there. Odd that it would avoid a hard driver crash, but it looks like a valid workaround to me!
Assignee | ||
Comment 9•3 years ago
|
||
Cool. Yeah, bit strange. I think old drivers can have issues with shaders that they cannot verify will complete statically.
Assignee | ||
Comment 10•3 years ago
|
||
In the cs_blur fragment shader there is a for loop with a number of
iterations determined by a flat varying. On some old Adreno drivers
this causes severe issues including hangs and crashes. These can be
avoided by clamping the number of iterations to a statically known
value.
Updated•3 years ago
|
Comment 11•3 years ago
|
||
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/16308825081d Clamp loop condition variable to static value in cs_blur. r=gw
Comment 12•3 years ago
|
||
bugherder |
Assignee | ||
Comment 14•3 years ago
|
||
Yes definitely. Probably worth doing for 90 too: it only affects a small number of users, but makes the browser practically unusable.
I'll make the request.
Assignee | ||
Comment 15•3 years ago
|
||
Comment on attachment 9232031 [details]
Bug 1720841 - Clamp loop condition variable to static value in cs_blur. r?gw
Beta/Release Uplift Approval Request
- User impact if declined: Crashes, app not responding, phone rebooting for users with Adreno 3xx GPUs running Android 5.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Adds a limit to the number of loop iterations in a shader, which shouldn't be exceeded anyway so won't affect rendering. It should make life easier for drivers to have a statically known max number of loop iterations, however, there can always be unintended consequences of modifying shader code.
- String changes made/needed:
Assignee | ||
Comment 16•3 years ago
|
||
I think this should be low risk, however we've been bitten in the past by changes to shader code which "should" be fine. So maybe best to let this bake on nightly/beta for a few days before it hits release?
Comment 17•3 years ago
|
||
Comment on attachment 9232031 [details]
Bug 1720841 - Clamp loop condition variable to static value in cs_blur. r?gw
Crash fix, approved for 91 beta 5, thanks.
Comment 18•3 years ago
|
||
bugherder uplift |
Comment 19•3 years ago
|
||
I was not able to reproduce the crash on the 92.0a1 Nightly from 7/21, nor on the 92.0.0-beta.2 build.
Tested with a Samsung Galaxy Tab A6 (Android 5.1.1), SM-T285.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 20•3 years ago
|
||
Jftr, the latest nightly works fine for me as well with
Samsung Galaxy S4 (GT-I9506)
Android 5.0.1
Qualcomm Adreno (TM) 330
Do you want a bug for the button rendering in about:support mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1720841#c0?
Assignee | ||
Comment 21•3 years ago
|
||
(In reply to Stefan [:stefanh] from comment #20)
Jftr, the latest nightly works fine for me as well with
Samsung Galaxy S4 (GT-I9506)
Android 5.0.1
Qualcomm Adreno (TM) 330Do you want a bug for the button rendering in about:support mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1720841#c0?
That's great!
Yes please on the new bug. Thanks!
Comment 22•3 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #21)
Yes please on the new bug. Thanks!
Filed bug 1721795.
Comment 24•3 years ago
|
||
Comment on attachment 9232031 [details]
Bug 1720841 - Clamp loop condition variable to static value in cs_blur. r?gw
Approved for Fenix 90.1.3.
Comment 25•3 years ago
|
||
bugherder uplift |
Updated•3 years ago
|
Comment 26•3 years ago
|
||
I was not able to reproduce the crash/ANR on the 90.1.3 RC build.
Tested with a Samsung Galaxy Tab A6 (Android 5.1.1), SM-T285.
Updated•3 years ago
|
Description
•