Closed Bug 1884791 Opened 6 months ago Closed 6 months ago

Text is randomly garbled on Samsung Galaxy S2 tablet, with hardware WebRender, starting with Firefox 125

Categories

(Core :: Graphics: WebRender, defect)

Tracking

()

VERIFIED FIXED
125 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox123 --- unaffected
firefox124 --- unaffected
firefox125 + verified

People

(Reporter: dholbert, Assigned: jnicol)

References

(Regression)

Details

(Keywords: regression)

Attachments

(3 files)

Attached image screenshot

[Tracking Requested - why for this release]:

STR:

  1. Load any site in Firefox Nightly on Android (e.g. https://example.org or https://en.m.wikipedia.org/wiki/Mozilla or a Google search results page)

ACTUAL RESULTS:
Text is randomly smeared / garbled / missing. This manifests as a bunch of black "smudges" on the page. See screenshots.

EXPECTED RESULTS:
No such misrendering.

DEVICE:
Samsung Galaxy Tab S2, running Android 9 (technically LineageOS 16, which is a fork of Android 9).

I can reproduce in mozregression-launched GeckoView Example App (gve). Regression range is quite recent:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=b26d4c8d5e44cc544e259f32a62382902af6f7ff&tochange=2bd01d77bc4d2a1ad705bb1ecb8e793864a737ee

gw, is it conceivable that bug 1881978 could have caused this? (That's the only graphics change I'm seeing in the push range.)

Flags: needinfo?(gwatson)

(FWIW I'm backfilling some gve tasks in the regression range, to hopefully narrow it a bit and build some confidence around potential regressors.)

It's plausible that the change exposes some kind of driver / shader compiler bug.

Flags: needinfo?(gwatson)

Narrowed regression range after some gve tasks completed:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=aea58e5386eae7e197cfae9706804d22d8e3342c&tochange=c5ec87ff79c5e8faf3e69f4dd3057c1df36c49e7

I think that confirms bug 1881978 as the regressor.

(Note that the first and last pushes in that push-range were for landing and then backing out a particular bug (1759175) -- that bug happened to trigger android build-bustage while it was in, so I suspect we can't bisect further. But also, we probably don't need to worry about bisecting further.)

(In reply to Glenn Watson [:gw] from comment #3)

> It's plausible that the change exposes some kind of driver / shader compiler bug.

Thanks. I think it's not one that I can get an update for, unfortunately; this device is no longer getting OS updates (including from LineageOS which brought it another year or two into the future beyond where it would have otherwise been EOL'd).

I'm not sure how to assess how many folks might be affected by a similar driver bug & don't want to over-pivot based on my own device happening to be lucky enough to have a bug that was exposed by a Firefox update. But I wonder if we should block-list my driver version (from the about:support info in comment 4)?

I did confirm that enabling software-webrender (gfx.webrender.software = true) fixes the bug for me, so I'll be using that on this device going forward.

Keywords: regression
Regressed by: 1881978
Summary: Text is randomly garbled on Samsung Galaxy S2 tablet → Text is randomly garbled on Samsung Galaxy S2 tablet, with hardware WebRender, starting with Firefox 125

Jamie, do you think it's reasonable to block that driver version since it appears unmaintained?

Severity: -- → S2
Flags: needinfo?(jnicol)

Set release status flags based on info from the regressing bug 1881978

The bug is marked as tracked for firefox125 (nightly). We have limited time to fix this: the soft freeze is in 2 days. However, the bug still isn't assigned.

:bhood, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply back out the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(bhood)
Assignee: nobody → jnicol
Flags: needinfo?(bhood)

We can absolutely block that driver version on that GPU, but my concern is that this could affect a whole range of driver versions and potentially multiple GPUs. I was working from home today so didn't have access to my range of devices to test, but tomorrow I'll test on as many devices as I can. Unfortunately I don't think I have an Adreno 510, but it will at least be useful to know whether other Adreno 5xx devices are affected.

I've been able to reproduce on an HTC 10, which has an Adreno 530 (compared to Daniel's Adreno 510), but the same driver version: V@251. So we know this affects multiple GPUs, but whether that's only Adreno 5xx or more widespread is unknown. I cannot rule out it affecting other driver versions, but I haven't seen any others affected, so my feeling is it's limited to V@251.

This is indeed a shader miscompilation, which is only encountered with the optimized shader output. Daniel, you can try re-enabling hardware acceleration (reset gfx.webrender.software to false) and then disable the optimized shaders (gfx.webrender.use-optimized-shaders = false) and see if that fixes it.

My understanding of the miscompilation so far is that the problem lies with fetching the clip area here. The value passed as index comes from the instance data here. Prior to bug 1881978 both the clip task address and render task address were packed into the same int, and unpacked using masking and bit shifts. Moving the picture task address from the instance data to the prim header is working fine, but that has accidentally broken the clip address. Adding back some code that reads the high bits of aData.y but doesn't affect the output, whilst also not getting optimized away, works around the issue.
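For illustration, the "packed into the same int, and unpacked using masking and bit shifts" scheme mentioned above can be sketched roughly as follows. This is a minimal sketch and not the actual WebRender code: the 16-bit field widths, the choice of which address sits in the high bits, and the function names are all assumptions made for the example.

```rust
// Hypothetical layout (an assumption, not taken from WebRender):
//   bits  0..16 -> clip task address
//   bits 16..32 -> render (picture) task address
const ADDRESS_BITS: u32 = 16;
const ADDRESS_MASK: i32 = 0xffff;

/// Pack two 16-bit task addresses into a single 32-bit instance integer.
fn pack_task_addresses(render_task_address: u16, clip_task_address: u16) -> i32 {
    ((render_task_address as i32) << ADDRESS_BITS) | (clip_task_address as i32)
}

/// Unpack the integer again, returning (render_task_address, clip_task_address).
fn unpack_task_addresses(packed: i32) -> (u16, u16) {
    // The low half only needs a mask.
    let clip = (packed & ADDRESS_MASK) as u16;
    // `>>` on a signed integer is an arithmetic shift and sign-extends,
    // so the result is masked after shifting to recover the high half.
    let render = ((packed >> ADDRESS_BITS) & ADDRESS_MASK) as u16;
    (render, clip)
}
```

With this layout, `unpack_task_addresses(pack_task_addresses(r, c))` gives back `(r, c)` for any pair of 16-bit values. Once one of the two addresses is stored elsewhere, the remaining one no longer needs the shift or the mask.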

Flags: needinfo?(jnicol)

(In reply to Jamie Nicol [:jnicol] from comment #10)

> Daniel, you can try re-enabling hardware acceleration (reset gfx.webrender.software to false) and then disable the optimized shaders (gfx.webrender.use-optimized-shaders = false) and see if that fixes it.

That works fine on my tablet, yes (can't repro the bug, with prefs in that configuration).

Thanks for confirming, Daniel!

> Adding back some code that reads the high bits of aData.y but doesn't affect the output, whilst also not getting optimized away, works around the issue.

Hrm, no, this part isn't accurate. I'm really not sure what precisely is causing the miscompilation. I have a workaround that looks reasonable, but my concern is that since we do not understand the bug, it could easily be regressed. Luckily I don't think there's too much churn in this code, so a big fat comment warning people to test when changing this code is probably good enough.

WebRender's glslopt-optimized shaders encounter a miscompilation on
some Adreno driver versions regarding fetching empty clip tasks. This
patch reshuffles the code in such a way as to avoid the
bug. Unfortunately the specific cause of the miscompilation remains
unknown, meaning we must take extra care not to regress it in the
future.

Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/488c57bb8cee
Avoid shader miscompilation on some Adreno drivers. r=gw
Status: NEW → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 125 Branch

Verified fixed.

(I turned on my S2 Tablet, which still had gfx.webrender.use-optimized-shaders set to false and a few-days-old Nightly. I reverted my gfx.webrender.use-optimized-shaders tweak and restarted Nightly, and I confirmed that I could reproduce the bug on wikipedia.org. Then I installed a Nightly update from the Play Store, launched Firefox Nightly again and retested on wikipedia.org, and I cannot reproduce the bug.)

"Good" version that I'm currently using is:

125.0a1 (Build #2016009767), 8694c91ac7+
GV: 125.0a1-20240317231404
AS: 125.20240317050356
Status: RESOLVED → VERIFIED

For reference, https://github.com/webcompat/web-bugs/issues/135588 looks like it may be a similar (but more-recently-introduced) issue to what was going on here.
