Closed Bug 1897444 Opened 4 months ago Closed 2 months ago

Boxes Rendered with white background with Adreno GPU on Surface Pro X

Categories

(Core :: Graphics: WebRender, defect)

Firefox 127
defect

Tracking

()

RESOLVED FIXED
129 Branch
Tracking Status
relnote-firefox --- 128+
firefox-esr115 --- unaffected
firefox-esr128 --- fixed
firefox126 --- unaffected
firefox127 + wontfix
firefox128 + fixed
firefox129 --- fixed

People

(Reporter: sstreich, Assigned: nical)

References

(Regression)

Details

(Keywords: regression)

Attachments

(7 files)

Hello!
I currently run nightly on an Surface-Pro X, Firefox nightly renders some boxes with white background.
I've attached a screenshot, that probably explains it better.
I cannot re-produce this on my other windows machines.

I did run mozregression which did lead me to: https://phabricator.services.mozilla.com/D206098.

2024-05-17T14:42:37.464000: INFO : platform_changeset: 3b76cb83627af4607b79caf70a03243e6ee6369b
2024-05-17T14:42:37.464000: INFO : platform_repository: https://hg.mozilla.org/integration/autoland
2024-05-17T14:42:37.465000: INFO : platform_version: 127.0a1
2024-05-17T14:42:57.575000: INFO : Narrowed integration regression window from [6c625895, bfaf7b3f] (3 builds) to [6c625895, 3b76cb83] (2 builds) (~1 steps left)
2024-05-17T14:42:57.624000: DEBUG : Starting merge handling...
2024-05-17T14:42:57.630000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=3b76cb83627af4607b79caf70a03243e6ee6369b&full=1
2024-05-17T14:42:57.631000: DEBUG : redo: attempt 1/3
2024-05-17T14:42:57.633000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=3b76cb83627af4607b79caf70a03243e6ee6369b&full=1',), kwargs: {}, attempt #1
2024-05-17T14:42:57.653000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2024-05-17T14:42:59.142000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=3b76cb83627af4607b79caf70a03243e6ee6369b&full=1 HTTP/1.1" 200 None
2024-05-17T14:42:59.201000: DEBUG : Found commit message:
Bug 1888628 - Extract the texture sampling logic out of ps_quad.glsl. r=gw

This simplifies the common infrastructure, removes two varyings from the common set and will allow other patterns to handle sampling differently if they need it (for example an upcoming repeating pattern).

In addition:
 - the color parameter is always passed to the fragment shader (it used to be only when no uv_rect was passed).
 - v_flags was reorganized a bit so that w is used by the common infrastructure and xyz are available for patterns to use.

Differential Revision: https://phabricator.services.mozilla.com/D206098

2024-05-17T14:42:59.202000: DEBUG : Did not find a branch, checking all integration branches
2024-05-17T14:42:59.227000: INFO : The bisection is done.
2024-05-17T14:42:59.245000: INFO : Stopped

Keywords: regression
Regressed by: 1888628

Set release status flags based on info from the regressing bug 1888628

:nical, since you are the author of the regressor, bug 1888628, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Flags: needinfo?(nical.bugzilla)
Severity: -- → S3

The bug is marked as tracked for firefox127 (beta). However, the bug still isn't assigned and has low severity.

:bhood, could you please find an assignee and increase the severity for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(bhood)

Per Nical, this one is believed to be a driver issue. Access to a Surface-Pro X is needed to debug.

@gw, this looks like something we might issue a dot release for. Should we backout the regressor or consider an alternative like blocklisting the driver?

Flags: needinfo?(gwatson)

I'd prefer not to back it out, but perhaps blocking that driver version may make sense.

Flags: needinfo?(gwatson)

A bit of a shot in the dark but I kicked a build on try here, I'll provide a direct link when the build is ready.

Here is the direct link to a windows arm64 build with a potential fix https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/d9LRe-2vS2W4fQ5gMYz76Q/runs/0/artifacts/public/build/target.zip could you try it out and report back whether it addresses the issue?

I kicked another possible fix, I'll post the direct link it is ready.

Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla) → needinfo?(sstreich)

Hey! Thanks so much for the attempts, however both builds do not seem to solve the issue on the device, sorry :(

Flags: needinfo?(sstreich)
Flags: needinfo?(bhood)

Also seen in Fx 127 (release) and Fx 129 (nightly) on my Surface Pro X, mozregression pointing to same commit:

2024-06-20T11:47:50.299000: DEBUG : Found commit message:
Bug 1888628 - Extract the texture sampling logic out of ps_quad.glsl. r=gw

This simplifies the common infrastructure, removes two varyings from the common set and will allow other patterns to handle sampling differently if they need it (for example an upcoming repeating pattern).

In addition:
 - the color parameter is always passed to the fragment shader (it used to be only when no uv_rect was passed).
 - v_flags was reorganized a bit so that w is used by the common infrastructure and xyz are available for patterns to use.

Differential Revision: https://phabricator.services.mozilla.com/D206098

2024-06-20T11:47:50.316000: DEBUG : Did not find a branch, checking all integration branches
2024-06-20T11:47:50.353000: INFO : The bisection is done.
2024-06-20T11:47:50.373000: INFO : Stopped

Slightly different manifestation (screen shots will be attached):

  • Visual display has "empty button", as shown by Screen capture from windows snipping tool
  • Firefox's "Take Screenshot" shows correctly rendered button.

The process to get to the screens shown is:

  • start logged out from GitHub
  • view a Mozilla GitHub org to which you have SAML access (I used mozilla)
  • log in to GitHub
  • GitHub will redirect for SAML authentication <== this page will have the impacted button

NI: on :nical -- with your assistance, I can be remote hands for any debug work. (i.e. I'm willing, but not a strong windows tech)

Flags: needinfo?(nical.bugzilla)
Attached image Windows Snip.png

This correctly shows how the screen appears to me.

Attached image Fx Screenshot.png

This image is from Firefox' screenshot tool -- it shows how the image should be rendered on screen (but is not). See comment 9 for details.

Here are two new builds to try for anyone who can reproduce the bug:

Flags: needinfo?(sstreich)

Results following STR from comment 9:

  • Build #3: -- broken
  • Build #4: -- broken

This bug really has a horrible UX impact, perhaps not conveyed by the static shots. Here's a video of just trying to get to https://github.com/mozilla

:gw can we reconsider backing out this regression? 4 mitigation builds haven't helped sofar. (re comment 4)

Flags: needinfo?(gwatson)

Probably best to check with Nical whether it's feasible to back out temporarily.

The problem is that this appears to be exposing a driver bug specific to the GPU driver on Surface models, triggering some kind of bug in the driver shader compiler most likely (so it's not widespread or possible to repro on other systems).

Because we are planning to do a lot of work and changes around these shaders in the next quarter, it's very likely that we will hit this bug again and again, so we really do need to find out the root cause and how we can work around the driver bug, so that we can continue the future planned work here with confidence. I know that's not really helpful for you now, but just to give some background on why it's difficult / important to find the root cause.

Flags: needinfo?(gwatson)

[adding Surface Pro X and Adreno to the bug title for discoverability. (note that "Adreno" hasn't yet been mentioned here, but it seems to be implied by the fact that this is Surface Pro X, which only has Adreno graphics hardware I think.) We've got a bunch of other recently-filed bugs that are likely the same issue, some of which are loosely grouped as see-also on bug 1903048. At least three of them -- Bug 1852440, bug 1898164, bug 1904190 -- currently mention Adreno or Surface Pro X in their titles; and bug 1903048 looks likely to be Surface Pro X with Adreno GPU as well. I suspect these are all the same issue as this one and could be duped if/when we're confident about that.]

Summary: Boxes Rendered with white background → Boxes Rendered with white background with Adreno GPU on Surface Pro X

in support of comment 16, I am running an Adreno 680 -- I'm happy to provide remote access for more intense debug if that would help.

GPU #1
Active	Yes
Description	Qualcomm(R) Adreno(TM) 680 GPU
Vendor ID	0x5143
Device ID	0x043a
Driver Version	27.20.2060.0
Driver Date	1-24-2024
Drivers	qcdx11arm64xum8180 qcdx11arm64xum8180 qcdx11arm64xum8180 qcdx12arm64xum8180
Subsys ID	00008180
RAM	0

Hi there. Thanks all for making Firefox, which I really love.

I had a close look at the commit that might have caused the regression, at https://phabricator.services.mozilla.com/D206098

I noticed one line of code that was in the old version, but not in the new version:

gfx/wr/webrender/res/ps_quad.glsl:332: v_color = vec4(1.0);

does not have an equivalent on the right hand side. Everything else looks fine?

N.B. I don't actually have any way of compiling or testing; just spent a little bit of time eyeballing the code.

Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a07e41a800a8
Use opaque black as the quad base color for clear patterns. r=gfx-reviewers,jnicol
Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 129 Branch
Flags: needinfo?(sstreich)
Flags: needinfo?(nical.bugzilla)

Please request release approval on this so we can get it uplifted for an RC respin or future dot release.

Flags: needinfo?(nical.bugzilla)

The commit in comment 21 DOES NOT FIX the issue following the STR in comment 9. Tested using build id 20240702092648, which should contain the commit per hg.m.o

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

This appears to work around the glitch on Windows + Adreno. It is unclear why. The fact that this so far only reproduced on windows on arm looks like a driver issue, however from the workaround it looks like there might be a confusion between the textured and non-textured code paths with composite quads.

Flags: needinfo?(nical.bugzilla)

Thanks all, looking forward to seeing the fix.

The Qualcomm driver on Windows, as I understand it, is one of those super-modern types that only support D3D12 -- it uses the compatibility layers D3D11on12 and D3D9on12 (both built into Windows?); and OpenCLOn12 (downloadable from the Microsoft Store as the "OpenCL™, OpenGL®, and Vulkan® Compatibility Pack") to translate OpenCL/OpenGL/Vulkan calls.

Presumably ANGLE is calling the D3D11 compatibility layer. So as well as the driver, there's a lot of graphics stack involved...

Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9ac7f1f672d5
Pass base color in add_composite_prim, then ignore it in the shader. r=gw

The patch landed in nightly and beta is affected.
:nical, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox128 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(nical.bugzilla)
Status: REOPENED → RESOLVED
Closed: 2 months ago2 months ago
Resolution: --- → FIXED

Hi Hal, are things better with current Nightly builds?

Flags: needinfo?(hwine)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #29)

Hi Hal, are things better with current Nightly builds?

Yes, the "missing button" issue is resolved (this bug, STR in comment 9). Also, visual anomalies (comment 14) have no longer lead to "missing feedback", and are merely annoyances (not this issue).

Flags: needinfo?(hwine)

This appears to work around the glitch on Windows + Adreno. It is unclear why. The fact that this so far only reproduced on windows on arm looks like a driver issue, however from the workaround it looks like there might be a confusion between the textured and non-textured code paths with composite quads.

Comment on attachment 9410380 [details]
Bug 1897444 - Use opaque black as the quad base color for clear patterns. r=#gfx-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: A lot of glitches on Windows on arm (adreno) if dark mode is enabled.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch is in nightly and beta, it was verified on nightly.
  • String changes made/needed: None.
  • Is Android affected?: No
Flags: needinfo?(nical.bugzilla)
Attachment #9410380 - Flags: approval-mozilla-release?
Attachment #9410380 - Flags: approval-mozilla-release?

Comment on attachment 9412277 [details]
Bug 1897444 - Pass base color in add_composite_prim, then ignore it in the shader (release). r=gw

Beta/Release Uplift Approval Request

  • User impact if declined: Lots of glitches on windows on arm (adreno) if dark mode is enabled.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch is rather simple. It is currently in nightly and beta, we can wait for a bit before uplifting if we need more confidence.
  • String changes made/needed:
  • Is Android affected?: Yes

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Without this, the browser is barely usable on the affected configuration if dark mode is enabled.
  • User impact if declined:
  • Fix Landed on Version: 129
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch is rather simple. It is currently in nightly and beta, we can wait for a bit before uplifting if we need more confidence.
Attachment #9412277 - Flags: approval-mozilla-release?
Attachment #9412277 - Flags: approval-mozilla-esr128?

I should have been clearer in the uplift requests: I think that the fix is worth uplifting all the way to 128. But I don't think we are in a hurry to do it, so letting it bake in nightly and beta for a few more days before doing uplifting to give us a better chance to catch potential regressions would be good thing.

Comment on attachment 9412277 [details]
Bug 1897444 - Pass base color in add_composite_prim, then ignore it in the shader (release). r=gw

Approved for 128.1esr.

Attachment #9412277 - Flags: approval-mozilla-esr128? → approval-mozilla-esr128+
https://hg.mozilla.org/releases/mozilla-esr128/rev/f1d4e5121889
Duplicate of this bug: 1898164

Comment on attachment 9412277 [details]
Bug 1897444 - Pass base color in add_composite_prim, then ignore it in the shader (release). r=gw

Approved for 128.0.2.

Attachment #9412277 - Flags: approval-mozilla-release? → approval-mozilla-release+
https://hg.mozilla.org/releases/mozilla-release/rev/2a5ff1aba1de

Added to the 128.0.2 relnotes:

Fixed visual glitches when dark mode is enabled in Windows ARM devices.

See Also: → 1911128
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: