Boxes Rendered with white background with Adreno GPU on Surface Pro X
Categories
(Core :: Graphics: WebRender, defect)
Tracking
()
Tracking | Status | |
---|---|---|
relnote-firefox | --- | 128+ |
firefox-esr115 | --- | unaffected |
firefox-esr128 | --- | fixed |
firefox126 | --- | unaffected |
firefox127 | + | wontfix |
firefox128 | + | fixed |
firefox129 | --- | fixed |
People
(Reporter: sstreich, Assigned: nical)
References
(Regression)
Details
(Keywords: regression)
Attachments
(7 files)
247.51 KB,
image/png
|
Details | |
20.75 KB,
image/png
|
Details | |
19.40 KB,
image/png
|
Details | |
3.22 MB,
video/mp4
|
Details | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-release+
dmeehan
:
approval-mozilla-esr128+
|
Details | Review |
Hello!
I currently run nightly on an Surface-Pro X, Firefox nightly renders some boxes with white background.
I've attached a screenshot, that probably explains it better.
I cannot re-produce this on my other windows machines.
I did run mozregression which did lead me to: https://phabricator.services.mozilla.com/D206098.
2024-05-17T14:42:37.464000: INFO : platform_changeset: 3b76cb83627af4607b79caf70a03243e6ee6369b
2024-05-17T14:42:37.464000: INFO : platform_repository: https://hg.mozilla.org/integration/autoland
2024-05-17T14:42:37.465000: INFO : platform_version: 127.0a1
2024-05-17T14:42:57.575000: INFO : Narrowed integration regression window from [6c625895, bfaf7b3f] (3 builds) to [6c625895, 3b76cb83] (2 builds) (~1 steps left)
2024-05-17T14:42:57.624000: DEBUG : Starting merge handling...
2024-05-17T14:42:57.630000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=3b76cb83627af4607b79caf70a03243e6ee6369b&full=1
2024-05-17T14:42:57.631000: DEBUG : redo: attempt 1/3
2024-05-17T14:42:57.633000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=3b76cb83627af4607b79caf70a03243e6ee6369b&full=1',), kwargs: {}, attempt #1
2024-05-17T14:42:57.653000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2024-05-17T14:42:59.142000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=3b76cb83627af4607b79caf70a03243e6ee6369b&full=1 HTTP/1.1" 200 None
2024-05-17T14:42:59.201000: DEBUG : Found commit message:
Bug 1888628 - Extract the texture sampling logic out of ps_quad.glsl. r=gw
This simplifies the common infrastructure, removes two varyings from the common set and will allow other patterns to handle sampling differently if they need it (for example an upcoming repeating pattern).
In addition:
- the color parameter is always passed to the fragment shader (it used to be only when no uv_rect was passed).
- v_flags was reorganized a bit so that w is used by the common infrastructure and xyz are available for patterns to use.
Differential Revision: https://phabricator.services.mozilla.com/D206098
2024-05-17T14:42:59.202000: DEBUG : Did not find a branch, checking all integration branches
2024-05-17T14:42:59.227000: INFO : The bisection is done.
2024-05-17T14:42:59.245000: INFO : Stopped
Updated•7 months ago
|
Comment 1•7 months ago
|
||
Set release status flags based on info from the regressing bug 1888628
:nical, since you are the author of the regressor, bug 1888628, could you take a look? Also, could you set the severity field?
For more information, please visit BugBot documentation.
Updated•7 months ago
|
Updated•7 months ago
|
Comment 2•7 months ago
|
||
The bug is marked as tracked for firefox127 (beta). However, the bug still isn't assigned and has low severity.
:bhood, could you please find an assignee and increase the severity for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Comment 3•7 months ago
|
||
Per Nical, this one is believed to be a driver issue. Access to a Surface-Pro X is needed to debug.
@gw, this looks like something we might issue a dot release for. Should we backout the regressor or consider an alternative like blocklisting the driver?
Comment 4•7 months ago
|
||
I'd prefer not to back it out, but perhaps blocking that driver version may make sense.
Assignee | ||
Comment 5•7 months ago
|
||
A bit of a shot in the dark but I kicked a build on try here, I'll provide a direct link when the build is ready.
Assignee | ||
Comment 6•6 months ago
•
|
||
Here is the direct link to a windows arm64 build with a potential fix https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/d9LRe-2vS2W4fQ5gMYz76Q/runs/0/artifacts/public/build/target.zip could you try it out and report back whether it addresses the issue?
I kicked another possible fix, I'll post the direct link it is ready.
Assignee | ||
Comment 7•6 months ago
|
||
And here goes the second potential fix: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ehYZhQ2aQOq1xKwJeQopBA/runs/0/artifacts/public/build/target.zip
Reporter | ||
Comment 8•6 months ago
|
||
Hey! Thanks so much for the attempts, however both builds do not seem to solve the issue on the device, sorry :(
Updated•6 months ago
|
Updated•6 months ago
|
Comment 9•6 months ago
|
||
Also seen in Fx 127 (release) and Fx 129 (nightly) on my Surface Pro X, mozregression pointing to same commit:
2024-06-20T11:47:50.299000: DEBUG : Found commit message:
Bug 1888628 - Extract the texture sampling logic out of ps_quad.glsl. r=gw
This simplifies the common infrastructure, removes two varyings from the common set and will allow other patterns to handle sampling differently if they need it (for example an upcoming repeating pattern).
In addition:
- the color parameter is always passed to the fragment shader (it used to be only when no uv_rect was passed).
- v_flags was reorganized a bit so that w is used by the common infrastructure and xyz are available for patterns to use.
Differential Revision: https://phabricator.services.mozilla.com/D206098
2024-06-20T11:47:50.316000: DEBUG : Did not find a branch, checking all integration branches
2024-06-20T11:47:50.353000: INFO : The bisection is done.
2024-06-20T11:47:50.373000: INFO : Stopped
Slightly different manifestation (screen shots will be attached):
- Visual display has "empty button", as shown by Screen capture from windows snipping tool
- Firefox's "Take Screenshot" shows correctly rendered button.
The process to get to the screens shown is:
- start logged out from GitHub
- view a Mozilla GitHub org to which you have SAML access (I used mozilla)
- log in to GitHub
- GitHub will redirect for SAML authentication <== this page will have the impacted button
NI: on :nical -- with your assistance, I can be remote hands for any debug work. (i.e. I'm willing, but not a strong windows tech)
Comment 10•6 months ago
|
||
This correctly shows how the screen appears to me.
Comment 11•6 months ago
|
||
This image is from Firefox' screenshot tool -- it shows how the image should be rendered on screen (but is not). See comment 9 for details.
Assignee | ||
Comment 12•6 months ago
|
||
Here are two new builds to try for anyone who can reproduce the bug:
- Build #3: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/asdKKhBvQUSyn6Tc5fZ_vw/runs/0/artifacts/public/build/target.zip (include the sampling code in ps_quad instead of ps_quad_textured)
- Build #4: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ZaDn34WbTHumG9jHTEiJyA/runs/0/artifacts/public/build/target.zip (setup the sampling parameter in ps_quad instead of ps_quad_textured)
Comment 13•6 months ago
|
||
Results following STR from comment 9:
- Build #3: -- broken
- Build #4: -- broken
Comment 14•5 months ago
|
||
This bug really has a horrible UX impact, perhaps not conveyed by the static shots. Here's a video of just trying to get to https://github.com/mozilla
:gw can we reconsider backing out this regression? 4 mitigation builds haven't helped sofar. (re comment 4)
Comment 15•5 months ago
|
||
Probably best to check with Nical whether it's feasible to back out temporarily.
The problem is that this appears to be exposing a driver bug specific to the GPU driver on Surface models, triggering some kind of bug in the driver shader compiler most likely (so it's not widespread or possible to repro on other systems).
Because we are planning to do a lot of work and changes around these shaders in the next quarter, it's very likely that we will hit this bug again and again, so we really do need to find out the root cause and how we can work around the driver bug, so that we can continue the future planned work here with confidence. I know that's not really helpful for you now, but just to give some background on why it's difficult / important to find the root cause.
Comment 16•5 months ago
•
|
||
[adding Surface Pro X and Adreno to the bug title for discoverability. (note that "Adreno" hasn't yet been mentioned here, but it seems to be implied by the fact that this is Surface Pro X, which only has Adreno graphics hardware I think.) We've got a bunch of other recently-filed bugs that are likely the same issue, some of which are loosely grouped as see-also on bug 1903048. At least three of them -- Bug 1852440, bug 1898164, bug 1904190 -- currently mention Adreno or Surface Pro X in their titles; and bug 1903048 looks likely to be Surface Pro X with Adreno GPU as well. I suspect these are all the same issue as this one and could be duped if/when we're confident about that.]
Comment 17•5 months ago
|
||
in support of comment 16, I am running an Adreno 680 -- I'm happy to provide remote access for more intense debug if that would help.
GPU #1
Active Yes
Description Qualcomm(R) Adreno(TM) 680 GPU
Vendor ID 0x5143
Device ID 0x043a
Driver Version 27.20.2060.0
Driver Date 1-24-2024
Drivers qcdx11arm64xum8180 qcdx11arm64xum8180 qcdx11arm64xum8180 qcdx12arm64xum8180
Subsys ID 00008180
RAM 0
Comment 18•5 months ago
|
||
Hi there. Thanks all for making Firefox, which I really love.
I had a close look at the commit that might have caused the regression, at https://phabricator.services.mozilla.com/D206098
I noticed one line of code that was in the old version, but not in the new version:
gfx/wr/webrender/res/ps_quad.glsl:332: v_color = vec4(1.0);
does not have an equivalent on the right hand side. Everything else looks fine?
N.B. I don't actually have any way of compiling or testing; just spent a little bit of time eyeballing the code.
Assignee | ||
Comment 19•5 months ago
|
||
Comment 20•5 months ago
|
||
Comment 21•5 months ago
|
||
bugherder |
Updated•5 months ago
|
Updated•5 months ago
|
Comment 22•5 months ago
|
||
Please request release approval on this so we can get it uplifted for an RC respin or future dot release.
Comment 23•5 months ago
|
||
The commit in comment 21 DOES NOT FIX the issue following the STR in comment 9. Tested using build id 20240702092648
, which should contain the commit per hg.m.o
Assignee | ||
Comment 24•5 months ago
|
||
This appears to work around the glitch on Windows + Adreno. It is unclear why. The fact that this so far only reproduced on windows on arm looks like a driver issue, however from the workaround it looks like there might be a confusion between the textured and non-textured code paths with composite quads.
Updated•5 months ago
|
Assignee | ||
Updated•5 months ago
|
Comment 25•5 months ago
|
||
Thanks all, looking forward to seeing the fix.
The Qualcomm driver on Windows, as I understand it, is one of those super-modern types that only support D3D12 -- it uses the compatibility layers D3D11on12 and D3D9on12 (both built into Windows?); and OpenCLOn12 (downloadable from the Microsoft Store as the "OpenCL™, OpenGL®, and Vulkan® Compatibility Pack") to translate OpenCL/OpenGL/Vulkan calls.
Presumably ANGLE is calling the D3D11 compatibility layer. So as well as the driver, there's a lot of graphics stack involved...
Comment 26•5 months ago
|
||
Comment 27•5 months ago
|
||
The patch landed in nightly and beta is affected.
:nical, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox128
towontfix
.
For more information, please visit BugBot documentation.
Comment 28•5 months ago
|
||
bugherder |
Comment 29•5 months ago
|
||
Hi Hal, are things better with current Nightly builds?
Comment 30•5 months ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #29)
Hi Hal, are things better with current Nightly builds?
Yes, the "missing button" issue is resolved (this bug, STR in comment 9). Also, visual anomalies (comment 14) have no longer lead to "missing feedback", and are merely annoyances (not this issue).
Assignee | ||
Comment 31•5 months ago
|
||
This appears to work around the glitch on Windows + Adreno. It is unclear why. The fact that this so far only reproduced on windows on arm looks like a driver issue, however from the workaround it looks like there might be a confusion between the textured and non-textured code paths with composite quads.
Assignee | ||
Comment 32•5 months ago
|
||
Comment on attachment 9410380 [details]
Bug 1897444 - Use opaque black as the quad base color for clear patterns. r=#gfx-reviewers
Beta/Release Uplift Approval Request
- User impact if declined: A lot of glitches on Windows on arm (adreno) if dark mode is enabled.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The patch is in nightly and beta, it was verified on nightly.
- String changes made/needed: None.
- Is Android affected?: No
Assignee | ||
Updated•5 months ago
|
Assignee | ||
Comment 33•5 months ago
|
||
Comment on attachment 9412277 [details]
Bug 1897444 - Pass base color in add_composite_prim, then ignore it in the shader (release). r=gw
Beta/Release Uplift Approval Request
- User impact if declined: Lots of glitches on windows on arm (adreno) if dark mode is enabled.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The patch is rather simple. It is currently in nightly and beta, we can wait for a bit before uplifting if we need more confidence.
- String changes made/needed:
- Is Android affected?: Yes
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Without this, the browser is barely usable on the affected configuration if dark mode is enabled.
- User impact if declined:
- Fix Landed on Version: 129
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The patch is rather simple. It is currently in nightly and beta, we can wait for a bit before uplifting if we need more confidence.
Assignee | ||
Comment 34•5 months ago
|
||
I should have been clearer in the uplift requests: I think that the fix is worth uplifting all the way to 128. But I don't think we are in a hurry to do it, so letting it bake in nightly and beta for a few more days before doing uplifting to give us a better chance to catch potential regressions would be good thing.
Comment 35•5 months ago
|
||
Comment on attachment 9412277 [details]
Bug 1897444 - Pass base color in add_composite_prim, then ignore it in the shader (release). r=gw
Approved for 128.1esr.
Comment 36•5 months ago
|
||
uplift |
Updated•5 months ago
|
Comment 38•5 months ago
|
||
Comment on attachment 9412277 [details]
Bug 1897444 - Pass base color in add_composite_prim, then ignore it in the shader (release). r=gw
Approved for 128.0.2.
Updated•5 months ago
|
Comment 39•5 months ago
|
||
uplift |
Comment 40•5 months ago
|
||
Added to the 128.0.2 relnotes:
Fixed visual glitches when dark mode is enabled in Windows ARM devices.
Description
•