Screen glitches - black squares (Win10/Intel HD Graphics 4600) (picture-caching)
Categories
(Core :: Graphics: WebRender, defect, P2)
Tracking
| Release | Tracking | Status |
|---|---|---|
| firefox-esr60 | --- | unaffected |
| firefox-esr68 | --- | unaffected |
| firefox68 | --- | unaffected |
| firefox69 | + | verified |
| firefox70 | + | verified |
People
(Reporter: apavel, Assigned: gw)
References
(Blocks 1 open bug)
Details
(Keywords: correctness, regression)
Attachments
(10 files)
| Size | Type | Flags |
|---|---|---|
| 32.58 KB | text/plain | |
| 8.07 MB | video/mp4 | |
| 26.01 KB | image/png | |
| 2.04 MB | image/x-kde-raw | |
| 2.02 MB | image/x-kde-raw | |
| 35.19 KB | patch | |
| 924 bytes | text/html | |
| 1.08 KB | patch | |
| 47 bytes | text/x-phabricator-request | pascalc: approval-mozilla-beta+ |
| 47 bytes | text/x-phabricator-request | pascalc: approval-mozilla-beta+ |
I use Firefox Nightly 69.0a1 (2019-06-29) (64-bit) and sometimes these glitchy black squares appear on my screen.
I made a recording the first time I noticed it: https://send.firefox.com/download/289c7cbb722314e5/#dz5C3kXEnmXvSP2SdZxUBA (recording expires after 100 downloads).
**I apologize for the audio.
Comment 1•6 years ago
|
||
Which platform are you on? Could you attach your about:support?
Black squares sound like a graphics issue.
Also, any chance you could attach the recording to the bug? I haven't figured out how to download it, maybe it expired already somehow?
| Reporter | ||
Comment 2•6 years ago
|
||
| Reporter | ||
Comment 3•6 years ago
|
||
| Reporter | ||
Comment 4•6 years ago
|
||
Posted the info you requested above. The raw data is from about:support.
In the video, the glitches occur with 47 seconds remaining and again with 7 seconds remaining.
Comment 5•6 years ago
|
||
So you have WR enabled, and this looks like some kind of graphics corruption, so I'm going to move it there for now since it seems more likely to be the culprit. Glenn, do you know if you've seen something like this or some recent change that could've introduced this?
| Assignee | ||
Comment 6•6 years ago
|
||
I have seen some black screen glitches when running with ANGLE force disabled. I'm investigating those at the moment as part of some research into running WR on top of SwiftShader. However, they look very different to the glitches in the video above, so I think it's unlikely they are the same issue.
I've never seen any glitches similar to the ones in this video.
If I'm reading the support log correctly, this is on an Intel HD4600 GPU - is that right? It looks like the associated driver is dated 10-16-2017 - I wonder if there is an updated driver available, which we could install just to get an idea of whether this is a driver-related bug? It's a (relatively) old GPU, so it's possible there aren't any newer drivers available for it either.
Comment 7•6 years ago
|
||
(In reply to Andreea Pavel [:apavel] from comment #3)
At 1m9s, after you've clicked on Save, an X icon disappears in the top-right corner (which looks similar to bug 1558107) at the same time the large corruption on the left appears. The icon didn't disappear the other times you've clicked on Save.
Comment 8•6 years ago
|
||
Andreea first observed this on June 12th or 13th. The processor is an i5-4590S, and the latest driver version, 15.40.42.5063, is from 2019-03-19: https://downloadcenter.intel.com/product/97500/Graphics-for-4th-Generation-Intel-Processors But driver updates are managed by admins, so an update for testing might not be quick.
I talked about this issue with Matt at the start of the work week; he suggested letting Ryan know if it happens again.
Comment 9•6 years ago
|
||
WebRender isn't in my area of expertise so I'm not sure what's going on here. I don't think I'll have time to look at it further any time soon, either.
| Assignee | ||
Comment 10•6 years ago
|
||
I haven't had any luck reproducing this locally. Jeff, would you or someone else in Toronto be able to test on one of the Toronto machines with this hardware configuration?
Comment 11•6 years ago
|
||
I tried reproducing this on an HD4400 with an older driver. I didn't see the issue. Andreea, can you see if you can reproduce the issue with gfx.webrender.picture-caching set to false?
| Reporter | ||
Comment 12•6 years ago
|
||
Hi Jeff. I've set gfx.webrender.picture-caching to false and will see what happens.
I'll post the result at the end of the shift (~11h)
Comment 13•6 years ago
|
||
Andreea no longer sees the issue after switching the pref.
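For anyone repeating this check: the pref can be flipped in about:config, or pinned across restarts with a user.js file in the profile directory. Below is a minimal sketch that renders such a user.js line. The helper name is hypothetical; only the pref name and value come from this bug.

```python
def user_pref_line(name: str, value) -> str:
    """Render one user.js line for a bool, int, or string pref.

    Hypothetical helper for illustration; not part of Firefox.
    """
    # bool must be checked before int, since bool is a subclass of int.
    if isinstance(value, bool):
        rendered = "true" if value else "false"
    elif isinstance(value, int):
        rendered = str(value)
    else:
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    return f'user_pref("{name}", {rendered});'


if __name__ == "__main__":
    # The setting tested in comments 11-13.
    print(user_pref_line("gfx.webrender.picture-caching", False))
```

The printed line can be appended to user.js; removing it and resetting the pref in about:config restores the default.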
Comment 14•6 years ago
•
|
||
Debian Testing, KDE, X11, Macbook Pro A1502, Intel Iris 6100 (Broadwell GT3)
A few moments ago I saw something similar I haven't seen before in this form. The left part of a website's light-grey background suddenly became transparent and revealed my desktop background. Elements of a fixed navigation (that do not scroll with the page) and the fixed header were still flawlessly painted on top of it. Circumstances: Two open windows, heat and blowing fans. Unfortunately no screenshot.
According to comment 2 Andreea also had two windows open, maybe it was too stressful?
Comment 15•6 years ago
|
||
Bug 1559688 fixed a horrible graphics corruption (bug 1565297).
Comment 16•6 years ago
|
||
[Tracking Requested - why for this release]: Display artifacts
| Assignee | ||
Comment 18•6 years ago
|
||
On a local Win10 + Intel HD530 machine, I can see black squares if I set gfx.webrender.force-angle to be false (which runs Gecko through the native GL driver instead of ANGLE/D3D), although I can't reproduce without that setting.
The black squares disappear if I disable picture caching in this configuration.
It might be a red herring (there are also other artifacts visible in this mode), but I will investigate this configuration and see if I can identify the cause, as it may be the same underlying issue.
| Assignee | ||
Comment 19•6 years ago
|
||
I made some progress on this today.
I managed to reduce the test case I have down to a single rectangle, followed by a single border.
Drawing the border on the 2nd to last tile results in NaN in the VS outputs gl_Position and some of the interpolators (at least, according to RenderDoc).
From what I can tell, this seems related to the textureSize call in the vertex shader in brush_image.
Specifically, if I replace that code with:
texture_size = textureSize(sColor0, 0);
texture_size = vec2(512.0);
Then the following occurs:
- If both lines are present, the bug occurs.
- Commenting out just the first line, bug does not occur.
- Commenting out just the second line, bug does occur.
Needs more investigation tomorrow to try and narrow this down further, and see if it is indeed related to the same symptoms under ANGLE.
| Assignee | ||
Comment 20•6 years ago
|
||
I wrote a patch that removes all textureSize usage, replacing them with uniforms. Unfortunately the black squares on native GL are still appearing with those calls removed.
Trying to reproduce another capture with that patch, to see if RenderDoc reports anything else strange.
| Assignee | ||
Comment 21•6 years ago
|
||
Small progress on this today - it does seem to be somehow related to brush_image and/or sampling from an array texture. Tomorrow I'll continue investigating, and also try out some different hardware / driver variations to see if I can get a better repro case.
| Assignee | ||
Comment 22•6 years ago
|
||
A little bit more progress. I can now reproduce the bug as originally described, when running under ANGLE.
It occurs fairly commonly, but randomly enough to make it difficult to capture. I would estimate I see the glitch for one frame every few minutes of browsing on my local configuration.
I managed to capture a trace file in apitrace when the glitch occurred. The glitch shows up in the thumbnails view for the frame, but doesn't appear when I replay each draw call individually.
I was unable to capture the glitch in PIX, GPA or RenderDoc.
Next step - continue investigating the apitrace capture, and try to work out what looks different in the command stream on the frame the glitch occurs.
| Assignee | ||
Comment 23•6 years ago
|
||
| Assignee | ||
Comment 24•6 years ago
|
||
I managed to get a capture of the problem in RenderDoc, but I'm struggling to see what's going on. Perhaps someone else can make sense of these attached captures? Context below:
- In the attached image, you can see the RenderDoc thumbnails of two consecutive frames. In each of the thumbnails, the glitch is apparent (one picture cache tile is corrupted).
- If I open the first capture (frame 669) in RenderDoc, the output at the end of the frame looks correct; it's only the thumbnail that has the glitch. This frame is the one that draws that tile into the picture cache texture array.
- If I open the second capture (frame 670) in RenderDoc, the glitch is shown in the final output image.
- Looking at the content of the picture cache texture array slice in question, on frame 669 it looks correct. In frame 670, the content of the picture cache tile looks corrupted in the texture array. So it's not the drawing of the tile, it's the actual content of the tile that appears to be wrong.
- As far as I can tell, nothing alters that texture slice after it's written to. Pixel history in RenderDoc doesn't seem to reveal anything - it looks like it gets written to in frame 669, and then just read from in frame 670. The corruption in the tile is odd: it looks like something writes to it with some kind of blend, since the AA on the rounded corners of the image appears to be different.
Questions:
- Could the texture array be getting modified between frames in a way that RenderDoc can't see (or is there some kind of blit / resize of the array that I missed in the RenderDoc trace)?
- Since we can see the glitch in a RenderDoc capture of the D3D command stream, that seems unlikely to be an ANGLE bug? Although I guess it still could be...
- Is the way we use render targets / textures causing some kind of race condition / undefined behavior?
- Does this make sense at all? Are we most likely looking at a driver bug?
| Assignee | ||
Comment 25•6 years ago
|
||
| Assignee | ||
Comment 26•6 years ago
|
||
| Assignee | ||
Comment 27•6 years ago
|
||
Dzmitry, Nical, Jeff, any ideas on https://bugzilla.mozilla.org/show_bug.cgi?id=1562462#c24?
Comment 28•6 years ago
|
||
I had a look at the GPU captures...
TL;DR: I found no evidence that either we or ANGLE are doing anything wrong. The D3D11 command stream is reasonable. Looks like a driver bug so far.
> Could the texture array be getting modified between frames in a way that RenderDoc can't see?
AFAIK, RenderDoc captures all commands from one Present() to another. There should be no gaps.
> (or is there some kind of blit / resize of the array that I missed in the RenderDoc trace)
Both frames have 20 slices, so no resize is taking place.
> Since we can see the glitch in a RenderDoc capture of the D3D command stream, that seems unlikely to be an ANGLE bug? Although I guess it still could be...
Right. We'd see the problem in D3D commands if it was ANGLE.
> Does this make sense at all? Are we most likely looking at a driver bug?
I confirm your observations to be correct.
Suggestions:
- investigate the non-ANGLE issue further, confirm if it's related/unrelated.
- play with parameters: tile size, picture cache texture format (e.g. https://phabricator.services.mozilla.com/D21965 has it as RGBA8), blit tiles instead of drawing, etc.
- force the DX11 debug runtime (i.e. using the DX control panel) and run with NSDebugView attached to see the DX11 runtime debug messages; try to associate any with the glitch, if caught.
Comment 29•6 years ago
•
|
||
https://mozilla.logbot.info/gfx/20190726#c16497308-c16497309
pseudo-free texture memory
For me, problems of bug 1565809 so far only appeared after longer active usage with many tabs.
And in the background, Thunderbird is always running with WebRender enabled, but behaves well and never shows any bugs.
| Assignee | ||
Comment 30•6 years ago
|
||
I tried an experiment today to remove the use of texture arrays for picture caching, wondering if the texture arrays were the cause of this (apparent) driver bug. With the attached patch, picture cache tiles are allocated and stored as blocks inside a normal 2D texture / render target.
Unfortunately, the bug persists even with this patch applied.
This seems to suggest it's a rendering issue with the content. Next steps I am going to try:
- Investigate z-buffer values and any potential z-accuracy problem.
- Investigate opaque / alpha pass differences and see if the problem still occurs with z/opaque optimizations disabled.
| Assignee | ||
Comment 31•6 years ago
|
||
Well, this is strange. Having pulled the latest code today, I can now no longer reproduce the bug locally.
Is anyone else able to confirm in the next nightly if it's still occurring for them? Or does anyone know of any patches that have landed recently which might be related?
| Assignee | ||
Comment 32•6 years ago
|
||
Notes from today:
- Managed to get a reliable repro again. Seems to sometimes stop happening for short periods of time.
- Tried to see if the bug occurs under various API scenarios:
  - Force enable WARP - bug does not occur.
  - Force native GL - bug does not occur.
  - Only seems to occur when running ANGLE + D3D11.
- Seems to depend on a high(ish) frame rate to occur. Possible race condition etc. This might explain why it doesn't occur with WARP enabled.
- Tried disabling all z-buffer / opaque optimizations. Force everything through the alpha pass. Bug still occurs, although seems less frequent.
- Managed to create a very simple HTML test case that can reliably reproduce the bug each run (although only on random frames every 20 seconds or so). Manifests as the solid rectangles (the background and div in the test case) failing to draw and/or drawing with invalid geometry. So far unable to capture this test case in RenderDoc - possibly slows the frame rate down enough to prevent the bug occurring.
- The above things are making me wonder if there is a WR / ANGLE / driver issue with a buffer that gets mapped and discarded / overwritten incorrectly, causing stale data to be read from a vertex texture and/or vertex/index buffer. I've tried a few hacks in WR and ANGLE to experiment with this, haven't found anything yet. It does seem like a plausible explanation for the above results though. More investigation into this tomorrow.
Comment 33•6 years ago
|
||
I believe that NI? was for me.
Observed some black squares while testing WR on Beta last week. No reliable repro steps though; it mostly happens when videos or pages load quickly. Sometimes I can reproduce, sometimes it doesn't happen at all.
Could you provide that test case for me too, Glenn?
Meanwhile, I asked Andreea Pavel if she can still see this issue on the latest Nightly with WebRender enabled. Waiting for her results and will update here.
| Assignee | ||
Comment 34•6 years ago
|
||
Added a test case that reproduces the bug on the specific hardware. On my setup, it reliably reproduces, but may only happen one frame every minute or so.
| Assignee | ||
Comment 35•6 years ago
|
||
I made some progress on this today.
The bug appears to be related to the way we resize the vertex data textures (primitive headers, render task data etc). As far as I can tell, WR is behaving correctly here. I suspect there is a bug either in ANGLE or the underlying D3D driver that is occurring when a texture is deleted.
I verified that if I remove the calls to delete the vertex data textures when creating a new one, I can no longer reproduce the bug (of course, this leaks textures so isn't a proper solution).
I tried a solution where we use a texture pool, but I think I still saw the bug occur very occasionally, even with a pool size of 32. However, it definitely reduces the frequency of the bug very significantly, so seems to be on the right track.
Tomorrow, I will try to narrow this down further and find a reasonable workaround.
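The pooling experiment described above can be sketched roughly as follows. This is an illustrative model, not the WebRender patch: the class, its method names, and the integer texture ids are all hypothetical. Only the idea (retire the old vertex data texture into a bounded pool instead of deleting it on resize) and the pool size of 32 come from the comment.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass
class Texture:
    id: int
    width: int
    height: int


class VertexTexturePool:
    """Hypothetical pool that delays deletion of resized textures."""

    def __init__(self, max_retired: int = 32):
        self._ids = count(1)
        self._retired = deque()   # oldest retired texture on the left
        self._max_retired = max_retired
        self.deleted = []         # ids that were really deleted (for inspection)

    def resize(self, old, width: int, height: int) -> Texture:
        """Return a texture of the requested size and retire `old`."""
        new = self._take(width, height)
        if old is not None:
            self._retired.append(old)
            # Only once the pool overflows is the oldest texture deleted.
            while len(self._retired) > self._max_retired:
                self.deleted.append(self._retired.popleft().id)
        return new

    def _take(self, width: int, height: int) -> Texture:
        # Reuse a retired texture of matching size if one exists.
        for tex in self._retired:
            if (tex.width, tex.height) == (width, height):
                self._retired.remove(tex)
                return tex
        return Texture(next(self._ids), width, height)


if __name__ == "__main__":
    pool = VertexTexturePool(max_retired=32)
    a = pool.resize(None, 1024, 16)
    b = pool.resize(a, 1024, 32)   # a is retired, not deleted
    c = pool.resize(b, 1024, 16)   # a is handed back out
    print(c is a, pool.deleted)
```

A pool like this only lowers how often a texture is actually deleted, which would be consistent with the bug becoming rarer rather than disappearing.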
Comment 36•6 years ago
|
||
Checked the test case on latest Nightly and Beta and I can't reproduce the glitches.
Andreea mentioned she can't reproduce the glitches anymore after enabling WebRender on the latest Nightly. This was after 4h of work with WR enabled. She will ping me in case it happens again.
We both have Intel 4600.
| Assignee | ||
Comment 37•6 years ago
|
||
I'm still able to reproduce the bug on this machine, both with the test case and on real pages.
I'm reasonably convinced it is a driver bug now. D3D debug runtime doesn't detect any issues, even with GPU validation enabled.
ANGLE has a setDataFasterThanImageUpload option in the D3D11 workarounds struct, which changes the texture upload to not use UpdateSubresource. This is defaulted to off for D3D11-class hardware, and on for D3D9-class hardware.
If I switch that workaround on, the bug seems to disappear! I can't say with 100% certainty it fixes it, due to the nature of the bug. However, without this change, I would typically see the bug at least a few times per minute. Whereas, with this fix I browsed for ~30mins without seeing any glitches.
The attached patch sets this workaround for any Intel + Haswell combinations, since it occurs even with the most recent driver update.
Jeff and Jeff, what are your thoughts on such a fix? Would a workaround like this be accepted upstream? Could we apply it to our local ANGLE, even if just in the interim while we do more research on the problem at a lower priority?
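The gating described in this comment could look roughly like the sketch below. It is a model in Python, not the ANGLE patch (ANGLE itself is C++): the `Adapter` type and function name are hypothetical, and the two device IDs are an assumption about which PCI IDs correspond to HD Graphics 4600 that should be checked against a PCI ID database. The default (off for D3D11-class, on for D3D9-class hardware) is taken from the comment.

```python
from dataclasses import dataclass

INTEL_VENDOR_ID = 0x8086  # PCI vendor ID for Intel

# ASSUMPTION: device IDs believed to be Haswell HD Graphics 4600
# (desktop and mobile). Verify before relying on them.
HASWELL_HD4600_DEVICE_IDS = frozenset({0x0412, 0x0416})


@dataclass(frozen=True)
class Adapter:
    vendor_id: int
    device_id: int
    d3d9_class: bool  # True for D3D9-class hardware


def set_data_faster_than_image_upload(adapter: Adapter) -> bool:
    """Decide whether the alternative upload path should be enabled."""
    if adapter.d3d9_class:
        return True  # default described in the comment
    # Workaround: also enable it for the affected Intel + Haswell GPUs.
    return (
        adapter.vendor_id == INTEL_VENDOR_ID
        and adapter.device_id in HASWELL_HD4600_DEVICE_IDS
    )


if __name__ == "__main__":
    print(set_data_faster_than_image_upload(Adapter(0x8086, 0x0412, False)))
```

Keying the workaround on vendor and device rather than driver version matches the observation that the bug occurs even with the most recent driver.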
| Reporter | ||
Comment 38•6 years ago
|
||
(In reply to Timea Babos [on PTO until 19th Aug - ni? Brindusa Tot] from comment #36)
> Checked the test case on latest Nightly and Beta and I can't reproduce the glitches.
> Andreea mentioned she can't reproduce the glitches anymore after enabling WebRender on the latest Nightly. This was after 4h of work with WR enabled. She will ping me in case it happens again.
> We both have Intel 4600.
Since Timea is on PTO now, I'll write here. It's been ~1h since I got to work and this has started occurring again. WebRender is enabled.
Comment 39•6 years ago
|
||
Comment 40•6 years ago
|
||
Comment 41•6 years ago
|
||
Comment 42•6 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/fe1d262c0542
https://hg.mozilla.org/mozilla-central/rev/d7f116f6262f
| Assignee | ||
Comment 43•6 years ago
|
||
Jeff, can we get this uplifted to beta?
| Assignee | ||
Comment 46•6 years ago
|
||
Comment on attachment 9082502 [details]
Bug 1562462 - ANGLE Cherry-pick: Fix occasional corruption of vertex textures in HD4600 GPUs for WebRender.
Beta/Release Uplift Approval Request
- User impact if declined: Users with WebRender on Haswell chipsets will see black flickering.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): It's a very small patch that enables a tested workaround path inside the ANGLE library we use.
- String changes made/needed:
Comment 47•6 years ago
|
||
Comment on attachment 9082502 [details]
Bug 1562462 - ANGLE Cherry-pick: Fix occasional corruption of vertex textures in HD4600 GPUs for WebRender.
Low risk patch for a 69 graphics regression, uplift approved for 69 beta 12, thanks.
Comment 48•6 years ago
|
||
| bugherder uplift | ||
Comment 49•6 years ago
|
||
Hi, I tried to reproduce this issue using the test case from Comment 34 on various older Beta and Nightly builds, without success. I also tested the latest Nightly and Beta 69.0b12, and the issue does not occur there either.
I tested this on a Windows 10 machine with Intel HD Graphics 4600.
Andreea, you mentioned that you reproduced the issue 8 days ago. Can you please recheck when you come back from PTO?
| Reporter | ||
Comment 50•6 years ago
|
||
Hi everybody, I just got back from PTO. I had no issues this shift and could not reproduce this anymore.
| Reporter | ||
Comment 51•6 years ago
|
||
I have tested today on Firefox Quantum 69.0b16 (64-bit) and the issue no longer reproduces.
Comment 52•6 years ago
|
||
Hi, based on Comment 50 and Comment 51 it seems this issue no longer occurs, so I will update the flags for this issue. Thank you, Andreea.
Comment 53•6 years ago
|
||
Recently (for at least a few weeks) while browsing websites, I started to notice drawing problems, best described as black rectangular areas [1], in Firefox (v68 for sure). They now seem to be fixed in Firefox v69 (only tested for 30 minutes so far). I thought I should mention that v68 on Intel HD 530 may have been affected as well.
After disabling hardware acceleration (Options -> Performance -> [ ] Use recommended performance settings -> [ ] Use hardware acceleration when available, or layers.acceleration.disabled=true in about:config) and restarting the browser, the black rectangles no longer appear.
System specs: i5-6600 (Intel HD 530 graphics), W10-1903-x64 (with the latest security patches as of 2019-08), Firefox 64-bit, dual screen (1x 1920x1200, 1x 1600x1200). I first tried upgrading from an older Intel graphics driver to the newest one, 26.20.100.7000, without success.
[1]
- a permanent small black rectangle at the top left of the title bar
- occasional drawing problems (the browser usually needs to run for a few minutes to hours before I notice them): black areas on displayed websites (these can get as big as the whole window, except the title bar if I remember correctly); minimizing and restoring the browser window immediately restores the displayed website (no more black areas)
Comment 54•6 years ago
|
||
It's quite unfortunate that we switched all the texture uploads to this path. As far as I can see, ANGLE doesn't do any ring buffering and GPU tracking for the staging texture area, so it just tries to map it every time we update the contents (on that slow path), which means there is a forced stall for the GPU. The bug is resolved, but we can't reasonably ship anything that uses this slow path (see bug 1576637).
The "proper" solution here would be to have an entirely new texture uploading path, either by manually ring-buffering the staging textures, or invoking the GPU scatter (like we can do for GPU cache today). But before we go there (and it's arguably a significantly complex affair), it would be good to constrain the problematic domain:
- is it relevant that the VS stage uses the textures?
- is the texture format relevant? i.e. does it only happen for RGBA32F, or also for other formats?
Glenn, do you think we could run a series of experiments to narrow down the issue?
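To illustrate why ring-buffering the staging area avoids the forced stall, here is a small simulation. It is a sketch under stated assumptions, not ANGLE or WebRender code: the `StagingRing` class, the fence numbering, and the fixed GPU lag are all hypothetical. Each slot remembers the fence of the last frame that used it, and a stall is counted only when that fence has not completed yet.

```python
class StagingRing:
    """Hypothetical ring of staging slots with per-slot fence tracking."""

    def __init__(self, slots: int):
        self._fence_of_slot = [0] * slots  # fence that must complete before reuse
        self._next = 0
        self.stalls = 0

    def acquire(self, completed_fence: int) -> int:
        slot = self._next
        if self._fence_of_slot[slot] > completed_fence:
            # A real implementation would block on the GPU here.
            self.stalls += 1
        self._next = (slot + 1) % len(self._fence_of_slot)
        return slot

    def submit(self, slot: int, fence: int) -> None:
        self._fence_of_slot[slot] = fence


def simulate(slots: int, frames: int, gpu_lag: int) -> int:
    """Count stalls when the GPU finishes frames `gpu_lag` frames late."""
    ring = StagingRing(slots)
    for frame in range(1, frames + 1):
        completed = max(0, frame - 1 - gpu_lag)
        slot = ring.acquire(completed)
        ring.submit(slot, frame)
    return ring.stalls


if __name__ == "__main__":
    print("1 slot :", simulate(slots=1, frames=10, gpu_lag=2))
    print("3 slots:", simulate(slots=3, frames=10, gpu_lag=2))
```

With a single staging slot every upload after the first waits on the GPU; with as many slots as frames in flight, none do.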
| Assignee | ||
Comment 55•6 years ago
|
||
I'm not sure I understand the question - do you mean experiments with telemetry? Or running tests on various hardware in Toronto? Or something else?
Comment 56•6 years ago
|
||
Glenn, I mean experiments on your machine, given that you were able to consistently reproduce the issue and investigate it.
| Assignee | ||
Comment 57•6 years ago
|
||
OK, sure - I don't use that as my main development machine now, so we can freely use it to run whatever experiments and tests we want to.
Comment 58•6 years ago
|
||
Jeff, do we have someone who can reproduce this?
This try has the workaround disabled, so that we can confirm the repro case: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1cb16ffdce16c234677f8b95918d6aa41c90660
This try has the workaround restricted to 128bit formats: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d12a8005374cdcc90c371450886b20a872fefca8
If the last one works well, I'll try to upstream it to ANGLE.
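For reference, "128bit formats" here means texel formats of 128 bits per pixel, such as the RGBA32F mentioned in comment 54. A minimal sketch of that restriction follows; the format table and function names are illustrative assumptions, not the ANGLE implementation or its format enum.

```python
# (channels, bits per channel) for a few common formats.
# Illustrative table only; names do not map to a real API enum.
FORMAT_LAYOUT = {
    "R8": (1, 8),
    "RGBA8": (4, 8),
    "RGBA16F": (4, 16),
    "RGBA32F": (4, 32),
}


def bits_per_pixel(fmt: str) -> int:
    channels, bits = FORMAT_LAYOUT[fmt]
    return channels * bits


def use_workaround_path(fmt: str, affected_gpu: bool) -> bool:
    """Take the slower upload path only for 128-bit formats on affected GPUs."""
    return affected_gpu and bits_per_pixel(fmt) == 128


if __name__ == "__main__":
    for fmt in FORMAT_LAYOUT:
        print(fmt, bits_per_pixel(fmt), use_workaround_path(fmt, True))
```

Restricting the workaround this way would keep ordinary RGBA8 image uploads on the fast path while still covering the float vertex data textures.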
Comment 59•6 years ago
|
||
This is still a problem in Firefox 72.