Heavy flickering of WebGL background when webrender is on and canvas repeatedly added and removed
Categories
(Core :: Graphics: WebRender, defect, P2)
Tracking
()
People
(Reporter: karlcow, Assigned: sotaro)
References
(Blocks 1 open bug, )
Details
(Keywords: regression)
Attachments
(2 files, 2 obsolete files)
This was initially reported on https://webcompat.com/issues/31728
May trigger epilepsy? Flickers quite rapidly.
This only happens when WebRender is enabled, and only with the background with the rotating cubes.
Bottom frame is normal, top (with black background) is abnormal.
Updated•6 years ago
|
Comment 1•6 years ago
|
||
I don't see this with WebRender on Mac on 69
Comment 2•6 years ago
|
||
Does this show up in 67?
Reporter | ||
Comment 3•6 years ago
|
||
I don't know for 67. I was not able to reproduce on mac either. But the report in https://webcompat.com/issues/31728 was someone reporting on Windows 10 with Firefox 69 and it was reproduced by one of our tester.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #2)
Does this show up in 67?
I was able to reproduce the issue on Dell XPS 15 9550 running windows 10 on 67 with WR enabled.
Updated•6 years ago
|
Comment 5•6 years ago
|
||
It looks like this is a probably a problem with WebGL/WebRender interaction.
Updated•6 years ago
|
Assignee | ||
Comment 6•6 years ago
|
||
I was able to reproduce the problem on my Win 10 laptop P50.
Assignee | ||
Comment 7•6 years ago
|
||
When the problem happened, graphic section of about:support had the following Failure Log. The error was returned without waiting at AcquireSync().
GP+[GFX1]: RenderDXGITextureHostOGL AcquireSync timeout, hr=0x80070057
With local build RenderDXGITextureHostOGL::EnsureLockable() also failed because of device->OpenSharedResource() failure.
https://searchfox.org/mozilla-central/source/gfx/layers/d3d11/TextureD3D11.cpp#758
Assignee | ||
Comment 8•6 years ago
|
||
SurfaceFactory creation and destruction happened very often. It might be related to the problem.
Assignee | ||
Comment 9•6 years ago
•
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #8)
SurfaceFactory creation and destruction happened very often. It might be related to the problem.
It happened since WebRenderCanvasData was destroyed/created very often in WebRenderCommandBuilder::BuildWebRenderCommands() by mLastCanvasDatas.Clear();
https://searchfox.org/mozilla-central/source/gfx/layers/wr/WebRenderCommandBuilder.cpp#1542
It did not happen during normal WebGL usage.
Assignee | ||
Comment 10•6 years ago
|
||
SharedSurface_ANGLEShareHandle is wrapped by SharedSurfaceTextureClient. In normal situation, SharedSurface_ANGLEShareHandle is not destroyed before end of usage by RenderDXGITextureHostOGL. But when SurfaceFactory is destroyed, TextureClient::CancelWaitForRecycle() is called. It stop to wait end of the resource usage by RenderDXGITextureHostOGL.
https://searchfox.org/mozilla-central/source/gfx/layers/client/CanvasClient.cpp#502
https://searchfox.org/mozilla-central/source/gfx/gl/SharedSurface.cpp#279
Comment 11•6 years ago
|
||
Hi, I managed to get a regression range for this issue, here are the results:
Last known good build:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=f8e6e95065e943a2208675cabb0eb0cdb06eab93&tochange=c7a754017e70c0cddce73fcfbc4c3ba0f1aed97b
First Known Bad build:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=3e792c898f69fabe1bc2387f8c98631d2ec7b7c7&tochange=c7a754017e70c0cddce73fcfbc4c3ba0f1aed97b
I'm not sure but I think this is the changeset that caused the issue:
f8e6e95065e943a2208675cabb0eb0cdb06eab93 Andrei Oprea — Bug 1472599 - Reset browser.startup.page preference for older profiles r=k88hudson
Updated•6 years ago
|
Assignee | ||
Comment 12•6 years ago
|
||
Assignee | ||
Comment 13•6 years ago
|
||
Confirmed that attachment 9068649 [details] addressed the problem for me.
Assignee | ||
Comment 14•6 years ago
|
||
It is not good to use TextureFlags::DEALLOCATE_CLIENT, since TextureClient de-allocation becomes synchronous.
https://searchfox.org/mozilla-central/source/gfx/layers/client/TextureClient.cpp#424
https://searchfox.org/mozilla-central/source/gfx/layers/client/TextureClient.cpp#338
Assignee | ||
Comment 15•6 years ago
|
||
Actually we do not need to call TextureClient::CancelWaitForRecycle(). Without calling it, TextureClient is destroyed automatically, when host side ends to use it. Removing the CancelWaitForRecycle() call seems like a better solution.
Updated•6 years ago
|
Assignee | ||
Comment 16•6 years ago
|
||
Assignee | ||
Comment 17•6 years ago
|
||
Confirmed that Attachment 9068661 [details] addressed the problem for me.
Assignee | ||
Comment 18•6 years ago
|
||
Comment 19•6 years ago
|
||
Sotaro, do you know why the problem is happening on this page and we haven't otherwise seen this issue? How common do you expect this problem to be?
Comment 20•6 years ago
|
||
[Tracking Requested - why for this release]: We're not sure how widespread the problem is yet, but based on what we figure out we may want to consider disabling WebRender or doing a dot release.
Updated•6 years ago
|
Assignee | ||
Comment 21•6 years ago
•
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #19)
Sotaro, do you know why the problem is happening on this page and we haven't otherwise seen this issue? How common do you expect this problem to be?
I do not know why it happened. But nsHTMLCanvasFrame seemed to be re-created by NS_NewHTMLCanvasFrame() for every WebRenderLayerManager transaction. I am not sure how the problem is common, though I did not saw such symptom other than this page.
Comment 22•6 years ago
|
||
Indeed the page has the following code:
function f() {
document.getElementById(e).appendChild(t.domElement);
for (var d = 0; d < s.length; d++)
s[d].position.y += .1,
s[d].rotation.x += .02,
s[d].rotation.y += .02,
s[d].rotation.z += .02;
if (i % o == 0) {
var l = i % (2 * o) == 0 ? 1 : -1
, p = new THREE.BoxGeometry(a,a,a)
, u = new THREE.Mesh(p,m());
u.castShadow = !0,
u.position.x = -1 * c(10),
u.position.y = -40,
u.position.z = c(40) * l,
s.push(u),
r.add(u)
}
requestAnimationFrame(f),
t.render(r, n),
i++
}
This is removing and adding the canvas node every frame which is a bizarre thing to do.
Updated•6 years ago
|
Updated•6 years ago
|
Comment 23•6 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #15)
Actually we do not need to call TextureClient::CancelWaitForRecycle(). Without calling it, TextureClient is destroyed automatically, when host side ends to use it. Removing the CancelWaitForRecycle() call seems like a better solution.
If we remove CancelWaitForRecycle(), shouldn't we expect it to wait for recycle instead, instead of eagerly recycling the TextureClient? It doesn't seem to me like this should fix the issue here, so I either don't understand why the patch is changing what it does, or perhaps why we're getting flickering in the first place.
Assignee | ||
Comment 24•6 years ago
|
||
perhaps why we're getting flickering in the first place.
Flickering happened when SharedSurface_ANGLEShareHandle is destroyed before RenderDXGITextureHostOGL::EnsureLockable() is called on Render thread. RenderDXGITextureHostOGL failed at device->OpenSharedResource() . In this case, SharedSurface_ANGLEShareHandle failed to render. Then black was rendered.
SharedSurface was recreated mostly for every transaction. RenderDXGITextureHostOGLs were also created and the EnsureLockable() was called. Some times rendering of RenderDXGITextureHostOGL succeeded, sometimes rendering of RenderDXGITextureHostOGL was failed. When it succeeded, background was rendered as white. When it failed, background was rendered as black. Then flickering happened.
If we remove CancelWaitForRecycle(), shouldn't we expect it to wait for recycle instead, instead of eagerly recycling the TextureClient?
Actual stop recycling is already done by SurfaceFactory::StopRecycling(). TextureClient::CancelWaitForRecycle() does different thing. The function name is not good. With it TextureClient lose refcount that is held when host side is in use. Then if TextureClient::CancelWaitForRecycle() is called, TextureClient is destroyed even when TextureHost is still in use. If it happens, the above problem could happen.
If TextureClient::CancelWaitForRecycle() is not called, the refcount is kept until the host side ends its usage. The refcount is removed by CompositorBridgeChild::NotifyNotUsed()
https://searchfox.org/mozilla-central/source/gfx/layers/ipc/CompositorBridgeChild.cpp#840
With it, SharedSurface_ANGLEShareHandle is kept alive until host side ends its usage. Then the above problem does not happen.
Assignee | ||
Comment 25•6 years ago
|
||
Assignee | ||
Comment 26•6 years ago
|
||
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Comment 27•6 years ago
|
||
Comment 28•6 years ago
|
||
Comment 29•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/996a636778d0
https://hg.mozilla.org/mozilla-central/rev/59a62f52fe32
Comment 30•6 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #20)
[Tracking Requested - why for this release]: We're not sure how widespread the problem is yet, but based on what we figure out we may want to consider disabling WebRender or doing a dot release.
Tracking for 67 but do we know if this is actually a widespread problem or a problem affecting top sites?
Setting qe-verify+ keyword as I would like a manual verification that this is actually fixed on Nightly.
Thanks
Comment 31•6 years ago
|
||
(In reply to Pascal Chevrel:pascalc from comment #30)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #20)
[Tracking Requested - why for this release]: We're not sure how widespread the problem is yet, but based on what we figure out we may want to consider disabling WebRender or doing a dot release.
Tracking for 67 but do we know if this is actually a widespread problem or a problem affecting top sites?
The problem seems to happen if the site is recreating the canvas on every frame. I suspect this is quite uncommon. It's probably not common enough to ship in a dot release. It is probably worth adding something to the release notes though.
Comment 32•6 years ago
|
||
Hi again, This issue is verified as fixed in Firefox 69.0a1 (2019-06-03). I will mark it accordingly.
Comment 33•6 years ago
|
||
I agree with jrmuizel: This is a bug in how we handle this edgecase, but literally no one should be doing this, so we should probably let this ride the trains and engage with dev-evang to try to talk people out of doing this. Unless this is a huge site that can't migrate away from this edge-case quickly, we probably don't need a dot-release.
Comment 34•5 years ago
|
||
Marking as wontfix for 67 as this not seems to be a major problem for this release.
Comment 35•5 years ago
|
||
wontfix for 68 based on comment 33.
Updated•5 years ago
|
Comment 36•5 years ago
|
||
Hi, since this is a wont fix in 68 I will remove the Qe-verify flag.
Updated•5 years ago
|
Description
•