Last Comment Bug 769949 - With WebGL antialiasing enabled, leaving a WebGL FBO bound causes broken compositing
: With WebGL antialiasing enabled, leaving a WebGL FBO bound causes broken comp...
Status: VERIFIED FIXED
[games:p1]
: regression
Product: Core
Classification: Components
Component: Graphics: Layers (show other bugs)
: 16 Branch
: All All
: -- normal (vote)
: mozilla16
Assigned To: Jeff Gilbert [:jgilbert]
:
Mentors:
http://labs.ayzenberg.com/2012/03/htm...
Depends on:
Blocks: gecko-games GPU-clipping-rounded 726396
  Show dependency treegraph
 
Reported: 2012-06-30 13:28 PDT by Loic
Modified: 2014-05-19 04:56 PDT (History)
15 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
verified
fixed
verified


Attachments
testcase (1.04 KB, text/html)
2012-07-04 21:57 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
possible patch (1.58 KB, patch)
2012-07-04 22:20 PDT, Nick Cameron [:nrc]
ncameron: review-
Details | Diff | Splinter Review
patch (3.70 KB, patch)
2012-07-05 17:05 PDT, Jeff Gilbert [:jgilbert]
jacob.benoit.1: review+
Details | Diff | Splinter Review
backport (3.71 KB, patch)
2012-07-24 16:03 PDT, Jeff Gilbert [:jgilbert]
jgilbert: review+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Loic 2012-06-30 13:28:20 PDT
STR

1) Be sure HWA is enabled
2) Open Tony Lee WebGL demo (textures and meshes on 3D model)
http://sandbox.ayzenberg.com/creative_prototypes/html5_/webgl/prototypeEightMe.html
3) You need to wait 5-10 sec to see the 3D model rotating
Once you are on the demo page, you can toggle through different textures by clicking anywhere on the tab.
More details: http://labs.ayzenberg.com/2012/03/html5-webgl-demo-tony-lee/

Result: 
With HWA on, the 3D model stays frozen and doesn't rotate with the mouse and clicking anywhere on the page doesn't change the texture.

With HWA off, the demo plays fine.

In FF13, the demo plays fine with HWA on/off.

Mozregression range:

m-c
good=2012-05-04
bad=2012-05-05
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2db9df42823d&tochange=0a48e6561534

m-i
good=2012-05-02
bad=2012-05-03
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c3813fbb1c9a&tochange=de5745bce8bc

Suspected bug:
Nicholas Cameron - Bug 716439 - (GPU-clipping-rounded) Implement clipping to rectangles with rounded corners on the GPU
Comment 1 Benoit Jacob [:bjacob] (mostly away) 2012-07-01 04:38:30 PDT
I seem to get different results than reported on comment 0:
 - with default settings (layers accel enabled), the demo plays "fine" except that it's extremely slow (have to wait far more than 10 seconds to see the results of mouse interaction).
 - with layers.acceleration.disabled=true, the canvas seems to remain blank black.

This with NVIDIA 296.88 driver on Windows 7 64bit, 32bit Nightly from 06-29.
Comment 2 Benoit Jacob [:bjacob] (mostly away) 2012-07-01 04:43:34 PDT
Ah, wait!

With layers.acceleration.disabled, switching to another tab and coming back to the Tony Lee tab seems to immediately "fix" the rendering.
Comment 3 Loic 2012-07-01 04:52:03 PDT
Yes but the 3D model move stays "frozen" and is'nt smooth like with FF13.
Comment 4 Nick Cameron [:nrc] 2012-07-01 13:33:03 PDT
Does anyone else get an ASSERT thrown by Texture.cpp:241 in the Angle library when running this e.g. in a debug build? I get it fairly often, but not every run. I see similar behaviour to bjacob with HWA disabled, and as reported for HWA enabled. Looking into this now...
Comment 5 Nick Cameron [:nrc] 2012-07-01 14:49:07 PDT
Actually, I get the same behaviour with and without accelerated layers, it is frozen and generally borked, but switching tabs causes an update. I can switch textures with accelerated layers on or off, but only with a tab change. Turning Direct2D off fixes everything.
Comment 6 Nick Cameron [:nrc] 2012-07-01 20:17:03 PDT
This webgl demo, http://schumann.elis.ugent.be/#, seems to work fine with HWA, but not without, failing in the the same way as the reported one, again it is fine with or without HWA if D2D is turned off, so I assume it is the same bug. Not sure what that is though.
Comment 7 Nick Cameron [:nrc] 2012-07-01 21:43:23 PDT
On the HWA side, not using a shareHandle to make mTexture in CanvasLayerD3D10 and leaving mUsingSharedTexture false (around line 85 in CanvasLayerD3D10.cpp) makes the problem go away. I couldn't find an equivalent fix for basic layers. I don't understand why, and I couldn't 'fix' the problem by doing anything different when updating or rendering the canvas (e.g., getting a fresh shareHandle and mTexture or reattaching the shader resource view).

I tried a few more prefs (in particular Azure content), but the only thing that seems to make a difference is Direct2D, so I'm assuming this problem is some interaction between Direct2D and WebGL/ANGLE. I can't see anything in the mask layers patches which would cause this, and preventing most of the mask layers code from executing does not fix this. But, I agree that it still looks like the most likely thing to affect this in the regression range.

I think the precise problem is that the webgl canvas is not repainting unless it is re-initialised.

The failed ANGLE assertion (Texture.cpp:241) is extremely intermittent and only presents in debug builds (obviously it won't assert in an optimised build but it could bork, I can't recover after the assertion in a debug build). Possibly due to time? WebGL in a debug build really crawls. I can't work out if it is related to the bug here or not, but the bug occurs always, and the assertion not very often, so possibly not. I don't know what causes the assertion.

I'm pretty stuck with this. I'm going to leave it for today and have another look tomorrow, I'd really appreciate it if anyone has any ideas for things to look at/try.
Comment 8 Lukas Blakk [:lsblakk] use ?needinfo 2012-07-02 15:35:46 PDT
[Triage comment]
Tracking, since it's an unshipped regression so getting this fixed before release is ideal.
Comment 9 Loic 2012-07-02 16:31:47 PDT
Nick, you can try to run another WebGL demo of Tony Lee if it's more "simple" to find the bug like this one about LOTR Ring:
http://sandbox.ayzenberg.com/creative_prototypes/html5_/webgl/prototypeEightLOD.html
On FF16, the ring doesn't rotate slowly and stays frozen.
Comment 10 Nick Cameron [:nrc] 2012-07-02 16:43:07 PDT
I did some inbound builds this morning, and I can confirm that it is something in the mask layers (bug 716439) patches that is causing this problem, trying to work out exactly what now...
Comment 11 Nick Cameron [:nrc] 2012-07-02 20:06:09 PDT
narrowed it down to changeset f7b8deeb0cc4. 

I didn't mean to change the tracking flags, but can't change them back, sorry :-(
Comment 12 Nick Cameron [:nrc] 2012-07-02 21:27:49 PDT
The problem appears to be that the webgl canvas is being rendered as an active layer (Canvas layer), whereas before the mask layers patch it was treated as inactive and rendered on a Thebes layer. No idea why that causes the breakage yet though.
Comment 13 Nick Cameron [:nrc] 2012-07-02 22:19:01 PDT
Turning bug 618892 off fixes this, and doesn't break other WebGL examples, but I guess we lose speed. No idea why D2D + pbuffers + Tony Lee breaks.
Comment 14 Benoit Jacob [:bjacob] (mostly away) 2012-07-03 02:56:34 PDT
Maybe this path breaks the DidTransactionCallback mechanism? The WebGL implementation needs to know when compositing occurred, and knows that when the DidTransactionCallback function is called. If this callback isn't properly called by this path, that could explain it.
Comment 15 Nick Cameron [:nrc] 2012-07-03 14:45:58 PDT
(In reply to Benoit Jacob [:bjacob] from comment #14)
> Maybe this path breaks the DidTransactionCallback mechanism? The WebGL
> implementation needs to know when compositing occurred, and knows that when
> the DidTransactionCallback function is called. If this callback isn't
> properly called by this path, that could explain it.

Thanks for the tip. I checked it though, and it seems to be getting called just fine.
Comment 16 Nick Cameron [:nrc] 2012-07-04 17:55:23 PDT
I think bug 725747 may have broken this, commenting out http://mxr.mozilla.org/mozilla-central/source/gfx/gl/GLContext.h#1115 seems to fix this problem. Perhaps the internal FBO needs to be reset before calling this method or something?
Comment 17 Vladimir Vukicevic [:vlad] [:vladv] 2012-07-04 18:45:29 PDT
That's very bad if that fixes it -- it means that some of the FBO tracking is busted.  I don't -fully- understand the internal/external FBO stuff, though I think I understand the intent. But gets called before any read operation, and I guess the check is to see if there's a user bound fbo or not.  If there isn't, then it does a blit from the offscreen draw fbo to the offscreen read fbo if they're different, which, I admit, I don't understand when that would be the case.

Or wait, yes I do.  It happens if we want to have a multisample front buffer, where we can't ReadPixels from directly -- we'd need to do a framebuffer resolve first, but we want to hide that fact from content (especially WebGL).  Ok; so I'd do some dumps to see what framebuffer stuff that WebGL thing is doing, and then see where our bindings are going wrong.

My guess -- it's drawing to framebuffer 0 (the front/display buffer), but is leaving some other FBO bound at the end of the JS draw operation.  If some code isn't being careful, that could cause this... I'd see what code is being read when fReadPixels (or raw_ReadPixels) is being called, and who's calling it, if we're going down the readpixels path.  Otherwise there might be something similar going on in another path.
Comment 18 Benoit Jacob [:bjacob] (mostly away) 2012-07-04 19:41:00 PDT
Aha --- setting webgl.msaa=0  (which disables WebGL multisampling) fixes it.
Comment 19 Benoit Jacob [:bjacob] (mostly away) 2012-07-04 21:57:52 PDT
Created attachment 639217 [details]
testcase

minimal test case
Comment 20 Benoit Jacob [:bjacob] (mostly away) 2012-07-04 22:01:17 PDT
Jeff: See Nick's Comment 16 and the testcase.
Comment 21 Nick Cameron [:nrc] 2012-07-04 22:15:12 PDT
I think I have a fix for this, it seems to me that BeforeGLReadCall should blit the offscreen read to the offscreen draw buffer if we are about to do a glFinish, even if the offscreen buffer is not currently bound. This certainly fixes the original Tony Lee example and bjacob's test case, not 100% sure it is right though.

I've pushed to Try: https://tbpl.mozilla.org/?tree=Try&rev=fdf1104da540
Comment 22 Nick Cameron [:nrc] 2012-07-04 22:20:00 PDT
Created attachment 639221 [details] [diff] [review]
possible patch

could do with a comment, but just to get the idea of what I think the fix might be
Comment 23 Loic 2012-07-05 03:03:42 PDT
(In reply to Benoit Jacob [:bjacob] from comment #19)
> Created attachment 639217 [details]
> testcase
> 
> minimal test case

There is something I don't understand. Why does your testcase not work with FF13 while Tony Lee's demo seems to work fine with FF13?

Indeed, mozregression found a different range for your testcase:
m-i
good=2012-03-12
bad=2012-03-13
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=eceb16d4057b&tochange=d4496bf4bb64

Is this bug in 2 regression steps?
Comment 24 Nick Cameron [:nrc] 2012-07-05 04:03:30 PDT
Pre-mask layers, any object with rounded corners was forced to be rendered to an inactive layer, this uses a different code path for canvases, so the Tony Lee demo rendered fine. With the mask layers patches (in the regression range), the Tony Lee demo got it's own layer (with a mask, but the mask has nothing to do with the bug), and so you get the same effects (bug) as in the minimal test case. So, the rounded corners hid the underlying bug which was exposed when mask layers where introduced, the mask layers didn't introduce the bug. You can test this by removing the rounded corners from the demo, and it breaks in the same way as for more recent versions of FF.

I haven't checked the regression range for the test case, but I suspect it will be for one of the patches which improved the FBO handling.
Comment 25 Loic 2012-07-05 04:17:48 PDT
(In reply to Nick Cameron [:nrc] from comment #24)
> I haven't checked the regression range for the test case, but I suspect it
> will be for one of the patches which improved the FBO handling.
Thanks for the explanation.

Indeed, the suspected bug seems to be:
Jeff Gilbert — Bug 726396 - Repair ANGLE d3d share handle fetching an PBuffer creation behavior - r=bjacob 
The patch touched code about FBO bindings.
Comment 26 Vladimir Vukicevic [:vlad] [:vladv] 2012-07-05 06:53:59 PDT
(In reply to Nick Cameron [:nrc] from comment #21)
> I think I have a fix for this, it seems to me that BeforeGLReadCall should
> blit the offscreen read to the offscreen draw buffer if we are about to do a
> glFinish, even if the offscreen buffer is not currently bound. This
> certainly fixes the original Tony Lee example and bjacob's test case, not
> 100% sure it is right though.
> 
> I've pushed to Try: https://tbpl.mozilla.org/?tree=Try&rev=fdf1104da540

Nope, not a good fix -- you're accidentally fixing the symptom, but without fixing the underlying bug.  glFinish should have nothing to do with blitting nor should it care about what FBO is currently bound.  Putting the BeforeGLReadCall wrapper around there probably causes the right front buffer to get updated at that time when the FBO bindings just happen to be "right".
Comment 27 Jeff Gilbert [:jgilbert] 2012-07-05 14:58:51 PDT
Comment on attachment 639221 [details] [diff] [review]
possible patch

Review of attachment 639221 [details] [diff] [review]:
-----------------------------------------------------------------

Vlad's right, that's not it. I'll take a look. My guess is something touched the somewhat-delicate FBO binding code.
Comment 28 Jeff Gilbert [:jgilbert] 2012-07-05 15:26:01 PDT
Yeah, looks GuaranteeResolve() will not trigger a AA-resolve blit if we're bound to something other than 0 (from the user's point of view). Patch forthcoming.
Comment 29 Jeff Gilbert [:jgilbert] 2012-07-05 17:05:26 PDT
Created attachment 639525 [details] [diff] [review]
patch

GuaranteeResolved() must call BlitDirtyFBOs(). Previously, we were only calling fFinish(), which only blitted FBs if we were bound to FB '0'.

On my linux machine, bug confirmed, and fix verified.
Comment 30 Jeff Gilbert [:jgilbert] 2012-07-05 17:33:09 PDT
This affects all desktop platforms, but not mobile.
Comment 31 Benoit Jacob [:bjacob] (mostly away) 2012-07-06 14:54:29 PDT
Comment on attachment 639525 [details] [diff] [review]
patch

Review of attachment 639525 [details] [diff] [review]:
-----------------------------------------------------------------

I'm happy to see the "flush guarantees resolve" thing go away since it didn't seem to be used anyway; and happy too that fFinish() is no longer wrapped as a read-call.
Comment 33 Ryan VanderMeulen [:RyanVM] 2012-07-07 12:01:22 PDT
https://hg.mozilla.org/mozilla-central/rev/921d27b8a76e
Comment 34 Jeff Gilbert [:jgilbert] 2012-07-12 17:54:48 PDT
I'll put together a backport of this, since it's really simple.
Comment 35 Lukas Blakk [:lsblakk] use ?needinfo 2012-07-18 09:41:15 PDT
(In reply to Jeff Gilbert [:jgilbert] from comment #34)
> I'll put together a backport of this, since it's really simple.

Please nominate the backport for mozilla-beta approval now that 15 is on Beta. Since we have STR it would be great to get a fix landed in an earlier beta so we can get time to verify.
Comment 36 Scoobidiver (away) 2012-07-24 00:20:40 PDT
(In reply to Jeff Gilbert [:jgilbert] from comment #34)
> I'll put together a backport of this, since it's really simple.
Please do ASAP.
Comment 37 Jeff Gilbert [:jgilbert] 2012-07-24 16:03:41 PDT
Created attachment 645564 [details] [diff] [review]
backport

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 726396
User impact if declined: Failure to render webgl sometimes under normal operation.
Testing completed (on m-c, etc.): m-c, on aurora.
Risk to taking this patch (and alternatives if risky): Low: This is well understood.
String or UUID changes made by this patch: None

Carry r+ forward from original patch.
Comment 38 Lukas Blakk [:lsblakk] use ?needinfo 2012-07-24 16:52:01 PDT
Comment on attachment 645564 [details] [diff] [review]
backport

Still early enough in beta cycle that we can take this fix for a regression.
Comment 39 Jeff Gilbert [:jgilbert] 2012-07-24 17:07:49 PDT
https://hg.mozilla.org/releases/mozilla-beta/rev/498309f32015
Comment 40 Simona B [:simonab] 2012-08-03 08:35:26 PDT
Verified on Firefox 15 beta 3 that WebGL loads fine having HWA enabled - the demo from the Description plays fine and that the canvas from the test case attached in Comment 19 repaints constantly. 
Verified on Windows 7, Ubuntu 12.04 and Mac OS X 10.7:

Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0
Mozilla/5.0 (X11; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:15.0) Gecko/20100101 Firefox/15.0
Comment 41 Paul Silaghi, QA [:pauly] 2014-05-19 04:56:31 PDT
Verified fixed 32.0a1 (2014-05-18), Win 7 x64

Note You need to log in before you can comment on or make changes to this bug.