WebGL / Canvas displaying uninitialized video memory.

RESOLVED FIXED in Firefox 11

Status

()

Core
Canvas: WebGL
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: mbx, Assigned: jgilbert)

Tracking

unspecified
mozilla11
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox8 wontfix, firefox9 wontfix, firefox10+ wontfix, firefox11+ fixed, firefox-esr1011+ fixed, status1.9.2 unaffected)

Details

(Whiteboard: [sg:vector-high][qa-])

Attachments

(5 attachments, 7 obsolete attachments)

106.79 KB, application/zip
Details
564 bytes, text/html
Details
7.87 KB, patch
bjacob
: review+
Details | Diff | Splinter Review
7.46 KB, patch
bjacob
: review+
Details | Diff | Splinter Review
7.46 KB, patch
bjacob
: review+
Details | Diff | Splinter Review
(Reporter)

Description

6 years ago
Created attachment 573407 [details]
HTML page that reproduces bug and screenshot.

Setting the canvas width and height properties AFTER you get an webgl context can lead to reading of arbitrary video data, desktop images, other tabs, etc. This seems to occur on MacBook Airs, and not on MacBook Pros. This occurs in Firefox 6.0.2, as well as Nightly.

Machine
=======

  Model Name:	MacBook Air
  Model Identifier:	MacBookAir4,2
  Processor Name:	Intel Core i7
  Processor Speed:	1.8 GHz
  Number of Processors:	1
  Total Number of Cores:	2
  L2 Cache (per Core):	256 KB
  L3 Cache:	4 MB
  Memory:	4 GB
  Boot ROM Version:	MBA41.0077.B0E
  SMC Version (system):	1.73f63

about:support
=============
 
Adapter Description: 0x24300,0x20400
WebGL Renderer: Intel Inc. -- Intel HD Graphics 3000 OpenGL Engine -- 2.1 APPLE-7.12.9GPU 
Accelerated Windows: 1/1 OpenGL

Updated

6 years ago
Whiteboard: sg:high

Comment 1

6 years ago
This seems to occur only on some hardware. Setting sg:high for now (information leak).
(Reporter)

Comment 2

6 years ago
Created attachment 573432 [details]
HTML page that reproduces bug.
This is not the same bug as bug 684882, but that one also was specific to Intel GPUs on Mac OSX, with similar consequences.
The reason why it only happens on MacBook Airs is that we use the discrete (non-Intel) GPU for WebGL when there is one. MacBook Airs are special in that they only have Intel integrated graphics.

What is your Mac OS X version?
(Reporter)

Comment 5

6 years ago
I'm running, Mac OS X Lion 10.7.2 (11C74)
Does the problem go away if you set webgl.msaa-level=0 ?   (in about:config)

Does it go away if you set layers.acceleration.disabled=true and restart Firefox?

Is the size 512 important in your testcase? Is there a minimal size under which the problem doesn't happen?

Note that this should be reproducible on other Macs by using the gfxCardStatus utility to force Firefox to use Intel integrated graphics.
(Reporter)

Comment 7

6 years ago
I added webgl.msaa-level = 0 to about:config since it wasn't already there and the problem persists.

I set layers.acceleration.disabled=true and restarted Firefox and the problem still persists.

512 is not that important, the lowest I was able to set it, and yet reproduce the bug was 300. As you increase the size you can see more. I'm guessing that when using larger sizes it's more likely to encounter stretches of non zero memory.
Chrome doesn't seem to suffer from this bug. It's possible that it might be a bug on our side. investigating.
The bug is Mac-specific (but could well be a bug in our code).

The testcase results in GLContext::ResizeOffscreenFBO being called 3 times (yes, we need to make that lazy). We check on Jeff M's Mac and it didn't call ClearSafely() everytime.

But here on Linux, I do get ClearSafely called everytime by ResizeOffscreenFBO.

We need to step though ResizeOffscreenFBO carefully on a Mac to figure where we're returning early, before the ClearSafely call.

At a more meta-level, we need to make this code less prone to forgetting to ClearSafely.

Comment 10

6 years ago
Can RAII help here?
(In reply to Andreas Gal :gal from comment #10)
> Can RAII help here?

Probably, yes, since the problem is that we need to ensure that ClearSafely gets called even if we return early.

I would still like to understand where exactly things go wrong on Mac.
I can't reproduce this on a Nightly anymore. We should get a range on when it was fixed.

Comment 13

6 years ago
Michael, does this still happen for you with a nightly?
This was "fixed" because antialaising is now used on OS X. Forcing msaa-level to 0 causes the bug to show up again.
We don't call ClearSafely in ResizeOffscreenFBO because we take the following condition:

    if (!useDrawMSFBO && !aUseReadFBO)
        return true;
Created attachment 574094 [details] [diff] [review]
remove early return in ResizeOffscreenFBO that we're hitting, and that doesn't seem to make sense

I don't understand that early return. Even if we don't have a MS draw FBO and don't have a read FBO, we still have a non-MS draw FBO to resize. So why return early here?

This is definitely the early return that Jeff was hitting. He had to set msaa-level=0 to be able to reproduce since AA has been turned on on Mac.

Side question: I couldn't reproduce this on Linux as there, aUseReadFBO is true (while it's false on Mac with AA disabled). Why is that?
Attachment #574094 - Flags: review?(jgilbert)
(In reply to Benoit Jacob [:bjacob] from comment #16)
> Created attachment 574094 [details] [diff] [review] [diff] [details] [review]
> remove early return in ResizeOffscreenFBO that we're hitting, and that
> doesn't seem to make sense
> 
> I don't understand that early return. Even if we don't have a MS draw FBO
> and don't have a read FBO, we still have a non-MS draw FBO to resize. So why
> return early here?
> 
> This is definitely the early return that Jeff was hitting. He had to set
> msaa-level=0 to be able to reproduce since AA has been turned on on Mac.
> 
> Side question: I couldn't reproduce this on Linux as there, aUseReadFBO is
> true (while it's false on Mac with AA disabled). Why is that?

This patch fixes it for me.

Comment 18

6 years ago
Thanks for the quick fix guys.
(Assignee)

Comment 19

6 years ago
This isn't the proper fix, since ResizeOffscreenFBO wasn't even called for all paths until AA hit. For paths using backing PBuffers, for example, we never called it, so we had to make sure we were clearing the buffer elsewhere.

It looks like we are just not clearing the buffer on Mac w/PBuffers.
(Assignee)

Comment 20

6 years ago
Comment on attachment 574094 [details] [diff] [review]
remove early return in ResizeOffscreenFBO that we're hitting, and that doesn't seem to make sense

Uploading a new fix shortly.
Attachment #574094 - Flags: review?(jgilbert) → review-
(Assignee)

Comment 21

6 years ago
Created attachment 574213 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly

Try: https://tbpl.mozilla.org/?tree=Try&rev=de8f0e15f875
Assignee: nobody → jgilbert
Attachment #574094 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #574213 - Flags: review?(bjacob)
Attachment #574213 - Flags: review?(bjacob) → review+
(Assignee)

Comment 22

6 years ago
Looks like it passes try.
Can someone on Mac replicate this bug before applying the fix, but not after?
(Reporter)

Comment 23

6 years ago
I updated to the latest Nightly (11.0a1 (2011-11-14)) and the bug is still there.
(Reporter)

Comment 24

6 years ago
I also tried it on this build ftp://ftp.mozilla.org/pub/firefox/try-builds/jgilbert@mozilla.com-de8f0e15f875/try-macosx64/firefox-11.0a1.en-US.mac.dmg and it still fails.
(Assignee)

Comment 25

6 years ago
Created attachment 574403 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly

Added MakeCurrent call before clearing after resize.
Attachment #574213 - Attachment is obsolete: true
(Assignee)

Comment 26

6 years ago
This is fixed if we use the hidden pref cgl.prefer-fbo set to true. (default is false) This must be a problem with the PBuffers path in CGL.
(Assignee)

Comment 27

6 years ago
What's the first version which this fails on?
We can simply fix 10 by using cgl.prefer-fbo until we can track down the elusive problem.
(In reply to Jeff Gilbert [:jgilbert] from comment #27)
> What's the first version which this fails on?
> We can simply fix 10 by using cgl.prefer-fbo until we can track down the
> elusive problem.

What's your feeling about this: is the CGL/FBO path in good enough shape for this despite not having been default until now? if we switch, we should do it ASAP to maximize testing.
(Assignee)

Comment 29

6 years ago
I do not know. Can someone with a mac compare mochitest outputs with cgl.prefer-fbo? This is currently a pretty big problem. It may be worth switching paths until this is solved, even if we take a small conformance hit.
Just make a tryserver push?
Whiteboard: sg:high → [sg:high]
I also got this behavior with todays nightly on my MacBook Pro.
This website shows me uninitialized data at the beginning:
http://alteredqualia.com/three/examples/webgl_terrain_dynamic.html
According to Jeff M in comment 17, my patch (now r-'d and obsoleted) fixed (the effects of) this bug even if it wasn't the correct solution. Shouldn't we just take it? It just removes 2 lines that we agreed were wrong. So actually I'm not sure i understand the r- on it.
(Assignee)

Comment 33

6 years ago
The code after the early exit relies upon the assumption that the conditions of the early exit are false. That is, the code is not designed to work without the early-out. Resize code in GLContext should not be responsible for clearing the buffers, because that's only needed by WebGL, and actually easier to assure if we deliberately clear the buffer in WebGL.

I think the reason the deguarantee patch was still failing was because the read FBO isn't blitted to before we use it as a texture for drawing. Either ClearSafely() should force-blit the FBOs, or layers code must guarantee that it will blit the buffers before using the texture. The latter seems the more proper solution, to me.
OK. Please continue driving this i.e. talk with Layers people as needed; we need to have this bug fixed ASAP.
Matt Woodrow's out most of this week. Who else knows GLContext and layers?

Is there a GLContext method the layer system needs to call before every render of a WebGL CanvasLayer? What is it? E.g., what is BasicCanvasLayer::UpdateSurface not doing that it should be doing?
(Assignee)

Comment 36

6 years ago
Created attachment 575817 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly

Updated to assure that buffers are cleared before using the texture by adding GLContext::ReadyOffscreenTexture.

Testing via try at: https://tbpl.mozilla.org/?tree=Try&rev=f699ee432c35

I will be able to run tests/debug locally on mac and linux tomorrow.
Attachment #574403 - Attachment is obsolete: true
Attachment #575817 - Flags: review?(bjacob)
(Assignee)

Comment 37

6 years ago
Comment on attachment 575817 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly

Assumes that glFlush is sufficient to sync buffers across contexts.
Attachment #575817 - Flags: review?(bjacob) → review-
(Assignee)

Comment 38

6 years ago
Created attachment 575827 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly

Updated to use OpenGL spec-guaranteed glFinish for syncing textures between GLContext and Layers in D3D10 and OGL.
Attachment #575817 - Attachment is obsolete: true
Attachment #575827 - Flags: review?(bjacob)
(Assignee)

Comment 39

6 years ago
New try run: https://tbpl.mozilla.org/?tree=Try&rev=f481ecdb6027
(Assignee)

Comment 40

6 years ago
I'll also note here that I cannot reproduce this on my Mac:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0a1) Gecko/20111121 Firefox/11.0a1
ATI Technologies Inc. -- AMD Radeon HD 6630M OpenGL Engine -- 2.1 ATI-7.12.9
(Reporter)

Comment 41

6 years ago
I tried https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jgilbert@mozilla.com-f481ecdb6027/try-macosx64/firefox-11.0a1.en-US.mac.dmg and it still fails on my MacBook Air.
(Assignee)

Comment 42

6 years ago
It must be using a path I'm not expecting, investigating...
(Assignee)

Comment 43

6 years ago
Does it also replicate without errors on the debug builds from that try batch?
(Reporter)

Comment 44

6 years ago
Same behavior in the debug builds.
(Assignee)

Comment 45

6 years ago
Ok, until I can replicate this and fix it properly, we should probably hack around this and just manually blit the buffers in WebGLContext::SetDimensions.

Note that there still exists a bug where we are not correctly resolving the GLContext before drawing it out with OGL Layers.
(Assignee)

Comment 46

6 years ago
Created attachment 576058 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly

https://tbpl.mozilla.org/?tree=Try&rev=5fca6dc7a41c
Attachment #575827 - Attachment is obsolete: true
Attachment #575827 - Flags: review?(bjacob)
Attachment #576058 - Flags: review?(bjacob)
Attachment #576058 - Flags: feedback?(mbebenita)
(Assignee)

Comment 47

6 years ago
Created attachment 576063 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly

Fixed forgetting to force the resolve after the blit.

https://tbpl.mozilla.org/?tree=Try&rev=6089a0a4ff7b
Attachment #576058 - Attachment is obsolete: true
Attachment #576058 - Flags: review?(bjacob)
Attachment #576058 - Flags: feedback?(mbebenita)
Attachment #576063 - Flags: review?(bjacob)
Attachment #576063 - Flags: feedback?(mbebenita)
(Assignee)

Comment 48

6 years ago
While this should be fixed anyways, bug 702058 switches to using FBOs instead of PBuffers on Mac, which should hide this bug.
Comment on attachment 576063 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly

Review of attachment 576063 [details] [diff] [review]:
-----------------------------------------------------------------

Just some nits, need some explanations.

::: content/canvas/src/WebGLContext.cpp
@@ +545,5 @@
>          // everything's good, we're done here
>          mWidth = gl->OffscreenActualSize().width;
>          mHeight = gl->OffscreenActualSize().height;
> +
> +        gl->ClearSafely();

Since we must not forget to call ClearSafely in SetDimensions, and I don't see a very good way to ensure it's always called (since it should not be called in error cases), I would really, really like a unit test (can take the testcase of this bug as a starting point).

::: gfx/gl/GLContext.cpp
@@ +1387,1 @@
>          fViewport(viewport[0], viewport[1], viewport[2], viewport[3]);

I don't get that. Why are we still playing at all with the viewport here? I thought that was only needed a long time ago when ClearSafely didn't take care of it. It now does, so this viewport business should have been useless for a while. Moreover I don't understand why firstTime matters here.

::: gfx/gl/GLContext.h
@@ +1010,5 @@
>  
> +    // Before reads from offscreen texture
> +    // Resolve contents in this context, before we resume work with the contents on 'other' context
> +    // If other is null, block until new commands will only be executed after the resolve. (Usually via glFinish)
> +    void ResolveBeforeResumingOther(GLContext* other = nsnull) {

As discussed, I would mimic the ARB_sync API here rather than coming up with a custom function name. Ideally it would be neat if we could implement the ARB_sync API with a fallback to using glFinish (and maybe, later, some mutex locking) when ARB_sync is not available.
Attachment #576063 - Flags: review?(bjacob) → review-
(Assignee)

Updated

6 years ago
Depends on: 705024
(Assignee)

Updated

6 years ago
Attachment #576063 - Flags: feedback?(mbebenita)
(Reporter)

Comment 50

6 years ago
Although I cannot reproduce it, it seems that a similar problem persists when waking up the laptop from sleep mode. I'm guessing it may have something to do with a loss of the GL context.
(Assignee)

Updated

6 years ago
Blocks: 711642
(Assignee)

Comment 51

6 years ago
Created attachment 582419 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly
Attachment #576063 - Attachment is obsolete: true
Attachment #582419 - Flags: review?(bjacob)
Attachment #582419 - Flags: review?(bjacob) → review+
(Assignee)

Comment 52

6 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/91141088eb7d
Target Milestone: --- → mozilla11
(Assignee)

Comment 53

6 years ago
(In reply to Michael Bebenita from comment #50)
> Although I cannot reproduce it, it seems that a similar problem persists
> when waking up the laptop from sleep mode. I'm guessing it may have
> something to do with a loss of the GL context.

Are you seeing this only in WebGL contexts, or for the whole browser?
(Reporter)

Comment 54

6 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #53)
> (In reply to Michael Bebenita from comment #50)
> > Although I cannot reproduce it, it seems that a similar problem persists
> > when waking up the laptop from sleep mode. I'm guessing it may have
> > something to do with a loss of the GL context.
> 
> Are you seeing this only in WebGL contexts, or for the whole browser?

It was only in the WebGL context.
Target Milestone: mozilla11 → ---
inbound/m-c merge by Ed Moreley
https://hg.mozilla.org/mozilla-central/rev/91141088eb7d
status-firefox11: --- → fixed
Target Milestone: --- → mozilla11
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
From a quick read this sounds like a Mac OS/driver bug we're working around rather than a bug in our code per se. If that's wrong please change the whiteboard back to [sg:high].

Please create a patch to land on Firefox 10 and request approval-mozilla-beta, or attach the request to the existing patch if it works land as-is.
status1.9.2: --- → unaffected
status-firefox10: --- → affected
status-firefox8: --- → wontfix
status-firefox9: --- → wontfix
tracking-firefox10: --- → +
tracking-firefox11: --- → +
Whiteboard: [sg:high] → [sg:vector-high]
(Assignee)

Comment 57

6 years ago
Created attachment 589258 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly (fx10)

Backport to Fx10.
Attachment #589258 - Flags: review?(bjacob)
Attachment #589258 - Flags: review?(bjacob) → review+
(Assignee)

Comment 58

6 years ago
Comment on attachment 589258 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly (fx10)

[Approval Request Comment]
Regression caused by (bug #): This bug.
User impact if declined: Security concerns noted above.
Testing completed (on m-c, etc.): Was landed on MC in Fx11, now in Aurora.
Risk to taking this patch (and alternatives if risky): If there is anyone relying on GLContext resizes clearing its buffers (besides WebGL, as fixed in this patch), they may receive uninitialized data instead of cleared buffers.
Attachment #589258 - Flags: approval-mozilla-beta?
(Assignee)

Comment 59

6 years ago
Try attempt here:
https://tbpl.mozilla.org/?tree=Try&rev=b60e9f97915f

Passing reftests, failing(?) talos, burning on android.
Comment on attachment 589258 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly (fx10)

[Triage Comment]
Please re-set approval-mozilla-beta flag once try builds are in order and this is ready to land.
Attachment #589258 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
(Assignee)

Comment 61

6 years ago
Null try run from beta:
https://tbpl.mozilla.org/?tree=Try&rev=a43849b2a5f0

Preliminarily indicates that fails encountered on previous try build were to be expected when trying from beta.

Android appears to be burning because the makefile/mozconfig requirements changed. Talos appears to be failing because it's hitting a 404 when requesting 'http://hg.mozilla.org/try/raw-file/b60e9f97915f/testing/talos/talos.json'. 

I can spend time trying to fix these to get a clean try run, but effectively the same code has passed try and landed in Fx11. Is it inviable to land it on Beta while being prepared to back it out immediately?
Whiteboard: [sg:vector-high] → [sg:vector-high][qa+]
Having problems reproducing original using identical Macbook Air with 2011-11-09 Nightly build with the included testcase.

Info:

Build identifier: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0a1) Gecko/20111109 Firefox/11.0a1

Machine:

  Model Name:	MacBook Air
  Model Identifier:	MacBookAir4,2
  Processor Name:	Intel Core i7
  Processor Speed:	1.8 GHz
  Number of Processors:	1
  Total Number of Cores:	2
  L2 Cache (per Core):	256 KB
  L3 Cache:	4 MB
  Memory:	4 GB
  Boot ROM Version:	MBA41.0077.B0E
  SMC Version (system):	1.73f63

Display:

  Chipset Model:	Intel HD Graphics 3000
  Type:	GPU
  Bus:	Built-In
  VRAM (Total):	384 MB
  Vendor:	Intel (0x8086)
  Device ID:	0x0116
  Revision ID:	0x0009

about:support:

Vendor ID: 8086
Device ID: 0116
WebGL Renderer: Intel Inc. -- Intel HD Graphics 3000 OpenGL Engine -- 2.1 APPLE-7.14.5GPU 
Accelerated Windows: 1/1 OpenGL

Looks identical, aside from the renderer. This is Lion 10.7.2, same as mentioned in comment 5.

Since I can't repro original, can't verify fix yet.
[Triage comment]

This bug is being tracked for landing on ESR branch.  Please land patches on http://hg.mozilla.org/releases/mozilla-esr10/by Thursday March 1, 2012 in order to be ready for go-to-build on Friday March 2, 2012.

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more information.
tracking-firefox-esr10: --- → 11+

Updated

5 years ago
status-firefox10: affected → wontfix
status-firefox-esr10: --- → affected
[Triage comment]
Confirmed with dveditz that esr10 is affected by this bug and requires this patch to go to build - can someone please land today or tomorrow?  Let me know if there are any concerns we should be aware of.
(Assignee)

Comment 65

5 years ago
Created attachment 601483 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly (esr10)

Original patch applied (almost) cleanly to esr10, minus one comment line. (which conflicted)
Attachment #601483 - Flags: review?(bjacob)
Attachment #601483 - Flags: approval-mozilla-esr10?
Attachment #601483 - Flags: review?(bjacob) → review+
Comment on attachment 601483 [details] [diff] [review]
Deguarantee ResizeOffscreenFBO clearing its buffers, and adding clears in WebGLContext accordingly (esr10)

[Triage Comment]
Nomination for esr10 approved, please land this today if possible so we're ready to go to build on Friday. Thank you.
Attachment #601483 - Flags: approval-mozilla-esr10? → approval-mozilla-esr10+
(Assignee)

Comment 67

5 years ago
ESR10:
https://hg.mozilla.org/releases/mozilla-esr10/rev/349408792301
status-firefox-esr10: affected → fixed
I haven't been able to reproduce the original problem in order to verify this bug fix. Geo, could you give it another try when you get a chance?
I'll bring my Air tomorrow and will try again.
QA is unable to verify this fix because we are unable to reproduce the original issue, even with an identical test environment. If someone is able to reproduce this issue, please verify the fix in Firefox 11.0b6 and 10.0.3esr.

Thanks
Whiteboard: [sg:vector-high][qa+] → [sg:vector-high][qa-]
Group: core-security
You need to log in before you can comment on or make changes to this bug.