Closed Bug 665578 Opened 13 years ago Closed 13 years ago

[ATI on Mac] WebGL demo "To the Road of Ribbon" crashes in useProgram [@ ATIRadeonX3000GLDriver@0x5a4fc]

Categories

(Core :: Graphics: CanvasWebGL, defect)

x86_64
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla10
Tracking Status
firefox5 - wontfix
firefox6 - wontfix
firefox7 - wontfix
firefox8 - wontfix
firefox9 - wontfix
firefox10 + fixed
firefox11 + fixed
status1.9.2 --- unaffected

People

(Reporter: jruderman, Assigned: bjacob)

References

()

Details

(Keywords: crash, reproducible, testcase, Whiteboard: [sg:critical?][qa+])

Crash Data

Attachments

(9 files, 3 obsolete files)

483 bytes, text/plain
Details
55.39 KB, text/plain
Details
2.64 KB, text/html
Details
158.27 KB, text/plain
Details
2.07 KB, text/html
Details
2.15 KB, patch
Details | Diff | Splinter Review
1.97 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
66.79 KB, patch
Details | Diff | Splinter Review
2.01 KB, patch
bjacob
: review+
Details | Diff | Splinter Review
Steps to reproduce, at least on my computer:
1. Load http://www.iquilezles.org/apps/shadertoy/.
2. In the dropdown in the upper right, select the 3D demo called "To the Road of Ribbon".
3. Click the "Load" button.

Result: bp-788d54b6-7ae3-47d1-a7f8-2197c2110620

Security-sensitive because it's accessing what looks like a random memory address.

This is a new MacBook Pro with Mac OS X 10.6.7.

Graphics info from about:support:
> Adapter Description: 0x21b00,0x20400
> WebGL Renderer: ATI Technologies Inc. -- ATI Radeon HD 6750M OpenGL Engine -- 2.1 ATI-1.6.34
> GPU Accelerated Windows: 1/1 OpenGL
Attached file crash report (mac debug) (obsolete) —
Attachment #540494 - Attachment is obsolete: true
Strange, following these steps also crashed FF4 on me, but only on my first visit to that shader -- appears to be fine every time since.

My crash report is here:

https://crash-stats.mozilla.com/report/index/bp-1bd7001a-2c61-4deb-9828-0cfd12110620

Also running 10.6.7 on a macbook pro except with an NVIDIA 330M GT
Using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0a1) Gecko/20110620 Firefox/7.0a1, I cannot reproduce the crash, but I can reproduce it easily on my 10.6.7 machine.
My 10.7 machine has NVIDIA GeForce 9400M while the 10.6 machine has the AMD Radeon HD 6750M and is the newest MPB hardware.
Attached file testcase
bjacob, can you look into this please?
Assignee: nobody → bjacob
(In reply to comment #4)
> Strange, following these steps also crashed FF4 on me, but only on my first
> visit to that shader -- appears to be fine every time since.
> 
> My crash report is here:
> 
> https://crash-stats.mozilla.com/report/index/bp-1bd7001a-2c61-4deb-9828-
> 0cfd12110620
> 
> Also running 10.6.7 on a macbook pro except with an NVIDIA 330M GT

Charles, your crash looks completely different and should be filed separately against the JS engine.
(In reply to comment #3)
> Created attachment 540495 [details]
> crash report (mac debug)

Can you please try, with a _debug_ build, running with MOZ_GL_DEBUG_VERBOSE defined, and capture the output. Something like:

MOZ_GL_DEBUG_VERBOSE=1 ./firefox 2>&1 | tee logfile

I'd like to see the last ~1000 lines if it's too big. Also, when it's crashed, can you please go in GDB and do:

call DumpJSStack()
Note: here with NVIDIA driver on linux, it works fine.
(In reply to comment #10)
> (In reply to comment #3)
> > Created attachment 540495 [details]
> > crash report (mac debug)
> 
> Can you please try, with a _debug_ build, running with MOZ_GL_DEBUG_VERBOSE
> defined, and capture the output.

Please also attach a backtrace obtained with MOZ_GL_DEBUG_VERBOSE. Without GL debug mode, backtraces are often fooled by the asynchronousness of the GL. The debug mode makes it a synchronous API.
This is with the testcase Christoph attached. It includes a stack trace and a DumpJSStack.
I can't reproduce on my Mac Mini with a NVIDIA Geforce 320M and Mac OS 10.6.4. Confirming that this bug is ATI-only.
Blocks: 605780
Summary: WebGL demo "To the Road of Ribbon" crashes [@ ATIRadeonX3000GLDriver@0x5a4fc] → [ATI on Mac] WebGL demo "To the Road of Ribbon" crashes [@ ATIRadeonX3000GLDriver@0x5a4fc]
Summary: [ATI on Mac] WebGL demo "To the Road of Ribbon" crashes [@ ATIRadeonX3000GLDriver@0x5a4fc] → [ATI on Mac] WebGL demo "To the Road of Ribbon" crashes in useProgram [@ ATIRadeonX3000GLDriver@0x5a4fc]
(In reply to comment #13)
> Created attachment 541618 [details]
> output with MOZ_GL_DEBUG_VERBOSE
> 
> This is with the testcase Christoph attached. It includes a stack trace and
> a DumpJSStack.

Thanks a lot. This shows that the crash is actually in useProgram. I wonder if the GL is getting confused by the undefined vertex attribs in the shader.

Does Chrome also crash on the same machine?
> I wonder if the GL is getting confused by the undefined vertex attribs in the shader.

Ignore that, sorry: the crash is in the first useProgram() call, so by that time attribs can't have been defined yet.

The stack is really strange:

#0  0x00007fffffe00847 in __memcpy ()
#1  0x000000011cd1e4fd in gldCopyTexSubImage ()
#2  0x000000011cd05adf in gldUpdateDispatch ()
#3  0x000000011ccd7748 in gldCreateProgram ()
#4  0x000000011ccd80fb in gldCreateProgram ()
#5  0x000000011ccf29e6 in gldUpdateDispatch ()
#6  0x000000011cbde584 in gleDoSelectiveDispatchNoErrorCore ()
#7  0x000000011cb4d10b in glFinish_Exec ()
#8  0x000000010179fad1 in mozilla::gl::GLContext::AfterGLCall (this=0x100e08a00, glFunction=0x102fc27b8 "void mozilla::gl::GLContext::fUseProgram(GLuint)") at GLContext.h:1042
#9  0x00000001027d79b7 in mozilla::gl::GLContext::fUseProgram (this=0x100e08a00, program=1) at GLContext.h:1773

Frames 9 to 5 look normal. Frames 4 to 2 can still be understood: program creation was delayed until the program is made current. But frame 1 is ununderstandable: it claims that we're grabbing part of the framebuffer into a texture. That's all the more unexpected as this testcase doesn't do anything with textures.

If you would like us to understand this bug better and perhaps find a work-around, please try to reduce the testcase. There's probably a lot of stuff that can be taken out of this big shader that defines multiple functions oa(), ob(), o() etc.

I don't understand Mac signal names: is this an invalid read, or an invalid write?

I don't agree that the address being accessed here, 0x1266db000, looks random. It looks very close to the heap, since e.g. 0x126706850 and 0x124330a00 are on the heap and it falls in between them.

If it's a write access then I can see how this is sg:critical, as one could imagine an exploit writing stuff onto the heap. If it's a read access, I'm less sure: I understand that that means that the GL is potentially reading private data, but I don't know that that could be exploited i.e. that that private data could end up being exposed in some exploitable way as opposed to just being used internally by the GL.
(In reply to comment #16)
> I don't understand Mac signal names: is this an invalid read, or an invalid
> write?

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000011e828000
0x00007fffffe00847 in __memcpy ()
--[ REGISTERS ]
rax:      0x63b3000 rbx:    0x117cd7fe0 rcx: 0xffffffffffffe600
rdx:           0x30 rsi:    0x118476a00 rdi:        0x11e829a00
rbp: 0x7fff5fbfac10 rsp: 0x7fff5fbfac10 rip:     0x7fffffe00847

--[ FAULTING INSTRUCTION ]
0x7fffffe00847 <__memcpy+167>:	movdqa XMMWORD PTR [rdi+rcx],xmm0
(In reply to comment #17)
> --[ FAULTING INSTRUCTION ]
> 0x7fffffe00847 <__memcpy+167>:	movdqa XMMWORD PTR [rdi+rcx],xmm0

OK so it's an invalid write, thanks.
@Benoit: my insights :)

I think the problem is the assignment of q which is made during the for loop(s). If we reduce the size of the loop counter the bug is not triggered. There was similar issue with uniform variables. Which looked like:

uniform sampler2D uni[8];
void main() {
    vec4 c = vec4(0,0,0,0);
    for (int ii = 0; ii < NUM_TEXTURES; ++ii) {
      c += texture2D(uni[ii], vec2(0.5, 0.5));
    }
    gl_FragColor = c;
} 

Benoit, do you remember this bug?
(In reply to comment #19)
> Benoit, do you remember this bug?

Oh, right. The stack sure does look similar. I think the ANGLE guys figured that Mac crashed when a loop index was used to address a sampler array:
http://code.google.com/p/angleproject/issues/detail?id=94
They fixed it by unrolling such for loops:
http://code.google.com/p/angleproject/source/detail?r=606

Here however, we don't have any samplers. Could it still be another bug with for loops on Mac? It would be interesting to manually unroll all the for loops in the testcase to see if the crash persists.
ANGLE devs: can you please have a look at comments 19-21, the present crash looks very similar to ANGLE bug 94 except that we have no samplers involved here. Questions:

 - Does ANGLE now automatically unroll loops where the index is used to address a sampler array? Or is there a compile-time option to enable that?

 - Is there an ANGLE option to unroll all for loops? I'd like to see if this makes the present crash go away.

 - Does the present testcase also crash Chrome on Macs with ATI cards?
(In reply to comment #22)
> ANGLE devs: can you please have a look at comments 19-21, the present crash
> looks very similar to ANGLE bug 94 except that we have no samplers involved
> here. Questions:
> 
>  - Does ANGLE now automatically unroll loops where the index is used to
> address a sampler array? Or is there a compile-time option to enable that?

No option.  It always unrolls on Mac.

> 
>  - Is there an ANGLE option to unroll all for loops? I'd like to see if this
> makes the present crash go away.

No.  But you can easily to so to your local ANGLE.  The code is there for sampler arrays, you only need to comment out the condition.

> 
>  - Does the present testcase also crash Chrome on Macs with ATI cards?

I don't have a Mac with ATI card to test at the moment.
(In reply to comment #22)
>  - Does the present testcase also crash Chrome on Macs with ATI cards?

Yes, Chrome crashes too:

Google Chrome 14.0.797.0 (Offizieller Build 89638) dev
OS	Mac OS
WebKit	535.1 (trunk@89145)
Thanks Mo. Here's an attempt at unrolling all loops, just to see if it fixes the crash. Can you please check the patch.

Jesse: please check if this avoids the crash.
Yeah, this should work.
Have applied the patch but FF is still crashing.
(In reply to comment #26)
> Yeah, this should work.

The loop unrolling patch doesn't seem to unroll the loops in this shader.
Here's the output from angle with attachment 541936 [details] [diff] [review]

uniform vec2 resolution;
uniform float time;
void main(){
vec2 p = gl_FragCoord.xy;
(p[0] *= (resolution[0] / resolution[1]));
vec3 org = vec3((sin(time) * 0.5), (cos((time * 0.5)) * 0.25), time);
vec3 dir = normalize(vec3(p[0], p[1], 1.0));
vec3 q = vec3(0.0, 0.0, 0.0);
for (int i = 0; (i < 64); (i++))
{
(q += min((((cos(q[0]) + cos((q[1] * 1.5))) + cos(q[2])) + cos(q[1])), length(max((abs((q - vec3((cos((q[2] * 1.5)) * 0.30000001), (-0.5 + (cos(q[2]) * 0.2)), 0.0))) - vec3(0.125, 0.02, (time + 3.0))), vec3(0.0, 0.0, 0.0)))));
}
(q += reflect(dir, q));
for (int i = 0; (i < 64); (i++))
{
(q += min((((cos(q[0]) + cos((q[1] * 1.5))) + cos(q[2])) + cos(q[1])), length(max((abs((q - vec3((cos((q[2] * 1.5)) * 0.30000001), (-0.5 + (cos(q[2]) * 0.2)), 0.0))) - vec3(0.125, 0.02, (time + 3.0))), vec3(0.0, 0.0, 0.0)))));
}
vec4 fcolor = (((q[0] + vec4(length((q - org)))) + (min(q[1], 1.0) * vec4(1.0, 0.80000001, 0.69999999, 1.0))) * time);
(gl_FragColor = vec4(fcolor.xyz, 1.0));
}
Not going to make it for 6 :(
Apparently no reply yet on my email to the WebGL list, pinging.
Still nothing from the WebGL list?
Sorry, but no, there has been a lot of traffic on the WebGL lists recently and this issue seems to have fallen through the cracks.

The difficult part is going to be finding someone to implement the loop unrolling workaround.

I suggest filing a security issue on http://code.google.com/p/angleproject/issues/list (CC: kbr@chromium.org) and I can try to get it prioritized.
Indeed I dropped this ball, sorry. Filing ANGLE bug now.
Mo has implemented an option for unrolling of all loops in ANGLE r734. Now we have to use it.

Meanwhile ANGLE developers are investigating a different class of work-arounds in ANGLE bug 196.
Is the fix in angle something we can cherry pick for 7 and 8, or do we need to wait for a new angle merge here? We're running very short on time for 7 (given all hands meetings etc).
scarybeasts: when will Chrome be adopting (and more importantly, announcing) this fix for the ANGLE bug?
Is this an ANGLE bug or a workaround for an ATI / MacOS bug? It's not clear to me.
(In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #36)
> Is the fix in angle something we can cherry pick for 7 and 8, or do we need
> to wait for a new angle merge here? We're running very short on time for 7
> (given all hands meetings etc).

It should be cherry-pickable, but either way it will be a quite intrusive change as it's about rewriting shaders. So I don't think that we want it in Firefox 7.

The first possible work-around, which consists in an option to unroll all loops, was implemented as ANGLE r734.

The second possible work-around, which consists in replacing built-in GLSL functions such as normalize() by shims doing basically the same thing and multiplying the result by 1, has just been implemented as ANGLE r738.

The second work-around (r738) seems to work around a larger class of Mac crashers, and doesn't suffer from the downsides of unrolling all loops, so it's my preferred option.
http://code.google.com/p/angleproject/source/detail?r=738
(In reply to Chris Evans from comment #38)
> Is this an ANGLE bug or a workaround for an ATI / MacOS bug? It's not clear
> to me.

The bug here is in the Mac OpenGL libraries. The work-arounds are being implemented in ANGLE. The most likely work-around is ANGLE r738, see comment 39 above.
Oh, the Mac OpenGL libraries are in an awful state! I have a ton of cases open, and I've forwarded more over from external researchers.
I don't think it's necessary to synchronize here, nor rush anything.

If you haven't already sent this over to Apple, let me know and I have a good contact.

Also, I don't think you should assign this a Mozilla CVE as it is not your fault. You might always mention "workarounds for Mac bugs" in your release notes, though.
Ok, wontfix for 7 then. Thanks Bonoit.
mozilla-central now has ANGLE r740 so the work-arounds are implemented in ANGLE and we just need to turn them on.

Here's a tryserver build:
https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=d56911f09ab6
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-d56911f09ab6

By default it doesn't do anything special but you can switch two different workarounds for this bug by defining these environment variables:

MOZ_ANGLE_UNROLL_FOR_LOOP_WITH_INTEGER_INDEX
   turns on the workaround implemented in ANGLE r734

MOZ_ANGLE_EMULATE_BUILT_IN_FUNCTIONS
   turns on the workaround implemented in ANGLE r738 and fixed in ANGLE r740

It doesn't matter what the value of these environment vars is, only whether they are defined.

If both fix the crash and we get to choose, the preferred solution is EMULATE_BUILT_IN_FUNCTIONS.
Jeff M just tested it on his Mac.

MOZ_ANGLE_UNROLL_FOR_LOOP_WITH_INTEGER_INDEX does not fix the crash.

MOZ_ANGLE_EMULATE_BUILT_IN_FUNCTIONS "fixes" the crash... by injecting a syntax error into the shader preventing the GL driver from successfully compiling it. Here's what Jeff gets on the ShaderToy page:

FS ERROR: ERROR: 0:5: '*' :  wrong operand types no operation '*' exists that takes a left-hand operand of type '3-component vector of float' and a right operand of type 'const int' (or there is no acceptable conversion)

Apparenntly what happens is that the ANGLE EMULATE_BUILT_IN_FUNCTIONS feature replaces

    v.normalize()

by

    v.normalize() * 1

and instead it should replace the 1 by a float constant, like

    v.normalize() * 1.

Filing a ANGLE bug; will also try to patch it myself.
Can someone (Jeff? Doug?) please check this build,

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-d17ed33c8ed9

with MOZ_ANGLE_EMULATE_BUILT_IN_FUNCTIONS=1 ?
I'm unable to repro this crash on my Mac. System specs:

Video card: ATI Radeon HD 6750M
OS: Mac OS X 10.7.1
Video driver: ATI Technologies Inc. -- ATI Radeon HD 6750M OpenGL Engine -- 2.1 ATI-7.4.10

Note that I couldn't repro it on release either, or with MOZ_ANGLE_EMULATE_BUILT_IN_FUNCTIONS=0.
I will try to reproduce this using the tryserver build as well since I have good luck crashing on that site.
/marcia wonder where the tryserver build went since the link in Comment 48 is giving me a 404. Can we regenerate the build?
Forget about that tryserver build, it probably didn't work; but since then I've received a notification from Mo that the real workaround is implemented in ANGLE r774, using the BUILT_IN_FUNCTION_EMULATION option. Will make a patch ASAP, first need to update ANGLE to r774.
Depends on: 690735
Tryserver build with ANGLE updated to r774 and using built-in function emulation:

https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-3fa88d7c03b2/

Please test!
Benoit: I tested the Tryserver build on 10.6 and it did not crash, although the content did not show in the window. I just see the fps moving but no actual rendering in the window.

Using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0a1) Gecko/20110929 Firefox/10.0a1 I am able to still replicate the crash easily on the same machine.
Great, thanks.

Mo, is it expected behavior, that while not crashing it's also not rendering correctly?

Anyway, from a security perspective that's all we need :-)
The status of landing this fix is as follows:
 * Firefox 10: it's a no brainer to upgrade ANGLE to r774 and land this fix
 * Firefox 9: i suggest backporting the ANGLE r774 to aurora. Not only will this allow landing this fix easily, but it also brings a large number of other fixes.
 * Firefox 8: I don't know what to do. This fix seems to rely on a number of changesets  and cherry-picking them might be just as dangerous as doing a wholesale ANGLE upgrade, which is kinda scary to do in beta stage.
Also, Aurora is already on ANGLE r740 making this a relatively small ANGLE update (in particular no build-system change is involved, these being the ones that bit us in the past).

But Beta is still on ANGLE r686... so that would be a really big ANGLE update.
@Beniot: you might want to consider the chrome_m14 or chrome_m15 branches of ANGLE.  I believe they had the fix you need merged into them recently.  I'm not sure how closely they are aligned with when you pulled for FF 8/9 but in some ways it might be nice to have both Chrome and FF use the same branches :-)
To fix the crash, all the EMULATION did is put cos into a function wrapper

float cos_emu(float angle)
{ 
  return cos(angle);
}

so that should not affect the rendering in theory.  If it's not rendering correctly, there is another bug.
(In reply to daniel-bzmz from comment #58)
> @Beniot: you might want to consider the chrome_m14 or chrome_m15 branches of
> ANGLE.  I believe they had the fix you need merged into them recently.  I'm
> not sure how closely they are aligned with when you pulled for FF 8/9 but in
> some ways it might be nice to have both Chrome and FF use the same branches
> :-)

@Daniel: I actually would love to align FF ANGLE versions with Chrome's, but the main limiting factor here is SVN's difficult handling of branches (or my poor understanding thereof). It's probably doable but I haven't figured a very good way yet of generating a diff between two branches if I decide to switch a FF release from one chrome_mXX branch to another, or switch from trunk to some chrome_mXX branch, etc.
Replied to Benoit via email.

Basic gist is : svn diff OLD-URL[@OLDREV] NEW-URL[@NEWREV]
eg: svn diff https://angleproject.googlecode.com/svn/trunk@r686 https://angleproject.googlecode.com/svn/branches/chrome_m14
Attachment #564017 - Flags: review?(jmuizelaar)
(In reply to daniel-bzmz from comment #61)
> Replied to Benoit via email.
> 
> Basic gist is : svn diff OLD-URL[@OLDREV] NEW-URL[@NEWREV]
> eg: svn diff https://angleproject.googlecode.com/svn/trunk@r686
> https://angleproject.googlecode.com/svn/branches/chrome_m14

Thanks a lot! Will use chrome_m branches.
Comment on attachment 564017 [details] [diff] [review]
use ANGLE built-in-function-emulation feature

I don't like sprinkling #ifdefs all around and would prefer something like:

if (needsBuiltInFunctions())
  compileOptions |= SH_EMULATE_BUILT_IN_FUNCTIONS
Attachment #564017 - Flags: review?(jmuizelaar) → review-
ok, i hope this is good enough for you, kitten; i want to harmonize the way we do workarounds, see bug 686735, but let's first fix this security bug.
Attachment #564017 - Attachment is obsolete: true
Attachment #564288 - Flags: review?(jmuizelaar)
Comment on attachment 564288 [details] [diff] [review]
use ANGLE built-in-function-emulation feature

Still not thrilled but better enough for me.
Attachment #564288 - Flags: review?(jmuizelaar) → review+
Can we get this landed? We'd love to get this in for 8, which means we need to get this landed on trunk ASAP.
Sorry, dropped this ball again. This is currently only landed in m-c. For 8, the way to go would be to import the chrome_m14 branch of ANGLE. patch coming.
I tried to make a patch for 8 using the chrome_m14 branch... but it's missing the main changeset, ANGLE r738. Only the bugfix ANGLE r773 changeset was merged to that branch.

-> giving up for Firefox 8 at this point. Patch coming for Firefox 9 using chrome_m15 branch.
This patch for Aurora updates it to the chrome_m15 branch, which is a small change. chrome_m15 branched off r744 while we were at r740; the diff between 740 and 744 is some optimizations that are nice to have and have been tested for a month on Nightly.

To get the present bug fixed in Aurora, we need approval for both:
  - this patch, and
  - the other patch to actually use the function-emulation feature (attachment 564288 [details] [diff] [review])
Attachment #568995 - Flags: approval-mozilla-aurora?
Attachment #564288 - Flags: approval-mozilla-aurora?
I guessed now was the right time to land this patch, since that's in principle needed to get aurora approval.
http://hg.mozilla.org/mozilla-central/rev/182a55b58e01
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
We were running into errors on 10.7 + ATI:

https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/conformance/glsl/functions/glsl-function-cos.html
FAIL *** Error compiling shader '[object WebGLShader @ 0x115c798c0 (native @ 0x115c79ec0)]':ERROR: 0:5: 'webgl_emu_precision' : syntax error syntax error
FAIL getError expected: NO_ERROR. Was INVALID_VALUE : no errors from draw
FAIL *** Error compiling shader '[object WebGLShader @ 0x115aeae90 (native @ 0x115aeb020)]':ERROR: 0:5: 'webgl_emu_precision' : syntax error syntax error
FAIL getError expected: NO_ERROR. Was INVALID_VALUE : no errors from draw
FAIL *** Error compiling shader '[object WebGLShader @ 0x11cc04ef0 (native @ 0x11cc04640)]':ERROR: 0:5: 'webgl_emu_precision' : syntax error syntax error
FAIL getError expected: NO_ERROR. Was INVALID_VALUE : no errors from draw
FAIL *** Error compiling shader '[object WebGLShader @ 0x118981e90 (native @ 0x118981de0)]':ERROR: 0:5: 'webgl_emu_precision' : syntax error syntax error
FAIL getError expected: NO_ERROR. Was INVALID_VALUE : no errors from draw

This patch fixes these on 10.7.
Attachment #569382 - Flags: review?(bjacob)
[triage comment]

Not ready to approve yet until "Patch on top of previous patch to disable built-in function emulation on 10.7 and newer" is ready
Attachment #569382 - Attachment description: Patch on top of previous patch to disable built-in function emulation on 10.7 and newer → (Maybe consider in the future) Patch on top of previous patch to disable built-in function emulation on 10.7 and newer
Attachment #569382 - Flags: review?(bjacob)
(In reply to Christian Legnitto [:LegNeato] from comment #75)
> [triage comment]
> 
> Not ready to approve yet until "Patch on top of previous patch to disable
> built-in function emulation on 10.7 and newer" is ready

This was just a potential follow-up patch in case it would be confirmed that the security issue is fixed on Mac OS 10.7... which so far is only confirmed on 1 machine.

I removed the r? field on it and renamed it to be clear.
Just to clarify: we posted about this bug on the 3dweb mailing list (for WebGL stuff) and someone said that they'd look into it because it seemed to be a bug with ANGLE. We're hoping that if we update ANGLE after it is patched, this bug will go away. If this happens, my patch will be unnecessary. It sounds like this regression isn't anywhere near as bad as the security hole we're leaving open without the main patch, anyways.
I just discovered that this breaks the WebGL water demo (http://madebyevan.com/webgl-water/) and probably others that are similar. I have filed a bug with ANGLE.
http://code.google.com/p/angleproject/issues/detail?id=242&thanks=242&ts=1320357137
[triage comment]

Ok, so for Firefox 9 (on aurora, moving to beta tomorrow):

1. If we take the patch, we get the issues in comment 73
2. There is another patch to fix ths issues in comment 73
3. The patch breaks some demos / might break some sites
4. This is a bug in Apple's graphics drivers, thought to be a security issue

Is that all correct? If so, I'm not sure we can take this in Firefox 9 as-is.
(In reply to Christian Legnitto [:LegNeato] from comment #79)
> [triage comment]
> 
> Ok, so for Firefox 9 (on aurora, moving to beta tomorrow):
> 
> 1. If we take the patch, we get the issues in comment 73
> 2. There is another patch to fix ths issues in comment 73
> 3. The patch breaks some demos / might break some sites
> 4. This is a bug in Apple's graphics drivers, thought to be a security issue
> 
> Is that all correct? If so, I'm not sure we can take this in Firefox 9 as-is.

I think I should have clarified. My patch fixes the issue described in comment 78. Comment 78 was made because we originally thought that the conformance issue wasn't serious but then I discovered that the problem uncovered by the conformance tests applied to a real-world demo. 

The patch is, however, intended to be a temporary fix and I wanted to hold off on doing anything about it until the ANGLE people fix it properly. Given that the broken code will be going into a release, I think it's best to land my patch and we can just pull it later when ANGLE is updated.
(In reply to Christian Legnitto [:LegNeato] from comment #79)
> [triage comment]
> 
> Ok, so for Firefox 9 (on aurora, moving to beta tomorrow):
> 
> 1. If we take the patch, we get the issues in comment 73
> 2. There is another patch to fix ths issues in comment 73
> 3. The patch breaks some demos / might break some sites
> 4. This is a bug in Apple's graphics drivers, thought to be a security issue
> 
> Is that all correct? If so, I'm not sure we can take this in Firefox 9 as-is.

That's correct. The main patch here is turning on a ANGLE feature that rewrites WebGL shaders to work around a bug in Apple's shader compiler. Unfortunately, that ANGLE feature is currently buggy and causes valid, real-world WebGL apps to stop working.

Given that I don't see a big emergency here (exploitability seems improbable) I wouldn't do anything for FF9.

Hopefully the ANGLE bug can be fixed in time for Firefox 10, or else we'll have a bad dilemma (keep using it and break real world apps due to the ANGLE bug, or turn it off and lose the security work-around it provides).
Comment on attachment 569382 [details] [diff] [review]
Patch on top of previous patch to disable built-in function emulation on 10.7 and newer

Review of attachment 569382 [details] [diff] [review]:
-----------------------------------------------------------------

My only reservation is that we only have limited data to know that it's actually fixed on 10.7 (it just works on Doug's machine).

I'm comfortable landing this on central for FF10 and letting it bake during the full aurora and beta cycles. I'd be less comfortable getting this in FF9 in Beta this week.
Attachment #569382 - Flags: review+
Attachment #569382 - Attachment description: (Maybe consider in the future) Patch on top of previous patch to disable built-in function emulation on 10.7 and newer → Patch on top of previous patch to disable built-in function emulation on 10.7 and newer
Patch on top of previous patch to disable built-in function emulation on 10.7 and newer
http://hg.mozilla.org/integration/mozilla-inbound/rev/82a297b0d0d3
Updated to use new OnLionOrLater() function which vastly simplifies the code.
Attachment #569382 - Attachment is obsolete: true
Attachment #572632 - Flags: review?(bjacob)
Attachment #572632 - Flags: review?(bjacob) → review+
(In reply to Benoit Jacob [:bjacob] from comment #86)
> http://hg.mozilla.org/integration/mozilla-inbound/rev/412bc4e114ec

merged to m-c
Whiteboard: [sg:critical?] → [sg:critical]
Target Milestone: --- → mozilla10
I don't see a reason to drop the '?' from [sg:critical?] now?
Whiteboard: [sg:critical] → [sg:critical?]
Comment on attachment 564288 [details] [diff] [review]
use ANGLE built-in-function-emulation feature

[Triage Comment]
Triggers a major feature regression that will need to be fixed before this lands. Denying for Aurora.

I believe this made the merge (and is thus in Aurora now). Please back out if so.
Attachment #564288 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Attachment #568995 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
(In reply to Alex Keybl [:akeybl] from comment #89)
> Comment on attachment 564288 [details] [diff] [review] [diff] [details] [review]
> use ANGLE built-in-function-emulation feature
> 
> [Triage Comment]
> Triggers a major feature regression that will need to be fixed before this
> lands. Denying for Aurora.
> 
> I believe this made the merge (and is thus in Aurora now). Please back out
> if so.

What's the regression you're talking about?
(In reply to Doug Sherk (:dRdR) from comment #90)
> (In reply to Alex Keybl [:akeybl] from comment #89)
> > Comment on attachment 564288 [details] [diff] [review] [diff] [details] [review] [diff] [details] [review]
> > use ANGLE built-in-function-emulation feature
> > 
> > [Triage Comment]
> > Triggers a major feature regression that will need to be fixed before this
> > lands. Denying for Aurora.
> > 
> > I believe this made the merge (and is thus in Aurora now). Please back out
> > if so.
> 
> What's the regression you're talking about?

The one you reported :-) that ANGLE generates shaders that fail to compile. comment 73.

Your latest patch removes the problem on 10.7 by disabling that workaround there, but if I understand correctly we still break useful content on 10.6, such as the WebGL Water demo, comment 78.

That's why in today's aurora meeting I said we should not take this stuff in Firefox 9.

For Firefox 10, there's still time and hope that the ANGLE bug will get fixed. So I wouldn't back out from Firefox 10 just yet. Again, this is a lose-lose game. If we back out, we get the security issue back; if we don't back out, we break useful content.
(In reply to Benoit Jacob [:bjacob] from comment #91)
> (In reply to Doug Sherk (:dRdR) from comment #90)
> > (In reply to Alex Keybl [:akeybl] from comment #89)
> > > Comment on attachment 564288 [details] [diff] [review] [diff] [details] [review] [diff] [details] [review] [diff] [details] [review]
> > > use ANGLE built-in-function-emulation feature
> > > 
> > > [Triage Comment]
> > > Triggers a major feature regression that will need to be fixed before this
> > > lands. Denying for Aurora.
> > > 
> > > I believe this made the merge (and is thus in Aurora now). Please back out
> > > if so.
> > 
> > What's the regression you're talking about?
> 
> The one you reported :-) that ANGLE generates shaders that fail to compile.
> comment 73.
> 
> Your latest patch removes the problem on 10.7 by disabling that workaround
> there, but if I understand correctly we still break useful content on 10.6,
> such as the WebGL Water demo, comment 78.
> 
> That's why in today's aurora meeting I said we should not take this stuff in
> Firefox 9.
> 
> For Firefox 10, there's still time and hope that the ANGLE bug will get
> fixed. So I wouldn't back out from Firefox 10 just yet. Again, this is a
> lose-lose game. If we back out, we get the security issue back; if we don't
> back out, we break useful content.

To the best of my knowledge, this works fine in 10.6 and earlier. The problem wasn't caught until recently because we don't have any 10.7 slaves (or didn't until recently). I don't see any reason to back this out, at least based on what we know now, as long as my patch is with it.
Whiteboard: [sg:critical?] → [sg:critical?][qa+]
Depends on: 711592
Angle r910 seems like the fix for the ANGLE bug we're hitting here.
Cannot reproduce this bug on Firefox 9.0.1 on Mac OS X with the two test cases attached and the reproduction steps given in the comment 0.

about:support graphics

Adapter Description: 0x21aa00,0x20400
WebGL Renderer: ATI Technologies Inc. -- ATI Radeon HD 2600 PRO OpenGL Engine -- 2.1 ATI-1.6.36
GPU Accelerated Windows: 1/1 OpenGL
Although I was originally able to repro this at one time on my 10.6 machine that has similar specs to Jesse's - ATI Technologies Inc. -- ATI Radeon HD 6750M OpenGL Engine -- 2.1 ATI-1.6.42 - I now cannot reproduce this on 9.0.1 or any other versions I Have on my machine currently. I am using both the site and the testcases to try to test.
Group: core-security
You need to log in before you can comment on or make changes to this bug.