Last Comment Bug 665578 - [ATI on Mac] WebGL demo "To the Road of Ribbon" crashes in useProgram [@ ATIRadeonX3000GLDriver@0x5a4fc]
: [ATI on Mac] WebGL demo "To the Road of Ribbon" crashes in useProgram [@ ATIR...
Status: RESOLVED FIXED
[sg:critical?][qa+]
: crash, reproducible, testcase
Product: Core
Classification: Components
Component: Canvas: WebGL (show other bugs)
: Trunk
: x86_64 Mac OS X
: -- critical (vote)
: mozilla10
Assigned To: Benoit Jacob [:bjacob] (mostly away)
:
: Milan Sreckovic [:milan]
Mentors:
http://www.iquilezles.org/apps/shader...
: 685892 (view as bug list)
Depends on: 690735 711592
Blocks: 605780
  Show dependency treegraph
 
Reported: 2011-06-20 09:34 PDT by Jesse Ruderman
Modified: 2012-03-23 13:28 PDT (History)
23 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
wontfix
-
wontfix
-
wontfix
-
wontfix
-
wontfix
+
fixed
+
fixed
unaffected


Attachments
graphics info from System Profiler (483 bytes, text/plain)
2011-06-20 09:35 PDT, Jesse Ruderman
no flags Details
crash report (mac debug) (37.20 KB, text/plain)
2011-06-20 09:36 PDT, Jesse Ruderman
no flags Details
crash report (mac debug) (55.39 KB, text/plain)
2011-06-20 09:36 PDT, Jesse Ruderman
no flags Details
testcase (2.64 KB, text/html)
2011-06-20 11:43 PDT, Christoph Diehl [:posidron]
no flags Details
output with MOZ_GL_DEBUG_VERBOSE (158.27 KB, text/plain)
2011-06-23 23:55 PDT, Jesse Ruderman
no flags Details
testcase (slightly reduced) (2.07 KB, text/html)
2011-06-24 11:08 PDT, Jesse Ruderman
no flags Details
experimental patch: always unroll loops on Mac (2.15 KB, patch)
2011-06-25 08:50 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Splinter Review
use ANGLE built-in-function-emulation feature (1.42 KB, patch)
2011-10-01 22:34 PDT, Benoit Jacob [:bjacob] (mostly away)
jmuizelaar: review-
Details | Diff | Splinter Review
use ANGLE built-in-function-emulation feature (1.97 KB, patch)
2011-10-03 12:07 PDT, Benoit Jacob [:bjacob] (mostly away)
jmuizelaar: review+
akeybl: approval‑mozilla‑aurora-
Details | Diff | Splinter Review
Patch for Aurora: upgrade ANGLE from r740 to chrome_m15 branch (66.79 KB, patch)
2011-10-23 19:38 PDT, Benoit Jacob [:bjacob] (mostly away)
akeybl: approval‑mozilla‑aurora-
Details | Diff | Splinter Review
Patch on top of previous patch to disable built-in function emulation on 10.7 and newer (2.86 KB, patch)
2011-10-25 08:26 PDT, Doug Sherk (:drs) (inactive)
jacob.benoit.1: review+
Details | Diff | Splinter Review
Updated patch on top of previous patch to disable built-in function emulation on 10.7 and newer (2.01 KB, patch)
2011-11-07 14:56 PST, Doug Sherk (:drs) (inactive)
jacob.benoit.1: review+
Details | Diff | Splinter Review

Description Jesse Ruderman 2011-06-20 09:34:50 PDT
Steps to reproduce, at least on my computer:
1. Load http://www.iquilezles.org/apps/shadertoy/.
2. In the dropdown in the upper right, select the 3D demo called "To the Road of Ribbon".
3. Click the "Load" button.

Result: bp-788d54b6-7ae3-47d1-a7f8-2197c2110620

Security-sensitive because it's accessing what looks like a random memory address.

This is a new MacBook Pro with Mac OS X 10.6.7.

Graphics info from about:support:
> Adapter Description: 0x21b00,0x20400
> WebGL Renderer: ATI Technologies Inc. -- ATI Radeon HD 6750M OpenGL Engine -- 2.1 ATI-1.6.34
> GPU Accelerated Windows: 1/1 OpenGL
Comment 1 Jesse Ruderman 2011-06-20 09:35:20 PDT
Created attachment 540493 [details]
graphics info from System Profiler
Comment 2 Jesse Ruderman 2011-06-20 09:36:06 PDT
Created attachment 540494 [details]
crash report (mac debug)
Comment 3 Jesse Ruderman 2011-06-20 09:36:52 PDT
Created attachment 540495 [details]
crash report (mac debug)
Comment 4 Charles J. Cliffe 2011-06-20 09:52:07 PDT
Strange, following these steps also crashed FF4 on me, but only on my first visit to that shader -- appears to be fine every time since.

My crash report is here:

https://crash-stats.mozilla.com/report/index/bp-1bd7001a-2c61-4deb-9828-0cfd12110620

Also running 10.6.7 on a macbook pro except with an NVIDIA 330M GT
Comment 5 Marcia Knous [:marcia - use ni] 2011-06-20 11:04:49 PDT
Using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0a1) Gecko/20110620 Firefox/7.0a1, I cannot reproduce the crash, but I can reproduce it easily on my 10.6.7 machine.
Comment 6 Marcia Knous [:marcia - use ni] 2011-06-20 11:24:55 PDT
My 10.7 machine has NVIDIA GeForce 9400M while the 10.6 machine has the AMD Radeon HD 6750M and is the newest MPB hardware.
Comment 7 Christoph Diehl [:posidron] 2011-06-20 11:43:09 PDT
Created attachment 540550 [details]
testcase
Comment 8 Johnny Stenback (:jst, jst@mozilla.com) 2011-06-23 13:43:51 PDT
bjacob, can you look into this please?
Comment 9 Benoit Jacob [:bjacob] (mostly away) 2011-06-23 14:30:24 PDT
(In reply to comment #4)
> Strange, following these steps also crashed FF4 on me, but only on my first
> visit to that shader -- appears to be fine every time since.
> 
> My crash report is here:
> 
> https://crash-stats.mozilla.com/report/index/bp-1bd7001a-2c61-4deb-9828-
> 0cfd12110620
> 
> Also running 10.6.7 on a macbook pro except with an NVIDIA 330M GT

Charles, your crash looks completely different and should be filed separately against the JS engine.
Comment 10 Benoit Jacob [:bjacob] (mostly away) 2011-06-23 14:34:48 PDT
(In reply to comment #3)
> Created attachment 540495 [details]
> crash report (mac debug)

Can you please try, with a _debug_ build, running with MOZ_GL_DEBUG_VERBOSE defined, and capture the output. Something like:

MOZ_GL_DEBUG_VERBOSE=1 ./firefox 2>&1 | tee logfile

I'd like to see the last ~1000 lines if it's too big. Also, when it's crashed, can you please go in GDB and do:

call DumpJSStack()
Comment 11 Benoit Jacob [:bjacob] (mostly away) 2011-06-23 14:35:22 PDT
Note: here with NVIDIA driver on linux, it works fine.
Comment 12 Benoit Jacob [:bjacob] (mostly away) 2011-06-23 14:37:56 PDT
(In reply to comment #10)
> (In reply to comment #3)
> > Created attachment 540495 [details]
> > crash report (mac debug)
> 
> Can you please try, with a _debug_ build, running with MOZ_GL_DEBUG_VERBOSE
> defined, and capture the output.

Please also attach a backtrace obtained with MOZ_GL_DEBUG_VERBOSE. Without GL debug mode, backtraces are often fooled by the asynchronousness of the GL. The debug mode makes it a synchronous API.
Comment 13 Jesse Ruderman 2011-06-23 23:55:05 PDT
Created attachment 541618 [details]
output with MOZ_GL_DEBUG_VERBOSE

This is with the testcase Christoph attached. It includes a stack trace and a DumpJSStack.
Comment 14 Benoit Jacob [:bjacob] (mostly away) 2011-06-24 08:06:24 PDT
I can't reproduce on my Mac Mini with a NVIDIA Geforce 320M and Mac OS 10.6.4. Confirming that this bug is ATI-only.
Comment 15 Benoit Jacob [:bjacob] (mostly away) 2011-06-24 08:10:40 PDT
(In reply to comment #13)
> Created attachment 541618 [details]
> output with MOZ_GL_DEBUG_VERBOSE
> 
> This is with the testcase Christoph attached. It includes a stack trace and
> a DumpJSStack.

Thanks a lot. This shows that the crash is actually in useProgram. I wonder if the GL is getting confused by the undefined vertex attribs in the shader.

Does Chrome also crash on the same machine?
Comment 16 Benoit Jacob [:bjacob] (mostly away) 2011-06-24 09:02:53 PDT
> I wonder if the GL is getting confused by the undefined vertex attribs in the shader.

Ignore that, sorry: the crash is in the first useProgram() call, so by that time attribs can't have been defined yet.

The stack is really strange:

#0  0x00007fffffe00847 in __memcpy ()
#1  0x000000011cd1e4fd in gldCopyTexSubImage ()
#2  0x000000011cd05adf in gldUpdateDispatch ()
#3  0x000000011ccd7748 in gldCreateProgram ()
#4  0x000000011ccd80fb in gldCreateProgram ()
#5  0x000000011ccf29e6 in gldUpdateDispatch ()
#6  0x000000011cbde584 in gleDoSelectiveDispatchNoErrorCore ()
#7  0x000000011cb4d10b in glFinish_Exec ()
#8  0x000000010179fad1 in mozilla::gl::GLContext::AfterGLCall (this=0x100e08a00, glFunction=0x102fc27b8 "void mozilla::gl::GLContext::fUseProgram(GLuint)") at GLContext.h:1042
#9  0x00000001027d79b7 in mozilla::gl::GLContext::fUseProgram (this=0x100e08a00, program=1) at GLContext.h:1773

Frames 9 to 5 look normal. Frames 4 to 2 can still be understood: program creation was delayed until the program is made current. But frame 1 is ununderstandable: it claims that we're grabbing part of the framebuffer into a texture. That's all the more unexpected as this testcase doesn't do anything with textures.

If you would like us to understand this bug better and perhaps find a work-around, please try to reduce the testcase. There's probably a lot of stuff that can be taken out of this big shader that defines multiple functions oa(), ob(), o() etc.

I don't understand Mac signal names: is this an invalid read, or an invalid write?

I don't agree that the address being accessed here, 0x1266db000, looks random. It looks very close to the heap, since e.g. 0x126706850 and 0x124330a00 are on the heap and it falls in between them.

If it's a write access then I can see how this is sg:critical, as one could imagine an exploit writing stuff onto the heap. If it's a read access, I'm less sure: I understand that that means that the GL is potentially reading private data, but I don't know that that could be exploited i.e. that that private data could end up being exposed in some exploitable way as opposed to just being used internally by the GL.
Comment 17 Christoph Diehl [:posidron] 2011-06-24 09:59:46 PDT
(In reply to comment #16)
> I don't understand Mac signal names: is this an invalid read, or an invalid
> write?

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000011e828000
0x00007fffffe00847 in __memcpy ()
--[ REGISTERS ]
rax:      0x63b3000 rbx:    0x117cd7fe0 rcx: 0xffffffffffffe600
rdx:           0x30 rsi:    0x118476a00 rdi:        0x11e829a00
rbp: 0x7fff5fbfac10 rsp: 0x7fff5fbfac10 rip:     0x7fffffe00847

--[ FAULTING INSTRUCTION ]
0x7fffffe00847 <__memcpy+167>:	movdqa XMMWORD PTR [rdi+rcx],xmm0
Comment 18 Benoit Jacob [:bjacob] (mostly away) 2011-06-24 10:55:02 PDT
(In reply to comment #17)
> --[ FAULTING INSTRUCTION ]
> 0x7fffffe00847 <__memcpy+167>:	movdqa XMMWORD PTR [rdi+rcx],xmm0

OK so it's an invalid write, thanks.
Comment 19 Christoph Diehl [:posidron] 2011-06-24 11:01:38 PDT
@Benoit: my insights :)

I think the problem is the assignment of q which is made during the for loop(s). If we reduce the size of the loop counter the bug is not triggered. There was similar issue with uniform variables. Which looked like:

uniform sampler2D uni[8];
void main() {
    vec4 c = vec4(0,0,0,0);
    for (int ii = 0; ii < NUM_TEXTURES; ++ii) {
      c += texture2D(uni[ii], vec2(0.5, 0.5));
    }
    gl_FragColor = c;
} 

Benoit, do you remember this bug?
Comment 20 Jesse Ruderman 2011-06-24 11:08:27 PDT
Created attachment 541722 [details]
testcase (slightly reduced)
Comment 21 Benoit Jacob [:bjacob] (mostly away) 2011-06-24 11:38:54 PDT
(In reply to comment #19)
> Benoit, do you remember this bug?

Oh, right. The stack sure does look similar. I think the ANGLE guys figured that Mac crashed when a loop index was used to address a sampler array:
http://code.google.com/p/angleproject/issues/detail?id=94
They fixed it by unrolling such for loops:
http://code.google.com/p/angleproject/source/detail?r=606

Here however, we don't have any samplers. Could it still be another bug with for loops on Mac? It would be interesting to manually unroll all the for loops in the testcase to see if the crash persists.
Comment 22 Benoit Jacob [:bjacob] (mostly away) 2011-06-24 11:42:15 PDT
ANGLE devs: can you please have a look at comments 19-21, the present crash looks very similar to ANGLE bug 94 except that we have no samplers involved here. Questions:

 - Does ANGLE now automatically unroll loops where the index is used to address a sampler array? Or is there a compile-time option to enable that?

 - Is there an ANGLE option to unroll all for loops? I'd like to see if this makes the present crash go away.

 - Does the present testcase also crash Chrome on Macs with ATI cards?
Comment 23 Zhenyao Mo 2011-06-24 13:32:00 PDT
(In reply to comment #22)
> ANGLE devs: can you please have a look at comments 19-21, the present crash
> looks very similar to ANGLE bug 94 except that we have no samplers involved
> here. Questions:
> 
>  - Does ANGLE now automatically unroll loops where the index is used to
> address a sampler array? Or is there a compile-time option to enable that?

No option.  It always unrolls on Mac.

> 
>  - Is there an ANGLE option to unroll all for loops? I'd like to see if this
> makes the present crash go away.

No.  But you can easily to so to your local ANGLE.  The code is there for sampler arrays, you only need to comment out the condition.

> 
>  - Does the present testcase also crash Chrome on Macs with ATI cards?

I don't have a Mac with ATI card to test at the moment.
Comment 24 Christoph Diehl [:posidron] 2011-06-25 06:34:19 PDT
(In reply to comment #22)
>  - Does the present testcase also crash Chrome on Macs with ATI cards?

Yes, Chrome crashes too:

Google Chrome 14.0.797.0 (Offizieller Build 89638) dev
OS	Mac OS
WebKit	535.1 (trunk@89145)
Comment 25 Benoit Jacob [:bjacob] (mostly away) 2011-06-25 08:50:52 PDT
Created attachment 541936 [details] [diff] [review]
experimental patch: always unroll loops on Mac

Thanks Mo. Here's an attempt at unrolling all loops, just to see if it fixes the crash. Can you please check the patch.

Jesse: please check if this avoids the crash.
Comment 26 Zhenyao Mo 2011-06-27 11:50:29 PDT
Yeah, this should work.
Comment 27 Christoph Diehl [:posidron] 2011-06-30 04:32:34 PDT
Have applied the patch but FF is still crashing.
Comment 28 Jeff Muizelaar [:jrmuizel] 2011-07-05 08:15:47 PDT
(In reply to comment #26)
> Yeah, this should work.

The loop unrolling patch doesn't seem to unroll the loops in this shader.
Here's the output from angle with attachment 541936 [details] [diff] [review]

uniform vec2 resolution;
uniform float time;
void main(){
vec2 p = gl_FragCoord.xy;
(p[0] *= (resolution[0] / resolution[1]));
vec3 org = vec3((sin(time) * 0.5), (cos((time * 0.5)) * 0.25), time);
vec3 dir = normalize(vec3(p[0], p[1], 1.0));
vec3 q = vec3(0.0, 0.0, 0.0);
for (int i = 0; (i < 64); (i++))
{
(q += min((((cos(q[0]) + cos((q[1] * 1.5))) + cos(q[2])) + cos(q[1])), length(max((abs((q - vec3((cos((q[2] * 1.5)) * 0.30000001), (-0.5 + (cos(q[2]) * 0.2)), 0.0))) - vec3(0.125, 0.02, (time + 3.0))), vec3(0.0, 0.0, 0.0)))));
}
(q += reflect(dir, q));
for (int i = 0; (i < 64); (i++))
{
(q += min((((cos(q[0]) + cos((q[1] * 1.5))) + cos(q[2])) + cos(q[1])), length(max((abs((q - vec3((cos((q[2] * 1.5)) * 0.30000001), (-0.5 + (cos(q[2]) * 0.2)), 0.0))) - vec3(0.125, 0.02, (time + 3.0))), vec3(0.0, 0.0, 0.0)))));
}
vec4 fcolor = (((q[0] + vec4(length((q - org)))) + (min(q[1], 1.0) * vec4(1.0, 0.80000001, 0.69999999, 1.0))) * time);
(gl_FragColor = vec4(fcolor.xyz, 1.0));
}
Comment 29 Johnny Stenback (:jst, jst@mozilla.com) 2011-07-14 13:55:16 PDT
Not going to make it for 6 :(
Comment 30 Benoit Jacob [:bjacob] (mostly away) 2011-07-24 21:49:42 PDT
Apparently no reply yet on my email to the WebGL list, pinging.
Comment 31 Johnny Stenback (:jst, jst@mozilla.com) 2011-08-04 13:45:16 PDT
Still nothing from the WebGL list?
Comment 32 Kenneth Russell 2011-08-04 14:02:29 PDT
Sorry, but no, there has been a lot of traffic on the WebGL lists recently and this issue seems to have fallen through the cracks.

The difficult part is going to be finding someone to implement the loop unrolling workaround.

I suggest filing a security issue on http://code.google.com/p/angleproject/issues/list (CC: kbr@chromium.org) and I can try to get it prioritized.
Comment 33 Benoit Jacob [:bjacob] (mostly away) 2011-08-04 14:37:55 PDT
Indeed I dropped this ball, sorry. Filing ANGLE bug now.
Comment 34 Benoit Jacob [:bjacob] (mostly away) 2011-08-04 15:05:55 PDT
Filed ANGLE bug:
http://code.google.com/p/angleproject/issues/detail?id=193
Comment 35 Benoit Jacob [:bjacob] (mostly away) 2011-08-17 15:37:23 PDT
Mo has implemented an option for unrolling of all loops in ANGLE r734. Now we have to use it.

Meanwhile ANGLE developers are investigating a different class of work-arounds in ANGLE bug 196.
Comment 36 Johnny Stenback (:jst, jst@mozilla.com) 2011-08-25 13:33:09 PDT
Is the fix in angle something we can cherry pick for 7 and 8, or do we need to wait for a new angle merge here? We're running very short on time for 7 (given all hands meetings etc).
Comment 37 Daniel Veditz [:dveditz] 2011-08-25 13:33:37 PDT
scarybeasts: when will Chrome be adopting (and more importantly, announcing) this fix for the ANGLE bug?
Comment 38 Chris Evans 2011-08-25 13:52:13 PDT
Is this an ANGLE bug or a workaround for an ATI / MacOS bug? It's not clear to me.
Comment 39 Benoit Jacob [:bjacob] (mostly away) 2011-08-25 14:09:10 PDT
(In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #36)
> Is the fix in angle something we can cherry pick for 7 and 8, or do we need
> to wait for a new angle merge here? We're running very short on time for 7
> (given all hands meetings etc).

It should be cherry-pickable, but either way it will be a quite intrusive change as it's about rewriting shaders. So I don't think that we want it in Firefox 7.

The first possible work-around, which consists in an option to unroll all loops, was implemented as ANGLE r734.

The second possible work-around, which consists in replacing built-in GLSL functions such as normalize() by shims doing basically the same thing and multiplying the result by 1, has just been implemented as ANGLE r738.

The second work-around (r738) seems to work around a larger class of Mac crashers, and doesn't suffer from the downsides of unrolling all loops, so it's my preferred option.
http://code.google.com/p/angleproject/source/detail?r=738
Comment 40 Benoit Jacob [:bjacob] (mostly away) 2011-08-25 14:10:14 PDT
(In reply to Chris Evans from comment #38)
> Is this an ANGLE bug or a workaround for an ATI / MacOS bug? It's not clear
> to me.

The bug here is in the Mac OpenGL libraries. The work-arounds are being implemented in ANGLE. The most likely work-around is ANGLE r738, see comment 39 above.
Comment 41 Chris Evans 2011-08-25 14:39:26 PDT
Oh, the Mac OpenGL libraries are in an awful state! I have a ton of cases open, and I've forwarded more over from external researchers.
I don't think it's necessary to synchronize here, nor rush anything.

If you haven't already sent this over to Apple, let me know and I have a good contact.

Also, I don't think you should assign this a Mozilla CVE as it is not your fault. You might always mention "workarounds for Mac bugs" in your release notes, though.
Comment 42 Johnny Stenback (:jst, jst@mozilla.com) 2011-09-01 13:21:07 PDT
Ok, wontfix for 7 then. Thanks Bonoit.
Comment 43 Benoit Jacob [:bjacob] (mostly away) 2011-09-05 15:08:41 PDT
mozilla-central now has ANGLE r740 so the work-arounds are implemented in ANGLE and we just need to turn them on.

Here's a tryserver build:
https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=d56911f09ab6
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-d56911f09ab6

By default it doesn't do anything special but you can switch two different workarounds for this bug by defining these environment variables:

MOZ_ANGLE_UNROLL_FOR_LOOP_WITH_INTEGER_INDEX
   turns on the workaround implemented in ANGLE r734

MOZ_ANGLE_EMULATE_BUILT_IN_FUNCTIONS
   turns on the workaround implemented in ANGLE r738 and fixed in ANGLE r740

It doesn't matter what the value of these environment vars is, only whether they are defined.

If both fix the crash and we get to choose, the preferred solution is EMULATE_BUILT_IN_FUNCTIONS.
Comment 44 Benoit Jacob [:bjacob] (mostly away) 2011-09-06 07:32:54 PDT
Jeff M just tested it on his Mac.

MOZ_ANGLE_UNROLL_FOR_LOOP_WITH_INTEGER_INDEX does not fix the crash.

MOZ_ANGLE_EMULATE_BUILT_IN_FUNCTIONS "fixes" the crash... by injecting a syntax error into the shader preventing the GL driver from successfully compiling it. Here's what Jeff gets on the ShaderToy page:

FS ERROR: ERROR: 0:5: '*' :  wrong operand types no operation '*' exists that takes a left-hand operand of type '3-component vector of float' and a right operand of type 'const int' (or there is no acceptable conversion)

Apparenntly what happens is that the ANGLE EMULATE_BUILT_IN_FUNCTIONS feature replaces

    v.normalize()

by

    v.normalize() * 1

and instead it should replace the 1 by a float constant, like

    v.normalize() * 1.

Filing a ANGLE bug; will also try to patch it myself.
Comment 45 Benoit Jacob [:bjacob] (mostly away) 2011-09-06 08:11:13 PDT
Filed http://code.google.com/p/angleproject/issues/detail?id=202

Wrote a patch. Tryserver:
http://code.google.com/p/angleproject/issues/detail?id=202
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-d17ed33c8ed9

Please try again with MOZ_ANGLE_EMULATE_BUILT_IN_FUNCTIONS once this build is ready.
Comment 46 Benoit Jacob [:bjacob] (mostly away) 2011-09-06 08:13:50 PDT
Oops, the TBPL link is https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=d17ed33c8ed9
Comment 47 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 10:58:57 PDT
*** Bug 685892 has been marked as a duplicate of this bug. ***
Comment 48 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 11:00:21 PDT
Can someone (Jeff? Doug?) please check this build,

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-d17ed33c8ed9

with MOZ_ANGLE_EMULATE_BUILT_IN_FUNCTIONS=1 ?
Comment 49 Doug Sherk (:drs) (inactive) 2011-09-09 11:19:24 PDT
I'm unable to repro this crash on my Mac. System specs:

Video card: ATI Radeon HD 6750M
OS: Mac OS X 10.7.1
Video driver: ATI Technologies Inc. -- ATI Radeon HD 6750M OpenGL Engine -- 2.1 ATI-7.4.10

Note that I couldn't repro it on release either, or with MOZ_ANGLE_EMULATE_BUILT_IN_FUNCTIONS=0.
Comment 50 Marcia Knous [:marcia - use ni] 2011-09-29 13:18:35 PDT
I will try to reproduce this using the tryserver build as well since I have good luck crashing on that site.
Comment 51 Marcia Knous [:marcia - use ni] 2011-09-29 13:20:29 PDT
/marcia wonder where the tryserver build went since the link in Comment 48 is giving me a 404. Can we regenerate the build?
Comment 52 Benoit Jacob [:bjacob] (mostly away) 2011-09-29 13:34:57 PDT
Forget about that tryserver build, it probably didn't work; but since then I've received a notification from Mo that the real workaround is implemented in ANGLE r774, using the BUILT_IN_FUNCTION_EMULATION option. Will make a patch ASAP, first need to update ANGLE to r774.
Comment 53 Benoit Jacob [:bjacob] (mostly away) 2011-09-30 12:29:14 PDT
Tryserver build with ANGLE updated to r774 and using built-in function emulation:

https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-3fa88d7c03b2/

Please test!
Comment 54 Marcia Knous [:marcia - use ni] 2011-09-30 12:49:46 PDT
Benoit: I tested the Tryserver build on 10.6 and it did not crash, although the content did not show in the window. I just see the fps moving but no actual rendering in the window.

Using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0a1) Gecko/20110929 Firefox/10.0a1 I am able to still replicate the crash easily on the same machine.
Comment 55 Benoit Jacob [:bjacob] (mostly away) 2011-09-30 12:51:44 PDT
Great, thanks.

Mo, is it expected behavior, that while not crashing it's also not rendering correctly?

Anyway, from a security perspective that's all we need :-)
Comment 56 Benoit Jacob [:bjacob] (mostly away) 2011-09-30 13:08:39 PDT
The status of landing this fix is as follows:
 * Firefox 10: it's a no brainer to upgrade ANGLE to r774 and land this fix
 * Firefox 9: i suggest backporting the ANGLE r774 to aurora. Not only will this allow landing this fix easily, but it also brings a large number of other fixes.
 * Firefox 8: I don't know what to do. This fix seems to rely on a number of changesets  and cherry-picking them might be just as dangerous as doing a wholesale ANGLE upgrade, which is kinda scary to do in beta stage.
Comment 57 Benoit Jacob [:bjacob] (mostly away) 2011-09-30 13:10:26 PDT
Also, Aurora is already on ANGLE r740 making this a relatively small ANGLE update (in particular no build-system change is involved, these being the ones that bit us in the past).

But Beta is still on ANGLE r686... so that would be a really big ANGLE update.
Comment 58 daniel-bzmz 2011-09-30 13:17:28 PDT
@Beniot: you might want to consider the chrome_m14 or chrome_m15 branches of ANGLE.  I believe they had the fix you need merged into them recently.  I'm not sure how closely they are aligned with when you pulled for FF 8/9 but in some ways it might be nice to have both Chrome and FF use the same branches :-)
Comment 59 Zhenyao Mo 2011-09-30 16:44:13 PDT
To fix the crash, all the EMULATION did is put cos into a function wrapper

float cos_emu(float angle)
{ 
  return cos(angle);
}

so that should not affect the rendering in theory.  If it's not rendering correctly, there is another bug.
Comment 60 Benoit Jacob [:bjacob] (mostly away) 2011-09-30 21:14:41 PDT
(In reply to daniel-bzmz from comment #58)
> @Beniot: you might want to consider the chrome_m14 or chrome_m15 branches of
> ANGLE.  I believe they had the fix you need merged into them recently.  I'm
> not sure how closely they are aligned with when you pulled for FF 8/9 but in
> some ways it might be nice to have both Chrome and FF use the same branches
> :-)

@Daniel: I actually would love to align FF ANGLE versions with Chrome's, but the main limiting factor here is SVN's difficult handling of branches (or my poor understanding thereof). It's probably doable but I haven't figured a very good way yet of generating a diff between two branches if I decide to switch a FF release from one chrome_mXX branch to another, or switch from trunk to some chrome_mXX branch, etc.
Comment 61 daniel-bzmz 2011-10-01 08:06:22 PDT
Replied to Benoit via email.

Basic gist is : svn diff OLD-URL[@OLDREV] NEW-URL[@NEWREV]
eg: svn diff https://angleproject.googlecode.com/svn/trunk@r686 https://angleproject.googlecode.com/svn/branches/chrome_m14
Comment 62 Benoit Jacob [:bjacob] (mostly away) 2011-10-01 22:34:39 PDT
Created attachment 564017 [details] [diff] [review]
use ANGLE built-in-function-emulation feature
Comment 63 Benoit Jacob [:bjacob] (mostly away) 2011-10-01 22:35:15 PDT
(In reply to daniel-bzmz from comment #61)
> Replied to Benoit via email.
> 
> Basic gist is : svn diff OLD-URL[@OLDREV] NEW-URL[@NEWREV]
> eg: svn diff https://angleproject.googlecode.com/svn/trunk@r686
> https://angleproject.googlecode.com/svn/branches/chrome_m14

Thanks a lot! Will use chrome_m branches.
Comment 64 Jeff Muizelaar [:jrmuizel] 2011-10-03 07:38:34 PDT
Comment on attachment 564017 [details] [diff] [review]
use ANGLE built-in-function-emulation feature

I don't like sprinkling #ifdefs all around and would prefer something like:

if (needsBuiltInFunctions())
  compileOptions |= SH_EMULATE_BUILT_IN_FUNCTIONS
Comment 65 Benoit Jacob [:bjacob] (mostly away) 2011-10-03 12:07:50 PDT
Created attachment 564288 [details] [diff] [review]
use ANGLE built-in-function-emulation feature

ok, i hope this is good enough for you, kitten; i want to harmonize the way we do workarounds, see bug 686735, but let's first fix this security bug.
Comment 66 Jeff Muizelaar [:jrmuizel] 2011-10-03 12:14:12 PDT
Comment on attachment 564288 [details] [diff] [review]
use ANGLE built-in-function-emulation feature

Still not thrilled but better enough for me.
Comment 67 Johnny Stenback (:jst, jst@mozilla.com) 2011-10-20 13:17:47 PDT
Can we get this landed? We'd love to get this in for 8, which means we need to get this landed on trunk ASAP.
Comment 68 Benoit Jacob [:bjacob] (mostly away) 2011-10-20 13:29:16 PDT
Sorry, dropped this ball again. This is currently only landed in m-c. For 8, the way to go would be to import the chrome_m14 branch of ANGLE. patch coming.
Comment 69 Benoit Jacob [:bjacob] (mostly away) 2011-10-23 18:56:03 PDT
I tried to make a patch for 8 using the chrome_m14 branch... but it's missing the main changeset, ANGLE r738. Only the bugfix ANGLE r773 changeset was merged to that branch.

-> giving up for Firefox 8 at this point. Patch coming for Firefox 9 using chrome_m15 branch.
Comment 70 Benoit Jacob [:bjacob] (mostly away) 2011-10-23 19:38:10 PDT
Created attachment 568995 [details] [diff] [review]
Patch for Aurora: upgrade ANGLE from r740 to chrome_m15 branch

This patch for Aurora updates it to the chrome_m15 branch, which is a small change. chrome_m15 branched off r744 while we were at r740; the diff between 740 and 744 is some optimizations that are nice to have and have been tested for a month on Nightly.

To get the present bug fixed in Aurora, we need approval for both:
  - this patch, and
  - the other patch to actually use the function-emulation feature (attachment 564288 [details] [diff] [review])
Comment 71 Benoit Jacob [:bjacob] (mostly away) 2011-10-23 19:39:46 PDT
Aurora patches on tryserver: https://tbpl.mozilla.org/?tree=Try&rev=227ae3d1470e
Comment 72 Benoit Jacob [:bjacob] (mostly away) 2011-10-23 20:07:04 PDT
I guessed now was the right time to land this patch, since that's in principle needed to get aurora approval.
http://hg.mozilla.org/mozilla-central/rev/182a55b58e01
Comment 73 Doug Sherk (:drs) (inactive) 2011-10-25 08:26:03 PDT
Created attachment 569382 [details] [diff] [review]
Patch on top of previous patch to disable built-in function emulation on 10.7 and newer

We were running into errors on 10.7 + ATI:

https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/conformance/glsl/functions/glsl-function-cos.html
FAIL *** Error compiling shader '[object WebGLShader @ 0x115c798c0 (native @ 0x115c79ec0)]':ERROR: 0:5: 'webgl_emu_precision' : syntax error syntax error
FAIL getError expected: NO_ERROR. Was INVALID_VALUE : no errors from draw
FAIL *** Error compiling shader '[object WebGLShader @ 0x115aeae90 (native @ 0x115aeb020)]':ERROR: 0:5: 'webgl_emu_precision' : syntax error syntax error
FAIL getError expected: NO_ERROR. Was INVALID_VALUE : no errors from draw
FAIL *** Error compiling shader '[object WebGLShader @ 0x11cc04ef0 (native @ 0x11cc04640)]':ERROR: 0:5: 'webgl_emu_precision' : syntax error syntax error
FAIL getError expected: NO_ERROR. Was INVALID_VALUE : no errors from draw
FAIL *** Error compiling shader '[object WebGLShader @ 0x118981e90 (native @ 0x118981de0)]':ERROR: 0:5: 'webgl_emu_precision' : syntax error syntax error
FAIL getError expected: NO_ERROR. Was INVALID_VALUE : no errors from draw

This patch fixes these on 10.7.
Comment 74 Doug Sherk (:drs) (inactive) 2011-10-27 21:41:34 PDT
Running my patch on Try: https://tbpl.mozilla.org/?tree=Try&rev=20a50709f71c
Comment 75 christian 2011-11-01 14:31:49 PDT
[triage comment]

Not ready to approve yet until "Patch on top of previous patch to disable built-in function emulation on 10.7 and newer" is ready
Comment 76 Benoit Jacob [:bjacob] (mostly away) 2011-11-01 15:03:05 PDT
(In reply to Christian Legnitto [:LegNeato] from comment #75)
> [triage comment]
> 
> Not ready to approve yet until "Patch on top of previous patch to disable
> built-in function emulation on 10.7 and newer" is ready

This was just a potential follow-up patch in case it would be confirmed that the security issue is fixed on Mac OS 10.7... which so far is only confirmed on 1 machine.

I removed the r? field on it and renamed it to be clear.
Comment 77 Doug Sherk (:drs) (inactive) 2011-11-01 15:05:14 PDT
Just to clarify: we posted about this bug on the 3dweb mailing list (for WebGL stuff) and someone said that they'd look into it because it seemed to be a bug with ANGLE. We're hoping that if we update ANGLE after it is patched, this bug will go away. If this happens, my patch will be unnecessary. It sounds like this regression isn't anywhere near as bad as the security hole we're leaving open without the main patch, anyways.
Comment 78 Doug Sherk (:drs) (inactive) 2011-11-03 14:53:51 PDT
I just discovered that this breaks the WebGL water demo (http://madebyevan.com/webgl-water/) and probably others that are similar. I have filed a bug with ANGLE.
http://code.google.com/p/angleproject/issues/detail?id=242&thanks=242&ts=1320357137
Comment 79 christian 2011-11-07 13:30:42 PST
[triage comment]

Ok, so for Firefox 9 (on aurora, moving to beta tomorrow):

1. If we take the patch, we get the issues in comment 73
2. There is another patch to fix ths issues in comment 73
3. The patch breaks some demos / might break some sites
4. This is a bug in Apple's graphics drivers, thought to be a security issue

Is that all correct? If so, I'm not sure we can take this in Firefox 9 as-is.
Comment 80 Doug Sherk (:drs) (inactive) 2011-11-07 13:37:44 PST
(In reply to Christian Legnitto [:LegNeato] from comment #79)
> [triage comment]
> 
> Ok, so for Firefox 9 (on aurora, moving to beta tomorrow):
> 
> 1. If we take the patch, we get the issues in comment 73
> 2. There is another patch to fix ths issues in comment 73
> 3. The patch breaks some demos / might break some sites
> 4. This is a bug in Apple's graphics drivers, thought to be a security issue
> 
> Is that all correct? If so, I'm not sure we can take this in Firefox 9 as-is.

I think I should have clarified. My patch fixes the issue described in comment 78. Comment 78 was made because we originally thought that the conformance issue wasn't serious but then I discovered that the problem uncovered by the conformance tests applied to a real-world demo. 

The patch is, however, intended to be a temporary fix and I wanted to hold off on doing anything about it until the ANGLE people fix it properly. Given that the broken code will be going into a release, I think it's best to land my patch and we can just pull it later when ANGLE is updated.
Comment 81 Benoit Jacob [:bjacob] (mostly away) 2011-11-07 13:55:13 PST
(In reply to Christian Legnitto [:LegNeato] from comment #79)
> [triage comment]
> 
> Ok, so for Firefox 9 (on aurora, moving to beta tomorrow):
> 
> 1. If we take the patch, we get the issues in comment 73
> 2. There is another patch to fix ths issues in comment 73
> 3. The patch breaks some demos / might break some sites
> 4. This is a bug in Apple's graphics drivers, thought to be a security issue
> 
> Is that all correct? If so, I'm not sure we can take this in Firefox 9 as-is.

That's correct. The main patch here is turning on a ANGLE feature that rewrites WebGL shaders to work around a bug in Apple's shader compiler. Unfortunately, that ANGLE feature is currently buggy and causes valid, real-world WebGL apps to stop working.

Given that I don't see a big emergency here (exploitability seems improbable) I wouldn't do anything for FF9.

Hopefully the ANGLE bug can be fixed in time for Firefox 10, or else we'll have a bad dilemma (keep using it and break real world apps due to the ANGLE bug, or turn it off and lose the security work-around it provides).
Comment 82 Benoit Jacob [:bjacob] (mostly away) 2011-11-07 14:08:24 PST
Comment on attachment 569382 [details] [diff] [review]
Patch on top of previous patch to disable built-in function emulation on 10.7 and newer

Review of attachment 569382 [details] [diff] [review]:
-----------------------------------------------------------------

My only reservation is that we only have limited data to know that it's actually fixed on 10.7 (it just works on Doug's machine).

I'm comfortable landing this on central for FF10 and letting it bake during the full aurora and beta cycles. I'd be less comfortable getting this in FF9 in Beta this week.
Comment 83 Benoit Jacob [:bjacob] (mostly away) 2011-11-07 14:15:17 PST
Patch on top of previous patch to disable built-in function emulation on 10.7 and newer
http://hg.mozilla.org/integration/mozilla-inbound/rev/82a297b0d0d3
Comment 84 Benoit Jacob [:bjacob] (mostly away) 2011-11-07 14:34:27 PST
backed out from inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f70682372656
Comment 85 Doug Sherk (:drs) (inactive) 2011-11-07 14:56:31 PST
Created attachment 572632 [details] [diff] [review]
Updated patch on top of previous patch to disable built-in function emulation on 10.7 and newer

Updated to use new OnLionOrLater() function which vastly simplifies the code.
Comment 86 Benoit Jacob [:bjacob] (mostly away) 2011-11-07 16:34:43 PST
http://hg.mozilla.org/integration/mozilla-inbound/rev/412bc4e114ec
Comment 87 Justin Wood (:Callek) 2011-11-08 01:22:11 PST
(In reply to Benoit Jacob [:bjacob] from comment #86)
> http://hg.mozilla.org/integration/mozilla-inbound/rev/412bc4e114ec

merged to m-c
Comment 88 Benoit Jacob [:bjacob] (mostly away) 2011-11-08 07:59:15 PST
I don't see a reason to drop the '?' from [sg:critical?] now?
Comment 89 Alex Keybl [:akeybl] 2011-11-08 14:45:06 PST
Comment on attachment 564288 [details] [diff] [review]
use ANGLE built-in-function-emulation feature

[Triage Comment]
Triggers a major feature regression that will need to be fixed before this lands. Denying for Aurora.

I believe this made the merge (and is thus in Aurora now). Please back out if so.
Comment 90 Doug Sherk (:drs) (inactive) 2011-11-08 14:49:29 PST
(In reply to Alex Keybl [:akeybl] from comment #89)
> Comment on attachment 564288 [details] [diff] [review] [diff] [details] [review]
> use ANGLE built-in-function-emulation feature
> 
> [Triage Comment]
> Triggers a major feature regression that will need to be fixed before this
> lands. Denying for Aurora.
> 
> I believe this made the merge (and is thus in Aurora now). Please back out
> if so.

What's the regression you're talking about?
Comment 91 Benoit Jacob [:bjacob] (mostly away) 2011-11-08 14:53:47 PST
(In reply to Doug Sherk (:dRdR) from comment #90)
> (In reply to Alex Keybl [:akeybl] from comment #89)
> > Comment on attachment 564288 [details] [diff] [review] [diff] [details] [review] [diff] [details] [review]
> > use ANGLE built-in-function-emulation feature
> > 
> > [Triage Comment]
> > Triggers a major feature regression that will need to be fixed before this
> > lands. Denying for Aurora.
> > 
> > I believe this made the merge (and is thus in Aurora now). Please back out
> > if so.
> 
> What's the regression you're talking about?

The one you reported :-) that ANGLE generates shaders that fail to compile. comment 73.

Your latest patch removes the problem on 10.7 by disabling that workaround there, but if I understand correctly we still break useful content on 10.6, such as the WebGL Water demo, comment 78.

That's why in today's aurora meeting I said we should not take this stuff in Firefox 9.

For Firefox 10, there's still time and hope that the ANGLE bug will get fixed. So I wouldn't back out from Firefox 10 just yet. Again, this is a lose-lose game. If we back out, we get the security issue back; if we don't back out, we break useful content.
Comment 92 Doug Sherk (:drs) (inactive) 2011-11-08 14:55:41 PST
(In reply to Benoit Jacob [:bjacob] from comment #91)
> (In reply to Doug Sherk (:dRdR) from comment #90)
> > (In reply to Alex Keybl [:akeybl] from comment #89)
> > > Comment on attachment 564288 [details] [diff] [review] [diff] [details] [review] [diff] [details] [review] [diff] [details] [review]
> > > use ANGLE built-in-function-emulation feature
> > > 
> > > [Triage Comment]
> > > Triggers a major feature regression that will need to be fixed before this
> > > lands. Denying for Aurora.
> > > 
> > > I believe this made the merge (and is thus in Aurora now). Please back out
> > > if so.
> > 
> > What's the regression you're talking about?
> 
> The one you reported :-) that ANGLE generates shaders that fail to compile.
> comment 73.
> 
> Your latest patch removes the problem on 10.7 by disabling that workaround
> there, but if I understand correctly we still break useful content on 10.6,
> such as the WebGL Water demo, comment 78.
> 
> That's why in today's aurora meeting I said we should not take this stuff in
> Firefox 9.
> 
> For Firefox 10, there's still time and hope that the ANGLE bug will get
> fixed. So I wouldn't back out from Firefox 10 just yet. Again, this is a
> lose-lose game. If we back out, we get the security issue back; if we don't
> back out, we break useful content.

To the best of my knowledge, this works fine in 10.6 and earlier. The problem wasn't caught until recently because we don't have any 10.7 slaves (or didn't until recently). I don't see any reason to back this out, at least based on what we know now, as long as my patch is with it.
Comment 93 Benoit Jacob [:bjacob] (mostly away) 2011-12-16 13:21:52 PST
Angle r910 seems like the fix for the ANGLE bug we're hitting here.
Comment 94 Jason Smith [:jsmith] 2012-03-13 14:34:11 PDT
Cannot reproduce this bug on Firefox 9.0.1 on Mac OS X with the two test cases attached and the reproduction steps given in the comment 0.

about:support graphics

Adapter Description: 0x21aa00,0x20400
WebGL Renderer: ATI Technologies Inc. -- ATI Radeon HD 2600 PRO OpenGL Engine -- 2.1 ATI-1.6.36
GPU Accelerated Windows: 1/1 OpenGL
Comment 95 Marcia Knous [:marcia - use ni] 2012-03-13 17:42:07 PDT
Although I was originally able to repro this at one time on my 10.6 machine that has similar specs to Jesse's - ATI Technologies Inc. -- ATI Radeon HD 6750M OpenGL Engine -- 2.1 ATI-1.6.42 - I now cannot reproduce this on 9.0.1 or any other versions I Have on my machine currently. I am using both the site and the testcases to try to test.

Note You need to log in before you can comment on or make changes to this bug.