Last Comment Bug 791905 - (CVE-2013-0763) Heap-use-after-free in Mesa, triggerable by resizing a WebGL canvas
(CVE-2013-0763)
: Heap-use-after-free in Mesa, triggerable by resizing a WebGL canvas
Status: RESOLVED FIXED
[asan][adv-main18+][adv-esr17+]
: csectype-uaf, sec-critical, sec-vector
Product: Core
Classification: Components
Component: Canvas: WebGL (show other bugs)
: Trunk
: x86_64 All
: -- normal (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
: 787827 (view as bug list)
Depends on: 816695
Blocks: 899802
  Show dependency treegraph
 
Reported: 2012-09-17 19:29 PDT by Abhishek Arya
Modified: 2014-07-24 13:43 PDT (History)
23 users (show)
rforbes: sec‑bounty+
ryanvm: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed
+
fixed
+
fixed
unaffected
fixed


Attachments
Testcase (1.18 KB, text/html)
2012-09-17 19:29 PDT, Abhishek Arya
no flags Details
Graphics (about:support) snapshot (108.10 KB, image/png)
2012-09-17 23:21 PDT, Abhishek Arya
no flags Details
reject negative canvas size (796 bytes, patch)
2012-09-20 11:38 PDT, Benoit Jacob [:bjacob] (mostly away)
jgilbert: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
testcase updated (1.18 KB, text/html)
2012-10-02 21:37 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
apitrace (61.10 KB, text/plain)
2012-11-29 19:08 PST, Benoit Jacob [:bjacob] (mostly away)
no flags Details
apitrace without MSAA (60.48 KB, text/plain)
2012-11-29 19:20 PST, Benoit Jacob [:bjacob] (mostly away)
no flags Details

Description Abhishek Arya 2012-09-17 19:29:40 PDT
Created attachment 661996 [details]
Testcase

Run the testcase from content/canvas/test/webgl/conformance/more/functions/ directory. Here is my OpenGL info.
OpenGL version string: 2.1 Mesa 8.0.2

Unsymbolized - 
==28427== ERROR: AddressSanitizer heap-use-after-free on address 0x7f299c105380 at pc 0x4c1336 bp 0x7fff93b8a980 sp 0x7fff93b8a748
WRITE of size 1 at 0x7f299c105380 thread T0
    #0 0x4c1335 (/tmp/firefox/firefox+0x4c1335)
    #1 0x7f29a15438c1 (/usr/lib/x86_64-linux-gnu/dri/swrast_dri.so+0x738c1)
0x7f299c105380 is located 0 bytes inside of 64-byte region [0x7f299c105380,0x7f299c1053c0)
freed by thread T0 here:
    #0 0x4c4a10 (/tmp/firefox/firefox+0x4c4a10)
    #1 0x7f29a11c4c19 (/usr/lib/x86_64-linux-gnu/dri/libdricore.so+0x155c19)
previously allocated by thread T0 here:
    #0 0x4c4d4a (/tmp/firefox/firefox+0x4c4d4a)
    #1 0x7f29a10f9d60 (/usr/lib/x86_64-linux-gnu/dri/libdricore.so+0x8ad60)
Shadow byte and word:
  0x1fe533820a70: fd
  0x1fe533820a70: fd fd fd fd fd fd fd fd
More shadow bytes:
  0x1fe533820a50: fd fd fd fd fd fd fd fd
  0x1fe533820a58: fd fd fd fd fd fd fd fd
  0x1fe533820a60: fa fa fa fa fa fa fa fa
  0x1fe533820a68: fa fa fa fa fa fa fa fa
=>0x1fe533820a70: fd fd fd fd fd fd fd fd
  0x1fe533820a78: fd fd fd fd fd fd fd fd
  0x1fe533820a80: fa fa fa fa fa fa fa fa
  0x1fe533820a88: fa fa fa fa fa fa fa fa
  0x1fe533820a90: fd fd fd fd fd fd fd fd
Stats: 214M malloced (217M for red zones) by 289321 calls
Stats: 30M realloced by 12288 calls
Stats: 174M freed by 171411 calls
Stats: 42M really freed by 47020 calls
Stats: 420M (107600 full pages) mmaped in 105 calls
  mmaps   by size class: 8:212979; 9:40955; 10:8190; 11:10235; 12:2048; 13:2048; 14:1536; 15:256; 16:384; 17:1312; 18:144; 19:40; 20:20; 
  mallocs by size class: 8:221586; 9:38394; 10:7643; 11:13561; 12:2266; 13:2004; 14:1530; 15:310; 16:504; 17:1333; 18:136; 19:37; 20:17; 
  frees   by size class: 8:122227; 9:28108; 10:4313; 11:10535; 12:1422; 13:1409; 14:1332; 15:256; 16:443; 17:1283; 18:32; 19:36; 20:15; 
  rfrees  by size class: 8:34255; 9:5161; 10:1300; 11:4626; 12:388; 13:372; 14:332; 15:83; 16:229; 17:263; 18:6; 19:4; 20:1; 
Stats: malloc large: 1523 small slow: 1623
==28427== ABORTING

Symbolized -
==27925== ERROR: AddressSanitizer heap-use-after-free on address 0x7ff8472f3c80 at pc 0x4c1336 bp 0x7fffa43d7d20 sp 0x7fffa43d7ae8
WRITE of size 1 at 0x7ff8472f3c80 thread T0
    #0 0x4c1335 in memcpy ??:?
    #1 0x7ff84cf438c1 in ?? ??:0
0x7ff8472f3c80 is located 0 bytes inside of 64-byte region [0x7ff8472f3c80,0x7ff8472f3cc0)
freed by thread T0 here:
    #0 0x4c4a10 in __interceptor_free ??:?
    #1 0x7ff84cbc4c19 in ?? ??:0
previously allocated by thread T0 here:
    #0 0x4c4d4a in __interceptor_posix_memalign ??:?
    #1 0x7ff84caf9d60 in ?? ??:0
Shadow byte and word:
  0x1fff08e5e790: fd
  0x1fff08e5e790: fd fd fd fd fd fd fd fd
More shadow bytes:
  0x1fff08e5e770: fd fd fd fd fd fd fd fd
  0x1fff08e5e778: fd fd fd fd fd fd fd fd
  0x1fff08e5e780: fa fa fa fa fa fa fa fa
  0x1fff08e5e788: fa fa fa fa fa fa fa fa
=>0x1fff08e5e790: fd fd fd fd fd fd fd fd
  0x1fff08e5e798: fd fd fd fd fd fd fd fd
  0x1fff08e5e7a0: fa fa fa fa fa fa fa fa
  0x1fff08e5e7a8: fa fa fa fa fa fa fa fa
  0x1fff08e5e7b0: fd fd fd fd fd fd fd fd
Stats: 214M malloced (218M for red zones) by 289327 calls
Stats: 30M realloced by 12281 calls
Stats: 174M freed by 171411 calls
Stats: 42M really freed by 46444 calls
Stats: 420M (107600 full pages) mmaped in 105 calls
  mmaps   by size class: 8:212979; 9:40955; 10:8190; 11:10235; 12:2048; 13:2048; 14:1536; 15:256; 16:384; 17:1312; 18:144; 19:40; 20:20;
  mallocs by size class: 8:221603; 9:38387; 10:7646; 11:13555; 12:2265; 13:2004; 14:1530; 15:310; 16:503; 17:1333; 18:137; 19:37; 20:17;
  frees   by size class: 8:122240; 9:28103; 10:4313; 11:10529; 12:1420; 13:1409; 14:1332; 15:256; 16:443; 17:1283; 18:32; 19:36; 20:15;
  rfrees  by size class: 8:33919; 9:5096; 10:1273; 11:4499; 12:374; 13:370; 14:328; 15:83; 16:231; 17:260; 18:6; 19:4; 20:1;
Stats: malloc large: 1524 small slow: 1614
==27925== ABORTING
Comment 1 Benoit Jacob [:bjacob] (mostly away) 2012-09-17 22:41:23 PDT
I can't reproduce, with Mesa Git + LLVMpipe and with the NVIDIA driver.

What driver is this with (please paste about:support Graphics)
Comment 2 Benoit Jacob [:bjacob] (mostly away) 2012-09-17 22:48:46 PDT
specifically: valgrind doesn't see any error when running this on Mesa Git + LLVMpipe.
Comment 3 Benoit Jacob [:bjacob] (mostly away) 2012-09-17 22:51:31 PDT
Oh, comment 0 has swrast_dri.so. "swrast" is the antiquated Mesa software renderer, now replaced by softpipe and llvmpipe. I guess we might as well blacklist it at this point. We just dropped the built-in OSMesa support anyway (OSMesa was one of last things to use swrast).
Comment 4 Abhishek Arya 2012-09-17 23:21:29 PDT
Created attachment 662037 [details]
Graphics (about:support) snapshot
Comment 5 Benoit Jacob [:bjacob] (mostly away) 2012-09-17 23:27:24 PDT
oh... you _are_ on llvmpipe like me. Can you get a stack trace to the use-after-free?
Comment 6 Abhishek Arya 2012-09-18 00:04:14 PDT
(In reply to Benoit Jacob [:bjacob] (away at Khronos meeting until Thursday) from comment #5)
> oh... you _are_ on llvmpipe like me. Can you get a stack trace to the
> use-after-free?

Sorry i couldn't get a good stack here due to lack of symbolized mesa library and ASAN couldn't unwind up the stack here. Can you try with asanified firefox builds from https://people.mozilla.com/~choller/firefox/asan/ and using a Xvfb screen 1280x1024x24 ? Also, check out the last line in the repro testcase which might be indicating the bug.
Comment 7 Abhishek Arya 2012-09-18 00:22:57 PDT
That was an issue in ASAN stack unwinding as i suspected. Here is the valgrind stack.

==22580== Invalid read of size 8
==22580==    at 0x4C2D108: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22580==    by 0x257C48C1: ??? (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==22580==    by 0x257C4CB1: ??? (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==22580==    by 0x257C52F7: ??? (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==22580==    by 0x257C586A: lp_setup_flush (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==22580==    by 0x257BEE09: llvmpipe_flush (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==22580==    by 0x257BEE5D: llvmpipe_finish (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==22580==    by 0x257BEF3A: llvmpipe_flush_resource (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==22580==    by 0x257A8F3A: ??? (in /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so)
==22580==    by 0x2632A3BD: ??? (in /usr/lib/x86_64-linux-gnu/dri/libgallium.so)
==22580==    by 0x25E67AF5: _mesa_readpixels (in /usr/lib/x86_64-linux-gnu/dri/libdricore.so)
==22580==    by 0x25E687AA: _mesa_ReadnPixelsARB (in /usr/lib/x86_64-linux-gnu/dri/libdricore.so)
==22580==  Address 0x21001e68 is 8 bytes inside a block of size 64 free'd
==22580==    at 0x4C2A82E: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22580==    by 0x25F07C19: _mesa_free_parameter_list (in /usr/lib/x86_64-linux-gnu/dri/libdricore.so)
==22580==    by 0x25EF7D6F: _mesa_delete_program (in /usr/lib/x86_64-linux-gnu/dri/libdricore.so)
==22580==    by 0x25EF7DFA: _mesa_reference_program_ (in /usr/lib/x86_64-linux-gnu/dri/libdricore.so)
==22580==    by 0x2632153E: ??? (in /usr/lib/x86_64-linux-gnu/dri/libgallium.so)
==22580==    by 0x2631EE6E: st_validate_state (in /usr/lib/x86_64-linux-gnu/dri/libgallium.so)
==22580==    by 0x2632C1FD: ??? (in /usr/lib/x86_64-linux-gnu/dri/libgallium.so)
==22580==    by 0x25E687AA: _mesa_ReadnPixelsARB (in /usr/lib/x86_64-linux-gnu/dri/libdricore.so)
==22580==    by 0x25E687C9: _mesa_ReadPixels (in /usr/lib/x86_64-linux-gnu/dri/libdricore.so)
==22580==    by 0x87E2901: mozilla::gl::GLContext::raw_fReadPixels(int, int, int, int, unsigned int, unsigned int, void*) (GLContext.h:2632)
==22580==    by 0x87E1B9A: mozilla::gl::GLContext::fReadPixels(int, int, int, int, unsigned int, unsigned int, void*) (GLContext.h:1345)
==22580==    by 0x9AE659C: mozilla::gl::GLContext::ReadPixelsIntoImageSurface(gfxImageSurface*) (GLContext.cpp:2050)
Comment 8 Daniel Veditz [:dveditz] 2012-09-19 10:48:20 PDT
Both the valgrind and asan logs seem to indicate a use-after-free, but down in the driver? Is this our bug or someone else's?
Comment 9 David Bolter [:davidb] 2012-09-19 10:50:31 PDT
Assigning to Benoit while we're here. Reassign as necessary.
Comment 10 Jeff Gilbert [:jgilbert] 2012-09-19 17:23:43 PDT
This sounds like it's a really embarrassing bug wrt setting height to a negative number.
Comment 11 Abhishek Arya 2012-09-19 19:11:17 PDT
(In reply to Jeff Gilbert [:jgilbert] from comment #10)
> This sounds like it's a really embarrassing bug wrt setting height to a
> negative number.

I do agree that fixing the negative height (last line in repro) should fix the root cause and we don't have to worry about how driver behaves. Even if you guys are not able to reproduce, i can always try a potential patch on my box to make sure that it is fixed (where it is reliably reproducing).
Comment 12 Benoit Jacob [:bjacob] (mostly away) 2012-09-20 11:16:57 PDT
> try { gl.setAttribute("height", "-39"); } catch(e) {}

What is this supposed to do? The WebGL context does not have a "height" attribute. Is this a specific of the wrapGLcontext stuff used here? Does this somehow resize the canvas?

Above I had not placed the file in the right directory -- now that I have, I can reproduce the use-after-free in valgrind:

==3613== Invalid read of size 8
==3613==    at 0x402F198: memcpy@@GLIBC_2.14 (mc_replace_strmem.c:839)
==3613==    by 0x30FCA2ED: try_update_scene_state (lp_setup.c:827)
==3613==    by 0x30FC8CFB: begin_binning (lp_setup.c:197)
==3613==    by 0x30FC8F93: execute_clears (lp_setup.c:262)
==3613==    by 0x30FC90CB: set_scene_state (lp_setup.c:310)
==3613==    by 0x30FC91C5: lp_setup_flush (lp_setup.c:342)
==3613==    by 0x30FBB083: llvmpipe_flush (lp_flush.c:55)
==3613==    by 0x30FBB0FB: llvmpipe_finish (lp_flush.c:89)
==3613==    by 0x30FBB1DB: llvmpipe_flush_resource (lp_flush.c:128)
==3613==    by 0x30F998E3: llvmpipe_get_transfer (lp_texture.c:601)
==3613==    by 0x31130099: pipe_get_transfer (u_inlines.h:389)
==3613==    by 0x3113126B: st_MapRenderbuffer (st_cb_fbo.c:679)
==3613==  Address 0x5cfced8 is 8 bytes inside a block of size 64 free'd
==3613==    at 0x402C4B3: free (vg_replace_malloc.c:444)
==3613==    by 0x31035869: _mesa_align_free (imports.c:181)
==3613==    by 0x3111E3F6: _mesa_free_parameter_list (prog_parameter.c:88)
==3613==    by 0x31117D7E: _mesa_delete_program (program.c:395)
==3613==    by 0x3120651C: st_delete_program (st_cb_program.c:175)
==3613==    by 0x31117F92: _mesa_reference_program_ (program.c:460)
==3613==    by 0x311F84DB: _mesa_reference_program (program.h:102)
==3613==    by 0x311F8598: st_reference_fragprog (st_program.h:259)
==3613==    by 0x311F86C0: update_fp (st_atom_shader.c:88)
==3613==    by 0x311F4765: st_validate_state (st_atom.c:177)
==3613==    by 0x31207389: st_readpixels (st_cb_readpixels.c:52)
==3613==    by 0x310BCA13: _mesa_ReadnPixelsARB (readpix.c:875)
==3613==
Comment 13 Benoit Jacob [:bjacob] (mostly away) 2012-09-20 11:28:06 PDT
Our SetDimensions methods take signed width and height, but the canvas spec says these are unsigned attributes. Using unsigned here would solve the problem, as we would then go on to note that the resulting huge positive value exceeds the max texture size.
Comment 14 Benoit Jacob [:bjacob] (mostly away) 2012-09-20 11:32:01 PDT
...but then, we have a ton of code around here that assumes signed sizes (like gfxIntSize) so it's safer to just add a check here.
Comment 15 Benoit Jacob [:bjacob] (mostly away) 2012-09-20 11:38:58 PDT
Created attachment 663089 [details] [diff] [review]
reject negative canvas size
Comment 16 Daniel Veditz [:dveditz] 2012-09-20 13:19:11 PDT
Created attachment 663135 [details]
Bug Bounty Awarded $3000 [paid] 10/3/2012
Comment 17 Jeff Gilbert [:jgilbert] 2012-09-21 01:15:18 PDT
Comment on attachment 663089 [details] [diff] [review]
reject negative canvas size

Review of attachment 663089 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContext.cpp
@@ +338,5 @@
>  NS_IMETHODIMP
>  WebGLContext::SetDimensions(int32_t width, int32_t height)
>  {
> +    if (width < 0 || height < 0) {
> +        GenerateWarning("Canvas size is too large (seems like a negative value wrapped)");

This seems like a weird way to say that we don't allow negative values, as IIRC, the javascript width/height values are signed. I would rather this just say that we don't allow negative width or height.
Comment 18 Jeff Gilbert [:jgilbert] 2012-09-21 01:16:34 PDT
Comment on attachment 663089 [details] [diff] [review]
reject negative canvas size

Review of attachment 663089 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContext.cpp
@@ +339,5 @@
>  WebGLContext::SetDimensions(int32_t width, int32_t height)
>  {
> +    if (width < 0 || height < 0) {
> +        GenerateWarning("Canvas size is too large (seems like a negative value wrapped)");
> +        return NS_ERROR_OUT_OF_MEMORY;

As above, I don't actually think this is the right error to throw, since this should be invalid rather than too-large, unless the WebGL spec (or similar) requires these values to be interpreted as unsigned.
Comment 19 Benoit Jacob [:bjacob] (mostly away) 2012-09-21 08:21:24 PDT
(In reply to Jeff Gilbert [:jgilbert] from comment #17)
> Comment on attachment 663089 [details] [diff] [review]
> reject negative canvas size
> 
> Review of attachment 663089 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: content/canvas/src/WebGLContext.cpp
> @@ +338,5 @@
> >  NS_IMETHODIMP
> >  WebGLContext::SetDimensions(int32_t width, int32_t height)
> >  {
> > +    if (width < 0 || height < 0) {
> > +        GenerateWarning("Canvas size is too large (seems like a negative value wrapped)");
> 
> This seems like a weird way to say that we don't allow negative values, as
> IIRC, the javascript width/height values are signed. I would rather this
> just say that we don't allow negative width or height.

The canvas width/height attributes are actually unsigned:

http://www.whatwg.org/specs/web-apps/current-work/multipage/the-canvas-element.html#the-canvas-element

That's why I felt the need to phrase this in this convoluted way.
Comment 20 Benoit Jacob [:bjacob] (mostly away) 2012-09-21 08:23:27 PDT
(In reply to Jeff Gilbert [:jgilbert] from comment #18)
> Comment on attachment 663089 [details] [diff] [review]
> reject negative canvas size
> 
> Review of attachment 663089 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: content/canvas/src/WebGLContext.cpp
> @@ +339,5 @@
> >  WebGLContext::SetDimensions(int32_t width, int32_t height)
> >  {
> > +    if (width < 0 || height < 0) {
> > +        GenerateWarning("Canvas size is too large (seems like a negative value wrapped)");
> > +        return NS_ERROR_OUT_OF_MEMORY;
> 
> As above, I don't actually think this is the right error to throw, since
> this should be invalid rather than too-large, unless the WebGL spec (or
> similar) requires these values to be interpreted as unsigned.

Assuming we agree that width and height should be handled as unsigned values, I suppose that adresses your comment.

It sucks that we have this discrepancy between the specs mandating unsigned and our c++ funcs wanting signed, but that is quite nontrivial to fix and I wanted a quick fix here. This is nontrivial because we pass along these values to other functions that expect signed values, like ResizeOffscreenFBO. gfxIntSize is signed.
Comment 21 Benoit Jacob [:bjacob] (mostly away) 2012-09-26 08:19:58 PDT
FWIW -- the reason why this hasn't landed yet is that this causes crashes on tryserver. It seems that we are actually doing a resize to a negative size somewhere in aboutSupport.js, and this patch causes the |gl| pointer to be null in this case.
Comment 22 Abhishek Arya 2012-09-26 13:45:30 PDT
Hi Benoit, since you were reproduce the crash, did you try with your patch on and made sure that test does not crash. I tried it on my local and it still crashes [can't believe since the patch and bug seemed straightforward]. I also blew away my build dir and rechecked with your patch applied, it still crashes.
Comment 23 Abhishek Arya 2012-09-26 14:01:39 PDT
There is something else going on here. The bug does not reproduce without the setAttribute line, however positive values work as well. i tested with 39, 184 and it crashes the same way.
Comment 24 Benoit Jacob [:bjacob] (mostly away) 2012-09-27 05:46:52 PDT
I hadn't tried --- trying now.

One thing that I found strange in your testcase is that it seemed to set width/height attributes on the WebGL context, but only the canvas (as opposed to the WebGL context) has width/height attributes. But I assumed that there was something hidden in the wrapper you're using around the WebGL context.
Comment 25 Benoit Jacob [:bjacob] (mostly away) 2012-09-27 07:08:03 PDT
OK, the problem still reproduces with this patch.

Also, the crash I had seen on tryserver was unrelated to this patch (was caused by patch on bug 791521).

I'll land the patch anyways as it seems like a useful thing, but this bug stays open.

Has this been reported to Mesa?
Comment 26 Abhishek Arya 2012-09-27 07:57:12 PDT
No it is not reported to Mesa. Please report it as needed.
Comment 27 Benoit Jacob [:bjacob] (mostly away) 2012-09-27 09:12:08 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/3d81919584ff
Comment 28 David Bolter [:davidb] 2012-09-27 13:26:52 PDT
Benoit, please request approval if this applies cleanly on branches.
Comment 29 Daniel Veditz [:dveditz] 2012-09-27 13:28:01 PDT
Would be nice to get this fix (which looks completely minimal and safe) into Firerfox 16 and ESR-10. Please request approvals unless you need a different patch on older branches.
Comment 30 Ryan VanderMeulen [:RyanVM] 2012-09-27 20:22:51 PDT
https://hg.mozilla.org/mozilla-central/rev/3d81919584ff
Comment 31 Lukas Blakk [:lsblakk] use ?needinfo 2012-09-28 11:08:29 PDT
Tracking for the upcoming ESR that will ship with FF 16, please submit the mozilla-beta/mozilla-aurora approval nominations with risk assessment for landing this late in beta cycle.
Comment 32 Benoit Jacob [:bjacob] (mostly away) 2012-09-28 11:51:39 PDT
@ comment 31: unfortunately, the bug is still open -- the patch that landed, while nice to have, does not close the issue.
Comment 33 Benoit Jacob [:bjacob] (mostly away) 2012-10-02 19:48:24 PDT
Comment on attachment 663089 [details] [diff] [review]
reject negative canvas size

Sorry for the delayed request.

Important: while nice to have, this patch does not avoid the driver bug here. The bug is still unfixed.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): driver bug
User impact if declined: while this is nice to have, we don't actually know that this protects the user from any bad thing.
Testing completed (on m-c, etc.): m-c since last week
Risk to taking this patch (and alternatives if risky): no risk
String or UUID changes made by this patch: none
Comment 34 Benoit Jacob [:bjacob] (mostly away) 2012-10-02 19:50:05 PDT
Ian, this is a sec-critical Mesa bug and we don't know any work-around for it.
Comment 35 Benoit Jacob [:bjacob] (mostly away) 2012-10-02 19:52:35 PDT
The Mesa bug tracker does not seem to allow filing hidden bugs, so let's stay here. CC'ing our friends at Google.
Comment 36 Abhishek Arya 2012-10-02 20:01:00 PDT
I remember testing this with Chromium (I work in chrome security team) and it didn't crash there. zmo@ would know if we have workarounds for this.
Comment 37 Benoit Jacob [:bjacob] (mostly away) 2012-10-02 20:03:11 PDT
Ah OK! Could you please CC me on the corresponding Chromium security bug if there is one? My Google account is jacob.benoit.1
Comment 38 Abhishek Arya 2012-10-02 20:06:22 PDT
(In reply to Benoit Jacob [:bjacob] from comment #37)
> Ah OK! Could you please CC me on the corresponding Chromium security bug if
> there is one? My Google account is jacob.benoit.1

I haven't filed a chromium bug yet. Firstly because we were first thinking this is related to the negative height issue which seemed firefox code issue. Now, i tried the testcase again and it still does not look to crash in Chromium, so i dont know if it affects us. Will let zmo@ triage this and then we can file a tracking chromium security bug if needed.
Comment 39 Benoit Jacob [:bjacob] (mostly away) 2012-10-02 20:09:15 PDT
OK, I see.

I suppose that the real test is not whether the browser (or in Chrome's case, the content process or GPU process) crashes, but rather whether Valgrind or ASan can detect an error.
Comment 40 Abhishek Arya 2012-10-02 20:14:02 PDT
(In reply to Benoit Jacob [:bjacob] from comment #39)
> OK, I see.
> 
> I suppose that the real test is not whether the browser (or in Chrome's
> case, the content process or GPU process) crashes, but rather whether
> Valgrind or ASan can detect an error.

Exactly. By crash i meant crash in an asanified build [i tried both with and without --use-gl=osmesa cmd line flags]. we have a large archive of our builds here - https://commondatastorage.googleapis.com/chromium-browser-asan/index.html :)
Comment 41 Benoit Jacob [:bjacob] (mostly away) 2012-10-02 20:16:42 PDT
Hm, notice that I have only reproduced the error in LLVMpipe, which is a specific Mesa driver. I don't know if it reproduces in OSMesa. What were you using in comment 0?
Comment 42 Benoit Jacob [:bjacob] (mostly away) 2012-10-02 21:26:06 PDT
OK, so here are the line numbers in Mesa Git:

==14095== Invalid read of size 8
==14095==    at 0x402F288: memcpy@@GLIBC_2.14 (mc_replace_strmem.c:877)
==14095==    by 0x2EE75C81: try_update_scene_state (lp_setup.c:827)
==14095==    by 0x2EE74689: begin_binning (lp_setup.c:197)
==14095==    by 0x2EE74921: execute_clears (lp_setup.c:262)
==14095==    by 0x2EE74A59: set_scene_state (lp_setup.c:310)
==14095==    by 0x2EE74B53: lp_setup_flush (lp_setup.c:342)
==14095==    by 0x2EE66707: llvmpipe_flush (lp_flush.c:55)
==14095==    by 0x2EE6674C: llvmpipe_finish (lp_flush.c:82)
==14095==    by 0x2EE66828: llvmpipe_flush_resource (lp_flush.c:121)
==14095==    by 0x2EE45128: llvmpipe_get_transfer (lp_texture.c:611)
==14095==    by 0x2EFE55FB: pipe_get_transfer (u_inlines.h:417)
==14095==    by 0x2EFE6959: st_MapRenderbuffer (st_cb_fbo.c:745)
==14095==  Address 0x19438e78 is 8 bytes inside a block of size 64 free'd
==14095==    at 0x402C563: free (vg_replace_malloc.c:447)
==14095==    by 0x2EEE8FE1: _mesa_align_free (imports.c:179)
==14095==    by 0x2EFD39FA: _mesa_free_parameter_list (prog_parameter.c:88)
==14095==    by 0x2EFCD252: _mesa_delete_program (program.c:392)
==14095==    by 0x2F0BE587: st_delete_program (st_cb_program.c:170)
==14095==    by 0x2EFCD466: _mesa_reference_program_ (program.c:457)
==14095==    by 0x2F0B08B3: _mesa_reference_program (program.h:102)
==14095==    by 0x2F0B0970: st_reference_fragprog (st_program.h:265)
==14095==    by 0x2F0B0AF7: update_fp (st_atom_shader.c:93)
==14095==    by 0x2F0AB72E: st_validate_state (st_atom.c:203)
==14095==    by 0x2F0BF49D: st_readpixels (st_cb_readpixels.c:52)
==14095==    by 0x2EF728C6: _mesa_ReadnPixelsARB (readpix.c:767)

In the call stack to the free() (the second one) we see that a ReadPixels resulted in some program refcounting work (_mesa_reference_program_) which decides that the refcount hit 0 hence deletes the program: program.c:457 is this:

      deleteFlag = ((*ptr)->RefCount == 0);
      /*_glthread_UNLOCK_MUTEX((*ptr)->Mutex);*/

      if (deleteFlag) {
         ASSERT(ctx);
457->    ctx->Driver.DeleteProgram(ctx, *ptr);
      }

It's really strange that readPixels would really want to delete a program object, so I suppose that that is the real bug.
Comment 43 Benoit Jacob [:bjacob] (mostly away) 2012-10-02 21:31:22 PDT
This readPixels call in Mesa is caused by Firefox's software compositor (which is still default on linux) reading back the WebGL canvas (checked in GDB). Confirmed that the testcase setting the height attribute does result in a canvas resizing, which causes Mesa readPixels to be called again and that's when the error occurs.
Comment 44 Benoit Jacob [:bjacob] (mostly away) 2012-10-02 21:37:24 PDT
Created attachment 667320 [details]
testcase updated
Comment 45 Benoit Jacob [:bjacob] (mostly away) 2012-10-03 05:59:01 PDT
In the testcase, calling gl.finish() before setting the height attribute removes the error; so we may be on track to find a work-around.
Comment 46 Benoit Jacob [:bjacob] (mostly away) 2012-10-03 06:50:07 PDT
...nope, tried putting glFinish() before glClear() and before glReadPixels(), the error is still present. no workaround in sight.
Comment 47 Kenneth Russell 2012-10-03 10:30:11 PDT
Do you know which program is being deleted? Could you explicitly defer calling glDeleteProgram against it as a workaround?
Comment 48 Benoit Jacob [:bjacob] (mostly away) 2012-10-03 10:33:10 PDT
This program deletion is not apparently the result of a glDeleteProgram call that we issue. Even calling glFinish after every gl call, I still get this error happening in places such as glClear or glReadPixels calls.
Comment 49 Benoit Jacob [:bjacob] (mostly away) 2012-10-04 08:15:40 PDT
http://hg.mozilla.org/releases/mozilla-aurora/rev/97d5b2b09fb8
Comment 50 Justin Wood (:Callek) 2012-10-07 20:54:34 PDT
(In reply to Benoit Jacob [:bjacob] from comment #27)
> http://hg.mozilla.org/integration/mozilla-inbound/rev/3d81919584ff

Should this be marked reso/fixed ?
Comment 51 Benoit Jacob [:bjacob] (mostly away) 2012-10-23 18:20:11 PDT
As said above (comment 32, comment 33) this patch was only a 'nice to have' patch and does not resolve the issue here. We don't have a fix for this bug as of yet.
Comment 52 Benoit Jacob [:bjacob] (mostly away) 2012-10-23 18:22:12 PDT
Un-assigning from myself to reflect the reality that I've not been working on this. If you think that this is so important that I should work on this instead of other things I'm working on, let me know.
Comment 53 Benoit Jacob [:bjacob] (mostly away) 2012-10-24 09:58:42 PDT
If we are going to do more investigation here, here is what could be done:

APItrace Firefox reproducing this, and then replay the trace and check if that reproduces the heap use-after-free. If it does, we then have a testcase independent of Firefox which we can then try to minimize.
Comment 54 Benoit Jacob [:bjacob] (mostly away) 2012-10-24 12:14:39 PDT
So far, it seems that we have only reproduced this on LLVMpipe, and the stack trace found by Valgrind seems very LLVMpipe-specific. So if it comes to blacklisting, we could at least consider blacklisting LLVMpipe only.
Comment 55 Al Billings [:abillings] 2012-11-01 10:46:08 PDT
This is being tracked for Firefox 17 and seems to be very much at risk since no one is assigned to it. What should we be doing here in the short term?
Comment 56 Benoit Jacob [:bjacob] (mostly away) 2012-11-01 11:04:15 PDT
We have 2 options in the short term:
 - live with the vulnerability (we have no idea at this point of whether it is exploitable AFAIK)
 - or blacklist LLVMpipe. This won't affect too many users.
Comment 57 Lukas Blakk [:lsblakk] use ?needinfo 2012-11-07 17:53:00 PST
(In reply to Benoit Jacob [:bjacob] from comment #56)
>  - live with the vulnerability (we have no idea at this point of whether it
We'll be taking this option, wontfixing for 17.
Comment 58 Daniel Veditz [:dveditz] 2012-11-15 13:42:01 PST
blacklisting llvmpipe is probably fine -- performance would be pretty sucky anyway, wouldn't it be better to simply not offer false hope?
Comment 59 Daniel Veditz [:dveditz] 2012-11-29 13:22:55 PST
Filed bug 816695 to blocklist the driver.
Comment 60 Benoit Jacob [:bjacob] (mostly away) 2012-11-29 13:39:55 PST
(In reply to Daniel Veditz [:dveditz] from comment #58)
> blacklisting llvmpipe is probably fine -- performance would be pretty sucky
> anyway, wouldn't it be better to simply not offer false hope?

LLVMpipe is a fast enough software renderer to be useful in some cases (either very fast machine, or not-too-demanding WebGL code).

Before blacklisting let's ask again the Mesa developers CC'd here if they know a work-around for this bug? Somehow, it seems to only be affecting Firefox and not Chrome, which lets me hope that a work-around may exist.
Comment 61 Benoit Jacob [:bjacob] (mostly away) 2012-11-29 19:00:04 PST
Let me expand a little more on why I am a bit reluctant to blacklist llvmpipe and find it worth my time to work around its bugs:

 - It is the fastest open-source software GL renderer in existence, so it is our only reasonable option if we want to offer a software fallback for WebGL. Note that the Chrome team found it important enough to offer a software fallback for WebGL, that they licensed a commercial GL renderer for that.

 - Being open source AND running entirely on the CPU side, it is particularly easy to discover its bugs, for example it can be run entirely in Valgrind, recompiled with Address Sanitizer, etc, so that the fact that we get more bug reports about it does not necessarily imply that it is buggier than other drivers. It just is the driver that we can understand the best. Blacklisting it risks just papering over the most easy to see of all driver bugs. Part of this argument can be turned around though: as it runs entirely as a library in our process (as opposed to, say, running parts on the GPU), llvmpipe bugs can affect us more severely (e.g. more opportunities to corrupt our heap).

 - It is only used by a small minority of our users at the moment: it is virtually unused outside of desktop Linux (1%) and even there, only used by a small minority of Linux users. so the overall proportion of our users using it is certainly no more than 0.1%. And indeed, if I churn over the crash-data CSV files,

bjacob:~/crash-stats$ zcat 20121128-pub-crashdata.csv.gz | wc -l
405231
bjacob:~/crash-stats$ zcat 20121128-pub-crashdata.csv.gz | grep llvmpipe | wc -l
55
bjacob:~/crash-stats$ zcat 20121128-pub-crashdata.csv.gz | grep llvmpipe | grep Linux | wc -l
55

So we find that only 0.01% of our crash reports are on llvmpipe, and that all of them are on Linux. Then again, that argument can be turned around as evidence that we may as well blacklist it by default and tell the few tech-savvy people using it who care about getting WebGL that they can force-enable it.
Comment 62 Benoit Jacob [:bjacob] (mostly away) 2012-11-29 19:08:34 PST
Created attachment 686906 [details]
apitrace

This allows to reproduce without using Firefox, like this:

LD_PRELOAD=/hack/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1
LD_LIBRARY_PATH=/hack/mesa/build/linux-x86_64-debug/gallium/targets/libgl-xlib
valgrind --smc-check=all-non-file ../apitrace/build/glretrace -v
firefox.2.trace

Shows that the issue occurs in glBlitFramebuffer... now I have a sparkle of hope: that suggests we should try without MSAA...
Comment 63 Benoit Jacob [:bjacob] (mostly away) 2012-11-29 19:20:10 PST
Created attachment 686910 [details]
apitrace without MSAA

Nope.... /o\
this apitrace has MSAA disabled, and the same error still occurs, this time in a MakeCurrent call apparently.
Comment 64 Benoit Jacob [:bjacob] (mostly away) 2012-11-29 21:09:10 PST
Also tried disabling the share group, because of a warning pointing in that direction... no effect.

I'm leaning in favor of blacklisting (reluctantly!) because for all I can understand, this is looking like a reference-counting bug, and if that is true, this could allow arbitrarily bad memory access bugs. Also, I'm long past the point where I can justify spending this time on this driver professionally :-P Let's give the Mesa devs a couple days and maybe blacklist this coming Monday.
Comment 65 Benoit Jacob [:bjacob] (mostly away) 2012-11-30 06:05:21 PST
Filed Mesa security bug (which are a thing since today): https://bugs.freedesktop.org/show_bug.cgi?id=57733
Comment 66 Benoit Jacob [:bjacob] (mostly away) 2012-11-30 06:07:41 PST
*** Bug 787827 has been marked as a duplicate of this bug. ***
Comment 67 Jeff Gilbert [:jgilbert] 2012-12-10 15:42:31 PST
(In reply to Benoit Jacob [:bjacob] from comment #64)
> Also tried disabling the share group, because of a warning pointing in that
> direction... no effect.
> 
> I'm leaning in favor of blacklisting (reluctantly!) because for all I can
> understand, this is looking like a reference-counting bug, and if that is
> true, this could allow arbitrarily bad memory access bugs. Also, I'm long
> past the point where I can justify spending this time on this driver
> professionally :-P Let's give the Mesa devs a couple days and maybe
> blacklist this coming Monday.

It seems clear we should blacklist then, unfortunately.
Comment 68 Benoit Jacob [:bjacob] (mostly away) 2012-12-10 15:46:26 PST
(In reply to Jeff Gilbert [:jgilbert] from comment #67)
> (In reply to Benoit Jacob [:bjacob] from comment #64)
> > Also tried disabling the share group, because of a warning pointing in that
> > direction... no effect.
> > 
> > I'm leaning in favor of blacklisting (reluctantly!) because for all I can
> > understand, this is looking like a reference-counting bug, and if that is
> > true, this could allow arbitrarily bad memory access bugs. Also, I'm long
> > past the point where I can justify spending this time on this driver
> > professionally :-P Let's give the Mesa devs a couple days and maybe
> > blacklist this coming Monday.
> 
> It seems clear we should blacklist then, unfortunately.

For the record, we did last week (as you know --- you reviewed the patch ;-) )
Comment 69 Jeff Gilbert [:jgilbert] 2012-12-10 15:50:56 PST
So I did! Shall we mark this fixed then?

Also, it looks like though my response tried to reset all the flags, it didn't do anything. While strange, this is good.
Comment 70 Benoit Jacob [:bjacob] (mostly away) 2012-12-10 15:53:30 PST
Yes, I believe that we can close this bug: it seems llvmpipe specific, and llvmpipe is now blacklisted.
Comment 71 Daniel Veditz [:dveditz] 2012-12-13 16:28:13 PST
I believe we need to ship a blocklist with ESR10 for this to take effect.
Comment 72 Benoit Jacob [:bjacob] (mostly away) 2012-12-13 18:40:15 PST
Mesa is already blocked altogether in ESR10. See widget/src/xpwidgets/GfxInfoX11.cpp.
Comment 73 Raymond Forbes[:rforbes] 2013-07-19 18:19:56 PDT
rforbes-bugspam-for-setting-that-bounty-flag-20130719

Note You need to log in before you can comment on or make changes to this bug.