Last Comment Bug 682335 - (CVE-2011-3003) Investigate crash [@ WebGLContext::BufferSubData_array]
(CVE-2011-3003)
: Investigate crash [@ WebGLContext::BufferSubData_array]
Status: RESOLVED FIXED
[sg:critical?][qa+]
:
Product: Core
Classification: Components
Component: Canvas: WebGL (show other bugs)
: 6 Branch
: All Linux
: -- normal (vote)
: mozilla9
Assigned To: Brandon Sterne (:bsterne)
:
: Milan Sreckovic [:milan]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-08-26 10:49 PDT by Brandon Sterne (:bsterne)
Modified: 2015-05-07 15:27 PDT (History)
12 users (show)
rforbes: sec‑bounty+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
fixed
+
fixed
unaffected
unaffected


Attachments
PoC - Crashes Firefox 6 (1.46 MB, application/x-gzip)
2011-08-26 10:49 PDT, Brandon Sterne (:bsterne)
no flags Details
set byteLength to 0 on mData allocation failure (2.03 KB, patch)
2011-09-09 07:36 PDT, Benoit Jacob [:bjacob] (mostly away)
jmuizelaar: review+
christian: approval‑mozilla‑aurora+
christian: approval‑mozilla‑beta+
Details | Diff | Splinter Review
Minimal testcase (711 bytes, text/html)
2011-09-09 08:02 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details
make 100x bigger allocations to trigger OOM condition easily (1.44 KB, patch)
2011-09-26 12:36 PDT, Benoit Jacob [:bjacob] (mostly away)
no flags Details | Diff | Splinter Review
Change to harness01.html to make it run on Linux and OSX. (624 bytes, patch)
2011-09-29 00:54 PDT, Doug Sherk (:drs) (inactive)
no flags Details | Diff | Splinter Review

Description Brandon Sterne (:bsterne) 2011-08-26 10:49:00 PDT
Created attachment 556072 [details]
PoC - Crashes Firefox 6

Ben Hawkes reported the following to security@m.o yesterday.  There are 31MB of uncompressed testcases (4MB compressed) so, for the purposes of this bug, I am going to remove the testcases beyond the one where he observes the crash. Otherwise I won't be able to attach the PoC to Bugzilla.

I'll separately follow up with Ben to see if there is a way we can get the testcase generator and perhaps integrate it into our existing fuzzers.

------

Hi,

I'm getting a reliable crash in WebGLContext::BufferSubData_array in Firefox 6.0 while doing some basic fuzzing of WebGL/GLESSL. I'm having some trouble finding the root-cause (presumably its a use-after-free), but since the crash looks exploitable I thought I'd get you guys involved early to help investigate further.

The attached test cases are run sequentially from "harness01.html" (starting from test 3750.html) and I reliably see the crash at 4169.html on a Linux system using the proprietary NVidia driver. Stack trace as follows:

Program received signal SIGSEGV, Segmentation fault.
0xf675e700 in mozilla::WebGLBuffer::CopySubDataIfElementArray
(this=0x54ff8680, byteOffset=558000083, byteLength=16948,
data=0xb7e01000)
    at /usr/include/bits/string3.h:52
52        return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
(gdb) x/i 0xf675e700
=> 0xf675e700 <mozilla::WebGLBuffer::CopySubDataIfElementArray(GLuint,
GLuint, void const*)+32>:        rep movsb %ds:(%esi),%es:(%edi)
(gdb) i r esi edi ecx
esi            0xb7e01000       -1210052608
edi            0x214267d3       558000083
ecx            0x4234   16948
(gdb) x/bx 0x214267d3
0x214267d3:     Cannot access memory at address 0x214267d3
(gdb) bt
#0  0xf675e700 in mozilla::WebGLBuffer::CopySubDataIfElementArray
(this=0x54ff8680, byteOffset=558000083, byteLength=16948,
data=0xb7e01000)
    at /usr/include/bits/string3.h:52
#1  0xf674e540 in mozilla::WebGLContext::BufferSubData_array
(this=0x1c3f32e0, target=34963, byteOffset=558000083, wa=0x53b70600)
    at /usr/local/google/source/firefox/firefox32/mozilla-release/content/canvas/src/WebGLContextGL.cpp:560
#2  0xf6b80ca8 in nsIDOMWebGLRenderingContext_BufferSubData
(cx=0xe90b3290, argc=3, vp=0xf19ff0e8)
    at ../../../../dist/include/CustomQS_WebGL.h:208
#3  0xf72fe8f9 in CallCompiler::generateNativeStub() () from
/source/firefox/firefox32/mozilla-release/objdir-ff-dbg32/dist/bin/libxul.so
#4  0xf72fa26f in js::mjit::ic::NativeCall (f=..., ic=0x54ff8680)
    at /usr/local/google/source/firefox/firefox32/mozilla-release/js/src/methodjit/MonoIC.cpp:1026
#5  0xe49acf28 in ?? ()
#6  0xf72aba8c in EnterMethodJIT (cx=0x0, safePoint=0xe49661dc)
    at /usr/local/google/source/firefox/firefox32/mozilla-release/js/src/methodjit/MethodJIT.cpp:685
#7  CheckStackAndEnterMethodJIT (cx=0x0, safePoint=0xe49661dc)
    at /usr/local/google/source/firefox/firefox32/mozilla-release/js/src/methodjit/MethodJIT.cpp:715
#8  js::mjit::JaegerShotAtSafePoint (cx=0x0, safePoint=0xe49661dc)
    at /usr/local/google/source/firefox/firefox32/mozilla-release/js/src/methodjit/MethodJIT.cpp:742
#9  0xf7362522 in js::Interpret (cx=0xe90b3290, entryFrame=0xf19ff068,
inlineCallCount=1, interpMode=js::JSINTERP_NORMAL)
    at /usr/local/google/source/firefox/firefox32/mozilla-release/js/src/jsinterp.cpp:6347
#10 0xf71788c1 in js::RunScript (cx=0xe90b3290, script=0x537ff900,
fp=0xf19ff068)
    at /usr/local/google/source/firefox/firefox32/mozilla-release/js/src/jsinterp.cpp:613
#11 0xf71794a5 in js::Invoke (cx=0xe90b3290, argsRef=...,
option=js::INVOKE_NORMAL)
    at /usr/local/google/source/firefox/firefox32/mozilla-release/js/src/jsinterp.cpp:694
etc

Note: I was only able to reproduce this on 32-bit builds, my 64-bit build would not trigger this crash.

Thanks,
Ben
Comment 1 Daniel Veditz [:dveditz] 2011-09-07 18:47:06 PDT
Have not investigated but based on the stack it's plausibly sg:critical until proven otherwise. Benoit, please check this out.
Comment 3 Benoit Jacob [:bjacob] (mostly away) 2011-09-07 19:32:42 PDT
I am on linux x86-64 (debian squeeze) with the NVIDIA 280.13 driver, and I can't reproduce this crash at all, not even in Firefox 6.0.2, 32-bit version.

Without me being able to reproduce here are some things that could move this forward:
 1. Can you reproduce this crash in Nightly?
 2. Can you run this with APItrace,
      https://github.com/apitrace/apitrace
    and attach here the resulting trace?
Comment 4 Benoit Jacob [:bjacob] (mostly away) 2011-09-07 19:35:20 PDT
The very large byteLength and the fact that this only happens on 32bit suggests this might be a duplicate of bug 665070. If this crash persists in Nightly then it's not a duplicate of bug 665070.
Comment 5 Benoit Jacob [:bjacob] (mostly away) 2011-09-07 19:37:35 PDT
Also, if this is bug 665070 or any other out-of-memory-related bug, then running this in a debug build with MOZ_GL_DEBUG_VERBOSE=1 will show it (this prints GL error codes and GL_OUT_OF_MEMORY value will appear).
Comment 6 Daniel Veditz [:dveditz] 2011-09-08 13:17:12 PDT
bsterne: reassigning to you to follow up with the reporter on comment 4
Comment 7 Brandon Sterne (:bsterne) 2011-09-08 13:49:06 PDT
(In reply to Daniel Veditz from comment #6)
> bsterne: reassigning to you to follow up with the reporter on comment 4

He's copied on the bug.

Ben, can you respond to comment 4?  Are you able to reproduce this using Nightly?  We are trying to determine if this is a dupe of a bug we already fixed.
Comment 8 Ben Hawkes 2011-09-09 03:43:47 PDT
Hi! I've reproduced the crash on Nightly 9.0a1 (2011-09-08). It is still crashing on test case 4169.html during a memcpy to an invalid destination. I've hosted the apitrace file here:

http://sota.gen.nz/webgl_crash_trace.tar.gz

I don't have access to bug 665070, so I can't comment on the likelihood of it being related to this crash.

Cheers!
Comment 9 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 06:17:03 PDT
Thanks, but:

$ ./build/apitrace/tracedump ~/firefox.trace
error: unsupported trace format version 97
error: failed to open /home/bjacob/firefox.trace

It seems that we have incompatible versions of APItrace. Could you please run tracedump on your side, and compress and attach to this bug the resulting text file?
Comment 10 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 06:18:25 PDT
The fact that it was reproduced in Nightly means it's not a dupe of 665070.
Comment 11 Ben Hawkes 2011-09-09 06:21:31 PDT
Trace dump output uploaded here:

http://sota.gen.nz/webgl_crash_trace_dump.tar.gz
Comment 12 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 06:42:24 PDT
TL;DR: we have a driver bug here. What driver is this?


Here are all the Buffer-related calls in this trace:

376196 glGenBuffers(n = 1, buffer = &3)
376276 glGetBufferParameteriv(target = GL_ARRAY_BUFFER, pname = GL_BUFFER_SIZE, params = &0)
376291 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 3)
376447 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 0)
376479 glGetBufferParameteriv(target = GL_ARRAY_BUFFER, pname = GL_BUFFER_SIZE, params = &0)
376482 glIsBuffer(buffer = 3) = true
376496 glIsBuffer(buffer = 3) = true
376518 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 0)
376543 glBufferData(target = GL_ELEMENT_ARRAY_BUFFER, size = 956231260, data = NULL, usage = GL_STATIC_DRAW)
376544 glGetError() = GL_NO_ERROR
376603 glGetBufferParameteriv(target = GL_ARRAY_BUFFER, pname = GL_BUFFER_USAGE, params = &0)
376824 glGetBufferParameteriv(target = GL_ELEMENT_ARRAY_BUFFER, pname = GL_BUFFER_SIZE, params = &0)


Here is the interesting part:

376518 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 0)
376543 glBufferData(target = GL_ELEMENT_ARRAY_BUFFER, size = 956231260, data = NULL, usage = GL_STATIC_DRAW)
376544 glGetError() = GL_NO_ERROR

BindBuffer with buffer=0 unbind any bound buffer. So the glBufferData call should fail and produce a GL error:

http://www.khronos.org/opengles/sdk/docs/man/xhtml/glBufferData.xml

"GL_INVALID_OPERATION is generated if the reserved buffer object name 0 is bound to target."

So this is a bug in this OpenGL implementation/driver.

What driver is this?

Next to that we must also have a bug on our side. Indeed, our WebGL BufferData functions do check that a buffer is bound, and generate an error if no buffer is bound. So here, apparently they believe that a buffer is bound but none is. This could be another bug in the category "Mistake in keeping track of GL state".
Comment 13 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 06:46:00 PDT
(In reply to Benoit Jacob [:bjacob] from comment #12)
> TL;DR: we have a driver bug here. What driver is this?
> 
> 
> Here are all the Buffer-related calls in this trace:
> 
> 376196 glGenBuffers(n = 1, buffer = &3)
> 376276 glGetBufferParameteriv(target = GL_ARRAY_BUFFER, pname =
> GL_BUFFER_SIZE, params = &0)
> 376291 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 3)
> 376447 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 0)
> 376479 glGetBufferParameteriv(target = GL_ARRAY_BUFFER, pname =
> GL_BUFFER_SIZE, params = &0)
> 376482 glIsBuffer(buffer = 3) = true
> 376496 glIsBuffer(buffer = 3) = true
> 376518 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 0)
> 376543 glBufferData(target = GL_ELEMENT_ARRAY_BUFFER, size = 956231260, data
> = NULL, usage = GL_STATIC_DRAW)
> 376544 glGetError() = GL_NO_ERROR
> 376603 glGetBufferParameteriv(target = GL_ARRAY_BUFFER, pname =
> GL_BUFFER_USAGE, params = &0)
> 376824 glGetBufferParameteriv(target = GL_ELEMENT_ARRAY_BUFFER, pname =
> GL_BUFFER_SIZE, params = &0)
> 
> 
> Here is the interesting part:
> 
> 376518 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 0)
> 376543 glBufferData(target = GL_ELEMENT_ARRAY_BUFFER, size = 956231260, data
> = NULL, usage = GL_STATIC_DRAW)
> 376544 glGetError() = GL_NO_ERROR

Oops. The target is different: GL_ARRAY_BUFFER versus GL_ELEMENT_ARRAY_BUFFER. Need to think more.
Comment 14 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 07:15:08 PDT
(In reply to Brandon Sterne (:bsterne) from comment #0)
> Program received signal SIGSEGV, Segmentation fault.
> 0xf675e700 in mozilla::WebGLBuffer::CopySubDataIfElementArray
> (this=0x54ff8680, byteOffset=558000083, byteLength=16948,
> data=0xb7e01000)
>     at /usr/include/bits/string3.h:52
> 52        return __builtin___memcpy_chk (__dest, __src, __len, __bos0
> (__dest));
> (gdb) x/i 0xf675e700
> => 0xf675e700 <mozilla::WebGLBuffer::CopySubDataIfElementArray(GLuint,
> GLuint, void const*)+32>:        rep movsb %ds:(%esi),%es:(%edi)
> (gdb) i r esi edi ecx
> esi            0xb7e01000       -1210052608
> edi            0x214267d3       558000083
> ecx            0x4234   16948
> (gdb) x/bx 0x214267d3
> 0x214267d3:     Cannot access memory at address 0x214267d3

Pardon my ignorance but how does this show that we're writing at an invalid destination? Here edi is just the offset, not the absolute address. We're writing to es:edi, not to edi. Right? So at this point we don't know that we're doing an invalid write, this could also be an invalid read. right?
Comment 15 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 07:32:08 PDT
...thinking about this, maybe the answer is that es itself is 0. That would mean that mData is null. Looking at the code, this can happen if it failed to allocate and we don't seem to properly check for that.
Comment 16 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 07:36:07 PDT
Created attachment 559455 [details] [diff] [review]
set byteLength to 0 on mData allocation failure

When mData is null, we want to avoid dereferencing (pointers based on) it. Rather than adding mData!=0 checks everywhere, we already have mByteLength!=0 checks in most places so I used that. When mData allocation fails, mByteLength gets set to 0 so that our bookkeeping is consistent (a buffer that failed to allocate has size 0). Should remove the crash.
Comment 17 Jeff Muizelaar [:jrmuizel] 2011-09-09 07:47:44 PDT
Comment on attachment 559455 [details] [diff] [review]
set byteLength to 0 on mData allocation failure

Should we be using a safer data type for this?
Comment 18 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 08:02:04 PDT
Created attachment 559464 [details]
Minimal testcase

This is a really minimal testcase. But since it relies on OOM and WebGL doesn't allow buffer allocations bigger than 2G, it's still hard to reproduce on 64bit systems with lots of virtual memory (note that VRAM is virtualized). So in order to reproduce easily, I patched Firefox to make 100x bigger allocations in CopyDataIfElementArray and in ZeroDataIfElementArray in WebGLBuffer, in WebGLContext.h.
Comment 19 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 08:06:47 PDT
(In reply to Jeff Muizelaar [:jrmuizel] from comment #17)
> Should we be using a safer data type for this?

Yes, we should use something like nsTArray<PRUint8>.

The reason for void* was that a WebGL buffer, already in the ELEMENT_ARRAY_BUFFER case that's relevant here, can be either an array of uint8's or an array of uint16's with the possibility of uint32's in the future.

But this doesn't really matter to the code here, and using nsTArray<PRUint8> and letting it handle resizing and querying its length instead of storing mByteLength separately would have prevented this bug.

Please still r+ this patch as it is minimal and so will be easier to approve for beta. Will write the long-term-solution patch.
Comment 20 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 08:14:58 PDT
(In reply to Benoit Jacob [:bjacob] from comment #19)
> But this doesn't really matter to the code here, and using nsTArray<PRUint8>
> and letting it handle resizing and querying its length instead of storing
> mByteLength separately would have prevented this bug.

Ah, wait, no, there is a difficulty.

We only want to store a copy of the WebGLBuffer contents (mData) in the ELEMENT_ARRAY_BUFFER case.

In the ARRAY_BUFFER case, storing it would be useless and typically a lot more expensive. That's why we don't do it. But we _still_ need to store the byteLength!

I think the full solution is we want to have both:
 * always store mByteLength as an integer (as we currently do)
 * store mData as a nsTArray
 * there would not quite be a redundancy between mByteLength and mData.Length() as  mByteLength would be the length of the actual GPU-side buffer while mData.Length() would be the size of the copy we have on our side. So they're different concepts. Of course their values are the same unless we are in an error case.
Comment 21 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 08:15:44 PDT
I confirm that the current patch fixes the crash on my testcase.
Comment 22 Jeff Muizelaar [:jrmuizel] 2011-09-09 12:01:42 PDT
Comment on attachment 559455 [details] [diff] [review]
set byteLength to 0 on mData allocation failure

Benoit, and I talked about a mechanism for making this easier to test. We'll follow up with a link to that bug.
Comment 23 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 12:18:13 PDT
Indeed, the problem is that any crashtest for this bug would have to reproduce OOM conditions, which on 64bit machines with lots of slow virtual memory is indistinguishable from a plain timeout.
Comment 24 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 15:05:11 PDT
http://hg.mozilla.org/mozilla-central/rev/9f664f2ac12c
Comment 25 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 15:06:15 PDT
Comment on attachment 559455 [details] [diff] [review]
set byteLength to 0 on mData allocation failure

Please approve for beta. It's a really bad bug. It allows to implement POKE() if you can trigger an OOM with bufferData, which is not hard to do on 32bit machines.
Comment 26 Benoit Jacob [:bjacob] (mostly away) 2011-09-09 15:09:32 PDT
http://hg.mozilla.org/mozilla-central/rev/9f664f2ac12c
Comment 28 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-09-22 16:58:01 PDT
qa- as no QA fix verification needed
Comment 29 Daniel Veditz [:dveditz] 2011-09-23 12:51:38 PDT
Again not understanding the qa- on a crash bug with a reproducible minimal testcase.
Comment 30 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-09-26 10:52:31 PDT
Sorry, I made a mistake. I thought I was commenting on a different bug (triaging 131 bugs in a couple hours has a high likelihood of human error).
Comment 31 juan becerra [:juanb] 2011-09-26 11:50:24 PDT
Brandon, could you help us verify this bug on your environment? We've been trying to get a hold of a machine with similar specs, but I haven't been able to reproduce the crash as described nor see a difference on a build with the fix.
Comment 32 Benoit Jacob [:bjacob] (mostly away) 2011-09-26 12:34:17 PDT
Juan, as I explained in comment 18, the key to reproducing this is to use a 32bit system with not too much RAM (1G would work well). Alternatively, to reproduce this on my 64bit system, I patched Firefox to make 100x bigger allocations there. If you want, I can make you such a patch.
Comment 33 Benoit Jacob [:bjacob] (mostly away) 2011-09-26 12:36:14 PDT
Created attachment 562513 [details] [diff] [review]
make 100x bigger allocations to trigger OOM condition easily

actually I still had this in my queue.
Comment 34 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-09-26 16:23:23 PDT
For time sake, it would be great if someone who is already set up to reproduce this bug could verify the fix.
Comment 35 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-09-26 20:22:09 PDT
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #34)
> For time sake, it would be great if someone who is already set up to
> reproduce this bug could verify the fix.

Benoit, would it be possible for you to share the build you already generated? or throw something up on Try which QA can test?
Comment 36 Benoit Jacob [:bjacob] (mostly away) 2011-09-26 20:42:36 PDT
pushed the alloc-100x-more to try:

https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=b6363b4d9bcd

builds will be available in a few hours at

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bjacob@mozilla.com-b6363b4d9bcd

The QA steps from there are run the PoC in this build and note that it doesn't crash, but does run out of memory (I think if you check in the Web console you'll see a WebGL warning like "bufferData: out of memory". You will need to set webgl.verbose=true in about:config).
Comment 37 Doug Sherk (:drs) (inactive) 2011-09-29 00:54:00 PDT
Created attachment 563331 [details] [diff] [review]
Change to harness01.html to make it run on Linux and OSX.

There was a mistake in the PoC which caused it fail on OSX and Linux due to a path issue. I have attached a diff to fix this.
Comment 38 Raymond Forbes[:rforbes] 2013-07-19 18:46:27 PDT
rforbes-bugspam-for-setting-that-bounty-flag-20130719

Note You need to log in before you can comment on or make changes to this bug.