Closed Bug 588918 Opened 14 years ago Closed 14 years ago

Make WebGL test suite pass without Valgrind errors

Categories

(Core :: Graphics: CanvasWebGL, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- final+

People

(Reporter: bjacob, Assigned: bjacob)

References

Details

Attachments

(7 files, 1 obsolete file)

Let's make the WebGL test suite not generate Valgrind errors.
Attachment #467502 - Flags: review?(vladimir)
blocking2.0: --- → ?
Comment on attachment 467502 [details] [diff] [review]
fix many valgrind uninitialized value errors

Um, I think bugzilla ate my comments :(

Basically, looks great, but just happens to be ugly style-wise.  I think it would help if you just added a newline after every *retval = ..; at the start.  Sorry to be nit picky, just stuff like:

> WebGLContext::GetActiveAttrib(nsIWebGLProgram *pobj, PRUint32 index, nsIWebGLActiveInfo **retval)
> {
>+    *retval = nsnull;
>     WebGLuint progname;
>     if (!GetGLName<WebGLProgram>("getActiveAttrib: program", pobj, &progname))
>         return NS_OK;

looks pretty weird and ugly, because the retval assignment has nothing to do with the GetGLName chunk.
blocking2.0: ? → final+
ok, this version has the blank lines.
Assignee: nobody → bjacob
Attachment #467502 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #468423 - Flags: review?(vladimir)
Attachment #467502 - Flags: review?(vladimir)
Comment on attachment 468423 [details] [diff] [review]
fix many valgrind uninitialized value errors, updated

That works, thanks!
Attachment #468423 - Flags: review?(vladimir) → review+
http://hg.mozilla.org/mozilla-central/rev/4999ff306455
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
argh, i'm stupid to have marked this as fixed, this patch only fixes part of this.

let's keep this bug open until all valgrind errors are fixed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
http://hg.mozilla.org/mozilla-central/rev/a961e84c9fee
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
arf, closed by mistake.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 607774
OK, the next big chunk is renderbufferStorage failing to initialize buffers, I'm working on it but it's taking much longer than expected as I just realized that a significant part of the spec was just plain unimplemented and am taking care of this now: see bug 607774.
Depends on: 613020
With the attached valgrind suppression files, with NVIDIA driver 195.36 on linux x86-64, and with the patch (attachment 491513 [details] [diff] [review]) in bug 613020 working around a NVIDIA bug, the whole WebGL conformance suite, webgl-conformance-tests.html, runs without any other valgrind error than this JS GC-related error:

==23185== Invalid read of size 8
==23185==    at 0x6C76F36: js::MarkRangeConservatively(JSTracer*, unsigned long*, unsigned long*) (jsgc.cpp:711)
==23185==    by 0x6C76FFA: js::MarkThreadDataConservatively(JSTracer*, JSThreadData*) (jsgc.cpp:728)
==23185==    by 0x6C7711E: js::MarkConservativeStackRoots(JSTracer*) (jsgc.cpp:761)
==23185==    by 0x6C78A5B: js::MarkRuntime(JSTracer*) (jsgc.cpp:1620)
==23185==    by 0x6C79E38: MarkAndSweep(JSContext*, JSGCInvocationKind) (jsgc.cpp:2171)
==23185==    by 0x6C7A832: GCUntilDone(JSContext*, JSGCInvocationKind) (jsgc.cpp:2515)
==23185==    by 0x6C7A9D0: js_GC(JSContext*, JSGCInvocationKind) (jsgc.cpp:2580)
==23185==    by 0x6BF2BF0: JS_GC (jsapi.cpp:2513)
==23185==    by 0x61EA884: nsXPConnect::Collect() (nsXPConnect.cpp:405)
==23185==    by 0x61EA8E2: nsXPConnect::GarbageCollect() (nsXPConnect.cpp:413)
==23185==    by 0x5C98CFE: nsJSContext::CC(nsICycleCollectorListener*) (nsJSEnvironment.cpp:3628)
==23185==    by 0x5C98F0B: nsJSContext::IntervalCC() (nsJSEnvironment.cpp:3733)
==23185==  Address 0x7feff26f0 is not stack'd, malloc'd or (recently) free'd
(Note: I ran webgl-conformance-tests.html instead of the mochitest because I was too lazy to find back how to valgrind a mochitest which involves running runtests.py with extra arguments)
i also had to apply this patch to prevent tests from timing out when run in valgrind (replaces 3s by 300s)
As mentioned in bug 613020 comment 12, adding the --smc-check=all option to valgrind makes the test suite pass without any error or crash on NVIDIA (still with the suppressions list from above).
This was recorded running the conformance suite with OSMesa, with this command:

valgrind --smc-check=all --suppressions=nvidia.supp --suppressions=firefox-startup.supp --track-origins=yes --error-limit=no
So let's examine the valgrind errors with OSMesa (just attached log).

First we have the JS GC error already reported in comment 11.

We then have these errors in conformance/tex-image-with-format-and-type.html:

==14218== Conditional jump or move depends on uninitialised value(s)
==14218==    at 0x2111134F: _mesa_convert_colors (image.c:5421)
==14218==    by 0x21176CBA: convert_color_type (s_span.c:941)
==14218==    by 0x211784B4: _swrast_write_rgba_span (s_span.c:1226)
==14218==    by 0x21187818: general_triangle (s_tritemp.h:819)
==14218==    by 0x21153E6C: _tnl_render_triangles_verts (t_vb_rendertmp.h:184)
==14218==    by 0x211554B1: run_render (t_vb_render.c:320)
==14218==    by 0x2114D5F6: _tnl_run_pipeline (t_pipeline.c:153)
==14218==    by 0x2114E262: _tnl_draw_prims (t_draw.c:478)
==14218==    by 0x2114E340: _tnl_vbo_draw_prims (t_draw.c:384)
==14218==    by 0x21146FB0: vbo_exec_DrawArrays (vbo_exec_array.c:524)
==14218==    by 0x5A81AC8: mozilla::gl::GLContext::fDrawArrays(unsigned int, int, int) (GLContext.h:1185)
==14218==    by 0x5A670B0: mozilla::WebGLContext::DrawArrays(unsigned int, int, int) (WebGLContextGL.cpp:1075)
==14218==  Uninitialised value was created by a stack allocation
==14218==    at 0x211699D8: fetch_texel (prog_execute.c:373)

This error is a priori worrying because it might mean that we are drawing with uninitialized texture data.

But the next error is related to it:

==14218== Conditional jump or move depends on uninitialised value(s)
==14218==    at 0x6C43B16: js::CompartmentChecker::check(js::Value const&) (jscntxtinlines.h:561)
==14218==    by 0x6CD9C10: void js::assertSameCompartment<js::Value>(JSContext*, js::Value) (jscn
txtinlines.h:624)
==14218==    by 0x6F030D2: js::Interpret(JSContext*, JSStackFrame*, unsigned int, JSInterpMode) (
jsinterp.cpp:4555)
==14218==    by 0x6CD5229: js::RunScript(JSContext*, JSScript*, JSStackFrame*) (jsinterp.cpp:657)
==14218==    by 0x6CD56F4: js::Invoke(JSContext*, js::CallArgs const&, unsigned int) (jsinterp.cp
p:737)
==14218==    by 0x6CD5DA8: js::ExternalInvoke(JSContext*, js::Value const&, js::Value const&, uns
igned int, js::Value*, js::Value*) (jsinterp.cpp:862)
==14218==    by 0x6C1E6F9: js::ExternalInvoke(JSContext*, JSObject*, js::Value const&, unsigned i
nt, js::Value*, js::Value*) (jsinterp.h:954)
==14218==    by 0x6C3BA40: JS_CallFunctionValue (jsapi.cpp:4973)
==14218==    by 0x5CAD984: nsJSContext::CallEventHandler(nsISupports*, void*, void*, nsIArray*, n
sIVariant**) (nsJSEnvironment.cpp:2177)
==14218==    by 0x5D41847: nsJSEventListener::HandleEvent(nsIDOMEvent*) (nsJSEventListener.cpp:22
8)
==14218==    by 0x5A92B28: nsEventListenerManager::HandleEventSubType(nsListenerStruct*, nsIDOMEv
entListener*, nsIDOMEvent*, nsPIDOMEventTarget*, unsigned int, nsCxPusher*) (nsEventListenerManag
er.cpp:1114)
==14218==    by 0x5A92FE7: nsEventListenerManager::HandleEventInternal(nsPresContext*, nsEvent*, 
nsIDOMEvent**, nsPIDOMEventTarget*, unsigned int, nsEventStatus*, nsCxPusher*) (nsEventListenerMa
nager.cpp:1210)
==14218==  Uninitialised value was created by a stack allocation
==14218==    at 0x211699D8: fetch_texel (prog_execute.c:373)


What's funny here is that the uninitialized data was created by the same fetch_texel code, so it's probably the same; and this time, it's used by the Javascript engine.

I looked at fetch_texel in Mesa sources and it's really just doing that, fetching a texel from a texture unit while executing a fragment shader.

Since this test is drawing with a texture and then doing a readPixels() on the result to compare, that makes sense, the first error would be when drawing and the second when comparing the result. So it really seems we're drawing with uninitialized texture.

This needs further investigation: try reducing tex-image-with-format-and-type.html to a minimal test case...

There are no other valgrind errors, other than repetitions of the above.
Attached file reduced test case
I reduced tex-image-with-format-and-type.html to a smaller test case. It needs to be put in the same directory in order to work.

What I currently know:
  * the valgrind errors only occur with the RGB8 format, not in any other format (RGBA8, 16-bit formats, luminance/alpha...) so it seems like a bug with uninitialized alpha channel.
  * the valgrind errors are the same regardless of texImage vs. texSubImage
  * my above theory of the second kind of valgrind error occuring when comparing the result of readpixels() is confirmed.
So this is not our bug, it's OSMesa's.

Here's what's happening here. Referring to Mesa 7.8 sources. The uninitialized value at hand is created by fetch_texel() which is where a fragment shader samples a texture. Stepping through that function we arrive at this location:
   opt_sample_rgb_2d at swrast/s_texfilter.c:1366

Here's this function unabridged:

static void
opt_sample_rgb_2d(GLcontext *ctx,
                  const struct gl_texture_object *tObj,
                  GLuint n, const GLfloat texcoords[][4],
                  const GLfloat lambda[], GLfloat rgba[][4])
{
   const struct gl_texture_image *img = tObj->Image[0][tObj->BaseLevel];
   const GLfloat width = (GLfloat) img->Width;
   const GLfloat height = (GLfloat) img->Height;
   const GLint colMask = img->Width - 1;
   const GLint rowMask = img->Height - 1;
   const GLint shift = img->WidthLog2;
   GLuint k;
   (void) ctx;
   (void) lambda;
   ASSERT(tObj->WrapS==GL_REPEAT);
   ASSERT(tObj->WrapT==GL_REPEAT);
   ASSERT(img->Border==0);
   ASSERT(img->TexFormat == MESA_FORMAT_RGB888);
   ASSERT(img->_IsPowerOfTwo);

   for (k=0; k<n; k++) {
      GLint i = IFLOOR(texcoords[k][0] * width) & colMask;
      GLint j = IFLOOR(texcoords[k][1] * height) & rowMask;
      GLint pos = (j << shift) | i;
      GLubyte *texel = ((GLubyte *) img->Data) + 3*pos;
      rgba[k][RCOMP] = UBYTE_TO_FLOAT(texel[2]);
      rgba[k][GCOMP] = UBYTE_TO_FLOAT(texel[1]);
      rgba[k][BCOMP] = UBYTE_TO_FLOAT(texel[0]);
   }
}

So this is where our RGB8 texture is unpacked to float RGBA. The bug is that the A component is left uninitialized here. I also can't see where it'd be initialized elsewhere, so I'll jump to the conclusion that it is not initialized at all, as that fits well with what follows:

When we look at the location of the first reported valgrind error using that uninitialized value, _mesa_convert_colors (image.c:5421), the source code is:


   case GL_FLOAT:
      if (dstType == GL_UNSIGNED_BYTE) {
         const GLfloat (*src4)[4] = (const GLfloat (*)[4]) src;
         GLubyte (*dst1)[4] = (GLubyte (*)[4]) (useTemp ? tempBuffer : dst);
         GLuint i;
         for (i = 0; i < count; i++) {
            if (!mask || mask[i]) {
               UNCLAMPED_FLOAT_TO_UBYTE(dst1[i][RCOMP], src4[i][RCOMP]);
               UNCLAMPED_FLOAT_TO_UBYTE(dst1[i][GCOMP], src4[i][GCOMP]);
               UNCLAMPED_FLOAT_TO_UBYTE(dst1[i][BCOMP], src4[i][BCOMP]);
               UNCLAMPED_FLOAT_TO_UBYTE(dst1[i][ACOMP], src4[i][ACOMP]);
            }
         }
         if (useTemp)
            memcpy(dst, tempBuffer, count * 4 * sizeof(GLubyte));
      }


The error line 5421 is the one converting the alpha channel:
               UNCLAMPED_FLOAT_TO_UBYTE(dst1[i][ACOMP], src4[i][ACOMP]);

So this first well with my theory: when sampling the RGB8 texture to float RGBA, OSMesa forgot to initialize the A component, and now it's trying to clamp it and convert it back to a 8-bit value, which involves branching:

#define UNCLAMPED_FLOAT_TO_UBYTE(UB, F)                                 \
        do {                                                            \
           fi_type __tmp;                                               \
           __tmp.f = (F);                                               \
           if (__tmp.i < 0)                                             \
              UB = (GLubyte) 0;                                         \
           else if (__tmp.i >= IEEE_0996)                               \
              UB = (GLubyte) 255;                                       \
           else {                                                       \
              __tmp.f = __tmp.f * (255.0F/256.0F) + 32768.0F;           \
              UB = (GLubyte) __tmp.i;                                   \
           }                                                            \
        } while (0)


Conclusion: OSMesa bug.

This was the last known valgrind error when running the WebGL conformance suite with either OSMesa or NVIDIA. Declaring WebGL valgrind-green!
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
The Mesa developers have accepted my patch, so it was indeed a Mesa bug and is fixed now (7.9+)
This is just a small wrapper around valgring to use as debugger for firefox -g:

valgrind --smc-check=all \
         --suppressions=nvidia.supp \
         --suppressions=firefox-startup.supp \
         --track-origins=yes \
         --error-limit=no \
         $* \
         2>&1 | tee valgrindlog

Attaching here for reference.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: