Closed
Bug 588918
Opened 15 years ago
Closed 15 years ago
Make WebGL test suite pass without Valgrind errors
Categories
(Core :: Graphics: CanvasWebGL, defect)
Core
Graphics: CanvasWebGL
Tracking
()
RESOLVED
FIXED
| Tracking | Status | |
|---|---|---|
| blocking2.0 | --- | final+ |
People
(Reporter: bjacob, Assigned: bjacob)
References
Details
Attachments
(7 files, 1 obsolete file)
Let's make the WebGL test suite not generate Valgrind errors.
Attachment #467502 -
Flags: review?(vladimir)
| Assignee | ||
Updated•15 years ago
|
blocking2.0: --- → ?
Comment on attachment 467502 [details] [diff] [review]
fix many valgrind uninitialized value errors
Um, I think bugzilla ate my comments :(
Basically, looks great, but just happens to be ugly style-wise. I think it would help if you just added a newline after every *retval = ..; at the start. Sorry to be nit picky, just stuff like:
> WebGLContext::GetActiveAttrib(nsIWebGLProgram *pobj, PRUint32 index, nsIWebGLActiveInfo **retval)
> {
>+ *retval = nsnull;
> WebGLuint progname;
> if (!GetGLName<WebGLProgram>("getActiveAttrib: program", pobj, &progname))
> return NS_OK;
looks pretty weird and ugly, because the retval assignment has nothing to do with the GetGLName chunk.
Updated•15 years ago
|
blocking2.0: ? → final+
| Assignee | ||
Comment 2•15 years ago
|
||
ok, this version has the blank lines.
Assignee: nobody → bjacob
Attachment #467502 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #468423 -
Flags: review?(vladimir)
Attachment #467502 -
Flags: review?(vladimir)
Comment on attachment 468423 [details] [diff] [review]
fix many valgrind uninitialized value errors, updated
That works, thanks!
Attachment #468423 -
Flags: review?(vladimir) → review+
| Assignee | ||
Comment 4•15 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 5•15 years ago
|
||
argh, i'm stupid to have marked this as fixed, this patch only fixes part of this.
let's keep this bug open until all valgrind errors are fixed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Assignee | ||
Comment 6•15 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 7•15 years ago
|
||
arf, closed by mistake.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Assignee | ||
Comment 8•15 years ago
|
||
OK, the next big chunk is renderbufferStorage failing to initialize buffers, I'm working on it but it's taking much longer than expected as I just realized that a significant part of the spec was just plain unimplemented and am taking care of this now: see bug 607774.
| Assignee | ||
Comment 9•15 years ago
|
||
| Assignee | ||
Comment 10•15 years ago
|
||
| Assignee | ||
Comment 11•15 years ago
|
||
With the attached valgrind suppression files, with NVIDIA driver 195.36 on linux x86-64, and with the patch (attachment 491513 [details] [diff] [review]) in bug 613020 working around a NVIDIA bug, the whole WebGL conformance suite, webgl-conformance-tests.html, runs without any other valgrind error than this JS GC-related error:
==23185== Invalid read of size 8
==23185== at 0x6C76F36: js::MarkRangeConservatively(JSTracer*, unsigned long*, unsigned long*) (jsgc.cpp:711)
==23185== by 0x6C76FFA: js::MarkThreadDataConservatively(JSTracer*, JSThreadData*) (jsgc.cpp:728)
==23185== by 0x6C7711E: js::MarkConservativeStackRoots(JSTracer*) (jsgc.cpp:761)
==23185== by 0x6C78A5B: js::MarkRuntime(JSTracer*) (jsgc.cpp:1620)
==23185== by 0x6C79E38: MarkAndSweep(JSContext*, JSGCInvocationKind) (jsgc.cpp:2171)
==23185== by 0x6C7A832: GCUntilDone(JSContext*, JSGCInvocationKind) (jsgc.cpp:2515)
==23185== by 0x6C7A9D0: js_GC(JSContext*, JSGCInvocationKind) (jsgc.cpp:2580)
==23185== by 0x6BF2BF0: JS_GC (jsapi.cpp:2513)
==23185== by 0x61EA884: nsXPConnect::Collect() (nsXPConnect.cpp:405)
==23185== by 0x61EA8E2: nsXPConnect::GarbageCollect() (nsXPConnect.cpp:413)
==23185== by 0x5C98CFE: nsJSContext::CC(nsICycleCollectorListener*) (nsJSEnvironment.cpp:3628)
==23185== by 0x5C98F0B: nsJSContext::IntervalCC() (nsJSEnvironment.cpp:3733)
==23185== Address 0x7feff26f0 is not stack'd, malloc'd or (recently) free'd
| Assignee | ||
Comment 12•15 years ago
|
||
(Note: I ran webgl-conformance-tests.html instead of the mochitest because I was too lazy to find back how to valgrind a mochitest which involves running runtests.py with extra arguments)
| Assignee | ||
Comment 13•15 years ago
|
||
i also had to apply this patch to prevent tests from timing out when run in valgrind (replaces 3s by 300s)
| Assignee | ||
Comment 14•15 years ago
|
||
As mentioned in bug 613020 comment 12, adding the --smc-check=all option to valgrind makes the test suite pass without any error or crash on NVIDIA (still with the suppressions list from above).
| Assignee | ||
Comment 15•15 years ago
|
||
This was recorded running the conformance suite with OSMesa, with this command:
valgrind --smc-check=all --suppressions=nvidia.supp --suppressions=firefox-startup.supp --track-origins=yes --error-limit=no
| Assignee | ||
Comment 16•15 years ago
|
||
So let's examine the valgrind errors with OSMesa (just attached log).
First we have the JS GC error already reported in comment 11.
We then have these errors in conformance/tex-image-with-format-and-type.html:
==14218== Conditional jump or move depends on uninitialised value(s)
==14218== at 0x2111134F: _mesa_convert_colors (image.c:5421)
==14218== by 0x21176CBA: convert_color_type (s_span.c:941)
==14218== by 0x211784B4: _swrast_write_rgba_span (s_span.c:1226)
==14218== by 0x21187818: general_triangle (s_tritemp.h:819)
==14218== by 0x21153E6C: _tnl_render_triangles_verts (t_vb_rendertmp.h:184)
==14218== by 0x211554B1: run_render (t_vb_render.c:320)
==14218== by 0x2114D5F6: _tnl_run_pipeline (t_pipeline.c:153)
==14218== by 0x2114E262: _tnl_draw_prims (t_draw.c:478)
==14218== by 0x2114E340: _tnl_vbo_draw_prims (t_draw.c:384)
==14218== by 0x21146FB0: vbo_exec_DrawArrays (vbo_exec_array.c:524)
==14218== by 0x5A81AC8: mozilla::gl::GLContext::fDrawArrays(unsigned int, int, int) (GLContext.h:1185)
==14218== by 0x5A670B0: mozilla::WebGLContext::DrawArrays(unsigned int, int, int) (WebGLContextGL.cpp:1075)
==14218== Uninitialised value was created by a stack allocation
==14218== at 0x211699D8: fetch_texel (prog_execute.c:373)
This error is a priori worrying because it might mean that we are drawing with uninitialized texture data.
But the next error is related to it:
==14218== Conditional jump or move depends on uninitialised value(s)
==14218== at 0x6C43B16: js::CompartmentChecker::check(js::Value const&) (jscntxtinlines.h:561)
==14218== by 0x6CD9C10: void js::assertSameCompartment<js::Value>(JSContext*, js::Value) (jscn
txtinlines.h:624)
==14218== by 0x6F030D2: js::Interpret(JSContext*, JSStackFrame*, unsigned int, JSInterpMode) (
jsinterp.cpp:4555)
==14218== by 0x6CD5229: js::RunScript(JSContext*, JSScript*, JSStackFrame*) (jsinterp.cpp:657)
==14218== by 0x6CD56F4: js::Invoke(JSContext*, js::CallArgs const&, unsigned int) (jsinterp.cp
p:737)
==14218== by 0x6CD5DA8: js::ExternalInvoke(JSContext*, js::Value const&, js::Value const&, uns
igned int, js::Value*, js::Value*) (jsinterp.cpp:862)
==14218== by 0x6C1E6F9: js::ExternalInvoke(JSContext*, JSObject*, js::Value const&, unsigned i
nt, js::Value*, js::Value*) (jsinterp.h:954)
==14218== by 0x6C3BA40: JS_CallFunctionValue (jsapi.cpp:4973)
==14218== by 0x5CAD984: nsJSContext::CallEventHandler(nsISupports*, void*, void*, nsIArray*, n
sIVariant**) (nsJSEnvironment.cpp:2177)
==14218== by 0x5D41847: nsJSEventListener::HandleEvent(nsIDOMEvent*) (nsJSEventListener.cpp:22
8)
==14218== by 0x5A92B28: nsEventListenerManager::HandleEventSubType(nsListenerStruct*, nsIDOMEv
entListener*, nsIDOMEvent*, nsPIDOMEventTarget*, unsigned int, nsCxPusher*) (nsEventListenerManag
er.cpp:1114)
==14218== by 0x5A92FE7: nsEventListenerManager::HandleEventInternal(nsPresContext*, nsEvent*,
nsIDOMEvent**, nsPIDOMEventTarget*, unsigned int, nsEventStatus*, nsCxPusher*) (nsEventListenerMa
nager.cpp:1210)
==14218== Uninitialised value was created by a stack allocation
==14218== at 0x211699D8: fetch_texel (prog_execute.c:373)
What's funny here is that the uninitialized data was created by the same fetch_texel code, so it's probably the same; and this time, it's used by the Javascript engine.
I looked at fetch_texel in Mesa sources and it's really just doing that, fetching a texel from a texture unit while executing a fragment shader.
Since this test is drawing with a texture and then doing a readPixels() on the result to compare, that makes sense, the first error would be when drawing and the second when comparing the result. So it really seems we're drawing with uninitialized texture.
This needs further investigation: try reducing tex-image-with-format-and-type.html to a minimal test case...
There are no other valgrind errors, other than repetitions of the above.
| Assignee | ||
Comment 17•15 years ago
|
||
I reduced tex-image-with-format-and-type.html to a smaller test case. It needs to be put in the same directory in order to work.
What I currently know:
* the valgrind errors only occur with the RGB8 format, not in any other format (RGBA8, 16-bit formats, luminance/alpha...) so it seems like a bug with uninitialized alpha channel.
* the valgrind errors are the same regardless of texImage vs. texSubImage
* my above theory of the second kind of valgrind error occuring when comparing the result of readpixels() is confirmed.
| Assignee | ||
Comment 18•15 years ago
|
||
So this is not our bug, it's OSMesa's.
Here's what's happening here. Referring to Mesa 7.8 sources. The uninitialized value at hand is created by fetch_texel() which is where a fragment shader samples a texture. Stepping through that function we arrive at this location:
opt_sample_rgb_2d at swrast/s_texfilter.c:1366
Here's this function unabridged:
static void
opt_sample_rgb_2d(GLcontext *ctx,
const struct gl_texture_object *tObj,
GLuint n, const GLfloat texcoords[][4],
const GLfloat lambda[], GLfloat rgba[][4])
{
const struct gl_texture_image *img = tObj->Image[0][tObj->BaseLevel];
const GLfloat width = (GLfloat) img->Width;
const GLfloat height = (GLfloat) img->Height;
const GLint colMask = img->Width - 1;
const GLint rowMask = img->Height - 1;
const GLint shift = img->WidthLog2;
GLuint k;
(void) ctx;
(void) lambda;
ASSERT(tObj->WrapS==GL_REPEAT);
ASSERT(tObj->WrapT==GL_REPEAT);
ASSERT(img->Border==0);
ASSERT(img->TexFormat == MESA_FORMAT_RGB888);
ASSERT(img->_IsPowerOfTwo);
for (k=0; k<n; k++) {
GLint i = IFLOOR(texcoords[k][0] * width) & colMask;
GLint j = IFLOOR(texcoords[k][1] * height) & rowMask;
GLint pos = (j << shift) | i;
GLubyte *texel = ((GLubyte *) img->Data) + 3*pos;
rgba[k][RCOMP] = UBYTE_TO_FLOAT(texel[2]);
rgba[k][GCOMP] = UBYTE_TO_FLOAT(texel[1]);
rgba[k][BCOMP] = UBYTE_TO_FLOAT(texel[0]);
}
}
So this is where our RGB8 texture is unpacked to float RGBA. The bug is that the A component is left uninitialized here. I also can't see where it'd be initialized elsewhere, so I'll jump to the conclusion that it is not initialized at all, as that fits well with what follows:
When we look at the location of the first reported valgrind error using that uninitialized value, _mesa_convert_colors (image.c:5421), the source code is:
case GL_FLOAT:
if (dstType == GL_UNSIGNED_BYTE) {
const GLfloat (*src4)[4] = (const GLfloat (*)[4]) src;
GLubyte (*dst1)[4] = (GLubyte (*)[4]) (useTemp ? tempBuffer : dst);
GLuint i;
for (i = 0; i < count; i++) {
if (!mask || mask[i]) {
UNCLAMPED_FLOAT_TO_UBYTE(dst1[i][RCOMP], src4[i][RCOMP]);
UNCLAMPED_FLOAT_TO_UBYTE(dst1[i][GCOMP], src4[i][GCOMP]);
UNCLAMPED_FLOAT_TO_UBYTE(dst1[i][BCOMP], src4[i][BCOMP]);
UNCLAMPED_FLOAT_TO_UBYTE(dst1[i][ACOMP], src4[i][ACOMP]);
}
}
if (useTemp)
memcpy(dst, tempBuffer, count * 4 * sizeof(GLubyte));
}
The error line 5421 is the one converting the alpha channel:
UNCLAMPED_FLOAT_TO_UBYTE(dst1[i][ACOMP], src4[i][ACOMP]);
So this first well with my theory: when sampling the RGB8 texture to float RGBA, OSMesa forgot to initialize the A component, and now it's trying to clamp it and convert it back to a 8-bit value, which involves branching:
#define UNCLAMPED_FLOAT_TO_UBYTE(UB, F) \
do { \
fi_type __tmp; \
__tmp.f = (F); \
if (__tmp.i < 0) \
UB = (GLubyte) 0; \
else if (__tmp.i >= IEEE_0996) \
UB = (GLubyte) 255; \
else { \
__tmp.f = __tmp.f * (255.0F/256.0F) + 32768.0F; \
UB = (GLubyte) __tmp.i; \
} \
} while (0)
Conclusion: OSMesa bug.
This was the last known valgrind error when running the WebGL conformance suite with either OSMesa or NVIDIA. Declaring WebGL valgrind-green!
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
| Assignee | ||
Comment 19•15 years ago
|
||
Reported OSMesa bug:
https://bugs.freedesktop.org/show_bug.cgi?id=31837
| Assignee | ||
Comment 20•15 years ago
|
||
The Mesa developers have accepted my patch, so it was indeed a Mesa bug and is fixed now (7.9+)
| Assignee | ||
Comment 21•14 years ago
|
||
This is just a small wrapper around valgring to use as debugger for firefox -g:
valgrind --smc-check=all \
--suppressions=nvidia.supp \
--suppressions=firefox-startup.supp \
--track-origins=yes \
--error-limit=no \
$* \
2>&1 | tee valgrindlog
Attaching here for reference.
You need to log in
before you can comment on or make changes to this bug.
Description
•