Closed
Bug 582668
Opened 15 years ago
Closed 14 years ago
gfxAlphaBoxBlur::Paint appears to pass garbage down through Cairo
Categories
(Core :: Graphics, defect)
Core
Graphics
Tracking
()
RESOLVED
FIXED
mozilla2.0b4
Tracking | Status | |
---|---|---|
blocking2.0 | --- | final+ |
People
(Reporter: jseward, Assigned: jseward)
Details
(Keywords: valgrind)
Attachments
(1 file, 1 obsolete file)
1.42 KB,
patch
|
roc
:
review+
|
Details | Diff | Splinter Review |
This falls into the category of "stuff I'm suspicious about, but
not complete sure about".
All valgrinded startups of m-c (yesterday's sources) on MacOSX 10.6
produce the first complaint [1] shown below. It seems that
uninitialised stuff is showing up deep in the innards of
/System/Library/Frameworks/.../A/CoreGraphics, and that stuff was
created by a heap allocation in gfxAlphaBoxBlur::Paint.
At first I was inclined to dismiss this as general mutancy in
CoreGraphics or Valgrind being unusually clueless in handling
vectorised code. But then I noticed what looks like the same thing
happening on a Linux build, running Mochitests, see [2] below. Of
course the path down through Cairo soon diverges from the Mac path, as
expected, but the place where the uninitialised value(s) come from
is the same.
The fact that the same complaint occurs for two different libraries
increases my confidence that this isn't a false error from Valgrind.
(not up to 100%, I should point out!)
Not sure what a good line of investigation is. Any suggestions?
[1]: Error as reported on MacOSX 10.6, 64-bit build, formatted
for legibility
Conditional jump or move depends on uninitialised value(s)
at 0x106162C08: argb32_mark_constmask (in .../Versions/A/CoreGraphics)
by 0x106158AFD: argb32_mark (in .../Versions/A/CoreGraphics)
by 0x10EDA5D67: ripl_BltShape (in .../Versions/A/Resources/libRIP.A.dylib)
by 0x10EDA2D85: ripc_Render (in .../Versions/A/Resources/libRIP.A.dylib)
by 0x10EDA0316: ripc_DrawRects (in .../Versions/A/Resources/libRIP.A.dylib)
by 0x106157780: CGContextFillRects (in .../Versions/A/CoreGraphics)
by 0x10618DF59: CGContextFillRect (in .../Versions/A/CoreGraphics)
by 0x100C556DB: _cairo_quartz_surface_paint (cairo-quartz-surface.c:2285)
by 0x100C55B1B: _cairo_quartz_surface_mask_with_surface
(cairo-quartz-surface.c:2809)
by 0x100C56F5D: _cairo_quartz_surface_mask (cairo-quartz-surface.c:2900)
by 0x100C4942C: _cairo_surface_mask (cairo-surface.c:2056)
by 0x100C31D2D: _cairo_gstate_mask (cairo-gstate.c:1048)
by 0x100C262A1: _moz_cairo_mask (cairo.c:2193)
by 0x100C26329: _moz_cairo_mask_surface (cairo.c:2228)
by 0x100B8FAED: gfxContext::Mask (gfxContext.cpp:742)
by 0x100B8DCB4: gfxAlphaBoxBlur::Paint (gfxBlur.cpp:349)
Uninitialised value was created by a heap allocation
at 0x100010D65: malloc (vg_replace_malloc.c:236)
by 0x102BD7AE0: moz_malloc (mozalloc.cpp:108)
by 0x100B56537: NS_Alloc_P (nsMemoryImpl.cpp:279)
by 0x100B126A4: nsTArray_base::EnsureCapacity (nsTArray.cpp:76)
by 0x100B12741: nsTArray_base::InsertSlotsAt (nsTArray.cpp:185)
by 0x10075E14B: nsTArray<unsigned char>::InsertElementsAt (nsTArray.h:848)
by 0x100A04D63: nsTArray<unsigned char>::SetLength (nsTArray.h:808)
by 0x100B8DD07: gfxAlphaBoxBlur::Paint (gfxBlur.cpp:318)
by 0x10017BF59: nsContextBoxBlur::DoPaint (nsCSSRendering.cpp:3785)
by 0x1001822C5: nsCSSRendering::PaintBoxShadowInner (nsCSSRendering.cpp:1435)
by 0x10018D8A8: nsDisplayBoxShadowInner::Paint (nsDisplayList.cpp:1041)
by 0x100164E1E: mozilla::FrameLayerBuilder::DrawThebesLayer
(FrameLayerBuilder.cpp:1397)
by 0x100BB9FBB: mozilla::layers::BasicThebesLayer::PaintBuffer
(BasicLayers.cpp:329)
by 0x100BB93A0: mozilla::layers::BasicThebesLayer::Paint
(BasicLayers.cpp:410)
by 0x100BB79B2: mozilla::layers::BasicLayerManager::PaintLayer
(BasicLayers.cpp:1064)
by 0x100BB7A17: mozilla::layers::BasicLayerManager::PaintLayer
(BasicLayers.cpp:1072)
[2]: Error as reported on Ubuntu 10.04, 64-bit build
Conditional jump or move depends on uninitialised value(s)
at 0x6422CE3: mmx_composite_over_n_8_8888 (pixman-mmx.c:1822)
by 0x6414413: pixman_image_composite32 (pixman.c:397)
by 0x63BA9C5: _cairo_image_surface_composite (cairo-image-surface.c:1153)
by 0x63D0A56: _cairo_surface_composite (cairo-surface.c:1821)
by 0x63D3979: _clip_and_composite (cairo-surface-fallback.c:197)
by 0x63D3FED: _cairo_surface_fallback_mask (cairo-surface-fallback.c:1130)
by 0x63D1B3B: _cairo_surface_mask (cairo-surface.c:2061)
by 0x63B8135: _cairo_gstate_mask (cairo-gstate.c:1048)
by 0x63B09B1: _moz_cairo_mask (cairo.c:2193)
by 0x63B0A52: _moz_cairo_mask_surface (cairo.c:2228)
by 0x633D0D4: gfxAlphaBoxBlur::Paint (gfxBlur.cpp:349)
by 0x57032F6: nsContextBoxBlur::DoPaint (nsCSSRendering.cpp:3785)
Uninitialised value was created by a heap allocation
at 0x4C28528: malloc (vg_replace_malloc.c:236)
by 0x62B46B4: NS_Alloc_P (nsMemoryImpl.cpp:279)
by 0x626CD0F: nsTArray_base::EnsureCapacity (nsTArray.cpp:76)
by 0x626CD8A: nsTArray_base::InsertSlotsAt (nsTArray.cpp:185)
by 0x633D294: gfxAlphaBoxBlur::Paint (nsTArray.h:848)
(note: bogus filename due to inlining)
by 0x57032F6: nsContextBoxBlur::DoPaint (nsCSSRendering.cpp:3785)
by 0x570A9FA: nsCSSRendering::PaintBoxShadowInner (nsCSSRendering.cpp:1435)
by 0x571669D: nsDisplayBoxShadowInner::Paint (nsDisplayList.cpp:1041)
by 0x56E76C2: mozilla::FrameLayerBuilder::DrawThebesLayer
(FrameLayerBuilder.cpp:1397)
by 0x637626B: mozilla::layers::BasicThebesLayer::PaintBuffer
(BasicLayers.cpp:329)
by 0x63734B6: mozilla::layers::BasicThebesLayer::Paint
(BasicLayers.cpp:410)
by 0x6370C13: mozilla::layers::BasicLayerManager::PaintLayer
(BasicLayers.cpp:1064)
Comment 1•15 years ago
|
||
So it looks like some uninitialized data is leaking into the image data from tempAlphaDataBuf.
This code seems to ping-pong data between boxData and tmpData. We should be able to verify that tmpData is completely defined after the first call to BoxBlurHorizontal.
However, as I read the code it looks it could use completely uninitialized data if mBlurRadius.width = 0. It would be interesting to add assert(mBlurRadius.width > 0)
Assignee | ||
Comment 2•15 years ago
|
||
> ... ping-pong data between boxData and tmpData ...
It does look like there's a problem in the case (mBlurRadius.width == 0
&& mBlurRadius.height > 0), since we would then start the first
BoxBlurVertical call using a completely undefined tmpData.
However, that's not the problem here: both .width and .height are 5.
> We should be able to verify that tmpData is completely defined
> after the first call to BoxBlurHorizontal.
I could trivially do that, using VALGRIND_CHECK_MEMORY_IS_DEFINED,
if only I could figure out what section(s) of tmpData BlurBoxHorizontal
is supposed to write to. I am now wondering if the problem
occurs because BlurBoxHorizontal doesn't write to as much of
tmpData as it is supposed to, or at least, to not as much of it as
the innards of CoreGraphics expect it to have written.
Assignee | ||
Comment 3•15 years ago
|
||
(In reply to comment #1)
An initial memset-zero of the contents of tempAlphaDataBuf removes the
complaints and does not appear to have any bad effects.
> So it looks like some uninitialized data is leaking into the image data from
> tempAlphaDataBuf.
I agree. Current theory is: in gfxAlphaBoxBlur::Paint, in
the case where we have to deal with blurring:
1. a temp array is set up. Comments on nsTArray::SetLength claim
the new area is initialised by the default constructor of the
element type, but because that is POD this doesn't happen. Hence
the contents remain uninitialised.
nsTArray<unsigned char> tempAlphaDataBuf;
if (!tempAlphaDataBuf.SetLength(sz))
return; // OOM
2. BlurBoxHorizontal/Vertical purportedly fill in tempAlphaDataBuf
from the contents of boxData (in a roundabout way). Except,
because there's a mSkipRect involved, they don't fill in all
of it, only some.
3. Resulting partial garbage ends up back in mImageSurface->Data()
aka boxData, and is passed down to Cairo:
aDestinationCtx->Mask(mImageSurface, offset);
Assignee | ||
Comment 4•15 years ago
|
||
Fixes both problems discussed above:
* as per comment 3, zeroes out tmpData before use
* as per comments 1 and 2,
makes the cases mBlurRadius.width > 0 and mBlurRadius.height > 0
independent, by increasing the ping-pong length to an even number.
This avoids the failure modes
- mBlurRadius.width > 0 && mBlurRadius.height == 0, in which case
the results of the 3rd call to BoxBlurHorizontal would have been
ignored
- mBlurRadius.width == 0 && mBlurRadius.height > 0, in which case
the chain of calls to BoxBlurVertical would have been started off
with completely uninitialised data
Assignee | ||
Comment 5•15 years ago
|
||
Verified that the patch gets rid of Valgrind's complaints on
both OSX and Linux.
Assignee | ||
Updated•15 years ago
|
Attachment #461495 -
Flags: review?(jmuizelaar)
Comment 6•14 years ago
|
||
Comment on attachment 461495 [details] [diff] [review]
proposed fix
I don't really like the potential performance impact of this change. We should know why BoxBlurHorizontal is not setting the data before we add the memset and it would also be nice if we could avoid doing the memcpys.
Attachment #461495 -
Flags: review?(jmuizelaar) → review-
Comment 7•14 years ago
|
||
(In reply to comment #1)
> However, as I read the code it looks it could use completely uninitialized data
> if mBlurRadius.width = 0. It would be interesting to add
> assert(mBlurRadius.width > 0)
That can't be right, the code isn't even hit if the blur radius is 0.
I suspect we're not setting the parts of the shadow that will go behind the box that is being shadowed, because those pixels won't get displayed anyway.
IIRC that is what happens in BoxBlurHorizontal/Vertical.
I fixed this bug in https://bugzilla.mozilla.org/attachment.cgi?id=460825.
Well, I fixed the bug where we doing the wrong memcopies if the blur radius is zero (which sometimes happens here after you apply all the patches in bug 581222).
What Michael said in comment #7 is true too. We do leave part of the buffer untouched, because the caller told us those pixels aren't relevant. But those pixels shouldn't be uninitialized, they should be simply untouched (and should have been initialized by the caller at some point).
Assignee | ||
Comment 10•14 years ago
|
||
(In reply to comment #9)
> But those pixels shouldn't be uninitialized, they should be simply
> untouched (and should have been initialized by the caller at some point).
Right. But my theory is, they aren't, because this appears to initialise them
nsTArray<unsigned char> tempAlphaDataBuf;
if (!tempAlphaDataBuf.SetLength(mImageSurface->GetDataSize()))
return; // OOM
but actually doesn't, because .SetLength initialises the new area of the
array using the default constructor for the element type. Since that's
unsigned char, there is no default constructor so that zeroing never
happens.
Ah, right. However, the uninitialized parts of the mask should not be visible, it should be entirely in the skip-rect.
I think it's worth silencing these warnings by memsetting the temporary buffer to zero at the beginning. We don't need the other changes in the patch, though.
Comment 12•14 years ago
|
||
Unrelated to this bug, but you'll probably want to find out why you are hitting mmx_composite_over_n_8_8888(). Normally, you should be hitting sse2_composite_over_n_8_8888(), and certainly on an x86-64 build.
Updated•14 years ago
|
blocking2.0: --- → ?
Updated•14 years ago
|
blocking2.0: ? → final+
Assignee | ||
Comment 13•14 years ago
|
||
Attachment #461495 -
Attachment is obsolete: true
Attachment #465183 -
Flags: review?(roc)
Attachment #465183 -
Flags: review?(roc) → review+
Assignee | ||
Updated•14 years ago
|
Keywords: checkin-needed,
valgrind
Updated•14 years ago
|
Assignee: nobody → jseward
Comment 14•14 years ago
|
||
Status: NEW → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
OS: Mac OS X → All
Hardware: x86_64 → All
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b4
You need to log in
before you can comment on or make changes to this bug.
Description
•