Closed Bug 582668 Opened 15 years ago Closed 14 years ago

gfxAlphaBoxBlur::Paint appears to pass garbage down through Cairo

Categories

(Core :: Graphics, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla2.0b4
Tracking Status
blocking2.0 --- final+

People

(Reporter: jseward, Assigned: jseward)

Details

(Keywords: valgrind)

Attachments

(1 file, 1 obsolete file)

This falls into the category of "stuff I'm suspicious about, but not complete sure about". All valgrinded startups of m-c (yesterday's sources) on MacOSX 10.6 produce the first complaint [1] shown below. It seems that uninitialised stuff is showing up deep in the innards of /System/Library/Frameworks/.../A/CoreGraphics, and that stuff was created by a heap allocation in gfxAlphaBoxBlur::Paint. At first I was inclined to dismiss this as general mutancy in CoreGraphics or Valgrind being unusually clueless in handling vectorised code. But then I noticed what looks like the same thing happening on a Linux build, running Mochitests, see [2] below. Of course the path down through Cairo soon diverges from the Mac path, as expected, but the place where the uninitialised value(s) come from is the same. The fact that the same complaint occurs for two different libraries increases my confidence that this isn't a false error from Valgrind. (not up to 100%, I should point out!) Not sure what a good line of investigation is. Any suggestions? [1]: Error as reported on MacOSX 10.6, 64-bit build, formatted for legibility Conditional jump or move depends on uninitialised value(s) at 0x106162C08: argb32_mark_constmask (in .../Versions/A/CoreGraphics) by 0x106158AFD: argb32_mark (in .../Versions/A/CoreGraphics) by 0x10EDA5D67: ripl_BltShape (in .../Versions/A/Resources/libRIP.A.dylib) by 0x10EDA2D85: ripc_Render (in .../Versions/A/Resources/libRIP.A.dylib) by 0x10EDA0316: ripc_DrawRects (in .../Versions/A/Resources/libRIP.A.dylib) by 0x106157780: CGContextFillRects (in .../Versions/A/CoreGraphics) by 0x10618DF59: CGContextFillRect (in .../Versions/A/CoreGraphics) by 0x100C556DB: _cairo_quartz_surface_paint (cairo-quartz-surface.c:2285) by 0x100C55B1B: _cairo_quartz_surface_mask_with_surface (cairo-quartz-surface.c:2809) by 0x100C56F5D: _cairo_quartz_surface_mask (cairo-quartz-surface.c:2900) by 0x100C4942C: _cairo_surface_mask (cairo-surface.c:2056) by 0x100C31D2D: _cairo_gstate_mask (cairo-gstate.c:1048) by 0x100C262A1: _moz_cairo_mask (cairo.c:2193) by 0x100C26329: _moz_cairo_mask_surface (cairo.c:2228) by 0x100B8FAED: gfxContext::Mask (gfxContext.cpp:742) by 0x100B8DCB4: gfxAlphaBoxBlur::Paint (gfxBlur.cpp:349) Uninitialised value was created by a heap allocation at 0x100010D65: malloc (vg_replace_malloc.c:236) by 0x102BD7AE0: moz_malloc (mozalloc.cpp:108) by 0x100B56537: NS_Alloc_P (nsMemoryImpl.cpp:279) by 0x100B126A4: nsTArray_base::EnsureCapacity (nsTArray.cpp:76) by 0x100B12741: nsTArray_base::InsertSlotsAt (nsTArray.cpp:185) by 0x10075E14B: nsTArray<unsigned char>::InsertElementsAt (nsTArray.h:848) by 0x100A04D63: nsTArray<unsigned char>::SetLength (nsTArray.h:808) by 0x100B8DD07: gfxAlphaBoxBlur::Paint (gfxBlur.cpp:318) by 0x10017BF59: nsContextBoxBlur::DoPaint (nsCSSRendering.cpp:3785) by 0x1001822C5: nsCSSRendering::PaintBoxShadowInner (nsCSSRendering.cpp:1435) by 0x10018D8A8: nsDisplayBoxShadowInner::Paint (nsDisplayList.cpp:1041) by 0x100164E1E: mozilla::FrameLayerBuilder::DrawThebesLayer (FrameLayerBuilder.cpp:1397) by 0x100BB9FBB: mozilla::layers::BasicThebesLayer::PaintBuffer (BasicLayers.cpp:329) by 0x100BB93A0: mozilla::layers::BasicThebesLayer::Paint (BasicLayers.cpp:410) by 0x100BB79B2: mozilla::layers::BasicLayerManager::PaintLayer (BasicLayers.cpp:1064) by 0x100BB7A17: mozilla::layers::BasicLayerManager::PaintLayer (BasicLayers.cpp:1072) [2]: Error as reported on Ubuntu 10.04, 64-bit build Conditional jump or move depends on uninitialised value(s) at 0x6422CE3: mmx_composite_over_n_8_8888 (pixman-mmx.c:1822) by 0x6414413: pixman_image_composite32 (pixman.c:397) by 0x63BA9C5: _cairo_image_surface_composite (cairo-image-surface.c:1153) by 0x63D0A56: _cairo_surface_composite (cairo-surface.c:1821) by 0x63D3979: _clip_and_composite (cairo-surface-fallback.c:197) by 0x63D3FED: _cairo_surface_fallback_mask (cairo-surface-fallback.c:1130) by 0x63D1B3B: _cairo_surface_mask (cairo-surface.c:2061) by 0x63B8135: _cairo_gstate_mask (cairo-gstate.c:1048) by 0x63B09B1: _moz_cairo_mask (cairo.c:2193) by 0x63B0A52: _moz_cairo_mask_surface (cairo.c:2228) by 0x633D0D4: gfxAlphaBoxBlur::Paint (gfxBlur.cpp:349) by 0x57032F6: nsContextBoxBlur::DoPaint (nsCSSRendering.cpp:3785) Uninitialised value was created by a heap allocation at 0x4C28528: malloc (vg_replace_malloc.c:236) by 0x62B46B4: NS_Alloc_P (nsMemoryImpl.cpp:279) by 0x626CD0F: nsTArray_base::EnsureCapacity (nsTArray.cpp:76) by 0x626CD8A: nsTArray_base::InsertSlotsAt (nsTArray.cpp:185) by 0x633D294: gfxAlphaBoxBlur::Paint (nsTArray.h:848) (note: bogus filename due to inlining) by 0x57032F6: nsContextBoxBlur::DoPaint (nsCSSRendering.cpp:3785) by 0x570A9FA: nsCSSRendering::PaintBoxShadowInner (nsCSSRendering.cpp:1435) by 0x571669D: nsDisplayBoxShadowInner::Paint (nsDisplayList.cpp:1041) by 0x56E76C2: mozilla::FrameLayerBuilder::DrawThebesLayer (FrameLayerBuilder.cpp:1397) by 0x637626B: mozilla::layers::BasicThebesLayer::PaintBuffer (BasicLayers.cpp:329) by 0x63734B6: mozilla::layers::BasicThebesLayer::Paint (BasicLayers.cpp:410) by 0x6370C13: mozilla::layers::BasicLayerManager::PaintLayer (BasicLayers.cpp:1064)
So it looks like some uninitialized data is leaking into the image data from tempAlphaDataBuf. This code seems to ping-pong data between boxData and tmpData. We should be able to verify that tmpData is completely defined after the first call to BoxBlurHorizontal. However, as I read the code it looks it could use completely uninitialized data if mBlurRadius.width = 0. It would be interesting to add assert(mBlurRadius.width > 0)
> ... ping-pong data between boxData and tmpData ... It does look like there's a problem in the case (mBlurRadius.width == 0 && mBlurRadius.height > 0), since we would then start the first BoxBlurVertical call using a completely undefined tmpData. However, that's not the problem here: both .width and .height are 5. > We should be able to verify that tmpData is completely defined > after the first call to BoxBlurHorizontal. I could trivially do that, using VALGRIND_CHECK_MEMORY_IS_DEFINED, if only I could figure out what section(s) of tmpData BlurBoxHorizontal is supposed to write to. I am now wondering if the problem occurs because BlurBoxHorizontal doesn't write to as much of tmpData as it is supposed to, or at least, to not as much of it as the innards of CoreGraphics expect it to have written.
(In reply to comment #1) An initial memset-zero of the contents of tempAlphaDataBuf removes the complaints and does not appear to have any bad effects. > So it looks like some uninitialized data is leaking into the image data from > tempAlphaDataBuf. I agree. Current theory is: in gfxAlphaBoxBlur::Paint, in the case where we have to deal with blurring: 1. a temp array is set up. Comments on nsTArray::SetLength claim the new area is initialised by the default constructor of the element type, but because that is POD this doesn't happen. Hence the contents remain uninitialised. nsTArray<unsigned char> tempAlphaDataBuf; if (!tempAlphaDataBuf.SetLength(sz)) return; // OOM 2. BlurBoxHorizontal/Vertical purportedly fill in tempAlphaDataBuf from the contents of boxData (in a roundabout way). Except, because there's a mSkipRect involved, they don't fill in all of it, only some. 3. Resulting partial garbage ends up back in mImageSurface->Data() aka boxData, and is passed down to Cairo: aDestinationCtx->Mask(mImageSurface, offset);
Attached patch proposed fix (obsolete) — Splinter Review
Fixes both problems discussed above: * as per comment 3, zeroes out tmpData before use * as per comments 1 and 2, makes the cases mBlurRadius.width > 0 and mBlurRadius.height > 0 independent, by increasing the ping-pong length to an even number. This avoids the failure modes - mBlurRadius.width > 0 && mBlurRadius.height == 0, in which case the results of the 3rd call to BoxBlurHorizontal would have been ignored - mBlurRadius.width == 0 && mBlurRadius.height > 0, in which case the chain of calls to BoxBlurVertical would have been started off with completely uninitialised data
Verified that the patch gets rid of Valgrind's complaints on both OSX and Linux.
Attachment #461495 - Flags: review?(jmuizelaar)
Comment on attachment 461495 [details] [diff] [review] proposed fix I don't really like the potential performance impact of this change. We should know why BoxBlurHorizontal is not setting the data before we add the memset and it would also be nice if we could avoid doing the memcpys.
Attachment #461495 - Flags: review?(jmuizelaar) → review-
(In reply to comment #1) > However, as I read the code it looks it could use completely uninitialized data > if mBlurRadius.width = 0. It would be interesting to add > assert(mBlurRadius.width > 0) That can't be right, the code isn't even hit if the blur radius is 0. I suspect we're not setting the parts of the shadow that will go behind the box that is being shadowed, because those pixels won't get displayed anyway. IIRC that is what happens in BoxBlurHorizontal/Vertical.
Well, I fixed the bug where we doing the wrong memcopies if the blur radius is zero (which sometimes happens here after you apply all the patches in bug 581222). What Michael said in comment #7 is true too. We do leave part of the buffer untouched, because the caller told us those pixels aren't relevant. But those pixels shouldn't be uninitialized, they should be simply untouched (and should have been initialized by the caller at some point).
(In reply to comment #9) > But those pixels shouldn't be uninitialized, they should be simply > untouched (and should have been initialized by the caller at some point). Right. But my theory is, they aren't, because this appears to initialise them nsTArray<unsigned char> tempAlphaDataBuf; if (!tempAlphaDataBuf.SetLength(mImageSurface->GetDataSize())) return; // OOM but actually doesn't, because .SetLength initialises the new area of the array using the default constructor for the element type. Since that's unsigned char, there is no default constructor so that zeroing never happens.
Ah, right. However, the uninitialized parts of the mask should not be visible, it should be entirely in the skip-rect. I think it's worth silencing these warnings by memsetting the temporary buffer to zero at the beginning. We don't need the other changes in the patch, though.
Unrelated to this bug, but you'll probably want to find out why you are hitting mmx_composite_over_n_8_8888(). Normally, you should be hitting sse2_composite_over_n_8_8888(), and certainly on an x86-64 build.
blocking2.0: --- → ?
blocking2.0: ? → final+
Attachment #461495 - Attachment is obsolete: true
Attachment #465183 - Flags: review?(roc)
Assignee: nobody → jseward
Status: NEW → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
OS: Mac OS X → All
Hardware: x86_64 → All
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b4
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: