Closed Bug 587610 Opened 14 years ago Closed 14 years ago

Invalid free on TEST_PATH=content/media/test/test_access_control.html

Categories

(Core :: Graphics, defect)

x86_64
Linux
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: jseward, Unassigned)

Details

(Whiteboard: [sg:critical?][critsmash:investigating])

mozilla-central 50664:68b886f9b3c3 (Mon Aug 16 18:01:01 2010 +0900)

on x86_64-linux

mozconfig:

  . $topsrcdir/browser/config/mozconfig
  mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/ff-opt
  ac_add_options --enable-tests
  ac_add_options --enable-debug-symbols=yes
  ac_add_options --enable-optimize="-g -O -freorder-blocks"
  ac_add_options --disable-jemalloc
  ac_add_options --enable-valgrind
  mk_add_options MOZ_MAKE_FLAGS="-j4"

TEST_PATH=content/media/test/test_access_control.html DISPLAY=:3.0 \
   make -C ff-opt mochitest-plain

where :3.0 is a 1024x768 16-bit vnc server

leads to glibc detecting an invalid free, and aborting Fx.


This has been going on for some days now -- it's not brand-new.


--------------------------------------------------------------------------

*** glibc detected *** /space2/sewardj/MOZ/MC-16-08-2010/ff-opt/dist/bin/firefox-bin: munmap_chunk(): invalid pointer: 0x00002ae5e42700f0 ***
======= Backtrace: =========
/lib/libc.so.6(+0x775b6)[0x2ae5d696b5b6]
/space2/sewardj/MOZ/MC-16-08-2010/ff-opt/dist/bin/libxul.so(+0x14a80bd)[0x2ae5d1d6e0bd]
/space2/sewardj/MOZ/MC-16-08-2010/ff-opt/dist/bin/libxul.so(+0x14534ce)[0x2ae5d1d194ce]
/space2/sewardj/MOZ/MC-16-08-2010/ff-opt/dist/bin/libxul.so(+0x1467e1e)[0x2ae5d1d2de1e]
/space2/sewardj/MOZ/MC-16-08-2010/ff-opt/dist/bin/libxul.so(+0x1467e9c)[0x2ae5d1d2de9c]
/space2/sewardj/MOZ/MC-16-08-2010/ff-opt/dist/bin/libxul.so(_ZN11gfxASurface7ReleaseEv+0x5e)[0x2ae5d1b47d72]
/space2/sewardj/MOZ/MC-16-08-2010/ff-opt/dist/bin/libxul.so(_ZN20gfxCachedTempSurfaceD1Ev+0x1bc)[0x2ae5d1b49268]
/space2/sewardj/MOZ/MC-16-08-2010/ff-opt/dist/bin/libxul.so(_ZN7mozilla6layers17BasicLayerManagerD0Ev+0x4b)[0x2ae5d1b7fcf1]

--------------------------------------------------------------------------

GDB says:

(gdb) where
#0  0x00002aaab0f4ca75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00002aaab0f505c0 in *__GI_abort () at abort.c:92
#2  0x00002aaab0f864fb in __libc_message (do_abort=<value optimized out>, fmt=<value optimized out>) at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3  0x00002aaab0f905b6 in malloc_printerr (action=3, str=0x2aaab1063b58 "munmap_chunk(): invalid pointer", ptr=<value optimized out>) at malloc.c:6264
#4  0x00002aaaac3930bd in _moz_pixman_image_unref (image=0x1ce04e0) at /space2/sewardj/MOZ/MC-16-08-2010/gfx/cairo/libpixman/src/pixman-image.c:214
#5  0x00002aaaac33e4ce in _cairo_image_surface_finish (abstract_surface=<value optimized out>) at /space2/sewardj/MOZ/MC-16-08-2010/gfx/cairo/cairo/src/cairo-image-surface.c:795
#6  0x00002aaaac352e1e in *INT__moz_cairo_surface_finish (surface=0x1ce0620) at /space2/sewardj/MOZ/MC-16-08-2010/gfx/cairo/cairo/src/cairo-surface.c:649
#7  0x00002aaaac352e9c in *INT__moz_cairo_surface_destroy (surface=0x1ce0620) at /space2/sewardj/MOZ/MC-16-08-2010/gfx/cairo/cairo/src/cairo-surface.c:581
#8  0x00002aaaac16cd72 in gfxASurface::Release (this=0x1ce0780) at /space2/sewardj/MOZ/MC-16-08-2010/gfx/thebes/gfxASurface.cpp:112
#9  0x00002aaaac16e268 in ~nsRefPtr (this=0x2aaac0161d88, __in_chrg=<value optimized out>) at ../../dist/include/nsAutoPtr.h:969
#10 ~gfxCachedTempSurface (this=0x2aaac0161d88, __in_chrg=<value optimized out>) at /space2/sewardj/MOZ/MC-16-08-2010/gfx/thebes/gfxCachedTempSurface.cpp:97
#11 0x00002aaaac1a4cf1 in ~BasicLayerManager (this=0x2aaac0161d40, __in_chrg=<value optimized out>) at /space2/sewardj/MOZ/MC-16-08-2010/gfx/layers/basic/BasicLayers.cpp:862
#12 0x00002aaaab5c8917 in mozilla::layers::LayerManager::Release (aFrame=<value optimized out>, aPropertyValue=0x2aaaacd7ee58, aRemoveFromFramesWithLayers=<value optimized out>)
    at ../../dist/include/Layers.h:136
#13 ~nsRefPtr (aFrame=<value optimized out>, aPropertyValue=0x2aaaacd7ee58, aRemoveFromFramesWithLayers=<value optimized out>) at ../../dist/include/nsAutoPtr.h:969
#14 mozilla::FrameLayerBuilder::InternalDestroyDisplayItemData (aFrame=<value optimized out>, aPropertyValue=0x2aaaacd7ee58, aRemoveFromFramesWithLayers=<value optimized out>)
    at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/FrameLayerBuilder.cpp:355
#15 0x00002aaaab5c8b3a in mozilla::FrameLayerBuilder::DestroyDisplayItemData (aFrame=0x79c, aPropertyValue=0x79c)
    at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/FrameLayerBuilder.cpp:362
#16 0x00002aaaab5c92a3 in mozilla::FramePropertyTable::PropertyValue::DestroyValueFor (aEntry=0x1902950)
    at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/FramePropertyTable.h:178
#17 mozilla::FramePropertyTable::DeleteAllForEntry (aEntry=0x1902950) at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/FramePropertyTable.cpp:220
#18 0x00002aaaab5c92f7 in mozilla::FramePropertyTable::DeleteEnumerator (aEntry=0x79c, aArg=0x79c) at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/FramePropertyTable.cpp:248
#19 0x00002aaaab5c938d in nsTHashtable<mozilla::FramePropertyTable::Entry>::s_EnumStub (table=<value optimized out>, entry=0x79c, number=6, arg=0xffffffffffffffff)
    at ../../dist/include/nsTHashtable.h:420
#20 0x00002aaaac0a7e68 in PL_DHashTableEnumerate (table=<value optimized out>, etor=<value optimized out>, arg=<value optimized out>) at pldhash.c:754
#21 0x00002aaaab5c8cf7 in nsTHashtable<mozilla::FramePropertyTable::Entry>::EnumerateEntries (this=0x79c) at ../../dist/include/nsTHashtable.h:241
#22 mozilla::FramePropertyTable::DeleteAll (this=0x79c) at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/FramePropertyTable.cpp:258

--------------------------------------------------------------------------

Bizarrely, and worryingly, Valgrind says absolutely nothing, and the
test runs successfully to completion.
I'm wondering if there's some kind of concurrency bug here.  It seems
to fail differently on different runs.  It'd also explain why it
doesn't (apparently) fail on Valgrind, since that drastically changes
the thread scheduling compared to native execution.

Here's another stack from GDB.  This run ended in a segfault rather
than glibc asserting.  Doesn't seem to have much in common with the
previous stack, tho.

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaac0a813b in PL_DHashTableOperate (table=0x1dcf190, key=0xfd1318, op=PL_DHASH_ADD) at pldhash.c:615
615	    keyHash = table->ops->hashKey(table, key);
(gdb) where
#0  0x00002aaaac0a813b in PL_DHashTableOperate (table=0x1dcf190, key=0xfd1318, op=PL_DHASH_ADD) at pldhash.c:615
#1  0x00002aaaab5c5446 in nsTHashtable<nsPtrHashKey<nsIFrame> >::PutEntry (aEntry=0x1b293a0, aUserArg=0x1dcf190) at ../../dist/include/nsTHashtable.h:188
#2  mozilla::FrameLayerBuilder::StoreNewDisplayItemData (aEntry=0x1b293a0, aUserArg=0x1dcf190) at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/FrameLayerBuilder.cpp:505
#3  0x00002aaaab5c8bb3 in nsTHashtable<mozilla::FrameLayerBuilder::DisplayItemDataEntry>::s_EnumStub (table=<value optimized out>, entry=0x1dcf190, number=1, arg=0x7fffffffc250)
    at ../../dist/include/nsTHashtable.h:420
#4  0x00002aaaac0a7e68 in PL_DHashTableEnumerate (table=<value optimized out>, etor=<value optimized out>, arg=<value optimized out>) at pldhash.c:754
#5  0x00002aaaab5c551a in nsTHashtable<mozilla::FrameLayerBuilder::DisplayItemDataEntry>::EnumerateEntries (this=0x7fffffffc550, aManager=<value optimized out>)
    at ../../dist/include/nsTHashtable.h:241
#6  mozilla::FrameLayerBuilder::WillEndTransaction (this=0x7fffffffc550, aManager=<value optimized out>)
    at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/FrameLayerBuilder.cpp:433
#7  0x00002aaaab5f40ae in nsDisplayList::PaintForFrame (this=<value optimized out>, aBuilder=0x7fffffffc550, aCtx=<value optimized out>, aForFrame=<value optimized out>, 
    aFlags=<value optimized out>) at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/nsDisplayList.cpp:393
#8  0x00002aaaab5f41b2 in nsDisplayList::PaintRoot (this=0x1dcf190, aBuilder=0xfd1318, aCtx=0x1, aFlags=8) at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/nsDisplayList.cpp:335
#9  0x00002aaaab60422c in nsLayoutUtils::PaintFrame (aRenderingContext=<value optimized out>, aFrame=0x1879ca0, aDirtyRegion=<value optimized out>, 
    aBackstop=<value optimized out>, aFlags=<value optimized out>) at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/nsLayoutUtils.cpp:1406
#10 0x00002aaaab612699 in PresShell::Paint (this=0x1864610, aDisplayRoot=0x1879bf0, aViewToPaint=0x1879bf0, aWidgetToPaint=0x1879ca0, aDirtyRegion=<value optimized out>, 
    aIntDirtyRegion=<value optimized out>, aPaintDefaultBackground=0, aWillSendDidPaint=0) at /space2/sewardj/MOZ/MC-16-08-2010/layout/base/nsPresShell.cpp:5932
#11 0x00002aaaab9568a0 in nsViewManager::RenderViews (this=0x1879b80, aView=0x1879bf0, aWidget=<value optimized out>, aRegion=..., aIntRegion=<value optimized out>, 
    aPaintDefaultBackground=0, aWillSendDidPaint=0) at /space2/sewardj/MOZ/MC-16-08-2010/view/src/nsViewManager.cpp:459
#12 0x00002aaaab956a06 in nsViewManager::Refresh (this=0x1879b80, aView=0x1879bf0, aWidget=0x1879ca0, aRegion=..., aUpdateFlags=1)
    at /space2/sewardj/MOZ/MC-16-08-2010/view/src/nsViewManager.cpp:425
#13 0x00002aaaab95841a in nsViewManager::DispatchEvent (this=0x1879b80, aEvent=0x7fffffffce50, aView=0x1879bf0, aStatus=<value optimized out>)
    at /space2/sewardj/MOZ/MC-16-08-2010/view/src/nsViewManager.cpp:912
#14 0x00002aaaab953bfc in HandleEvent (aEvent=0x7fffffffce50) at /space2/sewardj/MOZ/MC-16-08-2010/view/src/nsView.cpp:160
#15 0x00002aaaabee4c38 in nsWindow::DispatchEvent (this=<value optimized out>, aEvent=0xfd1318, aStatus=@0x1)
    at /space2/sewardj/MOZ/MC-16-08-2010/widget/src/gtk2/nsWindow.cpp:571
#16 0x00002aaaabef0816 in nsWindow::OnExposeEvent (this=0x1879ca0, aWidget=<value optimized out>, aEvent=<value optimized out>)
    at /space2/sewardj/MOZ/MC-16-08-2010/widget/src/gtk2/nsWindow.cpp:2141
#17 0x00002aaaabef0f19 in expose_event_cb (widget=0x979010, event=0x7fffffffd510) at /space2/sewardj/MOZ/MC-16-08-2010/widget/src/gtk2/nsWindow.cpp:5414
Marking security-sensitive per sewardj.
Group: core-security
I found some very-far-out-of-range writes, which lead to heap
corruption and crashing.  I don't know if these are the root cause of
the problem, or merely another symptom closer to the root cause.

These were found with Valgrind's exp-ptrcheck tool, which can find
arbitrarily far out-of-range heap accesses.  It crashes, and the
errors are reported, only on about 50% of runs, and that is with me
trying to make each run as similar as possible: display is a vnc
server, no window manager, mouse pointer at bottom RH corner of
display.

Since the crashes are not reliably reproducible I am not ruling out
the possibility of some kind of threading bug.

Invalid write of size 8
   at 0x64F8518: pixman_fill_mmx (pixman-mmx.c:1962)
   by 0x64FE6BB: mmx_fill (pixman-mmx.c:3333)
   by 0x64F1430: _pixman_implementation_fill (pixman-implementation.c:225)
   by 0x64EEC2A: _moz_pixman_fill (pixman.c:864)
   by 0x64F05AD: pixman_image_fill_boxes (pixman.c:1022)
   by 0x64F0790: _moz_pixman_image_fill_rectangles (pixman.c:958)
   by 0x649CB58: _cairo_image_surface_fill_rectangles
                 (cairo-image-surface.c:1238)
   by 0x64B03F1: _cairo_surface_fill_rectangles (cairo-surface.c:1977)
   by 0x64B3F2B: _fill_rectangles (cairo-surface-fallback.c:702)
   by 0x64B40D1: _clip_and_composite_trapezoids (cairo-surface-fallback.c:785)
   by 0x64B4C6C: _cairo_surface_fallback_paint (cairo-surface-fallback.c:1042)
   by 0x64B1A2D: _cairo_surface_paint (cairo-surface.c:2020)
   by 0x649A33E: _cairo_gstate_paint (cairo-gstate.c:988)
   by 0x6493C5F: _moz_cairo_paint (cairo.c:2118)
   by 0x6493D24: _moz_cairo_paint_with_alpha (cairo.c:2146)
   by 0x62CD002: gfxContext::Paint(double) (gfxContext.cpp:748)
   by 0x6300443: mozilla::layers::BasicColorLayer::Paint (BasicLayers.cpp:592)
   by 0x62FFFE3: mozilla::layers::BasicLayerManager::PaintLayer
                 (BasicLayers.cpp:1058)
   by 0x630006C: mozilla::layers::BasicLayerManager::PaintLayer
                 (BasicLayers.cpp:1066)
   by 0x630345F: mozilla::layers::BasicLayerManager::EndTransaction
                 (BasicLayers.cpp:966)

 Address 0x1ef35d30 is 16000 bytes before the accessing pointer's
 legitimate range, a block of size 1000000 alloc'd
   at 0x4C2710C: calloc (vg_replace_malloc.c:467)
   by 0x64EC0D3: _moz_pixman_image_create_bits (pixman-bits-image.c:977)
   by 0x649D55C: _cairo_image_surface_create_with_pixman_format
                 (cairo-image-surface.c:386)
   by 0x649D75D: _moz_cairo_image_surface_create (cairo-image-surface.c:436)
   by 0x649DB57: _cairo_image_surface_create_with_content
                 (cairo-image-surface.c:449)
   by 0x64B1E12: _cairo_surface_create_similar_solid (cairo-surface.c:467)
   by 0x64B1ED0: _moz_cairo_surface_create_similar (cairo-surface.c:446)
   by 0x62CAEA3: gfxASurface::CreateSimilarSurface (gfxASurface.cpp:315)
   by 0x62F6FA2: gfxXlibSurface::CreateSimilarSurface (gfxXlibSurface.cpp:211)
   by 0x62CC4A7: gfxCachedTempSurface::Get (gfxCachedTempSurface.cpp:119)
   by 0x6302365: mozilla::layers::BasicLayerManager::PushGroupWithCachedSurface
                 (BasicLayers.cpp:896)
   by 0x6303407: mozilla::layers::BasicLayerManager::EndTransaction
                 (BasicLayers.cpp:963)
   by 0x57520C0: nsDisplayList::PaintForFrame (nsDisplayList.cpp:395)
   by 0x57521B1: nsDisplayList::PaintRoot (nsDisplayList.cpp:335)
   by 0x576222B: nsLayoutUtils::PaintFrame (nsLayoutUtils.cpp:1406)
   by 0x5770698: PresShell::Paint (nsPresShell.cpp:5932)
   by 0x5AB489F: nsViewManager::RenderViews (nsViewManager.cpp:459)
   by 0x5AB4A05: nsViewManager::Refresh (nsViewManager.cpp:425)
   by 0x5AB6419: nsViewManager::DispatchEvent (nsViewManager.cpp:912)
   by 0x5AB1BFB: HandleEvent(nsGUIEvent*) (nsView.cpp:160)

... and the same error is repeated for offsets -15992, -15984, -15976,
etc., up to -14016 from the beginning of the block.
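For reference, the arithmetic behind these offsets can be sketched in a few lines of C. This is only an illustration, not pixman code: `pixel_byte_offset` and `count_stores` are hypothetical helpers, and the stride of 2000 bytes is a made-up value picked because 8 rows of 2000 bytes gives 16000; the real stride is not stated in this report. It shows how a fill that starts on a negative row lands before the start of the buffer, and how many 8-byte stores the reported range covers.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of pixel (x, y) from the start of a pixel buffer.
 * A negative y gives a negative offset, i.e. an address that lies
 * before the allocation. Hypothetical helper, not part of pixman. */
static ptrdiff_t pixel_byte_offset(int x, int y, ptrdiff_t stride,
                                   int bytes_per_pixel)
{
    return (ptrdiff_t)y * stride + (ptrdiff_t)x * bytes_per_pixel;
}

/* Number of stores of store_size bytes whose start offsets run from
 * first to last inclusive, stepping by store_size. */
static ptrdiff_t count_stores(ptrdiff_t first, ptrdiff_t last,
                              ptrdiff_t store_size)
{
    assert(store_size > 0 && last >= first);
    return (last - first) / store_size + 1;
}
```

With the assumed stride, `pixel_byte_offset(0, -8, 2000, 4)` evaluates to -16000, and `count_stores(-16000, -14016, 8)` evaluates to 249, so the reported range corresponds to 249 consecutive 8-byte writes.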
There's another funny thing about this, which is alluded to also in 
bug 582668 comment #12: why is pixman doing mmx stuff, on this Core i5
which supports (almost) every SSE instruction in the known universe?
(In reply to comment #4)
> There's another funny thing about this, which is alluded to also in 
> bug 582668 comment #12: why is pixman doing mmx stuff, on this Core i5
> which supports (almost) every SSE instruction in the known universe?

Bug 488851 perhaps?
(In reply to comment #5)
> Bug 488851 perhaps?
Right, so the MMX vs SSE thing is irrelevant.  Good.

It'd be nice to know if anyone can repro this failure.
I forgot to mention, this is x86_64 running Ubuntu 10.04.
I frequently get crashes in pixman_fill_mmx when running reftests under VNC with bug 130078 applied. Ubuntu 10.04 x86_64 here as well. Bug 130078 also seems to cause a crash in pixman_fill_sse2 on Mac Talos on the try server, and in pixman_fill_sse2 when running Win7 mochitests on the try server.

If this only started for you a few days ago, maybe we can narrow down what caused it.
(In reply to comment #0)
> where :3.0 is a 1024x768 16-bit vnc server

I would not be surprised to find that "16-bit" is the interesting attribute here.
I just checked, my vnc connection seems to be using 16-bit too.
I cannot reproduce on clean trunk (i.e. without 130078).
This is on clean trunk, yesterday morning.
Whiteboard: [sg:critical?][critsmash:investigating]
Blocks: 130078
When I looked at Timothy's crash it seemed to be caused by trying to fill a box
that started outside of the surface extents. 

i.e. boxes[0] had a negative y-coordinate in _cairo_surface_fallback_paint. It
would be interesting to see if that was the case here too.
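A minimal sketch of the kind of guard that avoids this class of problem: clamp each box to the surface extents before filling. This is not the patch that was checked in; `box_t` and `clamp_box_to_surface` are hypothetical names used only for illustration, and boxes are assumed to be half-open rectangles [x1, x2) x [y1, y2).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical box type: half-open rectangle [x1, x2) x [y1, y2). */
typedef struct {
    int x1, y1, x2, y2;
} box_t;

/* Clamp *box to a surface of the given size.
 * Returns true if anything is left to fill, false if the box lies
 * entirely outside the surface (or is empty) and must be skipped. */
static bool clamp_box_to_surface(box_t *box, int width, int height)
{
    assert(width >= 0 && height >= 0);

    if (box->x1 < 0)      box->x1 = 0;
    if (box->y1 < 0)      box->y1 = 0;
    if (box->x2 > width)  box->x2 = width;
    if (box->y2 > height) box->y2 = height;

    return box->x1 < box->x2 && box->y1 < box->y2;
}
```

A caller would run this on every box and only pass the ones for which it returns true to the fill routine, so a box with a negative y1 is either trimmed to start at row 0 or dropped.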
I've checked in a patch that will likely fix Timothy's crash. It would be good to see if this is still reproducible.
Jeff's change seems to have fixed my local issue. I hope it fixes the similar looking issue on try server.
Jeff's change stops the segfaults/assertions I was seeing and gets rid of
the heap trashing reports in comment #3.  So it LGTM.  Also, it gets rid of
the heap-profiler stability issues noted in bug 551477 comment 69, which
is good.

I'm kinda curious why this only showed up recently, though.  I've been
running mochitests on this exact same VNC setup for months without problems.
The issue on try server with 130078 seems to be fixed. Anything left to do in this bug?
(In reply to comment #16)
> I'm kinda curious why this only showed up recently, though.  I've been
> running mochitests on this exact same VNC setup for months without problems.

The problem only occurs when we have a clip that extends beyond the bounds of a surface. I'm not sure why we now use these but perhaps we didn't before.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Group: core-security → core-security-release
Group: core-security-release