Last Comment Bug 776940 - Trying to __delete__() PGrallocBuffer actor after it's dead to IPC
: Trying to __delete__() PGrallocBuffer actor after it's dead to IPC
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Graphics: Layers (show other bugs)
: Trunk
: x86 Mac OS X
: P1 normal (vote)
: mozilla17
Assigned To: Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
:
: Milan Sreckovic [:milan]
Mentors:
Depends on:
Blocks: 777164
  Show dependency treegraph
 
Reported: 2012-07-24 08:31 PDT by Andreas Gal :gal
Modified: 2012-07-26 05:11 PDT (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
+


Attachments
patch (645 bytes, patch)
2012-07-24 09:05 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
another attempt (645 bytes, patch)
2012-07-24 09:21 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
Correct-ish patch (2.72 KB, patch)
2012-07-24 10:37 PDT, Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
no flags Details | Diff | Splinter Review
Fix shutdown protocol of shadow OGL layers (1.27 KB, patch)
2012-07-24 16:40 PDT, Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
roc: review+
Details | Diff | Splinter Review
Fix shutdown protocol of shadow OGL layers, now with less leaky (1.34 KB, patch)
2012-07-25 00:44 PDT, Chris Jones [:cjones] inactive; ni?/f?/r? if you need me
vladimir: review+
Details | Diff | Splinter Review

Description Andreas Gal :gal 2012-07-24 08:31:48 PDT
[Parent 2439] ###!!! ABORT: actor has been |delete|d: file /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp, line 373
[Parent 2439] ###!!! ABORT: actor has been |delete|d: file /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp, line 373

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2439.2461]
0x410c984c in mozalloc_abort (msg=<value optimized out>) at /Users/gal/workspace/B2G/gecko/memory/mozalloc/mozalloc_abort.cpp:23
23	    MOZ_CRASH();
(gdb) bt 20
#0  0x410c984c in mozalloc_abort (msg=<value optimized out>) at /Users/gal/workspace/B2G/gecko/memory/mozalloc/mozalloc_abort.cpp:23
#1  0x40050404 in _normal_unlock (mutex=0x3) at bionic/libc/bionic/pthread.c:973
#2  pthread_mutex_unlock (mutex=0x3) at bionic/libc/bionic/pthread.c:1120
#3  0x21212122 in ?? ()
#4  0x21212122 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
Comment 1 Andreas Gal :gal 2012-07-24 08:32:53 PDT
STR: open NYT.com in otoro and click on the first ad that appears (MarcJacobs.com).
Comment 2 Andreas Gal :gal 2012-07-24 08:36:28 PDT
This is breaking on the first abort (that for some reason didn't abort us):

#0  mozilla::layers::PGrallocBufferParent::Write (this=0x3593950, __v=0x3593950, __msg=0x28d2010, __nullable=false)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp:373
#1  0x40c303f4 in mozilla::layers::PGrallocBufferParent::Send__delete__ (actor=0x3593950)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp:70
#2  0x40d2ad94 in mozilla::layers::ShadowLayerManager::PlatformDestroySharedSurface (this=<value optimized out>, aSurface=0x3427fc4)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:200
#3  0x40d28af0 in mozilla::layers::ShadowLayerManager::DestroySharedSurface (this=0x3593950, aSurface=0x3593950, aDeallocator=0x28d2010)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayers.cpp:552
#4  0x40d29fb2 in mozilla::layers::ShadowLayersParent::DestroySharedSurface (this=0x28d2010, aSurface=0x3593950)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayersParent.cpp:503
#5  0x40d1daae in mozilla::layers::ShadowThebesLayerOGL::DestroyFrontBuffer (this=0x3427dc0)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ThebesLayerOGL.cpp:1067
#6  0x40d1d8fc in mozilla::layers::ShadowThebesLayerOGL::Destroy (this=0x3593950)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ThebesLayerOGL.cpp:1082
#7  0x40d18486 in ContainerDestroy<mozilla::layers::ShadowContainerLayerOGL> (this=0x34ff448)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:97
#8  mozilla::layers::ShadowContainerLayerOGL::Destroy (this=0x34ff448) at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:365
#9  0x40d18740 in ~ShadowContainerLayerOGL (this=0x3593950, __in_chrg=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:347
#10 0x40d187a0 in ~ShadowContainerLayerOGL (this=0x3593950, __in_chrg=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:348
#11 0x40620210 in gfxPattern::Release (this=0x3593950) at /Users/gal/workspace/B2G/gecko/image/src/Decoder.h:83
#12 0x40d2a30c in ~nsRefPtr (this=0x1f42758, __in_chrg=<value optimized out>) at ../../dist/include/nsAutoPtr.h:874
#13 ~ShadowLayersParent (this=0x1f42758, __in_chrg=<value optimized out>) at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayersParent.cpp:106
#14 0x40d2a338 in ~ShadowLayersParent (this=0x3593950, __in_chrg=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayersParent.cpp:106
#15 0x40d26f3c in mozilla::layers::CrossProcessCompositorParent::DeallocPLayers (this=<value optimized out>, aLayers=0x1f42758)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/CompositorParent.cpp:995
#16 0x40c2fbd6 in mozilla::layers::PCompositorParent::DeallocSubtree (this=0x3661170)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PCompositorParent.cpp:752
#17 0x40c2fce4 in mozilla::layers::PCompositorParent::OnChannelError (this=0x3661170)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PCompositorParent.cpp:592
#18 0x40c07a64 in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x3661178) at /Users/gal/workspace/B2G/gecko/ipc/glue/AsyncChannel.cpp:549
#19 0x40c07bec in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x3661178)
    at /Users/gal/workspace/B2G/gecko/ipc/glue/AsyncChannel.cpp:514
#20 0x40bf4762 in DispatchToMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)()> (this=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/tuple.h:383
#21 RunnableMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)(), Tuple0>::Run (this=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/task.h:307
#22 0x40cd19f4 in MessageLoop::RunTask (this=0x45979df4, task=0x0) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:326
#23 0x40cd2842 in MessageLoop::DeferOrRunPendingTask (this=0x3661178, pending_task=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:334
#24 0x40cd3420 in MessageLoop::DoWork (this=0x45979df4) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:434
#25 0x40cd3690 in base::MessagePumpDefault::Run (this=0x249f3b8, delegate=0x45979df4)
---Type <return> to continue, or q <return> to quit---
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_pump_default.cc:23
#26 0x40cd19a4 in MessageLoop::RunInternal (this=0x4c595) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:208
#27 0x40cd1a5a in MessageLoop::RunHandler (this=0x45979df4) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:201
#28 MessageLoop::Run (this=0x45979df4) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:175
#29 0x40cd9d44 in base::Thread::ThreadMain (this=0x249eeb8) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/thread.cc:156
#30 0x40ce3bb8 in ThreadFunc (closure=0x1) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/platform_thread_posix.cc:31
#31 0x400d90d8 in __thread_entry (func=0x40ce3bb1 <ThreadFunc>, arg=0x249eeb8, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#32 0x400d8c2c in pthread_create (thread_out=<value optimized out>, attr=0xbecf3488, start_routine=0x40ce3bb1 <ThreadFunc>, arg=0x249eeb8)
    at bionic/libc/bionic/pthread.c:357
#33 0x00000000 in ?? ()
Comment 3 Andreas Gal :gal 2012-07-24 09:05:23 PDT
Created attachment 645337 [details] [diff] [review]
patch

Very unlikely to be the right fix, but presented here for general amusement, and so that my demo doesn't crash.
Comment 4 Andreas Gal :gal 2012-07-24 09:09:24 PDT
Still crashes with the patch.
Comment 5 Andreas Gal :gal 2012-07-24 09:21:49 PDT
Created attachment 645341 [details] [diff] [review]
another attempt

This is definitely not the right approach. I think the content process dies and we fail. Just a guess.
Comment 6 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-24 10:04:30 PDT
I can reproduce.
Comment 7 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-24 10:32:58 PDT
The problem here is that we have a dtor called during an unexpected time, which is trying to do nontrivial cleanup.  As PGrallocBuffer is part of the protocol tree, it's already been cleaned up and marked dead, automatically.  That wouldn't have happened for shmem because they're basically ignored during shutdown.

This is kind of a latent bug.
Comment 8 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-24 10:37:22 PDT
Created attachment 645365 [details] [diff] [review]
Correct-ish patch

Needs to be implemented more widely, but this is what we want to do.
Comment 9 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-24 10:37:39 PDT
Rebuilding to double-check it works atm.
Comment 10 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-24 10:48:17 PDT
Yes, this fixes the ThebesLayerOGL crash.
Comment 11 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-24 13:09:06 PDT
The underlying bug here is that the destruction protocol for basic layers is different than GL layers.  The GL layers call Destroy() always from dtors, but shadow layers need to skip that during emergency shutdown, because it's not strictly safe to free surfaces etc.

I think right solution here is to "fix" the shadow GL layers to behave like shadow basic layers.
Comment 12 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-24 16:40:56 PDT
Created attachment 645578 [details] [diff] [review]
Fix shutdown protocol of shadow OGL layers

I think the last person to review shadow-layers destruction code was vlad, 1.5 or 2 years ago, so I'm not sure who to ping for review anymore.  Sorry.

We would never run into this case with omtc because there's no such thing as "emergency shutdown" cross-thread.  We should have hit this if we had GL enabled for xul-fennec tests, which I guess we don't.
Comment 13 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-24 16:56:06 PDT
(In reply to Chris Jones [:cjones] [:warhammer] from comment #12)
> I think the last person to review shadow-layers destruction code was vlad,
> 1.5 or 2 years ago, so I'm not sure who to ping for review anymore.  Sorry.

How about ... vlad?
Comment 14 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-24 17:30:27 PDT
I don't think he's seen this code for 9 months or so.  But sure.
Comment 15 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-24 17:30:47 PDT
And, thanks for reviewing anyway.
Comment 16 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-24 19:47:19 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/c1878d925ef5
Comment 17 :Benjamin Peterson 2012-07-24 21:22:50 PDT
This appears to be causing Layer leaking on OSX bots:

https://tbpl.mozilla.org/php/getParsedLog.php?id=13825210&tree=Mozilla-Inbound
Comment 18 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-24 22:40:10 PDT
Should have backed this out with great prejudice!
Comment 19 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-24 22:41:06 PDT
Backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d00146c7108
Comment 20 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-07-25 00:44:36 PDT
Created attachment 645673 [details] [diff] [review]
Fix shutdown protocol of shadow OGL layers, now with less leaky
Comment 22 Vladimir Vukicevic [:vlad] [:vladv] 2012-07-25 08:35:25 PDT
Comment on attachment 645673 [details] [diff] [review]
Fix shutdown protocol of shadow OGL layers, now with less leaky

I like less leaky!
Comment 23 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-07-25 15:57:37 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/e0c9bf37a067
Comment 24 Ed Morley [:emorley] 2012-07-26 05:11:17 PDT
https://hg.mozilla.org/mozilla-central/rev/e0c9bf37a067

Note You need to log in before you can comment on or make changes to this bug.