Closed Bug 776940 Opened 12 years ago Closed 12 years ago

Trying to __delete__() PGrallocBuffer actor after it's dead to IPC

Categories

(Core :: Graphics: Layers, defect, P1)

x86
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla17
blocking-kilimanjaro +
blocking-basecamp +

People

(Reporter: gal, Assigned: cjones)

References

Details

Attachments

(1 file, 4 obsolete files)

[Parent 2439] ###!!! ABORT: actor has been |delete|d: file /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp, line 373
[Parent 2439] ###!!! ABORT: actor has been |delete|d: file /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp, line 373

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2439.2461]
0x410c984c in mozalloc_abort (msg=<value optimized out>) at /Users/gal/workspace/B2G/gecko/memory/mozalloc/mozalloc_abort.cpp:23
23	    MOZ_CRASH();
(gdb) bt 20
#0  0x410c984c in mozalloc_abort (msg=<value optimized out>) at /Users/gal/workspace/B2G/gecko/memory/mozalloc/mozalloc_abort.cpp:23
#1  0x40050404 in _normal_unlock (mutex=0x3) at bionic/libc/bionic/pthread.c:973
#2  pthread_mutex_unlock (mutex=0x3) at bionic/libc/bionic/pthread.c:1120
#3  0x21212122 in ?? ()
#4  0x21212122 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
Summary: memory correction in texture sharing code → memory correction in direct texuring code
blocking-basecamp: --- → +
blocking-kilimanjaro: --- → +
Priority: -- → P1
STR: open NYT.com in otoro and click on the first ad that appears (MarcJacobs.com).
This is breaking on the first abort (that for some reason didn't abort us):

#0  mozilla::layers::PGrallocBufferParent::Write (this=0x3593950, __v=0x3593950, __msg=0x28d2010, __nullable=false)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp:373
#1  0x40c303f4 in mozilla::layers::PGrallocBufferParent::Send__delete__ (actor=0x3593950)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp:70
#2  0x40d2ad94 in mozilla::layers::ShadowLayerManager::PlatformDestroySharedSurface (this=<value optimized out>, aSurface=0x3427fc4)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:200
#3  0x40d28af0 in mozilla::layers::ShadowLayerManager::DestroySharedSurface (this=0x3593950, aSurface=0x3593950, aDeallocator=0x28d2010)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayers.cpp:552
#4  0x40d29fb2 in mozilla::layers::ShadowLayersParent::DestroySharedSurface (this=0x28d2010, aSurface=0x3593950)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayersParent.cpp:503
#5  0x40d1daae in mozilla::layers::ShadowThebesLayerOGL::DestroyFrontBuffer (this=0x3427dc0)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ThebesLayerOGL.cpp:1067
#6  0x40d1d8fc in mozilla::layers::ShadowThebesLayerOGL::Destroy (this=0x3593950)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ThebesLayerOGL.cpp:1082
#7  0x40d18486 in ContainerDestroy<mozilla::layers::ShadowContainerLayerOGL> (this=0x34ff448)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:97
#8  mozilla::layers::ShadowContainerLayerOGL::Destroy (this=0x34ff448) at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:365
#9  0x40d18740 in ~ShadowContainerLayerOGL (this=0x3593950, __in_chrg=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:347
#10 0x40d187a0 in ~ShadowContainerLayerOGL (this=0x3593950, __in_chrg=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:348
#11 0x40620210 in gfxPattern::Release (this=0x3593950) at /Users/gal/workspace/B2G/gecko/image/src/Decoder.h:83
#12 0x40d2a30c in ~nsRefPtr (this=0x1f42758, __in_chrg=<value optimized out>) at ../../dist/include/nsAutoPtr.h:874
#13 ~ShadowLayersParent (this=0x1f42758, __in_chrg=<value optimized out>) at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayersParent.cpp:106
#14 0x40d2a338 in ~ShadowLayersParent (this=0x3593950, __in_chrg=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayersParent.cpp:106
#15 0x40d26f3c in mozilla::layers::CrossProcessCompositorParent::DeallocPLayers (this=<value optimized out>, aLayers=0x1f42758)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/CompositorParent.cpp:995
#16 0x40c2fbd6 in mozilla::layers::PCompositorParent::DeallocSubtree (this=0x3661170)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PCompositorParent.cpp:752
#17 0x40c2fce4 in mozilla::layers::PCompositorParent::OnChannelError (this=0x3661170)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PCompositorParent.cpp:592
#18 0x40c07a64 in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x3661178) at /Users/gal/workspace/B2G/gecko/ipc/glue/AsyncChannel.cpp:549
#19 0x40c07bec in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x3661178)
    at /Users/gal/workspace/B2G/gecko/ipc/glue/AsyncChannel.cpp:514
#20 0x40bf4762 in DispatchToMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)()> (this=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/tuple.h:383
#21 RunnableMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)(), Tuple0>::Run (this=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/task.h:307
#22 0x40cd19f4 in MessageLoop::RunTask (this=0x45979df4, task=0x0) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:326
#23 0x40cd2842 in MessageLoop::DeferOrRunPendingTask (this=0x3661178, pending_task=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:334
#24 0x40cd3420 in MessageLoop::DoWork (this=0x45979df4) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:434
#25 0x40cd3690 in base::MessagePumpDefault::Run (this=0x249f3b8, delegate=0x45979df4)
---Type <return> to continue, or q <return> to quit---
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_pump_default.cc:23
#26 0x40cd19a4 in MessageLoop::RunInternal (this=0x4c595) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:208
#27 0x40cd1a5a in MessageLoop::RunHandler (this=0x45979df4) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:201
#28 MessageLoop::Run (this=0x45979df4) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:175
#29 0x40cd9d44 in base::Thread::ThreadMain (this=0x249eeb8) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/thread.cc:156
#30 0x40ce3bb8 in ThreadFunc (closure=0x1) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/platform_thread_posix.cc:31
#31 0x400d90d8 in __thread_entry (func=0x40ce3bb1 <ThreadFunc>, arg=0x249eeb8, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#32 0x400d8c2c in pthread_create (thread_out=<value optimized out>, attr=0xbecf3488, start_routine=0x40ce3bb1 <ThreadFunc>, arg=0x249eeb8)
    at bionic/libc/bionic/pthread.c:357
#33 0x00000000 in ?? ()
Attached patch patch (obsolete) — Splinter Review
Very unlikely to be the right fix, but presented here for general amusement, and so that my demo doesn't crash.
Still crashes with the patch.
Attached patch another attempt (obsolete) — Splinter Review
This is definitely not the right approach. I think the content process dies and we fail. Just a guess.
Assignee: nobody → jones.chris.g
I can reproduce.
The problem here is that we have a dtor called during an unexpected time, which is trying to do nontrivial cleanup.  As PGrallocBuffer is part of the protocol tree, it's already been cleaned up and marked dead, automatically.  That wouldn't have happened for shmem because they're basically ignored during shutdown.

This is kind of a latent bug.
Attached patch Correct-ish patch (obsolete) — Splinter Review
Needs to be implemented more widely, but this is what we want to do.
Attachment #645337 - Attachment is obsolete: true
Attachment #645341 - Attachment is obsolete: true
Rebuilding to double-check it works atm.
Yes, this fixes the ThebesLayerOGL crash.
The underlying bug here is that the destruction protocol for basic layers is different than GL layers.  The GL layers call Destroy() always from dtors, but shadow layers need to skip that during emergency shutdown, because it's not strictly safe to free surfaces etc.

I think right solution here is to "fix" the shadow GL layers to behave like shadow basic layers.
I think the last person to review shadow-layers destruction code was vlad, 1.5 or 2 years ago, so I'm not sure who to ping for review anymore.  Sorry.

We would never run into this case with omtc because there's no such thing as "emergency shutdown" cross-thread.  We should have hit this if we had GL enabled for xul-fennec tests, which I guess we don't.
Attachment #645365 - Attachment is obsolete: true
Attachment #645578 - Flags: review?(roc)
Summary: memory correction in direct texuring code → Trying to __delete__() PGrallocBuffer actor after it's dead to IPC
(In reply to Chris Jones [:cjones] [:warhammer] from comment #12)
> I think the last person to review shadow-layers destruction code was vlad,
> 1.5 or 2 years ago, so I'm not sure who to ping for review anymore.  Sorry.

How about ... vlad?
I don't think he's seen this code for 9 months or so.  But sure.
And, thanks for reviewing anyway.
This appears to be causing Layer leaking on OSX bots:

https://tbpl.mozilla.org/php/getParsedLog.php?id=13825210&tree=Mozilla-Inbound
Should have backed this out with great prejudice!
Attachment #645673 - Flags: review? → review?(vladimir)
Comment on attachment 645673 [details] [diff] [review]
Fix shutdown protocol of shadow OGL layers, now with less leaky

I like less leaky!
Attachment #645673 - Flags: review?(vladimir) → review+
https://hg.mozilla.org/mozilla-central/rev/e0c9bf37a067
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: