The default bug view has changed. See this FAQ.

Trying to __delete__() PGrallocBuffer actor after it's dead to IPC

RESOLVED FIXED in mozilla17

Status

()

Core
Graphics: Layers
P1
normal
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: gal, Assigned: cjones)

Tracking

Trunk
mozilla17
x86
Mac OS X
Points:
---

Firefox Tracking Flags

(blocking-kilimanjaro:+, blocking-basecamp:+)

Details

Attachments

(1 attachment, 4 obsolete attachments)

(Reporter)

Description

5 years ago
[Parent 2439] ###!!! ABORT: actor has been |delete|d: file /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp, line 373
[Parent 2439] ###!!! ABORT: actor has been |delete|d: file /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp, line 373

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2439.2461]
0x410c984c in mozalloc_abort (msg=<value optimized out>) at /Users/gal/workspace/B2G/gecko/memory/mozalloc/mozalloc_abort.cpp:23
23	    MOZ_CRASH();
(gdb) bt 20
#0  0x410c984c in mozalloc_abort (msg=<value optimized out>) at /Users/gal/workspace/B2G/gecko/memory/mozalloc/mozalloc_abort.cpp:23
#1  0x40050404 in _normal_unlock (mutex=0x3) at bionic/libc/bionic/pthread.c:973
#2  pthread_mutex_unlock (mutex=0x3) at bionic/libc/bionic/pthread.c:1120
#3  0x21212122 in ?? ()
#4  0x21212122 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
(Reporter)

Updated

5 years ago
Summary: memory correction in texture sharing code → memory correction in direct texuring code
(Reporter)

Updated

5 years ago
blocking-basecamp: --- → +
blocking-kilimanjaro: --- → +
Priority: -- → P1
(Reporter)

Comment 1

5 years ago
STR: open NYT.com in otoro and click on the first ad that appears (MarcJacobs.com).
(Reporter)

Comment 2

5 years ago
This is breaking on the first abort (that for some reason didn't abort us):

#0  mozilla::layers::PGrallocBufferParent::Write (this=0x3593950, __v=0x3593950, __msg=0x28d2010, __nullable=false)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp:373
#1  0x40c303f4 in mozilla::layers::PGrallocBufferParent::Send__delete__ (actor=0x3593950)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp:70
#2  0x40d2ad94 in mozilla::layers::ShadowLayerManager::PlatformDestroySharedSurface (this=<value optimized out>, aSurface=0x3427fc4)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:200
#3  0x40d28af0 in mozilla::layers::ShadowLayerManager::DestroySharedSurface (this=0x3593950, aSurface=0x3593950, aDeallocator=0x28d2010)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayers.cpp:552
#4  0x40d29fb2 in mozilla::layers::ShadowLayersParent::DestroySharedSurface (this=0x28d2010, aSurface=0x3593950)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayersParent.cpp:503
#5  0x40d1daae in mozilla::layers::ShadowThebesLayerOGL::DestroyFrontBuffer (this=0x3427dc0)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ThebesLayerOGL.cpp:1067
#6  0x40d1d8fc in mozilla::layers::ShadowThebesLayerOGL::Destroy (this=0x3593950)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ThebesLayerOGL.cpp:1082
#7  0x40d18486 in ContainerDestroy<mozilla::layers::ShadowContainerLayerOGL> (this=0x34ff448)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:97
#8  mozilla::layers::ShadowContainerLayerOGL::Destroy (this=0x34ff448) at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:365
#9  0x40d18740 in ~ShadowContainerLayerOGL (this=0x3593950, __in_chrg=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:347
#10 0x40d187a0 in ~ShadowContainerLayerOGL (this=0x3593950, __in_chrg=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/opengl/ContainerLayerOGL.cpp:348
#11 0x40620210 in gfxPattern::Release (this=0x3593950) at /Users/gal/workspace/B2G/gecko/image/src/Decoder.h:83
#12 0x40d2a30c in ~nsRefPtr (this=0x1f42758, __in_chrg=<value optimized out>) at ../../dist/include/nsAutoPtr.h:874
#13 ~ShadowLayersParent (this=0x1f42758, __in_chrg=<value optimized out>) at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayersParent.cpp:106
#14 0x40d2a338 in ~ShadowLayersParent (this=0x3593950, __in_chrg=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/ShadowLayersParent.cpp:106
#15 0x40d26f3c in mozilla::layers::CrossProcessCompositorParent::DeallocPLayers (this=<value optimized out>, aLayers=0x1f42758)
    at /Users/gal/workspace/B2G/gecko/gfx/layers/ipc/CompositorParent.cpp:995
#16 0x40c2fbd6 in mozilla::layers::PCompositorParent::DeallocSubtree (this=0x3661170)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PCompositorParent.cpp:752
#17 0x40c2fce4 in mozilla::layers::PCompositorParent::OnChannelError (this=0x3661170)
    at /Users/gal/workspace/B2G/objdir-gecko/ipc/ipdl/PCompositorParent.cpp:592
#18 0x40c07a64 in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x3661178) at /Users/gal/workspace/B2G/gecko/ipc/glue/AsyncChannel.cpp:549
#19 0x40c07bec in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x3661178)
    at /Users/gal/workspace/B2G/gecko/ipc/glue/AsyncChannel.cpp:514
#20 0x40bf4762 in DispatchToMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)()> (this=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/tuple.h:383
#21 RunnableMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)(), Tuple0>::Run (this=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/task.h:307
#22 0x40cd19f4 in MessageLoop::RunTask (this=0x45979df4, task=0x0) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:326
#23 0x40cd2842 in MessageLoop::DeferOrRunPendingTask (this=0x3661178, pending_task=<value optimized out>)
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:334
#24 0x40cd3420 in MessageLoop::DoWork (this=0x45979df4) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:434
#25 0x40cd3690 in base::MessagePumpDefault::Run (this=0x249f3b8, delegate=0x45979df4)
---Type <return> to continue, or q <return> to quit---
    at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_pump_default.cc:23
#26 0x40cd19a4 in MessageLoop::RunInternal (this=0x4c595) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:208
#27 0x40cd1a5a in MessageLoop::RunHandler (this=0x45979df4) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:201
#28 MessageLoop::Run (this=0x45979df4) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/message_loop.cc:175
#29 0x40cd9d44 in base::Thread::ThreadMain (this=0x249eeb8) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/thread.cc:156
#30 0x40ce3bb8 in ThreadFunc (closure=0x1) at /Users/gal/workspace/B2G/gecko/ipc/chromium/src/base/platform_thread_posix.cc:31
#31 0x400d90d8 in __thread_entry (func=0x40ce3bb1 <ThreadFunc>, arg=0x249eeb8, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#32 0x400d8c2c in pthread_create (thread_out=<value optimized out>, attr=0xbecf3488, start_routine=0x40ce3bb1 <ThreadFunc>, arg=0x249eeb8)
    at bionic/libc/bionic/pthread.c:357
#33 0x00000000 in ?? ()
(Reporter)

Comment 3

5 years ago
Created attachment 645337 [details] [diff] [review]
patch

Very unlikely to be the right fix, but presented here for general amusement, and so that my demo doesn't crash.
(Reporter)

Comment 4

5 years ago
Still crashes with the patch.
(Reporter)

Comment 5

5 years ago
Created attachment 645341 [details] [diff] [review]
another attempt

This is definitely not the right approach. I think the content process dies and we fail. Just a guess.
Assignee: nobody → jones.chris.g
I can reproduce.
The problem here is that we have a dtor called during an unexpected time, which is trying to do nontrivial cleanup.  As PGrallocBuffer is part of the protocol tree, it's already been cleaned up and marked dead, automatically.  That wouldn't have happened for shmem because they're basically ignored during shutdown.

This is kind of a latent bug.
Created attachment 645365 [details] [diff] [review]
Correct-ish patch

Needs to be implemented more widely, but this is what we want to do.
Attachment #645337 - Attachment is obsolete: true
Attachment #645341 - Attachment is obsolete: true
Rebuilding to double-check it works atm.
Yes, this fixes the ThebesLayerOGL crash.
The underlying bug here is that the destruction protocol for basic layers is different than GL layers.  The GL layers call Destroy() always from dtors, but shadow layers need to skip that during emergency shutdown, because it's not strictly safe to free surfaces etc.

I think right solution here is to "fix" the shadow GL layers to behave like shadow basic layers.
Created attachment 645578 [details] [diff] [review]
Fix shutdown protocol of shadow OGL layers

I think the last person to review shadow-layers destruction code was vlad, 1.5 or 2 years ago, so I'm not sure who to ping for review anymore.  Sorry.

We would never run into this case with omtc because there's no such thing as "emergency shutdown" cross-thread.  We should have hit this if we had GL enabled for xul-fennec tests, which I guess we don't.
Attachment #645365 - Attachment is obsolete: true
Attachment #645578 - Flags: review?(roc)
Summary: memory correction in direct texuring code → Trying to __delete__() PGrallocBuffer actor after it's dead to IPC
Blocks: 777164
(In reply to Chris Jones [:cjones] [:warhammer] from comment #12)
> I think the last person to review shadow-layers destruction code was vlad,
> 1.5 or 2 years ago, so I'm not sure who to ping for review anymore.  Sorry.

How about ... vlad?
Attachment #645578 - Flags: review?(roc) → review+
I don't think he's seen this code for 9 months or so.  But sure.
And, thanks for reviewing anyway.
https://hg.mozilla.org/integration/mozilla-inbound/rev/c1878d925ef5

Comment 17

5 years ago
This appears to be causing Layer leaking on OSX bots:

https://tbpl.mozilla.org/php/getParsedLog.php?id=13825210&tree=Mozilla-Inbound
Should have backed this out with great prejudice!
Backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d00146c7108
Created attachment 645673 [details] [diff] [review]
Fix shutdown protocol of shadow OGL layers, now with less leaky
Attachment #645578 - Attachment is obsolete: true
Attachment #645673 - Flags: review?
Attachment #645673 - Flags: review? → review?(vladimir)
I suspect this also caused Android R1 orange:

https://tbpl.mozilla.org/php/getParsedLog.php?id=13826523&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=13827014&tree=Mozilla-Inbound
Comment on attachment 645673 [details] [diff] [review]
Fix shutdown protocol of shadow OGL layers, now with less leaky

I like less leaky!
Attachment #645673 - Flags: review?(vladimir) → review+
http://hg.mozilla.org/integration/mozilla-inbound/rev/e0c9bf37a067
https://hg.mozilla.org/mozilla-central/rev/e0c9bf37a067
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
You need to log in before you can comment on or make changes to this bug.