Closed Bug 860892 Opened 12 years ago Closed 12 years ago

Add IPC failure handling in CompositorChild and ShadowLayers

Categories

(Core :: Graphics: Layers, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla23
blocking-b2g tef+
Tracking Status
firefox21 --- wontfix
firefox22 --- wontfix
firefox23 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- fixed

People

(Reporter: sotaro, Assigned: sotaro)

References

Details

Attachments

(2 files, 3 obsolete files)

+++ This bug was initially created as a clone of Bug #851664 +++
Summary: Add IPC failure handling in CompositorChild → Add IPC failure handling in CompositorChild and ShadowLayers
When IPC failure happens on CompositorChild or ShadowLayers, All managed objects under them are automatically deleted by IPDL generated code. After that, Application can not draw it's content anymore.
Blocks: 851664
blocking-b2g: tef+ → ---
Severity: critical → normal
(In reply to Sotaro Ikeda [:sotaro] from comment #1) > When IPC failure happens on CompositorChild or ShadowLayers, All managed > objects under them are automatically deleted by IPDL generated code. After > that, Application can not draw it's content anymore. Current gecko does not handle this case correctly, it causes a crash in different source code.
(In reply to Sotaro Ikeda [:sotaro] from comment #2) > (In reply to Sotaro Ikeda [:sotaro] from comment #1) > > When IPC failure happens on CompositorChild or ShadowLayers, All managed > > objects under them are automatically deleted by IPDL generated code. After > > that, Application can not draw it's content anymore. > > Current gecko does not handle this case correctly, it causes a crash in > different source code. example crash is Bug 851664.
I talked about it with jrmuizel, nical, BenWa. The conclusion is following. - abort the app process with log is a good way to handle this situation
blocking-b2g: --- → tef?
Attachment #736488 - Flags: review?(jmuizelaar)
Change to abort even when the current process is chrome process.
Attachment #736488 - Attachment is obsolete: true
Attachment #736488 - Flags: review?(jmuizelaar)
Attachment #736512 - Flags: review?(jmuizelaar)
fix nit
Attachment #736512 - Attachment is obsolete: true
Attachment #736512 - Flags: review?(jmuizelaar)
Attachment #736539 - Flags: review?(jmuizelaar)
blocking-b2g: tef? → tef+
Hi Sotaro, I can reproduce on my unagi. It look like we call PGrallocBufferParent::Send__delete__() after its ActorDestroy() is called when IPC channel error happens. Here are the stack traces. So the actor is destroyed on IPC error from the protocol tree once and then by the user ISurfaceAllocator again. > (this is actually PGrallocBufferParent::DestroySubtree()) > Breakpoint 1, mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree (this=0x48845ed8, why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::AbnormalShutdown) at /obj-webrtc-opt/ipc/ipdl/PPluginBackgroundDestroyerParent.cpp:328 > 328 Unregister(mId); > #0 mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree (this=0x48845ed8, why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::AbnormalShutdown) at /obj-webrtc-opt/ipc/ipdl/PPluginBackgroundDestroyerParent.cpp:328 > #1 0x4135e108 in mozilla::layers::PLayersParent::DestroySubtree (this=0x43358420, why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::AbnormalShutdown) at /obj-webrtc-opt/ipc/ipdl/PLayersParent.cpp:776 > #2 0x413568d6 in mozilla::layers::PCompositorParent::DestroySubtree (this=0x49afd480, why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::AbnormalShutdown) at /obj-webrtc-opt/ipc/ipdl/PCompositorParent.cpp:812 > #3 0x41356902 in mozilla::layers::PCompositorParent::OnChannelError (this=0x49afd480) at /obj-webrtc-opt/ipc/ipdl/PCompositorParent.cpp:663 > #4 0x41317bd4 in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x49afd488) at /ipc/glue/AsyncChannel.cpp:570 > #5 0x41317f3a in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x49afd488) at /ipc/glue/AsyncChannel.cpp:535 > #6 0x412ff7ae in DispatchToMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)()> (this=<value optimized out>) at /ipc/chromium/src/base/tuple.h:383 > #7 RunnableMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)(), Tuple0>::Run (this=<value optimized out>) at /ipc/chromium/src/base/task.h:307 > #8 0x41550c18 in MessageLoop::RunTask (this=0x46cffdf0, task=0x0) at /ipc/chromium/src/base/message_loop.cc:334 > #9 0x41551a22 in MessageLoop::DeferOrRunPendingTask (this=0x49afd488, pending_task=<value optimized out>) at /ipc/chromium/src/base/message_loop.cc:342 > #10 0x41552600 in MessageLoop::DoWork (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:442 > #11 0x41552880 in base::MessagePumpDefault::Run (this=0x45670400, delegate=0x46cffdf0) at /ipc/chromium/src/base/message_pump_default.cc:23 > #12 0x41550bd4 in MessageLoop::RunInternal (this=0x0) at /ipc/chromium/src/base/message_loop.cc:216 > #13 0x41550c7e in MessageLoop::RunHandler (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:209 > #14 MessageLoop::Run (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:183 > #15 0x41558f5c in base::Thread::ThreadMain (this=0x4566f3a0) at /ipc/chromium/src/base/thread.cc:156 > #16 0x41562c18 in ThreadFunc (closure=0x1) at /ipc/chromium/src/base/platform_thread_posix.cc:39 > #17 0x400bce18 in __thread_entry (func=0x41562c11 <ThreadFunc>, arg=0x4566f3a0, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217 > #18 0x400bc96c in pthread_create (thread_out=<value optimized out>, attr=0xbedf43b0, start_routine=0x41562c11 <ThreadFunc>, arg=0x4566f3a0) at bionic/libc/bionic/pthread.c:357 > #19 0x00000000 in ?? () > > Breakpoint 2, mozilla::layers::PGrallocBufferParent::Send__delete__ (actor=0x48845ed8) at /obj-webrtc-opt/ipc/ipdl/PGrallocBufferParent.cpp:64 > 64 return false; > #0 mozilla::layers::PGrallocBufferParent::Send__delete__ (actor=0x48845ed8) at /obj-webrtc-opt/ipc/ipdl/PGrallocBufferParent.cpp:64 > #1 0x415ba540 in mozilla::layers::ISurfaceAllocator::PlatformDestroySharedSurface (aSurface=0x45625180) at /gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:274 > #2 0x415b6054 in mozilla::layers::ISurfaceAllocator::DestroySharedSurface (this=0x43358444, aSurface=0x45625180) at /gfx/layers/ipc/ISurfaceAllocator.cpp:110 > #3 0x415b7d0e in ~TextureHost (this=0x48269e20, __in_chrg=<value optimized out>) at /gfx/layers/composite/TextureHost.cpp:58 > #4 0x415b7a1a in ~GrallocTextureHostOGL (this=0x48269e20, __in_chrg=<value optimized out>) at /gfx/layers/opengl/TextureHostOGL.cpp:653 > #5 0x415b7a2c in ~GrallocTextureHostOGL (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/opengl/TextureHostOGL.cpp:653 > #6 0x40d35134 in mozilla::RefCounted<mozilla::gfx::GradientStops>::Release (this=0x456f8888, t=0x0) at ../../dist/include/mozilla/RefPtr.h:71 > #7 mozilla::RefPtr<mozilla::gfx::GradientStops>::unref (this=0x456f8888, t=0x0) at ../../dist/include/mozilla/RefPtr.h:166 > #8 mozilla::RefPtr<mozilla::gfx::GradientStops>::assign (this=0x456f8888, t=0x0) at ../../dist/include/mozilla/RefPtr.h:152 > #9 0x41595382 in mozilla::RefPtr<mozilla::layers::CompositableHost>::operator= (this=0x456f8888, t=0x45625180) at ../../dist/include/mozilla/RefPtr.h:127 > #10 0x415afe40 in mozilla::layers::ContentHostDoubleBuffered::DestroyTextures (this=0x456f8820) at /gfx/layers/composite/ContentHost.cpp:303 > #11 0x415b0000 in ~ContentHostDoubleBuffered (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/composite/ContentHost.cpp:274 > #12 0x415b0034 in ~ContentHostDoubleBuffered (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/composite/ContentHost.cpp:276 > #13 0x40eb0be8 in mozilla::RefCounted<mozilla::gfx::DrawTarget>::Release (this=<value optimized out>) at ../../../dist/include/mozilla/RefPtr.h:71 > #14 0x415a40c6 in mozilla::RefPtr<mozilla::layers::ContentHost>::unref (this=0x47e6dc00, __in_chrg=<value optimized out>) at ../../dist/include/mozilla/RefPtr.h:166 > #15 ~RefPtr (this=0x47e6dc00, __in_chrg=<value optimized out>) at ../../dist/include/mozilla/RefPtr.h:116 > #16 ~ThebesLayerComposite (this=0x47e6dc00, __in_chrg=<value optimized out>) at /gfx/layers/composite/ThebesLayerComposite.cpp:42 > #17 0x415a40f4 in ~ThebesLayerComposite (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/composite/ThebesLayerComposite.cpp:42 > #18 0x415a2518 in mozilla::layers::Layer::Release (this=<value optimized out>, aChild=0x47e6dc90) at /gfx/layers/Layers.h:588 > #19 mozilla::layers::ContainerLayerComposite::RemoveChild (this=<value optimized out>, aChild=0x47e6dc90) at /gfx/layers/composite/ContainerLayerComposite.cpp:246 > #20 0x415a2550 in ~ContainerLayerComposite (this=0x460d1c00, __in_chrg=<value optimized out>) at /gfx/layers/composite/ContainerLayerComposite.cpp:176 > #21 0x415a2588 in ~ContainerLayerComposite (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/composite/ContainerLayerComposite.cpp:178 > #22 0x40cfae68 in gfxDrawable::Release (this=0x48845ed8) at ../../dist/include/gfxDrawable.h:26 > #23 0x415b40aa in ~nsRefPtr (this=0x43358420, __in_chrg=<value optimized out>) at ../../dist/include/nsAutoPtr.h:880 > #24 ~ShadowLayersParent (this=0x43358420, __in_chrg=<value optimized out>) at /gfx/layers/ipc/ShadowLayersParent.cpp:150 > #25 0x415b40d8 in ~ShadowLayersParent (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/ipc/ShadowLayersParent.cpp:150 > #26 0x415aed24 in mozilla::layers::CrossProcessCompositorParent::DeallocPLayers (this=<value optimized out>, aLayers=0x43358420) at /gfx/layers/ipc/CompositorParent.cpp:1407 > #27 0x413566c6 in mozilla::layers::PCompositorParent::DeallocSubtree (this=0x49afd480) at /obj-webrtc-opt/ipc/ipdl/PCompositorParent.cpp:831 > #28 0x41356908 in mozilla::layers::PCompositorParent::OnChannelError (this=0x49afd480) at /obj-webrtc-opt/ipc/ipdl/PCompositorParent.cpp:664 > #29 0x41317bd4 in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x49afd488) at /ipc/glue/AsyncChannel.cpp:570 > #30 0x41317f3a in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x49afd488) at /ipc/glue/AsyncChannel.cpp:535 > #31 0x412ff7ae in DispatchToMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)()> (this=<value optimized out>) at /ipc/chromium/src/base/tuple.h:383 > #32 RunnableMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)(), Tuple0>::Run (this=<value optimized out>) at /ipc/chromium/src/base/task.h:307 > #33 0x41550c18 in MessageLoop::RunTask (this=0x46cffdf0, task=0x0) at /ipc/chromium/src/base/message_loop.cc:334 > #34 0x41551a22 in MessageLoop::DeferOrRunPendingTask (this=0x49afd488, pending_task=<value optimized out>) at /ipc/chromium/src/base/message_loop.cc:342 > #35 0x41552600 in MessageLoop::DoWork (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:442 > #36 0x41552880 in base::MessagePumpDefault::Run (this=0x45670400, delegate=0x46cffdf0) at /ipc/chromium/src/base/message_pump_default.cc:23 > #37 0x41550bd4 in MessageLoop::RunInternal (this=0x0) at /ipc/chromium/src/base/message_loop.cc:216 > #38 0x41550c7e in MessageLoop::RunHandler (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:209 > #39 MessageLoop::Run (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:183 > #40 0x41558f5c in base::Thread::ThreadMain (this=0x4566f3a0) at /ipc/chromium/src/base/thread.cc:156 > #41 0x41562c18 in ThreadFunc (closure=0x1) at /ipc/chromium/src/base/platform_thread_posix.cc:39 > #42 0x400bce18 in __thread_entry (func=0x41562c11 <ThreadFunc>, arg=0x4566f3a0, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217 > #43 0x400bc96c in pthread_create (thread_out=<value optimized out>, attr=0xbedf43b0, start_routine=0x41562c11 <ThreadFunc>, arg=0x4566f3a0) at bionic/libc/bionic/pthread.c:357 > #44 0x00000000 in ?? ()
(In reply to StevenLee from comment #8) > Hi Sotaro, > > I can reproduce on my unagi. It look like we call > PGrallocBufferParent::Send__delete__() after its ActorDestroy() is called > when IPC channel error happens. Here are the stack traces. So the actor is > destroyed on IPC error from the protocol tree once and then by the user > ISurfaceAllocator again. > Thanks for the information. How do you generate the crash? It seems that the crash happens by regressions from Layer refactoring. Bug 860442 seems a related bug.
Attachment #736539 - Flags: review?(jmuizelaar) → review+
Hi StevenLee, the stack trace suggests fucntion mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree(). Do you know why plugin is present in the stack?
Commitable patch. Carry "jmuizelaar: review+".
Attachment #736539 - Attachment is obsolete: true
Attachment #736826 - Flags: review+
Commitable patch for b2g18. Carry "jmuizelaar: review+".
Attachment #736827 - Flags: review+
(In reply to Sotaro Ikeda [:sotaro] from comment #13) > https://tbpl.mozilla.org/?tree=Try&rev=250d1aee57f6 build burn in Fedora. It is not related to the change.
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
(In reply to Sotaro Ikeda [:sotaro] from comment #10) > Hi StevenLee, the stack trace suggests fucntion > mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree(). Do you > know why plugin is present in the stack? Hi Sotaro, it's actually PGrallocBufferParent, not PPluginBackgroundDestroyerParent(). The instances have the same address. I think gdb decoded it incorrectly
(In reply to Cervantes Yu from comment #18) > (In reply to Sotaro Ikeda [:sotaro] from comment #10) > > Hi StevenLee, the stack trace suggests fucntion > > mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree(). Do you > > know why plugin is present in the stack? > > Hi Sotaro, it's actually PGrallocBufferParent, not > PPluginBackgroundDestroyerParent(). The instances have the same address. I > think gdb decoded it incorrectly Thanks for the info.
Blocks: 862097
Comment on attachment 736827 [details] [diff] [review] patch v4 for b2g18 - IPC failure handling in CompositorChild and ShadowLayers Review of attachment 736827 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/ipc/ShadowLayersChild.cpp @@ +63,5 @@ > > +void > +ShadowLayersChild::ActorDestroy(ActorDestroyReason why) > +{ > + if (why == AbnormalShutdown) { This kind of code should of had a warning and a comment explaining why. I was looking at this code thinking 'This is really wrong', which really it is because it's a band-aid.
(In reply to Benoit Girard (:BenWa) from comment #20) > Comment on attachment 736827 [details] [diff] [review] > patch v4 for b2g18 - IPC failure handling in CompositorChild and ShadowLayers > > Review of attachment 736827 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: gfx/layers/ipc/ShadowLayersChild.cpp > @@ +63,5 @@ > > > > +void > > +ShadowLayersChild::ActorDestroy(ActorDestroyReason why) > > +{ > > + if (why == AbnormalShutdown) { > > This kind of code should of had a warning and a comment explaining why. I > was looking at this code thinking 'This is really wrong', which really it is > because it's a band-aid. Just discussed this with Sotaro. I'm going to do _something_ with it, not 100% sure what, but probably something hacky. I'll add a comment as I do that.
Comment on attachment 736827 [details] [diff] [review] patch v4 for b2g18 - IPC failure handling in CompositorChild and ShadowLayers Review of attachment 736827 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/ipc/ShadowLayersChild.cpp @@ +63,5 @@ > > +void > +ShadowLayersChild::ActorDestroy(ActorDestroyReason why) > +{ > + if (why == AbnormalShutdown) { Is this called with AbnormalShutdown when a background app get's sigkilled to free up some memory for the foreground app?
(In reply to Nicolas Silva [:nical] from comment #22) > Is this called with AbnormalShutdown when a background app get's sigkilled > to free up some memory for the foreground app? AbnormalShutdown is called when ipc channel detects an error. It is triggered by OnChannelError(), like following. ------------------------------------------ void AsyncChannel::NotifyMaybeChannelError() { mMonitor->AssertNotCurrentThreadOwns(); // TODO sort out Close() on this side racing with Close() on the // other side if (ChannelClosing == mChannelState) { // the channel closed, but we received a "Goodbye" message // warning us about it. no worries mChannelState = ChannelClosed; NotifyChannelClosed(); return; } // Oops, error! Let the listener know about it. mChannelState = ChannelError; mListener->OnChannelError(); Clear(); }
And it deletes all ipc objects managed under the error receiving ipc object, like following. void PCompositorChild::OnChannelError() { DestroySubtree(AbnormalShutdown); DeallocSubtree(); DeallocShmems(); } void PContentChild::OnChannelError() { DestroySubtree(AbnormalShutdown); DeallocSubtree(); DeallocShmems(); } void PImageBridgeChild::OnChannelError() { DestroySubtree(AbnormalShutdown); DeallocSubtree(); DeallocShmems(); }
Current gecko's ipc is very fragile, if the ipc error happens by some reason, any ipc object could be deleted at any timing. And a lot of gecko's classes directly references the ipc object's raw pointer. like SurfaceDescriptor.
To handle this case, the following ref counting seems to works. https://developer.mozilla.org/en-US/docs/IPDL/Best_Practices
(In reply to Sotaro Ikeda [:sotaro] from comment #25) > Current gecko's ipc is very fragile, if the ipc error happens by some > reason, any ipc object could be deleted at any timing. And a lot of gecko's > classes directly references the ipc object's raw pointer. like > SurfaceDescriptor. SurfaceDescriptor should really not be considered as a class, but more as a temporary object for IPC messaging. Let's never add code that keeps surface descriptors as data again (we must copy the content of surface descriptors into classes that have proper memory management). This applies for all other IPDL defined structures.
IPC failure caused some type of crashes like the following bugs. - Bug 851664, Bug 862097, Bug 857375
Bug 867025 is another type of IPC error it is caused by Camera app. It should be fixed by gecko side, but in the bug, workaround was done in camera app(gaia side). - Bug 867025 - [unagi][tara][weekly build 13.04.17]monkey test crash in mozilla::dom::ContentChild::ProcessingError
Blocks: 925324
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: