Closed
Bug 860892
Opened 12 years ago
Closed 12 years ago
Add IPC failure handling in CompositorChild and ShadowLayers
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
People
(Reporter: sotaro, Assigned: sotaro)
References
Details
Attachments
(2 files, 3 obsolete files)
2.25 KB,
patch
|
sotaro
:
review+
|
Details | Diff | Splinter Review |
2.33 KB,
patch
|
sotaro
:
review+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #851664 +++
Assignee | ||
Updated•12 years ago
|
Summary: Add IPC failure handling in CompositorChild → Add IPC failure handling in CompositorChild and ShadowLayers
Assignee | ||
Comment 1•12 years ago
|
||
When IPC failure happens on CompositorChild or ShadowLayers, All managed objects under them are automatically deleted by IPDL generated code. After that, Application can not draw it's content anymore.
Assignee | ||
Updated•12 years ago
|
blocking-b2g: tef+ → ---
Assignee | ||
Updated•12 years ago
|
Severity: critical → normal
Assignee | ||
Comment 2•12 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #1)
> When IPC failure happens on CompositorChild or ShadowLayers, All managed
> objects under them are automatically deleted by IPDL generated code. After
> that, Application can not draw it's content anymore.
Current gecko does not handle this case correctly, it causes a crash in different source code.
Assignee | ||
Comment 3•12 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #2)
> (In reply to Sotaro Ikeda [:sotaro] from comment #1)
> > When IPC failure happens on CompositorChild or ShadowLayers, All managed
> > objects under them are automatically deleted by IPDL generated code. After
> > that, Application can not draw it's content anymore.
>
> Current gecko does not handle this case correctly, it causes a crash in
> different source code.
example crash is Bug 851664.
Assignee | ||
Comment 4•12 years ago
|
||
I talked about it with jrmuizel, nical, BenWa. The conclusion is following.
- abort the app process with log is a good way to handle this situation
Assignee | ||
Comment 5•12 years ago
|
||
Assignee | ||
Updated•12 years ago
|
blocking-b2g: --- → tef?
Assignee | ||
Updated•12 years ago
|
Attachment #736488 -
Flags: review?(jmuizelaar)
Assignee | ||
Comment 6•12 years ago
|
||
Change to abort even when the current process is chrome process.
Attachment #736488 -
Attachment is obsolete: true
Attachment #736488 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•12 years ago
|
Attachment #736512 -
Flags: review?(jmuizelaar)
Assignee | ||
Comment 7•12 years ago
|
||
fix nit
Attachment #736512 -
Attachment is obsolete: true
Attachment #736512 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•12 years ago
|
Attachment #736539 -
Flags: review?(jmuizelaar)
Updated•12 years ago
|
blocking-b2g: tef? → tef+
Comment 8•12 years ago
|
||
Hi Sotaro,
I can reproduce on my unagi. It look like we call PGrallocBufferParent::Send__delete__() after its ActorDestroy() is called when IPC channel error happens. Here are the stack traces. So the actor is destroyed on IPC error from the protocol tree once and then by the user ISurfaceAllocator again.
> (this is actually PGrallocBufferParent::DestroySubtree())
> Breakpoint 1, mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree (this=0x48845ed8, why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::AbnormalShutdown) at /obj-webrtc-opt/ipc/ipdl/PPluginBackgroundDestroyerParent.cpp:328
> 328 Unregister(mId);
> #0 mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree (this=0x48845ed8, why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::AbnormalShutdown) at /obj-webrtc-opt/ipc/ipdl/PPluginBackgroundDestroyerParent.cpp:328
> #1 0x4135e108 in mozilla::layers::PLayersParent::DestroySubtree (this=0x43358420, why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::AbnormalShutdown) at /obj-webrtc-opt/ipc/ipdl/PLayersParent.cpp:776
> #2 0x413568d6 in mozilla::layers::PCompositorParent::DestroySubtree (this=0x49afd480, why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::AbnormalShutdown) at /obj-webrtc-opt/ipc/ipdl/PCompositorParent.cpp:812
> #3 0x41356902 in mozilla::layers::PCompositorParent::OnChannelError (this=0x49afd480) at /obj-webrtc-opt/ipc/ipdl/PCompositorParent.cpp:663
> #4 0x41317bd4 in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x49afd488) at /ipc/glue/AsyncChannel.cpp:570
> #5 0x41317f3a in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x49afd488) at /ipc/glue/AsyncChannel.cpp:535
> #6 0x412ff7ae in DispatchToMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)()> (this=<value optimized out>) at /ipc/chromium/src/base/tuple.h:383
> #7 RunnableMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)(), Tuple0>::Run (this=<value optimized out>) at /ipc/chromium/src/base/task.h:307
> #8 0x41550c18 in MessageLoop::RunTask (this=0x46cffdf0, task=0x0) at /ipc/chromium/src/base/message_loop.cc:334
> #9 0x41551a22 in MessageLoop::DeferOrRunPendingTask (this=0x49afd488, pending_task=<value optimized out>) at /ipc/chromium/src/base/message_loop.cc:342
> #10 0x41552600 in MessageLoop::DoWork (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:442
> #11 0x41552880 in base::MessagePumpDefault::Run (this=0x45670400, delegate=0x46cffdf0) at /ipc/chromium/src/base/message_pump_default.cc:23
> #12 0x41550bd4 in MessageLoop::RunInternal (this=0x0) at /ipc/chromium/src/base/message_loop.cc:216
> #13 0x41550c7e in MessageLoop::RunHandler (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:209
> #14 MessageLoop::Run (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:183
> #15 0x41558f5c in base::Thread::ThreadMain (this=0x4566f3a0) at /ipc/chromium/src/base/thread.cc:156
> #16 0x41562c18 in ThreadFunc (closure=0x1) at /ipc/chromium/src/base/platform_thread_posix.cc:39
> #17 0x400bce18 in __thread_entry (func=0x41562c11 <ThreadFunc>, arg=0x4566f3a0, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
> #18 0x400bc96c in pthread_create (thread_out=<value optimized out>, attr=0xbedf43b0, start_routine=0x41562c11 <ThreadFunc>, arg=0x4566f3a0) at bionic/libc/bionic/pthread.c:357
> #19 0x00000000 in ?? ()
>
> Breakpoint 2, mozilla::layers::PGrallocBufferParent::Send__delete__ (actor=0x48845ed8) at /obj-webrtc-opt/ipc/ipdl/PGrallocBufferParent.cpp:64
> 64 return false;
> #0 mozilla::layers::PGrallocBufferParent::Send__delete__ (actor=0x48845ed8) at /obj-webrtc-opt/ipc/ipdl/PGrallocBufferParent.cpp:64
> #1 0x415ba540 in mozilla::layers::ISurfaceAllocator::PlatformDestroySharedSurface (aSurface=0x45625180) at /gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:274
> #2 0x415b6054 in mozilla::layers::ISurfaceAllocator::DestroySharedSurface (this=0x43358444, aSurface=0x45625180) at /gfx/layers/ipc/ISurfaceAllocator.cpp:110
> #3 0x415b7d0e in ~TextureHost (this=0x48269e20, __in_chrg=<value optimized out>) at /gfx/layers/composite/TextureHost.cpp:58
> #4 0x415b7a1a in ~GrallocTextureHostOGL (this=0x48269e20, __in_chrg=<value optimized out>) at /gfx/layers/opengl/TextureHostOGL.cpp:653
> #5 0x415b7a2c in ~GrallocTextureHostOGL (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/opengl/TextureHostOGL.cpp:653
> #6 0x40d35134 in mozilla::RefCounted<mozilla::gfx::GradientStops>::Release (this=0x456f8888, t=0x0) at ../../dist/include/mozilla/RefPtr.h:71
> #7 mozilla::RefPtr<mozilla::gfx::GradientStops>::unref (this=0x456f8888, t=0x0) at ../../dist/include/mozilla/RefPtr.h:166
> #8 mozilla::RefPtr<mozilla::gfx::GradientStops>::assign (this=0x456f8888, t=0x0) at ../../dist/include/mozilla/RefPtr.h:152
> #9 0x41595382 in mozilla::RefPtr<mozilla::layers::CompositableHost>::operator= (this=0x456f8888, t=0x45625180) at ../../dist/include/mozilla/RefPtr.h:127
> #10 0x415afe40 in mozilla::layers::ContentHostDoubleBuffered::DestroyTextures (this=0x456f8820) at /gfx/layers/composite/ContentHost.cpp:303
> #11 0x415b0000 in ~ContentHostDoubleBuffered (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/composite/ContentHost.cpp:274
> #12 0x415b0034 in ~ContentHostDoubleBuffered (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/composite/ContentHost.cpp:276
> #13 0x40eb0be8 in mozilla::RefCounted<mozilla::gfx::DrawTarget>::Release (this=<value optimized out>) at ../../../dist/include/mozilla/RefPtr.h:71
> #14 0x415a40c6 in mozilla::RefPtr<mozilla::layers::ContentHost>::unref (this=0x47e6dc00, __in_chrg=<value optimized out>) at ../../dist/include/mozilla/RefPtr.h:166
> #15 ~RefPtr (this=0x47e6dc00, __in_chrg=<value optimized out>) at ../../dist/include/mozilla/RefPtr.h:116
> #16 ~ThebesLayerComposite (this=0x47e6dc00, __in_chrg=<value optimized out>) at /gfx/layers/composite/ThebesLayerComposite.cpp:42
> #17 0x415a40f4 in ~ThebesLayerComposite (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/composite/ThebesLayerComposite.cpp:42
> #18 0x415a2518 in mozilla::layers::Layer::Release (this=<value optimized out>, aChild=0x47e6dc90) at /gfx/layers/Layers.h:588
> #19 mozilla::layers::ContainerLayerComposite::RemoveChild (this=<value optimized out>, aChild=0x47e6dc90) at /gfx/layers/composite/ContainerLayerComposite.cpp:246
> #20 0x415a2550 in ~ContainerLayerComposite (this=0x460d1c00, __in_chrg=<value optimized out>) at /gfx/layers/composite/ContainerLayerComposite.cpp:176
> #21 0x415a2588 in ~ContainerLayerComposite (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/composite/ContainerLayerComposite.cpp:178
> #22 0x40cfae68 in gfxDrawable::Release (this=0x48845ed8) at ../../dist/include/gfxDrawable.h:26
> #23 0x415b40aa in ~nsRefPtr (this=0x43358420, __in_chrg=<value optimized out>) at ../../dist/include/nsAutoPtr.h:880
> #24 ~ShadowLayersParent (this=0x43358420, __in_chrg=<value optimized out>) at /gfx/layers/ipc/ShadowLayersParent.cpp:150
> #25 0x415b40d8 in ~ShadowLayersParent (this=0x48845ed8, __in_chrg=<value optimized out>) at /gfx/layers/ipc/ShadowLayersParent.cpp:150
> #26 0x415aed24 in mozilla::layers::CrossProcessCompositorParent::DeallocPLayers (this=<value optimized out>, aLayers=0x43358420) at /gfx/layers/ipc/CompositorParent.cpp:1407
> #27 0x413566c6 in mozilla::layers::PCompositorParent::DeallocSubtree (this=0x49afd480) at /obj-webrtc-opt/ipc/ipdl/PCompositorParent.cpp:831
> #28 0x41356908 in mozilla::layers::PCompositorParent::OnChannelError (this=0x49afd480) at /obj-webrtc-opt/ipc/ipdl/PCompositorParent.cpp:664
> #29 0x41317bd4 in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x49afd488) at /ipc/glue/AsyncChannel.cpp:570
> #30 0x41317f3a in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x49afd488) at /ipc/glue/AsyncChannel.cpp:535
> #31 0x412ff7ae in DispatchToMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)()> (this=<value optimized out>) at /ipc/chromium/src/base/tuple.h:383
> #32 RunnableMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)(), Tuple0>::Run (this=<value optimized out>) at /ipc/chromium/src/base/task.h:307
> #33 0x41550c18 in MessageLoop::RunTask (this=0x46cffdf0, task=0x0) at /ipc/chromium/src/base/message_loop.cc:334
> #34 0x41551a22 in MessageLoop::DeferOrRunPendingTask (this=0x49afd488, pending_task=<value optimized out>) at /ipc/chromium/src/base/message_loop.cc:342
> #35 0x41552600 in MessageLoop::DoWork (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:442
> #36 0x41552880 in base::MessagePumpDefault::Run (this=0x45670400, delegate=0x46cffdf0) at /ipc/chromium/src/base/message_pump_default.cc:23
> #37 0x41550bd4 in MessageLoop::RunInternal (this=0x0) at /ipc/chromium/src/base/message_loop.cc:216
> #38 0x41550c7e in MessageLoop::RunHandler (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:209
> #39 MessageLoop::Run (this=0x46cffdf0) at /ipc/chromium/src/base/message_loop.cc:183
> #40 0x41558f5c in base::Thread::ThreadMain (this=0x4566f3a0) at /ipc/chromium/src/base/thread.cc:156
> #41 0x41562c18 in ThreadFunc (closure=0x1) at /ipc/chromium/src/base/platform_thread_posix.cc:39
> #42 0x400bce18 in __thread_entry (func=0x41562c11 <ThreadFunc>, arg=0x4566f3a0, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
> #43 0x400bc96c in pthread_create (thread_out=<value optimized out>, attr=0xbedf43b0, start_routine=0x41562c11 <ThreadFunc>, arg=0x4566f3a0) at bionic/libc/bionic/pthread.c:357
> #44 0x00000000 in ?? ()
Assignee | ||
Comment 9•12 years ago
|
||
(In reply to StevenLee from comment #8)
> Hi Sotaro,
>
> I can reproduce on my unagi. It look like we call
> PGrallocBufferParent::Send__delete__() after its ActorDestroy() is called
> when IPC channel error happens. Here are the stack traces. So the actor is
> destroyed on IPC error from the protocol tree once and then by the user
> ISurfaceAllocator again.
>
Thanks for the information. How do you generate the crash? It seems that the crash happens by regressions from Layer refactoring. Bug 860442 seems a related bug.
Updated•12 years ago
|
Attachment #736539 -
Flags: review?(jmuizelaar) → review+
Assignee | ||
Comment 10•12 years ago
|
||
Hi StevenLee, the stack trace suggests fucntion mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree(). Do you know why plugin is present in the stack?
Assignee | ||
Comment 11•12 years ago
|
||
Commitable patch. Carry "jmuizelaar: review+".
Attachment #736539 -
Attachment is obsolete: true
Attachment #736826 -
Flags: review+
Assignee | ||
Comment 12•12 years ago
|
||
Commitable patch for b2g18. Carry "jmuizelaar: review+".
Attachment #736827 -
Flags: review+
Assignee | ||
Comment 13•12 years ago
|
||
Assignee | ||
Comment 14•12 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #13)
> https://tbpl.mozilla.org/?tree=Try&rev=250d1aee57f6
build burn in Fedora. It is not related to the change.
Assignee | ||
Updated•12 years ago
|
Keywords: checkin-needed
Comment 15•12 years ago
|
||
Keywords: checkin-needed
Comment 16•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Comment 17•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g18/rev/c8767792d409
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/91fc54411a34
status-b2g18:
--- → fixed
status-b2g18-v1.0.0:
--- → wontfix
status-b2g18-v1.0.1:
--- → fixed
status-firefox21:
--- → wontfix
status-firefox22:
--- → wontfix
status-firefox23:
--- → fixed
Comment 18•12 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #10)
> Hi StevenLee, the stack trace suggests fucntion
> mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree(). Do you
> know why plugin is present in the stack?
Hi Sotaro, it's actually PGrallocBufferParent, not PPluginBackgroundDestroyerParent(). The instances have the same address. I think gdb decoded it incorrectly
Assignee | ||
Comment 19•12 years ago
|
||
(In reply to Cervantes Yu from comment #18)
> (In reply to Sotaro Ikeda [:sotaro] from comment #10)
> > Hi StevenLee, the stack trace suggests fucntion
> > mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree(). Do you
> > know why plugin is present in the stack?
>
> Hi Sotaro, it's actually PGrallocBufferParent, not
> PPluginBackgroundDestroyerParent(). The instances have the same address. I
> think gdb decoded it incorrectly
Thanks for the info.
Comment 20•11 years ago
|
||
Comment on attachment 736827 [details] [diff] [review]
patch v4 for b2g18 - IPC failure handling in CompositorChild and ShadowLayers
Review of attachment 736827 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/ShadowLayersChild.cpp
@@ +63,5 @@
>
> +void
> +ShadowLayersChild::ActorDestroy(ActorDestroyReason why)
> +{
> + if (why == AbnormalShutdown) {
This kind of code should of had a warning and a comment explaining why. I was looking at this code thinking 'This is really wrong', which really it is because it's a band-aid.
Comment 21•11 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #20)
> Comment on attachment 736827 [details] [diff] [review]
> patch v4 for b2g18 - IPC failure handling in CompositorChild and ShadowLayers
>
> Review of attachment 736827 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: gfx/layers/ipc/ShadowLayersChild.cpp
> @@ +63,5 @@
> >
> > +void
> > +ShadowLayersChild::ActorDestroy(ActorDestroyReason why)
> > +{
> > + if (why == AbnormalShutdown) {
>
> This kind of code should of had a warning and a comment explaining why. I
> was looking at this code thinking 'This is really wrong', which really it is
> because it's a band-aid.
Just discussed this with Sotaro. I'm going to do _something_ with it, not 100% sure what, but probably something hacky. I'll add a comment as I do that.
Comment 22•11 years ago
|
||
Comment on attachment 736827 [details] [diff] [review]
patch v4 for b2g18 - IPC failure handling in CompositorChild and ShadowLayers
Review of attachment 736827 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/ShadowLayersChild.cpp
@@ +63,5 @@
>
> +void
> +ShadowLayersChild::ActorDestroy(ActorDestroyReason why)
> +{
> + if (why == AbnormalShutdown) {
Is this called with AbnormalShutdown when a background app get's sigkilled to free up some memory for the foreground app?
Assignee | ||
Comment 23•11 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #22)
> Is this called with AbnormalShutdown when a background app get's sigkilled
> to free up some memory for the foreground app?
AbnormalShutdown is called when ipc channel detects an error. It is triggered by OnChannelError(), like following.
------------------------------------------
void
AsyncChannel::NotifyMaybeChannelError()
{
mMonitor->AssertNotCurrentThreadOwns();
// TODO sort out Close() on this side racing with Close() on the
// other side
if (ChannelClosing == mChannelState) {
// the channel closed, but we received a "Goodbye" message
// warning us about it. no worries
mChannelState = ChannelClosed;
NotifyChannelClosed();
return;
}
// Oops, error! Let the listener know about it.
mChannelState = ChannelError;
mListener->OnChannelError();
Clear();
}
Assignee | ||
Comment 24•11 years ago
|
||
And it deletes all ipc objects managed under the error receiving ipc object, like following.
void
PCompositorChild::OnChannelError()
{
DestroySubtree(AbnormalShutdown);
DeallocSubtree();
DeallocShmems();
}
void
PContentChild::OnChannelError()
{
DestroySubtree(AbnormalShutdown);
DeallocSubtree();
DeallocShmems();
}
void
PImageBridgeChild::OnChannelError()
{
DestroySubtree(AbnormalShutdown);
DeallocSubtree();
DeallocShmems();
}
Assignee | ||
Comment 25•11 years ago
|
||
Current gecko's ipc is very fragile, if the ipc error happens by some reason, any ipc object could be deleted at any timing. And a lot of gecko's classes directly references the ipc object's raw pointer. like SurfaceDescriptor.
Assignee | ||
Comment 26•11 years ago
|
||
To handle this case, the following ref counting seems to works.
https://developer.mozilla.org/en-US/docs/IPDL/Best_Practices
Comment 27•11 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #25)
> Current gecko's ipc is very fragile, if the ipc error happens by some
> reason, any ipc object could be deleted at any timing. And a lot of gecko's
> classes directly references the ipc object's raw pointer. like
> SurfaceDescriptor.
SurfaceDescriptor should really not be considered as a class, but more as a temporary object for IPC messaging. Let's never add code that keeps surface descriptors as data again (we must copy the content of surface descriptors into classes that have proper memory management). This applies for all other IPDL defined structures.
Assignee | ||
Comment 28•11 years ago
|
||
IPC failure caused some type of crashes like the following bugs.
- Bug 851664, Bug 862097, Bug 857375
Assignee | ||
Comment 29•11 years ago
|
||
Bug 867025 is another type of IPC error it is caused by Camera app. It should be fixed by gecko side, but in the bug, workaround was done in camera app(gaia side).
- Bug 867025 - [unagi][tara][weekly build 13.04.17]monkey test crash in mozilla::dom::ContentChild::ProcessingError
You need to log in
before you can comment on or make changes to this bug.
Description
•