Closed Bug 851664 Opened 11 years ago Closed 11 years ago

IPC failure during stability testing

Categories

(Core :: Graphics: Layers, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

RESOLVED FIXED
1.0.1 Madrid (19apr)
blocking-b2g tef+
Tracking Status
b2g18 --- fixed
b2g18-v1.0.1 --- fixed

People

(Reporter: ggrisco, Assigned: sotaro)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash][CR 462592][madrid])

Crash Data

Attachments

(5 files)

Operating system: Android
                  0.0.0 Linux 3.0.21-perf-ga73c871-00003-g40a4c39 #1 SMP PREEMPT Sat Feb 23 19:26:22 PST 2013 armv7l qcom/msm7627a/msm7627a:4.0.4/IMM76I/eng.lnxbuild.20130223.192112:userdebug/test-keys
CPU: arm
     0 CPUs

Crash reason:  SIGSEGV
Crash address: 0x0

Thread 0 (crashed)
 0  libxul.so!mozalloc_abort [mozalloc_abort.cpp : 30 + 0x4]
     r4 = 0xbef96ed4    r5 = 0x00000000    r6 = 0xffffffff    r7 = 0xbef96ae8
     r8 = 0x40be21ed    r9 = 0x00000001   r10 = 0xbef96ae8    fp = 0x410c9e76
     sp = 0xbef96ad0    lr = 0x41059a3f    pc = 0x41059a42
    Found by: given as instruction pointer in context
 1  libxul.so!NS_DebugBreak_P [nsDebugImpl.cpp : 423 + 0x5]
     r4 = 0xbef96ed4    r5 = 0x00000000    r6 = 0xffffffff    r7 = 0xbef96ae8
     r8 = 0x40be21ed    r9 = 0x00000001   r10 = 0xbef96ae8    fp = 0x410c9e76
     sp = 0xbef96ad8    pc = 0x40be1fd5
    Found by: call frame info
 2  libxul.so!mozilla::layers::PLayersChild::Write [PLayersChild.cpp : 1422 + 0x13]
     r4 = 0x43fb2760    r5 = 0x433a5b60    r6 = 0xbef96fec    r7 = 0x433a5b60
     r8 = 0xbef96fe0    r9 = 0x43a17c9c   r10 = 0x414935c4    fp = 0x43a1b938
     sp = 0xbef96f08    pc = 0x40b5a123
    Found by: call frame info
 3  libxul.so!mozilla::layers::PLayersChild::Write [PLayersChild.cpp : 3047 + 0x5]
     r4 = 0x43fb2760    r5 = 0x433a5b60    r6 = 0xbef96fec    r7 = 0x433a5b60
     r8 = 0xbef96fe0    r9 = 0x43a17c9c   r10 = 0x414935c4    fp = 0x43a1b938
     sp = 0xbef96f20    pc = 0x40b5b461
    Found by: call frame info
 4  libxul.so!mozilla::layers::PLayersChild::Write [PLayersChild.cpp : 1673 + 0x7]
     r4 = 0xbef96fec    r5 = 0x433a5b60    r6 = 0x43fb2760    r7 = 0x433a5b60
     r8 = 0xbef96fe0    r9 = 0x43a17c9c   r10 = 0x414935c4    fp = 0x43a1b938
     sp = 0xbef96f30    pc = 0x40b5b4f1
    Found by: call frame info
 5  libxul.so!mozilla::layers::PLayersChild::Write [PLayersChild.cpp : 1605 + 0x3]
     r4 = 0x00000164    r5 = 0x00000001    r6 = 0x43fb2760    r7 = 0x433a5b60
     r8 = 0xbef96fe0    r9 = 0x43a17c9c   r10 = 0x414935c4    fp = 0x43a1b938
     sp = 0xbef96f58    pc = 0x40b5b5b1
    Found by: call frame info
 6  libxul.so!mozilla::layers::PLayersChild::SendUpdateNoSwap [PLayersChild.cpp : 322 + 0x3]
     r4 = 0x43fb2760    r5 = 0x433a5b60    r6 = 0x00200006    r7 = 0xbef96fe0
     r8 = 0xbef98230    r9 = 0x43a17c9c   r10 = 0x414935c4    fp = 0x43a1b938
     sp = 0xbef96f78    pc = 0x40b5d6a9
    Found by: call frame info
 7  libxul.so!mozilla::layers::ShadowLayerForwarder::EndTransaction [ShadowLayers.cpp : 377 + 0x7]
     r4 = 0x43a1b8e0    r5 = 0xbef98230    r6 = 0x43a17c88    r7 = 0x43a1b928
     r8 = 0x43a1b924    r9 = 0x43a1b914   r10 = 0x414935c4    fp = 0x43a1b938
     sp = 0xbef96fa8    pc = 0x40c5f81d
    Found by: call frame info
 8  libxul.so!mozilla::layers::BasicShadowLayerManager::ForwardTransaction [BasicLayerManager.cpp : 1205 + 0x9]
     r4 = 0x414935c4    r5 = 0x43a17c10    r6 = 0x00000000    r7 = 0xbef982dc
     r8 = 0x00000001    r9 = 0x43a1bac0   r10 = 0x43a23000    fp = 0xbef98d70
     sp = 0xbef982a8    pc = 0x40c3f553
    Found by: call frame info
 9  libxul.so!mozilla::layers::BasicShadowLayerManager::EndEmptyTransaction [BasicLayerManager.cpp : 1193 + 0x5]
     r4 = 0x43a17c10    r5 = 0x43351800    r6 = 0x43878270    r7 = 0x43a35600
     r8 = 0x00000001    r9 = 0x43a1bac0   r10 = 0x43a23000    fp = 0xbef98d70
     sp = 0xbef98c18    pc = 0x40c3f811
    Found by: call frame info
10  libxul.so!PresShell::Paint [nsPresShell.cpp : 5316 + 0x1]
     r4 = 0x43a17c10    r5 = 0x43351800    r6 = 0x43878270    r7 = 0x43a35600
     r8 = 0x00000001    r9 = 0x43a1bac0   r10 = 0x43a23000    fp = 0xbef98d70
     sp = 0xbef98c20    pc = 0x40519a3b
    Found by: call frame info
11  libxul.so!nsViewManager::ProcessPendingUpdatesForView [nsViewManager.cpp : 431 + 0x1f]
     r4 = 0x00000000    r5 = 0x43a1bac0    r6 = 0x43a4d490    r7 = 0xbef98d70
     r8 = 0x40519925    r9 = 0x00000001   r10 = 0x43a35600    fp = 0xbef98e58
     sp = 0xbef98d68    pc = 0x4072c723
    Found by: call frame info
12  libxul.so!nsViewManager::ProcessPendingUpdates [nsViewManager.cpp : 1221 + 0x9]
     r4 = 0x43a4d490    r5 = 0xbef98eb8    r6 = 0x00000003    r7 = 0x4331c820
     r8 = 0xfffffffc    r9 = 0x414935c4   r10 = 0xbef98e0c    fp = 0xbef98e58
     sp = 0xbef98dc0    pc = 0x4072c7bf
    Found by: call frame info
13  libxul.so!nsRefreshDriver::Notify [nsRefreshDriver.cpp : 436 + 0x5]
     r4 = 0x4331c7f0    r5 = 0xbef98eb8    r6 = 0x00000003    r7 = 0x4331c820
     r8 = 0xfffffffc    r9 = 0x414935c4   r10 = 0xbef98e0c    fp = 0xbef98e58
     sp = 0xbef98dc8    pc = 0x4051d50f
    Found by: call frame info
14  libxul.so!nsTimerImpl::Fire [nsTimerImpl.cpp : 476 + 0x9]
     r4 = 0x43a2a7c0    r5 = 0x4331c7f0    r6 = 0x00000001    r7 = 0x00005988
     r8 = 0xbef98fa7    r9 = 0x41a06bac   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef98f20    pc = 0x40bdfbb5
    Found by: call frame info
15  libxul.so!nsTimerEvent::Run [nsTimerImpl.cpp : 556 + 0x5]
     r4 = 0x43a2a7c0    r5 = 0x00000000    r6 = 0x00000000    r7 = 0x00000001
     r8 = 0xbef98fa7    r9 = 0x41a06bac   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef98f58    pc = 0x40bdfc63
    Found by: call frame info
16  libxul.so!nsThread::ProcessNextEvent [nsThread.cpp : 620 + 0x5]
     r4 = 0x41a06b80    r5 = 0x00000000    r6 = 0x00000000    r7 = 0x00000001
     r8 = 0xbef98fa7    r9 = 0x41a06bac   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef98f60    pc = 0x40bddd9b
    Found by: call frame info
17  libxul.so!NS_ProcessNextEvent_P [nsThreadUtils.cpp : 237 + 0xb]
     r4 = 0x00000000    r5 = 0xbef998b8    r6 = 0x41a022f0    r7 = 0x00000001
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef98fa0    pc = 0x40bbe1bf
    Found by: call frame info
18  libxul.so!mozilla::ipc::MessagePump::Run [MessagePump.cpp : 82 + 0x7]
     r4 = 0x41a022e0    r5 = 0xbef998b8    r6 = 0x41a022f0    r7 = 0x00000001
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef98fb0    pc = 0x40ad7ce9
    Found by: call frame info
19  libxul.so!mozilla::ipc::MessagePumpForChildProcess::Run [MessagePump.cpp : 231 + 0x7]
     r4 = 0xbef998b8    r5 = 0x41a022e0    r6 = 0xbef998b8    r7 = 0x00000001
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef98fd8    pc = 0x40ad7d9b
    Found by: call frame info
20  libxul.so!MessageLoop::RunInternal [message_loop.cc : 216 + 0x5]
     r4 = 0xbef998b8    r5 = 0x43372340    r6 = 0x41a06b80    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef98ff0    pc = 0x40bffbb1
    Found by: call frame info
21  libxul.so!MessageLoop::Run [message_loop.cc : 209 + 0x5]
     r4 = 0xbef998b8    r5 = 0x43372340    r6 = 0x41a06b80    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef98ff8    pc = 0x40bffc67
    Found by: call frame info
22  libxul.so!nsBaseAppShell::Run [nsBaseAppShell.cpp : 163 + 0x7]
     r4 = 0x00000000    r5 = 0x43372340    r6 = 0x41a06b80    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef99010    pc = 0x40a5e2f1
    Found by: call frame info
23  libxul.so!XRE_RunAppShell [nsEmbedFunctions.cpp : 646 + 0x5]
     r4 = 0xbef99024    r5 = 0x41a022e0    r6 = 0x00000002    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef99020    pc = 0x403fae5d
    Found by: call frame info
24  libxul.so!mozilla::ipc::MessagePumpForChildProcess::Run [MessagePump.cpp : 198 + 0x3]
     r4 = 0xbef998b8    r5 = 0x41a022e0    r6 = 0x00000002    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef99038    pc = 0x40ad7d69
    Found by: call frame info
25  libxul.so!MessageLoop::RunInternal [message_loop.cc : 216 + 0x5]
     r4 = 0xbef998b8    r5 = 0x41a31190    r6 = 0x00000002    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef99050    pc = 0x40bffbb1
    Found by: call frame info
26  libxul.so!MessageLoop::Run [message_loop.cc : 209 + 0x5]
     r4 = 0xbef998b8    r5 = 0x41a31190    r6 = 0x00000002    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef99058    pc = 0x40bffc67
    Found by: call frame info
27  libxul.so!XRE_InitChildProcess [nsEmbedFunctions.cpp : 485 + 0xb]
     r4 = 0xbef998b8    r5 = 0x41a31190    r6 = 0x00000002    r7 = 0x00000003
     r8 = 0x41a23000    r9 = 0x41a28000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef99070    pc = 0x403fb201
    Found by: call frame info
28  plugin-container!main [MozillaRuntimeMain.cpp : 48 + 0x5]
     r4 = 0xbef99a14    r5 = 0x00000005    r6 = 0x00000006    r7 = 0xbef99a30
     r8 = 0x00000000    r9 = 0x00000000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef999e8    pc = 0x00008411
    Found by: call frame info
29  libc.so!__libc_init [libc_init_dynamic.c : 114 + 0x7]
     r4 = 0x000083d4    r5 = 0xbef99a14    r6 = 0x00000006    r7 = 0xbef99a30
     r8 = 0x00000000    r9 = 0x00000000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef999f8    pc = 0x400eda77
    Found by: call frame info
30  0xb00045a9
     r4 = 0x00000000    r5 = 0x00000000    r6 = 0x00000000    r7 = 0x00000000
     r8 = 0x00000000    r9 = 0x00000000   r10 = 0x00000000    fp = 0x00000000
     sp = 0xbef99a10    pc = 0xb00045ab
    Found by: call frame info
blocking-b2g: --- → tef?
(tef+ as this is a new stability issue that has appeared since the CS release.)
blocking-b2g: tef? → tef+
Greg, can you please attach the .extra file for this minidump?
Same as bug 848003 and bug 848037.
Crash Signature: [@ mozalloc_abort | NS_DebugBreak_P | mozilla::layers::PLayersChild::Write ]
No longer blocks: 848037
Attached file EXTRA file attachment
This .extra file snippet should be helpful:

Notes=xpcom_runtime_abort([Child 16377] ###!!! ABORT: NULL actor value passed to non-nullable param: file /local/mnt/workspace/lnxbuild/project/release_dev_msm7627a_737943/checkout/out/target/product/msm7627a/obj/objdir-gecko/ipc/ipdl/PLayersChild.cpp, line 1422)
So some codepath is not passing in a PLayerChild to Write().  I'm not clear on what the call stack is, so I can't say how that parameter is getting omitted.  But assuming it's a gfx/layers e10s issue, I'm foisting for now on Chris, who's touched this code recently, to either fix or find an owner.
Component: General → Graphics: Layers
Product: Boot2Gecko → Core
Assignee: nobody → chrislord.net
This sounds like it would be easy to diagnose, but I have no b2g hardware at the moment and I'm on PTO next week (I'm also not familiar with building b2g, though I assume that's not too big a deal).

Reassigning to Joe to find a taker on gfx team.
Assignee: chrislord.net → joe
Assignee: joe → milan
Jeff, can you take a look?
Assignee: milan → jmuizelaar
Jeff, have you had a chance to look at this? Any idea when a fix will be available?
Flags: needinfo?(jmuizelaar)
GDC has gotten in the way of looking at this.
Attached file PLayersChild.cpp
PLayersChild.cpp from Firefox OS v1.01.
(In reply to Sotaro Ikeda [:sotaro] from comment #11)
> Created attachment 730747 [details]
> PLayersChild.cpp
> 
> PLayersChild.cpp from Firefox OS v1.01.


"PLayersChild.cpp : 1422" is the following abort. If that is correct, PLayerChild is null.

-------------------------------------------
void
PLayersChild::Write(
        PLayerChild* __v,
        Message* __msg,
        bool __nullable)
{
    int32_t id;
    if ((!(__v))) {
        if ((!(__nullable))) {
            NS_RUNTIMEABORT("NULL actor value passed to non-nullable param");
        }
        id = 0;
    }
From the crash log, the function call sequence is as follows.

PresShell::Paint()
->BasicShadowLayerManager::EndEmptyTransaction()
->BasicShadowLayerManager::ForwardTransaction()
->ShadowLayerForwarder::EndTransaction()
->PLayersChild::SendUpdateNoSwap()
->PLayersChild::Write(const InfallibleTArray<Edit>& __v, Message* __msg)
->PLayersChild::Write(const Edit& __v, Message* __msg)
->PLayersChild::Write(const OpSetLayerAttributes& __v,Message* __msg)//((__v).get_OpCreateRefLayer(), __msg);
->PLayersChild::Write(PLayerChild* __v, Message* __msg, bool __nullable)//((__v).layerChild(), __msg, false);//
->abort!!!! //"PLayerChild* __v" might be null
(In reply to Sotaro Ikeda [:sotaro] from comment #13)
> From crash log, function call sequence is like following.
> ->PLayersChild::Write(PLayerChild* __v, Message* __msg, bool
> __nullable)//((__v).layerChild(), __msg, false);//
> ->abort!!!! //"PLayerChild* __v" might be null

PLayerChild is set in the following:

void
ShadowLayerForwarder::CreatedRefLayer(ShadowableLayer* aRef)
{
  CreatedLayer<OpCreateRefLayer>(mTxn, aRef);
}

The argument should not be nullptr because of the following code... weird.

template<typename CreatedMethod> void
MaybeCreateShadowFor(BasicShadowableLayer* aLayer,
                     BasicShadowLayerManager* aMgr,
                     CreatedMethod aMethod)
{
  if (!aMgr->HasShadowManager()) {
    return;
  }

  PLayerChild* shadow = aMgr->ConstructShadowFor(aLayer);
  // XXX error handling
  NS_ABORT_IF_FALSE(shadow, "failed to create shadow");

  aLayer->SetShadow(shadow);
  (aMgr->*aMethod)(aLayer);
  aMgr->Hold(aLayer->AsLayer());
}

#define MAYBE_CREATE_SHADOW(_type)                                      \
  MaybeCreateShadowFor(layer, this,                                     \
                       &ShadowLayerForwarder::Created ## _type ## Layer)
MAYBE_CREATE_SHADOW(Ref) is called in the following code:
-----------------------------------------------
already_AddRefed<RefLayer>
BasicShadowLayerManager::CreateRefLayer()
{
  NS_ASSERTION(InConstruction(), "Only allowed in construction phase");
  nsRefPtr<BasicShadowableRefLayer> layer =
    new BasicShadowableRefLayer(this);
  MAYBE_CREATE_SHADOW(Ref);
  return layer.forget();
}
Whiteboard: [b2g-crash][BTG-1244] → [b2g-crash][CR 462592]
Assignee: jmuizelaar → sotaro.ikeda.g
> 4  libxul.so!mozilla::layers::PLayersChild::Write [PLayersChild.cpp : 1673 + 0x7]
>     r4 = 0xbef96fec    r5 = 0x433a5b60    r6 = 0x43fb2760    r7 = 0x433a5b60
>     r8 = 0xbef96fe0    r9 = 0x43a17c9c   r10 = 0x414935c4    fp = 0x43a1b938
>     sp = 0xbef96f30    pc = 0x40b5b4f1

In my v1.01 environment, PLayersChild.cpp : 1673 is the following. But the TOpCreateRefLayer case should not be reached in a child process, so this frame information might be wrong. According to the log, the crash happened in a child process.

>    case __type::TOpCreateRefLayer:
>        {
>            Write((__v).get_OpCreateRefLayer(), __msg); // here !!!!!
>            return;
>        }
When music is playing in the background, no rendering happens for the music app. Therefore the crash log should not be related to the music app. Maybe it is the dialer app.
Greg, does the crash still happen? Bug 851659 was closed by bug 851659 comment #12.
Flags: needinfo?(jmuizelaar) → needinfo?(ggrisco)
Yes, we've seen this same stacktrace in just about every stability run, even recently.
Flags: needinfo?(ggrisco)
(In reply to Greg Grisco from comment #19)
> Yes, we've seen this same stacktrace in just about every stability run, even
> recently.

I cannot reproduce the problem. How can I reproduce it?
Greg, is it possible to reproduce it on my phone? Is there such a test case available publicly?
Flags: needinfo?(ggrisco)
(In reply to Sotaro Ikeda [:sotaro] from comment #21)
> Greg, is it possible to reproduce it on my phone? Is there such a test case
> available publicly?

Greg, one more question. How long did it take until the crash?
Greg, do you know how to get a decoded minidump of the crash? You attached the minidump to the bug. I do not know how to get the minidump.
Flags: needinfo?(ggrisco)
Hi Sotaro, I'm trying to get more information on STR, I'll report back here if I'm successful.  In the meantime, I attached the minidump.
(In reply to Greg Grisco from comment #25)
> Hi Sotaro, I'm trying to get more information on STR, I'll report back here
> if I'm successful.  In the meantime, I attached the minidump.

Greg, thanks.
The crash might have been caused by an IPC failure or by hitting the file descriptor rlimit. I think that is a possibility.
I created a custom ROM that decreases the file descriptor rlimit to 160. I got this idea from bug 853977, because the b2g process's file descriptor count increases very rapidly when multiple apps are running. I easily confirmed that the b2g process's file descriptor count went over 1050. The following are the system's default file descriptor rlimits.
 - soft limit: 1024
 - hard limit: 4096
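For anyone who wants to watch the descriptor count from inside a process rather than from a shell, here is a minimal sketch. It is an illustration only, not code from the tree: the function names `CountOpenFds` and `ReportFdUsage` are made up for this example, and it assumes a Linux-style /proc filesystem.

```cpp
#include <cassert>
#include <cstdio>
#include <dirent.h>
#include <sys/resource.h>

// Counts the entries in /proc/self/fd (Linux-specific assumption).
// Returns -1 if the directory cannot be opened. The count includes the
// descriptor that opendir() itself holds while scanning.
long CountOpenFds() {
  DIR* dir = opendir("/proc/self/fd");
  if (!dir) {
    return -1;
  }
  long count = 0;
  while (struct dirent* entry = readdir(dir)) {
    if (entry->d_name[0] != '.') {  // skip "." and ".."
      ++count;
    }
  }
  closedir(dir);
  return count;
}

// Prints the descriptor count next to the RLIMIT_NOFILE soft and hard limits.
// Returns false if the limits could not be read.
bool ReportFdUsage() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
    return false;
  }
  assert(lim.rlim_cur <= lim.rlim_max);  // the soft limit never exceeds the hard limit
  std::printf("open fds: %ld, soft limit: %llu, hard limit: %llu\n",
              CountOpenFds(),
              static_cast<unsigned long long>(lim.rlim_cur),
              static_cast<unsigned long long>(lim.rlim_max));
  return true;
}
```

Calling `ReportFdUsage()` periodically during a stability run would show how close the process is to its soft limit before allocations start failing.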
A stack trace of the setting app's crash by using the rom in comment #28
(In reply to Sotaro Ikeda [:sotaro] from comment #29)
> Created attachment 734313 [details]
> stack track of the setting app's crash
> 
> A stack trace of the setting app's crash by using the rom in comment #28

The stack trace is almost the same as attachment 733961 [details].
So I tend to think the problem happened because of an IPC failure or hitting the file descriptor rlimit.
(In reply to Sotaro Ikeda [:sotaro] from comment #28)
> I created a custom ROM that decreases the file descriptor rlimit to 160. I got
> this idea from bug 853977, because the b2g process's file descriptor count
> increases very rapidly when multiple apps are running. I easily confirmed
> that the b2g process's file descriptor count went over 1050. The following are
> the system's default file descriptor rlimits.
>  - soft limit: 1024
>  - hard limit: 4096

Is there a cost to setting the soft limit to the hard limit?
(In reply to Milan Sreckovic [:milan] from comment #32)
> (In reply to Sotaro Ikeda [:sotaro] from comment #28)
> > I created a custom ROM that decreases the file descriptor rlimit to 160. I got
> > this idea from bug 853977, because the b2g process's file descriptor count
> > increases very rapidly when multiple apps are running. I easily confirmed
> > that the b2g process's file descriptor count went over 1050. The following are
> > the system's default file descriptor rlimits.
> >  - soft limit: 1024
> >  - hard limit: 4096
> 
> Is there a cost to setting the soft limit to the hard limit?

:dhylands, do you know about the rlimit?
Flags: needinfo?(dhylands)
When the process hits the soft limit, then resource allocations will fail.

An unprivileged process is allowed to increase its soft limit up to the hard-limit.

Only root is able to increase the hard-limit.

As far as I can tell, increasing the soft-limit doesn't consume any extra resources (i.e. it's just a number). The resources themselves are all allocated dynamically.
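As a rough illustration of the point above, an unprivileged process can raise its own soft limit up to the hard limit with the standard POSIX `getrlimit`/`setrlimit` calls. This is only a sketch: the function name `RaiseFdSoftLimitToHard` is hypothetical, and where such a call would belong in b2g startup is not something this thread settles.

```cpp
#include <cassert>
#include <sys/resource.h>

// Raises the RLIMIT_NOFILE soft limit to the current hard limit.
// Returns true on success. No privilege is needed because the hard limit
// itself is left unchanged.
bool RaiseFdSoftLimitToHard() {
  struct rlimit lim;
  if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
    return false;
  }
  if (lim.rlim_cur == lim.rlim_max) {
    return true;  // already at the ceiling
  }
  lim.rlim_cur = lim.rlim_max;
  return setrlimit(RLIMIT_NOFILE, &lim) == 0;
}
```

With the defaults quoted earlier (soft 1024, hard 4096), a successful call would leave the soft limit at 4096.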
Flags: needinfo?(dhylands)
Greg, can you run the stability test with the file descriptor rlimit increased?
(In reply to Sotaro Ikeda [:sotaro] from comment #29)
> Created attachment 734313 [details]
> stack track of the setting app's crash
> 
> A stack trace of the setting app's crash by using the rom in comment #28

It is still not clear what the cause of the bug is. It may or may not be related to the file descriptor rlimit. attachment 734313 [details] is almost the same crash as this bug. From attachment 734313 [details], we can see how such a crash happens.
I manually added some logs and confirmed that the following happens before the crash of attachment 734313 [details].

- [1] ipc fail
- [2] call AsyncChannel::NotifyMaybeChannelError()
- [3] call PCompositorChild::OnChannelError()
- [4] call PCompositorChild::DeallocSubtree()
- [5] delete ShadowLayersChild
- [5] delete all LayerChild
- [6] abort PLayersChild::Write() because PLayerChild is null
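The sequence above can be modelled in a few lines. This is a simplified sketch, not Gecko code: every type here (`LayerActor`, `Layer`, `LayersActor`, `CompositorActor`) is a hypothetical stand-in for the corresponding IPDL actor, and it assumes that each layer's cached actor pointer is cleared when the subtree is deallocated, which matches the null seen in the log but is not verified against the tree.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Layer;

// Hypothetical stand-in for a PLayerChild actor.
struct LayerActor {
  Layer* owner;
};

// Hypothetical stand-in for a client-side layer that caches its actor.
struct Layer {
  LayerActor* shadow = nullptr;
};

// Hypothetical stand-in for ShadowLayersChild, which manages the layer actors.
struct LayersActor {
  std::vector<std::unique_ptr<LayerActor>> managed;

  LayerActor* Construct(Layer* layer) {
    managed.push_back(std::unique_ptr<LayerActor>(new LayerActor{layer}));
    layer->shadow = managed.back().get();
    return layer->shadow;
  }

  // Step [5]: deleting the manager deletes every managed actor. Assumption:
  // each layer's cached pointer is cleared at that point.
  ~LayersActor() {
    for (auto& actor : managed) {
      actor->owner->shadow = nullptr;
    }
  }
};

// Hypothetical stand-in for PCompositorChild.
struct CompositorActor {
  std::unique_ptr<LayersActor> layers;

  // Steps [2]-[4]: a channel error deallocates the managed subtree.
  void OnChannelError() { layers.reset(); }
};

// Step [6]: mirrors the check in the generated Write(). The real code aborts;
// this sketch returns false so it can be exercised safely.
bool CanWriteActor(const LayerActor* actor, bool nullable) {
  return actor != nullptr || nullable;
}
```

After `OnChannelError()` runs, the layer's `shadow` is null and `CanWriteActor(layer.shadow, false)` returns false, which corresponds to the point where the real `Write()` hits NS_RUNTIMEABORT.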
(In reply to Sotaro Ikeda [:sotaro] from comment #37)
> I manually added some logs and confirmed that the following happens before the
> crash of attachment 734313 [details].
> 
> - [1] ipc fail
> - [2] call AsyncChannel::NotifyMaybeChannelError()
> - [3] call PCompositorChild::OnChannelError()
> - [4] call PCompositorChild::DeallocSubtree()
> - [5] delete ShadowLayersChild
> - [5] delete all LayerChild
> - [6] abort PLayersChild::Write() because PLayerChild is null

It seems that the IPC failure is the direct trigger of the crash. When an error is detected on the IPC channel, the IPDL object deletes all the sub-IPDL objects that it manages.
I got the adb log with "adb shell logcat -v threadtime > log.txt".

The following is the log of the related abort.
###!!! ABORT: NULL actor value passed to non-nullable param: file /home/sotaro/b2g_unagi_22/B2G/objdir-gecko/ipc/ipdl/PLayersChild.cpp, line 1422
Isn't it the same as bug 851667?
I think the source of the bug is the same.
IPC failure could also happen for other reasons, such as running out of memory.
Please add the qawanted keyword if QA can help verify whether this is fixed once bug 851667 lands.
(In reply to Greg Grisco from comment #25)
> Hi Sotaro, I'm trying to get more information on STR, I'll report back here
> if I'm successful.  In the meantime, I attached the minidump.

I'm not sure if it helps, but here's what I learned about the steps to produce the crash with this stacktrace.  It was seen under three different scenarios:

1 (steps performed manually)
  a. Play music in background in repeat mode.
  b. Enable auto answer and receive MT calls randomly from other phones.
  c. Make MO calls continuously using QXDM. (Short duration calls)
2 (steps performed manually)
  a. Make a DUN call. (Dial up networking call, here we will use phone as modem).
  b. After making DUN call, start YouTube streaming on PC.
  c. Make MO calls using the QXDM. (Short duration calls)
  d. After weekend run, mini dumps are generated in the phone.
3 (steps using script)
  a. Play Music in repeat mode in the background.
  b. Run a script which will do the following things sequentially.
    1. Airplane mode on and off,
    2. MO call and
    3. MO SMS.
    (E.g.: first, airplane mode is turned on and off 5 times; then it makes 5 MO calls, waiting 10 seconds on each call before ending it; then it sends 5 MO SMS.)
  c. After night run we have seen mini dumps in the phone.
In Firefox OS v1.01 and v1.1, ShadowLayersChild in the content process is not deleted except on IPC failure. And the crash happens when ShadowLayersChild is deleted and ShadowLayerForwarder does not know about it.
(In reply to Sotaro Ikeda [:sotaro] from comment #45)
> In Firefox OS v1.01 and v1.1, ShadowLayersChild in the content process is not
> deleted except on IPC failure. And the crash happens when ShadowLayersChild is
> deleted and ShadowLayerForwarder does not know about it.

Correction.
The crash happens when ShadowLayersChild is deleted and ShadowLayerForwarder does not know about it. ShadowLayersChild is not deleted while the corresponding ShadowLayerForwarder is still present, except in the IPC failure case.
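To make the "does not know about it" part concrete, here is a minimal sketch of what a notification could look like. It is purely an illustration of the missing link, not a proposed patch and not the handling that any related bug implements: `Forwarder` and `LayersActor` are hypothetical stand-ins, and only the name `HasShadowManager` is borrowed from the code quoted earlier in this bug.

```cpp
#include <cassert>
#include <cstdio>

class Forwarder;

// Hypothetical stand-in for ShadowLayersChild.
class LayersActor {
 public:
  explicit LayersActor(Forwarder* forwarder);
  ~LayersActor();

 private:
  Forwarder* forwarder_;
};

// Hypothetical stand-in for ShadowLayerForwarder.
class Forwarder {
 public:
  bool HasShadowManager() const { return actor_ != nullptr; }

  // Returns false and skips the send when the actor is already gone.
  bool EndTransaction() {
    if (!HasShadowManager()) {
      std::fprintf(stderr, "layers actor destroyed; dropping transaction\n");
      return false;
    }
    return true;  // a real implementation would serialize and send here
  }

 private:
  friend class LayersActor;
  LayersActor* actor_ = nullptr;
};

// The actor registers itself on creation and clears the forwarder's pointer
// on destruction, so the forwarder always learns about the teardown.
LayersActor::LayersActor(Forwarder* forwarder) : forwarder_(forwarder) {
  forwarder_->actor_ = this;
}

LayersActor::~LayersActor() { forwarder_->actor_ = nullptr; }
```

In this model, once the actor is destroyed `EndTransaction()` returns false instead of reaching the serialization code with a stale or null actor.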
Changing the bug name to a more accurate one.
Summary: Crash seen while playing music in background in repeat mode while on MO call → IPC failure during stability testing
Depends on: 860892
Bug 860892 will handle the crash by IPC failure. Keep this bug open to track why IPC failure happens.
(In reply to Sotaro Ikeda [:sotaro] from comment #48)
> Bug 860892 will handle the crash by IPC failure. Keep this bug open to track
> why IPC failure happens.

So, should this bug not be a blocker, and bug 860892 become the blocker instead?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Daniel Coloma:dcoloma from comment #49)
> (In reply to Sotaro Ikeda [:sotaro] from comment #48)
> > Bug 860892 will handle the crash by IPC failure. Keep this bug open to track
> > why IPC failure happens.
> 
> So, should this bug not be a blocker, and bug 860892 become the blocker
> instead?

Bug 860892 does not solve the problem. It just makes it clear that an IPC failure happened. It is an unrecoverable error, so it just aborts the process with an error log.

There is still the question of why the IPC failure happens. It might be just an out-of-memory condition, but that is not clear.
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Sotaro Ikeda [:sotaro] from comment #50)
> (In reply to Daniel Coloma:dcoloma from comment #49)
> > (In reply to Sotaro Ikeda [:sotaro] from comment #48)
> > > Bug 860892 will handle the crash by IPC failure. Keep this bug open to track
> > > why IPC failure happens.
> > 
> > So, should this bug not be a blocker, and bug 860892 become the blocker
> > instead?
> 
> Bug 860892 does not solve the problem. It just makes it clear that an IPC
> failure happened. It is an unrecoverable error, so it just aborts the process
> with an error log.
> 
> There is still the question of why the IPC failure happens. It might be just an
> out-of-memory condition, but that is not clear.

So I cannot say that this bug is not a blocker for now.
Depends on: 862097
From bug 862097, an IPC failure can be detected by the content process when the b2g process has crashed.
There are cases where the crash reporter does not send the b2g process's crash. That is bug 855040.
(In reply to Sotaro Ikeda [:sotaro] from comment #53)
> There are cases where the crash reporter does not send the b2g process's
> crash. That is bug 855040.

Bug 862097 is such a case. The crash happened in the b2g process first. Then the content process detected the IPC failure resulting from that crash. Only the content process failure was reported.
Whiteboard: [b2g-crash][CR 462592] → [b2g-crash][CR 462592][madrid]
Setting to FIXED because bug 862097 is FIXED. If another IPC connection failure happens, a new bug should be filed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Madrid (19apr)