Closed Bug 799538 Opened 9 years ago Closed 9 years ago

[Dialer App] Crash after navigating around the dial pad when run OOP

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-basecamp:+)

RESOLVED DUPLICATE of bug 808743
B2G C1 (to 19nov)
blocking-basecamp +

People

(Reporter: tchung, Assigned: hsinyi)

Details

(Whiteboard: [LOE:L])

Attachments

(3 files)

Attached file logcat
This is not always reproducible, but using the Dialer keypad on a connected call will sometimes crash the app and exit the call.  My recent repro and logcat attached.

Logcat
10-09 09:23:27.115: E/AudioHardwareMSM76XXA(112): setMode:mode = 0, status = 0
10-09 09:23:27.115: E/AudioPolicyManager(112): setPhoneState() state 0
10-09 09:23:27.115: W/AudioFlinger(112): Thread AudioOut_1 cannot connect to the power manager service
10-09 09:23:27.115: E/AudioHardwareMSM76XXA(112): Routing audio to Speakerphone
10-09 09:23:27.115: E/AudioHardwareMSM76XXA(112): doAudioRouteOrMute, mFM_route = -1 , device=6
10-09 09:23:27.115: E/AudioHardwareMSM76XXA(112): rpc_snd_set_device(6, 1, 0)
10-09 09:23:27.115: E/AudioHardwareMSM76XXA(112): rpc_snd_set_volume(38, 0, 7)
10-09 09:23:27.125: E/GeckoConsole(3072): [JavaScript Warning: "Error in parsing value for 'font-size'.  Declaration dropped." {file: "app://communications.gaiamobile.org/dialer/index.html#keyboard-view" line: 0}]
10-09 09:23:27.145: D/memalloc(453): /dev/pmem: Allocated buffer base:0x4bb00000 size:614400 offset:2252800 fd:127
10-09 09:23:27.155: D/memalloc(3072): /dev/pmem: Mapped buffer base:0x463c4000 size:2867200 offset:2252800 fd:44
10-09 09:23:27.235: D/memalloc(453): /dev/pmem: Allocated buffer base:0x4bb00000 size:614400 offset:3481600 fd:142
10-09 09:23:27.245: D/memalloc(3072): /dev/pmem: Mapped buffer base:0x466f0000 size:4096000 offset:3481600 fd:47
10-09 09:23:27.245: E/AudioHardwareMSM76XXA(112): setMode:mode = 0, status = -17
10-09 09:23:27.245: E/AudioPolicyManager(112): setPhoneState() state 0
10-09 09:23:27.245: W/AudioPolicyManager(112): setPhoneState() setting same state 0
10-09 09:23:27.245: E/GeckoConsole(453): [JavaScript Warning: "Error in parsing value for 'font-size'.  Declaration dropped." {file: "app://communications.gaiamobile.org/dialer/index.html#keyboard-view" line: 0}]
10-09 09:23:27.365: E/GeckoConsole(3072): Content JS LOG at app://communications.gaiamobile.org/dialer/js/recents_db.js:20 in anonymous: DB Opened
10-09 09:23:27.385: E/GeckoConsole(453): Content JS LOG at app://communications.gaiamobile.org/dialer/js/recents_db.js:20 in anonymous: DB Opened
10-09 09:23:27.566: I/IdleService(453): Get idle time: time since reset 747 msec
10-09 09:23:27.566: I/IdleService(453): Idle timer callback: current idle time 747 msec
10-09 09:23:27.566: I/IdleService(453): next timeout 1349799807835931 usec (252 msec from now)
10-09 09:23:27.566: I/IdleService(453): SetTimerExpiryIfBefore: next timeout 1349799807835931 usec
10-09 09:23:27.566: I/IdleService(453): reset timer expiry from 0 usec to 1349799807845931 usec
10-09 09:23:27.586: I/ONCRPC(110): Setup RPC Call for task 4037e420
10-09 09:23:27.586: I/ONCRPC(110): oncrpc_xdr_call_msg_start: Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
10-09 09:23:27.586: I/ONCRPC(110): xdr_std_msg_send_call: Sent Xid: 124, Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
10-09 09:23:27.586: I/ONCRPC(110): xdr_std_msg_send_call: Received Reply Xid: 124, Prog: 30000000, Ver: ddc9bd91, Proc: 000000b0
10-09 09:23:27.586: I/ONCRPC(110): rpc_handle_rpc_call: for Xid: 72b, Prog: 31000000, Vers: fc37ad5c, Proc: 00000004
10-09 09:23:27.586: I/ONCRPC(110): rpc_handle_rpc_call: Find Status: 0 Xid: 72b
10-09 09:23:27.586: I/ONCRPC(110): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 72b
10-09 09:23:27.596: I/ONCRPC(110): rpc_handle_rpc_call: for Xid: 72c, Prog: 31000000, Vers: fc37ad5c, Proc: 0000000f
10-09 09:23:27.596: I/ONCRPC(110): rpc_handle_rpc_call: Find Status: 0 Xid: 72c
10-09 09:23:27.596: I/ONCRPC(110): oncrpc_proxy_handle_cmd_rpc_call: Dispatching xid: 72c
10-09 09:23:27.716: E/GeckoConsole(3072): Content JS LOG at app://communications.gaiamobile.org/dialer/js/recents_db.js:45 in rbdm_close: DB Closed
10-09 09:23:27.726: D/memalloc(3072): /dev/pmem: Unmapping buffer base:0x463c4000 size:2867200 offset:2252800
10-09 09:23:27.726: E/GeckoConsole(453): Content JS LOG at app://communications.gaiamobile.org/dialer/js/recents_db.js:45 in rbdm_close: DB Closed
10-09 09:23:27.776: D/memalloc(453): /dev/pmem: Freeing buffer base:0x4bd26000 size:614400 offset:2252800 fd:127
10-09 09:23:27.776: D/memalloc(3072): /dev/pmem: Unmapping buffer base:0x466f0000 size:4096000 offset:3481600
10-09 09:23:27.776: D/memalloc(453): /dev/pmem: Freeing buffer base:0x4be52000 size:614400 offset:3481600 fd:142
10-09 09:23:27.866: I/IdleService(453): Get idle time: time since reset 1049 msec
10-09 09:23:27.866: I/IdleService(453): Idle timer callback: current idle time 1049 msec
10-09 09:23:27.866: I/IdleService(453): next timeout 5644767101835931 usec (4294967293950 msec from now)
10-09 09:23:27.866: I/IdleService(453): SetTimerExpiryIfBefore: next timeout 5644767101835931 usec
10-09 09:23:27.866: I/IdleService(453): reset timer expiry from 0 usec to 5644767101845931 usec
10-09 09:23:27.866: I/IdleService(453): Idle timer callback: tell observer 48c30d30 user is idle
10-09 09:23:27.866: I/IdleService(453): Get idle time: time since reset 1049 msec
10-09 09:23:27.886: D/memalloc(453): /dev/pmem: Freeing buffer base:0x4bc90000 size:614400 offset:1638400 fd:128
10-09 09:23:27.886: D/memalloc(3072): /dev/pmem: Unmapping buffer base:0x45b00000 size:2252800 offset:1638400
10-09 09:23:27.886: D/memalloc(453): /dev/pmem: Freeing buffer base:0x4bdbc000 size:614400 offset:2867200 fd:150
10-09 09:23:27.916: I/Gecko(3072): [Child 3072] ###!!! ABORT: aborting because of fatal error: file /data/jenkins/jobs/build-otoro-ics-eu/workspace/gecko/dom/ipc/ContentChild.cpp, line 820

Repro:
1) install otoro EU daily 10-09-2012
** Gaia: 5f7dbe06490e6b52a356e033f52fb31c0fa0e2d0
** gecko: 1f41d3603a035d64c3b1d7c16dba5fd1ce322957
2) launch dialer, and make a call to someone automated (eg. 650-903-0800)
3) open the keypad, and hit a bunch of numbers along with the guided automated voice
4) app crashes

Expected:
- interaction with the dialer app should not crash

Actual:
- crash and exit the call.
This will probably be way easier to track once we get the memory consumption monitoring (something like the framerate on the top left of the screen). Vivien has something in the works.
blocking-basecamp: ? → +
Hsinyi, please follow up this issue, thanks.
Assignee: nobody → htsai
Here's my steps:
1) launch dialer, and make a call to someone automated 
2) open the keypad, and hit a bunch of numbers along with the guided automated voice
3) hang up the call
4) app crashes

Note, it is easier to reproduce this issue if more steps are taken according to the guided voice. When I took only one step, i.e. press one time, I never encountered the crash.

<<The gdb log>>
=========================================
:~/WorkSpace/mozilla/B2G-otoro$ ./run-gdb.sh attach 1157
Attached; pid = 1157
Listening on port 12157
prebuilt/linux-x86/toolchain/arm-linux-androideabi-4.4.x/bin/arm-linux-androideabi-gdb -x /tmp/b2g.gdbinit.hsinyi /home/WorkSpace/mozilla/B2G-otoro/objdir-gecko/dist/bin/plugin-container
GNU gdb (GDB) 7.1-android-gg2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-linux-gnu --target=arm-elf-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/WorkSpace/mozilla/B2G-otoro/objdir-gecko/dist/bin/plugin-container...done.
Remote debugging from host 127.0.0.1
warning: .dynamic section for "/home/WorkSpace/mozilla/B2G-otoro/out/target/product/otoro/symbols/system/lib/libgui.so" is not at the expected address (wrong library or version mismatch?)
__divsi3 () at /tmp/android-build-bb7e003d31d08f72cabc269a652912b7/src/build/../gcc/gcc-4.4.3/libgcc/../gcc/config/arm/lib1funcs.asm:961
961	/tmp/android-build-bb7e003d31d08f72cabc269a652912b7/src/build/../gcc/gcc-4.4.3/libgcc/../gcc/config/arm/lib1funcs.asm: No such file or directory
	in /tmp/android-build-bb7e003d31d08f72cabc269a652912b7/src/build/../gcc/gcc-4.4.3/libgcc/../gcc/config/arm/lib1funcs.asm
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x411388cc in mozalloc_abort (msg=<value optimized out>) at /home/WorkSpace/mozilla/B2G-otoro/gecko/memory/mozalloc/mozalloc_abort.cpp:23
23	    MOZ_CRASH();
(gdb) bt
#0  0x411388cc in mozalloc_abort (msg=<value optimized out>) at /home/WorkSpace/mozilla/B2G-otoro/gecko/memory/mozalloc/mozalloc_abort.cpp:23
#1  0x400d4404 in pthread_key_delete (key=10) at bionic/libc/bionic/pthread.c:1701
#2  0x3120646c in ?? ()
#3  0x3120646c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Built version for comment 3:
gecko: 1f41d3603a035d64c3b1d7c16dba5fd1ce322957
gaia: 87d892f9dd19ef960dad715014945be714120313
More debug info:
========================================
Every 1.0s: adb shell procrank | egrep "(RAM|b2g)"                                                                                                                    Mon Oct 15 17:07:37 2012

  105   83080K   65780K   58905K   56264K  /system/b2g/b2g
  366   56532K   33720K   26603K   23752K  /system/b2g/plugin-container
  261   39596K   31548K   24511K   21732K  /system/b2g/plugin-container
  436   14144K   14140K    9909K    8640K  /system/b2g/plugin-container
RAM: 172880K total, 23512K free, 0K buffers, 17784K cached, 100K shmem, 8228K slab

=============================
Process 366 is the one dialer app ran upon. When it crashed, there was 23512K free memory. We may exclude OOM from the cause of this issue.
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #5)
> More debug info:
> ========================================
> Every 1.0s: adb shell procrank | egrep "(RAM|b2g)"                          
> Mon Oct 15 17:07:37 2012
> 
>   105   83080K   65780K   58905K   56264K  /system/b2g/b2g
>   366   56532K   33720K   26603K   23752K  /system/b2g/plugin-container
>   261   39596K   31548K   24511K   21732K  /system/b2g/plugin-container
>   436   14144K   14140K    9909K    8640K  /system/b2g/plugin-container
> RAM: 172880K total, 23512K free, 0K buffers, 17784K cached, 100K shmem,
> 8228K slab
> 
> =============================
> Process 366 is the one dialer app ran upon. When it crashed, there was
> 23512K free memory. We may exclude OOM from the cause of this issue.

Thanks!

Tony, can you still reproduce the bug?
(In reply to Etienne Segonzac (:etienne) from comment #6)
> 
> Thanks!
> 
> Tony, can you still reproduce the bug?
Unfortunately, I can... Still trying to finding out the actual cause...
gdb log - set breakpoint at mozalloc_abort & backtrace
Comment on attachment 671392 [details]
gdblog - breakpoint at mozalloc_abort

More debug info -- gdb
===========================
Breakpoint 1, mozalloc_abort (
    msg=0xbe8d0aa0 "[Child 2356] ###!!! ABORT: aborting because of fatal error: file /home/hsinyi/WorkSpace/mozilla/B2G-otoro/gecko/dom/ipc/ContentChild.cpp, line 831")
    at /home/hsinyi/WorkSpace/mozilla/B2G-otoro/gecko/memory/mozalloc/mozalloc_abort.cpp:21
21	    fputs(msg, stderr);
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #9)
> Comment on attachment 671392 [details]
> gdblog - breakpoint at mozalloc_abort
> 
> More debug info -- gdb
> ===========================
> Breakpoint 1, mozalloc_abort (
>     msg=0xbe8d0aa0 "[Child 2356] ###!!! ABORT: aborting because of fatal
> error: file
> /home/hsinyi/WorkSpace/mozilla/B2G-otoro/gecko/dom/ipc/ContentChild.cpp,
> line 831")
>     at
> /home/hsinyi/WorkSpace/mozilla/B2G-otoro/gecko/memory/mozalloc/
> mozalloc_abort.cpp:21
> 21	    fputs(msg, stderr);

gecko commit: 1ec4bdea9de7c0567810fd09cb55b02304b3186d
looking deeper in the gdb log, it shows the result code is "code=mozilla::ipc::HasResultCodes::MsgRouteError"
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #11)
> looking deeper in the gdb log, it shows the result code is
> "code=mozilla::ipc::HasResultCodes::MsgRouteError"

for case MsgRouteError: errorMsg is "Route error: message sent to unknown actor ID"
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #12)
> (In reply to Hsin-Yi Tsai [:hsinyi] from comment #11)
> > looking deeper in the gdb log, it shows the result code is
> > "code=mozilla::ipc::HasResultCodes::MsgRouteError"
> 
> for case MsgRouteError: errorMsg is "Route error: message sent to unknown
> actor ID"

I pasted another frame for bt dump.

  0x40ad333e in mozilla::ipc::AsyncChannel::MaybeHandleError (this=0x33f, 
    code=mozilla::ipc::HasResultCodes::MsgRouteError, channelName=<value optimized out>)
    at /home/vincentliu/projs/otoro/gecko/ipc/glue/AsyncChannel.cpp:613

Also cat /proc/<pid>/maps, it is in 
403eb000-412a1000 r-xp 00064000 1f:05 264        /system/b2g/libxul.so
Hi Hsin-Yi,

The frame I pasted in Comment 13 happened before you noted in Comment 11.
(In reply to vliu from comment #13)
> (In reply to Hsin-Yi Tsai [:hsinyi] from comment #12)
> > (In reply to Hsin-Yi Tsai [:hsinyi] from comment #11)
> > > looking deeper in the gdb log, it shows the result code is
> > > "code=mozilla::ipc::HasResultCodes::MsgRouteError"
> > 
> > for case MsgRouteError: errorMsg is "Route error: message sent to unknown
> > actor ID"
> 
> I pasted another frame for bt dump.
> 
>   0x40ad333e in mozilla::ipc::AsyncChannel::MaybeHandleError (this=0x33f, 
>     code=mozilla::ipc::HasResultCodes::MsgRouteError, channelName=<value
> optimized out>)
>     at /home/vincentliu/projs/otoro/gecko/ipc/glue/AsyncChannel.cpp:613
> 
> Also cat /proc/<pid>/maps, it is in 
> 403eb000-412a1000 r-xp 00064000 1f:05 264        /system/b2g/libxul.so

From dump in Comment 13, the channelName had been optimized and couldn't been recognized. Do you think if it is a problem? 

channelName=<value optimized out>
(In reply to vliu from comment #15)
> 
> From dump in Comment 13, the channelName had been optimized and couldn't
> been recognized. Do you think if it is a problem? 
> 
> channelName=<value optimized out>
I think nothing wrong with "<value optimized out>" displayed by gdb.
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #16)
> (In reply to vliu from comment #15)
> > 
> > From dump in Comment 13, the channelName had been optimized and couldn't
> > been recognized. Do you think if it is a problem? 
> > 
> > channelName=<value optimized out>
> I think nothing wrong with "<value optimized out>" displayed by gdb.

Yes, you are right. It means this value is extracted from register and not from stack frame.
I did some hacking work to get IPCMessage type by gdb bt when crash occurred. The message type is 1966087. Referring to [1], this type stands for the message "PIndexedDBTransaction::Msg_Abort__ID". 

[1] https://wiki.mozilla.org/Electrolysis/Debugging#Working_backwards_from_a_C.2B.2B_Message_to_its_IPDL_message
Whiteboard: [LOE:M]
Hi Hsin-Yi,

I set a breakpoint at mozalloc_abort() line 21(matched to fputs(...)). I also declared a global variable in /gecko/ipc/glue/AsyncChannel.cpp showed below.

volatile uint16_t check_msg_type = 0;

I set this check_msg_type in OnDispatchMessage(...)
void
AsyncChannel::OnDispatchMessage(const Message& msg)
{
                  :
    check_msg_type = msg.type();
    (void)MaybeHandleError(mListener->OnMessageReceived(msg), "AsyncChannel");
}

When the program stopped at the breakpoint, I viewed this value is 7.

I also found another function in AsyncChannel.cpp also using msg.type(). Please see below. 

AsyncChannel::MaybeInterceptSpecialIOMessage(const Message& msg)
{
    if (MSG_ROUTING_NONE == msg.routing_id()
        && GOODBYE_MESSAGE_TYPE == msg.type()) {
        ProcessGoodbyeMessage();
        return true;
    }
}

I tried to find the definition of GOODBYE_MESSAGE_TYPE, it defined in /gecko/ipc/glue/ProtocolUtils.h

enum {
    CHANNEL_OPENED_MESSAGE_TYPE = kuint16max - 6,
    SHMEM_DESTROYED_MESSAGE_TYPE = kuint16max - 5,
    UNBLOCK_CHILD_MESSAGE_TYPE = kuint16max - 4,
    BLOCK_CHILD_MESSAGE_TYPE   = kuint16max - 3,
    SHMEM_CREATED_MESSAGE_TYPE = kuint16max - 2,
    GOODBYE_MESSAGE_TYPE       = kuint16max - 1
};

It seems that msg.type() relatives to the above enum table. I would still try to find out any relationship between what I found and [1].
I cannot reproduce this issue if the dialer app is moved from OOP list.
Summary: [Dialer App] Crash after navigating around the dial pad → [Dialer App] Crash after navigating around the dial pad when run OOP
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #18)
> I did some hacking work to get IPCMessage type by gdb bt when crash
> occurred. The message type is 1966087. Referring to [1], this type stands
> for the message "PIndexedDBTransaction::Msg_Abort__ID". 
> 
> [1]
> https://wiki.mozilla.org/Electrolysis/Debugging#Working_backwards_from_a_C.
> 2B.2B_Message_to_its_IPDL_message

Oh, the message type 1966087 should be "PIndexedDBTransaction::Msg_AllRequestsFinished__ID". Thanks for vliu's reminding :D
Attached file IPC debug message
enabled MOZ_IPC_MESSAGE_LOG with DEBUG build. 

Child 5026 is the dialer app.
I am stuck here and need more help to keep moving on...
Let me summarize what vliu and I have done first.

1) We noticed that the crash point is |mozalloc_abort.cpp:23| [1] and we set a break point at |mozalloc_abort.cpp:21| to get more gdb backtrace info. Please see attachment 671392 [details] for gdb bt.
2) We disabled OOP then this issue cannot be reproduced.
3) So, we followed [2] to get the ipdl message type, which is |PIndexedDBTransaction::Msg_AllRequestsFinished__ID|. We then set another breakpoint at |IDBTransaction.cpp#256| [3] where is the only place that child sends AllRequestsFinished message. After sending this, the app crashed.
4) Attachment 674526 [details] is the IPC debug message from enabling MOZ_IPC_MESSAGE_LOG. Child 5026 was the dialer app.

Now we are almost sure that AllRequestsFinished is the cause. vliu has also tried to check memory mapping/unmapping, but we got no clues. We have no idea about where we should dig in. Do you have any ideas? Thanks!

[1] http://mxr.mozilla.org/mozilla-central/source/memory/mozalloc/mozalloc_abort.cpp#23
[2] https://wiki.mozilla.org/Electrolysis/Debugging#Working_backwards_from_a_C.2B.2B_Message_to_its_IPDL_message
[3] mxr.mozilla.org/mozilla-central/source/dom/indexedDB/IDBTransaction.cpp#256
Hi Kanru,

I tried to analysis the display memory usage before crash. The method I looked is checking Allocated/Freeing is pairing or not by its fd. For example, I saw the below freeing buffer with fd=147.

D/memalloc(  106): /dev/pmem: Freeing buffer base:0x4cc9a000 size:450560 offset:630784 fd:147 

Then I looked ahead to found out its allocated info.

D/memalloc(  106): /dev/pmem: Allocated buffer base:0x4cc00000 size:450560 offset:630784 fd:147

Compared the size and offset are the same, so I think there is no problem in it.

There was another pairing Mapped/Unmapping. I only Checked its size and offset according to base value.

/dev/pmem: Mapped buffer base:0x46ac6000 size:2310144 offset:1859584 fd:39
Unmapping buffer base:0x46ac6000 size:2310144 offset:1859584

I am not sure if it is the right way if I want to make sure memory usage would be the factor to cause crash. Can you please give me any suggestion doing that? Thanks
Whiteboard: [LOE:M] → [LOE:L]
We're marking this bug with the C1 milestone since it follows the criteria of "remaining LOE:L work" (see https://etherpad.mozilla.org/b2g-convergence-schedule).

If this work is not finished by Nov19, this bug will need an exception and will be called out at the upcoming Exec Review.
Priority: -- → P1
Target Milestone: --- → B2G C1 (to 19nov)
Root cause of this issue: when a call is disconnected, the call log is written to indexedDB. Sometimes the call screen (content child) is removed earlier than indexedDB sends the last message back to content child, so content parent cannot find the child to send msg that causes MsgRouteError. 

Bug 808743 addressed problems with database invalidation in multiprocess scenarios, indeed the root cause here. I cannot reproduce this after Bug 808743, so I change the bug status to 'RESOLVED/DUPLICATE'. I also add keyword 'qawanted' to ask for more complete test. Please reopen it if you have different opinions.
Status: NEW → RESOLVED
Closed: 9 years ago
Keywords: qawanted
Resolution: --- → DUPLICATE
Duplicate of bug: 808743
As a means of verifying the fixes for 

https://bugzilla.mozilla.org/show_bug.cgi?id=808743  I use the steps described here.

No crash, but I am seeing other issues with DTMF (bugs to come), including getting dtmf tones to hang; getting dialer unresponsive in certain contexts.
Keywords: qawanted
You need to log in before you can comment on or make changes to this bug.