Closed Bug 838212 Opened 12 years ago Closed 12 years ago

Crash in bluetooth ipc code while making calls and BT connect/disconnect

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(blocking-b2g:-, b2g18 affected, b2g18-v1.0.1 affected)

RESOLVED WORKSFORME
blocking-b2g -
Tracking Status
b2g18 --- affected
b2g18-v1.0.1 --- affected

People

(Reporter: ggrisco, Assigned: echou)

Details

(Keywords: crash, Whiteboard: [b2g-crash][cr 448014])

1. Connect to bluetooth headset (Motorola S9 HD) 2. Make some calls and verify if audio is present on the headset While doing some random testing by connecting/disconnecting headset and making calls, this crash was observed. minidump wasn't very useful, but EXTRA file contained this: Notes=EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+ xpcom_runtime_abort([Parent 125] ###!!! ABORT: unknown union type: file /local/mnt/workspace/ggrisco/ics.b2g.blue/out/target/product/msm7627a/obj/objdir-gecko/ipc/ipdl/PBluetoothRequestParent.cpp, line 449) Found the abort in PBluetoothRequestParent::write(). The reply type didn't match either "success" or "error" thus causing the abort.
blocking-b2g: --- → tef?
Assignee: nobody → echou
blocking-b2g: tef? → tef+
Severity: normal → critical
Keywords: crash
Whiteboard: [cr 448014] → [b2g-crash][cr 448014]
Based on the log provided by Greg, I traced our code and figured out that the problem could be: The crash happened when a BluetoothRequestParent::ReplyRunnable object began to "Run()". When processed to the line |if (!mRequest->Send__delete__(mRequest, *mReply))| in the function Run(), it crashed because mReply was invalid, which may have been freed or not even initialized (by calling SetReply()). However, I've checked all BluetoothReplyRunnable objects used in our code but couldn't find anything suspected. Unfortunately, I tried to reproduce this but failed, instead I kept hitting another crash described in bug 836523. I'll keep working on this and see if I can find something out. At the mean time, please provide more information as much as possible (STR, frequency).
(In reply to Eric Chou [:ericchou] [:echou] from comment #1) > I'll keep working on this and see if I can find something out. At the mean > time, please provide more information as much as possible (STR, frequency). It was only seen once and so far cannot be reproduced, although I'm not sure how many attempts we've made. So, unfortunately, there isn't a more definite set of STR.
Hi Eric, I had a meeting today about closing the remaining tef+ bugs. Is there anything I can do to help with this one?
Thomas, Eric is likely out on holiday for this week. Help digging into this would definitely be appreciated!
(In reply to Thomas Zimmermann [:tzimmermann] from comment #3) > Hi Eric, > > I had a meeting today about closing the remaining tef+ bugs. Is there > anything I can do to help with this one? Hi Thomas, Sure. Please take this bug and see if you could figure something out. I've investigated but couldn't find the potential root cause. You may want to check comment 1 for more information. I will check mail as possible as I can during holidays, so please feel free to have me review your patches. Of course I think Kyle would love to help, too. Thanks. Eric
Assignee: echou → tzimmermann
Status: NEW → ASSIGNED
Status: ASSIGNED → NEW
So far, I had no luck with reproducing this bug, but at least I found and (hopefully) fixed bug 840544. I'll try again tomorrow.
I've successfully reproduced the bug using the latest mozilla-b2g18 checkout with the patches for fixing bug 836523 applied. It seems to me that this is not triggered by pure connect/disconnect cycles, I have to turn off and on Bluetooth or unpair the device and reconnect it later for it to happen. If I reboot the phone with the device already paired no amount of connect/disconnect cycles causes it to crash. The logcat output when the crash happens always looks like this (1713 is the b2g process): F/libc ( 1713): Fatal signal 11 (SIGSEGV) at 0x003a0074 (code=1) I/DEBUG ( 112): debuggerd committing suicide to free the zombie! I/DEBUG ( 2023): debuggerd: Feb 11 2013 16:18:24 E/AudioHardwareMSM76XXA( 2021): failed to open AUDIO_NORMAL_FILTER /system/etc/AudioFilter.csv: No such file or directory (2). E/AudioHardwareMSM76XXA( 2021): failed to open AUTO_VOLUME_CONTROL /system/etc/AutoVolumeControl.txt: No such file or directory (2) E/AudioHardwareMSM76XXA( 2021): setMode:mode = 0, status = -17 E/AudioHardwareMSM76XXA( 2021): rpc_snd_set_volume(0, 0, 7) E/AudioHardwareMSM76XXA( 2021): rpc_snd_set_volume(6, 0, 7) E/AudioHardwareMSM76XXA( 2021): rpc_snd_set_volume(12, 0, 7) E/AudioHardwareMSM76XXA( 2021): rpc_snd_set_volume(3, 0, 7) E/AudioHardwareMSM76XXA( 2021): rpc_snd_set_volume(16, 0, 7) E/AudioHardwareMSM76XXA( 2021): rpc_snd_set_volume(25, 0, 7) E/AudioHardwareMSM76XXA( 2021): rpc_snd_set_volume(8, 0, 1) E/AudioHardwareMSM76XXA( 2021): rpc_snd_set_volume(9, 0, 1) E/QualcommCamera( 2021): Qint android::get_number_of_cameras(): E I/QualcommCameraHardware( 2021): getCameraInfo: IN I/QualcommCameraHardware( 2021): getCameraInfo: loading libqcamera at 0xb000ee28 V/QualcommCameraHardware( 2021): Storing the current target type as 3 I/QualcommCameraHardware( 2021): getCameraInfo: numOfCameras = 1 I/QualcommCameraHardware( 2021): Camera sensor 0 info: I/QualcommCameraHardware( 2021): camera_id: 0 I/QualcommCameraHardware( 2021): modes_supported: 5 I/QualcommCameraHardware( 2021): position: 0 I/QualcommCameraHardware( 2021): sensor_mount_angle: 90 V/QualcommCameraHardware( 2021): getCameraInfo: dlclose(libqcamera) I/QualcommCameraHardware( 2021): getCameraInfo: OUT W/AudioFlinger( 2021): Thread AudioOut_1 cannot connect to the power manager service E/AudioHardwareMSM76XXA( 2021): Routing audio to Speakerphone E/AudioHardwareMSM76XXA( 2021): doAudioRouteOrMute, mFM_route = -1 , device=6 E/AudioHardwareMSM76XXA( 2021): rpc_snd_set_device(6, 1, 0) E/AudioHardwareMSM76XXA( 2021): rpc_snd_set_volume(38, 0, 7) I've captured a back-trace with GDB when it happens: #0 AppendElements<mozilla::ipc::UnixSocketRawData*> (this=0x48ee0830) at ../../dist/include/nsTArray.h:877 #1 AppendElement<mozilla::ipc::UnixSocketRawData*> (this=0x48ee0830) at ../../dist/include/nsTArray.h:894 #2 mozilla::ipc::UnixSocketImpl::QueueWriteData (this=0x48ee0830) at /home/gsvelto/projects/mozilla-b2g18/ipc/unixsocket/UnixSocket.cpp:68 #3 mozilla::ipc::SocketSendTask::Run (this=0x48ee0830) at /home/gsvelto/projects/mozilla-b2g18/ipc/unixsocket/UnixSocket.cpp:372 #4 0x411d7048 in MessageLoop::RunTask (this=0x4217edf0, task=0x48ee0830) at /home/gsvelto/projects/mozilla-b2g18/ipc/chromium/src/base/message_loop.cc:333 #5 0x411d7e7a in MessageLoop::DeferOrRunPendingTask (this=0x48ee0830, pending_task=<value optimized out>) at /home/gsvelto/projects/mozilla-b2g18/ipc/chromium/src/base/message_loop.cc:341 #6 0x411d8a58 in MessageLoop::DoWork (this=0x4217edf0) at /home/gsvelto/projects/mozilla-b2g18/ipc/chromium/src/base/message_loop.cc:441 #7 0x411e8eee in base::MessagePumpLibevent::Run (this=0x40402430, delegate=0x4217edf0) at /home/gsvelto/projects/mozilla-b2g18/ipc/chromium/src/base/message_pump_libevent.cc:310 #8 0x411d6ff8 in MessageLoop::RunInternal (this=0x48ee0830) at /home/gsvelto/projects/mozilla-b2g18/ipc/chromium/src/base/message_loop.cc:215 #9 0x411d70ae in MessageLoop::RunHandler (this=0x4217edf0) at /home/gsvelto/projects/mozilla-b2g18/ipc/chromium/src/base/message_loop.cc:208 #10 MessageLoop::Run (this=0x4217edf0) at /home/gsvelto/projects/mozilla-b2g18/ipc/chromium/src/base/message_loop.cc:182 #11 0x411df3bc in base::Thread::ThreadMain (this=0x40407280) at /home/gsvelto/projects/mozilla-b2g18/ipc/chromium/src/base/thread.cc:156 #12 0x411e9390 in ThreadFunc (closure=0x1) at /home/gsvelto/projects/mozilla-b2g18/ipc/chromium/src/base/platform_thread_posix.cc:39 #13 0x4011ce18 in __thread_entry (func=0x411e9389 <ThreadFunc>, arg=0x40407280, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217 #14 0x4011c96c in pthread_create (thread_out=<value optimized out>, attr=0xbeaad8a8, start_routine=0x411e9389 <ThreadFunc>, arg=0x40407280) at bionic/libc/bionic/pthread.c:357 #15 0x00000000 in ?? () What causes the segfault is |this| being null in the invocation of nsTArray::AppendElement, crawling back the stack this leads to the mOutgoingQ field in UnixSocketImpl. I'm tying to reproduce it in a debug build to see if I can extract a little more data out of it.
This looks more like bug 840925 to me. Something in the Bluetooth stack tries to write to a socket which as actually been closed already. But thank you for trying to reproduce this, anyway.
(In reply to Thomas Zimmermann [:tzimmermann] from comment #8) > This looks more like bug 840925 to me. Something in the Bluetooth stack > tries to write to a socket which as actually been closed already. But thank > you for trying to reproduce this, anyway. Yes indeed, it looks like this is a potential duplicate. I've narrowed it down a bit and I can make it happen a little bit more frequently when doing the following: - pair your headset - connect, disconnect then connect again - unpair while connected -> the crash happens There's no need to be making calls or anything else. connecting/disconnecting alone doesn't seem to cause the issue either. I've tried tens of connect/disconnect cycles during a call w/o being able to crash the phone. BTW the fact that the memory had been overwritten is consistent with what I saw in the stack trace which was a funnily messed up UnixSocketImpl instance. I would say we should mark this as a duplicate of bug 840925 or vice-versa.
Now that I think about it a little more marking it as a duplicate is not right because the original report tripped over a different problem. I'll try to cause a crash by further connect/disconnect cycles and if I can't then it might be worth considering closing it as WORKSFORME if nobody else is capable of reproducing it.
Yeah, I just wanted to say that. This problem here only happened once, and the bug you triggered is 840943. I saw you added an STR to 840943. Thanks! I think we should try to un-tef+ this bug here, and rather try to fix some of the others.
I agree, for what it's worth I've tried another call in which I run 50 quick connect/disconnect cycles and never hit the problem. This was on an Otoro with a Jabra EASYGO headset, <wild speculation> since the hardware is different than the original reporter it might just be that this could be a hardware-specific issue </wild speculation>.
This bug only ever happened once and can't be reproduced. Reconsidering blocker status.
blocking-b2g: tef+ → tef?
Not blocking due to lack of reproducibility.
blocking-b2g: tef? → -
He Eric! Here is the ticket. I had no luck reproducing this specific issue. I think we should just close it.
Assignee: tzimmermann → echou
Please reopen if it happens again.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.