Bug 756389 - B2G Bluetooth: unexpected crash in DBusThread::StopEventLoop
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: DOM: Device Interfaces
Version: Trunk
Platform: ARM Gonk (Firefox OS)
Importance: -- normal
Target Milestone: mozilla15
Assigned To: Kyle Machulis [:qdot]
: Andrew Overholt [:overholt]
Blocks: b2g-bluetooth
Reported: 2012-05-18 01:33 PDT by Vicamo Yang [:vicamo][:vyang]
Modified: 2012-05-24 09:25 PDT


Attachments
Fix crash on bluetooth thread destruction when thread not up (902 bytes, patch)
2012-05-18 12:05 PDT, Kyle Machulis [:qdot]
cjones.bugs: review+
v2: Fix crash on bluetooth thread destruction when thread not up (final) (1.95 KB, patch)
2012-05-23 18:13 PDT, Kyle Machulis [:qdot]
no flags

Description Vicamo Yang [:vicamo][:vyang] 2012-05-18 01:33:29 PDT
(gdb) c
Continuing.
-*- RadioInterfaceLayer: Starting RIL Worker
[New Thread 2620.2622]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2620.2622]
mozilla::ipc::DBusThread::StopEventLoop (this=0x0) at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/dbus/DBusThread.cpp:506
506	  MutexAutoLock lock(mMutex);
(gdb) bt
#0  mozilla::ipc::DBusThread::StopEventLoop (this=0x0) at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/dbus/DBusThread.cpp:506
#1  0x40c99fa6 in DisconnectDBus (aMonitor=0xbebdd630, aSuccess=0xbebdd63f)
    at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/dbus/DBusThread.cpp:554
#2  0x40c923e4 in DispatchToFunction<void (*)(mozilla::hal::SwitchDevice, mozilla::Monitor*), mozilla::hal::SwitchDevice, mozilla::Monitor*> (
    this=0x4690e8) at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/chromium/src/base/tuple.h:454
#3  RunnableFunction<void (*)(mozilla::hal::SwitchDevice, mozilla::Monitor*), Tuple2<mozilla::hal::SwitchDevice, mozilla::Monitor*> >::Run (
    this=0x4690e8) at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/chromium/src/base/task.h:415
#4  0x40ce2706 in MessageLoop::RunTask (this=0x100ffdf4, task=0xbebdd63f)
    at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/chromium/src/base/message_loop.cc:318
#5  0x40ce3700 in MessageLoop::DeferOrRunPendingTask (this=0x0, pending_task=<value optimized out>)
    at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/chromium/src/base/message_loop.cc:326
#6  0x40ce438e in MessageLoop::DoWork (this=0x100ffdf4)
    at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/chromium/src/base/message_loop.cc:426
#7  0x40cf6048 in base::MessagePumpLibevent::Run (this=0xf130, delegate=0x100ffdf4)
    at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/chromium/src/base/message_pump_libevent.cc:310
#8  0x40ce26a2 in MessageLoop::RunInternal (this=0xbebdd63f)
    at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/chromium/src/base/message_loop.cc:208
#9  0x40ce2782 in MessageLoop::RunHandler (this=0x100ffdf4)
    at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/chromium/src/base/message_loop.cc:201
#10 MessageLoop::Run (this=0x100ffdf4) at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/chromium/src/base/message_loop.cc:175
#11 0x40ceb6ce in base::Thread::ThreadMain (this=0xf0f8) at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/chromium/src/base/thread.cc:156
#12 0x40cf6542 in ThreadFunc (closure=0x0) at /home/vicamo/WorkSpace/mozilla/b2g/sgs2-ics/gecko/ipc/chromium/src/base/platform_thread_posix.cc:27
#13 0x40106c28 in __thread_entry (func=0x40cf6539 <ThreadFunc>, arg=0xf0f8, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#14 0x4010677c in pthread_create (thread_out=<value optimized out>, attr=0xbebdd910, start_routine=0x40cf6539 <ThreadFunc>, arg=0xf0f8)
    at bionic/libc/bionic/pthread.c:357
#15 0x00000000 in ?? ()
Comment 1 Cervantes Yu [:cyu] [:cervantes] 2012-05-18 03:53:37 PDT
I got the same crash. It seems the problem is that bluedroid is not initialized properly.

This part of code (in SystemWorkerManager.cpp:402):

  if(EnsureBluetoothInit()) {
#endif
    StartDBus();
#ifdef MOZ_WIDGET_GONK
  }

If bluedroid is not initialized, StartDBus() will not be called. Then it will crash in:

    if (NS_FAILED(instance->Init())) {
      instance->Shutdown();
      return nsnull;
    }

We might need to add a check like IsBluetoothInitialized() before calling StopDBus() in SystemWorkerManager.
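
A minimal sketch of that suggestion, assuming the manager simply remembers whether DBus was started. `WorkerManagerSketch` and its members are hypothetical names for illustration, not the real SystemWorkerManager API, and the comments mark where `StartDBus()` / `StopDBus()` would sit:

```cpp
// Hypothetical sketch; none of these names are the real Gecko API.
struct WorkerManagerSketch {
  bool dbusStarted = false;  // true only after the StartDBus() stand-in ran
  int stopCalls = 0;         // counts how often the StopDBus() stand-in ran

  // Mirrors the quoted logic: DBus is only started when Bluetooth init succeeded.
  bool Init(bool bluetoothReady) {
    if (bluetoothReady) {
      dbusStarted = true;  // stands in for StartDBus()
    }
    return bluetoothReady;
  }

  // Only stop what was actually started.
  void Shutdown() {
    if (!dbusStarted) return;  // stands in for an IsBluetoothInitialized()-style check
    ++stopCalls;               // stands in for StopDBus()
    dbusStarted = false;
  }
};
```

With this shape, calling `Shutdown()` after a failed `Init(false)` is a no-op instead of reaching into a thread object that was never created.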
Comment 2 Eric Chou [:ericchou] [:echou] 2012-05-18 06:51:30 PDT
Hi Kyle, could you help fix this?
Comment 3 Kyle Machulis [:qdot] 2012-05-18 11:18:09 PDT
(In reply to Cervantes Yu from comment #1)

> We might need to add a check like IsBluetoothInitialized() before calling
> StopDBus() in SystemWorkerManager.

It's actually a little simpler than that. I have MOZ_ASSERTs where I should have code that will always run and check the validity of the compilation unit static thread pointers, which is as good as an init'd check. I just need to change the MOZ_ASSERTs to if()'s.
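
The difference between the two forms can be sketched roughly as follows. This is an illustration only, not the DBusThread.cpp code: `FakeDBusThread`, `sThread`, `StartDBusSketch`, `StopDBusAsserted` and `StopDBusChecked` are hypothetical names, and the standard `assert` stands in for `MOZ_ASSERT`. The assumption is that the assertion is compiled out of non-debug builds, so it cannot protect a null pointer there, while an `if` always runs.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the DBus thread object; not the real class.
struct FakeDBusThread {
  bool running = true;
  void StopEventLoop() { running = false; }
};

// Compilation-unit static pointer; stays null until the thread is started.
static std::unique_ptr<FakeDBusThread> sThread;

void StartDBusSketch() {
  if (!sThread) sThread.reset(new FakeDBusThread());
}

// Assertion-only version: with NDEBUG defined the assert disappears,
// and a null sThread would be dereferenced on the next line.
void StopDBusAsserted() {
  assert(sThread);
  sThread->StopEventLoop();
  sThread.reset();
}

// Checked version: does nothing and returns false if the thread was never started.
bool StopDBusChecked() {
  if (!sThread) return false;
  sThread->StopEventLoop();
  sThread.reset();
  return true;
}
```

`StopDBusChecked()` is safe to call at any point in the lifecycle, which is the property a shutdown path needs when it can run after a partial init.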

That said, how'd you guys run into this? Is there a phone we're working on that doesn't have bluedroid in the place we expect it, or doesn't have bluetooth for the radio yet? We might need a more tolerant failure case that just doesn't let bluetooth come up but the rest of the system will continue to run.
Comment 4 Vicamo Yang [:vicamo][:vyang] 2012-05-18 11:38:20 PDT
(In reply to Kyle Machulis [:kmachulis] [:qdot] from comment #3)
> That said, how'd you guys run into this? Is there a phone we're working on
> that doesn't have bluedroid in the place we expect it, or doesn't have
> bluetooth for the radio yet? We might need a more tolerant failure case that
> just doesn't let bluetooth come up but the rest of the system will continue
> to run.

Hi Kyle,

Actually it crashed on our SGS2 devices for some unknown reason. To be more specific, I can't tell whether this issue appears only inside gdb or whether it's actually a different defect. The logcat/debuggerd output doesn't say anything related, but gdb shows two possible crashes on our devices: one for a corrupted libxpcom, another for the DisconnectDBus segfault. The former can sometimes, but not always, be solved by reinstalling gecko, while the latter has no known reproduction steps or solution yet.
Comment 5 Kyle Machulis [:qdot] 2012-05-18 12:01:40 PDT
Oh, I see what's going on. It's nothing as bad as library corruption on my side, just a wrong check. I actually already have bluetooth in a "fail but continue" state: InitBluetooth ALWAYS returns NS_OK (which now needs to be another bug entirely for devices without bluetooth...). The RIL is failing on your devices for some reason (the log says "Starting RIL Worker" right before the crash), so the thread isn't brought up and NS_ENSURE_SUCCESS kicks out early (Yay macro'd returns :( ). That calls Shutdown, but since we haven't even made it to InitBluetooth yet, we get a StopDBus call for a thread that was never started, and it isn't handled because the check is in an ASSERT. Patch forthcoming.
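
The control flow described above can be sketched as follows. Everything here is hypothetical: `ENSURE_OK_SKETCH` is only loosely modelled on the NS_ENSURE_SUCCESS style of macro, and `InitSketch` / `RunSketch` are illustrative names rather than the real init code. The sketch assumes the order given in the comment: the RIL step comes first, Bluetooth second, and a failed init is followed by Shutdown.

```cpp
#include <string>
#include <vector>

// Hypothetical early-return macro: the return is hidden inside the macro body.
#define ENSURE_OK_SKETCH(ok, ret) \
  do {                            \
    if (!(ok)) return (ret);      \
  } while (0)

struct InitSketch {
  std::vector<std::string> log;  // records which steps ran
  bool dbusUp = false;

  bool Init(bool rilOk) {
    log.push_back("InitRIL");
    ENSURE_OK_SKETCH(rilOk, false);  // hidden return: the steps below never run
    log.push_back("InitBluetooth");
    dbusUp = true;
    return true;
  }

  void Shutdown() {
    // Shutdown also runs after a failed Init, so it must tolerate dbusUp == false.
    log.push_back(dbusUp ? "StopDBus" : "StopDBus skipped");
    dbusUp = false;
  }
};

// Mirrors the quoted caller: on Init failure, call Shutdown and give up.
std::vector<std::string> RunSketch(bool rilOk) {
  InitSketch s;
  if (!s.Init(rilOk)) s.Shutdown();
  return s.log;
}
```

When the RIL step fails, the log never contains "InitBluetooth", yet Shutdown still runs, which is why the stop path needs a real runtime check rather than an assertion.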
Comment 6 Kyle Machulis [:qdot] 2012-05-18 12:05:29 PDT
Created attachment 625187 [details] [diff] [review]
Fix crash on bluetooth thread destruction when thread not up
Comment 7 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2012-05-23 15:27:31 PDT
Comment on attachment 625187 [details] [diff] [review]
Fix crash on bluetooth thread destruction when thread not up

r=me to also remove the NS_ENSURE_* bullshit.
Comment 8 Kyle Machulis [:qdot] 2012-05-23 18:13:21 PDT
Created attachment 626660 [details] [diff] [review]
v2: Fix crash on bluetooth thread destruction when thread not up (final)

Final version, changed NS_ENSURE_SUCCESS calls.
Comment 10 Ed Morley [:emorley] 2012-05-24 09:25:46 PDT
https://hg.mozilla.org/mozilla-central/rev/0e26544d7731
