[Bluetooth] Handle crashes in bluetoothd gracefully

RESOLVED FIXED

Status

Firefox OS
Bluetooth
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: brsun, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 obsolete attachments)

(Reporter)

Description

3 years ago
Program received signal SIGSEGV, Segmentation fault.
nsBaseAppShell::RunSyncSectionsInternal (this=0xb2a236a0, aStable=<optimized out>, aThreadRecursionLevel=0) at ../../../mozilla-central-working/widget/nsBaseAppShell.cpp:370
370	      section.mRunnable->Run();
(gdb) bt
#0  nsBaseAppShell::RunSyncSectionsInternal (this=0xb2a236a0, aStable=<optimized out>, aThreadRecursionLevel=0) at ../../../mozilla-central-working/widget/nsBaseAppShell.cpp:370
#1  0xb57c7afe in nsBaseAppShell::RunSyncSections (this=<optimized out>, stable=stable@entry=true, threadRecursionLevel=<optimized out>) at ../../../mozilla-central-working/widget/nsBaseAppShell.h:95
#2  0xb57c7b08 in nsBaseAppShell::AfterProcessNextEvent (this=<optimized out>, thr=<optimized out>, recursionDepth=<optimized out>, eventWasProcessed=<optimized out>)
    at ../../../mozilla-central-working/widget/nsBaseAppShell.cpp:427
#3  0xb4deab44 in nsThread::ProcessNextEvent (this=0xb6a025c0, aMayWait=<optimized out>, aResult=0xbeead86f) at ../../../../mozilla-central-working/xpcom/threads/nsThread.cpp:890
#4  0xb4df6508 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=aMayWait@entry=false) at /home/bruce_sun/Source/mozilla-central-working/xpcom/glue/nsThreadUtils.cpp:265
#5  0xb4f35ca8 in mozilla::ipc::MessagePump::Run (this=0xb6a8e1c0, aDelegate=0xb6a801a0) at ../../../../mozilla-central-working/ipc/glue/MessagePump.cpp:99
#6  0xb4f2a484 in MessageLoop::RunInternal (this=this@entry=0xb6a801a0) at ../../../../mozilla-central-working/ipc/chromium/src/base/message_loop.cc:233
#7  0xb4f2a538 in RunHandler (this=0xb6a801a0) at ../../../../mozilla-central-working/ipc/chromium/src/base/message_loop.cc:226
#8  MessageLoop::Run (this=0xb6a801a0) at ../../../../mozilla-central-working/ipc/chromium/src/base/message_loop.cc:200
#9  0xb57c519a in nsBaseAppShell::Run (this=0xb2a236a0) at ../../../mozilla-central-working/widget/nsBaseAppShell.cpp:164
#10 0xb5ad2ba2 in nsAppStartup::Run (this=0xb30034f0) at ../../../../../mozilla-central-working/toolkit/components/startup/nsAppStartup.cpp:281
#11 0xb5ae6d82 in XREMain::XRE_mainRun (this=this@entry=0xbeead9d0) at ../../../../mozilla-central-working/toolkit/xre/nsAppRunner.cpp:4150
#12 0xb5ae7f3e in XREMain::XRE_main (this=this@entry=0xbeead9d0, argc=argc@entry=1, argv=argv@entry=0xb6a2b188, aAppData=aAppData@entry=0xb6f72858 <_ZL8sAppData>)
    at ../../../../mozilla-central-working/toolkit/xre/nsAppRunner.cpp:4226
#13 0xb5ae80be in XRE_main (argc=1, argv=0xb6a2b188, aAppData=0xb6f72858 <_ZL8sAppData>, aFlags=<optimized out>) at ../../../../mozilla-central-working/toolkit/xre/nsAppRunner.cpp:4446
#14 0xb6f57a5c in do_main (argc=argc@entry=1, argv=argv@entry=0xb6a2b188) at ../../../../mozilla-central-working/b2g/app/nsBrowserApp.cpp:165
#15 0xb6f57b6a in b2g_main (argc=argc@entry=1, argv=argv@entry=0xbeeaec84) at ../../../../mozilla-central-working/b2g/app/nsBrowserApp.cpp:293
#16 0xb6f578ec in RunProcesses (aReservedFds=..., argv=0xbeeaec84, argc=1) at ../../../../mozilla-central-working/b2g/app/B2GLoader.cpp:225
#17 main (argc=1, argv=0xbeeaec84) at ../../../../mozilla-central-working/b2g/app/B2GLoader.cpp:290
Assignee: nobody → brsun
(Reporter)

Comment 1

3 years ago
Haven't found out the actual root cause yet, but the call-stack on comment 0 might be misleading.

libevent calls abort() [1] in b2g under this situation, and an error message [2] can be noticed from logcat output.

[1] https://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/third_party/libevent/event.c#262
[2] E Gecko   : mozalloc_abort: Redirecting call to abort() to mozalloc_abort
(Reporter)

Comment 2

3 years ago
The information on comment 1 might be also misleading.

When killing bluetoothd, |BluetoothDaemonInterface::OnDisconnect()| will be called with empty |mResultHandlerQ| [1] in this case. And Gecko crashes while dereferencing |res = mResultHandlerQ.ElementAt(0)| as a result.

[1] https://dxr.mozilla.org/mozilla-central/source/dom/bluetooth/bluedroid/BluetoothDaemonInterface.cpp#1844
Hi

(In reply to Bruce Sun [:brsun] from comment #2)
> The information on comment 1 might be also misleading.
> 
> When killing bluetoothd, |BluetoothDaemonInterface::OnDisconnect()| will be
> called with empty |mResultHandlerQ| [1] in this case. And Gecko crashes
> while dereferencing |res = mResultHandlerQ.ElementAt(0)| as a result.
> 
> [1]
> https://dxr.mozilla.org/mozilla-central/source/dom/bluetooth/bluedroid/
> BluetoothDaemonInterface.cpp#1844

Some background:

Gecko should try to handle the error gracefully by restarting bluetoothd. I never implemented this, though. Regular shutdowns are distinguishable from crashes by the existence of the mentioned result handler. I guess there's more to restarts than just restarting bluetoothd, such as re-pairing devices. We might need a signal that tells Gaia about crashed bluetoothd and give it the chance to restart.
Created attachment 8565422 [details] [diff] [review]
[01] Bug 1132229: Return in Dispatch method if Bluetooth runnable is nullptr
Assignee: brsun → tzimmermann
Status: NEW → ASSIGNED
Attachment #8565422 - Flags: review?(shuang)
Created attachment 8565423 [details] [diff] [review]
[02] Bug 1132229: Handle I/O errors in Bluetooth's daemon backend

If the daemon crashes, sending commands will fail. This currently prevents us from disabling a crashed Bluetooth. This patch is required to handle the internal state correctly and make progress in all cases. So even if sending fails, the caller of an interface method will get an error result at least.
Attachment #8565423 - Flags: review?(shuang)
Created attachment 8565424 [details] [diff] [review]
[03] Bug 1132229: Survive crashes of bluetoothd

If we detect a crash, we signal a change in the adapter state. This will cleanup the internal state. Disabling now makes progress, even if individual steps fail (e.g., because of command-send failures).
Attachment #8565424 - Flags: review?(shuang)
Attachment #8565422 - Attachment is obsolete: true
Attachment #8565422 - Flags: review?(shuang)
Attachment #8565423 - Attachment is obsolete: true
Attachment #8565423 - Flags: review?(shuang)
Attachment #8565424 - Attachment is obsolete: true
Attachment #8565424 - Flags: review?(shuang)
Sorry, these patches were meant for bug 1132229.
Assignee: tzimmermann → nobody
Status: ASSIGNED → NEW
I'm updating the title. This bug is about overall handling of Bluetooth crashes, such as restarting or displaying a user-visible error message. The current crash is fixed in bug 1132229.
Depends on: 1132229
Summary: [Bluetooth] b2g crashes after bluetoothd has been killed. → [Bluetooth] Handle crashes in bluetoothd gracefully
Add cc: Aaron, Ian, Arthur
Based on today's discussion, Gaia think there is no need to notify gaia to show notification to users if daemon restarts. They think Gaia might not be able to do much about it, so silent recovery is preferable.

Updated

3 years ago
Depends on: 1143925, 1160126
Resolved fixed since Bluetooth API v1/v2 all fixed.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.