Closed Bug 1143925 Opened 10 years ago Closed 10 years ago

[Bluetooth] bluetoothd daemon doesn't restart automatically

Categories

(Firefox OS Graveyard :: Bluetooth, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:2.2+, firefox38 wontfix, firefox38.0.5 wontfix, firefox39 wontfix, firefox40 fixed, b2g-v2.2 verified, b2g-master verified)

VERIFIED FIXED
2.2 S11 (1may)
blocking-b2g 2.2+
Tracking Status
firefox38 --- wontfix
firefox38.0.5 --- wontfix
firefox39 --- wontfix
firefox40 --- fixed
b2g-v2.2 --- verified
b2g-master --- verified

People

(Reporter: ikumar, Assigned: shawnjohnjr)

References

Details

(Whiteboard: [caf priority: p2][CR 810057])

Attachments

(5 files, 22 obsolete files)

16.40 KB, patch
Details | Diff | Splinter Review
2.10 KB, patch
Details | Diff | Splinter Review
3.17 KB, patch
Details | Diff | Splinter Review
15.20 KB, patch
Details | Diff | Splinter Review
2.11 KB, patch
Details | Diff | Splinter Review
STR:
1. Kill bluetoothd daemon
2. do "adb shell ps | grep bluetoothd" to see that bluetoothd didn't restart

Expected:
bluetoothd should have automatically restarted.

Toggling bluetooth in settings restarts the daemon which we shouldn't have to.
blocking-b2g: --- → 2.2?
Hi! Shawn,

Could you take a look? Thanks

--
Keven
Flags: needinfo?(shuang)
Assignee: nobody → shuang
Flags: needinfo?(shuang)
Hi Inder,
Could you provide the reason why this auto recovery blocks v2.2 release?
Flags: needinfo?(ikumar)
(In reply to Shawn Huang [:shawnjohnjr] from comment #2)
> Hi Inder,
> Could you provide the reason why this auto recovery blocks v2.2 release?

Terrible user experience.  If bluetoothd crashes (either by itself, or more likely indirectly due to a driver restart [1]) then the user will loose all BT function until they wonder why their BT is mysteriously nonfunctional and either (1) restart the device, or (2) enter the Settings menu and toggle BT on/off.  

[1] https://www.codeaurora.org/cgit/quic/la/platform/external/bluetooth/bluedroid/tree/stack/btu/btu_hcif.c?h=LF.BR.1.2.3#n1711
Flags: needinfo?(ikumar)
Assignee: shuang → nobody
Component: Gaia::Bluetooth File Transfer → Bluetooth
(In reply to Michael Vines [:m1] [:evilmachines] from comment #3)
> (In reply to Shawn Huang [:shawnjohnjr] from comment #2)
> > Hi Inder,
> > Could you provide the reason why this auto recovery blocks v2.2 release?
> 
> Terrible user experience.  If bluetoothd crashes (either by itself, or more
> likely indirectly due to a driver restart [1]) then the user will loose all
> BT function until they wonder why their BT is mysteriously nonfunctional and
> either (1) restart the device, or (2) enter the Settings menu and toggle BT
> on/off.  
> 
> [1]
> https://www.codeaurora.org/cgit/quic/la/platform/external/bluetooth/
> bluedroid/tree/stack/btu/btu_hcif.c?h=LF.BR.1.2.3#n1711
I guess the easier ways to recover it, we can send system message to Settings to restart bluetooth if bluetooth daemon connection lost.
(In reply to Shawn Huang [:shawnjohnjr] from comment #4)
>
> I guess the easier ways to recover it, we can send system message to
> Settings to restart bluetooth if bluetooth daemon connection lost.

The implementation details are up to you of course, seems to me like managing this within Gecko would be nicer.  Gaia can be replaced so the less Gecko/Gaia coupling the better (nevermind Gonk/Gaia coupling!)
(In reply to Michael Vines [:m1] [:evilmachines] from comment #5)
> (In reply to Shawn Huang [:shawnjohnjr] from comment #4)
> >
> > I guess the easier ways to recover it, we can send system message to
> > Settings to restart bluetooth if bluetooth daemon connection lost.
> 
> The implementation details are up to you of course, seems to me like
> managing this within Gecko would be nicer.  Gaia can be replaced so the less
> Gecko/Gaia coupling the better (nevermind Gonk/Gaia coupling!)


Sorry for red herring. It's not that easy as I thought, we need more time to fix this bug properly.
See also: Bug 1115656 - [Bluetooth] Handle crashes in bluetoothd gracefully
See Also: → 1115656
(In reply to Michael Vines [:m1] [:evilmachines] from comment #5)
> The implementation details are up to you of course, seems to me like
> managing this within Gecko would be nicer.  Gaia can be replaced so the less
> Gecko/Gaia coupling the better (nevermind Gonk/Gaia coupling!)

Managing within gecko introduces inconsistency between gaia and gecko after restoring. A nicer solution as comment 5 requires more time to implement and test. Disagree with comment 4 as it is a dirty workaround for driver restart.
Whiteboard: [CR 810057]
Whiteboard: [CR 810057] → [caf priority: p2][CR 810057]
As a reference, RILD crash recovery is handled in gecko (bug 1067629 comment 14) and qcom also regards gaia hack as a workaround (bug 1067629 comment 4). An even better way is one general mechanism to deal with all daemons' crash.
Any update on whether this blocks v2.2 release?
For the reasons mentioned in comment 3, please fix it for v2.2.
Hi Keven,

Per comment 3 please assign someone on your team to handle this issue as we're quickly approaching the fxOS 2.2 FC milestone of 4.06 and need this addressed.


Thanks,
Mike
Flags: needinfo?(kkuo)
Assignee: nobody → shuang
Shawn,

Thanks

--
Keven
Flags: needinfo?(kkuo)
blocking-b2g: 2.2? → 2.2+
I tried to restart bluetooth in BluetoothService, but I found error log:
"Bluetooth daemon already connecting/connected!", it looks like I need to handle BluetoothDaemonConnection.
Shawn,

How is this progressing? When do you anticipate being able to deliver a fix?

Thanks,
Mike
Flags: needinfo?(shuang)
Hi Shawn, what's the ETA for fixing this bug?
Flags: needinfo?(shuang)
Whiteboard: [caf priority: p2][CR 810057] → [caf priority: p2][CR 810057][ETA:4/20]
Hi

The reported issue was never promised, planned, or even committed to for v2.2. Auto-recovery is a major feature that needs planning. We committed to making Bluetooth more robust by not crashing when the driver does. And that's what we delivered in v2.2. See bug 1005934 and bug 1132229. For graceful restarts and auto-recovery there already is a discussion in bug 1115656.

In contrast to what comment 3 implies, restarts or crashes are not at all common. It was obviously possible to build an in-Gecko implementation that works reliably and doesn't constantly crash the b2g process.

From a technical POV, even if we could restart the Bluetooth daemon, this doesn't mean that we 'get back Bluetooth.' Devices need to be repaired and reconnected, running file transfers need to be continued or restarted. We don't know yet if it is possible to do this automatically, and if so how.

In any case, the best we could provide now is sending an message to Gaia to let the user know about the crashed Bluetooth driver.
For this v2.2 bug, we expect that the device will return back to an "Idle BT state" after bluetoothd crashes or is terminated by the BT stack so that when the user scratches their head and re-tries what they were doing it generally works.  Resuming any file transfer sessions/etc would clearly be too much, but it should probably try to reconnect to a headset.  Essentially make it behave like how it does currently right after device boot.
1) I don't understand this logic. Why is 'restarting BT and not telling the user' is better than 'not restarting BT and not telling the user'?

2) If there's a permanent problem, your approach can easily run into an endless crash-and-restart cycle.
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #19)
> 1) I don't understand this logic. Why is 'restarting BT and not telling the
> user' is better than 'not restarting BT and not telling the user'?

I believe is has been covered in comment 7

> 2) If there's a permanent problem, your approach can easily run into an
> endless crash-and-restart cycle.

True, at that point the device probably needs to be returned if the user cares about BT being functional.  In <= v2.1 this case would manifest as a complete device restart, rendering the device completely unusable.

Why are you trying to hold bluetoothd to a different standard than rild?
(In reply to Michael Vines [:m1] [:evilmachines] from comment #20)
> (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #19)
> > 1) I don't understand this logic. Why is 'restarting BT and not telling the
> > user' is better than 'not restarting BT and not telling the user'?
> 
> I believe is has been covered in comment 7

Heh, and by comment 7 I of course meant comment 3.
> Why are you trying to hold bluetoothd to a different standard than rild?

See comment 17.
Attached patch WIP patch (obsolete) — Splinter Review
Attached patch WIP patch (obsolete) — Splinter Review
Attached file logcat.txt (obsolete) —
Attached file manually restart log (obsolete) —
Attached patch WIP patch (obsolete) — Splinter Review
Attached file manually restart good case (obsolete) —
Attached file logcat log fail to start daemon (obsolete) —
attachment 8593218 [details] manually restart Good case:
04-14 23:01:06.080 I/GeckoBluetooth(  207): Init: BluetoothDaemonInterface::Init
04-14 23:01:06.080 I/GeckoBluetooth(  207): Init: new CMD_CHANNEL socket exists
04-14 23:01:06.080 I/GeckoBluetooth(  207): Init: mListenSocketName: bluetoothd-daed591e9012aa6a
04-14 23:01:06.080 D/        (  207): ListenSocket::Listen 2
04-14 23:01:06.080 D/        (  207): ListenSocket::Listen 1
04-14 23:01:06.080 I/GeckoBluetooth(  207): Init: SetConnection
04-14 23:01:06.110 I/GeckoBluetooth(  207): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel 0
04-14 23:01:06.110 I/GeckoBluetooth(  207): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel LISTEN_SOCKET
04-14 23:01:06.110 I/GeckoBluetooth(  207): OnConnectSuccess: ctl.start bluetoothd:-a bluetoothd-daed591e9012aa6a
04-14 23:01:06.150 I/GeckoBluetooth(  207): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel 1
04-14 23:01:06.150 I/GeckoBluetooth(  207): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel CMD_CHANNEL
04-14 23:01:06.150 D/        (  207): ListenSocket::Listen 1
04-14 23:01:06.150 I/GeckoBluetooth(  207): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel 0
04-14 23:01:06.150 I/GeckoBluetooth(  207): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel LISTEN_SOCKET
04-14 23:01:06.150 I/GeckoBluetooth(  207): OnConnectSuccess: ctl.start bluetoothd:-a bluetoothd-daed591e9012aa6a
04-14 23:01:06.150 I/GeckoBluetooth(  207): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel 2
04-14 23:01:06.150 I/GeckoBluetooth(  207): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel NTF_CHANNEL
04-14 23:01:06.270 I/bluedroid( 1680): init

attachment 8593219 [details] auto-recovery case:
04-14 23:00:40.175 I/GeckoBluetooth(  205): StartGonkBluetooth: StartGonkBluetooth
04-14 23:00:40.175 I/GeckoBluetooth(  205): StartGonkBluetooth: StartGonkBluetooth::Init
04-14 23:00:40.175 I/GeckoBluetooth(  205): Init: BluetoothDaemonInterface::Init
04-14 23:00:40.175 I/GeckoBluetooth(  205): Init: new CMD_CHANNEL socket exists
04-14 23:00:40.175 I/GeckoBluetooth(  205): Init: mListenSocketName: bluetoothd-9746c24e1d972b1d
04-14 23:00:40.175 D/        (  205): ListenSocket::Listen 2
04-14 23:00:40.175 D/        (  205): ListenSocket::Listen 1
04-14 23:00:40.175 I/GeckoBluetooth(  205): Init: SetConnection
04-14 23:00:40.185 I/GeckoBluetooth(  205): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel 0
04-14 23:00:40.185 I/GeckoBluetooth(  205): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel LISTEN_SOCKET
04-14 23:00:40.185 I/GeckoBluetooth(  205): OnConnectSuccess: ctl.start bluetoothd:-a bluetoothd-9746c24e1d972b1d
04-14 23:00:40.195 I/GeckoBluetooth(  205): OnError: BluetoothSocketInterface::Accept failed: 1

I did not see CMD_CHANNEL connection. I need to check why it fails :(
Attached patch WIP patch (experimental) (obsolete) — Splinter Review
(In reply to Shawn Huang [:shawnjohnjr] from comment #30)

It's a bit strange that property_set does not return any error but bluetoothd cannot get started.
But when I manually do "adb shell setprop ctl.start bluetoothd:-abluetoothd-9746c24e1d972b1d", everything works, bluetooth comes back again! It seems to me i'm very close to make it work, but something I can't figure out to handle ctl.start :-(
 
> attachment 8593219 [details] auto-recovery case:
> 04-14 23:00:40.175 I/GeckoBluetooth(  205): StartGonkBluetooth:
> StartGonkBluetooth
> 04-14 23:00:40.175 I/GeckoBluetooth(  205): StartGonkBluetooth:
> StartGonkBluetooth::Init
> 04-14 23:00:40.175 I/GeckoBluetooth(  205): Init:
> BluetoothDaemonInterface::Init
> 04-14 23:00:40.175 I/GeckoBluetooth(  205): Init: new CMD_CHANNEL socket
> exists
> 04-14 23:00:40.175 I/GeckoBluetooth(  205): Init: mListenSocketName:
> bluetoothd-9746c24e1d972b1d
> 04-14 23:00:40.175 D/        (  205): ListenSocket::Listen 2
> 04-14 23:00:40.175 D/        (  205): ListenSocket::Listen 1
> 04-14 23:00:40.175 I/GeckoBluetooth(  205): Init: SetConnection
> 04-14 23:00:40.185 I/GeckoBluetooth(  205): OnConnectSuccess:
> !!!!!!OnConnectSuccess, Channel 0
> 04-14 23:00:40.185 I/GeckoBluetooth(  205): OnConnectSuccess:
> !!!!!!OnConnectSuccess, Channel LISTEN_SOCKET
> 04-14 23:00:40.185 I/GeckoBluetooth(  205): OnConnectSuccess: ctl.start
> bluetoothd:-a bluetoothd-9746c24e1d972b1d
> 04-14 23:00:40.195 I/GeckoBluetooth(  205): OnError:
> BluetoothSocketInterface::Accept failed: 1
> 
> I did not see CMD_CHANNEL connection. I need to check why it fails :(
Attached patch (WIP) delay-100.patch (obsolete) — Splinter Review
The failure in comment 30 may relate to delay. Bluetoothd restarts after retry "ctl.start" in 100ms (attached patch). Need more investigation.
(In reply to Ben Tian [:btian] from comment #33)
> Created attachment 8593295 [details] [diff] [review]
> (WIP) delay-100.patch
> 
> The failure in comment 30 may relate to delay. Bluetoothd restarts after
> retry "ctl.start" in 100ms (attached patch). Need more investigation.

Hmm, did you generate a new socket name in |BluetoothDaemonInterface::Init|? The old name might be blocked and not available for listening or connecting.
Reset |isRestart| and |isInRestart| flags to handle normal BT on/off cases.
Attachment #8593295 - Attachment is obsolete: true
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #34)
> (In reply to Ben Tian [:btian] from comment #33)
> > Created attachment 8593295 [details] [diff] [review]
> > (WIP) delay-100.patch
> > 
> > The failure in comment 30 may relate to delay. Bluetoothd restarts after
> > retry "ctl.start" in 100ms (attached patch). Need more investigation.
> 
> Hmm, did you generate a new socket name in |BluetoothDaemonInterface::Init|?
> The old name might be blocked and not available for listening or connecting.


Thomas, but what I saw is ctl.start even don't execute the daemon, I tried to add log in daemon |main| function. Nothing happened.
(In reply to Shawn Huang [:shawnjohnjr] from comment #36)
> (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #34)
> > (In reply to Ben Tian [:btian] from comment #33)
> > > Created attachment 8593295 [details] [diff] [review]
> > > (WIP) delay-100.patch
> > > 
> > > The failure in comment 30 may relate to delay. Bluetoothd restarts after
> > > retry "ctl.start" in 100ms (attached patch). Need more investigation.
> > 
> > Hmm, did you generate a new socket name in |BluetoothDaemonInterface::Init|?
> > The old name might be blocked and not available for listening or connecting.
> 
> 
> Thomas, but what I saw is ctl.start even don't execute the daemon, I tried
> to add log in daemon |main| function. Nothing happened.
typo:
s/ctl.start/property service
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #34)
> Hmm, did you generate a new socket name in |BluetoothDaemonInterface::Init|?
> The old name might be blocked and not available for listening or connecting.

I think 'yes' since [1] is executed while gecko restarts bluetoothd. The flow goes as normal init but cannot start bluetoothd (comment 32).

[1] https://dxr.mozilla.org/mozilla-central/source/dom/bluetooth/bluedroid/BluetoothDaemonInterface.cpp#2109
I guess it's not acceptable to use usleep in gecko. So I tried to found why it get stopped.

Breakpoint 1, mozilla::dom::bluetooth::BluetoothDaemonInterface::OnConnectSuccess (this=0xb2a72840, 
    aChannel=mozilla::dom::bluetooth::BluetoothDaemonInterface::LISTEN_SOCKET)
    at /code/b2g37_v2_2/mozilla-b2g37_v2_2/dom/bluetooth/bluedroid/BluetoothDaemonInterface.cpp:1836
1836	        if (property_set("ctl.start", value.get()) < 0) {
(gdb) s
property_set (key=0xb61bf516 "ctl.start", value=0xacc358d8 "bluetoothd:-a bluetoothd-9fad494b468ab52e") at system/core/libcutils/properties.c:37

I cannot do next step in gdb.

system/core/libcutils/properties.c 
int property_set(const char *key, const char *value)
{
    return __system_property_set(key, value);
}


bionic/libc/bionic/system_properties.c
int __system_property_set(const char *key, const char *value)


It looks like we need to check why fail at |__system_property_set| in bionic.
hmm, when I attach gdb to debug bionic steps by steps, i did not do delay inside gecko, i CAN still turn on bluetooth daemon.
It looks like a timing issue but it seems easy to hit on flame-kk. I guess we'd better test it on msm8909.
I still don't have a full pictures for this problem.

http://androidxref.com/4.4.4_r1/xref/bionic/libc/bionic/system_properties.c#518

(gdb) n
__system_property_set (key=0xb61bf516 "ctl.start", value=<optimized out>) at bionic/libc/bionic/system_properties.c:555
555	    err = send_prop_msg(&msg);
(gdb) p err
$3 = <optimized out>
(gdb) s
send_prop_msg (msg=0xbe971734) at bionic/libc/bionic/system_properties.c:508
508	    if(r == sizeof(prop_msg)) {
(gdb) n
517	        pollfds[0].events = 0;
(gdb) n
516	        pollfds[0].fd = s;
(gdb) n
517	        pollfds[0].events = 0;
(gdb) n
518	        r = TEMP_FAILURE_RETRY(poll(pollfds, 1, 250 /* ms */));
(gdb) n
__system_property_set (key=0xb61bf516 "ctl.start", value=<optimized out>) at bionic/libc/bionic/system_properties.c:532
532	            result = 0;


Then I saw log:
04-14 23:02:59.650 I/GeckoBluetooth(  203): OnConnectSuccess: ctl.start bluetoothd:-a bluetoothd-e367dffa17fa4dec
04-14 23:08:40.740 E/bluetoothd( 1885): bluetoothd main enter
04-14 23:08:40.740 E/bluetoothd( 1885): bluetoothd main make_daemon
04-14 23:16:56.060 I/GeckoBluetooth(  203): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel 1
04-14 23:16:56.060 I/GeckoBluetooth(  203): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel CMD_CHANNEL
04-14 23:16:56.060 D/        (  203): ListenSocket::Listen 1
04-14 23:16:56.130 I/GeckoBluetooth(  203): ServiceChanged: 1 client 0. new mClientId 0
04-14 23:16:56.140 I/GeckoBluetooth(  203): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel 0
04-14 23:16:56.140 I/GeckoBluetooth(  203): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel LISTEN_SOCKET
04-14 23:16:56.140 I/GeckoBluetooth(  203): OnConnectSuccess: ctl.start bluetoothd:-a bluetoothd-e367dffa17fa4dec
04-14 23:17:00.620 E/HWComposer(  203): Non-uniform vsync interval: 4486599189
04-14 23:17:00.620 I/GeckoBluetooth(  203): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel 2
04-14 23:17:00.620 I/GeckoBluetooth(  203): OnConnectSuccess: !!!!!!OnConnectSuccess, Channel NTF_CHANNEL
04-14 23:17:00.770 I/bluedroid( 1885): init
Update:
After first 'ctl.start', I get 'stopping' while querying property "init.svc.bluetoothd" immediately. The next try in 100ms gets 'stopped' and restart succeeds. Not sure if bluetoothd has to be 'stopped' for restart.

(In reply to Ben Tian [:btian] from comment #38)
> (In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #34)
> > Hmm, did you generate a new socket name in |BluetoothDaemonInterface::Init|?
> > The old name might be blocked and not available for listening or connecting.
> 
> I think 'yes' since [1] is executed while gecko restarts bluetoothd. The
> flow goes as normal init but cannot start bluetoothd (comment 32).
> 
> [1]
> https://dxr.mozilla.org/mozilla-central/source/dom/bluetooth/bluedroid/
> BluetoothDaemonInterface.cpp#2109
Changes:
- replace PostDelayedTask with usleep
- revise |BluetoothA2dpManager::Disconnect| to avoid crash
- remove logs

The patch can restart bluetoothd normally with HFP+A2DP connection on flame-kk. More testing is required.
Attachment #8593309 - Attachment is obsolete: true
Fix crash caused by null |mController| in |OnDisconnectError| of HFP and A2DP managers.

Steps to repro the crash:
1) connect HFP & A2DP to bluetooth headset
2) restart bluetooth daemon
3) turn off BT before connection is restored <--- crash happens

Note HFP and A2DP connection takes a while (~1 min in my trials) to restore after bluetooth daemon restarts.
Attachment #8593806 - Attachment is obsolete: true
Thanks Ben for taking care the rest of things when I'm unavailable.
Assignee: shuang → nobody
I tested on msm8909, sometimes we cannot reached to |BluetoothDaemonInterface::OnDisconnect| LISTEN_SOCKET condition. I guess we need to debug further.
Assignee: nobody → shuang
Add some comment to explain the approach. Note this patch still has bug that A2DP icon remains when bluetooth is turned OFF after crash recovery.
Attachment #8594646 - Attachment is obsolete: true
Attachment #8594692 - Flags: feedback?(tzimmermann)
(In reply to Ben Tian [:btian] from comment #47)
> Created attachment 8594692 [details] [diff] [review]
> [Orion] Patch 1 (v1): Restart bluetooth daemon automatically
> 
> Add some comment to explain the approach. Note this patch still has bug that
> A2DP icon remains when bluetooth is turned OFF after crash recovery.

I will take care the icon problem. :)
Comment on attachment 8594692 [details] [diff] [review]
[Orion] Patch 1 (v1): Restart bluetooth daemon automatically

This patch doesn't apply cleanly. Can you upload a version for the latest m-c? Thanks.
The patch is for Orion (2.2 based) as this is blocker requested by qcom. I'll prepare m-c patch for you.

(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #49)
> Comment on attachment 8594692 [details] [diff] [review]
> [Orion] Patch 1 (v1): Restart bluetooth daemon automatically
> 
> This patch doesn't apply cleanly. Can you upload a version for the latest
> m-c? Thanks.
Comment on attachment 8594692 [details] [diff] [review]
[Orion] Patch 1 (v1): Restart bluetooth daemon automatically

Review of attachment 8594692 [details] [diff] [review]:
-----------------------------------------------------------------

First of all, thanks to both of you for working on this issue.

The current patch however is very fragile and has two main problems: maintaining |mDaemonCrashed| in |BluetoothDaemonInterface|, and the use of signals. I'll outline the changes to cleanup both issues in steps (a) to (f). I hope they'll work in practice. ;)

::: dom/bluetooth/BluetoothService.cpp
@@ +254,5 @@
>      BT_WARNING("Failed to add settings change observer!");
>      return false;
>    }
>  
> +  // Only main process should observe restart-bluetooth notification.

Avoid this signal completely by adding a new notification to the notification handler.

::: dom/bluetooth/BluetoothService.h
@@ +325,4 @@
>  protected:
>    BluetoothService() : mEnabled(false)
>                       , mAdapterAddedReceived(false)
> +                     , mRestarting(false)

Maybe replace mRestarting and mEnabled with a state indicator with states such as SERVICE_DISABLED, SERVICE_STARTING, SERVICE_RESTARTING, SERVICE_ENABLED.

I'm not sure what amount of work to expect for a v2.2 fix, though. Should definitely be a separate patch.

::: dom/bluetooth/bluedroid/BluetoothA2dpManager.cpp
@@ +555,5 @@
>    if (!sBtA2dpInterface) {
>      BT_LOGR("sBluetoothA2dpInterface is null");
> +    if (aController) {
> +      aController->NotifyCompletion(NS_LITERAL_STRING(ERR_NO_AVAILABLE_RESOURCE));
> +    }

This file's changes look like they are self-contained and should go into a separate patch.

::: dom/bluetooth/bluedroid/BluetoothDaemonInterface.cpp
@@ +18,4 @@
>  #include "mozilla/ipc/ListenSocket.h"
>  #include "mozilla/ipc/UnixSocketConnector.h"
>  #include "mozilla/unused.h"
> +#include "nsIObserverService.h"

Way too low-level for observers and signals.

@@ +1955,5 @@
>  
> +/*
> + * When Bluetooth daemon crashed, the first point we beware of the crash is
> + * |OnDisconnect(CMD_CHANNEL)| called with empty |mResultHandlerQ|. The
> + * procedure to recover from crash consists of several steps.

There are three cases for restarting

  1) during startup
  2) during regular service
  3) during shutdown

The given assumption doesn't work reliably. If bluetoothd crashes during 1) or 3), |mResultHandlerQ| will contain an element. OTOH, only 2) seems restart-worthy, so the limitation might not be a problem. If you want to handle case 1) and 3), you should call the current result handler's |OnError| method.

In any case, you should definitely document all the cases.

@@ +1979,2 @@
>        sNotificationHandler->AdapterStateChangedNotification(false);
>        sNotificationHandler = nullptr;

The existence of a result handler can tell you if the daemon crashed or shut down correctly.

(a) Don't clear |sNotificationHandler| here, but below in the LISTEN_SOCKET branch.

@@ +1980,5 @@
>        sNotificationHandler = nullptr;
> +      mDaemonCrashed = true;
> +
> +      // Append a dummy to fill result handler queue as cleanup procedure
> +      mResultHandlerQ.AppendElement(nullptr);

(b) Don't append an element.

@@ +1996,3 @@
>        mListenSocket->Close();
>        break;
>      case LISTEN_SOCKET: {

(c) You should distinguish between 'stopped' and 'crashed' by the existence of a result handler in |mResultHandlerQ|.

For 'stopped' you should keep the code as it is.

@@ +2002,5 @@
>          if (res) {
>            res->Cleanup();
>          }
> +
> +        if (mDaemonCrashed) {

(d) For this case where the daemon crashed, you should inform the notification handler about it.

Instead of sending a signal, add a new callback to the notification-handler interface, let's say 'RequestRestart.' The notification handler is the running instance of |BluetoothServiceBluedroid|. This is also the right place to keep state.

@@ +2016,1 @@
>        }

(e) At the end of the LISTEN_SOCKET branch, clear |sNotificationHandler|.

@@ +2290,5 @@
>  void
>  BluetoothDaemonInterface::Cleanup(BluetoothResultHandler* aRes)
>  {
> +  // Don't trigger cleanup procedure during daemon crash recovery
> +  if (mDaemonCrashed) {

Shouldn't this field have been cleared near line 2014 as part of the Recovery step 2?

(f) With points (a) to (e), it should be possible to make a decision about cleaning up in |BuetoothServiceBluedroid| and avoid a call to this method completely.

::: dom/bluetooth/bluedroid/BluetoothDaemonInterface.h
@@ +151,5 @@
>    nsAutoPtr<BluetoothDaemonHandsfreeInterface> mHandsfreeInterface;
>    nsAutoPtr<BluetoothDaemonA2dpInterface> mA2dpInterface;
>    nsAutoPtr<BluetoothDaemonAvrcpInterface> mAvrcpInterface;
> +
> +  bool mDaemonCrashed;

Keeping |mDeamonCrashed| in |BluetoothDaemonInterface| is fragile as it expects behavior from calling classes. If the daemon is gone, |BluetoothDaemonInterface| should rather return into it's original state and not care about restarting. The restart should be handled by |BluetoothServiceBluedroid| instead.

::: dom/bluetooth/bluedroid/hfp/BluetoothHfpManager.cpp
@@ +1225,4 @@
>  BluetoothHfpManager::OnDisconnectError()
>  {
>    MOZ_ASSERT(NS_IsMainThread());
> +  NS_ENSURE_TRUE_VOID(mController);

Same as for A2DP Manager. There should be a patch for preparing the managers before implementing the actual restart.
Attachment #8594692 - Flags: feedback?(tzimmermann) → feedback-
(In reply to Ben Tian [:btian] from comment #50)
> The patch is for Orion (2.2 based) as this is blocker requested by qcom.
> I'll prepare m-c patch for you.

Thanks!
Whiteboard: [caf priority: p2][CR 810057][ETA:4/20] → [caf priority: p2][CR 810057][ETA:4/27]
Comment on attachment 8594692 [details] [diff] [review]
[Orion] Patch 1 (v1): Restart bluetooth daemon automatically

Review of attachment 8594692 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/bluetooth/BluetoothService.cpp
@@ +254,5 @@
>      BT_WARNING("Failed to add settings change observer!");
>      return false;
>    }
>  
> +  // Only main process should observe restart-bluetooth notification.

Just want to confirm the comments you gave:
Completely remove signal and observer usage and do not need to handle restart in BluetoothService, but BluetoothServiceBluedroid. Would you mind elaborate a bit adding a new notification to the notification handler?
Flags: needinfo?(tzimmermann)
Hi!

What I meant was that the notification should be something to signal to the notification handler that the backend code needs a restart. The signal/observer code is just not the right mechanism here.

I'm not completely sure if |BluetoothService| needs to be modified, but it should be possible to implement the notification handler there, so it will be able to restart.

For the actual implementation, there could be a method |BluetoothNotificationHandler::BackendErrorNotification(bool aRestart)|. This would be called in |BluetoothDaemonConnection::OnDisconnect| at the end of the LISTEN_SOCKET branch. |BluetoothServiceBluedroid| should implement this method and perform the restart when it's called. The argument |aRestart| could tell whether the backend requests a restart or should simply be cleaned up and destroyed.

I don't particularly care about Orion, but that's what I think we should do on m-c.
Flags: needinfo?(tzimmermann)
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #54)
> Hi!
> 
> What I meant was that the notification should be something to signal to the
> notification handler that the backend code needs a restart. The
> signal/observer code is just not the right mechanism here.

> I don't particularly care about Orion, but that's what I think we should do
> on m-c.

Thanks for the explanation. We will modify accordingly.
Attached patch bug1143925-mc.patch (obsolete) — Splinter Review
Revision:
1. Add |BluetoothNotificationHandler::BackendErrorNotification|
2. Call BackendErrorNotification from |BluetoothDaemonConnection::OnDisconnect|
3. Restart Bluetooth in |BluetoothServiceBluedroid::CompleteToggleBt|
Testing on m-c, now i got crash, not sure this is related to m-c. This seems always happen.

04-27 15:54:21.650 I/DEBUG   (  198): backtrace:
04-27 15:54:21.650 I/DEBUG   (  198):     #00  pc 00003514  /system/bin/bluetoothd
04-27 15:54:21.650 I/DEBUG   (  198):     #01  pc 00004c73  /system/bin/bluetoothd
04-27 15:54:21.660 I/DEBUG   (  198):     #02  pc 00004d6b  /system/bin/bluetoothd
04-27 15:54:21.660 I/DEBUG   (  198):     #03  pc 0000388b  /system/bin/bluetoothd
04-27 15:54:21.660 I/DEBUG   (  198):     #04  pc 00003857  /system/bin/bluetoothd
04-27 15:54:21.660 I/DEBUG   (  198):     #05  pc 00002475  /system/bin/bluetoothd
04-27 15:54:21.660 I/DEBUG   (  198):     #06  pc 0000255d  /system/bin/bluetoothd
04-27 15:54:21.660 I/DEBUG   (  198):     #07  pc 00000aaf  /system/lib/libfdio.so
04-27 15:54:21.660 I/DEBUG   (  198):     #08  pc 00000b61  /system/lib/libfdio.so
04-27 15:54:21.660 I/DEBUG   (  198):     #09  pc 00000d6b  /system/lib/libfdio.so (epoll_loop+66)
04-27 15:54:21.670 I/DEBUG   (  198):     #10  pc 00004fdf  /system/bin/bluetoothd
04-27 15:54:21.670 I/DEBUG   (  198):     #11  pc 0000e4a3  /system/lib/libc.so (__libc_init+50)
04-27 15:54:21.670 I/DEBUG   (  198):     #12  pc 00000fcc  /system/bin/bluetoothd
04-27 15:54:21.670 I/DEBUG   (  198): 
04-27 15:54:21.670 I/DEBUG   (  198): stack:
04-27 15:54:21.670 I/DEBUG   (  198):          bed25488  b811f1a8  [heap]
04-27 15:54:21.670 I/DEBUG   (  198):          bed2548c  b6ef5fc3  /system/lib/libc.so (dlmalloc+4254)
04-27 15:54:21.680 I/DEBUG   (  198):          bed25490  00000014  
04-27 15:54:21.680 I/DEBUG   (  198):          bed25494  00000008  
04-27 15:54:21.680 I/DEBUG   (  198):          bed25498  b6ef4f25  /system/lib/libc.so (dlmalloc)
04-27 15:54:21.680 I/DEBUG   (  198):          bed2549c  b6f74040  /system/bin/bluetoothd
04-27 15:54:21.680 I/DEBUG   (  198):          bed254a0  00000000  
04-27 15:54:21.680 I/DEBUG   (  198):          bed254a4  00000000  
04-27 15:54:21.690 I/DEBUG   (  198):          bed254a8  b6e80010  [anon:libc_malloc]
04-27 15:54:21.690 I/DEBUG   (  198):          bed254ac  00000008  
04-27 15:54:21.690 I/DEBUG   (  198):          bed254b0  b6f51340  /system/lib/libfdio.so
04-27 15:54:21.690 I/DEBUG   (  198):          bed254b4  b6f5138e  /system/lib/libfdio.so
04-27 15:54:21.690 I/DEBUG   (  198):          bed254b8  b6f51340  /system/lib/libfdio.so
04-27 15:54:21.690 I/DEBUG   (  198):          bed254bc  b6ef2cb1  /system/lib/libc.so (malloc+12)
04-27 15:54:21.690 I/DEBUG   (  198):          bed254c0  00000000  
04-27 15:54:21.700 I/DEBUG   (  198):          bed254c4  b6f6e5a5  /system/bin/bluetoothd
I guess I did something wrong with cleanup. It causes restart daemon and the first PDU core_unregister_module.

(gdb) bt
#0  0xb6f5d540 in unregister_bt_hf () at system/bluetoothd/src/bt-hf-io.c:1296
#1  0xb6f5eca0 in core_unregister_module (service=<optimized out>) at system/bluetoothd/src/core.c:62
#2  0xb6f5ed9a in unregister_module (cmd=0xb6e6f010) at system/bluetoothd/src/core-io.c:77
#3  0xb6f5d8b8 in handle_pdu (handler=<optimized out>, cmd=<optimized out>, value=<optimized out>, field=0xb6f5f8c6 "opcode") at system/bluetoothd/src/bt-proto.c:50
#4  handle_pdu_by_opcode (cmd=<optimized out>, handler=<optimized out>) at system/bluetoothd/src/bt-proto.c:64
#5  0xb6f5d884 in handle_pdu (handler=<optimized out>, cmd=0xb6e6f010, value=<optimized out>, field=0xb6f5f8be "service") at system/bluetoothd/src/bt-proto.c:50
#6  handle_pdu_by_service (cmd=cmd@entry=0xb6e6f010, handler=<optimized out>) at system/bluetoothd/src/bt-proto.c:57
#7  0xb6f5c4a4 in handle_pdu (cmd=0xb6e6f010) at system/bluetoothd/src/bt-io.c:315
#8  0xb6f5c58a in io_state_in (io_state=0xb6f63040 <io_state>) at system/bluetoothd/src/bt-io.c:129
#9  io_fd_event_in (fd=<optimized out>, data=0xb6f63040 <io_state>) at system/bluetoothd/src/bt-io.c:256
#10 0xb6f3fab0 in fd_events_func (fd=8, epoll_events=1, evfuncs=0xb6f63048 <io_state+8>) at system/libfdio/src/loop.c:127
#11 0xb6f3fb62 in epoll_loop_iteration () at system/libfdio/src/loop.c:208
#12 0xb6f3fd6e in epoll_loop (init=0xb6f5eec9 <init>, uninit=0xb6f5eeb9 <uninit>, data=data@entry=0xbe92a9dc) at system/libfdio/src/loop.c:230
#13 0xb6f5f014 in main (argc=3, argv=0xbe92aa34) at system/bluetoothd/src/main.c:179
Due to sBluetoothHfpInterface is not null. So cleanup is called again.


mozilla::dom::bluetooth::BluetoothDaemonHandsfreeInterface::Cleanup (this=0xaef49688, aRes=0xab7aa3c0)
    at ../../../../../../../code/m-c/mozilla-central/dom/bluetooth/bluedroid/BluetoothDaemonHandsfreeInterface.cpp:1564
mozilla::dom::bluetooth::BluetoothHfpManager::InitHfpInterface (aRes=0xab7aa1d0)
    at ../../../../../../../code/m-c/mozilla-central/dom/bluetooth/bluedroid/hfp/BluetoothHfpManager.cpp:432
(In reply to Shawn Huang [:shawnjohnjr] from comment #61)
> Due to sBluetoothHfpInterface is not null. So cleanup is called again.
> 
> 
> mozilla::dom::bluetooth::BluetoothDaemonHandsfreeInterface::Cleanup
> (this=0xaef49688, aRes=0xab7aa3c0)
>     at
> ../../../../../../../code/m-c/mozilla-central/dom/bluetooth/bluedroid/
> BluetoothDaemonHandsfreeInterface.cpp:1564
> mozilla::dom::bluetooth::BluetoothHfpManager::InitHfpInterface
> (aRes=0xab7aa1d0)
>     at
> ../../../../../../../code/m-c/mozilla-central/dom/bluetooth/bluedroid/hfp/
> BluetoothHfpManager.cpp:432

My patch is not correct :( I will fix it next revision.
Comment on attachment 8594692 [details] [diff] [review]
[Orion] Patch 1 (v1): Restart bluetooth daemon automatically

Review of attachment 8594692 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/bluetooth/bluedroid/BluetoothDaemonInterface.cpp
@@ +2290,5 @@
>  void
>  BluetoothDaemonInterface::Cleanup(BluetoothResultHandler* aRes)
>  {
> +  // Don't trigger cleanup procedure during daemon crash recovery
> +  if (mDaemonCrashed) {

Hi Thomas,
I tried to remove |mDeamonCrashed| as you suggested,
and |BluetoothServiceBluedroid::BackendErrorNotification| currently tries to do |stopbluetooth| and wait for execution CompleteToggleBt to trigger startbluetooth again in |BluetoothServiceBluedroid|.

If I understand correctly, we call |AdapterStateChangedNotification|in|BluetoothDaemonInterface::OnDisconnect|. Based on (e), calling |sNotificationHandler->BackendErrorNotification(true)| and set sNotificationHandler to nullptr  at the end of LISTEN_SOCKET branch. Since |AdapterStateChangedNotification| triggers |cleanup|, this makes |BluetoothDaemonInterface::Cleanup| been called and set sNotificationHandler nullptr first. How can avoid a call to this |cleanup| method if |sNotificationHandler->BackendErrorNotification(true)| called later than?
Flags: needinfo?(tzimmermann)
(In reply to Shawn Huang [:shawnjohnjr] from comment #63)
> Comment on attachment 8594692 [details] [diff] [review]
> [Orion] Patch 1 (v1): Restart bluetooth daemon automatically
> 
> Review of attachment 8594692 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/bluetooth/bluedroid/BluetoothDaemonInterface.cpp
> @@ +2290,5 @@
> >  void
> >  BluetoothDaemonInterface::Cleanup(BluetoothResultHandler* aRes)
> >  {
> > +  // Don't trigger cleanup procedure during daemon crash recovery
> > +  if (mDaemonCrashed) {
> 
> Hi Thomas,
> I tried to remove |mDeamonCrashed| as you suggested,
> and |BluetoothServiceBluedroid::BackendErrorNotification| currently tries to
> do |stopbluetooth| and wait for execution CompleteToggleBt to trigger
> startbluetooth again in |BluetoothServiceBluedroid|.
> 
> If I understand correctly, we call
> |AdapterStateChangedNotification|in|BluetoothDaemonInterface::OnDisconnect|.
> Based on (e), calling |sNotificationHandler->BackendErrorNotification(true)|
> and set sNotificationHandler to nullptr  at the end of LISTEN_SOCKET branch.
> Since |AdapterStateChangedNotification| triggers |cleanup|, this makes
> |BluetoothDaemonInterface::Cleanup| been called and set sNotificationHandler
> nullptr first. How can avoid a call to this |cleanup| method if
> |sNotificationHandler->BackendErrorNotification(true)| called later than?

Maybe I shouldn't do Stopbluetooth in |BluetoothServiceBluedroid::BackendErrorNotification|. We shall call |BackendErrorNotification| to inform BluetoothServiceBluedroid firstand then call |AdapterStateChangedNotification(false)|.
Hi Shawn,

Just discussed this with Ben in IRC. He asked to remove the call to |AdapterStateChangedNotification|. That is probably correct. I added the call to cleanup after crashes. Not calling |AdapterStateChangedNotification| up during recoveries seems the right thing to do.

Another problem is the order of disconnect operations here. For a running daemon, there are three sockets: LISTEN, CMD and NTF. There will be a disconnect operation for each of them. Since this is not a regular shutdown, the disconnects are seen by Gecko in any order.

What could happen is that |BackendErrorNotification| tries a restart before they have been disconnected correctly. And later the remaining disconnects interfere with the restart procedure.

We need to make sure that the backend has seen |OnDisconnect| for all connected socket, before it calls |BackendErrorNotification|. I guess that the socket classes need a method like is |IsOpen|. Only after all three have been closed, |BackendErrorNotification| gets called.
Flags: needinfo?(tzimmermann)
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #65)
> Hi Shawn,
> 
> Just discussed this with Ben in IRC. He asked to remove the call to
> |AdapterStateChangedNotification|. That is probably correct. I added the
> call to cleanup after crashes. Not calling |AdapterStateChangedNotification|
> up during recoveries seems the right thing to do.
> 
I think we have to keep |AdapterStateChangedNotification| otherwise cleanup procedure won't be executed.
If we don't do |AdapterStateChangedNotification|, we can hit problem mentioned in Comment 60 and Comment 61. |cleanup| will be called again during Init for HFP daemon interface.

> Another problem is the order of disconnect operations here. For a running
> daemon, there are three sockets: LISTEN, CMD and NTF. There will be a
> disconnect operation for each of them. Since this is not a regular shutdown,
> the disconnects are seen by Gecko in any order.
But I always saw onDisconnect order as 'CMD->NTF->LISTEN', so I assume this is correct order.
> I think we have to keep |AdapterStateChangedNotification| otherwise cleanup
> procedure won't be executed.
> If we don't do |AdapterStateChangedNotification|, we can hit problem
> mentioned in Comment 60 and Comment 61. |cleanup| will be called again
> during Init for HFP daemon interface.

You could do anything that happens in |AdapterStateChangedNotification| in |BackendErrorNotification| as well, and tweak it a bit for recovery. Might be easier.

> But I always saw onDisconnect order as 'CMD->NTF->LISTEN', so I assume this
> is correct order.

The problem is that this order depends on the order of sockets in the I/O loop's poll function. And that could depend on all kinds of stuff: file-descriptor numbers, details in the poll function, kernel version.
I saw a behavior that bluedroid terminated unexpected during restart.

04-27 15:43:56.520 I/bluedroid( 2126): init
04-27 15:43:56.520 I/bte_conf( 2126): Attempt to load stack conf from /etc/bluetooth/bt_stack.conf
04-27 15:43:56.520 E/BTIF_CORE( 2126): mutex: 0xb6ee9244 is not initialized
04-27 15:43:56.520 I/GKI_LINUX( 2126): gki_task_entry: gki_task_entry task_id=1 [BTIF] starting
04-27 15:43:56.520 I/GKI_LINUX( 2126): gki_task_entry: gki_task task_id=1 [BTIF] terminating
04-27 15:43:56.520 I/GKI_LINUX( 2126): GKI_exit_task: GKI_exit_task 1 done
04-27 15:43:56.520 I/GKI_LINUX( 2126): GKI_destroy_task: GKI_shutdown(): task [BTIF] terminated
04-27 15:43:56.520 E/BTIF_CORE( 2126): mutex: 0xb6ee9244 is not initialized
(In reply to Shawn Huang [:shawnjohnjr] from comment #68)
> 04-27 15:43:56.520 E/BTIF_CORE( 2126): mutex: 0xb6ee9244 is not initialized
> 04-27 15:43:56.520 E/BTIF_CORE( 2126): mutex: 0xb6ee9244 is not initialized
Oh my fault again. It caused by |cleanup| command sent during |StartBluetooth|.

#0  lock_slot (mutex=0xb6e9f244 <mutex_bt_disable>) at external/bluetooth/bluedroid/main/../btif/include/btif_sock_util.h:45
#1  0xb6d08cf4 in btif_shutdown_bluetooth () at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:777
#2  0xb6d06590 in cleanup () at external/bluetooth/bluedroid/main/../btif/src/bluetooth.c:194
#3  0xb6f8e290 in unregister_bt_core () at system/bluetoothd/src/bt-core-io.c:1375
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #67)
> > I think we have to keep |AdapterStateChangedNotification| otherwise cleanup
> > procedure won't be executed.
> > If we don't do |AdapterStateChangedNotification|, we can hit problem
> > mentioned in Comment 60 and Comment 61. |cleanup| will be called again
> > during Init for HFP daemon interface.
> 
> You could do anything that happens in |AdapterStateChangedNotification| in
> |BackendErrorNotification| as well, and tweak it a bit for recovery. Might
> be easier.
> 
> > But I always saw onDisconnect order as 'CMD->NTF->LISTEN', so I assume this
> > is correct order.
> 
> The problem is that this order depends on the order of sockets in the I/O
> loop's poll function. And that could depend on all kinds of stuff:
> file-descriptor numbers, details in the poll function, kernel version.

Ok, i see. I agree with you.
Attachment #8599713 - Attachment is obsolete: true
Attachment #8599713 - Flags: feedback?(brsun)
Attachment #8599720 - Attachment is obsolete: true
Attachment #8599720 - Flags: feedback?(btian)
Attachment #8599720 - Flags: feedback?(brsun)
Attachment #8599724 - Attachment description: Bug 1143925 - Restart bluetoothd daemon automatically for API V1 → Bug 1143925 - Restart bluetoothd daemon automatically for API V1 (m-c)
Attachment #8595790 - Flags: review?(tzimmermann) → review+
Comment on attachment 8599724 [details] [diff] [review]
Bug 1143925 - Restart bluetoothd daemon automatically for API V1 (m-c)

Review of attachment 8599724 [details] [diff] [review]:
-----------------------------------------------------------------

Impressive work, Shawn!

The code looks good, with just a few nits. I did a quick test with my headset and it worked for me. Thanks you so much for working on this!

::: dom/bluetooth/bluedroid/BluetoothDaemonInterface.cpp
@@ +15,5 @@
>  #include "BluetoothDaemonSetupInterface.h"
>  #include "BluetoothDaemonSocketInterface.h"
>  #include "BluetoothInterfaceHelpers.h"
>  #include "mozilla/ipc/ListenSocket.h"
> +#include "mozilla/ipc/SocketBase.h"

I think this is included by ListenSocket.h as well.

@@ +1746,5 @@
>   */
>  #define container(_t, _v, _m) \
>    ( (_t*)( ((const unsigned char*)(_v)) - offsetof(_t, _m) ) )
>  
> +static const int sRetryInterval = 100; // ms

This should be located near the top of the file.

::: dom/bluetooth/bluedroid/BluetoothServiceBluedroid.cpp
@@ +3092,5 @@
>  }
>  #endif
> +
> +void
> +BluetoothServiceBluedroid::BackendErrorNotification(bool aRestart)

Just a nit: I'd rather call this 'aCrashed' or provide an enumerator with possible errors. The backend should inform the service what happened and the service then decides what to do.

@@ +3099,5 @@
> + // Recovery step 2 stop bluetooth
> + if (aRestart) {
> +  BT_LOGR("Set aRestart = true");
> +  sIsRestart = true;
> +  BT_LOGR("Reocvery step2: StopBluetooth()");

Maybe have just one single message with a descriptive string.
Attachment #8599724 - Flags: review?(tzimmermann) → review+
This patch at least fix build breaks. I will open the patch for further work.
Attachment #8599844 - Flags: review?(tzimmermann)
see also Bug 1160126 - Restart bluetoothd daemon automatically for API V2
Bug 1160126 targets for m-c (v3) for Bluetooth API V2.
See Also: → 1160126
Follow up bug for FXOS v2.2: 
Bug 1160127 - Send profile level disconnection notification to make status bar reset connection status (HFP/A2DP)
Comment on attachment 8599844 [details] [diff] [review]
Bug 1143925 - Restart bluetoothd daemon automatically for API V2 (3/3) (m-c only)

Review of attachment 8599844 [details] [diff] [review]:
-----------------------------------------------------------------

This is patch 1 for v2? I think you should merge the methods if possible, but r+ for the code in general.
Attachment #8599844 - Flags: review?(tzimmermann) → review+
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #81)
> Comment on attachment 8599844 [details] [diff] [review]
> Bug 1143925 - Restart bluetoothd daemon automatically for API V2 (3/3) (m-c
> only)
> 
> Review of attachment 8599844 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is patch 1 for v2? I think you should merge the methods if possible,
> but r+ for the code in general.

Sure. I will merge the methods.
Rebase patch to v2.2 and I tested it on flame-kk using v2.2 Gecko.
Attachment #8599863 - Attachment description: Bug 1143925 - Restart bluetoothd daemon automatically (v2.2), r=tzimmermann → Bug 1143925 - Restart bluetoothd daemon automatically (1/2) (v2.2), r=tzimmermann
Attachment #8599863 - Attachment description: Bug 1143925 - Restart bluetoothd daemon automatically (1/2) (v2.2), r=tzimmermann → Bug 1143925 - Restart bluetoothd daemon automatically, r=tzimmermann (1/2) (v2.2)
Comment on attachment 8599863 [details] [diff] [review]
Bug 1143925 - Restart bluetoothd daemon automatically, r=tzimmermann (1/2)  (v2.2)

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): CAF requested crash recovery feature
User impact if declined: user needs to manually restart daemon
Testing completed: Tested v2.2 on flame-kk
Risk to taking this patch (and alternatives if risky): none unless restart flag unexpected been set 
String or UUID changes made by this patch: none
Attachment #8599863 - Flags: approval-mozilla-b2g37?
Comment on attachment 8599870 [details] [diff] [review]
Bug 1143925 - Avoid crash for HFP/A2DP manager during restart daemon, r=tzimmermann (2/2)(v.2.2)

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): CAF requested crash recovery feature
User impact if declined: user needs to manually restart daemon
Testing completed: Tested v2.2 on flame-kk
Risk to taking this patch (and alternatives if risky): none unless restart flag unexpected been set 
String or UUID changes made by this patch: none
Attachment #8599870 - Flags: approval-mozilla-b2g37?
Whiteboard: [caf priority: p2][CR 810057][ETA:4/27] → [caf priority: p2][CR 810057]
waiting for m-c landing here before considering branch approval.
https://hg.mozilla.org/mozilla-central/rev/c64572d32a9d
https://hg.mozilla.org/mozilla-central/rev/aa2fbf4cb5c9
https://hg.mozilla.org/mozilla-central/rev/a564a069e8f5
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.2 S11 (1may)
Attachment #8599863 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Attachment #8599870 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
I also tested v2.2 patch sets on device 'Orion'. Things look good.
I think this bug is a new enhancement, so set in‑moztrap flag for tracking.
Flags: in-moztrap?(twen)
Blocks: 1115656
This issue is verified fixed on the latest 3.0 and 2.2 Nightly Flame builds.

Actual Results: The bluetoothd process automatically restarted on 5/5 kills.

Environmental Variables:
Device: Flame 3.0
BuildID: 20150526010203
Gaia: 7cd4130d4f988562a77d126860408ada65bb95ef
Gecko: 43f2f0c506ea
Gonk: 040bb1e9ac8a5b6dd756fdd696aa37a8868b5c67
Version: 41.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:41.0) Gecko/41.0 Firefox/41.0

Environmental Variables:
Device: Flame 2.2
BuildID: 20150526002558
Gaia: 6a8d171d00efe8b27cba91bf1d789ab824579664
Gecko: 46f6c7327ab0
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 37.0 (2.2) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
Status: RESOLVED → VERIFIED
This is covered in moztrap in https://moztrap.mozilla.org/manage/cases/?filter-suite=182.
Flags: in-moztrap?(twen) → in-moztrap+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: