Closed Bug 969447 Opened 10 years ago Closed 10 years ago

Unable to turn bluetooth on after turning it on and off several times

Categories

(Firefox OS Graveyard :: Bluetooth, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(blocking-b2g:1.3+, firefox28 wontfix, firefox29 wontfix, firefox30 fixed, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

RESOLVED FIXED
1.4 S1 (14feb)
blocking-b2g 1.3+
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: ggrisco, Assigned: ben.tian)

References

()

Details

(Whiteboard: [caf priority: p2][CR 610008])

Attachments

(2 files, 1 obsolete file)

1. Connected a carkit to device under test.
2. Turn on WIFI connect to an APN.
3. Do BT on and off, 10 times.  After that Device UI is not allowing us to do BT on and off.

Note: Wifi settings can be turned off and on ok.

Note: IF we close "Settings" application from home screen then we are able to turn on and turn off BT.
blocking-b2g: --- → 1.3?
Keywords: qawanted
Haven't been able to reproduce this on Buri using:

Gaia   1527d1e450364e383eeb95ff898dca2042e2b4b5
SourceStamp 0a6d83aabb02
BuildID 20140207004002
Version 28.0
Base Image: 1.2cfg

I was pairing to a Plantronics headset because I don't have a car kit available to test.
(In reply to Marcia Knous [:marcia - use needinfo] from comment #1)
> Haven't been able to reproduce this on Buri using:
> 
> Gaia   1527d1e450364e383eeb95ff898dca2042e2b4b5
> SourceStamp 0a6d83aabb02
> BuildID 20140207004002
> Version 28.0
> Base Image: 1.2cfg
> 
> I was pairing to a Plantronics headset because I don't have a car kit
> available to test.

We might want to check if this reproduces on a Sora device, as this could be JB-specific potentially.
blocking-b2g: 1.3? → 1.3+
Results from another testing session with Sora:

Sora Device 1, using a hand rolled build and:

Gaia   741869c9f0eaae242351f312c18fa6051715c086
SourceStamp
BuildID 20140207155315
Version 28.0

Sora Device 2, using an original build that we got with the device:
Gaia   ad3fa846802285b5c6802fdc09f82de16749a71e
SourceStamp
BuildID 20140127210623
Version 28.0a2

I have not been able to reproduce the specific issue as described. Turning the bluetooth on or off more than 10 times at a normal pace on this device did not result in being unable to turn it on again.

When the issue was seen how rapidly are you turning it on and off?
(In reply to Marcia Knous [:marcia - use needinfo] from comment #3)
> Results from another testing session with Sora:
> 
> Sora Device 1, using a hand rolled build and:
> 
> Gaia   741869c9f0eaae242351f312c18fa6051715c086
> SourceStamp
> BuildID 20140207155315
> Version 28.0
> 
> Sora Device 2, using an original build that we got with the device:
> Gaia   ad3fa846802285b5c6802fdc09f82de16749a71e
> SourceStamp
> BuildID 20140127210623
> Version 28.0a2
> 
> I have not been able to reproduce the specific issue as described. Turning
> the bluetooth on or off more than 10 times at a normal pace on this device
> did not result in being unable to turn it on again.
> 
> When the issue was seen how rapidly are you turning it on and off?

To be clear to all, a "hand-rolled" build is one using mozilla RIL/Gonk.   the "original build" was a partner build that contained QC RIL/Gonk"
Greg - We need clearer STR here, as we cannot reproduce. Can you clarify?
Flags: needinfo?(ggrisco)
Keywords: qawanted
Renoming as this is not actionable.
blocking-b2g: 1.3+ → 1.3?
I have tried a few devices, and I can reproduce this bug that Greg reported on QRD device with build 234.
My STR as the following:
1. Turn on Wifi, connect to one STA.
2. Turn on Bluetooth, pair/connect with one Bluetooth headset. Then turn off bluetooth headset.
3. Turn off/on Bluetooth again (in this way, it makes reconnection been initiated).
Repeat Step 3 a few times (less than 10 times)
4. Setting would be frozen (either cannot scan or go back previous Wireless setting page)

I checked the log, the BluetoothProfileController seems start twice when reconnecting, and that is totally not normal.
I don't see the some symptom on Buri.
I will take this bug first and continually work on the root cause ASAP using QRD device, ETA to update further results on Sunday afternoon.
Assignee: nobody → shuang
Flags: needinfo?(ggrisco)
ProfileController was been executed twice with internal 200ms while reconnecting to the remote device.
12-31 19:28:42.859 I/GeckoBluetooth(  245): Start: [HFP/HSP] 
12-31 19:28:43.789 I/GeckoBluetooth(  245): OnConnect: [HFP/HSP] <SocketConnectionError>
12-31 19:28:43.789 I/GeckoBluetooth(  245): Next: [A2DP] 
12-31 19:28:43.829 I/GeckoBluetooth(  245): OnConnect: [A2DP] <A2dpConnectionError>
12-31 19:28:44.039 I/GeckoBluetooth(  245): Start: [HFP/HSP] 
12-31 19:28:44.109 I/GeckoBluetooth(  245): OnConnect: [HFP/HSP] <SocketConnectionError>
12-31 19:28:44.109 I/GeckoBluetooth(  245): Next: [A2DP] 
12-31 19:28:44.109 I/GeckoBluetooth(  245): OnConnect: [A2DP] <Operation already in progress>
I also tried on OpenC device, I don't see this bug on OpenC device. I will focus on QRD first.
Update build 234 info:
gecko: e01ea792cdfb50d06ee217b1a6af7d9b6f171004
gaia: f9a37c77efb4621a1f57e4695b497d18601fe134
(In reply to Shawn Huang [:shuang] [:shawnjohnjr] from comment #8)
> ProfileController was been executed twice with internal 200ms while
> reconnecting to the remote device.
> 12-31 19:28:42.859 I/GeckoBluetooth(  245): Start: [HFP/HSP] 
> 12-31 19:28:43.789 I/GeckoBluetooth(  245): OnConnect: [HFP/HSP]
> <SocketConnectionError>
> 12-31 19:28:43.789 I/GeckoBluetooth(  245): Next: [A2DP] 
> 12-31 19:28:43.829 I/GeckoBluetooth(  245): OnConnect: [A2DP]
> <A2dpConnectionError>
> 12-31 19:28:44.039 I/GeckoBluetooth(  245): Start: [HFP/HSP] 
> 12-31 19:28:44.109 I/GeckoBluetooth(  245): OnConnect: [HFP/HSP]
> <SocketConnectionError>
> 12-31 19:28:44.109 I/GeckoBluetooth(  245): Next: [A2DP] 
> 12-31 19:28:44.109 I/GeckoBluetooth(  245): OnConnect: [A2DP] <Operation
> already in progress>

I saw twice ProfileController Start() been called, it's because Setting call twice connect during Bluetooth on/off multiple times. And OnConnect() always had been called, so it looks BluetoothProfileController does not block Setting. After doing 'make reset-gaia', I cannot hit the bug now. I have no idea why.
Now I cannot reproduce on QRD, still don't have any idea why now it's so hard to hit, but yesterday when I hit the bug, I noticed that "Fail to Connect" dialog shows up twice at the same time as Comment 8 shows two Controllers were been started, the interval is 200ms. Now I don't see reconnection failure dialog shows up twice after Bluetooth on/off.
Now Gina and Ben are looking at this issue (Shawn sick leave today). Just had a discussion with them, and I think they have found the root cause of how this happened. We'll provide a hotfix before tomorrow for Greg to confirm if the patch can solve the problem.
We got a device and tried. We could reproduce after tapping on the switch rapidly for many times (sometimes less than 20, sometimes more than 30, but I have to tap very quickly). After lots of tests, we think it may be better to solve this from Gaia than Gecko. We'll confirm several points with Gaia developers in Taipei tomorrow morning.

(In reply to Greg Grisco from comment #0)
> 1. Connected a carkit to device under test.

Should be able to reproduce with a common Bluetooth headset.

> 2. Turn on WIFI connect to an APN.

Step 2 is not necessary.

> 3. Do BT on and off, 10 times.  After that Device UI is not allowing us to
> do BT on and off.

Couldn't reproduce within 10 times on Open C.

> Note: IF we close "Settings" application from home screen then we are able
> to turn on and turn off BT.

Yes, this is the same as we observed. Moreover, only Bluetooth switch was frozen but we could still go back to the previous page to access other items in Settings.

I have recorded a short video of the final status of the Settings app. Please send me a e-mail if you need to confirm if we do reproduce the same problem as yours.

Greg, could you assist with providing more information for us? The following information would be very helpful:

* logcat (with my debug log patch applied)
* video (or at least a screenshot. Just want to make sure we didn't reproduce in the wrong way.)
* frequency of occurrence
Flags: needinfo?(ggrisco)
Attached patch debug-log.patchSplinter Review
Greg, please apply this patch to your codebase so that we can have more logs for debugging. Thank you.
(In reply to Shawn Huang [:shuang] [:shawnjohnjr] from comment #7)
> My STR as the following:
> 1. Turn on Wifi, connect to one STA.
> 2. Turn on Bluetooth, pair/connect with one Bluetooth headset. Then turn off
> bluetooth headset.
> 3. Turn off/on Bluetooth again (in this way, it makes reconnection been
> initiated).
> Repeat Step 3 a few times (less than 10 times)
> 4. Setting would be frozen (either cannot scan or go back previous Wireless
> setting page)
> 

Just want to clarify. I got similar symptoms this morning, and I think the root cause might be different from the original one. Here's how I reproduce it:
* Device: buri
* Build: m-c, debug build
* STR:
  1. Turn on BT and connect to a device.
  2. Turn off BT immediately.

As a result, an assertion is hit and the then whole device is frozen (since the assertion failed). Failed to toggle BT afterwards and failed to go back to homescreen, too. This issue is master-only, I'll file another bug to fix it.
In the normal case, the switch is disabled when BT is turning on/off, and the user shouldn't and can't toggle BT again during the interval. However, the protection is broken somehow after repeatedly toggling many times. In this case, we did sense the mozsettings-change, but we didn't do anything (since we were still running the last operation). As a result, the state of UI is inconsistent with the real status of BT, and the switch is frozen then because it waits for an event which should be fired when the operation is accomplished.
Gina

Per comment 16 you say it reproduces on m-c alone. Is this reproducible on 1.3 branch? If not, we'll not block on this for 1.3.
Flags: needinfo?(gyeh)
(In reply to Preeti Raghunath(:Preeti) from comment #18)
> Gina
> 
> Per comment 16 you say it reproduces on m-c alone. Is this reproducible on
> 1.3 branch? If not, we'll not block on this for 1.3.

Preeti,

Ben and I can reproduce on Open C which is using FxOS 1.3. Please see comment 14 for more information.
(In reply to Preeti Raghunath(:Preeti) from comment #18)
> Gina
> 
> Per comment 16 you say it reproduces on m-c alone. Is this reproducible on
> 1.3 branch? If not, we'll not block on this for 1.3.

Preeti, there are two kinds of symptoms as described in bug description (only the BT switch is frozen) and comment 7&16 (the whole device is frozen). I think the root cause for these two symptoms should be different. I'd like to keep this bug for tracking the former one which should block 1.3 and file another bug for the latter to fix it on m-c only.
Flags: needinfo?(gyeh)
Thanks for Shawn's help with investigating at the very first time. After doing many tests with Gina, Ben and Arthur, we just came up with a solution. Ben is going to take over this issue, explaining the root cause and how we will handle this in v1.3.
Assignee: shuang → btian
This patch fixes the problem mentioned in comment 14. The cause is that additional adapteradded event from gecko makes BT on/off state inconsistent between gecko and gaia. The fix removes the sending of additional adapteradded event to maintain BT state consistency.

When user turns BT on/off, gaia notifies gecko and disables UI switch until receiving adapteradded/ondisabled event from gecko, so that gecko gets no more BT state changes during BT enabling/disabling. However in current BT enabling mechanism, gecko sends 2 adapteradded events to gaia and the later one is additional. Therefore if user toggles BT on/off rapidly, the later adapteradded event of previous 'BT on' action would re-enable UI switch disabled by subsequent 'BT off' action, and allow user to turn on BT again in gaia while gecko is still disabling BT. In such case gecko ignores this BT state=on change from gaia and gaia would wait infinitely due to this state inconsistency.
Attachment #8373928 - Flags: review?(echou)
Greg, can you apply the patch in comment 22 and see if it fixes the bug on your device?
Comment on attachment 8373928 [details] [diff] [review]
Patch 1 (v1): Do not send adapteradded event when getting propertychanged of name

Review of attachment 8373928 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Attachment #8373928 - Flags: review?(echou) → review+
Component: Gaia::Settings → Bluetooth
blocking-b2g: 1.3? → 1.3+
(In reply to Eric Chou [:ericchou] [:echou] from comment #15)
> Created attachment 8373313 [details] [diff] [review]
> debug-log.patch
> 
> Greg, please apply this patch to your codebase so that we can have more logs
> for debugging. Thank you.

I've been trying this myself without the patch and not exactly able to reproduce it.  I don't see the bluetooth setting getting stuck, but rather I see that the phone is showing that it is connected to headset when a connection does not exist.  Still, I will apply the patch and see if it behaves better.
Flags: needinfo?(ggrisco)
Backed out for Marionette-webapi perma-fail after test_dom_BluetoothManager_enabled.js runs.
https://hg.mozilla.org/integration/b2g-inbound/rev/af7f0f64d659

https://tbpl.mozilla.org/php/getParsedLog.php?id=34493627&full=1&branch=b2g-inbound#error0

08:20:59     INFO -  /builds/slave/test/build/tests/marionette/tests/dom/bluetooth/tests/marionette/test_dom_BluetoothManager_enabled.js, runTest (marionette_test.MarionetteJSTestCase) ... ok
08:21:00     INFO -  START LOG:
08:21:00     INFO -  INFO TEST-START: /builds/slave/test/build/tests/marionette/tests/dom/bluetooth/tests/marionette/test_dom_BluetoothManager_enabled.js Tue Feb 11 2014 11:20:54 GMT-0500 (EST)
08:21:00     INFO -  INFO navigator.mozBluetooth is available Tue Feb 11 2014 11:20:55 GMT-0500 (EST)
08:21:00     INFO -  INFO Original 'bluetooth.enabled' is false Tue Feb 11 2014 11:20:55 GMT-0500 (EST)
08:21:00     INFO -  INFO Testing 'bluetooth.enabled' => true Tue Feb 11 2014 11:20:55 GMT-0500 (EST)
08:21:00     INFO -  INFO   Examine results [null,true] Tue Feb 11 2014 11:20:56 GMT-0500 (EST)
08:21:00     INFO -  INFO Testing 'bluetooth.enabled' => false Tue Feb 11 2014 11:20:56 GMT-0500 (EST)
08:21:00     INFO -  INFO   Examine results [null,false] Tue Feb 11 2014 11:20:59 GMT-0500 (EST)
08:21:00     INFO -  INFO TEST-END: /builds/slave/test/build/tests/marionette/tests/dom/bluetooth/tests/marionette/test_dom_BluetoothManager_enabled.js Tue Feb 11 2014 11:20:59 GMT-0500 (EST)
08:21:00     INFO -  END LOG:
08:21:00     INFO -  ----------------------------------------------------------------------
08:21:00     INFO -  Ran 1 test in 6.054s
08:21:00     INFO -  OK
08:21:07     INFO -  /builds/slave/test/build/tests/marionette/tests/dom/bluetooth/tests/marionette/test_dom_BluetoothManager_adapteradded.js, runTest (marionette_test.MarionetteJSTestCase) ... ERROR
08:21:08     INFO -  Traceback (most recent call last):
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/runtests.py", line 33, in <module>
08:21:08     INFO -      cli()
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/runtests.py", line 28, in cli
08:21:08     INFO -      runner = startTestRunner(runner_class, options, tests)
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/runtests.py", line 19, in startTestRunner
08:21:08     INFO -      runner.run_tests(tests)
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/runner/base.py", line 761, in run_tests
08:21:08     INFO -      self.run_test(test)
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/runner/base.py", line 881, in run_test
08:21:08     INFO -      self.run_test(i["path"], i["expected"])
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/runner/base.py", line 905, in run_test
08:21:08     INFO -      results = runner.run(suite)
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/runner/base.py", line 345, in run
08:21:08     INFO -      return super(B2GMarionetteTextTestRunner, self).run(test)
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/runner/base.py", line 264, in run
08:21:08     INFO -      test(result)
08:21:08     INFO -    File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
08:21:08     INFO -      return self.run(*args, **kwds)
08:21:08     INFO -    File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
08:21:08     INFO -      test(result)
08:21:08     INFO -    File "/usr/lib/python2.7/unittest/case.py", line 391, in __call__
08:21:08     INFO -      return self.run(*args, **kwds)
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette_test.py", line 200, in run
08:21:08     INFO -      self.cleanTest()
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette_test.py", line 279, in cleanTest
08:21:08     INFO -      self._deleteSession()
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette_test.py", line 291, in _deleteSession
08:21:08     INFO -      self.marionette.delete_session()
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette.py", line 741, in delete_session
08:21:08     INFO -      response = self._send_message('deleteSession', 'ok')
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette.py", line 598, in _send_message
08:21:08     INFO -      response = self.client.send(message)
08:21:08     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/client.py", line 97, in send
08:21:08     INFO -      raise IOError("%s: Connection to Marionette server is lost. Check gecko.log (desktop firefox) or logcat (b2g) for errors." % str(e))
08:21:08     INFO -  IOError: [Errno 32] Broken pipe: Connection to Marionette server is lost. Check gecko.log (desktop firefox) or logcat (b2g) for errors.
08:21:08     INFO -  Exception TypeError: TypeError("'NoneType' object is not callable",) in <bound method MarionetteTestRunner.cleanup of <__main__.MarionetteTestRunner object at 0x13e5950>> ignored
08:21:08     INFO -  Exception TypeError: TypeError("'NoneType' object is not callable",) in <bound method MozHttpd.stop of <mozhttpd.mozhttpd.MozHttpd object at 0x1427350>> ignored
08:21:08    ERROR - Return code: 1
(In reply to Greg Grisco from comment #27)
> (In reply to Eric Chou [:ericchou] [:echou] from comment #15)
> > Created attachment 8373313 [details] [diff] [review]
> > debug-log.patch
> > 
> > Greg, please apply this patch to your codebase so that we can have more logs
> > for debugging. Thank you.
> 
> I've been trying this myself without the patch and not exactly able to
> reproduce it.  I don't see the bluetooth setting getting stuck, but rather I
> see that the phone is showing that it is connected to headset when a
> connection does not exist.  Still, I will apply the patch and see if it
> behaves better.

I applied the patch and it seems to be a better experience.  The headset connects and disconnects as expected and the UI seems to display the state accurately.
Target Milestone: --- → 1.4 S1 (14feb)
(In reply to Greg Grisco from comment #29)
> (In reply to Greg Grisco from comment #27)
> > (In reply to Eric Chou [:ericchou] [:echou] from comment #15)
> > > Created attachment 8373313 [details] [diff] [review]
> > > debug-log.patch
> > > 
> > > Greg, please apply this patch to your codebase so that we can have more logs
> > > for debugging. Thank you.
> > 
> > I've been trying this myself without the patch and not exactly able to
> > reproduce it.  I don't see the bluetooth setting getting stuck, but rather I
> > see that the phone is showing that it is connected to headset when a
> > connection does not exist.  Still, I will apply the patch and see if it
> > behaves better.
> 
> I applied the patch and it seems to be a better experience.  The headset
> connects and disconnects as expected and the UI seems to display the state
> accurately.

Great. We'll re-land the patch tomorrow. Thanks.
We could reproduce the test case failure which was mentioned in comment 28. In fact it's totally unexpected. Ben, Jamin and I have spent some time on finding the root cause today. Now we have a solution to avoid the test case failure but it's more like a workaround (it can still solve the original problem). Since this is an urgent issue, we'll spend another day on investigating but will definitely upload the patch anyway tomorrow even if we still can't figure it out.
The test failure results from static function re-entrancy that crashes gecko. Attached patch introduces an additional lock to protect the static function. We've verified it with marionette tests on try: https://tbpl.mozilla.org/?tree=Try&rev=bcf8952d4e4c. Note this patch avoids crash but the exact cause remains unclear. I'll file another bug to track and solve on central.
Attachment #8373928 - Attachment is obsolete: true
Attachment #8375382 - Flags: review?(echou)
Comment on attachment 8375382 [details] [diff] [review]
Patch 1 (v2): Do not send additional adapteradded event and add lock for GetProperty function

Review of attachment 8375382 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8375382 - Flags: review?(echou) → review+
https://hg.mozilla.org/mozilla-central/rev/2dfe13ba3ab8
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
1.3 blockers no longer have auto-approval to land. Please request b2g28 approval on the patch for uplift.
Comment on attachment 8375382 [details] [diff] [review]
Patch 1 (v2): Do not send additional adapteradded event and add lock for GetProperty function

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #):
  None
User impact if declined:
  BT on/off UI switch becomes inactive if user toggles BT on/off rapidly.
Testing completed:
  - Verified rapid BT on/off toggles on OpenC.
  - Passed marionette-webapi test and builds on all platform on try:
    https://tbpl.mozilla.org/?tree=Try&rev=bcf8952d4e4c
Risk to taking this patch (and alternatives if risky):
  After user toggles BT on, the UI switch take ~1 sec delay to return enabled for next toggle. The delay prevents immediate BT toggle-off while gecko is still enabling BT.
String or UUID changes made by this patch:
  None
Attachment #8375382 - Flags: approval-mozilla-b2g28?
Attachment #8375382 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
(In reply to Ben Tian [:btian] from comment #32)
> Note this patch avoids
> crash but the exact cause remains unclear. I'll file another bug to track
> and solve on central.
Opened bug 972249 to track.
Flags: in-moztrap?
Whiteboard: [CR 610008] → [caf priority: p2][CR 610008]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: