Closed Bug 1072802 Opened 10 years ago Closed 10 years ago

[Bluetooth][NFC][KK] OS crashed when try to pair NFC headset via NFC

Categories

(Firefox OS Graveyard :: Bluetooth, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:2.1+, firefox33 wontfix, firefox34 fixed, firefox35 fixed, b2g-v2.1 verified, b2g-v2.2 verified)

VERIFIED FIXED
2.1 S6 (10oct)
blocking-b2g 2.1+
Tracking Status
firefox33 --- wontfix
firefox34 --- fixed
firefox35 --- fixed
b2g-v2.1 --- verified
b2g-v2.2 --- verified

People

(Reporter: ashiue, Assigned: yrliou)

Details

(Keywords: crash)

Crash Data

Attachments

(3 files, 1 obsolete file)

Attached file crash.log
Gaia-Rev        a7e53d661a1329d6813bc6104555b57dbb5e4d85
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-aurora/rev/84d2b47c102d
Build-ID        20140924160202
Version         34.0a2

STR:
1. Enable NFC, disable BT
2. Tap NFC headset
3. Select turn on BT and connect to the headset

Expect result:
Device auto connect to the headset successfully

Actual result:
Step 3 could not connect the headset successfully, and repeat step 3 multiple times, OS crashed.
After device reboot from crash, the NFC headset connected but did not show on paired devices area in Settings->Bluetooth.
[Blocking Requested - why for this release]:
crashed issue
blocking-b2g: --- → 2.1?
QA Whiteboard: [COM=Bluetooth]
Crash Signature: 42ae5eb4-ab3d-4bcb-a86d-c1a652140925
Attached file crash_report.zip
Assignee: nobody → echou
Triage: blocking.
blocking-b2g: 2.1? → 2.1+
Adding to qawanted to see if we can reproduce this on the latest 2.1 build. If so lets get a branch check as well.
Keywords: qawanted
Target Milestone: --- → 2.1 S6 (10oct)
Delegate to Jocelyn. Thanks, Jocelyn!
Assignee: echou → joliu
 0 	libxul.so 	mozilla::dom::bluetooth::BluetoothReply::MaybeDestroy(mozilla::dom::bluetooth::BluetoothReply::Type) 	/builds/slave/b2g_m-aurora_flm-kk_eng_ntly-0/build/objdir-gecko/ipc/ipdl/BluetoothTypes.cpp:524
1 	libxul.so 	mozilla::dom::bluetooth::BluetoothReply::~BluetoothReply() 	/builds/slave/b2g_m-aurora_flm-kk_eng_ntly-0/build/objdir-gecko/ipc/ipdl/BluetoothTypes.cpp:591
2 	libxul.so 	mozilla::dom::bluetooth::BluetoothReplyRunnable::SetReply(mozilla::dom::bluetooth::BluetoothReply*) 	/builds/slave/b2g_m-aurora_flm-kk_eng_ntly-0/build/objdir-gecko/dist/include/nsAutoPtr.h:40
3 	libxul.so 	mozilla::dom::bluetooth::DispatchBluetoothReply(mozilla::dom::bluetooth::BluetoothReplyRunnable*, mozilla::dom::bluetooth::BluetoothValue const&, nsAString_internal const&) 	dom/bluetooth/bluedroid/BluetoothUtils.cpp
4 	libxul.so 	ReplyStatusError 	dom/bluetooth/bluedroid/BluetoothServiceBluedroid.cpp
5 	libxul.so 	CreateBondResultHandler::OnError(mozilla::dom::bluetooth::BluetoothStatus) 	dom/bluetooth/bluedroid/BluetoothServiceBluedroid.cpp
6 	libxul.so 	mozilla::dom::bluetooth::BluetoothInterfaceRunnable1<mozilla::dom::bluetooth::BluetoothA2dpResultHandler, void, mozilla::dom::bluetooth::BluetoothStatus, mozilla::dom::bluetooth::BluetoothStatus>::Run() 	dom/bluetooth/bluedroid/BluetoothInterface.cpp
7 	libxul.so 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
Maybe the current code has Thomas's patch, so we cannot reproduce this bug, and Alison tested the build without Thomas's patch, so she can hit the strange bug.
It looks strange to me that there's an A2dp result but we already failed to pair with the device.
We suspect that Bug 1065999 caused unexpected behavior and later leads to this crash.
I have applied the patch in Bug 1065999 and repeat step3 for 50+ time but couldn't reproduce the crash.
Alison, could you apply the patch or try after this patch landed to see if you can still reproduce this crash?

Thanks,
Jocelyn
Flags: needinfo?(ashiue)
(In reply to jocelyn [:jocelyn] from comment #10)
> We suspect that Bug 1065999 caused unexpected behavior and later leads to
> this crash.
> I have applied the patch in Bug 1065999 and repeat step3 for 50+ time but
> couldn't reproduce the crash.
> Alison, could you apply the patch or try after this patch landed to see if
> you can still reproduce this crash?
> 
> Thanks,
> Jocelyn

Thank you, Jocelyn.

Alison,

As Jocelyn mentioned in comment 10, bug 1065999 might be the root cause of this one. Since the 2.1 patch for bug 1065999 was just landed, please give it another try tomorrow. Thank you.
Per offline discussion with Alison, I can reproduce this bug with today's aurora pvt build.
Will continue to find out the root cause of this crash.
Thanks, Alison.
Flags: needinfo?(ashiue)
STR to reproduce this crash:
  (1) Tab NFC headset, click Yes to connect
  (2) Immediately tab NFC headset again, click Yes -> result in crash.
The current code was created from the original implementation; so I wonder if there are older bugs that describe similar problems.

I looked briefly at the methods and here are some ideas:

There should be a check in |OnError| for mRunnable to be present in the array. If not, it's an error.

In |BondStateChangeNotification|, the handling of runnables expects certain elements in |sBondingRunnableArray| to be present in the first slot. It looks like the code expects to process multiple devices separately from each other. If you pair with devices (1st) A and (2nd) B, but B repeats first; |BondStateChangeNotification| will send the reply for A.

I think a fix could be to replace the arrays |sBondingRunnableArray| and |sUnbondingRunnableArray| with hash tables. The key would be the remote device's address. That should solve the ordering problems.
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #14)
> The current code was created from the original implementation; so I wonder
> if there are older bugs that describe similar problems.
> 
> I looked briefly at the methods and here are some ideas:
> 
> There should be a check in |OnError| for mRunnable to be present in the
> array. If not, it's an error.
> 
> In |BondStateChangeNotification|, the handling of runnables expects certain
> elements in |sBondingRunnableArray| to be present in the first slot. It
> looks like the code expects to process multiple devices separately from each
> other. If you pair with devices (1st) A and (2nd) B, but B repeats first;
> |BondStateChangeNotification| will send the reply for A.
> 
> I think a fix could be to replace the arrays |sBondingRunnableArray| and
> |sUnbondingRunnableArray| with hash tables. The key would be the remote
> device's address. That should solve the ordering problems.

Thanks for your input, Thomas.
IMHO, it seems it's more like the runnable has been released during IPC process like Jamin mentioned in Bug 1050068.
The program sequence I observed is:
  1) CreateBondResultHandlerA is constructed
  2) CreateBondResultHandlerA::CreateBond (the success reply, which I overwrite one in BluetoothServiceBluedroid.cpp)
  3) CreateBondResultHandlerB is constructed
  4) CreateBondResultHandlerB::OnError ------> Crashed, runnable is existed in the array before we remove it.

If I use nsRefPtr<BluetoothReplyRunnable> instead of raw pointer in CreateBondResultHandler,
the crash cannot be reproduced.
The program sequence will be
  1) - 4) same as above, but no crash
  5) BondState changed to Bonded, then we reply with RunnableA, which is correct in this case.

I'm thinking of uploading a patch which use nsRefPtr<BluetoothReplyRunnable> in CreateBondResultHandler and RemoveBondResultHandler to fix the crash.
Does this make sense to you?
Flags: needinfo?(tzimmermann)
Use nsRefPtr to make sure the runnable won't be released before the reply runnable has been done on main thread.
Since you debugged the code, it probably makes sense. :) But I'm surprised to see this bug happening, as I thought bug 1050068 was strictly about bluetooth2.

I never liked the whole ref-counting and array approach. Maybe when bluetooth2 landed and things have stabilized, we can improve this code a bit.
Flags: needinfo?(tzimmermann)
Comment on attachment 8500413 [details] [diff] [review]
Bug 1072802: Use nsRefPtr to avoid ReplyRunnable being released during IPC process.

This patch is for m-c and v2.1.
Thomas, could you help review this patch?
We can later figure out how to refine current mechanism as you mentioned in Comment 17.
Attachment #8500413 - Flags: review?(tzimmermann)
Comment on attachment 8500413 [details] [diff] [review]
Bug 1072802: Use nsRefPtr to avoid ReplyRunnable being released during IPC process.

Review of attachment 8500413 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, a lot. As I mentioned before: in the long run I'd rather like to improve the overall design. But since this patch fixes a serious problem now, we should land it.
Attachment #8500413 - Flags: review?(tzimmermann) → review+
Approval Request Comment
[Feature/regressing bug #]: Bug 1072802
[User impact if declined]: May result in b2g crash when user try to connect NFC multiple times in a short period.
[Describe test coverage new/current, TBPL]: Manually tested.
[Risks and why]: Low risk since it only use nsRefPtrs to protect the runnable being released during IPC process.
[String/UUID change made/needed]: None
Attachment #8500413 - Attachment is obsolete: true
Attachment #8500961 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/0db99022584b
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Keywords: qawanted
Comment on attachment 8500961 [details] [diff] [review]
[v2.1 & master] Bug 1072802: Use nsRefPtr to avoid ReplyRunnable being released during IPC process. r=tzimmermann

Requesting help with qa verification to make sure we do not crash once this lands on 2.1
Attachment #8500961 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Hi Alison,

Could you verify is the crash still occurring on 2.1 with the patch applied?

Thanks,
Jocelyn
Flags: needinfo?(ashiue)
With the patch https://bugzilla.mozilla.org/attachment.cgi?id=8500961, the phone won't crash, but it keeps showing the connection request page when the headset approaches 
http://youtu.be/76BroGumzVI
Flags: needinfo?(ashiue)
Thanks for confirming that the phone won't crash with the patch applied.
The behavior in the video is the same as the original one. (without the patch)
I quickly take a look into this, it seems system app will show this dialog when the headset approaches without checking current status and not quite related to Bluetooth component.
We could open a new bug to track this if this is not the expected behavior.
Verified on
[2.1]
Gaia-Rev        7e2ef41d3ac98757acaf490b5413fb42061ad3e6
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-aurora/rev/75ebb70f8b38
Build-ID        20141009000203
Version         34.0a2

[2.2]
Gaia-Rev        2665e714beea5dc433862ca6bb8d2b47ffe2f2d1
Gecko-Rev       https://hg.mozilla.org/mozilla-central/rev/4bad24a306b2
Build-ID        20141008160205
Version         35.0a1
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.