Closed Bug 976883 Opened 6 years ago Closed 6 years ago

[Sora][BT] Firefox system crash after pair and unpair 15 times.

Categories

(Firefox OS Graveyard :: Bluetooth, defect, P1, critical)

defect

Tracking

(blocking-b2g:1.3+, firefox28 wontfix, firefox29 wontfix, firefox30 fixed, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

RESOLVED FIXED
1.4 S3 (14mar)
blocking-b2g 1.3+
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: sync-1, Assigned: echou)

References

()

Details

(Keywords: crash, Whiteboard: [b2g-crash][osrestartcrash])

Attachments

(9 files)

DEFECT DESCRIPTION:
  Firefox system crash after pair and unpair about 15 times.
 
  REPRODUCING PROCEDURES:
  1. Active BT;
  2. Pair and unpair with other device for about 15 tiems;
  3. Come across system crash and request sent report to mozilla.--KO
 
  EXPECTED BEHAVIOUR:
  Should come across system crash.
Attached file log1
Attached file Firefox OS crashed log
(In reply to sync-1 from comment #2)
> Created attachment 8381826 [details]
> Firefox OS crashed log

"service 'xxx' died" is the root reason for this issue.
It's more possible relative with platform, I think.
Jamin will take a first look.
Component: Gaia::Bluetooth File Transfer → Bluetooth
Assignee: nobody → jaliu
Quick update: Sora uses FxOS 1.3 / JB / BlueZ. Jamin will analyze the attached log and see if he can reproduce on Buri first (FxOS 1.3 / ICS / BlueZ).

Reporter, could you attach the hcidump log for further investigation?
blocking-b2g: --- → 1.3?
Keywords: crash
Whiteboard: [b2g-crash]
Severity: normal → critical
Whiteboard: [b2g-crash] → [b2g-crash][osrestartcrash]
Hi reporter,

Thank you for your report.
I can't reproduce this bug on my device Buri with v1.3_last.

I have few questions for reporter.
I've watched the attached videos and I noticed there is no device icon in BT setting page.

1. Could you please tell me what kind of BT device you try to pair to ? (headset?, phone?, notebook?)

   Would it be possible for you to provide the 'class' of the BT device ?
   P.S. You can find the "class of device" by checking the "classes file" in your FxOS phone.
        The path of classes list: /data/misc/bluetoothd/$YOUR_BT_ADDRESS/classes
        The path name list: /data/misc/bluetoothd/$YOUR_BT_ADDRESS/names
 
        For example:
        # adb shell cat /data/misc/bluetoothd/0C:BD:51:20:A2:0E/classes
          B4:98:42:7E:0A:58 0x58020c
        # adb shell cat /data/misc/bluetoothd/0C:BD:51:20:A2:0E/names
          B4:98:42:7E:0A:58 Sony_Headset_A

        ---> The class of 'Sony_Headset_A' is 0x58020c.

2. Could you please provide the info. of the build version you used?
   BuildID: YYYYMMDDHHMMSS (Settings App -> Device Information -> More Information -> Build Identifier)
   
   Gecko Version: [hg changeset hash] or [git commit hash].

   Thank you.
Flags: needinfo?(sync-1)
Is it related to bug 828860?
It really looks like connection made with LE devices.
Seeing this bug, I just suspect this bug is related to LE.
Which device that you tried to pair with?


< HCI Command: LE Create Connection (0x08|0x000d) plen 25
    bdaddr C2:40:4F:F5:FA:C7 type 1
> HCI Event: Command Status (0x0f) plen 4
    LE Create Connection (0x08|0x000d) status 0x00 ncmd 1
< HCI Command: LE Create Connection Cancel (0x08|0x000e) plen 0
> HCI Event: Command Complete (0x0e) plen 4
    LE Create Connection Cancel (0x08|0x000e) ncmd 1
> HCI Event: LE Meta Event (0x3e) plen 19
    LE Connection Complete
      status 0x02 handle 0, role master
< HCI Command: LE Create Connection (0x08|0x000d) plen 25
    bdaddr CC:60:DD:53:22:F5 type 1
> HCI Event: Command Status (0x0f) plen 4
    LE Create Connection (0x08|0x000d) status 0x00 ncmd 1
> HCI Event: LE Meta Event (0x3e) plen 19
    LE Connection Complete
      status 0x00 handle 2, role master
< ACL data: handle 2 flags 0x00 dlen 11
    L2CAP(d): cid 0x0006 len 7 [psm 0]
> HCI Event: Disconn Complete (0x05) plen 4
    status 0x00 handle 2 reason 0x3e
bug reporter:
We need to minidump to check this bug. But I seriously suspect this bug is related to bug 828860 based on hcidump. Can you attached minidump?

Michael,
See Comment 39 that you posted before, https://bugzilla.mozilla.org/show_bug.cgi?id=828860#c39, can you share with partner workaround see if the problem still happen? Although we have not gotta minidump it's the same bug.
Flags: needinfo?(mvines)
Hi reporter,

If you try to make a minidump for the crash, could you save hcidump for the entire procedure as well ? (i.e. from BT Enable to BT Pairing)
That would be very useful.

Best regards
(In reply to Shawn Huang [:shuang] [:shawnjohnjr] from comment #10)
> Michael,
> See Comment 39 that you posted before,
> https://bugzilla.mozilla.org/show_bug.cgi?id=828860#c39, can you share with
> partner workaround see if the problem still happen? Although we have not
> gotta minidump it's the same bug.

https://www.codeaurora.org/cgit/quic/la/platform/vendor/qcom/msm8610/commit/?h=b2g_jb_3.2&id=241bd7f147edf4fba447f007ff36055d3fe342b7
Flags: needinfo?(mvines)
Had a go at this one on my latest sora build, with RIL from 
ro.build.date=Mon Feb 17 10:59:04 CST 2014

I'm curious as to how fast you are doing the pair/unpair cycles.  If I'm allowing up to a minute in between each, then I'm not seeing the issue.
Information of our code is "FFOS1.3 Mozilla build ID: 20140208004002".
The device I try to pairing name is "Boom Band" and class is 0x000000. "Boom Band" is a customized bluetooth device for TCL-Android phone by Baidu.

Is the minidump log is in /data/b2g/mozilla/$BYPHONE.default/minidumps? If yes, i will provide latter.
(In reply to Kunny Liu from comment #14)
> Information of our code is "FFOS1.3 Mozilla build ID: 20140208004002".
> The device I try to pairing name is "Boom Band" and class is 0x000000. "Boom
> Band" is a customized bluetooth device for TCL-Android phone by Baidu.
Yes. It looks like Boom Band is 'LE' device, which at least match on my Comment 10.
Based on Comment 12, I believe bluez LE support has been disabled already.
So it might be something else cause crash.

> 
> Is the minidump log is in /data/b2g/mozilla/$BYPHONE.default/minidumps? If
> yes, i will provide latter.
Ya. But please use minidump_stackwalk to resolve all the symbols, since we don't have your symbol files.
Or you can directly run gdb and attach b2g process.
(In reply to Shawn Huang [:shuang] [:shawnjohnjr] from comment #15)
> Based on Comment 12, I believe bluez LE support has been disabled already.
> So it might be something else cause crash.

It would be good to confirm that the patch in comment 12 has actually been applied though as Sora probably has a different device/... board .mk files.
Flags: needinfo?(sync-1)
(In reply to Michael Vines [:m1] [:evilmachines] from comment #16)
> (In reply to Shawn Huang [:shuang] [:shawnjohnjr] from comment #15)
> > Based on Comment 12, I believe bluez LE support has been disabled already.
> > So it might be something else cause crash.
> 
> It would be good to confirm that the patch in comment 12 has actually been
> applied though as Sora probably has a different device/... board .mk files.

I have check our code in ./device/qcom/msm8610/msm8610.mk :

# Bluetooth configuration files
PRODUCT_COPY_FILES += \
    system/bluetooth/data/main.conf:system/etc/bluetooth/main.conf \
    system/bluetooth/data/audio.conf:system/etc/bluetooth/audio.conf

I think this patch has been included.
Shawn

Is this related to: https://bugzilla.mozilla.org/show_bug.cgi?id=976458?
Flags: needinfo?(shuang)
I can't get the minidump file because the directory("/data/b2g/mozilla/<path_to_profile_dir>/minidumps") without any file after crash.
(In reply to Preeti Raghunath(:Preeti) from comment #18)
> Shawn
> 
> Is this related to: https://bugzilla.mozilla.org/show_bug.cgi?id=976458?
No. This bug is not related to bug 976458.
Flags: needinfo?(shuang)
(In reply to Kunny Liu from comment #19)
> Created attachment 8382795 [details]
> hcidump&adb log  20140227
> 
> I can't get the minidump file because the
> directory("/data/b2g/mozilla/<path_to_profile_dir>/minidumps") without any
> file after crash.
Two actions we can do here:
1. Get a 'Boom Band' BLE device and we can test ourselves. But I'm not sure it's available for us to buy.
2. Does anyone team member from the reporter's team know how to attach gdb to get backtrace or know how to get minidump?
You can check this article: http://tech.mozilla.com.tw/posts/2470/%E9%82%84%E5%8E%9F-b2g-crash-%E6%A1%88%E7%99%BC%E7%8F%BE%E5%A0%B4
Flags: needinfo?(sync-1)
Hi Kunny -

For attaching gdb, please refer to the following steps:

 $ cd your-source-code-folder
 $ adb shell b2g-ps(this might change since I am not sure if QCT has wrapped this script to something else)
 # Find the pid of the app that is going to crash in the reproduce steps
 $ ./run-gdb.sh attach <pid>
 (gdb) continue
 # reproduce the issue to crash the handset.
 (gdb) bt


Dear Shawn, please kindly correct me if the above information is not right.
Flags: needinfo?(shuang)
Flags: needinfo?(liukun)
I guess QC build did not include run-gdb.sh and they have different build environment. Overall, the steps are correct.
Flags: needinfo?(shuang)
We need more information to know if this happens with other bluetooth headsets.
Does this happen with any other BT devices? Need more info to move ahead
(In reply to Shawn Huang [:shuang] [:shawnjohnjr] from comment #23)
> I guess QC build did not include run-gdb.sh and they have different build environment. 

Please use |rungdb| instead.  The rest looks the same.
(In reply to Michael Vines [:m1] [:evilmachines] from comment #26)
> (In reply to Shawn Huang [:shuang] [:shawnjohnjr] from comment #23)
> > I guess QC build did not include run-gdb.sh and they have different build environment. 
> 
> Please use |rungdb| instead.  The rest looks the same.

I can't find “rungdb” program in my project or b2g phone.
I try to get the minidump file and will provide it latter.
Flags: needinfo?(liukun)
(In reply to Kunny Liu from comment #27)
> I can't find “rungdb” program in my project or b2g phone.

$ . build/envsetup.sh
$ lunch ...
$ rungdb attach <pid>
Flags: needinfo?(sync-1)
Attached file minidump log
Attached file another minidump log
Can you parse minidump with symbols? Since we don't have your symbol files. I cannot see the backtrace.
Flags: needinfo?(liukun)
(In reply to Shawn Huang [:shuang] [:shawnjohnjr] from comment #31)
> Can you parse minidump with symbols? Since we don't have your symbol files.
> I cannot see the backtrace.

Yes, I have run this command:
"./minidump_stackwalk /local/source/ffos/minidump/7029e7f1-08fc-7745-32e4a005-307f1da3.dmp /local/source/ffos/soul3.5/out/target/product/msm8610/obj/objdir-gecko/dist/crashreporter-symbols/ > dmp_result2.txt"
Flags: needinfo?(liukun)
Attached file dmp_result4.txt
into the b2g source code directory:
1. ". ./build/envsetup.sh"
2. "lunch msm8610-eng"
3. "export B2G_NOOPT=1"
4. "make -j8"
5. "make buildsymbols"

download images to phone.
run cmd: "MOZ_CRASHREPORTER="1" /system/bin/b2g.sh" in adb shell.

catch minidump log and parse it by: "./minidump_stackwalk /local/source/ffos/minidump/1b090a4a-e7cd-8923-13a409b1-7b4ba113.dmp /local/source/ffos/soul3.5/out/target/product/msm8610/obj/objdir-gecko/dist/crashreporter-symbols/ > dmp_result4.txt"

i think above steps is right.
Kunny,
Thanks a lot. This log is very helpful. I will update as soon as possible.
Michael,
It looks like patch in https://www.codeaurora.org/cgit/quic/la/platform/vendor/qcom/msm8610/commit/?h=b2g_jb_3.2&id=241bd7f147edf4fba447f007ff36055d3fe342b7, it won't stop LE operation from stack level.

What we saw LeConnParams property changed sent back from bt stack.

See:
https://www.codeaurora.org/cgit/quic/la/platform/external/bluetooth/bluez/tree/src/device.c?h=b2g/jb_mr1_rb2.17#n1320
device_update_le_conn_params(device, interval, latency, timeout).


It was reported from kernel hci_event.c function hci_le_conn_complete_evt or hci_le_conn_update_complete_evt.

There are two interfaces for bluez to control hci interface.
Since the new bluez goes through mgmtops.c instead of hciopts, in plugins/mgmtopts.c the function mgmt_enable_le does not disable le actually. The old way (hciopts) actually implemented enable/disable LE function.

See:
https://www.codeaurora.org/cgit/quic/la/platform/external/bluetooth/bluez/tree/plugins/mgmtops.c?h=b2g/jb_mr1_rb2.17#n2192
https://www.codeaurora.org/cgit/quic/la/platform/external/bluetooth/bluez/tree/plugins/hciops.c?h=b2g/jb_mr1_rb2.17#n3506

Since we finally know the root cause, we can fix from gecko side. However, this is a just kindly reminder that mgmt interface cannot disable LE.
Flags: needinfo?(mvines)
Ah, thanks for the info.  We can easily change hciops_enable_le() to return -ENOSYS as well, just like mgmt_enable_le(), as well.   But regardless seems like Gecko shouldn't crash due as well.  LMK
Flags: needinfo?(mvines)
(In reply to Michael Vines [:m1] [:evilmachines] from comment #36)
> Ah, thanks for the info.  We can easily change hciops_enable_le() to return
> -ENOSYS as well, just like mgmt_enable_le(), as well.   But regardless seems
> like Gecko shouldn't crash due as well.  LMK

https://www.codeaurora.org/cgit/quic/la/platform/external/bluetooth/bluez/tree/plugins/hciops.c?h=b2g/jb_mr1_rb2.17#n3506

But this file has not been compiled into the system.img. I think change hciops_enable_le() to return -ENOSYS is useless for our project.
(In reply to Michael Vines [:m1] [:evilmachines] from comment #36)
> Ah, thanks for the info.  We can easily change hciops_enable_le() to return
> -ENOSYS as well, just like mgmt_enable_le(), as well.   But regardless seems
> like Gecko shouldn't crash due as well.  LMK
Yes. This is a gecko bug for sure. I'm not saying we won't fix this, I just tried to explain the problem. Indeed, we made a mistake.
(In reply to Kunny Liu from comment #37)
> https://www.codeaurora.org/cgit/quic/la/platform/external/bluetooth/bluez/
> tree/plugins/hciops.c?h=b2g/jb_mr1_rb2.17#n3506
> 
> But this file has not been compiled into the system.img. I think change
> hciops_enable_le() to return -ENOSYS is useless for our project.
Current bluez uses mgmtops interface, not hciops. mgmt shall send hci command to enable/disable LE, instead of return ENOSYS.
	cp.simul = (dev->features[6] & LMP_LE_BREDR) ? 0x01 : 0x00;

	if (hci_send_cmd(dev->sk, OGF_HOST_CTL,
				OCF_WRITE_LE_HOST_SUPPORTED,
				WRITE_LE_HOST_SUPPORTED_CP_SIZE, &cp) < 0)
		return -errno;

Anyway, I don't want to confuse people here, this bug shall be fixed in gecko to handle unknown PropertyChanged event from bluez. Thanks.
* Handle unknown property sent from bt stack properly.

Since Gina won't be able to review for several days, ask for Shawn's help to review.
Assignee: jaliu → echou
Attachment #8385898 - Flags: review?(shuang)
Attachment #8385898 - Flags: feedback?(btian)
Hi Kunny,

Please try the patch I just attached in comment 40 and see if it can solve the issue. Thanks.
Flags: needinfo?(liukun)
Comment on attachment 8385898 [details] [diff] [review]
patch 1: v1: add error handing for unknown property

Review of attachment 8385898 [details] [diff] [review]:
-----------------------------------------------------------------

f=me with comment addressed.

::: dom/bluetooth/bluez/linux/BluetoothDBusService.cpp
@@ +675,4 @@
>    }
>  
>    if (i == aPropertyTypeLen) {
> +    BT_LOGR("unknown property: %s", property);

We should ensure 'property' is non-nullptr here or before.
Attachment #8385898 - Flags: feedback?(btian) → feedback+
(In reply to Ben Tian [:btian] from comment #42)
> Comment on attachment 8385898 [details] [diff] [review]
> patch 1: v1: add error handing for unknown property
> 
> Review of attachment 8385898 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> f=me with comment addressed.
> 
> ::: dom/bluetooth/bluez/linux/BluetoothDBusService.cpp
> @@ +675,4 @@
> >    }
> >  
> >    if (i == aPropertyTypeLen) {
> > +    BT_LOGR("unknown property: %s", property);
> 
> We should ensure 'property' is non-nullptr here or before.

property must be non-null since it passes the if-statement:

  if (dbus_message_iter_get_arg_type(&aIter) != DBUS_TYPE_STRING) {
    return false;
  }
ok, I will merge this patch to our project and test.
Flags: needinfo?(liukun)
Comment on attachment 8385898 [details] [diff] [review]
patch 1: v1: add error handing for unknown property

Review of attachment 8385898 [details] [diff] [review]:
-----------------------------------------------------------------

It looks good to me, but need to wait for confirmation from Kunny.
Attachment #8385898 - Flags: review?(shuang) → review+
I have tried about 50 times to pair with "Boom Band" and can't reproduce this issue.
Thanks for your work!
(In reply to Kunny Liu from comment #46)
> I have tried about 50 times to pair with "Boom Band" and can't reproduce
> this issue.

Great. Thanks. :)
blocking-b2g: 1.3? → 1.3+
https://hg.mozilla.org/mozilla-central/rev/ab45561e141f
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.4 S3 (14mar)
Attached patch patch1: for v1.3Splinter Review
* 1.3-specific patch
Please request approval-mozilla-b2g28 on this patch when this is ready for uplift.
Comment on attachment 8386508 [details] [diff] [review]
patch1: for v1.3

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): not a regression
User impact if declined: random crash during pair/unpair (low occurrence rate)
Testing completed: m-c and manual test (by me and Kunny, see comment 46)
Risk to taking this patch (and alternatives if risky): very low since the patch handles an unexpected result.
String or UUID changes made by this patch: none
Attachment #8386508 - Flags: approval-mozilla-b2g28?
Comment on attachment 8386508 [details] [diff] [review]
patch1: for v1.3

Approving the low risk patch and requesting verification once this lands on 1.3
Attachment #8386508 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
Flags: in-moztrap?
Flags: in-moztrap? → in-moztrap+
You need to log in before you can comment on or make changes to this bug.