Closed Bug 989976 Opened 10 years ago Closed 10 years ago

[bluedroid]Hit SIGBUS on unaligned access while doing AdapterPropertiesChangeCallback

Categories

(Firefox OS Graveyard :: Bluetooth, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
2.0 S4 (20june)

People

(Reporter: shawnjohnjr, Assigned: shawnjohnjr)

References

Details

Attachments

(2 files, 4 obsolete files)

#0  0x41dbb680 in AdapterPropertiesCallback (aStatus=BT_STATUS_SUCCESS, aNumProperties=1, aProperties=0x4568f9ea)
    at ../../../gecko/dom/bluetooth/bluedroid/BluetoothServiceBluedroid.cpp:278
#1  0x4604462c in execute_storage_request (event=<optimized out>, p_param=<optimized out>)
    at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:981
#2  0x4604492c in btif_context_switched (p_msg=0x4568f9d4) at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:177
#3  btif_task (params=<optimized out>) at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:308
#4  0x460625d8 in gki_task_entry (params=<optimized out>) at external/bluetooth/bluedroid/gki/./ulinux/gki_ulinux.c:154
#5  0x4018f3dc in __thread_entry (func=0x46062579 <gki_task_entry>, arg=0x46115d60, tls=0x46315f00)
    at bionic/libc/bionic/pthread.c:204
#6  0x4018eac8 in pthread_create (thread_out=0x46315f00, attr=0xbed5e458, start_routine=0x46062579 <gki_task_entry>, 
    arg=<optimized out>) at bionic/libc/bionic/pthread.c:348
Oh, it's similar with Bug 934922, but now for Bug 986314 - [Flatfish] Support Bluedroid on flatfish device. It's easier to hit this bug than on mako/hammerhead.
I can turn on Bluetooth now, after renaming firmware patch file from /system/vendor/modules/bcm40183b2.hcd to /system/vendor/modules/BCM4330B1.hcd.

Also, for gecko, gecko/dom/bluetooth/bluedroid/BluetoothServiceBluedroid.cpp:277
I changed the statement, which seems to work. I still don't understand why this can make differences. But it works. I still need to figure out the root cause bringing SIGBUS error.

   for (int i = 0; i < aNumProperties; i++) {
-    bt_property_t p = aProperties[i];
+    bt_property_t p;
+    p.type = aProperties[i].type;
+    p.len = aProperties[i].len;
+    p.val = aProperties[i].val;
Getting index 1 is wrong in gdb, since aNumProperties==1. Anyway, query index 0, everything looks like valid.

(gdb) p aProperties[0]
$4 = {type = BT_PROPERTY_ADAPTER_SCAN_MODE, len = 4, val = 0x45679666}
(gdb) p *(bt_scan_mode_t*)(aProperties[0].val)
$8 = BT_SCAN_MODE_CONNECTABLE

Thus, I still don't understand why we got SIGBUS error here.
Assignee: nobody → shuang
Hi,

Just guessing, but maybe |p| and |aProperties[i]| have different layouts in memory? If |p| has trailing bytes that are not in |aProperties[i]|, the original code would probably have tried to copy them and triggered an error.

You could build both versions of the code and check the difference. The original code might have used memcpy, while the fixed version copies individual fields.
(In reply to Shawn Huang [:shuang] [:shawnjohnjr] from comment #3)
> Getting index 1 is wrong in gdb, since aNumProperties==1. Anyway, query
> index 0, everything looks like valid.
> 
> (gdb) p aProperties[0]
> $4 = {type = BT_PROPERTY_ADAPTER_SCAN_MODE, len = 4, val = 0x45679666}
> (gdb) p *(bt_scan_mode_t*)(aProperties[0].val)
> $8 = BT_SCAN_MODE_CONNECTABLE
> 
> Thus, I still don't understand why we got SIGBUS error here.

Just a stab in the dark here.

What's the value of &aProperties[0] ?

I noticed that val isn't a multiple of 4 (pointers tend to be word aligned, but not always). I would expect that &aProperties[0] should be a multiple of 4.

Normally when you do a structure copy, the compiler will optimize it someway. When you access the fields individually, that optimization might not be occuring.

There could be something happening at the HW level. You'd probably need to look at the differences in the assembly code between the 2 versions and see if that gives us any clues.
OS: Linux → Gonk (Firefox OS)
Hardware: x86_64 → ARM
(In reply to Dave Hylands [:dhylands] from comment #5) 
> Just a stab in the dark here.
> 
> What's the value of &aProperties[0] ?
> 
> I noticed that val isn't a multiple of 4 (pointers tend to be word aligned,
> but not always). I would expect that &aProperties[0] should be a multiple of
> 4.
Program received signal SIGBUS, Bus error.
[Switching to Thread 1987.2007]
0x41dbb680 in AdapterPropertiesCallback (aStatus=BT_STATUS_SUCCESS, aNumProperties=1, aProperties=0x45a4565a)
    at ../../../gecko/dom/bluetooth/bluedroid/BluetoothServiceBluedroid.cpp:278
278	    bt_property_t p = aProperties[i];
(gdb) p &aProperties[0]
$1 = (bt_property_t *) 0x45a4565a
(gdb) p aProperties[0]
$2 = {type = BT_PROPERTY_ADAPTER_SCAN_MODE, len = 4, val = 0x45a45666}
So that is an unaligned address.

Perhaps this may shed some light?
http://jsolano.net/2012/09/06/arm-unaligned-data-access-and-floating-point-in-linux/

What type of processor do you have?
i.e. what does "adb shell cat /proc/cpuinfo" report?

What does "adb shell cat /proc/cpu/alignment" report?
One thing I notice strange on flatfish, in frame 1, |&prop| shall be the same address as aProperties[0].
But aProperties=0x45a459ea, &prop = 0x46322a1c.

Checking macro |HAL_CBACK| inside bluedroid, it just pass the callback with args.

0x41dbb680 in AdapterPropertiesCallback (aStatus=BT_STATUS_SUCCESS, aNumProperties=1, aProperties=0x45a459ea)
    at ../../../gecko/dom/bluetooth/bluedroid/BluetoothServiceBluedroid.cpp:278
278	    bt_property_t p = aProperties[i];
(gdb) f 1
#1  0x4604562c in execute_storage_request (event=<optimized out>, p_param=<optimized out>)
    at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:981
981	            HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, &prop);
(gdb) p &prop
$1 = (bt_property_t *) 0x46322a1c
> i.e. what does "adb shell cat /proc/cpuinfo" report?
> 
> What does "adb shell cat /proc/cpu/alignment" report?
adb shell cat /proc/cpuinfo
Processor	: ARMv7 Processor rev 3 (v7l)
processor	: 0
BogoMIPS	: 239.71

processor	: 1
BogoMIPS	: 239.71

Features	: swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt 
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xc07
CPU revision	: 3

Hardware	: sun6i
Revision	: 0000
Revision	: 0001
Serial		: 51425216343941439400003523850b06

adb shell cat /proc/cpu/alignment
User:		5
System:		1
Skipped:	0
Half:		1
Word:		0
DWord:		0
Multi:		0
User faults:	4 (signal)
Thanks for Cervantes's help for debugging this part.

It looks like something wrong with code generated from compiler after checking disassemble from gdb.
code flow is:
gecko/dom/bluetooth/bluedroid/BluetoothServiceBluedroid.cpp
 ^
 |
external/bluetooth/bluedroid/main/../btif/src/btif_core.c

He found that when executing |execute_storage_request|,
execute_storage_request (event=<optimized out>, p_param=<optimized out>) at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:984
case BTIF_CORE_STORAGE_ADAPTER_READ:
HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, &prop);

We shall expect |&prop| as the same as aProperties[0].


(gdb) stepi
0x45f615bc	969	            HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, p_prop);
(gdb) 
0x45f615be	969	            HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, p_prop);
(gdb) 
0x45f615c0	969	            HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, p_prop);
(gdb) 
0x45f615c2	969	            HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, p_prop);
(gdb) 
0x45f6172a in btif_in_get_adapter_properties () at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:888
888	    HAL_CBACK(bt_hal_cbacks, adapter_properties_cb,
(gdb) 
0x45f6172c	888	    HAL_CBACK(bt_hal_cbacks, adapter_properties_cb,
(gdb) stepi
0x45f61664 in execute_storage_request (event=<optimized out>, p_param=<optimized out>) at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:984
984	            HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, &prop);
(gdb) info reg
r0             0x0	0
r1             0x1	1
r2             0x459459ea	1167350250
r3             0x41cbb609	1103869449

> r2 is 0x459459ea, 


(gdb) stepi
0x45f615bc	969	            HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, p_prop);
(gdb) 
0x45f615be	969	            HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, p_prop);
(gdb) 
0x45f615c0	969	            HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, p_prop);
(gdb) 
0x45f615c2	969	            HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, p_prop);
(gdb) 
0x45f6172a in btif_in_get_adapter_properties () at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:888
888	    HAL_CBACK(bt_hal_cbacks, adapter_properties_cb,
(gdb) 
0x45f6172c	888	    HAL_CBACK(bt_hal_cbacks, adapter_properties_cb,
(gdb) stepi
0x45f61664 in execute_storage_request (event=<optimized out>, p_param=<optimized out>) at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:984
984	            HAL_CBACK(bt_hal_cbacks, adapter_properties_cb, status, 1, &prop);

> As you see 0x45f61664, execute blx r3. However, r2 did not get handle.
> (add r2, sp #620) doesn't execute.

   0x45f6165c <+320>:	movs	r1, #1
   0x45f6165e <+322>:	mov	r0, r6
   0x45f61660 <+324>:	ldr	r3, [r2, #8]
   0x45f61662 <+326>:	add	r2, sp, #620	; 0x26c
   0x45f61664 <+328>:	blx	r3

(gdb) info reg
r0             0x0	0
r1             0x1	1
r2             0x459459ea	1167350250

(gdb) p &prop
$4 = (bt_property_t *) 0x4623ea1c


>Verification: Let's make register r2 correct.

(gdb) set $r2 = $sp + 620
(gdb) info reg
r0             0x0	0
r1             0x1	1
r2             0x4623ea1c	1176758812
r3             0x41cbb609	1103869449
Here r2 shall be 0x4623ea1c, but r2 seems not correct.

(gdb) info reg
r0             0x0	0
r1             0x1	1
r2             0x4623ea1c	1176758812
r3             0x41cbb609	1103869449
r4             0x45ff5950	1174362448
r5             0x45ff59d0	1174362576
r6             0x459459ea	1167350250
r7             0x0	0
r8             0x40101228	1074795048
r9             0xa001	40961
r10            0x6f6cd	456397
r11            0x2	2
r12            0x400b4f80	1074483072
sp             0x4623e7a4	0x4623e7a4
lr             0x45f61667	1173755495
pc             0x41cbb60a	0x41cbb60a <AdapterPropertiesCallback(bt_status_t, int, bt_property_t*)+2>
cpsr           0x20000030	536870960
(gdb) n

Breakpoint 3, AdapterPropertiesCallback (aStatus=BT_STATUS_SUCCESS, aNumProperties=1, aProperties=0x4623ea1c) at ../../../gecko/dom/bluetooth/bluedroid/BluetoothServiceBluedroid.cpp:270
270	{
(gdb) 
273	  BluetoothValue propertyValue;
(gdb) 
274	  InfallibleTArray<BluetoothNamedValue> props;
(gdb) 
276	  for (int i = 0; i < aNumProperties; i++) {
(gdb) 
277	    BT_LOGR("!!!before crash !!!!!!!!!!!!!!!!!!!!!!!!!!");
(gdb) 
278	    bt_property_t p = aProperties[i];
(gdb) 
285	    BT_LOGR("!!!!after crash !!!!!!!!!!!!!!!!!!!!!!!!!!");
History of debugging for SIGBUS.
It's so strange that we can hit this bug on flatfish easily. I will try to disable btif folder optimization see if there is anything we can do about it.
We are misled by the crash stack. From the stack we thought the commands is BTIF_CORE_STORAGE_ADAPTER_READ and we thought the address of a stack-allocated struct should always be aligned, but actually it is BTIF_CORE_STORAGE_ADAPTER_WRITE, which passes r2 with p_prop:

bt_property_t *p_prop = &(p_req->write_req.prop);

and the result of &(p_req->write_req.prop) is not word-aligned. Shawn, please check where
p_param comes from. Is it something we pass to bluedroid? If so then we need to make it aligned. If it's completely from bluedroid, then I guess we need to work around it.
(In reply to Cervantes Yu from comment #13)
> p_param comes from. Is it something we pass to bluedroid? If so then we need
> to make it aligned. If it's completely from bluedroid, then I guess we need
> to work around it.
Thanks for a lot help, Cervantes. Yes, you're right. backtrace is a big lie.

Program received signal SIGBUS, Bus error.
0x41cbb680 in AdapterPropertiesCallback (aStatus=BT_STATUS_SUCCESS, aNumProperties=1, aProperties=0x459459ea)

aProperties is 0x459459ea.

execute_storage_request (event=<optimized out>, p_param=0x459459e2 "")
    at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:966
966	            BTIF_TRACE_EVENT3("type: %d, len %d, 0x%x", p_prop->type,
(gdb) p &(p_req->write_req.prop)
$1 = (bt_property_t *) 0x459459ea

(gdb) bt
#0  execute_storage_request (event=<optimized out>, p_param=0x459459e2 "")
    at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:966
#1  0x45f619a4 in btif_context_switched (p_msg=0x459459d4) at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:177
#2  btif_task (params=<optimized out>) at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:308
#3  0x45f7f700 in gki_task_entry (params=<optimized out>) at external/bluetooth/bluedroid/gki/./ulinux/gki_ulinux.c:154
#4  0x400c43dc in __thread_entry (func=0x45f7f6a1 <gki_task_entry>, arg=0x46032d60, tls=0x4623ef00)
    at bionic/libc/bionic/pthread.c:204
#5  0x400c3ac8 in pthread_create (thread_out=0x4623ef00, attr=0xbe8a1458, start_routine=0x45f7f6a1 <gki_task_entry>, 
    arg=<optimized out>) at bionic/libc/bionic/pthread.c:348
#6  0x00000000 in ?? ()

(gdb) f 1
#1  0x45f619a4 in btif_context_switched (p_msg=0x459459d4) at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:177
177	        p->p_cb(p->event, p->p_param);

(gdb) p p->p_param
$4 = 0x459459e2 ""
(gdb) p *(p->p_param)
$5 = 0 '\000'
(gdb) p (btif_storage_req_t*)p->p_param
$9 = (btif_storage_req_t *) 0x459459e2
(gdb) p ((btif_storage_req_t*)p->p_param)->write_req.prop
$10 = {type = BT_PROPERTY_ADAPTER_SCAN_MODE, len = 4, val = 0x459459f6}
(gdb) p &(((btif_storage_req_t*)p->p_param)->write_req.prop)
$11 = (bt_property_t *) 0x459459ea

It comes from btif_context_switched, and it was used for managing different callbacks.
The unaligned address is from btif_task, which is BTIF task handler managing all messages being passed Bluetooth HAL and BTA. Its the internal part of bluedroid.
btif_set_adapter_property->btif_transfer_context()->btif_sendmsg(), after msg send, btif_task will receive the msg and process the request, as Comment 14 described.
So it's a problem not specific to flatfish, but a bug in bluedroid? Then it's better to fix it upstream then work it around here. It's strange that the callback from bluedroid gives you a pointer to a struct and you crash when dereferencing it. memcpy() is actually a workaround. I don't think it a good fix that we replace the access to some_struct[i] with memcpy and calculate the address manually just because it's not aligned. If it's not necessarily aligned, it should return something like void* or uint8_t*.
workaround:
    -bt_property_t p = aProperties[i];
    +bt_property_t p;
    +memcpy(&p, ((char*)aProperties) + i * sizeof(bt_property_t), sizeof(bt_property_t));
The fact that memcpy fixes the problem tells me that its definitely an alignment problem.

You could have coded the memcpy as:

memcpy(&p, &aProperties[i], sizeof(*p));

memcpy doesn't need char * casts.

Basically, you need to work backwords and find out where the misaligned pointer is originating from.

Unless you're using packed structures someplace, its highly suggestive of a bug elsewhere.

new and malloc always return aligned pointers. And all global variables are automatically aligned.

So somebody somewhere is pulling this out of a byte stream or something like that and then passing on the unaligned pointer (or there is a bug someplace that's corrupting the pointer).
From its behavior bluetooth works correctly so I guess it's not corrupted pointer, just unaligned data.
I recompliled bluedroid stack with LOCAL_CFLAGS += -O0 (http://androidxref.com/4.3_r2.1/xref/external/bluetooth/bluedroid/main/Android.mk#116). backtrace is what we expect.
Summary: [bluedroid][flatfish] Hit SIGBUS error while doing AdapterPropertiesChangeCallback → [bluedroid]Hit SIGBUS on unaligned access while doing AdapterPropertiesChangeCallback
In btif_in_storage_request_copy_cb, the original write_req.prop seems to be aligned address, 0xbe9104c4, but new one is not 0x45a459ea


(gdb) p &(old_req->write_req.prop)
$9 = (bt_property_t *) 0xbe9104c4
(gdb) s
1081	             new_req->write_req.prop.len = old_req->write_req.prop.len;
(gdb) p &(new_req->write_req.prop)
$10 = (bt_property_t *) 0x45a459ea

#0  btif_in_storage_request_copy_cb (event=2, p_new_buf=0x45a459e2 'Z' <repeats 50 times>"\272, ", <incomplete sequence \335>, p_old_buf=0xbe9104bc "") at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:1068
#1  0x46025f56 in btif_transfer_context (p_cback=0x460277ed <execute_storage_request>, event=2, p_params=0xbe9104bc "", param_len=24, p_copy_cback=0x46027f11 <btif_in_storage_request_copy_cb>)
    at external/bluetooth/bluedroid/main/../btif/src/btif_core.c:214
0xbe9104c4 is on the stack so it's expected that it's aligned. 0x45a459ea is on the heap. We should get aligned address if it's malloc'd, but if it's middle of a malloc'd buffer it can be unaligned.
In http://androidxref.com/4.3_r2.1/xref/external/bluetooth/bluedroid/btif/src/btif_core.c#204
(gdb) p &(new_req->write_req)
$20 = (btif_storage_write_t *) 0x45a459e2
It's not aligned, so I guess write.prop is also not aligned.
I think use memcpy to solve this problem is correct, since bug 715074 uses the same solution.
Comment on attachment 8401238 [details] [diff] [review]
Bug989976.patch

Review of attachment 8401238 [details] [diff] [review]:
-----------------------------------------------------------------

Drive-by comment

::: dom/bluetooth/bluedroid/BluetoothServiceBluedroid.cpp
@@ +275,5 @@
>  
>    for (int i = 0; i < aNumProperties; i++) {
> +    bt_property_t p;
> +    // See Bug 989976, consider aProperties is not aligned enough
> +    memcpy(&p, &aProperties[i], sizeof(bt_property_t));

I find that when you use sizeof, it's generally better (i.e. more maintainable) if you specify a variable rather than a type.

That way, if you ever change the type of the variable, then the sizeof adapts automatically.

With memcpy, I like to use the destination variable to control the size, so I would use sizeof(p) here.

I also think that we should file a followup bug to investigate the root cause of the unaligned memory.
(In reply to Dave Hylands [:dhylands] from comment #26)
> Review of attachment 8401238 [details] [diff] [review]:
> With memcpy, I like to use the destination variable to control the size, so
> I would use sizeof(p) here.
> 
> I also think that we should file a followup bug to investigate the root
> cause of the unaligned memory.
Thanks for the comments. And yes, I'm still finding the root cause.
Follow bug: see also Bug 992741 - [bluedroid] Unaligned access after doing set property (scan mode)
See Also: → 992741
Attachment #8402462 - Attachment is obsolete: true
Attachment #8402462 - Flags: review?(echou)
Attachment #8402484 - Attachment is obsolete: true
Attachment #8402484 - Flags: review?(echou)
Comment on attachment 8434766 [details] [diff] [review]
Bug 989976 - [bluedroid]Hit SIGBUS on unaligned access while doing AdapterPropertiesChangeCallback

Review of attachment 8434766 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for being late. I'm ok with this fix, and we should start working on bug 992741 to find the root cause.
Attachment #8434766 - Flags: review?(echou) → review+
Attachment #8441227 - Attachment description: bug989976.patch → Bug 989976 - [bluedroid]Hit SIGBUS on unaligned access while doing AdapterPropertiesChangeCallback, r=echou
Can you please run this through Try (and post a link here) before requesting checkin? Thanks :)
Keywords: checkin-needed
My bad! Thanks for reminders.
Build pass. But not sure why marioneete did not run.
You used "-u marionette" when you should have used "-u mochitests"

I'd recommend that you use: http://trychooser.pub.build.mozilla.org/ to build your try strings.

So you should resubmit your try run with "-u mochitests" instead.
(In reply to Dave Hylands [:dhylands] (PTO - back Mon Jun 23) from comment #38)
> You used "-u marionette" when you should have used "-u mochitests"
> 
> I'd recommend that you use: http://trychooser.pub.build.mozilla.org/ to
> build your try strings.
> 
> So you should resubmit your try run with "-u mochitests" instead.

Yes. I used trychooser to build try strings. I think i use wrong options. It shall be "-u marionette-webapi".
"try: -b do -p emulator-jb,emulator-kk -u mochitests,marionette-webapi -t none"
Not sure anything I did wrong, but I did not see any test results.
https://hg.mozilla.org/mozilla-central/rev/05377e8cffbe
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.0 S4 (20june)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: