Closed Bug 972249 Opened 10 years ago Closed 8 years ago

Memory corruption results from re-entry of function GetProperty in BluetoothDBusService

Categories

(Firefox OS Graveyard :: Bluetooth, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ben.tian, Assigned: tzimmermann)

References

Details

Attachments

(1 file)

Per bug 969447 comment 32, re-entry of GetProperty results in memory corruption and crashes gecko when BT is enabled. An additional lock workarounds the problem but we still have to fix the root cause.
Assignee: nobody → btian
Note the bug occurs only after patch in bug 969447 comment 22 is applied. This is a timing issue and the crash point varies based on additional log printing. The following is STR and initial crash stack.

=== STR ===
1) Start ICS emulator/Boot device. BT is turned off at the beginning.
2) Turn on BT.
3) Gecko crashes.

=== CRASH STACK ===
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 45.94]
0x424bc94a in dbus_message_iter_recurse (iter=0x42eff808, sub=0x42000000) at external/dbus/dbus/dbus-message.c:2056
2056	  *real_sub = *real;
(gdb) bt
#0  0x424bc94a in dbus_message_iter_recurse (iter=0x42eff808, sub=0x42000000) at external/dbus/dbus/dbus-message.c:2056
#1  0x41263688 in ParseProperties (aIter=<value optimized out>, aValue=..., aErrorStr=..., aPropertyTypes=0x42eff7d8, aPropertyTypeLen=12)
    at /home/bentian/WORKSPACE/projects/gecko/central/dom/bluetooth/bluez/linux/BluetoothDBusService.cpp:791
#2  0x41263766 in UnpackPropertiesMessage (aMsg=0x4475bce0, aErr=0x42effbdc, aValue=..., aIface=0x41e19fe4 "org.bluez.Adapter")
    at /home/bentian/WORKSPACE/projects/gecko/central/dom/bluetooth/bluez/linux/BluetoothDBusService.cpp:840
#3  0x41263a52 in DefaultAdapterPathReplyHandler::HandleGetPropertiesReply (this=0x44bfecc0, aReply=0x4475bce0)
    at /home/bentian/WORKSPACE/projects/gecko/central/dom/bluetooth/bluez/linux/BluetoothDBusService.cpp:1952
#4  DefaultAdapterPathReplyHandler::Handle (this=0x44bfecc0, aReply=0x4475bce0) at /home/bentian/WORKSPACE/projects/gecko/central/dom/bluetooth/bluez/linux/BluetoothDBusService.cpp:1891
#5  0x40df5c68 in mozilla::ipc::DBusReplyHandler::Callback (aReply=0x10000000, aData=0x44bfecc0) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/dbus/DBusUtils.cpp:64
#6  0x40df5e72 in mozilla::ipc::Notification::RunCallback (aCall=0x442d5df0, aData=<value optimized out>) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/dbus/RawDBusConnection.cpp:224
#7  mozilla::ipc::Notification::Handle (aCall=0x442d5df0, aData=<value optimized out>) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/dbus/RawDBusConnection.cpp:212
#8  0x424bf83c in _dbus_pending_call_complete (pending=0x442d5df0) at external/dbus/dbus/dbus-pending-call.c:197
#9  0x424ac28a in complete_pending_call_and_unlock (connection=0x4475b8f0, pending=0x442d5df0, message=<value optimized out>) at external/dbus/dbus/dbus-connection.c:2304
#10 0x424afbac in dbus_connection_dispatch (connection=0x4475b8f0) at external/dbus/dbus/dbus-connection.c:4588
#11 0x40df5f98 in mozilla::ipc::DBusWatcher::OnFileCanReadWithoutBlocking (this=0x4531ece0, aFd=<value optimized out>) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/dbus/RawDBusConnection.cpp:175
#12 0x40d46448 in base::MessagePumpLibevent::OnLibeventNotification (fd=81, flags=<value optimized out>, context=<value optimized out>)
    at /home/bentian/WORKSPACE/projects/gecko/central/ipc/chromium/src/base/message_pump_libevent.cc:214
#13 0x40d440b8 in event_process_active_single_queue (base=0x40246300, flags=<value optimized out>) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/chromium/src/third_party/libevent/event.c:1350
#14 event_process_active (base=0x40246300, flags=<value optimized out>) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/chromium/src/third_party/libevent/event.c:1420
#15 event_base_loop (base=0x40246300, flags=<value optimized out>) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/chromium/src/third_party/libevent/event.c:1621
#16 0x40d4661e in base::MessagePumpLibevent::Run (this=0x40201df0, delegate=0x42effde4) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/chromium/src/base/message_pump_libevent.cc:340
#17 0x40d4e16c in MessageLoop::RunInternal (this=0x424df5dc) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/chromium/src/base/message_loop.cc:226
#18 0x40d4e1ea in MessageLoop::RunHandler (this=0x42effde4) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/chromium/src/base/message_loop.cc:219
#19 MessageLoop::Run (this=0x42effde4) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/chromium/src/base/message_loop.cc:193
#20 0x40d50a9a in base::Thread::ThreadMain (this=0x4021c380) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/chromium/src/base/thread.cc:162
#21 0x40d46978 in ThreadFunc (closure=0x1) at /home/bentian/WORKSPACE/projects/gecko/central/ipc/chromium/src/base/platform_thread_posix.cc:39
#22 0x40060e4c in __thread_entry (func=0x40d46971 <ThreadFunc>, arg=0x4021c380, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#23 0x4006099c in pthread_create (thread_out=<value optimized out>, attr=0xbee97908, start_routine=0x40d46971 <ThreadFunc>, arg=0x4021c380) at bionic/libc/bionic/pthread.c:357
#24 0x00000000 in ?? ()
(gdb) p real
$2 = (DBusMessageRealIter *) 0x42eff818
(gdb) p real_sub
$3 = (DBusMessageRealIter *) 0x42000000
Some thoughts: All DBus calls run serialized in the I/O thread, except for some in StartInternal/StopInternal, which run on a separate BT thread. But in this case, I don't see how the extra lock in bug 969447 could work around the problem.

Another thing is reference counting. DBus ref-counting is a pure nightmare. I looked into this so often, and I'm still not sure it's 100% correct. I could imagine that reference counters of these DBus messages are zero sometimes and you're iterating over free'd memory.
I just saw that there is a regression from bug 964817. StartInternal executes SendWithReply on the BT thread to get the default adapter. This call should be located on the I/O thread (aka DBus thread).

The reply to this call gets processed on the I/O thread, so it's probably unrelated. But it might still be worth a closer look.
Hi,

I have an idea why this bug happens, and I think it might have been fixed already by bug 983576.

Before bug 983576 landed, |StopBluetoothRunnable| ran on the BT thread and cleaned up the connection, but there was a race condition in this code.

  - the BT thread cleaned up the connection and finished by closing the connection
  - during the cleanup, the I/O thread still processed regular DBus message from the BlueZ daemon

The problem with |GetProperties| originated from calls to |EventFilter|. The event filter is cleared by |dbus_connection_remove_filter| in the stop runnable. There is also a call |dbus_connection_remove_watch|, which is (1) blocking and (2) was executed before the event filter was cleared. The race condition is

  - |dbus_connection_remove_watch| receives a message in the BT thread, releases connection, and enters |EventFilter|
  - the I/O thread receives it's outstanding message, enters |EventFilter|
  - both interfere in |GetProperties|

Since bug 983576, the DBus stop code runs completely on the I/O thread, so it should not be possible that these methods interfere.
The proposal here is to review the possible cause, see if the outlined explanation makes sense, and, if so, land this patch to remove the locking. If the bug returns, we can put back the locking.
Attachment #8399903 - Flags: review?(echou)
Blocks: 990392
De-assign myself since Thomas is working on this bug.
Assignee: btian → nobody
Assignee: nobody → tzimmermann
Comment on attachment 8399903 [details] [diff] [review]
[01] Bug 972249: Remove |sGetPropertyMonitor|

Review of attachment 8399903 [details] [diff] [review]:
-----------------------------------------------------------------

Related operations are all running on I/O thread now, so I think this patch should be safe.

Thanks for cleaning this up. :)
Attachment #8399903 - Flags: review?(echou) → review+
hi Thomas, sorry had to backout this change for test failures in marionette like https://tbpl.mozilla.org/php/getParsedLog.php?id=37214814&tree=B2g-Inbound
So much about my theory... I wonder what else could cause this bug. :/
Instead of fixing this, rather land bug 1088574.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: