[Inari] crash in mozilla::dom::bluetooth::BluetoothService::DistributeSignal

RESOLVED FIXED

Status

defect
--
critical
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: marcia, Assigned: jhlin)

Tracking

({crash, regression})

unspecified
ARM
Gonk (Firefox OS)
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:tef+, firefox21 wontfix, firefox22 wontfix, firefox23 fixed, b2g18+ verified, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 verified)

Details

(Whiteboard: [b2g-crash][CR 476080][status: needs ETA][fixed-in-birch] QARegressExclude , crash signature)

Attachments

(3 attachments, 2 obsolete attachments)

This bug was filed from the Socorro interface and is 
report bp-66798839-f9eb-45f8-a361-3c77f2130409 .
============================================================= 

Seen while running on Inari, using:

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/cfef36c0c8bc
Gaia   a80be95f553de517e5d8a159e04511cda5e38be4
BuildID 20130409070205
Version 18.0

Not sure of the STR, but I was pairing and unpairing devices at the time of the crash. Will try to reproduce.
Component: General → Bluetooth
Whiteboard: [b2g-crash]
Assignee: nobody → echou
Duplicate of this bug: 860732
Crash Signature: [@ mozilla::dom::bluetooth::BluetoothService::DistributeSignal] → [@ mozilla::dom::bluetooth::BluetoothService::DistributeSignal] [@ DistributeBluetoothSignalTask::Run]
(In reply to Marcia Knous [:marcia] from comment #0)
> This bug was filed from the Socorro interface and is 
> report bp-66798839-f9eb-45f8-a361-3c77f2130409 .
> ============================================================= 
> 
> Seen while running on Inari, using:
> 
> Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/cfef36c0c8bc
> Gaia   a80be95f553de517e5d8a159e04511cda5e38be4
> BuildID 20130409070205
> Version 18.0
> 
> Not sure of the STR, but I was pairing and unpairing devices at the time of
> the crash. Will try to reproduce.

Problem confirmed, I saw this as well. I'll try to reproduce and fix it.
Duplicate of this bug: 861762
Duplicate of this bug: 861856
Not sure if this should be tef? or leo?, mark tef? first.
blocking-b2g: --- → tef?
Blocks: 856773
Whiteboard: [b2g-crash] → [b2g-crash][CR 476080]
backtrace:

#0  0x40fed2bc in mozilla::ObserverList<mozilla::dom::bluetooth::BluetoothSignal>::Broadcast (
    this=<value optimized out>, aSignal=...) at ../../dist/include/mozilla/Observer.h:69
#1  mozilla::dom::bluetooth::BluetoothService::DistributeSignal (this=<value optimized out>, aSignal=...)
    at /home/eric30/Mozilla/mercurial/mozilla-b2g18/dom/bluetooth/BluetoothService.cpp:435
#2  0x40ff7c9a in DistributeBluetoothSignalTask::Run (this=0x48c891c0)
    at /home/eric30/Mozilla/mercurial/mozilla-b2g18/dom/bluetooth/linux/BluetoothDBusService.cpp:264

(gdb) l
64	
65	  void Broadcast(const T& aParam) {
66	    uint32_t size = mObservers.Length();
67	    for (uint32_t i=0; i<size; ++i) {
68	      mObservers[i]->Notify(aParam);
69	    }
70	  }
71	
72	protected:

(gdb) p mObservers
Cannot access memory at address 0x0

(gdb) f 1
#1  mozilla::dom::bluetooth::BluetoothService::DistributeSignal (this=<value optimized out>, aSignal=...)
    at /home/eric30/Mozilla/mercurial/mozilla-b2g18/dom/bluetooth/BluetoothService.cpp:435
435	  ol->Broadcast(aSignal);

(gdb) p ol.mObservers
$3 = {<nsTArray_base<nsTArrayDefaultAllocator>> = {
    mHdr = 0x48a78b20}, <nsTArray_SafeElementAtHelper<mozilla::Observer<mozilla::dom::bluetooth::BluetoothSignal>*, nsTArray<mozilla::Observer<mozilla::dom::bluetooth::BluetoothSignal>*, nsTArrayDefaultAllocator> >> = {<No data fields>}, <No data fields>}
I've found a series of steps which could reproduce this bug:

(1) Go to Settings
(2) Goto Bluetooth page very quickly (before the name of paired devices appears)
(3) Turn on Bluetooth
(4) Press Home key to be back to homescreen
(5) Longpress Home key and kill Settings app
(6) Go to Settings/Bluetooth again, crash happens.

Do these steps as fast as you can.
mrbkap and I looked at this last week. Seems like mObservers[i]->Notify(aParam); can change mObservers[i]
(In reply to Eric Chou [:ericchou] [:echou] from comment #8)
> I've found a series of steps which could reproduce this bug:
> 
> (1) Go to Settings
> (2) Goto Bluetooth page very quickly (before the name of paired devices
> appears)
> (3) Turn on Bluetooth
> (4) Press Home key to be back to homescreen
> (5) Longpress Home key and kill Settings app
> (6) Go to Settings/Bluetooth again, crash happens.
> 
> Do these steps as fast as you can.

I recorded a video: https://www.dropbox.com/s/inutdrc518ett9a/Bug-860075.mp4
(In reply to Gregor Wagner [:gwagner] from comment #9)
> mrbkap and I looked at this last week. Seems like
> mObservers[i]->Notify(aParam); can change mObservers[i]

Cool, lemme take a look. Thanks, Gregor.
Given the fact that this crash only happens when the user performs the STR "very quickly", we'll call this an edge case. Please nominate for uplift to v1.1 with a risk evaluation once completed.
blocking-b2g: tef? → -
tracking-b2g18: --- → +
(In reply to Alex Keybl [:akeybl] from comment #12)
> Given the fact that this crash only happens when the user performs the STR
> "very quickly", we'll call this an edge case. Please nominate for uplift to
> v1.1 with a risk evaluation once completed.

I renamed the device during receiving a file and after enabling/disabling it usually crashed so this had nothing todo with doing something quickly.
(In reply to Gregor Wagner [:gwagner] from comment #13)
> (In reply to Alex Keybl [:akeybl] from comment #12)
> > Given the fact that this crash only happens when the user performs the STR
> > "very quickly", we'll call this an edge case. Please nominate for uplift to
> > v1.1 with a risk evaluation once completed.
> 
> I renamed the device during receiving a file and after enabling/disabling it
> usually crashed so this had nothing todo with doing something quickly.

Gregor is right. The STR I found is just a way relatively easy to reproduce, but it's not neccesary to do every steps quickly to make this happen. See bug 861008, bug 860732, bug 861762, bug 861856 for more information. These are dups.

Re-nom as tef+.
blocking-b2g: - → tef?
The set of dupes makes clear that this recent crash regression can be triggered easily across many common usage scenarios. Blocking+.
blocking-b2g: tef? → tef+
Keywords: regression
Eric, how long for a fix for this?
Whiteboard: [b2g-crash][CR 476080] → [b2g-crash][CR 476080][status: needs ETA]
(In reply to Dietrich Ayala (:dietrich) from comment #16)
> Eric, how long for a fix for this?

I investigated yseterday but didn't find the root cause. Now I'm on bug 860166 because of my director CJ's request. I'll send a patch for 860166 today and back to this bug tomorrow, so feel free to assign this bug to others if necessary.
(In reply to Eric Chou [:ericchou] [:echou] from comment #10)
> (In reply to Eric Chou [:ericchou] [:echou] from comment #8)
> > I've found a series of steps which could reproduce this bug:
> > 
> > (1) Go to Settings
> > (2) Goto Bluetooth page very quickly (before the name of paired devices
> > appears)
> > (3) Turn on Bluetooth
> > (4) Press Home key to be back to homescreen
> > (5) Longpress Home key and kill Settings app
> > (6) Go to Settings/Bluetooth again, crash happens.
> > 
> > Do these steps as fast as you can.
> 
> I recorded a video: https://www.dropbox.com/s/inutdrc518ett9a/Bug-860075.mp4

For this specific case, the root cause is that Settings/Bluetooth app registers for same Bluetooth signal twice but when it's killed only one record is removed, and the dangling one causes SIGSEGV later.

code: http://mxr.mozilla.org/mozilla-b2g18/source/dom/bluetooth/BluetoothService.cpp

380 PLDHashOperator
381 RemoveAllSignalHandlers(const nsAString& aKey,
382                         nsAutoPtr<BluetoothSignalObserverList>& aData,
383                         void* aUserArg)
384 {
// Same observer(BluetoothParent instance) could have been registered more than once,
// but only one will be removed in next line.
385   aData->RemoveObserver(static_cast<BluetoothSignalObserver*>(aUserArg));
386   return aData->Length() ? PL_DHASH_NEXT : PL_DHASH_REMOVE;
387 }
388
// The following function is called to remove -all- records of given observer
// when a child(in this case, Settings/Bluetooth app) is killed.
389 void
390 BluetoothService::UnregisterAllSignalHandlers(BluetoothSignalObserver* aHandler)
391 {
392   MOZ_ASSERT(NS_IsMainThread());
393   MOZ_ASSERT(aHandler);
394 
395   mBluetoothSignalObserverTable.Enumerate(RemoveAllSignalHandlers, aHandler);
396 }
tracking-b2g18: + → ---
Good catch, John!
Attachment #738878 - Flags: review?(gyeh)
Attachment #738878 - Flags: review?(anygregor)
Attachment #738878 - Flags: feedback?(jolin)
(In reply to Eric Chou [:ericchou] [:echou] from comment #19)
> Created attachment 738878 [details] [diff] [review]
> patch 1: v1: Fix the crash caused by accessing invalid memory
> 
> Good catch, John!

Forgot to mention, I've done monkey test for about half hour this morning and no crash happened.
Hm that looks like a hack :( Could we avoid to register the same observers?
Blake also had a good point: Do we notify these observers twice?
I checked the implementation and found we did notify these observers twice  :(

Discussed with John and Eric, try to figure out a solution.
(In reply to Gregor Wagner [:gwagner] from comment #21)
> Hm that looks like a hack :( Could we avoid to register the same observers?
> Blake also had a good point: Do we notify these observers twice?

Well, it mighe be a kind of workaround, but it's simple and safe. :|

BluetoothParent instance would be registerd more than once when a child process has more than one BluetoothAdapter(or BluetoothDevice). And, you guys are right. We should be able to avoid this by modifying BluetoothParent or BluetoothServiceChildProcess. John will take over this.
Assignee: echou → jolin
Posted patch Fix SIGSEGV crash (obsolete) — Splinter Review
Attachment #738878 - Attachment is obsolete: true
Attachment #738878 - Flags: review?(gyeh)
Attachment #738878 - Flags: review?(anygregor)
Attachment #738878 - Flags: feedback?(jolin)
Attachment #738989 - Flags: review?(gyeh)
Attachment #738989 - Flags: review?(echou)
Attachment #738989 - Flags: review?(anygregor)
(In reply to John Lin[:jolin][:jhlin] from comment #24)
> Created attachment 738989 [details] [diff] [review]
> Fix SIGSEGV crash
About the patch: the changes in BluetoothServiceChildProcess address the dup registration and the correctness of unregistration using (kind of) ref counting approach. Some assertions were also added in BluetoothService to make sure there was no dup.
To be honest I think ObserverList should provide some way to detect this kind of problem (is there?), or prevent it from even happening.
Comment on attachment 738989 [details] [diff] [review]
Fix SIGSEGV crash

Review of attachment 738989 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me but I don't know the code well enough.

::: dom/bluetooth/ipc/BluetoothServiceChildProcess.cpp
@@ +93,2 @@
>      gBluetoothChild->SendUnregisterSignalHandler(nsString(aNodeName));
>    }

Why does the order matter here?
(In reply to Gregor Wagner [:gwagner] from comment #26)
> Comment on attachment 738989 [details] [diff] [review]
> Fix SIGSEGV crash
> 
> Review of attachment 738989 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good to me but I don't know the code well enough.
> 
> ::: dom/bluetooth/ipc/BluetoothServiceChildProcess.cpp
> @@ +93,2 @@
> >      gBluetoothChild->SendUnregisterSignalHandler(nsString(aNodeName));
> >    }
> 
> Why does the order matter here?

Do you mean why moving "BluetoothService::Unregister..." above "if..."? It's in that order to make IsSignalRegistered() returns false after the last handler is removed.
Comment on attachment 738989 [details] [diff] [review]
Fix SIGSEGV crash

Review of attachment 738989 [details] [diff] [review]:
-----------------------------------------------------------------

This should work, r+ with nits addressed.

::: dom/bluetooth/BluetoothService.cpp
@@ +367,5 @@
>  
>    BluetoothSignalObserverList* ol;
>    if (mBluetoothSignalObserverTable.Get(aNodeName, &ol)) {
>      ol->RemoveObserver(aHandler);
> +    // It would be better if we can do duplication check when registering, but ObserverList doesn't support that.

nit: 80 chars per line, please. Here and below.

::: dom/bluetooth/ipc/BluetoothServiceChildProcess.h
@@ +187,5 @@
>    PrepareAdapterInternal(const nsAString& aPath) MOZ_OVERRIDE;
> +
> +  bool
> +  IsSignalRegistered(const nsAString& aNodeName) {
> +    return !!mBluetoothSignalObserverTable.Get(aNodeName);

nit: since it's an one-line function, please make it inline, or could we simply use mBluetoothSignalObserverTable.Get(aNodeName) in BluetoothService.cpp?
Attachment #738989 - Flags: review?(echou) → review+
Comment on attachment 738989 [details] [diff] [review]
Fix SIGSEGV crash

Review of attachment 738989 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/bluetooth/BluetoothService.cpp
@@ +368,5 @@
>    BluetoothSignalObserverList* ol;
>    if (mBluetoothSignalObserverTable.Get(aNodeName, &ol)) {
>      ol->RemoveObserver(aHandler);
> +    // It would be better if we can do duplication check when registering, but ObserverList doesn't support that.
> +    MOZ_ASSERT(!ol->RemoveObserver(aHandler));

The comment is a bit unclear to me.

Here's my understanding for this assertion: We shouldn't have duplicate instances in the ObserverList, but there's no appropriate way to do duplication check while registering, so assertions are added here.

Am I right?

@@ +385,5 @@
>                          void* aUserArg)
>  {
>    aData->RemoveObserver(static_cast<BluetoothSignalObserver*>(aUserArg));
> +  // It would be better if we can do duplication check when registering, but ObserverList doesn't support that.
> +  MOZ_ASSERT(!aData->RemoveObserver(static_cast<BluetoothSignalObserver*>(aUserArg)));

How about making a variable for |static_cast<BluetoothSignalObserver*>|? Then we can re-use it in the assertion.
(In reply to Eric Chou [:ericchou] [:echou] from comment #28)
> Comment on attachment 738989 [details] [diff] [review]
> Fix SIGSEGV crash
> 
> Review of attachment 738989 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This should work, r+ with nits addressed.
> 
> ::: dom/bluetooth/BluetoothService.cpp
> @@ +367,5 @@
> >  
> >    BluetoothSignalObserverList* ol;
> >    if (mBluetoothSignalObserverTable.Get(aNodeName, &ol)) {
> >      ol->RemoveObserver(aHandler);
> > +    // It would be better if we can do duplication check when registering, but ObserverList doesn't support that.
> 
> nit: 80 chars per line, please. Here and below.
> 

Okay, I will fix it.

> ::: dom/bluetooth/ipc/BluetoothServiceChildProcess.h
> @@ +187,5 @@
> >    PrepareAdapterInternal(const nsAString& aPath) MOZ_OVERRIDE;
> > +
> > +  bool
> > +  IsSignalRegistered(const nsAString& aNodeName) {
> > +    return !!mBluetoothSignalObserverTable.Get(aNodeName);
> 
> nit: since it's an one-line function, please make it inline, or could we
> simply use mBluetoothSignalObserverTable.Get(aNodeName) in
> BluetoothService.cpp?

I thought defining a function inside the class definition makes it implicitly inline?
> > ::: dom/bluetooth/ipc/BluetoothServiceChildProcess.h
> > @@ +187,5 @@
> > >    PrepareAdapterInternal(const nsAString& aPath) MOZ_OVERRIDE;
> > > +
> > > +  bool
> > > +  IsSignalRegistered(const nsAString& aNodeName) {
> > > +    return !!mBluetoothSignalObserverTable.Get(aNodeName);
> > 
> > nit: since it's an one-line function, please make it inline, or could we
> > simply use mBluetoothSignalObserverTable.Get(aNodeName) in
> > BluetoothService.cpp?
> 
> I thought defining a function inside the class definition makes it
> implicitly inline?

Oh, you're right. :)
Comment on attachment 738989 [details] [diff] [review]
Fix SIGSEGV crash

Review of attachment 738989 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nits addressed.
Attachment #738989 - Flags: review?(gyeh) → review+
(In reply to Gina Yeh [:gyeh] [:ginayeh] from comment #29)
> Comment on attachment 738989 [details] [diff] [review]
> Fix SIGSEGV crash
> 
> Review of attachment 738989 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/bluetooth/BluetoothService.cpp
> @@ +368,5 @@
> >    BluetoothSignalObserverList* ol;
> >    if (mBluetoothSignalObserverTable.Get(aNodeName, &ol)) {
> >      ol->RemoveObserver(aHandler);
> > +    // It would be better if we can do duplication check when registering, but ObserverList doesn't support that.
> > +    MOZ_ASSERT(!ol->RemoveObserver(aHandler));
> 
> The comment is a bit unclear to me.
> 
> Here's my understanding for this assertion: We shouldn't have duplicate
> instances in the ObserverList, but there's no appropriate way to do
> duplication check while registering, so assertions are added here.
> 
> Am I right?

Yes. That's much more clear than my comments. I will update them using your words (with your permission, of course).

> 
> @@ +385,5 @@
> >                          void* aUserArg)
> >  {
> >    aData->RemoveObserver(static_cast<BluetoothSignalObserver*>(aUserArg));
> > +  // It would be better if we can do duplication check when registering, but ObserverList doesn't support that.
> > +  MOZ_ASSERT(!aData->RemoveObserver(static_cast<BluetoothSignalObserver*>(aUserArg)));
> 
> How about making a variable for |static_cast<BluetoothSignalObserver*>|?
> Then we can re-use it in the assertion.

Hmm... Don't really know about the benifit here (enlighten me please?). I'll do it anyway to please you. :)
Attachment #738989 - Flags: review?(anygregor)
Attachment #738989 - Attachment is obsolete: true
Whiteboard: [b2g-crash][CR 476080][status: needs ETA] → [b2g-crash][CR 476080][status: needs ETA][fixed-in-birch]
This is the patch for v1_0_1
https://hg.mozilla.org/mozilla-central/rev/022a5ed6195b
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Verified fixed on Inari the ability to successfully execute the STR in comment #8 without any crashes/issues.

Inari Build ID: 20130429070204
Kernel Date: Feb 21
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/45aa5ba0ed53
Gaia: cf2d4136f0ebc66039637fdbeb72ed184dfbc0f2

Verified fixed on Leo the ability to successfully execute the STR in comment #8 without any crashes/issues.

Leo Build ID: 20130426070204
Kernel Date: Mar 15
Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/6c2493de1441
Gaia: c9046a7acef33328977840176ff5574720d2c74c
Marking as QARegressExclude. Can not verify for firefox23.
Whiteboard: [b2g-crash][CR 476080][status: needs ETA][fixed-in-birch] → [b2g-crash][CR 476080][status: needs ETA][fixed-in-birch] QARegressExclude
You need to log in before you can comment on or make changes to this bug.