Closed Bug 867396 Opened 7 years ago Closed 7 years ago

Failure of NS_DispatchToMainThread() can lead to assertion

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:leo+, b2g18 affected)

RESOLVED INVALID
1.1 CS (11may)
blocking-b2g leo+
Tracking Status
b2g18 --- affected

People

(Reporter: alhadp, Assigned: tzimmermann)

Details

(Whiteboard: [CR 471640] u=fx-os-user c=may-6-17 p=1)

Attachments

(1 file)

If NS_DispatchToMainThread() fails for an nsIRunnable, we end up releasing the runnable right there on the (non-main) thread, as would be expected. The dtor for the runnable ends up calling GetXPConnect(), which asserts - saying that we need to be on main thread.

This issue usually is seen after nightly runs of stress tests (which could have caused a memory allocation within NS_DispatchToMainThread() to fail).

Here are the steps our test team used:

1.Play FM in background.
2.Turn on BT and enable auto answer.
3.Connect the GPS cable and open the GPS test application.
4.Run a script which will do airplane on/off, MO SMS and Mo call sequentially 
5.After the night run, mini dumps are generated in the phone

Call stack:

Thread 15 (crashed)
0  libxul.so!nsXPConnect::GetXPConnect [nsXPConnect.cpp : 141 + 0x2]
     r4 = 0x445967c0    r5 = 0x00000000    r6 = 0x440c8f40    r7 = 0x44cffd28
     r8 = 0x440c8f40    r9 = 0x48289260   r10 = 0x00000001    fp = 0x00000002
     sp = 0x44cffc28    lr = 0x4037636f    pc = 0x410d1e64
    Found by: given as instruction pointer in context
1  libxul.so!XPCJSRuntime::Get [nsXPConnect.cpp : 243 + 0x3]
     r4 = 0x445967c0    r5 = 0x00000000    r6 = 0x440c8f40    r7 = 0x44cffd28
     r8 = 0x440c8f40    r9 = 0x48289260   r10 = 0x00000001    fp = 0x00000002
     sp = 0x44cffc40    pc = 0x410d1eb3
    Found by: call frame info
2  libxul.so!nsXPCWrappedJS::Release [XPCWrappedJS.cpp : 188 + 0x3]
     r4 = 0x445967c0    r5 = 0x00000000    r6 = 0x440c8f40    r7 = 0x44cffd28
     r8 = 0x440c8f40    r9 = 0x48289260   r10 = 0x00000001    fp = 0x00000002
     sp = 0x44cffc48    pc = 0x410e59ff
    Found by: call frame info
3  libxul.so!nsXPTCStubBase::Release [xptcall.cpp : 33 + 0x7]
     r4 = 0x44bc4bf0    r5 = 0x00000000    r6 = 0x440c8f40    r7 = 0x44cffd28
     r8 = 0x440c8f40    r9 = 0x48289260   r10 = 0x00000001    fp = 0x00000002
     sp = 0x44cffc70    pc = 0x413ad923
    Found by: call frame info
4  libqc_b2g_ril.so!nsCOMPtr<nsIRunnable>::nsCOMPtr [nsCOMPtr.h : 408 + 0x5]
     r4 = 0x44bc4bf0    r5 = 0x00000000    r6 = 0x440c8f40    r7 = 0x44cffd28
     r8 = 0x440c8f40    r9 = 0x48289260   r10 = 0x00000001    fp = 0x00000002
     sp = 0x44cffc78    pc = 0x423c885f
    Found by: call frame info
(leo+, blocking product stability)
blocking-b2g: --- → leo+
Whiteboard: [CR 471640]
Target Milestone: --- → 1.1 CS (11may)
Assignee: nobody → tzimmermann
Whiteboard: [CR 471640] → [CR 471640] u=fx-os-user c=may-6-17 p=0
Whiteboard: [CR 471640] u=fx-os-user c=may-6-17 p=0 → [CR 471640] u=fx-os-user c=may-6-17 p=1
(bumping priority to reflect that this bug was the cause of over half the crashes from the weekend stability run)
Severity: normal → critical
Priority: -- → P1
(In reply to Michael Vines [:m1] [:evilmachines] from comment #2)
> (bumping priority to reflect that this bug was the cause of over half the
> crashes from the weekend stability run)

You you have a link to the stack traces or some logs? Thanks.
Flags: needinfo?(mvines)
Hrmp, pretty sure this is not a Gecko bug.  Sorry for the noise.
Status: NEW → RESOLVED
blocking-b2g: leo+ → ---
Closed: 7 years ago
Flags: needinfo?(mvines)
Resolution: --- → INVALID
Ok, maybe a Gecko bug after all?  If you believe the stack trace, we managed to do this:

   XPCWrappedJS.cpp:188 [1] -> nsXPConnect.cpp:243 [2] -> nsXPConnect.cpp:141 [3] -> MOZ_CRASH()

But before we execute [1], there's a check to ensure that we're allowed to be off the main thread that we pass [4].  Yet nsXPConnect.cpp:141 then induces a crash because were off the main thread.  XPConnect bug?

[1] http://mxr.mozilla.org/mozilla-b2g18/source/js/xpconnect/src/XPCWrappedJS.cpp#188
[2] http://mxr.mozilla.org/mozilla-b2g18/source/js/xpconnect/src/nsXPConnect.cpp#243
[3] http://mxr.mozilla.org/mozilla-b2g18/source/js/xpconnect/src/nsXPConnect.cpp#140
[4] http://mxr.mozilla.org/mozilla-b2g18/source/js/xpconnect/src/XPCWrappedJS.cpp#171
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
ni?tzimmermann for comment 5
Thomas, are you still the right person to check this base on comment 5?
Flags: needinfo?(tzimmermann)
Could this be happening late in shutdown? NS_DispatchToMainThread should almost never fail unless we're totally out of memory or else the thread manager has been shut down...
(In reply to ben turner [:bent] from comment #7)
> Could this be happening late in shutdown? NS_DispatchToMainThread should
> almost never fail unless we're totally out of memory or else the thread
> manager has been shut down...

I can double check some of the logs tomorrow but I don't think we are triggering a Gecko shutdown here.  USS for the main process was floating at about 70MB at the time of the crash in one of the logs I looked at today (with plenty of apps in the background), so we're not out of memory either.
blocking-b2g: --- → leo+
We need full stacks for the various threads (at least thread 15 and the main thread) to understand what's happening here.
Attached file full minidump
Thread 0
 0  libxul.so!XPCWrappedNative::GetLock [xpcprivate.h : 2748 + 0x0]
     r0 = 0x4f2b34f0    r1 = 0x00000000    r2 = 0x00000000    r3 = 0x50e0eb60
     r4 = 0x4f2b34f0    r5 = 0x4f2b350c    r6 = 0xbea69148    r7 = 0x435a6358
     r8 = 0x4ee6d280    r9 = 0xbea68f10   r10 = 0xbea69148    fp = 0x51a44244
     sp = 0xbea68ea8    lr = 0x410f02c7    pc = 0x410ee1a6
    Found by: given as instruction pointer in context
Are you sure that NS_DispatchToMainThread is failing?  Those stacks indicate we're not in any situation where it should fail.  I suspect your code looks something like

class DoStuff : public nsRunnable {
public:
  DoStuff(already_AddRefed<nsIFoo> aFoo)
    : mFoo (aFoo)
  { }

  NS_IMETHOD Run() {
    NS_ASSERTION(NS_IsMainThread(), "Wrong thread!");

    // Do stuff with mFoo, which can only be used on the main thread.

    return NS_OK;
  }

private:
  nsCOMPtr<nsIFoo> mFoo;
};


nsCOMPtr<nsIFoo> foo;

...

nsCOMPtr<nsIRunnable> runnable = new DoStuff(foo.forget());
NS_DispatchToMainThread(runnable);


In this case mFoo is cleared by DoStuff::~DoStuff.  You have no guarantee here that DoStuff's destructor will run on the main thread though.  If the main thread executes DoStuff before the other thread returns from NS_DispatchToMainThread and gets around to running runnable's destructor then ~DoStuff will run on the other thread.

The solution to this is to release the reference to the interface when you are on the main thread.
Ah crap.  Yep that's the bug.  Thank you!
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → INVALID
Flags: needinfo?(tzimmermann)
You need to log in before you can comment on or make changes to this bug.