Closed Bug 840673 Opened 10 years ago Closed 9 years ago

Intermittent mochitest-2 shutdown crash | Exited with code -2147483645 during test run [@ mozilla::LazyIdleThread::ShutdownThread()] (Assertion failure: !mThreadIsShuttingDown (Huh?!), at ../../../xpcom/threads/LazyIdleThread.cpp:257)

Categories

(Core :: XPCOM, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: RyanVM, Unassigned)

Details

(Keywords: assertion, crash, intermittent-failure)

Crash Data

Attachments

(1 file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=19659164&tree=Mozilla-Inbound

Rev3 WINNT 5.1 mozilla-inbound debug test mochitest-2 on 2013-02-11 23:39:24 PST for push 324ef02e2161
slave: talos-r3-xp-030

Assertion failure: !mThreadIsShuttingDown (Huh?!), at e:/builds/moz2_slave/m-in-w32-dbg/build/xpcom/threads/LazyIdleThread.cpp:257
nsStringStats
 => mAllocCount:        2397047
 => mReallocCount:       109507
 => mFreeCount:         2381345  --  LEAKED 15702 !!!
 => mShareCount:        4512507
 => mAdoptCount:         141674
 => mAdoptFreeCount:     141669  --  LEAKED 5 !!!
Assertion failure: isEmpty(), at e:\builds\moz2_slave\m-in-w32-dbg\build\obj-firefox\dist\include\mozilla/LinkedList.h:269
TEST-UNEXPECTED-FAIL | Shutdown | Exited with code -2147483645 during test run
INFO | automation.py | Application ran for: 0:32:50.641000
INFO | automation.py | Reading PID log: c:\docume~1\cltbld\locals~1\temp\tmpkpekrmpidlog
==> process 3964 launched child process 3568
==> process 3964 launched child process 176
==> process 3964 launched child process 3112
==> process 3964 launched child process 3192
==> process 3964 launched child process 1932
==> process 3964 launched child process 3348
==> process 3964 launched child process 3372
==> process 3964 launched child process 3648
==> process 3964 launched child process 3676
==> process 3964 launched child process 3764
==> process 3964 launched child process 3856
==> process 3964 launched child process 3948
==> process 3964 launched child process 116
==> process 3964 launched child process 1732
==> process 3964 launched child process 3104
==> process 3964 launched child process 812
==> process 3964 launched child process 1988
==> process 3964 launched child process 280
==> process 3964 launched child process 3636
==> process 3964 launched child process 3780
==> process 3964 launched child process 3132
==> process 3964 launched child process 3216
==> process 3964 launched child process 328
==> process 3964 launched child process 1368
==> process 3964 launched child process 3208
INFO | automation.py | Checking for orphan process with PID: 3568
INFO | automation.py | Checking for orphan process with PID: 176
INFO | automation.py | Checking for orphan process with PID: 3112
INFO | automation.py | Checking for orphan process with PID: 3192
INFO | automation.py | Checking for orphan process with PID: 1932
INFO | automation.py | Checking for orphan process with PID: 3348
INFO | automation.py | Checking for orphan process with PID: 3372
INFO | automation.py | Checking for orphan process with PID: 3648
INFO | automation.py | Checking for orphan process with PID: 3676
INFO | automation.py | Checking for orphan process with PID: 3764
INFO | automation.py | Checking for orphan process with PID: 3856
INFO | automation.py | Checking for orphan process with PID: 3948
INFO | automation.py | Checking for orphan process with PID: 116
INFO | automation.py | Checking for orphan process with PID: 1732
INFO | automation.py | Checking for orphan process with PID: 3104
INFO | automation.py | Checking for orphan process with PID: 812
INFO | automation.py | Checking for orphan process with PID: 1988
INFO | automation.py | Checking for orphan process with PID: 280
INFO | automation.py | Checking for orphan process with PID: 3636
INFO | automation.py | Checking for orphan process with PID: 3780
INFO | automation.py | Checking for orphan process with PID: 3132
INFO | automation.py | Checking for orphan process with PID: 3216
INFO | automation.py | Checking for orphan process with PID: 328
INFO | automation.py | Checking for orphan process with PID: 1368
INFO | automation.py | Checking for orphan process with PID: 3208
PROCESS-CRASH | Shutdown | application crashed [@ mozilla::LazyIdleThread::ShutdownThread()]
Crash dump filename: c:\docume~1\cltbld\locals~1\temp\tmpnldvp1\minidumps\7b6cc98c-382b-4f4a-a2cb-afe062981c46.dmp
Operating system: Windows NT
                  5.1.2600 Service Pack 2
CPU: x86
     GenuineIntel family 6 model 23 stepping 10
     2 CPUs

Crash reason:  EXCEPTION_BREAKPOINT
Crash address: 0x312d563

Thread 0 (crashed)
 0  xul.dll!mozilla::LazyIdleThread::ShutdownThread() [LazyIdleThread.cpp:324ef02e2161 : 257 + 0x18]
    eip = 0x0312d563   esp = 0x0012f59c   ebp = 0x0012f610   ebx = 0x00000000
    esi = 0x03b03ab0   edi = 0x09ae2388   eax = 0x00000000   ecx = 0x557f0a97
    edx = 0x10361f48   efl = 0x00200206
    Found by: given as instruction pointer in context
 1  xul.dll!mozilla::LazyIdleThread::Notify(nsITimer *) [LazyIdleThread.cpp:324ef02e2161 : 461 + 0x7]
    eip = 0x0312dbe8   esp = 0x0012f618   ebp = 0x0012f634
    Found by: call frame info
 2  xul.dll!nsTimerImpl::Fire() [nsTimerImpl.cpp:324ef02e2161 : 485 + 0x6]
    eip = 0x0312a061   esp = 0x0012f63c   ebp = 0x0012f6bc
    Found by: call frame info
 3  xul.dll!nsTimerEvent::Run() [nsTimerImpl.cpp:324ef02e2161 : 565 + 0xe]
    eip = 0x0312a29c   esp = 0x0012f6c4   ebp = 0x0012f6ec
    Found by: call frame info
 4  xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:324ef02e2161 : 627 + 0xd]
    eip = 0x03124ded   esp = 0x0012f6f4   ebp = 0x0012f744
    Found by: call frame info
 5  xul.dll!NS_ProcessNextEvent_P(nsIThread *,bool) [nsThreadUtils.cpp:324ef02e2161 : 238 + 0xc]
    eip = 0x030cf55f   esp = 0x0012f74c   ebp = 0x0012f758
    Found by: call frame info
 6  xul.dll!nsThread::Shutdown() [nsThread.cpp:324ef02e2161 : 474 + 0x9]
    eip = 0x0312536b   esp = 0x0012f760   ebp = 0x0012f794
    Found by: call frame info
 7  xul.dll!nsRunnableMethodImpl<tag_nsresult ( nsIThread::*)(void),1>::Run() [nsThreadUtils.h:324ef02e2161 : 367 + 0x3]
    eip = 0x0312716b   esp = 0x0012f79c   ebp = 0x0012f7a0
    Found by: call frame info
 8  xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:324ef02e2161 : 627 + 0xd]
    eip = 0x03124ded   esp = 0x0012f7a8   ebp = 0x0012f7f8
Assignee: nobody → bent.mozilla
Attached patch PatchSplinter Review
Shutting down a thread spins the event loop.  While spinning the idle timer can fire and cause us to try to shutdown again, and we assert and die.  The solution is to shutdown the idle timer before shutting down the thread.
Assignee: bent.mozilla → khuey
Status: NEW → ASSIGNED
Attachment #722311 - Flags: review?
Attachment #722311 - Flags: review? → review?(bent.mozilla)
Summary: Intermittent mochitest-4 shutdown crash | Exited with code -2147483645 during test run [@ mozilla::LazyIdleThread::ShutdownThread()] (Assertion failure: !mThreadIsShuttingDown (Huh?!), at ../../../xpcom/threads/LazyIdleThread.cpp:257) → Intermittent mochitest-2 shutdown crash | Exited with code -2147483645 during test run [@ mozilla::LazyIdleThread::ShutdownThread()] (Assertion failure: !mThreadIsShuttingDown (Huh?!), at ../../../xpcom/threads/LazyIdleThread.cpp:257)
Comment on attachment 722311 [details] [diff] [review]
Patch

Review of attachment 722311 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: xpcom/threads/LazyIdleThread.cpp
@@ +238,5 @@
> +  // Do this up front so that when we spin the event loop to shutdown the
> +  // thread the idle timer doesn't cause us to reenter.
> +  if (mIdleTimer) {
> +    DebugOnly<nsresult> rv =
> +      mIdleTimer->Cancel();

Nit: This can fit on one line righ?
Attachment #722311 - Flags: review?(bent.mozilla) → review+
Are you sure it was this?  I would be very surprised if this code even runs on Android talos.
Flags: needinfo?(ryanvm)
The retriggers before and after your patch landed and before and after it was backed out would certainly indicate that it was...
Flags: needinfo?(ryanvm)
Well I'm not going to have the days it'll take to set up an Android build environment and try to reproduce a Talos timeout to burn anytime soon :-/
Assignee: khuey → nobody
Status: ASSIGNED → NEW
In case you were wondering, this still has issues with Android ts. Mark or Kats, any idea what might be going on here?
https://tbpl.mozilla.org/?tree=Try&rev=0b44854b25ef

PROCESS-CRASH | ts | application crashed [@ mozilla::LazyIdleThread::ScheduleTimer()]
Crash dump filename: /tmp/tmphrcHpT/57a956f2-692b-3cd1-4b5b251c-6385fbee.dmp
Operating system: Android
                  0.0.0 Linux 2.6.32.9-00002-gd8084dc-dirty #1 SMP PREEMPT Wed Feb 2 11:32:06 PST 2011 armv7l nvidia/harmony/harmony/harmony:2.2/FRF91/20110202.102810:eng/test-keys
CPU: arm
     2 CPUs

Crash reason:  SIGSEGV
Crash address: 0x0

Thread 5 (crashed)
 0  libxul.so!mozilla::LazyIdleThread::ScheduleTimer() [LazyIdleThread.cpp:0b44854b25ef : 215 + 0x0]
     r4 = 0x00000001    r5 = 0x52ab62e0    r6 = 0x4ed046cc    r7 = 0x00000001
     r8 = 0x00000001    r9 = 0x4ed046ff   r10 = 0x4f0364ec    fp = 0x0024cfd0
     sp = 0x4ed046a0    lr = 0xafd1050c    pc = 0x545882ae
    Found by: given as instruction pointer in context
 1  libxul.so!mozilla::LazyIdleThread::QueryInterface(nsID const&, void**) [LazyIdleThread.cpp:0b44854b25ef : 363 + 0x15]
     sp = 0x4ed046a4    pc = 0x54588289
    Found by: stack scanning
 2  libxul.so!nsRunnableMethodImpl<tag_nsresult (mozilla::dom::NotificationPermissionRequest::*)(), true>::Run() [nsThreadUtils.h:0b44854b25ef : 350 + 0x5]
     sp = 0x4ed046b0    pc = 0x53d324e9
    Found by: stack scanning
 3  libxul.so!nsRunnableMethodImpl<tag_nsresult (mozilla::dom::NotificationPermissionRequest::*)(), true>::Revoke() [nsThreadUtils.h:0b44854b25ef : 305 + 0xb]
     sp = 0x4ed046b4    pc = 0x53d324cd
    Found by: stack scanning
 4  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:0b44854b25ef : 627 + 0x5]
     sp = 0x4ed046b8    pc = 0x54585431
    Found by: stack scanning
It looks like kyle's patch causes a segfault because mIdleTimer is null at [1]. This happens because the call to mThread->Shutdown() invokes a call to ScheduleTimer(), which now happens after mIdleTimer is set to null, rather than before. There is a full stack trace halfway down the file at [3]. I don't know this code so I can't comment on how this should be fixed - either the patch breaks some assumptions, or the code at line 215 needs an | if (mIdleThread) | guard.

[1] https://hg.mozilla.org/try/file/0b44854b25ef/xpcom/threads/LazyIdleThread.cpp#l215
[2] https://hg.mozilla.org/try/file/0b44854b25ef/xpcom/threads/LazyIdleThread.cpp#l285
[3] https://tbpl.mozilla.org/php/getParsedLog.php?id=21753163&tree=Try&full=1
(OrangeWFM for bugs not modified in > 2 months)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.