Last Comment Bug 739142 - crash in nsTimerImpl::PostTimerEvent
: crash in nsTimerImpl::PostTimerEvent
Status: RESOLVED FIXED
: crash, regression, topcrash
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: 14 Branch
: All Windows 7
: -- critical (vote)
: mozilla14
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on: 741056
Blocks: 739762
  Show dependency treegraph
 
Reported: 2012-03-26 00:02 PDT by Scoobidiver (away)
Modified: 2013-02-08 16:05 PST (History)
16 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed


Attachments

Description Scoobidiver (away) 2012-03-26 00:02:11 PDT
There's a spike in crashes from 14.0a1/20120324.
The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ab2ff3b5611f&tochange=df1f94b2bdee
It might be a regression from bug 738334.

Signature 	nsTimerImpl::PostTimerEvent() More Reports Search
UUID	fc9cad76-9dc4-4873-8d98-a17392120324
Date Processed	2012-03-24 16:53:20
Uptime	86
Last Crash	1.5 minutes before submission
Install Age	2.1 hours since version was first installed.
Install Time	2012-03-24 14:47:50
Product	Firefox
Version	14.0a1
Build ID	20120324031100
Release Channel	nightly
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	AuthenticAMD family 15 model 36 stepping 2
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x1000020
User Comments	it appears going onto an https site silently crashes firefox
App Notes 	
Cisco VPN
AdapterVendorID: 0x1002, AdapterDeviceID: 0x5955, AdapterSubsysID: 10921734, AdapterDriverVersion: 8.561.0.0
D3D10 Layers? D3D10 Layers- D3D9 Layers? D3D9 Layers- 
EMCheckCompatibility	True	
Total Virtual Memory	2147352576
Available Virtual Memory	1911676928
System Memory Use Percentage	83
Available Page File	759312384
Available Physical Memory	163864576

Frame 	Module 	Signature [Expand] 	Source
0 	xul.dll 	nsTimerImpl::PostTimerEvent 	xpcom/threads/nsTimerImpl.cpp:605
1 	xul.dll 	TimerThread::Run 	xpcom/threads/TimerThread.cpp:319
2 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:657
3 	xul.dll 	nsThread::ThreadFunc 	xpcom/threads/nsThread.cpp:289
4 	nspr4.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:426
5 	nspr4.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:122
6 	msvcr100.dll 	_callthreadstartex 	f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c:314
7 	msvcr100.dll 	_threadstartex 	f:\\dd\\vctools\\crt_bld\\self_x86\\crt\\src\\threadex.c:292
8 	kernel32.dll 	BaseThreadInitThunk 	
9 	ntdll.dll 	__RtlUserThreadStart 	
10 	ntdll.dll 	_RtlUserThreadStart 	

More reports at:
https://crash-stats.mozilla.com/report/list?signature=nsTimerImpl%3A%3APostTimerEvent%28%29
Comment 1 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-03-26 11:03:13 PDT
There's no web worker running here as far as I can tell, so this probably isn't a regression from 738334.
Comment 2 Scoobidiver (away) 2012-03-29 06:01:58 PDT
With combined signatures, it's #8 top crasher in 14.0a1 over the last day.
Comment 3 Patrick Boyd 2012-03-30 07:16:27 PDT
Some disasembly:

1004E290  mov         eax,dword ptr [nsTimerEvent::sAllocator (10DEEC60h)]
1004E295  push        ebx
1004E296  mov         ebx,dword ptr [esp+8]  
1004E29A  push        ebp  
1004E29B  push        esi  
1004E29C  push        edi  
1004E29D  lea         edi,[eax+20h]  
1004E2A0  mov         esi,eax  
1004E2A2  mov         eax,dword ptr [edi]  
1004E2A4  push        eax  
1004E2A5  call        dword ptr [__imp__PR_Lock (109B9A40h)]

Crash is @ 0x1004E2A2. 
EAX = 0x00000000
EDI = 0x00000020
ESI = 0x00000000
ESP = 0x07a2f6a8
Comment 4 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-03-30 11:02:27 PDT
Perhaps related to bug 733277?
Comment 5 Scoobidiver (away) 2012-03-30 11:06:19 PDT
(In reply to ben turner [:bent] from comment #4)
> Perhaps related to bug 733277?
It's not in the regression range.
Comment 6 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-03-30 11:07:38 PDT
It messes with timer locks and allocators, which is exactly where this crash is.
Comment 7 :Ehsan Akhgari (out sick) 2012-03-30 11:12:32 PDT
From comment 3, I bet money that what happens is that sAllocator is null here.  Investigating.
Comment 8 alex_mayorga 2012-04-03 14:10:48 PDT
Should this be Trunk?

Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:14.0) Gecko/20120403 Firefox/14.0a1 ID:20120403063158 just crashed like bp-449f0d79-66c2-4332-8dd3-610292120403 on a random Google search results page.
Comment 9 :Ehsan Akhgari (out sick) 2012-04-03 14:37:40 PDT
OK, this is really weird.  I grabbed the minidumps for two crashes from two different builds.  They both get blamed for the nsTimerImpl::PostTimerEvent thread, but the crash is happening on another thread in fact!  Here's the stack from the minidump corresponding to https://crash-stats.mozilla.com/report/index/fc9cad76-9dc4-4873-8d98-a17392120324:

 	ntdll.dll!_KiFastSystemCallRet@0() 	
 	ntdll.dll!_ZwWaitForSingleObject@12()  + 0xc bytes	
 	ntdll.dll!_RtlpWaitOnCriticalSection@8()  + 0xc4 bytes	
 	ntdll.dll!_RtlEnterCriticalSection@4()  - 0x155c4 bytes	
>	xul.dll!google_breakpad::AutoExceptionHandler::AutoExceptionHandler()  Line 461	C++
 	xul.dll!google_breakpad::ExceptionHandler::HandleException(_EXCEPTION_POINTERS * exinfo)  Line 500	C++
 	msvcr71.dll!__CxxUnhandledExceptionFilter(_EXCEPTION_POINTERS * pPtrs)  + 0x46 bytes	C++
 	kernel32.dll!_UnhandledExceptionFilter@4()  + 0xf5 bytes	
 	ntdll.dll!___RtlUserThreadStart@8()  + 0x147ae bytes	
 	ntdll.dll!@_EH4_CallFilterFunc@8()  + 0x12 bytes	
 	ntdll.dll!ExecuteHandler2@20()  + 0x26 bytes	
 	ntdll.dll!ExecuteHandler@20()  + 0x24 bytes	
 	ntdll.dll!_RtlDispatchException@8()  + 0xd3 bytes	
 	ntdll.dll!_KiUserExceptionDispatcher@8()  + 0xf bytes	
 	msvcr100.dll!memmove(unsigned char * dst, unsigned char * src, unsigned long count)  Line 185	Asm
 	xul.dll!nsTArray_base<nsTArrayDefaultAllocator>::ShiftData(unsigned int start, unsigned int oldLen, unsigned int newLen, unsigned int elemSize, unsigned int elemAlign)  Line 277 + 0xd bytes	C++
 	xul.dll!nsHttpPipeline::WriteSegments(nsAHttpSegmentWriter * writer, unsigned int count, unsigned int * countWritten)  Line 775	C++
 	xul.dll!nsHttpConnection::OnSocketReadable()  Line 1369	C++
 	xul.dll!nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream * in)  Line 1489	C++
 	xul.dll!nsSocketInputStream::OnSocketReady(unsigned int condition)  Line 257	C++
 	xul.dll!nsSocketTransport::OnSocketReady(PRFileDesc * fd, short outFlags)  Line 1578	C++
 	xul.dll!nsSocketTransportService::DoPollIteration(bool wait)  Line 757	C++
 	xul.dll!nsSocketTransportService::Run()  Line 648	C++
 	xul.dll!nsThread::ProcessNextEvent(bool mayWait, bool * result)  Line 658	C++
 	xul.dll!nsThread::ThreadFunc(void * arg)  Line 289 + 0xf bytes	C++
 	nspr4.dll!_PR_NativeRunThread(void * arg)  Line 448	C
 	nspr4.dll!pr_root(void * arg)  Line 122 + 0xd bytes	C
 	msvcr100.dll!_callthreadstartex()  Line 314 + 0x6 bytes	C
 	msvcr100.dll!_threadstartex(void * ptd)  Line 292 + 0x5 bytes	C
 	kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes	
 	ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes	
 	ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes	

In both cases, these threads were listed right before the timer thread.  These threads don't appear on the breakpad site at all.  Looking at exinfo->ExceptionRecord in ExceptionHandler::HandleException, this is indeed an access violation, and the ExceptionAddress field points to the memmove function on the stack.

I don't know why Breakpad blames this on the wrong thread.  Should this be moved to Necko, and another bug be filed to investigate this on the breakpad side?
Comment 10 Yunier J. 2012-04-03 21:54:06 PDT
Please, fix this quickly. I like the changes on Firefox 14 but it crash per minute and is very inestable
Comment 11 Yunier J. 2012-04-03 22:02:21 PDT
This version is super faster and the loading times are amazing (at least for me). Do you have introduced some changes to Gecko?
Comment 12 Benjamin Smedberg [:bsmedberg] 2012-04-04 06:21:11 PDT
When you load the minidump into MSVC does it get the correct crashing thread?
Comment 13 :Ehsan Akhgari (out sick) 2012-04-04 08:17:24 PDT
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #12)
> When you load the minidump into MSVC does it get the correct crashing thread?

No, but I think it just grabs the crashing thread from the minidump, which is written by breakpad.
Comment 14 :Ehsan Akhgari (out sick) 2012-04-04 08:17:53 PDT
Yunier, can you reproduce this crash reliably on Nightly?
Comment 15 Benjamin Smedberg [:bsmedberg] 2012-04-04 08:34:04 PDT
The minidump is not written by breakpad, it's written by the Windows debughlp function MinidumpWriteDump. We do pass in the exception info from the crash site which ends up here: http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc#977
Comment 16 :Ehsan Akhgari (out sick) 2012-04-04 08:59:01 PDT
Hmm, OK.  Is the crash stats report also generated from the minidump?  The weird thing is that the crash report doesn't have any mention of the thread running nsHttpPipeline::WriteSegments.

Here's another question: is it possible that a crash on a thread is handled on a different thread by breakpad?
Comment 17 Ted Mielczarek [:ted.mielczarek] 2012-04-04 09:14:06 PDT
The crash report is generated from the minidump using minidump_stackwalk (in the Breakpad tree). There are a few tricky bits:
1) Breakpad uses a background thread that does the actual dump writing. It saves the ID of that thread and tries to ignore it in the output. Perhaps that's what's failing htere?
2) We use SetUnhandledExceptionFilter to catch crashes, so whatever the behavior of that is. As bsmedberg says, we simply pass on the EXCEPTION_POINTERS that we get from that callback to MinidumpWriteDump.
Comment 18 :Ehsan Akhgari (out sick) 2012-04-04 10:17:04 PDT
(In reply to Ted Mielczarek [:ted] from comment #17)
> The crash report is generated from the minidump using minidump_stackwalk (in
> the Breakpad tree). There are a few tricky bits:
> 1) Breakpad uses a background thread that does the actual dump writing. It
> saves the ID of that thread and tries to ignore it in the output. Perhaps
> that's what's failing htere?
> 2) We use SetUnhandledExceptionFilter to catch crashes, so whatever the
> behavior of that is. As bsmedberg says, we simply pass on the
> EXCEPTION_POINTERS that we get from that callback to MinidumpWriteDump.

SetUnhandledExceptionFilter causes the exception handler to be called on the thread raising the exception, always.

As I said, when I was examining the exception being handled in the debugger, it was an access violation caused by the memmove call on the thread which ended up getting removed from the crash report, which is why I'm pretty sure that this crash is being misblamed at nsTimerImpl::PostTimerEvent for some reason...
Comment 19 :Ehsan Akhgari (out sick) 2012-04-04 10:26:33 PDT
I discussed this with Ted on IRC.  One theory is that the memmove call is writing over the address holding sAllocator, which causes the timer thread to crash while that memove is happening on another thread (and then for memmove to crash as well).

I'm pretty sure that this crash is also hiding under other seemingly unrelated signatures as well.  Moving over to Necko.
Comment 20 Patrick McManus [:mcmanus] 2012-04-04 10:37:28 PDT
I recognize the stack trace in comment 9 as one from bug 741056. The patch for 741056 went to inbound a few hours ago.
Comment 21 Patrick McManus [:mcmanus] 2012-04-04 10:38:05 PDT
bug 738812 also has an odd timer thing going on.
Comment 22 :Ehsan Akhgari (out sick) 2012-04-04 10:47:32 PDT
(In reply to Patrick McManus [:mcmanus] from comment #20)
> I recognize the stack trace in comment 9 as one from bug 741056. The patch
> for 741056 went to inbound a few hours ago.

That is very plausible!
Comment 23 :Ehsan Akhgari (out sick) 2012-04-04 10:50:36 PDT
(In reply to Patrick McManus [:mcmanus] from comment #21)
> bug 738812 also has an odd timer thing going on.

That doesn't look relevant to me.  The timer code run there runs on the main thread, not on the timer thread.
Comment 24 Patrick McManus [:mcmanus] 2012-04-04 10:50:55 PDT
Ehsan, can you point me to any info on how to do the magic you talk about in comment 9? "
I grabbed the minidumps for two crashes from two different builds.  They both get blamed for the nsTimerImpl::PostTimerEvent thread, but the crash is happening on another thread in fact! "
Comment 25 :Ehsan Akhgari (out sick) 2012-04-04 10:53:17 PDT
(In reply to Patrick McManus [:mcmanus] from comment #24)
> Ehsan, can you point me to any info on how to do the magic you talk about in
> comment 9? "
> I grabbed the minidumps for two crashes from two different builds.  They
> both get blamed for the nsTimerImpl::PostTimerEvent thread, but the crash is
> happening on another thread in fact! "

You need special privileges to download minidumps from crash-stats (or know someone who does!) because they can contain user's private data on the stack.  I'll forward the minidump for you using email.  :-)
Comment 26 Patrick McManus [:mcmanus] 2012-04-04 10:57:13 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #25)

> You need special privileges to download minidumps from crash-stats (or know
> someone who does!) because they can contain user's private data on the
> stack.  I'll forward the minidump for you using email.  :-)

ah. I'm actually curious if any of the crash data for 740319 shows the same thing..
Comment 27 :Ehsan Akhgari (out sick) 2012-04-04 11:18:52 PDT
I don't have access myself.  Ping joe (or #developers).
Comment 28 Honza Bambas (:mayhemer) 2012-04-04 15:56:52 PDT
Interesting crash, should be tracked by bug 739762 too.

(In reply to Yunier José Sosa Vázquez from comment #11)
> This version is super faster and the loading times are amazing (at least for
> me). Do you have introduced some changes to Gecko?

Pipelining patches has landed.
Comment 29 Yunier J. 2012-04-04 23:53:05 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #14)
> Yunier, can you reproduce this crash reliably on Nightly?

When I open Nightly and start to browser this crash per minute. In my PC Nightly during opened more or less one minute only, after this crash. Some times the Windows Error it sohw me, anothers times the Mozilla Crash Reporter informs me about this.
Comment 30 Yunier J. 2012-04-05 00:06:50 PDT
(In reply to Honza Bambas (:mayhemer) from comment #28)
> Interesting crash, should be tracked by bug 739762 too.
> 
> (In reply to Yunier José Sosa Vázquez from comment #11)
> > This version is super faster and the loading times are amazing (at least for
> > me). Do you have introduced some changes to Gecko?
> 
> Pipelining patches has landed.

Great !!! this feature many users will note. I have commented with my friends this and have said the same
Comment 31 Ted Mielczarek [:ted.mielczarek] 2012-04-05 07:55:08 PDT
FYI, if having access to minidumps would be useful, you can file a bug and get that access. It's just an LDAP bit.
Comment 32 :Ehsan Akhgari (out sick) 2012-04-05 14:09:59 PDT
Yunier, it would be great if you can test this again in tonight's Nightly and let us know if the crash has been fixed.  Thanks!
Comment 33 Yunier J. 2012-04-05 15:52:13 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #32)
> Yunier, it would be great if you can test this again in tonight's Nightly
> and let us know if the crash has been fixed.  Thanks!

Ehsan the problem has been fixed in the last Nightly (2012-04-05) and all work perfect. Congratulations for you work for fix this bug. I see a great interest by the developers to know more about the bug and resolve it quickly
Comment 34 :Ehsan Akhgari (out sick) 2012-04-09 13:07:49 PDT
Well, Patrick deserves all of the credit here!  :-)

Thanks for verifying this, Yunier!
Comment 35 Yunier J. 2012-04-10 18:08:17 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #34)
> Well, Patrick deserves all of the credit here!  :-)
> 
> Thanks for verifying this, Yunier!

Well, congratulations to Patrick too :-). The people of Mozilla is great.
Comment 37 Scoobidiver (away) 2013-02-08 10:22:13 PST
(In reply to Swarnava Sengupta (:Swarnava) from comment #36)
> also there are 28% crash in 18.0.1
With so few crashes, it isn't worth a new bug.
Comment 38 Patrick McManus [:mcmanus] 2013-02-08 16:05:36 PST
the low volume of leftovers could be a dup of bug 792920

Note You need to log in before you can comment on or make changes to this bug.