Bug 694344 - crash WaitForSingleObjectEx with invalid parameter handler called from rand_s
Status: RESOLVED WORKSFORME
Keywords: crash
Product: Firefox
Classification: Client Software
Component: General
Version: Trunk
Platform: x86 Windows 7
Importance: -- critical with 1 vote
Target Milestone: ---
Assigned To: Marcia Knous [:marcia - use ni]
Mentors:
Depends on: 695791
Blocks: 677797
Reported: 2011-10-13 09:26 PDT by Marcia Knous [:marcia - use ni]
Modified: 2015-10-14 15:10 PDT
CC List: 12 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

Description Marcia Knous [:marcia - use ni] 2011-10-13 09:26:23 PDT
This bug was filed from the Socorro interface and is 
report bp-af2b41cf-b71c-4799-9977-980e52111013 .
============================================================= 

This showed up in the explosive report - there have been a few spikes recently. https://crash-stats.mozilla.com/report/list?signature=WaitForSingleObjectEx%20|%20WaitForSingleObject%20|%20google_breakpad%3A%3AExceptionHandler%3A%3AWriteMinidumpOnHandlerThread%28_EXCEPTION_POINTERS*%2C%20MDRawAssertionInfo*%29

There were 220 crashes with the 2011101200 build, and spikes of more than 100 crashes with the 2011092800 and 2011092900 builds.

Frame 	Module 	Signature 	Source
0 	ntdll.dll 	KiFastSystemCallRet 	
1 	ntdll.dll 	ZwWaitForSingleObject 	
2 	kernel32.dll 	WaitForSingleObjectEx 	
3 	kernel32.dll 	WaitForSingleObject 	
4 	xul.dll 	google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread 	toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc:764
5 	xul.dll 	google_breakpad::ExceptionHandler::HandleInvalidParameter 	toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc:619
6 	msvcr80.dll 	rand_s 	f:\dd\vctools\crt_bld\self_x86\crt\src\rand_s.c:86
7 	xul.dll 	`anonymous namespace'::RandUint32 	ipc/chromium/src/base/rand_util_win.cc:16
8 	xul.dll 	base::RandUint64 	ipc/chromium/src/base/rand_util_win.cc:25
9 	xul.dll 	base::RandInt 	ipc/chromium/src/base/rand_util.cc:20
10 	xul.dll 	ChildProcessInfo::GenerateRandomChannelID 	ipc/chromium/src/chrome/common/child_process_info.cc:58
11 	xul.dll 	ChildProcessHost::CreateChannel 	ipc/chromium/src/chrome/common/child_process_host.cc:78
12 	xul.dll 	mozilla::ipc::GeckoChildProcessHost::InitializeChannel 	ipc/glue/GeckoChildProcessHost.cpp:350
13 	xul.dll 	MessageLoop::RunTask 	ipc/chromium/src/base/message_loop.cc:318
14 	xul.dll 	MessageLoop::DeferOrRunPendingTask 	ipc/chromium/src/base/message_loop.cc:326
15 	xul.dll 	MessageLoop::DoWork 	ipc/chromium/src/base/message_loop.cc:426
16 	xul.dll 	base::MessagePumpForIO::DoRunLoop 	ipc/chromium/src/base/message_pump_win.cc:462
17 	xul.dll 	base::MessagePumpWin::RunWithDispatcher 	ipc/chromium/src/base/message_pump_win.cc:53
18 	xul.dll 	base::MessagePumpWin::Run 	ipc/chromium/src/base/message_pump_win.h:78
19 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:201
20 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:175
21 	xul.dll 	base::Thread::ThreadMain 	ipc/chromium/src/base/thread.cc:156
22 	xul.dll 	`anonymous namespace'::ThreadFunc 	ipc/chromium/src/base/platform_thread_win.cc:26
23 	kernel32.dll 	BaseThreadStart
Comment 1 Robert Kaiser 2011-10-14 08:35:33 PDT
This has been by far the #1 crash signature on trunk over the last few days.

Ted, I see breakpad in there, which makes me wonder whether another failure to process the stack correctly is involved here.
Comment 2 Benjamin Smedberg [:bsmedberg] 2011-10-14 08:42:50 PDT
So the actual error site is rand_s, which should be the starting point of the signature: the frames at the top of this stack are the CRT's invalid-parameter callback and can be ignored. Can we get data on whether the entire spike has the same stack?

I must admit I don't see how we can possibly be *causing* this invalid parameter error: the callsite in question is http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/rand_util_win.cc#15 and we can't be passing NULL or anything like that.
Comment 3 Benjamin Smedberg [:bsmedberg] 2011-10-14 08:49:24 PDT
Reading the VC8 source code for rand_s, I'm pretty sure we're hitting an invalid-parameter error where we can load advapi32.dll (it is loaded dynamically) but the following line fails:

pfnRtlGenRandom = ( PGENRANDOM ) GetProcAddress( hAdvApi32, _TO_STR( RtlGenRandom ) );

If this is the case, we're either hitting an odd Windows configuration or we might have problems with library loading. Does this signature perhaps coincide with bug 677797 (mandatory ASLR)? Does it happen only with certain versions/SP levels of Windows?
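
The suspected failure path can be sketched as a small portable model. This is not the CRT source: `LazyRandom`, `RandResult`, and the callback types are hypothetical names, and the Windows loader calls (`LoadLibrary`, `GetProcAddress`) are replaced by injected callbacks so the control flow can be exercised on any platform. One detail that does come from the Windows headers: `RtlGenRandom` is a macro for `SystemFunction036`, which is why the CRT stringifies it with `_TO_STR` before calling `GetProcAddress`.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins for LoadLibrary / GetProcAddress / RtlGenRandom.
using RandomFn = std::function<bool(std::uint32_t*)>;
using LoadLibraryFn = std::function<bool(const std::string&)>;
using ResolveFn = std::function<RandomFn(const std::string&)>;

enum class RandResult { Ok, InvalidParameter, GeneratorFailed };

// Models the lazy first-call path: load the DLL, resolve the export, and
// invoke the invalid-parameter handler if either step fails.
class LazyRandom {
 public:
  LazyRandom(LoadLibraryFn load, ResolveFn resolve,
             std::function<void()> on_invalid)
      : load_(std::move(load)),
        resolve_(std::move(resolve)),
        on_invalid_(std::move(on_invalid)) {}

  RandResult Next(std::uint32_t* out) {
    if (out == nullptr) return Invalid();
    if (!generator_) {
      // Only the first call pays for the lookup; later calls reuse it.
      if (!load_("advapi32.dll")) return Invalid();
      generator_ = resolve_("SystemFunction036");
      if (!generator_) return Invalid();  // the case suspected in this bug
    }
    return generator_(out) ? RandResult::Ok : RandResult::GeneratorFailed;
  }

 private:
  RandResult Invalid() {
    if (on_invalid_) on_invalid_();  // breakpad hooks this to write a minidump
    return RandResult::InvalidParameter;
  }

  LoadLibraryFn load_;
  ResolveFn resolve_;
  std::function<void()> on_invalid_;
  RandomFn generator_;
};
```

In this model a caller that passes a valid pointer can still end up in the invalid-parameter handler, which matches the observation that the call site itself cannot be passing NULL.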
Comment 4 Marcia Knous [:marcia - use ni] 2011-10-14 09:30:38 PDT
Looking at crash stats, it happens on XP across different service packs (SP2 and SP3). The same holds for Windows 7: some reports have SP1 and some do not.
Comment 5 :Ehsan Akhgari 2011-10-16 12:58:38 PDT
This has definitely started happening more frequently since bug 677797 landed.

advapi32.dll should already be loaded when this code is run, so this should just be a failure in GetProcAddress, which _should_ be unaffected by the mandatory ASLR patch...
Comment 6 Benjamin Smedberg [:bsmedberg] 2011-10-19 12:04:52 PDT
I'm going to make this bug specific to rand_s and give it to Ehsan as the potential regressor.
Comment 7 Ted Mielczarek [:ted.mielczarek] 2011-10-19 12:06:05 PDT
bug 695791 covers fixing the skiplist to get useful signatures out of these.
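
To illustrate what a skiplist change does to a signature, here is a simplified model. It is not Socorro's actual implementation: `BuildSignature` and the contents of the `skip` and `prefix` sets are assumptions for the example, and this bug does not state which frames bug 695791 actually added. Frames in `skip` are dropped, frames in `prefix` are kept and scanning continues, and the first frame in neither set ends the signature.

```cpp
#include <set>
#include <string>
#include <vector>

// Builds a crash signature from a stack, innermost frame first.
std::string BuildSignature(const std::vector<std::string>& frames,
                           const std::set<std::string>& skip,
                           const std::set<std::string>& prefix) {
  std::string signature;
  for (const std::string& frame : frames) {
    if (skip.count(frame)) continue;          // ignorable frame, drop it
    if (!signature.empty()) signature += " | ";
    signature += frame;
    if (!prefix.count(frame)) break;          // first meaningful frame ends it
  }
  return signature;
}
```

With only the two ntdll frames skipped and the `WaitForSingleObject*` frames treated as prefixes, the stack in this bug yields the current signature. Adding the wait and breakpad handler frames to the skip set makes the same stack produce `rand_s`.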
Comment 8 Benjamin Smedberg [:bsmedberg] 2011-10-19 12:10:52 PDT
I think this should track/block Firefox 10 and bug 677797 should be backed out if we don't understand the issue.
Comment 9 Jim Mathies [:jimm] 2011-10-19 14:18:48 PDT
It looks like this happens on some systems for the first call to rand_s (uptimes are low and we are creating a channel). advapi32.dll has the ASLR bit enabled though, so it doesn't look like the code in bug 677797 would directly play a role in its loading.

Certainly seems to be related to something that landed on the 11th though. Maybe the best next step would be to back out or disable bug 677797 to confirm it was the cause.
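
For reference, the "ASLR bit" is the `IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE` flag (0x0040) in the `DllCharacteristics` field of the PE optional header. The following is a minimal sketch of checking it from the raw bytes of a DLL file. `CheckDynamicBase`, `AslrFlag`, and `ReadLE` are hypothetical names, and the parser validates only the `MZ` and `PE` signatures, so treat it as an illustration rather than a robust PE reader.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class AslrFlag { NotPE, Disabled, Enabled };

constexpr std::uint32_t kDynamicBase = 0x0040;  // IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE

// Reads `count` (<= 4) little-endian bytes at `offset`; false if out of range.
bool ReadLE(const std::vector<std::uint8_t>& bytes, std::size_t offset,
            std::size_t count, std::uint32_t* value) {
  if (offset > bytes.size() || count > bytes.size() - offset) return false;
  *value = 0;
  for (std::size_t i = 0; i < count; ++i) {
    *value |= static_cast<std::uint32_t>(bytes[offset + i]) << (8 * i);
  }
  return true;
}

AslrFlag CheckDynamicBase(const std::vector<std::uint8_t>& image) {
  std::uint32_t mz = 0, pe_offset = 0, signature = 0, flags = 0;
  if (!ReadLE(image, 0, 2, &mz) || mz != 0x5A4D) return AslrFlag::NotPE;  // "MZ"
  if (!ReadLE(image, 0x3C, 4, &pe_offset)) return AslrFlag::NotPE;        // e_lfanew
  if (!ReadLE(image, pe_offset, 4, &signature) || signature != 0x00004550) {
    return AslrFlag::NotPE;                                               // "PE\0\0"
  }
  // Skip the 4-byte signature and 20-byte COFF header; DllCharacteristics
  // sits at offset 70 of the optional header for both PE32 and PE32+.
  const std::size_t flags_offset = static_cast<std::size_t>(pe_offset) + 4 + 20 + 70;
  if (!ReadLE(image, flags_offset, 2, &flags)) return AslrFlag::NotPE;
  return (flags & kDynamicBase) ? AslrFlag::Enabled : AslrFlag::Disabled;
}
```

A module that reports `Enabled` opts in to relocation on its own, so forcing ASLR on it should not change how it loads.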
Comment 10 JK 2011-10-21 13:18:57 PDT
Is this also related?

875ecc34-c978-4208-96bc-1ccdf2111015

[@ WaitForMultipleObjectsEx | WaitForMultipleObjects | google_breakpad::CrashGenerationClient::SignalCrashEventAndWait() ]
Comment 11 Jim Mathies [:jimm] 2011-10-21 13:59:31 PDT
(In reply to JK from comment #10)
> Is this also related?
> 
> 875ecc34-c978-4208-96bc-1ccdf2111015
> 
> [@ WaitForMultipleObjectsEx | WaitForMultipleObjects |
> google_breakpad::CrashGenerationClient::SignalCrashEventAndWait() ]

Doesn't look like it. It looks like it may have been caused by a third-party DLL, znsprnui.dll. According to the internet:

znsprnui.dll is a ZNSPRNUI.DLL belonging to Zeon (Beijing) Corp. PDF Driver from Zeon Corp. Non-system processes like znsprnui.dll originate from software you installed on your system.
Comment 12 :Ehsan Akhgari 2011-10-24 17:07:45 PDT
(In reply to Jim Mathies [:jimm] from comment #9)
> It looks like this happens on some systems for the first call to rand_s
> (uptimes are low and we are creating a channel). advapi32.dll has the ASLR
> bit enabled though so it doesn't look like the code in bug 677797 would
> directly play a role in its loading.
> 
> Certainly seems to be related to something that landed on the 11th though.
> Maybe the best next step would be to backout or disable bug 677797 to
> confirm it was the cause.

I can do that if you want me to.
Comment 13 Jim Mathies [:jimm] 2011-10-25 07:36:30 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #12)
> I can do that if you want me to.

I won't be able to look into this further until later in the week, so if we want to run this as an experiment in one nightly we might as well do it. Maybe we get lucky and find out it's not the cause.
Comment 14 :Ehsan Akhgari 2011-10-25 08:37:27 PDT
(In reply to Jim Mathies [:jimm] from comment #13)
> I won't be able to look into this further until later in the week, so if we
> want to run this as an experiment in one nightly we might as well do it.
> Maybe we get lucky and find out it's not the cause.

Backed out.  Tomorrow's nightly should not have mandatory ASLR any more.
Comment 15 Notlost 2011-11-15 09:03:13 PST
Did this fix the issue?
Comment 16 Benjamin Smedberg [:bsmedberg] 2011-11-15 09:53:53 PST
Marcia, can you verify that this spike went away? There may be other WaitForSingleObjectEx crashes, but this bug was specifically about the spike from the ASLR patch.
Comment 17 Marcia Knous [:marcia - use ni] 2011-11-15 10:13:33 PST
Things don't seem to be quite as explosive as they were in October. Here are some crash counts from recent build IDs:

2011111000 	1  (Trunk)
2011110900 	38 (Firefox Beta)
2011110800 	6
2011110700 	2
2011110400 	15 (Firefox 8)
2011110300 	22
2011110200 	17
Comment 18 Alex Keybl [:akeybl] 2011-11-28 13:56:01 PST
[Triage Comment]
Given that this is no longer explosive, and this should have made the Aurora cutover, minusing tracking-firefox10.
Comment 19 Benjamin Smedberg [:bsmedberg] 2011-11-29 05:20:23 PST
Marcia, I'm more asking whether this particular version of the crash (from rand_s) is completely gone, in which case this bug can be marked FIXED.
Comment 20 :Ehsan Akhgari 2012-01-10 20:47:37 PST
Marcia: ping?
