Bug 694344
Opened 13 years ago
Closed 12 years ago
crash WaitForSingleObjectEx with invalid parameter handler called from rand_s
Categories
(Firefox :: General, defect)
RESOLVED
WORKSFORME
Version | Tracking | Status
---|---|---
firefox10 | - | ---
People
(Reporter: marcia, Assigned: marcia)
References
Details
(Keywords: crash)
Crash Data
This bug was filed from the Socorro interface and is report bp-af2b41cf-b71c-4799-9977-980e52111013.

This showed up in the explosive report - there have been a few spikes recently.

https://crash-stats.mozilla.com/report/list?signature=WaitForSingleObjectEx%20|%20WaitForSingleObject%20|%20google_breakpad%3A%3AExceptionHandler%3A%3AWriteMinidumpOnHandlerThread%28_EXCEPTION_POINTERS*%2C%20MDRawAssertionInfo*%29

220 crashes using the 2011101200 build, and there have been spikes over 100 crashes using 2011092800 and 2011092900.

Frame | Module | Signature | Source
---|---|---|---
0 | ntdll.dll | KiFastSystemCallRet |
1 | ntdll.dll | ZwWaitForSingleObject |
2 | kernel32.dll | WaitForSingleObjectEx |
3 | kernel32.dll | WaitForSingleObject |
4 | xul.dll | google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread | toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc:764
5 | xul.dll | google_breakpad::ExceptionHandler::HandleInvalidParameter | toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc:619
6 | msvcr80.dll | rand_s | f:\dd\vctools\crt_bld\self_x86\crt\src\rand_s.c:86
7 | xul.dll | `anonymous namespace'::RandUint32 | ipc/chromium/src/base/rand_util_win.cc:16
8 | xul.dll | base::RandUint64 | ipc/chromium/src/base/rand_util_win.cc:25
9 | xul.dll | base::RandInt | ipc/chromium/src/base/rand_util.cc:20
10 | xul.dll | ChildProcessInfo::GenerateRandomChannelID | ipc/chromium/src/chrome/common/child_process_info.cc:58
11 | xul.dll | ChildProcessHost::CreateChannel | ipc/chromium/src/chrome/common/child_process_host.cc:78
12 | xul.dll | mozilla::ipc::GeckoChildProcessHost::InitializeChannel | ipc/glue/GeckoChildProcessHost.cpp:350
13 | xul.dll | MessageLoop::RunTask | ipc/chromium/src/base/message_loop.cc:318
14 | xul.dll | MessageLoop::DeferOrRunPendingTask | ipc/chromium/src/base/message_loop.cc:326
15 | xul.dll | MessageLoop::DoWork | ipc/chromium/src/base/message_loop.cc:426
16 | xul.dll | base::MessagePumpForIO::DoRunLoop | ipc/chromium/src/base/message_pump_win.cc:462
17 | xul.dll | base::MessagePumpWin::RunWithDispatcher | ipc/chromium/src/base/message_pump_win.cc:53
18 | xul.dll | base::MessagePumpWin::Run | ipc/chromium/src/base/message_pump_win.h:78
19 | xul.dll | MessageLoop::RunHandler | ipc/chromium/src/base/message_loop.cc:201
20 | xul.dll | MessageLoop::Run | ipc/chromium/src/base/message_loop.cc:175
21 | xul.dll | base::Thread::ThreadMain | ipc/chromium/src/base/thread.cc:156
22 | xul.dll | `anonymous namespace'::ThreadFunc | ipc/chromium/src/base/platform_thread_win.cc:26
23 | kernel32.dll | BaseThreadStart |
Comment 1•13 years ago
This has been by far the #1 crash signature on trunk over the last few days. Ted, I see breakpad in there, which makes me wonder whether another failure to process the stack correctly is involved here?
Comment 2•13 years ago
So the actual error site is rand_s, which should be the signature starting point: we get a callback from the CRT for invalid parameters which is the top of this stack and is ignorable. Can we get data on whether the entire spike is the same stack? I must admit I don't see how we can possibly be *causing* this invalid parameter error: the callsite in question is http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/rand_util_win.cc#15 and we can't be passing NULL or anything like that.
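For illustration, the call chain in frames 6 to 9 of the stack can be sketched roughly as follows. This is a simplified stand-in, not the actual contents of rand_util_win.cc: the function names mirror the stack frames, but the bodies are assumptions, and on non-Windows platforms std::random_device is swapped in for rand_s so the sketch builds anywhere. It shows why a NULL argument is implausible: the pointer handed to rand_s is the address of a stack local.

```cpp
#ifdef _WIN32
#define _CRT_RAND_S  // must be defined before <stdlib.h> so rand_s is declared
#include <stdlib.h>
#else
#include <random>
#endif
#include <cassert>
#include <cstdint>

// Sketch of frame 7: fetch 32 random bits.
uint32_t RandUint32() {
  unsigned int number = 0;
#ifdef _WIN32
  // The argument is the address of a local, so it can never be NULL.
  if (rand_s(&number) != 0) abort();
#else
  // Assumption: stand-in generator for platforms without rand_s.
  static std::random_device device;
  number = device();
#endif
  return number;
}

// Sketch of frame 8: combine two 32-bit draws into 64 bits.
uint64_t RandUint64() {
  uint64_t high = RandUint32();
  uint64_t low = RandUint32();
  return (high << 32) | low;
}

// Sketch of frame 9: a value in the closed interval [min, max].
int RandInt(int min, int max) {
  assert(min <= max);
  // Widen before subtracting so the range computation cannot overflow.
  uint64_t range = static_cast<uint64_t>(static_cast<int64_t>(max) - min) + 1;
  return static_cast<int>(min + static_cast<int64_t>(RandUint64() % range));
}
```

A caller such as the channel-ID generator in frame 10 would only ever see RandInt; nothing in this chain takes a pointer from outside.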
Comment 3•13 years ago
Reading the VC8 source code for rand_s, I'm pretty sure we're hitting an invalid-parameter error where we can load advapi32.dll (it is loaded dynamically) but the following line fails:

    pfnRtlGenRandom = ( PGENRANDOM ) GetProcAddress( hAdvApi32, _TO_STR( RtlGenRandom ) );

If this is the case, we're either hitting an odd Windows configuration or we might have problems with library loading. Does this signature perhaps coincide with bug 677797 (mandatory ASLR)? Does it happen only with certain versions/SP levels of Windows?
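The suspected control flow can be modeled with a small portable sketch. Everything here is hypothetical (FakeLoader, ModelRandS, RandResult and the callback names are invented for illustration); it is not the CRT source and makes no Windows calls. It only mirrors the sequence described in this comment: load the library, resolve the generator, cache the pointer, and fail if the lookup returns nothing.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Outcome of one call to the modeled routine.
enum class RandResult {
  kOk,
  kLoadLibraryFailed,
  kGetProcAddressFailed,
  kGeneratorFailed
};

// Shape of the resolved generator: fill `size` bytes at `buffer`.
using GenRandomFn = std::function<bool(void* buffer, unsigned long size)>;

// Hypothetical stand-ins for the library load and the symbol lookup, so the
// control flow can be exercised on any platform.
struct FakeLoader {
  std::function<bool()> load_advapi32;
  std::function<GenRandomFn()> resolve_gen_random;
};

// Model of lazy resolution: `cache` plays the role of the static function
// pointer that is filled in on the first call.
RandResult ModelRandS(const FakeLoader& loader, GenRandomFn* cache,
                      uint32_t* out) {
  assert(cache != nullptr && out != nullptr);
  if (!*cache) {
    if (!loader.load_advapi32()) return RandResult::kLoadLibraryFailed;
    GenRandomFn fn = loader.resolve_gen_random();
    // The path suspected in this bug: the library loads, the lookup fails.
    if (!fn) return RandResult::kGetProcAddressFailed;
    *cache = fn;
  }
  return (*cache)(out, sizeof(*out)) ? RandResult::kOk
                                     : RandResult::kGeneratorFailed;
}
```

In this model a failed lookup is just a return value. In the crash reports, the failure is instead routed to the invalid-parameter handler, which is breakpad's HandleInvalidParameter in frame 5.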
Assignee
Comment 4•13 years ago
Looking at crash stats, it seems it happens on XP across different service packs (SP2 and SP3). The same thing happens for Windows 7 - there are some that have SP1 and some that do not.
Comment 5•13 years ago
This has definitely started happening more frequently since bug 677797 landed. advapi32.dll should already be loaded when this code is run, so this should just be a failure in GetProcAddress, which _should_ be unaffected by the mandatory ASLR patch...
Blocks: 677797
Comment 6•13 years ago
I'm going to make this bug specific to rand_s and give it to Ehsan as the potential regressor.
Assignee: nobody → ehsan
Summary: crash WaitForSingleObjectEx → crash WaitForSingleObjectEx with invalid parameter handler called from rand_s
Comment 7•13 years ago
Bug 695791 covers fixing the skiplist to get useful signatures out of these.
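As a rough illustration of what a skiplist does to a signature, here is a minimal sketch. It is not Socorro's implementation, and the set of skipped frames is an assumption made for the example (which frames bug 695791 actually added is not stated here). SignatureStart and ExampleSignature are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Returns the first frame that is not in `skip`, or an empty string if every
// frame is skipped.
std::string SignatureStart(const std::vector<std::string>& frames,
                           const std::unordered_set<std::string>& skip) {
  for (const std::string& frame : frames) {
    if (skip.count(frame) == 0) return frame;
  }
  return std::string();
}

// Frames 0 to 7 of the report in this bug, with the wait and handler frames
// treated as skippable (an assumption for this example).
std::string ExampleSignature() {
  const std::vector<std::string> frames = {
      "KiFastSystemCallRet",
      "ZwWaitForSingleObject",
      "WaitForSingleObjectEx",
      "WaitForSingleObject",
      "google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread",
      "google_breakpad::ExceptionHandler::HandleInvalidParameter",
      "rand_s",
      "`anonymous namespace'::RandUint32"};
  const std::unordered_set<std::string> skip = {
      "KiFastSystemCallRet",
      "ZwWaitForSingleObject",
      "WaitForSingleObjectEx",
      "WaitForSingleObject",
      "google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread",
      "google_breakpad::ExceptionHandler::HandleInvalidParameter"};
  return SignatureStart(frames, skip);
}
```

With those six frames skipped, the signature would start at rand_s, the starting point suggested in comment 2.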
Comment 8•13 years ago
I think this should track/block Firefox 10 and bug 677797 should be backed out if we don't understand the issue.
tracking-firefox10: --- → ?
Comment 9•13 years ago
It looks like this happens on some systems for the first call to rand_s (uptimes are low and we are creating a channel). advapi32.dll has the ASLR bit enabled, though, so it doesn't look like the code in bug 677797 would directly play a role in its loading.

It certainly seems to be related to something that landed on the 11th, though. Maybe the best next step would be to back out or disable bug 677797 to confirm it was the cause.
Comment 10•13 years ago
Is this also related? 875ecc34-c978-4208-96bc-1ccdf2111015 [@ WaitForMultipleObjectsEx | WaitForMultipleObjects | google_breakpad::CrashGenerationClient::SignalCrashEventAndWait() ]
Comment 11•13 years ago
(In reply to JK from comment #10)
> Is this also related?
>
> 875ecc34-c978-4208-96bc-1ccdf2111015
>
> [@ WaitForMultipleObjectsEx | WaitForMultipleObjects |
> google_breakpad::CrashGenerationClient::SignalCrashEventAndWait() ]

Doesn't look like it. It looks like it may have been caused by a third-party DLL, znsprnui.dll. According to the internet: "znsprnui.dll is a ZNSPRNUI.DLL belonging to Zeon (Beijing) Corp. PDF Driver from Zeon Corp. Non-system processes like znsprnui.dll originate from software you installed on your system."
Comment 12•13 years ago
(In reply to Jim Mathies [:jimm] from comment #9)
> It looks like this happens on some systems for the first call to rand_s
> (uptimes are low and we are creating a channel). advapi32.dll has the ASLR
> bit enabled though so it doesn't look like the code in bug 677797 would
> directly play a role in its loading.
>
> Certainly seems to be related to something that landed on the 11th though.
> Maybe the best next step would be to backout or disable bug 677797 to
> confirm it was the cause.

I can do that if you want me to.
Comment 13•13 years ago
(In reply to Ehsan Akhgari [:ehsan] from comment #12)
> (In reply to Jim Mathies [:jimm] from comment #9)
> > It looks like this happens on some systems for the first call to rand_s
> > (uptimes are low and we are creating a channel). advapi32.dll has the ASLR
> > bit enabled though so it doesn't look like the code in bug 677797 would
> > directly play a role in its loading.
> >
> > Certainly seems to be related to something that landed on the 11th though.
> > Maybe the best next step would be to backout or disable bug 677797 to
> > confirm it was the cause.
>
> I can do that if you want me to.

I won't be able to look into this further until later in the week, so if we want to run this as an experiment in one nightly we might as well do it. Maybe we get lucky and find out it's not the cause.
Comment 14•13 years ago
(In reply to Jim Mathies [:jimm] from comment #13)
> (In reply to Ehsan Akhgari [:ehsan] from comment #12)
> > (In reply to Jim Mathies [:jimm] from comment #9)
> > > It looks like this happens on some systems for the first call to rand_s
> > > (uptimes are low and we are creating a channel). advapi32.dll has the ASLR
> > > bit enabled though so it doesn't look like the code in bug 677797 would
> > > directly play a role in its loading.
> > >
> > > Certainly seems to be related to something that landed on the 11th though.
> > > Maybe the best next step would be to backout or disable bug 677797 to
> > > confirm it was the cause.
> >
> > I can do that if you want me to.
>
> I won't be able to look into this further until later in the week, so if we
> want to run this as an experiment in one nightly we might as well do it.
> Maybe we get lucky and find out it's not the cause.

Backed out. Tomorrow's nightly should not have mandatory ASLR any more.
Comment 15•13 years ago
Did this fix the issue?
Comment 16•13 years ago
Marcia, can you verify that this spike went away? There may be other WaitForSingleObjectEx crashes, but this bug was specifically about the spike from the ASLR patch.
Assignee: ehsan → mozillamarcia.knous
Assignee
Comment 17•13 years ago
Things don't seem to be quite as explosive as they were in October. Here are some crash counts from recent build IDs:

2011111000 1 (Trunk)
2011110900 38 (Firefox Beta)
2011110800 6
2011110700 2
2011110400 15 (Firefox 8)
2011110300 22
2011110200 17
Comment 18•13 years ago
[Triage Comment] Given that this is no longer explosive, and this should have made the Aurora cutover, minusing tracking-firefox10.
Comment 19•13 years ago
Marcia, I'm more asking whether this particular version of the crash (from rand_s) is completely gone, in which case this bug can be marked FIXED.
Comment 20•12 years ago
Marcia: ping?
Blocks: 728429
No longer blocks: 728429
Assignee
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
Resolution: FIXED → WORKSFORME