Closed Bug 694344 Opened 14 years ago Closed 13 years ago

crash WaitForSingleObjectEx with invalid parameter handler called from rand_s

Categories

(Firefox :: General, defect)

Platform: x86, Windows 7
Type: defect
Priority: Not set
Severity: critical

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox10 - ---

People

(Reporter: marcia, Assigned: marcia)

Details

(Keywords: crash)

Crash Data

This bug was filed from the Socorro interface and is report bp-af2b41cf-b71c-4799-9977-980e52111013.
=============================================================
This showed up in the explosive report; there have been a few spikes recently.

https://crash-stats.mozilla.com/report/list?signature=WaitForSingleObjectEx%20|%20WaitForSingleObject%20|%20google_breakpad%3A%3AExceptionHandler%3A%3AWriteMinidumpOnHandlerThread%28_EXCEPTION_POINTERS*%2C%20MDRawAssertionInfo*%29

220 crashes using the 2011101200 build, and there have been spikes of over 100 crashes using the 2011092800 and 2011092900 builds.

    Frame  Module        Signature (Source)
    0      ntdll.dll     KiFastSystemCallRet
    1      ntdll.dll     ZwWaitForSingleObject
    2      kernel32.dll  WaitForSingleObjectEx
    3      kernel32.dll  WaitForSingleObject
    4      xul.dll       google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread  (toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc:764)
    5      xul.dll       google_breakpad::ExceptionHandler::HandleInvalidParameter  (toolkit/crashreporter/google-breakpad/src/client/windows/handler/exception_handler.cc:619)
    6      msvcr80.dll   rand_s  (f:\dd\vctools\crt_bld\self_x86\crt\src\rand_s.c:86)
    7      xul.dll       `anonymous namespace'::RandUint32  (ipc/chromium/src/base/rand_util_win.cc:16)
    8      xul.dll       base::RandUint64  (ipc/chromium/src/base/rand_util_win.cc:25)
    9      xul.dll       base::RandInt  (ipc/chromium/src/base/rand_util.cc:20)
    10     xul.dll       ChildProcessInfo::GenerateRandomChannelID  (ipc/chromium/src/chrome/common/child_process_info.cc:58)
    11     xul.dll       ChildProcessHost::CreateChannel  (ipc/chromium/src/chrome/common/child_process_host.cc:78)
    12     xul.dll       mozilla::ipc::GeckoChildProcessHost::InitializeChannel  (ipc/glue/GeckoChildProcessHost.cpp:350)
    13     xul.dll       MessageLoop::RunTask  (ipc/chromium/src/base/message_loop.cc:318)
    14     xul.dll       MessageLoop::DeferOrRunPendingTask  (ipc/chromium/src/base/message_loop.cc:326)
    15     xul.dll       MessageLoop::DoWork  (ipc/chromium/src/base/message_loop.cc:426)
    16     xul.dll       base::MessagePumpForIO::DoRunLoop  (ipc/chromium/src/base/message_pump_win.cc:462)
    17     xul.dll       base::MessagePumpWin::RunWithDispatcher  (ipc/chromium/src/base/message_pump_win.cc:53)
    18     xul.dll       base::MessagePumpWin::Run  (ipc/chromium/src/base/message_pump_win.h:78)
    19     xul.dll       MessageLoop::RunHandler  (ipc/chromium/src/base/message_loop.cc:201)
    20     xul.dll       MessageLoop::Run  (ipc/chromium/src/base/message_loop.cc:175)
    21     xul.dll       base::Thread::ThreadMain  (ipc/chromium/src/base/thread.cc:156)
    22     xul.dll       `anonymous namespace'::ThreadFunc  (ipc/chromium/src/base/platform_thread_win.cc:26)
    23     kernel32.dll  BaseThreadStart
This is by far the #1 crash signature on trunk over the last few days. Ted, I see Breakpad in there, which makes me wonder whether another failure to process the stack correctly is involved here.
So the actual error site is rand_s, which should be the starting point of the signature: the CRT calls our invalid-parameter handler, and the frames above rand_s come from that callback and can be ignored. Can we get data on whether the entire spike shows the same stack? I must admit I don't see how we can possibly be *causing* this invalid-parameter error: the callsite in question is http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/rand_util_win.cc#15 and we can't be passing NULL or anything like that.
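For readers following frames 7 to 9 of the stack: the rand_util_win.cc callsite sits at the bottom of a small chain in which base::RandInt draws from base::RandUint64, which calls RandUint32 twice, and only RandUint32 touches rand_s. The C++ below is a minimal, portable sketch of that chain, not the Chromium source. rand_s is MSVC-only, so it is replaced by a hypothetical injectable `RandSource`, and the fatal check on failure is modeled as an exception so the failure path can be observed. It illustrates the point above: the only pointer ever handed to rand_s is the address of a local variable, so a NULL argument cannot be the trigger.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <functional>
#include <random>
#include <stdexcept>

// Hypothetical stand-in for MSVC's rand_s(): returns 0 and writes a value on
// success, or a nonzero errno-style code on failure.
using RandSource = std::function<int(unsigned int*)>;

// Portable default source for this sketch only; Windows builds call rand_s.
inline int PortableRandS(unsigned int* out) {
  if (out == nullptr) return EINVAL;  // mimic the NULL-argument check
  static std::random_device device;
  *out = device();
  return 0;
}

// Shape of RandUint32: always passes the address of a local, never NULL.
// A failure is treated as fatal; modeled here as an exception.
inline std::uint32_t RandUint32(const RandSource& source) {
  unsigned int number = 0;
  if (source(&number) != 0) throw std::runtime_error("rand_s failed");
  return static_cast<std::uint32_t>(number);
}

// Shape of base::RandUint64: two 32-bit draws glued together.
inline std::uint64_t RandUint64(const RandSource& source) {
  std::uint64_t first_half = RandUint32(source);
  std::uint64_t second_half = RandUint32(source);
  return (first_half << 32) | second_half;
}

// Shape of base::RandInt: maps a 64-bit draw into [min, max] with a modulo.
inline int RandInt(int min, int max, const RandSource& source) {
  assert(min <= max);
  std::uint64_t range =
      static_cast<std::uint64_t>(static_cast<std::int64_t>(max) - min) + 1;
  return static_cast<int>(
      min + static_cast<std::int64_t>(RandUint64(source) % range));
}
```

With a source that always yields 7, RandUint64 returns (7 << 32) | 7 and RandInt(1, 6, source) maps that to 6. The modulo reduction is slightly biased whenever the range does not divide 2^64.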
Reading the VC8 source code for rand_s, I'm pretty sure we're hitting an invalid-parameter error where advapi32.dll loads successfully (it is loaded dynamically) but the following line fails:

    pfnRtlGenRandom = ( PGENRANDOM ) GetProcAddress( hAdvApi32, _TO_STR( RtlGenRandom ) );

If this is the case, we're either hitting an odd Windows configuration or we might have problems with library loading. Does this signature perhaps coincide with bug 677797 (mandatory ASLR)? Does it happen only with certain versions/SP levels of Windows?
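The suspected failure mode can be modeled as follows. This is a hypothetical C++ sketch of lazy function resolution, not the VC8 CRT source: LoadLibrary, GetProcAddress, and the CRT's invalid-parameter handler are replaced by injectable callbacks (`LoaderModel`, `RandSModel`, and their members are made up for illustration), and the EINVAL/ENOMEM return codes are assumptions. The string "SystemFunction036" is the name under which advapi32.dll exports RtlGenRandom. Because the resolved pointer is cached after the first success, a lookup failure can only surface on an early call, which is where a short-uptime crash would show up.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <string>
#include <utility>

// Function-pointer type for an RtlGenRandom-style export.
using GenRandomFn = bool (*)(void* buffer, unsigned long length);

// Hypothetical injectable replacements for the Windows/CRT pieces.
struct LoaderModel {
  std::function<bool(const std::string&)> load_library;     // ~ LoadLibrary
  std::function<GenRandomFn(const std::string&)> get_proc;  // ~ GetProcAddress
  std::function<void()> invalid_parameter_handler;          // ~ CRT handler
};

// Models rand_s(): resolve the generator lazily, then call it.
class RandSModel {
 public:
  explicit RandSModel(LoaderModel loader) : loader_(std::move(loader)) {}

  int operator()(unsigned int* value) {
    if (value == nullptr) return Fail();
    *value = 0;
    if (gen_random_ == nullptr) {
      // Only reached until one lookup succeeds; afterwards the pointer is cached.
      if (!loader_.load_library("ADVAPI32.DLL")) return Fail();
      // advapi32.dll exports RtlGenRandom under the name SystemFunction036.
      gen_random_ = loader_.get_proc("SystemFunction036");
      if (gen_random_ == nullptr) return Fail();  // the case suspected here
    }
    // ENOMEM for a failed generator call is an assumption of this sketch.
    return gen_random_(value, sizeof *value) ? 0 : ENOMEM;
  }

 private:
  // In Firefox the installed handler is Breakpad's HandleInvalidParameter,
  // which writes a minidump; here it is just a callback.
  int Fail() {
    loader_.invalid_parameter_handler();
    return EINVAL;  // assumed error code for this sketch
  }

  LoaderModel loader_;
  GenRandomFn gen_random_ = nullptr;
};
```

A loader whose lookup always returns nullptr makes every call invoke the handler, while a working loader is consulted exactly once no matter how many values are drawn.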
Looking at crash stats, it seems to happen on XP across different service packs (SP2 and SP3). The same is true for Windows 7: some of the crashing systems have SP1 and some do not.
This has definitely started happening more frequently since bug 677797 has landed. advapi32.dll should already be loaded when this code is run, so this should just be a failure in GetProcAddress, which _should_ be unaffected by the mandatory ASLR patch...
Blocks: 677797
I'm going to make this bug specific to rand_s and give it to Ehsan as the potential regressor.
Assignee: nobody → ehsan
Summary: crash WaitForSingleObjectEx → crash WaitForSingleObjectEx with invalid parameter handler called from rand_s
Bug 695791 covers fixing the skiplist so that these crashes get useful signatures.
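As background for the skiplist fix: roughly, crash-stats builds a signature by walking the stack from the top, dropping frames on an "irrelevant" list and continuing past frames on a "prefix" list. The C++ below is a simplified, hypothetical model of that idea, not Socorro's implementation (`GenerateSignature` and the regexes are illustrative). With only the two ntdll frames skipped, it reproduces the signature from the report (minus argument lists); once the wait and Breakpad handler frames are also treated as irrelevant, the signature starts at rand_s.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Simplified, hypothetical model of skiplist-based signature generation.
// Walk frames from the top: skip frames matching `irrelevant`, append the
// rest, and stop after the first appended frame that does not match `prefix`.
std::string GenerateSignature(const std::vector<std::string>& frames,
                              const std::regex& irrelevant,
                              const std::regex& prefix) {
  std::string signature;
  for (const std::string& frame : frames) {
    if (std::regex_match(frame, irrelevant)) continue;  // dropped entirely
    if (!signature.empty()) signature += " | ";
    signature += frame;
    if (!std::regex_match(frame, prefix)) break;  // first "real" frame ends it
  }
  return signature;
}
```

The design point is that the handler frames (WaitForSingleObject*, google_breakpad::*) describe how the crash was reported, not where it happened, so skipping them lets the signature begin at the actual error site.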
I think this should track/block Firefox 10 and bug 677797 should be backed out if we don't understand the issue.
It looks like this happens on some systems on the first call to rand_s (uptimes are low and we are creating a channel). advapi32.dll has the ASLR bit enabled, though, so it doesn't look like the code in bug 677797 would directly play a role in its loading. It certainly seems to be related to something that landed on the 11th, though. Maybe the best next step would be to back out or disable bug 677797 to confirm it was the cause.
Is this also related? 875ecc34-c978-4208-96bc-1ccdf2111015 [@ WaitForMultipleObjectsEx | WaitForMultipleObjects | google_breakpad::CrashGenerationClient::SignalCrashEventAndWait() ]
(In reply to JK from comment #10)
> Is this also related?
>
> 875ecc34-c978-4208-96bc-1ccdf2111015
>
> [@ WaitForMultipleObjectsEx | WaitForMultipleObjects | google_breakpad::CrashGenerationClient::SignalCrashEventAndWait() ]

Doesn't look like it. It may have been caused by a third-party DLL, znsprnui.dll, which according to the internet belongs to a PDF driver from Zeon (Beijing) Corp. Non-system modules like znsprnui.dll come from software the user installed.
(In reply to Jim Mathies [:jimm] from comment #9)
> It looks like this happens on some systems for the first call to rand_s
> (uptimes are low and we are creating a channel). advapi32.dll has the ASLR
> bit enabled though so it doesn't look like the code in bug 677797 would
> directly play a role in its loading.
>
> Certainly seems to be related to something that landed on the 11th though.
> Maybe the best next step would be to backout or disable bug 677797 to
> confirm it was the cause.

I can do that if you want me to.
(In reply to Ehsan Akhgari [:ehsan] from comment #12)
> I can do that if you want me to.

I won't be able to look into this further until later in the week, so if we want to run this as an experiment in one nightly, we might as well do it. Maybe we get lucky and find out it's not the cause.
(In reply to Jim Mathies [:jimm] from comment #13)
> I won't be able to look into this further until later in the week, so if we
> want to run this as an experiment in one nightly we might as well do it.
> Maybe we get lucky and find out it's not the cause.

Backed out. Tomorrow's nightly should no longer have mandatory ASLR.
Did this fix the issue?
Marcia, can you verify that this spike went away? There may be other WaitForSingleObjectEx crashes, but this bug was specifically about the spike from the ASLR patch.
Assignee: ehsan → mozillamarcia.knous
Things don't seem to be quite as explosive as they were in October. Here are some numbers from recent build IDs:

    2011111000    1  (Trunk)
    2011110900   38  (Firefox Beta)
    2011110800    6
    2011110700    2
    2011110400   15  (Firefox 8)
    2011110300   22
    2011110200   17
[Triage Comment] Given that this is no longer explosive, and this should have made the Aurora cutover, minusing tracking-firefox10.
Marcia, I'm more asking whether this particular version of the crash (from rand_s) is completely gone, in which case this bug can be marked FIXED.
Marcia: ping?
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Resolution: FIXED → WORKSFORME
See Also: → 951827
See Also: → 1167248