Closed Bug 1167248 Opened 9 years ago Closed 9 years ago

startup crash in _invalid_parameter_noinfo coming from js::random_initState

Categories

(Core :: JavaScript Engine, defect)

x86
Windows NT
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla44
Tracking Status
firefox38.0.5 - wontfix
firefox39 + wontfix
firefox40 + fixed
firefox41 + fixed
firefox42 --- wontfix
firefox43 --- wontfix
firefox44 --- fixed
firefox45 --- fixed
firefox46 --- fixed
b2g-v2.5 --- fixed

People

(Reporter: kairo, Assigned: cpeterson)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-112cbe8d-ac35-4c86-ba0a-8d0492150520.
=============================================================

In analysis of startup crashes in 38.0.5b3, I found that there is a _invalid_parameter_noinfo one at the top of the list, and it looks that all stacks point to being called via js::random_initState, like this:

0 	msvcr120.dll 	_invalid_parameter_noinfo 	f:\dd\vctools\crt\crtw32\misc\invarg.c:96
1 	msvcr120.dll 	rand_s 	f:\dd\vctools\crt\crtw32\misc\rand_s.c:86
2 	xul.dll 	js::random_initState(unsigned __int64*) 	js/src/jsmath.cpp
3 	xul.dll 	js::random_next(unsigned __int64*, int) 	js/src/jsmath.cpp
4 	xul.dll 	js::jit::ExecutableAllocator::computeRandomAllocationAddress() 	js/src/jit/ExecutableAllocatorWin.cpp
5 	xul.dll 	js::jit::ExecutableAllocator::systemAlloc(unsigned int) 	js/src/jit/ExecutableAllocatorWin.cpp
6 	xul.dll 	js::jit::ExecutableAllocator::createPool(unsigned int) 	js/src/jit/ExecutableAllocator.h
7 	xul.dll 	js::jit::ExecutableAllocator::alloc(unsigned int, js::jit::ExecutablePool**, js::jit::CodeKind) 	js/src/jit/ExecutableAllocator.h
8 	xul.dll 	js::jit::Linker::newCode<0>(JSContext*, js::jit::CodeKind) 	js/src/jit/Linker.h
9 	xul.dll 	js::jit::JitRuntime::generateProfilerExitFrameTailStub(JSContext*) 	js/src/jit/x86/Trampoline-x86.cpp
10 	xul.dll 	js::jit::JitRuntime::initialize(JSContext*) 	js/src/jit/Ion.cpp


In total, this is the #8 crash in 38.0.5b3 with 1.5% of all crashes right now, so it's even significant without looking at the fact that it's happening on startup.
[Tracking Requested - why for this release]:
Given that it's up to #8 on 38.0.5b3 data, I'm requesting tracking. This one could be an issue for which we might still need a patch to 38.0.5 - David, are you looking into what causes those?
Flags: needinfo?(dmajor)
These crashes are caused by external applications. We fixed a major source of these in bug 1123778.

We need correlations to see what the distribution of modules looks like here. But from a hand-sampling, I don't have high hopes. There seem to be many causes for this issue. This bug might just end up being a catch-all for bad apps.

Do you have a bug for the correlations in 38.0.5 being broken? We should mark it as blocking this.
Flags: needinfo?(dmajor) → needinfo?(kairo)
The correlations issue is bug 1167256. If we need to wait for that, that probably means that 38.0.5 will be released without a fix and I guess we lost those people because they can't start Firefox any more.
Flags: needinfo?(kairo)
The unfortunate reality is that many of these are going to be unfixable from our side even with correlations. Correlations will point us to the biggest one or two buckets, if there even is such a thing.

These kinds of issues require a broader approach than targeted code changes. For example a self-support system (one that works even in the face of a startup crash) or perhaps an update rollback mechanism.
Depends on: 1167256
Too late for 38 but we should be tracking it for 39.
N-I Liz to make sure she knows.
For now, we could try writing up a description of known problems with external applications in the support article about startup crashes: https://support.mozilla.org/en-US/kb/firefox-keeps-crashing-startup

Roland are you the right person to work on that or find someone who would describe this issue in sumo?
Flags: needinfo?(rtanglao)
(In reply to Liz Henry (:lizzard) from comment #6)
> For now, we could try writing up a description of known problems with
> external applications in the support article about startup crashes:
> https://support.mozilla.org/en-US/kb/firefox-keeps-crashing-startup
> 
> Roland are you the right person to work on that or find someone who would
> describe this issue in sumo?

Hi Liz: 
This is desktop right?
Assuming it's desktop I am CCing Mark Schmidt, desktop Support Lead who will review Joni's edits to the crash article.
And also NeedInfoing Joni Savage, SUMO content editor, who will modify the crash article.
Flags: needinfo?(jsavage)
Roland: yes it's desktop as far as I can tell.

KaiRo: Looks like the main cause of this spike in crashes may have been fixed (perhaps externally or from something we may have put on the block list.)  This is no longer showing up as a top (50) crash for 38.0.5 or 39. Do we have a list of external applications that are commonly causing Firefox crashes? For example, nprotect gameguard. Maybe we could list some of those in a support article, unless sumo folks think it better to describe the issue more generally.  Or just link to the current blocklist - where is that?
Flags: needinfo?(lhenry)
(In reply to Liz Henry (:lizzard) from comment #8)
> Do we have a list of external applications that are commonly causing Firefox
> crashes?

I don't know of a list. But the blocklist info page is surely a good source.
True for addons (https://addons.mozilla.org/en-US/firefox/blocked/) 
Maybe it would be worth looking back through Bugzilla for non-addon examples.
Liz: Hi, SUMO content manager here. We can add this to the article. It might be a good idea to describe the issue in more general terms, then link to the add-on blocklist. For non-add-on examples, is there a way to generalize what type of applications could cause startup crashes?

Also, what solution or workaround should we describe in the article? Uninstall the add-on or software while in safe mode?
Flags: needinfo?(jsavage) → needinfo?(lhenry)
I think addons can be one cause of startup crashes, but in this case I'm talking about other applications running on the user's computer. For example: bug 1139497 or bug 1123778.  Neither of them are Firefox addons or plugins.
Flags: needinfo?(lhenry)
KaiRo do you want to have another look at the correlations for this?   Then maybe we can file tech evangelism bugs and contact the makers of some of the plugins or external applications. 
Wontfix for 39 at this point, though.
Assigning to Joni because tracked bugs should have an assignee. Feel free to reassign to whoever necessary.
Assignee: nobody → jsavage
Reseting the assignee, Joni is SUMO content manager.
Naveed, can you help?
Assignee: jsavage → nobody
Flags: needinfo?(nihsanullah)
I had started discussion with Joni because explaining this kind of crash to users in SUMO seemed like a possible route to take if we are going to wontfix this, which is what I concluded from comment 4. Either we're wontfixing this bug and starting some sort of bigger project, or we're just leaving it open and untracking it. 

I didn't want to leave a major known startup crash in that state without at least trying to document its existence.
random_generateSeed() could be involved. Chris can you take a look at this start up crash?
Flags: needinfo?(nihsanullah) → needinfo?(cpeterson)
David, even if this crash is caused by third-party code, do you think we can avoid this crash by not calling rand_s()? Or would we just be delaying the crash to a later CRT call?

http://hg.mozilla.org/releases/mozilla-release/annotate/f81271987769/js/src/jsmath.cpp#l735

Chrome is also affected by this _invalid_parameter_noinfo crash in rand_s: https://crbug.com/348400

They see 90% of the crashes with rand_s on the stack have "%ProgramFiles%\lenovo\onekey theater\activedetect32.dll" and "%ProgramFiles%\lenovo\onekey theater\windowsapihookdll32.dll" loaded as DLL (like Firefox bug 1123778).

However, Chrome's rand_s crashes increased by two orders of magnitude (~1000s -> ~100000s) with the release of Chrome 35, wondering if this instead has something to do with their switch to VS2013 (or SSE2 changes):

https://code.google.com/p/chromium/issues/detail?id=348400#c3
Flags: needinfo?(cpeterson) → needinfo?(dmajor)
A couple of notes before I answer your question.

First, for my own future reference, here is a more recent crash report from 40b1: https://crash-stats.mozilla.com/report/index/7bddd9bb-cdc7-4cd1-980a-1c31e2150704

Second, invalid_parameter_noinfo is a generic error handler that can indicate a variety of failures. Not all of these are coming from rand_s. We should add this signature to the prefix-list so we can get a breakdown of the callers. I thought I already filed a bug for this a long time ago but I can't find it.
Depends on: 1180882
What we're seeing in these crashes is a failure to load advapi32.dll. If some injected code is outright blocking that load, then avoiding rand_s will just move the crash somewhere else (advapi32 is most notably used for registry operations).

On the other hand, VS2013's rand_s loads advapi32 using slightly newer LoadLibrary APIs than the previous CRT. If this is what's confusing the hooks, then maybe there's still hope. Perhaps we can load advapi32 in the old way. I'll write a diagnostic patch to test this hypothesis.
Flags: needinfo?(dmajor)
Keywords: leave-open
Attached patch Diagnostic patchSplinter Review
Assignee: nobody → dmajor
Attachment #8630159 - Flags: review?(jorendorff)
Comment on attachment 8630159 [details] [diff] [review]
Diagnostic patch

Review of attachment 8630159 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsmath.cpp
@@ +23,5 @@
>  # include <unistd.h>
>  #endif
>  
> +#ifdef XP_WIN
> +# include "windows.h"

Please use #include "jswin.h".
Attachment #8630159 - Flags: review?(jorendorff) → review+
> Second, invalid_parameter_noinfo is a generic error handler that can
> indicate a variety of failures. Not all of these are coming from rand_s. We
> should add this signature to the prefix-list so we can get a breakdown of
> the callers.

FWIW the majority of the spike of this signature on beta is coming from IME code and not rand_s. I've filed bug 1181714.
David, have you seen any crash reports from your diagnostic check for injected hooks breaking rand_s's LoadLibraryExW call?
Flags: needinfo?(dmajor)
No. I haven't seen any rand_s crashes at all since this landed (like I said they're mostly that other bug in IME). Have you?
Flags: needinfo?(dmajor)
No. I only see the IME crashes and shlxthdl.dll (some sort of Sun Microsystems plugin for the Explorer shell).
I don't see any crash reports with this signature in the past 7 days. (I do see some in the past 14 days.) Did we fix this issue (with bug 1181714)?
Flags: needinfo?(kairo)
That signature apparently stopped occurring across versions abruptly on 2015-07-20 at 19:00 UTC or so. Not sure why.
Flags: needinfo?(kairo)
David, do you want to remove your LoadLibrary/LoadLibraryExW diagnostic code or let it ride to channels with more users?
Flags: needinfo?(dmajor)
Let's keep it for a while. Other than being an eyesore it's not really doing much harm.

Dear future reader: In the likely event that we promptly forget about this bug and you stumble across the code in Firefox 50-something, you have my blessing to remove it.
Flags: needinfo?(dmajor)
Marked as fixed for 40 & 41 in order to remove it from our radar.
David, it looks like we are seeing some crash reports (81) from your "Test whether the injected hooks react differently to LoadLibraryW / LoadLibraryExW" diagnostic crash. Does this mean we should not call rand_s() here? Or we should use GetProcAddress() to call RtlGenRandom() directly?

e.g. bp-1eae1e6c-b408-44e3-a787-0fdaa2151009

https://crash-stats.mozilla.com/signature/?signature=random_generateSeed#reports

I just found bug 723447 which blames McAfee Host Intrusion Prevention for similar crashes in rand_s().
Crash Signature: [@ _invalid_parameter_noinfo] → [@ _invalid_parameter_noinfo] [@ random_generateSeed]
Flags: needinfo?(dmajor)
See Also: → 694344, 723447
You could try to mimic what VS2010 did under the hood, but it would be nice to better understand the problem first, i.e. get a repro and step through the disassembly to see where it fails.
Flags: needinfo?(dmajor)
Attached patch part-1-call-RtlGenRandom.patch (obsolete) — Splinter Review
Part 1: Call RtlGenRadom() instead of rand_s() to workaround crashes from injected third-party hooks. Also, RtlGenRandom() can fail, so move the #endif to make the PRMJ_Now() fallback available on Windows.

I don't have a reproducible case to debug, like dmajor recommended in comment 36, but we have seen 988 crash reports [1] hit his diagnostic crash and Chrome made this exact change [2] for the same problem.

[1] https://crash-stats.mozilla.com/signature/?date=%3E%3D2015-07-08&signature=random_generateSeed

[2] https://chromium.googlesource.com/chromium/src/+/f4b1928eaa63736de442317980682bcf1244bd17%5E!/

In theory, we should remove all other calls to rand_s(), like Chrome did, but we would need to patch third-party code. We probably don't see crashes in those rand_s() calls because users affected by these injected hooks in ADVAPI32.DLL all crash in jsmath.cpp.

https://mxr.mozilla.org/mozilla-central/ident?i=rand_s
Assignee: dmajor → cpeterson
Attachment #8700464 - Flags: review?(jdemooij)
Call RtlGenRandom() instead of rand_s() to workaround crashes from injected third-party hooks.

Sorry. My v1 patch had a bug. This v2 patch also simplifies the control flow in GenerateSeed(). I planned to submit the simplification as a follow-up patch, but I think all these changes will be easier to review in this one patch.
Attachment #8700464 - Attachment is obsolete: true
Attachment #8700464 - Flags: review?(jdemooij)
Attachment #8700467 - Flags: review?(jdemooij)
Comment on attachment 8700467 [details] [diff] [review]
call-RtlGenRandom-v2.patch

Review of attachment 8700467 [details] [diff] [review]:
-----------------------------------------------------------------

Wow great patch and cleanup! And thanks for doing the research.

::: js/src/jsmath.cpp
@@ +746,2 @@
>  {
> +    uint64_t seed;

We should initialize `seed`, so that we don't read uninitialized memory on Linux when we fail to open /dev/urandom. That may be a feature (more randomness!) but ASan/Valgrind won't like it and it defeats ASLR.

@@ +748,3 @@
>  
>  #if defined(XP_WIN)
> +    MOZ_ALWAYS_TRUE(RtlGenRandom(static_cast<void*>(&seed), sizeof(seed)));

Nit: I think you can remove the static_cast<> here and just pass &seed (the Chrome code also passes unsigned int* directly).
Attachment #8700467 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #39)
> > +    uint64_t seed;
> 
> We should initialize `seed`, so that we don't read uninitialized memory on
> Linux when we fail to open /dev/urandom. That may be a feature (more
> randomness!) but ASan/Valgrind won't like it and it defeats ASLR.

Fixed. This typo was caught on Try by clang's -Wsometimes-uninitialized warning. :)

> @@ +748,3 @@
> >  
> >  #if defined(XP_WIN)
> > +    MOZ_ALWAYS_TRUE(RtlGenRandom(static_cast<void*>(&seed), sizeof(seed)));
> 
> Nit: I think you can remove the static_cast<> here and just pass &seed (the
> Chrome code also passes unsigned int* directly).

Thanks. That's much cleaner without the static_cast.
Comment on attachment 8700467 [details] [diff] [review]
call-RtlGenRandom-v2.patch

Approval Request Comment

[Feature/regressing bug #]: In part, a forced crash added in *this bug* (bug 1167248 comment 25) to help diagnose the original rand_s crash.

[User impact if declined]: Some users will still hit this forced crash and/or the original rand_s crash!

[Describe test coverage new/current, TreeHerder]:

[Risks and why]: This patch does two things:

1. Removes the forced crash.
2. Works around the original crash from injected third-party software. Chrome made this exact change for the same third-party crash in rand_s:

https://chromium.googlesource.com/chromium/src/+/f4b1928eaa63736de442317980682bcf1244bd17%5E!/

[String/UUID change made/needed]: None

There were 141 instances of the forced crash during the last week:

https://crash-stats.mozilla.com/search/?product=Firefox&signature=random_generateSeed&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports

There were 15 instances of the original rand_s crash during the last week (in Firefox version 36–41, including ESR 38):

https://crash-stats.mozilla.com/search/?product=Firefox&signature=rand_s&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Attachment #8700467 - Flags: approval-mozilla-beta?
Attachment #8700467 - Flags: approval-mozilla-aurora?
Keywords: leave-open
If I understand correctly the removal of the leave-open keyword, we can close this bug.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment on attachment 8700467 [details] [diff] [review]
call-RtlGenRandom-v2.patch

Fix a startup crash, taking it.
Attachment #8700467 - Flags: approval-mozilla-beta?
Attachment #8700467 - Flags: approval-mozilla-beta+
Attachment #8700467 - Flags: approval-mozilla-aurora?
Attachment #8700467 - Flags: approval-mozilla-aurora+
Uplifted to Aurora with some minor finessing:
https://hg.mozilla.org/releases/mozilla-aurora/rev/c5bef3674a1b

The Beta uplift has a merge conflict and will require a little more work.
I don't see any crash reports from this code since my patch landed in 44.

I see one crash report in [@ invalid_parameter_noinfo | rand_s ]. It has the same underlying cause as this SpiderMonkey crash, but crashes in some Chromium IPC code. I can fix it in another bug.

bp-a74be01f-edd6-432f-a19f-eae272160108
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla44
Blocks: 1240589
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: