Open Bug 1016388 Opened 10 years ago Updated 2 years ago

Add an infallible version of JS_NewRuntime

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

People

(Reporter: mccr8, Unassigned)

References

Details

This will hopefully make the stack less weird when CycleCollectedJSRuntime's constructor fails.
I don't know if there's a JS equivalent of NS_ABORT_OOM.
We should probably create one if there's not. Having the annotation of the allocation size is an important part of breaking down these crashes and figuring out next steps.
There's js::CrashAtUnhandlableOOM().  Perhaps there could be an optional size argument.  I'm going to guess that CrashReporter::AnnotateOOMAllocationSize() is not going to be callable from the JS engine, so I guess we need to do the callback dance.
Here's one possible approach.  First, add a size argument to CrashAtUnhandlableOOM(), with a default value of 0.  Then, in CrashAtUnhandlableOOM(), check if the argument is non-zero.  If so, check if there's an OOM callback registered.  If there is, call the OOM callback.  The OOM callback calls |CrashReporter::AnnotateOOMAllocationSize(aSize);|.  I think this should preserve the existing shell behavior of CrashAtUnhandlableOOM().

Does that sound reasonable to you, Jason?
Flags: needinfo?(jorendorff)
Unfortunately, there are a ton of allocations in the JS runtime ctor and init() methods, so actually threading around some kind of way to report the failing allocation size seems like it will be pretty invasive.  So I don't know how nice this will be.
Depends on: 1026545
Silence is assent!  Anyways, I spun off a separate bug for the callbackyness.
Flags: needinfo?(jorendorff)
Sorry.

There are around 35 allocations under JS_NewRuntime.

Why is this a good idea? I just got email saying that from now on, when a small allocation fails, the crash signature won't even contain a stack, just "OOM | Small" --- because the action is to figure out why we're running out of memory, and the stack doesn't help with that. So why make a complex change to try to improve the stack in this case?
(In reply to Jason Orendorff [:jorendorff] from comment #7)
> There are around 35 allocations under JS_NewRuntime.
Yeah, I saw there was a lot going on.

> Why is this a good idea? I just got email saying that from now on, when a
> small allocation fails, the crash signature won't even contain a stack, just
> "OOM | Small" --- because the action is to figure out why we're running out
> of memory, and the stack doesn't help with that. So why make a complex
> change to try to improve the stack in this case?

Well, at a high level, right now this OOM crash is showing up as a MOZ_CRASH(), because we just nullcheck the result of the JS_NewRuntime.  It would be nice to do an NS_ABORT_OOM instead, so it shows up as an OOM, but that wants a size.  I guess the easiest thing to do would be to see what the largest allocation a JS runtime does is, and then just pass that in, and if it is small, then who really cares.  Of course then if later somebody starts allocating large arrays in JS_NewRuntime that would be bad.
From what I know, any time we actually crash because of OOM, we should actually crash with an OOMAllocationSize annotation set and with something that is clear that it's OOM, so we can detect it with the classifiers we have in crash-stats for adding the "OOM | " prefix to signatures. If those all end up being "OOM | small", that is fine, but we should send the data so it can be categorized that way.
(gdb) p sizeof(JSRuntime)
$11 = 25368

This is the size of the first allocation done in JS_NewRuntime. Does 25K count as large?

If so, the caller just needs to be able to cope with OOM here, and this bug is unnecessary, right?
Is that a fixed size or variable? 25k fixed size counts as small.
> Is that a fixed size or variable? 25k fixed size counts as small.

It's fixed.

The first call to JS_NewRuntime starts some helper threads, which reserves a bunch of virtual memory for stack (do we even track allocations under PR_CreateThread?). According to js/src/vm/HelperThreads.cpp, it looks like 512KB per thread, and the number of threads is determined by ThreadCountForCPUCount. On my laptop, it's 8 threads. Plus each thread allocates about 3KB of per-thread state--- ~2KB for jsdtoa.cpp, and ~1KB for the regexp engine.

Since bug 941805, those threads are per-process. Before that, they were per-runtime, which was crazy.

Each runtime also gets 32KB for the GC to use as its marking stack, ~1K for a JS::Zone, ~1K for a compartment. Beyond that, PR_NewLock and PR_NewCondVar are called a bunch of times and that's about it.

It looks like there's nothing variable-size.
In bug 991845 comment 8, jonco said:
> The problem is that we are OOM more often here now the new runtime allocates a 16MB nursery.
That would definitely go under "large" -- is that allocation no longer happening?

Though, I care more about having a clear OOM frame on the stack than I do about getting the exact size right. The current naked MOZ_CRASH gets heavily optimized and produces misleading crash dumps. Even a no-size CrashAtUnhandlableOOM would be an improvement.
(In reply to David Major [:dmajor] from comment #13)
> Even a no-size CrashAtUnhandlableOOM would be an improvement.

Note that bug 1026545 is working on getting CrashAtUnhandlableOOM with a size argument (and get that size annotated in crashes).
(In reply to David Major [:dmajor] from comment #13)
> Though, I care more about having a clear OOM frame on the stack than I do
> about getting the exact size right. The current naked MOZ_CRASH gets heavily
> optimized and produces misleading crash dumps. Even a no-size
> CrashAtUnhandlableOOM would be an improvement.

Well ... Why don't we just change the MOZ_CRASH to something else?

Or better, why does MOZ_CRASH get optimized? Don't we always want a legible stack when MOZ_CRASH is called? Maybe we should fix that. (!)
> Why don't we just change the MOZ_CRASH to something else?
The browser crash at oom thing requires a size for the allocation failure, which is hard to compute, unless we just stick a random made up value in there.
> why does MOZ_CRASH get optimized
Good point! Maybe excluding MOZ_CRASH from PGO would be an improvement. Though maybe it will bloat the size of our binary. ;)
MOZ_CRASH is a macro, it doesn't expand to a function. This means that whatever wacky optimizations are in force for the function may make stackwalking difficult (but perhaps we should also look at the stackwalking in these cases, to see if we can improve).

I also don't care about the *exact* size, but I do care about the general class of size: 32 bytes, 32k, 1M is interesting.
If we're commonly failing on the 16MiB nursery reservation, why don't we just make that allocation call CrashAtUnhandlableOOM? Or better, make it fallible? If we do that, then we can make an OOM anywhere else in JSRuntime creation an OOM | small.
Jon landed a change to the nursery last week to only allocate in 1MIB chunks on an as-needed basis, so this should be much less necessary now.
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.