Closed
Bug 1499607
Opened 7 years ago
Closed 7 years ago
Assertion failure: fallibleScope_ ([OOM] Cannot allocate a new chunk in an infallible scope.), at js/src/ds/LifoAlloc.cpp:145
Categories
(Core :: JavaScript Engine, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla65
People
(Reporter: gkw, Assigned: iain)
Details
(4 keywords, Whiteboard: [jsbugmon:update])
Attachments
(2 files)
The following testcase crashes on mozilla-central revision 778427bb6353 (build with --enable-debug --enable-more-deterministic, run with --fuzzing-safe --no-threads --ion-eager --cache-ir-stubs=off):
Function(`
mathy4 = function() {}
mathy0 = function(x, y) {
return (Math.fround(Math.imul(Math.round(Math.exp(Math.imul(Math.cos(7),
Math.round((y !== x) >> 0)) >> 0) >> 0), Math.round(-Math.cos(Math.round(Math.cos(y) | 0))) | 0)))
}
mathy1 = function(x, y) {
mathy0(y, Math.round(!Math.round()) + Math.pow(y ? +y ? Math.pow(-Number.MIN_SAFE_INTEGER) :
+y : Math.round(3) ? +x : y | 0) + Math.round() >> 0)
}
mathy3 = function(x, y) {
return (Math.cos(Math.atan2(!mathy1(Math.round(!Math.round()), y) ?
Math.hypot(Math.round(mathy0(Math.tan(Math.PI, y), Math.round + y | 0)), y * 3) :
Math.round(mathy1(Math.round(), Math.round(~y >> 0 > 0))) | 0), mathy1(y + (+y > y >> 0) >> y) + h))
}
try {
mathy3(2, 2);
} catch (e) {}
mathy1 = function(x, y) {
(m ? +y | 0 | 0 | 0 : h.r(h.u(~x0) & h.u(~x)) | 0) | 0 | h.r(!h.f(h.c(0xf) | 0)) | 0 |
0 ? !+r.E > 0 - h.d(!++y) | 0 > 0 : 7
}
mathy5 = function(x, y) {
Math.trunc(Math.round(Math.tan(Math.round(Math.hypot(Math.hypot()) + Math.pow(+x) > 0,
mathy4(Math.round(mathy3(Math.round(), y)))) - x ? a : f ? u(g) : y)))
}
inputs = [{}, [], new Boolean(false), new String, function() {}, {}, new Number];
for (var k = 0; k < 7; ++k) {
try {
mathy5(inputs[0], inputs[k]);
} catch (e) {}
}
`)();
Function(`
eval(\`
(function() {
mathy1 = function(x, y) {
return (mathy0(~y | 0, +mathy0(Math.hypot(), 0xf)) + Math.sin(Math.hypot(Math.sin() && x) <
Math.round(Math.cos(y >> 0))))
}
mathy4 = function() {}
inputs2 = [];
for (var k = 0; k < 2; ++k) {
try {
mathy5(inputs2[0], inputs2[k]);
} catch (e) {}
}
oomTest(mathy5, { keepFailing: true });
});
\`)();
`)();
Backtrace:
#0 0x0000557cdf63d3f7 in js::LifoAlloc::newChunkWithCapacity (this=<optimized out>, n=<optimized out>) at js/src/ds/LifoAlloc.cpp:145
#1 0x0000557cdf63d596 in js::LifoAlloc::getOrCreateChunk (this=0x7fbde7dc5560, n=0) at js/src/ds/LifoAlloc.cpp:196
#2 0x0000557cdeab5008 in js::LifoAlloc::allocImpl (this=0x7fbde7dc5560, n=48) at js/src/ds/LifoAlloc.h:594
#3 0x0000557cdf44c8ff in js::LifoAlloc::alloc (this=0x7fbde7dc5560, n=48) at js/src/ds/LifoAlloc.h:664
#4 js::LifoAlloc::new_<(anonymous namespace)::CompilerConstraintInstance<(anonymous namespace)::ConstraintDataFreezeObjectFlags>, js::LifoAlloc*&, js::HeapTypeSetKey&, (anonymous namespace)::ConstraintDataFreezeObjectFlags> (this=0x7fbde7dc5560, args=..., args=..., args=...) at js/src/ds/LifoAlloc.h:900
#5 js::TypeSet::ObjectKey::hasFlags (this=<optimized out>, constraints=0x7fbde7c9c240, flags=67108864) at js/src/vm/TypeInference.cpp:2022
/snip
For detailed crash information, see attachment.
Reporter
Comment 1•7 years ago
Reporter
Comment 2•7 years ago
autobisectjs shows this is probably related to the following changeset:
The first bad revision is:
changeset: https://hg.mozilla.org/mozilla-central/rev/d5c22661c860
user: Robin Templeton
date: Tue Sep 18 04:04:53 2018 +0000
summary: bug 1490387 - Part 3: Implement BigInt support for bitwise operators. r=jandem
Jan/nbp, is bug 1490387 a likely regressor?
Comment 3•7 years ago
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #2)
> Jan/nbp, is bug 1490387 a likely regressor?
I do not think so.
Iain or Ashley, do you want to give this simple issue a try?
Context:
LifoAlloc is a bump allocator: it increments a pointer to make the next allocation, and when we run out of space in the last allocated chunk, we allocate a new chunk.
IonMonkey uses a LifoAlloc for most of its allocations, such as MIR instructions, and expects allocations made with the LifoAlloc to be infallible. To ensure that all allocations are infallible, we have to ensure that there is enough reserved space. This assertion reports that we are making too many allocations without reserving space, and therefore that the returned value is not being checked.
This implies that we have to call "ensureBallast()" in loops that are not bounded, if they depend on user input, such as the number of operands of a resume point or a Phi. Another solution is to use a fallible scope, which allows us to add new chunks but requires us to check the returned allocations against nullptr.
A simple way to investigate these issues is to first check the stack for any such loop that is not instrumented with ensureBallast() or a fallible scope. Otherwise, look at the locations of previous allocations by tracing the backtraces of LifoAlloc::allocImpl callers, and search for the call sites which are exhausting the LifoAlloc space before this last failing allocation.
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(khyperia)
Flags: needinfo?(jdemooij)
Flags: needinfo?(iireland)
Updated•7 years ago
Priority: -- → P2
Comment 4•7 years ago
Hi Gary,
Can you translate that test case into a test web page so that I can check the other main versions of Firefox and, if necessary, confirm the regression?
Thanks.
Flags: needinfo?(nth10sd)
Reporter
Comment 5•7 years ago
We won't need to do so, especially since a reduced testcase will be landed and our fuzzers will find more. Moreover, shell testcases can sometimes be hard to convert to browser ones.
Flags: needinfo?(nth10sd)
Comment 6•7 years ago
(In reply to Bodea Daniel [:danibodea] from comment #4)
> Can you translate that test case in a test web page so that I can check the
> other main version of Firefox and maybe regress it (if necessary)?
This is a debug-only assertion that makes sure we correctly handle out-of-memory cases. In release builds, the assertion will not fail; instead, an expected crash will happen when running out of memory, in the very unlikely case that we hit this precise allocation.
Assignee
Comment 7•7 years ago
I'll claim this one.
Flags: needinfo?(khyperia)
Flags: needinfo?(iireland)
Assignee
Comment 8•7 years ago
This is a fairly subtle issue arising from the way we use LifoAlloc for both fallible and infallible allocations.
The sequence of events plays out like this (leaving out a few small irrelevant allocations):
1. We call ensureBallast for 0x4000 bytes. It succeeds, leaving us with just over 0x4000 bytes in the last chunk.
2. We call CompilerConstraintList::add. It allocates a new constraint and tries to add it to the constraint list.
3. The vector underlying the constraint list was already at capacity and has to reallocate. It already had 1024 entries, so it doubles in size and needs to allocate 2048 * 8 = 0x4000 bytes.
4. This eventually reaches TempAllocator::allocate:
MOZ_MUST_USE void* allocate(size_t bytes)
{
    LifoAlloc::AutoFallibleScope fallibleAllocator(lifoAlloc());
    void* p = lifoScope_.alloc().alloc(bytes);
    if (!ensureBallast()) {
        return nullptr;
    }
    return p;
}
5. We fallibly allocate 0x4000 bytes, which succeeds. We are left with a very small amount of ballast (on the order of 200 bytes).
6. We call ensureBallast to fix this situation. *** We hit an OOM here. ***
7. ensureBallast returns false, so allocate returns nullptr back up the stack. Importantly, our ballast is still very small.
8. The failed allocation bubbles up the stack until it reaches CompilerConstraintList::add:
void add(CompilerConstraint* constraint) {
    if (!constraint || !constraints.append(constraint)) {
        setFailed();
    }
}
9. We swallow the OOM exception, set the constraint list to failed, and continue.
10. A few allocations later, we run out of ballast and assert.
tl;dr: If we fallibly allocate a large chunk of memory out of our ballast, then hit an oom when we try to replenish that ballast, and we don't immediately fail the compilation, then we silently run out of ballast.
Ted and I had a long talk about what we should be doing here, as a result of which we are going to be proposing a session at the all-hands unconference about our OOM story in general. (Summary: in a post-Fission world, is it really worthwhile to spend our complexity budget on OOM recovery?) For this particular bug, we eventually came to the conclusion that the best fix is to mark the ensureBallast in TempAllocator::allocate as an oomUnsafe region. (The same is also true for TempAllocator::allocateArray).
A similar sequence could occur in other places that use AutoFallibleScope. However, in each of those cases, the failed allocation bubbles up until the compilation fails, instead of being swallowed and saved for later. If we immediately fail, then there is no concern about our subsequent "infallible" allocations failing, because there won't be any subsequent allocations.
This change could technically cause a crash in production in a situation where we would currently just fail the compile. However, that can only occur if the fallible allocation that consumes our ballast gets right up to the end of memory (within BallastSize bytes) without going over. (Also, we would have to encounter another fallible allocation before our infallible allocations chewed through the remaining ballast.) Ted and I agree that this is too rare a situation to be worried about.
Patch coming shortly.
Assignee
Comment 9•7 years ago
Comment 10•7 years ago
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3de9eb154921
Add OOM unsafe region while replenishing ballast after fallible allocation r=tcampbell
Comment 11•7 years ago
bugherder
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox65:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Updated•7 years ago
Assignee: nobody → iireland
Comment 12•7 years ago
(In reply to Iain Ireland [:iain] from comment #8)
> Ted and I had a long talk about what we should be doing here, as a result of
> which we are going to be proposing a session at the all-hands unconference
> about our OOM story in general.
My point of view about JIT compilation and OOM is that JIT compilation should not be observable through OOM: it should still let you run your code, slowly, instead of crashing.
> This change could technically cause a crash in production in a situation
> where we would currently just fail the compile.
In production, this code should already crash on real OOM if we are unable to allocate more chunks for the small allocations.
https://searchfox.org/mozilla-central/source/js/src/ds/LifoAlloc.h#684,688
(In reply to Pulsebot from comment #10)
> Pushed by tcampbell@mozilla.com:
> https://hg.mozilla.org/integration/autoland/rev/3de9eb154921
> Add OOM unsafe region while replenishing ballast after fallible allocation
> r=tcampbell
Could we move this oomUnsafe scope to the specific TI allocation (CompilerConstraintList::add) instead of applying it to every allocation? I fear that adding this assertion for every TempObject will just inhibit OOM testing and make us ignore all these reports in the future.
Flags: needinfo?(tcampbell)
Flags: needinfo?(iireland)
Updated•7 years ago
status-firefox63:
--- → wontfix
status-firefox-esr60:
--- → wontfix
Comment 13•7 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #12)
> My point of view about JIT compilation and OOM, is that JIT compilation
> should not be observable through OOM, and still let you run code slowly
> instead of crashing.
I understand the sentiment in theory, but in the practical context of Gecko, once we are near OOM, any allocation on another thread may crash the process anyway.
> I fear that adding this assertion for every TempObject will just inhibit the
> OOM testing and make us ignore all these reports in the future.
The OOM test will still apply to the primary object allocation. The patch only affects the second ensureBallast call, so I don't think we lose any OOM test coverage.
One thing this patch doesn't address is TempAllocator::lifoAlloc(), which potentially has more of these sorts of problems hiding. I don't have any good ideas for how to manage these or prevent next week's ensureBallast fuzz bug, though =\
Flags: needinfo?(tcampbell)
Assignee
Comment 14•7 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #12)
> My point of view about JIT compilation and OOM, is that JIT compilation
> should not be observable through OOM, and still let you run code slowly
> instead of crashing.
Agreed, so long as we're talking about Ion compilation. If you OOM during baseline compilation, you are almost certainly going to OOM right away anyway.
Recovering from OOM, and all the testing frameworks to verify that we recover from OOM correctly, add a lot of complexity to SpiderMonkey. There are relatively few places where recovering from OOM is practical: broadly speaking, we want Ion compilation to be able to fail gracefully, and we want to be able to handle huge allocations (wasm instances, etc...) that exhaust memory all at once. If we fail to allocate a small object while, say, attaching an IC stub, then it's very unlikely that we'll be able to recover (and if we could, then it's a sign that the GC should be collecting more aggressively).
In the long run, we could handle Ion compilation by moving it into a separate process, handle large allocations specially, and eliminate a lot of complexity from the rest of our code. The unconference session would be about A) whether this is a desirable/feasible goal, and B) how that affects the code we write in the short run.
(Further discussion on this issue should probably just wait for the all-hands.)
> In production, this code should already crash on real OOM if we are unable
> to allocate more chunks for the small allocations.
>
> https://searchfox.org/mozilla-central/source/js/src/ds/LifoAlloc.h#684,688
That's only true if we run out of memory on an infallible allocation. If we hit another ensureBallast (which is a fallible allocation) before that point, then we will not crash.
Assignee
Updated•7 years ago
Flags: needinfo?(iireland)
Comment 15•7 years ago
Just one thing I want to mention. For at least one embedder [1], our OOM story is part of our competitive differentiation vs. V8.
> At the most basic level, we migrated away from V8 because Chrome has a process
> per tab resource model. If a tab dies, Chrome is perfectly happy to open a new
> one, so it can treat out-of-memory errors as fatal.
>
> [...]
>
> SpiderMonkey has several qualities that make it the most suitable option for
> our needs. Most importantly, its process model is most like ours, with a
> single process for the browser and threads for tabs. This necessitates better
> tools for constraining memory and managing resource exhaustion.
[1]: https://engineering.mongodb.com/post/code-generating-away-the-boilerplate-in-our-migration-back-to-spidermonkey
Comment 16•7 years ago
Is there a user impact here which justifies Beta backport consideration or can this ride the trains?
Flags: needinfo?(iireland)
Comment 17•7 years ago
This should ride the trains. It is primarily related to the testing and fuzzing story.
Flags: needinfo?(iireland)