Closed Bug 1244824 Opened 5 years ago Closed 5 years ago

[meta] Assertion failure: result ([OOM] Is it really infallible?), at js/src/ds/LifoAlloc.h:281

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
firefox47 --- fixed

People

(Reporter: gkw, Unassigned)

References

Details

(Keywords: assertion, meta, regression, Whiteboard: [jsbugmon:ignore])

Attachments

(1 obsolete file)

We are now getting a lot of such assertion failures especially with clang-compiled js binaries. These testcases tend to be hard-to-reproduce and an example stack is the following:

#0  js::LifoAlloc::allocInfallibleOrAssert (this=<optimized out>, n=<optimized out>) at js/src/ds/LifoAlloc.h:281
#1  js::jit::TempAllocator::allocateInfallible (this=<optimized out>, bytes=<optimized out>) at js/src/jit/JitAllocPolicy.h:40
#2  0x0000000000694483 in js::jit::TempObject::operator new (nbytes=136, alloc=...) at js/src/jit/JitAllocPolicy.h:174
#3  js::jit::MBox::New (ins=<optimized out>, alloc=...) at js/src/jit/MIR.h:4433
#4  js::jit::AlwaysBoxAt (alloc=..., at=<optimized out>, operand=<optimized out>) at js/src/jit/TypePolicy.cpp:43
#5  BoxAt (at=0x2f10300, operand=<optimized out>, alloc=...) at js/src/jit/TypePolicy.cpp:53
#6  js::jit::BoxPolicy<1u>::staticAdjustInputs (alloc=..., ins=0x2f10300) at js/src/jit/TypePolicy.cpp:629
#7  0x00000000006a5c54 in js::jit::MixPolicy<js::jit::ObjectPolicy<0u>, js::jit::BoxPolicy<1u> >::staticAdjustInputs (alloc=..., ins=0x2f10300) at js/src/jit/TypePolicy.h:419
#8  js::jit::MixPolicy<js::jit::ObjectPolicy<0u>, js::jit::BoxPolicy<1u> >::adjustInputs (this=<optimized out>, alloc=..., ins=0x2f10300) at js/src/jit/TypePolicy.h:422
#9  0x000000000057cb5a in (anonymous namespace)::TypeAnalyzer::adjustInputs (def=0x2f10300, this=<optimized out>) at js/src/jit/IonAnalysis.cpp:1499
#10 (anonymous namespace)::TypeAnalyzer::insertConversions (this=<optimized out>) at js/src/jit/IonAnalysis.cpp:1562
#11 (anonymous namespace)::TypeAnalyzer::analyze (this=<optimized out>) at js/src/jit/IonAnalysis.cpp:1806
#12 js::jit::ApplyTypeInformation (mir=<optimized out>, graph=...) at js/src/jit/IonAnalysis.cpp:1818
#13 0x0000000000572790 in js::jit::OptimizeMIR (mir=0x2ebe120) at js/src/jit/Ion.cpp:1631
#14 0x0000000000573f89 in js::jit::CompileBackEnd (mir=0x2ebe120) at js/src/jit/Ion.cpp:2008
#15 0x0000000000828285 in js::HelperThread::handleIonWorkload (this=0x2b38800) at js/src/vm/HelperThreads.cpp:1276
#16 0x0000000000827cce in js::HelperThread::threadLoop (this=0x2b38800) at js/src/vm/HelperThreads.cpp:1603
#17 0x00000000008cc569 in nspr::Thread::ThreadRoutine (arg=0x2b3cef0) at js/src/vm/PosixNSPR.cpp:45
#18 0x00007ff6026dc6aa in start_thread (arg=0x7ff60164b700) at pthread_create.c:333
#19 0x00007ff601752eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

:nbp requested that I file this as a meta bug.
Attachment #8714419 - Attachment is obsolete: true
Flags: needinfo?(nicolas.b.pierron)
I'm curious why these are specific to Clang and/or Linux.
I am thinking that there might be a way to avoid all these OOMs during the compilation.  The way I am thinking involves the fact that we do not rely on the fact that IonBuilder / OptimizeMIR / GenerateLIR is quite isolated from the rest of the browser and is not supposed to write anything out-side its LifoAlloc.

The idea is similar to re-implementing a "throw" for C++, and it can be assimilated to a "fork" / "longjump" where we replace the stack of the compiler on OOM.

bool fallibleWrapper (TempAllocator& alloc) {
    CatchOOMAndReplaceStack oomCatcher(alloc);
    if (!oomCatcher.captureStackOrOOM())
        return false;

    …
    return true;
}

The CatchOOMAndReplaceStach class will clone the stack of the current function to some allocated space.  If no OOM happens, the copy of the stack is discarded at the end of the scope.

If an oom happens in the allocator, then we resume the execution to the last oom catcher, by replacing the current stack by the one which got recorded and moving the stack pointer accordingly.  This is similar to what coroutines are doing for switching threads, except that instead of switching threads and switching stacks, we are in fact returning back in time to the original execution of the captureStackOrOOM() function call.  Then, as we do not capture the stack before the oom catcher, resuming the execution will keep the mutated state of the allocator which can confirm that we are effectively running out of memory.

Note, there is a big pit-fall with this approach which is that we do not want to mutate the state of any allocator while restoring the stack, because as opposed to a real "throw" this will not call any of the destructor on the way back.
Flags: needinfo?(jdemooij)
Flags: needinfo?(hv1989)
(In reply to Nicolas B. Pierron [:nbp] from comment #3)
> because as opposed to a real "throw" this will not call any of the destructor on the
> way back.

But that's dangerous right? For instance, if we have a Vector<.., SystemAllocPolicy> somewhere on the stack, we have a leak...
Flags: needinfo?(jdemooij)
(In reply to Jan de Mooij [:jandem] from comment #4)
> (In reply to Nicolas B. Pierron [:nbp] from comment #3)
> > because as opposed to a real "throw" this will not call any of the destructor on the
> > way back.
> 
> But that's dangerous right? For instance, if we have a Vector<..,
> SystemAllocPolicy> somewhere on the stack, we have a leak...

Indeed, and we do use such vectors as a temporary buffer for some of the phases.  Which implies that if we use such vectors then we should have another oom catcher which capture the initialization of the vector, which might be error prone :/

So, far I have no better idea to prevent the instrumentation of all the code base, if we want to properly handle these OOMs.
Flags: needinfo?(hv1989)
Removing ni? request, as the 2 remaining bugs are harder to investigate with only the stack, and knowing where the previous allocations were made would be helpful.
Flags: needinfo?(nicolas.b.pierron)
I'd count this FIXED, since :nbp has looked through whatever can be fixed for now. We are not going to be testing clang-compiled non-Asan Linux builds going forward - at least until it appears on treeherder.

Thanks for looking through/fixing the dependencies, Nicolas!
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.