Closed Bug 1485327 Opened 7 years ago Closed 6 years ago

Various GC crashes due to stack overflow

Categories

(Core :: JavaScript: GC, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox63 --- wontfix

People

(Reporter: decoder, Unassigned)

Details

(4 keywords, Whiteboard: [jsbugmon:])

Crash Data

Attachments

(1 obsolete file)

I keep seeing GC-related crashes due to stack overflow (stack space exhaustion). Unfortunately, I can never reproduce any of these crashes, probably because they require GCing in exactly the right almost-overrecursed state. Here is one of the better examples from mozilla-central revision 77433149bfdc (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --disable-debug --enable-optimize, run with --fuzzing-safe --cpu-count=2 --disable-oom-functions --execute=gcstress=1 --ion-warmup-threshold=100 --no-native-regexp --ion-offthread-compile=off --no-sse4 --spectre-mitigations=on --no-wasm-ion --ion-eager): Backtrace: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000000000574e44 in js::gc::StoreBuffer::WholeCellBuffer::clear (this=this@entry=0x7f078121ca08) at js/src/gc/StoreBuffer.cpp:156 #1 0x0000000000576ecf in js::gc::StoreBuffer::clear (this=0x7f078121c9a0) at js/src/gc/StoreBuffer.cpp:88 #2 0x0000000000c7ed44 in js::Nursery::doCollection (this=this@entry=0x7f078121c640, reason=reason@entry=JS::gcreason::DEBUG_GC, tenureCounts=...) at js/src/gc/Nursery.cpp:956 #3 0x0000000000c7f51c in js::Nursery::collect (this=this@entry=0x7f078121c640, reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/Nursery.cpp:752 #4 0x0000000000c43a25 in js::gc::GCRuntime::minorGC (this=this@entry=0x7f078121a4a8, reason=reason@entry=JS::gcreason::DEBUG_GC, phase=phase@entry=js::gcstats::PhaseKind::EVICT_NURSERY_FOR_MAJOR_GC) at js/src/gc/GC.cpp:7968 #5 0x0000000000c619d0 in js::gc::GCRuntime::gcCycle (this=this@entry=0x7f078121a4a8, nonincrementalByAPI=nonincrementalByAPI@entry=false, budget=..., reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/GC.cpp:7522 #6 0x0000000000c61f5d in js::gc::GCRuntime::collect (this=this@entry=0x7f078121a4a8, nonincrementalByAPI=nonincrementalByAPI@entry=false, budget=..., reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/GC.cpp:7745 #7 0x0000000000c62fa8 in js::gc::GCRuntime::runDebugGC (this=this@entry=0x7f078121a4a8) at js/src/gc/GC.cpp:8337 #8 0x0000000000c63168 in js::gc::GCRuntime::gcIfNeededAtAllocation (this=0x7f078121a4a8, cx=0x7f0781215000) at js/src/gc/Allocator.cpp:317 #9 0x0000000000c6cf59 in js::gc::GCRuntime::checkAllocatorState<(js::AllowGC)1> (this=<optimized out>, cx=0x7f0781215000, kind=<optimized out>) at js/src/gc/Allocator.cpp:278 #10 0x0000000000c6d018 in js::Allocate<JSObject, (js::AllowGC)1> (cx=cx@entry=0x7f0781215000, kind=js::gc::AllocKind::OBJECT4_BACKGROUND, nDynamicSlots=nDynamicSlots@entry=0, heap=heap@entry=js::gc::DefaultHeap, clasp=clasp@entry=0x1cbe280 <js::MappedArgumentsObject::class_>) at js/src/gc/Allocator.cpp:54 #11 0x000000000092d46d in js::NativeObject::create (cx=cx@entry=0x7f0781215000, kind=kind@entry=js::gc::AllocKind::OBJECT4_BACKGROUND, heap=heap@entry=js::gc::DefaultHeap, shape=..., shape@entry=..., group=group@entry=...) at js/src/vm/NativeObject-inl.h:549 #12 0x000000000093013f in js::ArgumentsObject::create<CopyJitFrameArgs> (cx=0x7f0781215000, callee=callee@entry=..., numActuals=0, copy=...) at js/src/vm/ArgumentsObject.cpp:289 #13 0x000000000092b7bd in js::ArgumentsObject::createForIon (cx=<optimized out>, frame=<optimized out>, scopeChain=...) at js/src/vm/ArgumentsObject.cpp:365 #14 0x00003ccc71737077 in ?? () [...] #30 0x0000000000000000 in ?? () rax 0x0 0 rbx 0x7f0781216240 139670207947328 rcx 0x1cea880 30320768 rdx 0x0 0 rsi 0x0 0 rdi 0x7f0781216240 139670207947328 rbp 0x7f078121a4f8 139670207964408 rsp 0x7ffdca599f70 140727998324592 r8 0x7ffdca59a800 140727998326784 r9 0x7ffdca59a7d0 140727998326736 r10 0x1 1 r11 0x246 582 r12 0x7f078121a000 139670207963136 r13 0x7f078121c9a0 139670207973792 r14 0x7f078121a4a8 139670207964328 r15 0x7f078121c640 139670207972928 rip 0x574e44 <js::gc::StoreBuffer::WholeCellBuffer::clear()+132> => 0x574e44 <js::gc::StoreBuffer::WholeCellBuffer::clear()+132>: callq 0xa86ce0 <js::LifoAlloc::freeAll()> 0x574e49 <js::gc::StoreBuffer::WholeCellBuffer::clear()+137>: mov 0x638(%rsp),%rax Not marking s-s because these type of stack-overflow crashes are likely harmless but they might be a stability issue.
Attached file Testcase (obsolete) —
Ccing some GC people to start with. Please ignore the attached testcase, it does not reproduce for me.
Component: JavaScript Engine → JavaScript: GC
Attachment #9003097 - Attachment is obsolete: true
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:bisect]
JSBugMon: Cannot process bug: Error: Failed to isolate test from comment
Whiteboard: [jsbugmon:bisect] → [jsbugmon:]
JSBugMon: Bisection requested, failed due to error: Error: Failed to isolate test from comment
This seems like a general weakness. See the comment https://searchfox.org/mozilla-central/rev/c45b9755593ce6cc91f558b21e544f5165752de7/js/src/jit/CodeGenerator.cpp#5446-5456 It seems like the JIT does not ensure that there will be very much stack space when calling out to C++. Which means it's up to every function called from the JIT to do an over-recursion check on entry. I would think we could do a check whenever making a VM call. nbp suggested doing it in the trampoline code. I could spot-fix this by adding a check in js::Allocate, though that's already a few frames down and we might exhaust the stack before getting there.
Not sure who decides who takes this, or who makes these calls for JS (or GC). Sfink, can you assign a priority or direct us to who could?
Flags: needinfo?(sphink)
I think jandem would be the tech lead here?
Flags: needinfo?(sphink) → needinfo?(jdemooij)
Ted and I spent some time looking at this. Some findings: * This is probably not an actual stack overflow: the value of RSP is > 100 bytes away from a page boundary. * There's a large distance between the values in RBP (+ some other registers) and RSP (+ some other registers). Did RSP get corrupted at some point? * The instruction after the call suggests we have a large stack frame: mov 0x638(%rsp),%rax We should look at other stacks to see if there's a pattern.
Group: javascript-core-security
Clearing NI per comment 8. decoder, if you're still seeing this we should look at more stacks or try to get this in rr :/
Flags: needinfo?(jdemooij)
Flags: needinfo?(choller)
I just double-checked and this happened last in November. It didn't happen too often in general either but I think we can just close this and re-open when I see it pop up again.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(choller)
Resolution: --- → WORKSFORME
Group: javascript-core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: