Closed
Bug 1485327
Opened 7 years ago
Closed 6 years ago
Various GC crashes due to stack overflow
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox63 | --- | wontfix |
People
(Reporter: decoder, Unassigned)
Details
(4 keywords, Whiteboard: [jsbugmon:])
Crash Data
Attachments
(1 obsolete file)
I keep seeing GC-related crashes due to stack overflow (stack space exhaustion). Unfortunately, I can never reproduce any of these crashes, probably because they require GCing in exactly the right almost-overrecursed state.
Here is one of the better examples from mozilla-central revision 77433149bfdc (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --disable-debug --enable-optimize, run with --fuzzing-safe --cpu-count=2 --disable-oom-functions --execute=gcstress=1 --ion-warmup-threshold=100 --no-native-regexp --ion-offthread-compile=off --no-sse4 --spectre-mitigations=on --no-wasm-ion --ion-eager):
Backtrace:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000574e44 in js::gc::StoreBuffer::WholeCellBuffer::clear (this=this@entry=0x7f078121ca08) at js/src/gc/StoreBuffer.cpp:156
#1 0x0000000000576ecf in js::gc::StoreBuffer::clear (this=0x7f078121c9a0) at js/src/gc/StoreBuffer.cpp:88
#2 0x0000000000c7ed44 in js::Nursery::doCollection (this=this@entry=0x7f078121c640, reason=reason@entry=JS::gcreason::DEBUG_GC, tenureCounts=...) at js/src/gc/Nursery.cpp:956
#3 0x0000000000c7f51c in js::Nursery::collect (this=this@entry=0x7f078121c640, reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/Nursery.cpp:752
#4 0x0000000000c43a25 in js::gc::GCRuntime::minorGC (this=this@entry=0x7f078121a4a8, reason=reason@entry=JS::gcreason::DEBUG_GC, phase=phase@entry=js::gcstats::PhaseKind::EVICT_NURSERY_FOR_MAJOR_GC) at js/src/gc/GC.cpp:7968
#5 0x0000000000c619d0 in js::gc::GCRuntime::gcCycle (this=this@entry=0x7f078121a4a8, nonincrementalByAPI=nonincrementalByAPI@entry=false, budget=..., reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/GC.cpp:7522
#6 0x0000000000c61f5d in js::gc::GCRuntime::collect (this=this@entry=0x7f078121a4a8, nonincrementalByAPI=nonincrementalByAPI@entry=false, budget=..., reason=reason@entry=JS::gcreason::DEBUG_GC) at js/src/gc/GC.cpp:7745
#7 0x0000000000c62fa8 in js::gc::GCRuntime::runDebugGC (this=this@entry=0x7f078121a4a8) at js/src/gc/GC.cpp:8337
#8 0x0000000000c63168 in js::gc::GCRuntime::gcIfNeededAtAllocation (this=0x7f078121a4a8, cx=0x7f0781215000) at js/src/gc/Allocator.cpp:317
#9 0x0000000000c6cf59 in js::gc::GCRuntime::checkAllocatorState<(js::AllowGC)1> (this=<optimized out>, cx=0x7f0781215000, kind=<optimized out>) at js/src/gc/Allocator.cpp:278
#10 0x0000000000c6d018 in js::Allocate<JSObject, (js::AllowGC)1> (cx=cx@entry=0x7f0781215000, kind=js::gc::AllocKind::OBJECT4_BACKGROUND, nDynamicSlots=nDynamicSlots@entry=0, heap=heap@entry=js::gc::DefaultHeap, clasp=clasp@entry=0x1cbe280 <js::MappedArgumentsObject::class_>) at js/src/gc/Allocator.cpp:54
#11 0x000000000092d46d in js::NativeObject::create (cx=cx@entry=0x7f0781215000, kind=kind@entry=js::gc::AllocKind::OBJECT4_BACKGROUND, heap=heap@entry=js::gc::DefaultHeap, shape=..., shape@entry=..., group=group@entry=...) at js/src/vm/NativeObject-inl.h:549
#12 0x000000000093013f in js::ArgumentsObject::create<CopyJitFrameArgs> (cx=0x7f0781215000, callee=callee@entry=..., numActuals=0, copy=...) at js/src/vm/ArgumentsObject.cpp:289
#13 0x000000000092b7bd in js::ArgumentsObject::createForIon (cx=<optimized out>, frame=<optimized out>, scopeChain=...) at js/src/vm/ArgumentsObject.cpp:365
#14 0x00003ccc71737077 in ?? ()
[...]
#30 0x0000000000000000 in ?? ()
rax 0x0 0
rbx 0x7f0781216240 139670207947328
rcx 0x1cea880 30320768
rdx 0x0 0
rsi 0x0 0
rdi 0x7f0781216240 139670207947328
rbp 0x7f078121a4f8 139670207964408
rsp 0x7ffdca599f70 140727998324592
r8 0x7ffdca59a800 140727998326784
r9 0x7ffdca59a7d0 140727998326736
r10 0x1 1
r11 0x246 582
r12 0x7f078121a000 139670207963136
r13 0x7f078121c9a0 139670207973792
r14 0x7f078121a4a8 139670207964328
r15 0x7f078121c640 139670207972928
rip 0x574e44 <js::gc::StoreBuffer::WholeCellBuffer::clear()+132>
=> 0x574e44 <js::gc::StoreBuffer::WholeCellBuffer::clear()+132>: callq 0xa86ce0 <js::LifoAlloc::freeAll()>
0x574e49 <js::gc::StoreBuffer::WholeCellBuffer::clear()+137>: mov 0x638(%rsp),%rax
Not marking s-s because these type of stack-overflow crashes are likely harmless but they might be a stability issue.
Reporter | ||
Comment 1•7 years ago
|
||
Reporter | ||
Comment 2•7 years ago
|
||
Ccing some GC people to start with. Please ignore the attached testcase, it does not reproduce for me.
Component: JavaScript Engine → JavaScript: GC
Reporter | ||
Updated•7 years ago
|
Attachment #9003097 -
Attachment is obsolete: true
Updated•7 years ago
|
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:bisect]
Comment 3•7 years ago
|
||
JSBugMon: Cannot process bug: Error: Failed to isolate test from comment
Updated•7 years ago
|
Whiteboard: [jsbugmon:bisect] → [jsbugmon:]
Comment 4•7 years ago
|
||
JSBugMon: Bisection requested, failed due to error: Error: Failed to isolate test from comment
Comment 5•7 years ago
|
||
This seems like a general weakness. See the comment https://searchfox.org/mozilla-central/rev/c45b9755593ce6cc91f558b21e544f5165752de7/js/src/jit/CodeGenerator.cpp#5446-5456
It seems like the JIT does not ensure that there will be very much stack space when calling out to C++. Which means it's up to every function called from the JIT to do an over-recursion check on entry.
I would think we could do a check whenever making a VM call. nbp suggested doing it in the trampoline code.
I could spot-fix this by adding a check in js::Allocate, though that's already a few frames down and we might exhaust the stack before getting there.
Comment 6•7 years ago
|
||
Not sure who decides who takes this, or who makes these calls for JS (or GC). Sfink, can you assign a priority or direct us to who could?
Flags: needinfo?(sphink)
Updated•7 years ago
|
Comment 7•7 years ago
|
||
I think jandem would be the tech lead here?
Flags: needinfo?(sphink) → needinfo?(jdemooij)
Comment 8•7 years ago
|
||
Ted and I spent some time looking at this. Some findings:
* This is probably not an actual stack overflow: the value of RSP is > 100 bytes away from a page boundary.
* There's a large distance between the values in RBP (+ some other registers) and RSP (+ some other registers). Did RSP get corrupted at some point?
* The instruction after the call suggests we have a large stack frame:
mov 0x638(%rsp),%rax
We should look at other stacks to see if there's a pattern.
Group: javascript-core-security
Updated•7 years ago
|
Comment 9•7 years ago
|
||
Clearing NI per comment 8.
decoder, if you're still seeing this we should look at more stacks or try to get this in rr :/
Flags: needinfo?(jdemooij)
Updated•6 years ago
|
Flags: needinfo?(choller)
Reporter | ||
Comment 10•6 years ago
|
||
I just double-checked and this happened last in November. It didn't happen too often in general either but I think we can just close this and re-open when I see it pop up again.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(choller)
Resolution: --- → WORKSFORME
Updated•5 years ago
|
Group: javascript-core-security
You need to log in
before you can comment on or make changes to this bug.
Description
•