Crash [@ js::jit::ExtractLinearSum] with stack space exhaustion
Categories
(Core :: JavaScript Engine, defect, P5)
Tracking
()
People
(Reporter: decoder, Assigned: kanishk509, Mentored)
References
Details
(4 keywords, Whiteboard: [bugmon:confirmed])
Crash Data
Attachments
(3 files)
The following testcase crashes on mozilla-central revision 2bf86657a448 (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --disable-debug --enable-optimize, run with --fuzzing-safe --cpu-count=2 --ion-offthread-compile=off --ion-gvn=off):
See attachment.
Backtrace:
received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff5d01700 (LWP 19042)]
0x0000555555eb1fa2 in js::jit::ExtractLinearSum (ins=ins@entry=0x7ffff5f7b218, space=space@entry=js::jit::MathSpace::Modulo) at js/src/jit/IonAnalysis.cpp:3428
#0 0x0000555555eb1fa2 in js::jit::ExtractLinearSum (ins=ins@entry=0x7ffff5f7b218, space=space@entry=js::jit::MathSpace::Modulo) at js/src/jit/IonAnalysis.cpp:3428
#1 0x0000555555eb204c in js::jit::ExtractLinearSum (ins=ins@entry=0x7ffff5f7b328, space=space@entry=js::jit::MathSpace::Modulo) at js/src/jit/IonAnalysis.cpp:3461
[...]
#126 0x0000555555eb204c in js::jit::ExtractLinearSum (ins=ins@entry=0x7ffff5f85738, space=space@entry=js::jit::MathSpace::Modulo) at js/src/jit/IonAnalysis.cpp:3461
#127 0x0000555555eb204c in js::jit::ExtractLinearSum (ins=0x7ffff5f85888, space=js::jit::MathSpace::Modulo) at js/src/jit/IonAnalysis.cpp:3461
rax 0x7ffff5f7b2a0 140737320039072
rbx 0x7ffff5f7b328 140737320039208
rcx 0x0 0
rdx 0x0 0
rsi 0x0 0
rdi 0x7ffff5f7b218 140737320038936
rbp 0x7ffff5f7b218 140737320038936
rsp 0x7ffff5b05000 140737315360768
r8 0x0 0
r9 0x7ffff4500000 140737292271616
r10 0x0 0
r11 0x0 0
r12 0x7ffff5cfe900 140737317431552
r13 0x4000 16384
r14 0x0 0
r15 0x7ffff5f7b2a0 140737320039072
rip 0x555555eb1fa2 <js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace)+2>
=> 0x555555eb1fa2 <js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace)+2>: push %r14
0x555555eb1fa4 <js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace)+4>: push %r13
Reporter | ||
Comment 1•5 years ago
|
||
Comment 2•5 years ago
|
||
Steve, can you triage this and find someone to look into this regression crash?
Comment 3•5 years ago
|
||
Iain, do you think you could look at this regression fuzz bug? I believe Ted already talk to you about it.
Comment 5•5 years ago
|
||
This testcase builds a WASM function whose body is "0 + 0 + 0 + 0 ... + 0", with 65536 0's:
var lfLogBuffer = `
(function testBigOffset() {
var builder = new WasmModuleBuilder();
let body = [kExprI32Const, 0, kExprI32Add];
while (body.length <= 65536) body = body.concat(body);
body.unshift(kExprI32Const, 0);
body.push(kExprUnreachable);
builder.addFunction('main', kSig_v_v).addBody(body).exportFunc("bar");
try {
builder.instantiate().exports.main();
} catch (e) {}
})();
`;
During Ion compilation, we reach FoldLinearArithConstants, which tries to analyze this addition. It calls ExtractLinearSum, which tries to recurse over what is essentially a 65K element linked list. We run out of stack space and segfault.
This only happens with global value numbering disabled. With GVN enabled, we compile successfully. (Pre-GVN, the .gv output for iongraph is 10MB. Post-GVN, it is 482 bytes. I haven't been able to build PDFs to see exactly what changed, but I think the numbers speak for themselves.)
It would be nice to rewrite ExtractLinearSum to avoid the recursion, but I don't think it's a high priority, given that this only occurs on huge inputs, and only when we've tied one hand behind our back.
I think this would be a good first bug for somebody to fix.
Comment 6•5 years ago
|
||
P5, setting as fix-optional for 67 so as to not remove it from our weekly triage meeting.
Updated•5 years ago
|
Comment hidden (obsolete) |
Comment 8•5 years ago
|
||
It would be astonishing if that were the cause of this bug.
autobisectjs shows this is probably related to the following changeset:
The first bad revision is:
changeset: https://hg.mozilla.org/mozilla-central/rev/b2b72530f293
user: Brian Hackett
date: Wed Feb 22 06:45:05 2017 -0700
summary: Bug 1341326 - Set stack limit and stack size properly for helper threads, r=jandem.
Brian, is bug 1341326 a likely regressor?
Comment 10•5 years ago
|
||
(In reply to Gary Kwong [:gkw] [:nth10sd] - clearing bugmail from comment #9)
autobisectjs shows this is probably related to the following changeset:
The first bad revision is:
changeset: https://hg.mozilla.org/mozilla-central/rev/b2b72530f293
user: Brian Hackett
date: Wed Feb 22 06:45:05 2017 -0700
summary: Bug 1341326 - Set stack limit and stack size properly for helper threads, r=jandem.Brian, is bug 1341326 a likely regressor?
No, that bug just sets the stack limit for helper threads.
Comment 11•5 years ago
|
||
Bulk change for all regression bugs with status-firefox67 as 'fix-optional' to be marked 'affected' for status-firefox68.
Updated•5 years ago
|
Comment 12•5 years ago
|
||
JSBugMon: The testcase found in this bug no longer reproduces (tried revision 3d5cd10cb1b2).
Comment 14•5 years ago
|
||
Bulk change to wontfix for 68 (P3+ carryover with needinfo).
Comment 15•5 years ago
|
||
Decoder: Based on comment 12 above, can we close this bug?
Reporter | ||
Comment 16•5 years ago
|
||
No, this bug is still active. It is likely that the testcase in comment 0 broke due to some unrelated change that affected stack size. But the problem still exists.
Updated•4 years ago
|
Comment 17•4 years ago
|
||
BugMon: Bug remains reproducible on 884162af76f5
Updated•4 years ago
|
Comment 18•4 years ago
|
||
Bugmon Analysis:
Unable to reproduce bug on mozilla-central 20200312154222-e230c3be11df.
The bug appears to have been fixed in the following build range:
> Start: df3fda80bbeb71d4e1b10e8d205d740b2f959d9e (20190325155230)
> End: 3d5cd10cb1b20c1f83a189fbeb2e22f470bac0ec (20190325155340)
> Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=df3fda80bbeb71d4e1b10e8d205d740b2f959d9e&tochange=3d5cd10cb1b20c1f83a189fbeb2e22f470bac0ec
Removing bugmon keyword as no further action possible.
Please review the bug and re-add the keyword for further analysis.
Reporter | ||
Comment 19•4 years ago
|
||
This is an automated crash issue comment: Summary: Crash [@ js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace)] Build version: mozilla-central revision 20200310-c7766d0b4a12 Build flags: (buildFlags not available) Runtime options: --fuzzing-safe --cpu-count=2 --ion-offthread-compile=off --baseline-eager --ion-gvn=off Testcase: See attachment. Backtrace: received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7ffff5dff700 (LWP 2335)] 0x0000555555f17cec in js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace) () #0 0x0000555555f17cec in js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace) () #1 0x0000555555f17d97 in js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace) () #2 0x0000555555f17d97 in js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace) () #3 0x0000555555f17d97 in js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace) () #4 0x0000555555f17d97 in js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace) () #5 0x0000555555f17d97 in js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace) () #6 0x0000555555f17d97 in js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace) () #7 0x0000555555f17d97 in js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace) () #8 0x0000555555f17d97 in js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace) () [...] #127 0x0000555555f17d97 in js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace) () rax 0x7ffff56e5248 140737311035976 rbx 0x7ffff56e52c0 140737311036096 rcx 0x180000 1572864 rdx 0x17f040 1568832 rsi 0x0 0 rdi 0x7ffff56e51d0 140737311035856 rbp 0x7ffff5c03020 140737316401184 rsp 0x7ffff5c03000 140737316401152 r8 0x6900 26880 r9 0x0 0 r10 0x0 0 r11 0x200 512 r12 0x7ffff56e51d0 140737311035856 r13 0x7ffff56e5248 140737311035976 r14 0x0 0 r15 0x0 0 rip 0x555555f17cec <js::jit::ExtractLinearSum(js::jit::MDefinition*, js::jit::MathSpace)+12> => 0x555555f17cec <_ZN2js3jit16ExtractLinearSumEPNS0_11MDefinitionENS0_9MathSpaceE+12>: push %rbx 0x555555f17ced <_ZN2js3jit16ExtractLinearSumEPNS0_11MDefinitionENS0_9MathSpaceE+13>: push %rax
Reporter | ||
Comment 20•4 years ago
|
||
Reporter | ||
Comment 21•4 years ago
|
||
This bug is still active and causing me some trouble in conjunction with WebAssembly. Apparently it triggers easily when large WebAssembly bodies are combined with --ion-gvn=off
.
Jan, should we stop fuzzing with --ion-gvn=off
for now?
Updated•4 years ago
|
Comment 23•4 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #21)
This bug is still active and causing me some trouble in conjunction with WebAssembly. Apparently it triggers easily when large WebAssembly bodies are combined with
--ion-gvn=off
.Jan, should we stop fuzzing with
--ion-gvn=off
for now?
I'd prefer not doing that - turning off GVN makes certain code paths much harder to reach and it also means code will start depending on GVN always running which isn't great.
I'll try to look at this soon. Maybe we could manually limit the recursion depth as a simple short-term fix for fuzzing.
Comment 24•4 years ago
|
||
Hi; I'm trying to see if I can get my head around fixes to pick one up.
Are there unit tests for https://searchfox.org/mozilla-central/source/js/src/jit/IonAnalysis.cpp#3311 which could help me get an understanding of its current behavior?
Is SimpleLinearSum only looking if it can replace the MDefinition with a constant?
If SimpleLinearSum.constant == 0 does that mean it was not possible to extract the SimpleLinearSum, so the term will be evaluated?
Updated•4 years ago
|
Comment 25•4 years ago
|
||
iain, you're flagged as mentor on this one, cf comment 24.
Comment 26•4 years ago
•
|
||
Hi Brent! Thanks for taking a look at this!
There are no unit tests for most of SM. It's hard to split the interesting parts JS execution up into tidy testable units. Once you have a fix, you can test it using the jit-test suite. Instructions for building and testing SM are here. (I would recommend testing an optimized debug build.)
Is SimpleLinearSum only looking if it can replace the MDefinition with a constant?
Close. ExtractLinearSum
is trying to replace the MDefinition with the sum of an MDefinition and a constant. So if you had an MDefinition representing 7 + foo() + 5 - 3
, ExtractLinearSum
would simplify it to foo() + 9
. SimpleLinearSum
is the type we use to represent this result. It contains an arbitrary MDefinition (an MCall of foo
in this example) and a constant (9 in this case).
If SimpleLinearSum.constant == 0 does that mean it was not possible to extract the SimpleLinearSum, so the term will be evaluated?
The term will always be evaluated unless the entire expression can be folded to a constant, in which case term
will be nullptr
.
The problem in this bug happens when we are calling ExtractLinearSum
on an MDefinition that represents an extremely large sequence of 1 + 1 + 1 + ...
. Because ExtractLinearSum
calls itself recursively, we end up overflowing the stack and crashing. Normally this kind of expression is cleaned up before we get to this point by another optimization called Global Value Numbering. However, turning off GVN can be useful while fuzzing SpiderMonkey. The goal of this bug is to avoid crashing here.
There are a few ways we could do this. The easiest might be to keep track of the depth of the recursion and just return ins + 0
when that limit is exceeded. (The most complete fix would be to change this from a recursive algorithm to an iterative one using a work queue, but that might be overkill for a problem that only occurs when we are tying one hand behind our back.)
Does that help? Let me know if you have any additional questions!
Assignee | ||
Comment 27•4 years ago
|
||
I would like to work on this if nobody else is actively doing so.
keep track of the depth of the recursion and just return ins + 0 when that limit is exceeded.
I see that it crashes for a depth of 65536. I'm not sure what the expected available stack size is for this environment, so can you suggest an appropriate depth limit I should set?
Comment 28•4 years ago
|
||
Hi! Thanks for working on this.
You are correct that the stack size varies from environment to environment. Instead of hardcoding a limit, we have checkRecursionLimit
to test whether we are close to overflowing the stack. Because you will be handling the potential overflow immediately, the specific variant you probably want to use here is checkRecursionLimitDontReport, which won't report a stack overflow error to the JSContext.
Let me know if you need any more help!
Assignee | ||
Comment 29•4 years ago
|
||
Hi Iain, thanks for the help.
bool CheckRecursionLimitDontReport(JSContext* cx)
CheckRecursionLimitDontReport
requires a pointer to a JSContext, but ExtractLinearSum
does not take that as an input. How can we deduce the stack space and stack limits without a JSContext? Do we need to add that as a parameter to ExtractLinearSum
and pass a JSContext pointer wherever we are calling it?
Comment 30•4 years ago
|
||
Ha! An exceptionally good question. That'll teach me to reply without rereading ExtractLinearSum
first.
Passing a JSContext*
around is not really feasible here, unfortunately. Compilation can run offthread without access to a JSContext. My previous response was too clever. The real solution is probably to use a hardcoded limit, which means we have to come up with a number.
In practice, when GVN is not disabled, I would be surprised if the depth hit double digits on most code. I would try something like 100, then add a temporary MOZ_ASSERT
and run the jit-tests to verify that we don't hit the limit normally. (Use mach jit-test --ion
to trigger more aggressive stress-testing of the JITs.) If 100 hits the limit, bump it up to 500 or 1000. (The exact number shouldn't matter much. Real code should almost never hit this limit.)
Sorry about the previous bad advice. Good luck!
Updated•4 years ago
|
Assignee | ||
Comment 31•4 years ago
|
||
Updated•4 years ago
|
Assignee | ||
Comment 32•4 years ago
|
||
Hey Iain,
I logged the recursion depths and found that jit-test --ion
hits a max recursion depth of 33.
I've set the SAFE_RECURSION_LIMIT
to 100
. I've defined this constant inside the ExtractLinearSum
function body, let me know if I should put it somewhere else.
Submitted the patch for review.
Comment 34•4 years ago
|
||
Pushed by iireland@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/77e417379092 Set maximum recursion limit for ExtractLinearSum. r=iain
Comment 35•4 years ago
|
||
bugherder |
Updated•4 years ago
|
Description
•