Closed Bug 1258654 Opened 8 years ago Closed 8 years ago

Crash [@ ??] with ASan signal handler not catching segmentation fault

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox48 --- wontfix
firefox49 --- fixed

People

(Reporter: decoder, Assigned: bbouvier)

References

Details

(Keywords: crash, regression, testcase)

Attachments

(2 files)

The attached binary WebAssembly testcase crashes on mozilla-inbound revision 078bf91ed20a+ (build with --enable-gczeal --enable-optimize --enable-debug --enable-address-sanitizer --without-intl-api --enable-posix-nspr-emulation --disable-jemalloc --disable-tests --enable-debug). To reproduce, you can run the following code in the JS shell:

var data = os.file.readFile(file, 'binary');
Wasm.instantiateModule(new Uint8Array(data.buffer));



I don't have a backtrace (neither in GDB nor in ASan). ASan simply segfaults with this test rather than calling its SIGSEGV handler. This could either be a bug in ASan (unlikely) or a bug in our code (e.g. not calling the proper signal handler after the asm.js signal handler). It would be very important to fix the issue with ASan here to unbreak automation.
Attached file Testcase
As stated on irc:
- couldn't reproduce locally, with a non ASAN build.
- could reproduce with optimized ASAN build without debug symbols provided by decoder.
- it's an intermittent issue, reproducing locally 95% of the time.
- decoder discussed with ASAN people, it might be that we consume all stack, so a missing stack overflow check somewhere.

I need to try with an ASAN debug build locally.
Flags: needinfo?(bbouvier)
I can reproduce in a debug ASAN build. This is not specific to ASAN, but ASAN shows it more easily because each call takes more stack space.

Very likely to be a stack exhaustion issue: with --no-threads, the issue does not reproduce. Without it, I get a random segfault in EmitExpr. Looking at the backtrace, there are >1000 frames, all of them being alternating EmitExpr/EmitUnary<MAsmJSNeg> (the text format of this test case would like at least 500 minus signs side by side).

We can't just use JS_CHECK_RECURSION_* because we don't have a cx on the compiler threads. I've tried a workaround where we store the recursion limit on the IonCompileTask, then pass it to FunctionCompiler, then use it in EmitExpr in JS_CHECK_RECURSION_LIMIT_DONT_REPORT with this stack limit value. This doesn't work too, because apparently the recursion limit given by GetStackNativeLimit is still too high for an helper thread?

Anyways, this will be fixed by postorder decoding (where we can have more control on the stack space consumed at a given expression), so I will just make this bug dependent on the postorder encoding instead.
Depends on: 1259295
Flags: needinfo?(bbouvier)
Ah, I bet the reason this only observable with wasm is that, with asm.js, you wouldn't be able to get such a deeply-nested AST to parse; in effect the parser's recursion checks are subsuming the ion helper threads'.  With wasm, the decoder is super-simple so it's able to fit more frames.

The real fix, which is coming in hopefully a week or two, is to just switch to iterative postorder (no recursion).  Since wasm isn't released, I'd recommend waiting until that.
(In reply to Benjamin Bouvier [:bbouvier] from comment #3)
> I've tried a workaround where we store the recursion limit
> on the IonCompileTask, then pass it to FunctionCompiler, then use it in
> EmitExpr in JS_CHECK_RECURSION_LIMIT_DONT_REPORT with this stack limit
> value. This doesn't work too, because apparently the recursion limit given
> by GetStackNativeLimit is still too high for an helper thread?

The limit stored is an absolute pointer, not depth, so if you grab it from the validator thread, it won't be valid in an ion helper thread :)
Attached patch testcase.patchSplinter Review
Now that postorder has landed, I thought we should re-test this. Unfortunately, the test case here uses I32.neg, which is asm.js specific (and should have been disallowed by the decoder in the first place, I think).

But we can still make a test case with F32.neg. This one crashes before the postorder patch with an ASAN debug build on x64, and it doesn't after the postorder patch, so we can close as WFM. Actually, let's see if this test can pass on the infra (and not timeout, in particular):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=528ff89acbce
Assignee: nobody → bbouvier
Status: NEW → ASSIGNED
And this try run (with wasmEvalText instead of wasmEval, sigh) shows that the test could be checked in. Let's do it.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6fddad24369
Attachment #8747635 - Flags: review?(luke)
Comment on attachment 8747635 [details] [diff] [review]
testcase.patch

Great!
Attachment #8747635 - Flags: review?(luke) → review+
https://hg.mozilla.org/mozilla-central/rev/20c1fbedf311
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: