Closed Bug 1320956 Opened 3 years ago Closed 3 years ago

Wasm baseline: box2d benchmark segfaults on ARM hardware (Jetson TK1)

Categories

(Core :: JavaScript Engine: JIT, defect, P2)

ARM
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: lth, Assigned: lth)

References

Details

Attachments

(2 files)

Hardware: NVIDIA Jetson TK1 (4xTegra) 2GB RAM Ubuntu 14.04 hardfp

Test case: github.com/lars-t-hansen/embenchen/asm_v_wasm/wasm_box2d.js

Mozilla-inbound as of this morning

The JS shell crashes with a segfault in what appears to be generated code, no stack, invalid PC.  Could be anything.

The other benchmarks in that directory all work well and produce the correct output.
The first bug is that the callout path for mod64 in the baseline compiler does not properly read the operands from the instruction stream; this is easily fixed.

The second bug is something to do with float register management (assertion).  Since that is a very knotty matter on ARM it's not surprising that there might be some issues here on the box2D benchmark, which is FP-heavy.

(There might be more problems.)

This suggests we should run embenchen wasm benchmarks on the emulator, maybe?  They don't take forever to run, less than the test suite I would expect, for a single pass.

Also run the wasm benchmarks in a debug build on x86 and x64, probably, ditto.
(In reply to Lars T Hansen [:lth] from comment #1)
> The second bug is something to do with float register management
> (assertion).  Since that is a very knotty matter on ARM it's not surprising
> that there might be some issues here on the box2D benchmark, which is
> FP-heavy.

At first blush this looks like the problem is that takeRegisterIndex(FloatRegister) ignores the 'kind' of the register, which is wrong, becasue that will result in the wrong bits being selected.  On ARM, we can't handle the "code" of a register without always also considering its "kind".
A simple oversight in how we process the input stream, apparently we don't have a test case for this.  No test case included here yet - if the running time isn't too awful I'll probably try to submit the box2D benchmark as the test case, but if that isn't workable I'll create a test before I close this bug.
Attachment #8815851 - Flags: review?(bbouvier)
BTW, this problem repros on emulator, as does the regalloc problem I mentioned earlier.
Comment on attachment 8815851 [details] [diff] [review]
bug1320956-read-operands.patch

Review of attachment 8815851 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +6704,5 @@
>          iter_.readConversion(inType, outType, &unused_a) && \
>              (deadCode_ || doEmit(symbol, inType, outType))
>  
> +#define emitIntDivCallout(doEmit, symbol, type) \
> +        iter_.readBinary(type, &unused_a, &unused_b) && (deadCode_ || (doEmit(symbol, type), true))

nit: > 100 chars?
Attachment #8815851 - Flags: review?(bbouvier) → review+
Blocks: 1321521
https://hg.mozilla.org/integration/mozilla-inbound/rev/df0d879c4b7c6b301a829994aca72630289bdb95
Bug 1320956 - wasm baseline, read operands also when calling out to i64 division. r=bbouvier
(In reply to Lars T Hansen [:lth] from comment #2)
> (In reply to Lars T Hansen [:lth] from comment #1)
> > The second bug is something to do with float register management
> > (assertion).  Since that is a very knotty matter on ARM it's not surprising
> > that there might be some issues here on the box2D benchmark, which is
> > FP-heavy.
> 
> At first blush this looks like the problem is that
> takeRegisterIndex(FloatRegister) ignores the 'kind' of the register, which
> is wrong, becasue that will result in the wrong bits being selected.  On
> ARM, we can't handle the "code" of a register without always also
> considering its "kind".

Spun off as bug 1321521, it's a bigger job than that.
https://hg.mozilla.org/mozilla-central/rev/df0d879c4b7c
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
A second fix: Flush the ARM assembler's constant table before emitting the jump table.  This fixes wasm_box2d, provided there is a fix for the register allocator (forthcoming, see bug 1321521).  See bug 1321521 comment 3 for analysis.

Based on my reading of the assembler code, flushing should do the trick here because it clears the constant pool and what we do in jumpTable() does not create any new entries, so we should not risk another situation like this, but if you know otherwise I'd love to hear about it.
Attachment #8816515 - Flags: review?(bbouvier)
Comment on attachment 8816515 [details] [diff] [review]
bug1320956-flush-constants.patch

Review of attachment 8816515 [details] [diff] [review]:
-----------------------------------------------------------------

I agree with your analysis, thank you for the patch.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +2425,5 @@
>  
> +    void jumpTable(LabelVector& labels, Label* theTable) {
> +#if defined(JS_CODEGEN_ARM)
> +        /* Do not risk interrupting the table with other constants */
> +        masm.flush();

fwiw, i think other archs implement flush() as a no-op, so in theory this needs not be guarded.

@@ +5552,4 @@
>  
>      // Emit indirect jump.  rc is live here.
>  
> +    tableSwitch(&theTable, rc, &dispatchCode);

Should the constant pool be flushed at the end of tableSwitch too? I see a sub there that uses an immediate integer, and there might be at least one way to trigger a constant pool in there...
Attachment #8816515 - Flags: review?(bbouvier) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #10)
> 
> @@ +5552,4 @@
> >  
> >      // Emit indirect jump.  rc is live here.
> >  
> > +    tableSwitch(&theTable, rc, &dispatchCode);
> 
> Should the constant pool be flushed at the end of tableSwitch too? I see a
> sub there that uses an immediate integer, and there might be at least one
> way to trigger a constant pool in there...

In fact, unless it's a *small* immediate integer the instruction sequence generated won't be four bytes long as assumed, so this is an actual bug in the compiler for sufficiently large tables, when the constant has to be loaded from memory or synthesized.  I even suspect that "large" is not all that large: probably 1KB, given how immediates can be encoded on ARM.
The bug becomes visible on two test cases if I add an assertion that the dispatch sequence is 8 bytes:

wasm/spec/br_table.wast.js
asm.js/testControlFlow.js (also see bug 1322397)

Neither of those test cases *failed* before despite this problem because the test cases are mostly trivial -- sad, really.  For the big test in br_table.wast.js it's a 50-50 chance that an error will not be detected at all.
(In reply to Lars T Hansen [:lth] from comment #11)
> (In reply to Benjamin Bouvier [:bbouvier] from comment #10)
> > 
> > @@ +5552,4 @@
> > >  
> > >      // Emit indirect jump.  rc is live here.
> > >  
> > > +    tableSwitch(&theTable, rc, &dispatchCode);
> > 
> > Should the constant pool be flushed at the end of tableSwitch too? I see a
> > sub there that uses an immediate integer, and there might be at least one
> > way to trigger a constant pool in there...
> 
> In fact, unless it's a *small* immediate integer the instruction sequence
> generated won't be four bytes long as assumed, so this is an actual bug in
> the compiler for sufficiently large tables, when the constant has to be
> loaded from memory or synthesized.  I even suspect that "large" is not all
> that large: probably 1KB, given how immediates can be encoded on ARM.

OK, so that's overstated, but a flush is really needed prior to the bind() of the label.  The way the switch code works is that it computes the address of the table by loading the PC and offsetting that value by the distance from the load to the start of the table.  (It needs to add 8 to the offset because the loading of the PC does not load the PC of the mov, but PC+8, but that has nothing to do with the generated code.)  We must flush constant pools before the bind of 'here' because that label carries the address of the load instruction, and nothing must come between the bind and the ma_mov.

I still think wasm/spec/br_table.wast.js could be stronger but I won't pursue that now.
https://hg.mozilla.org/integration/mozilla-inbound/rev/3193092d26a96dddb7f112a08d7277c83d9e8f3e
Bug 1320956 - wasm baseline, do not interrupt tableswitch's jump table with other constants. r=bbouvier
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/3193092d26a9
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.