Closed
Bug 1320956
Opened 8 years ago
Closed 8 years ago
Wasm baseline: box2d benchmark segfaults on ARM hardware (Jetson TK1)
Categories
(Core :: JavaScript Engine: JIT, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla53
Tracking | Status | |
---|---|---|
firefox53 | --- | fixed |
People
(Reporter: lth, Assigned: lth)
References
Details
Attachments
(2 files)
6.32 KB,
patch
|
bbouvier
:
review+
|
Details | Diff | Splinter Review |
2.52 KB,
patch
|
bbouvier
:
review+
|
Details | Diff | Splinter Review |
Hardware: NVIDIA Jetson TK1 (4xTegra) 2GB RAM Ubuntu 14.04 hardfp Test case: github.com/lars-t-hansen/embenchen/asm_v_wasm/wasm_box2d.js Mozilla-inbound as of this morning The JS shell crashes with a segfault in what appears to be generated code, no stack, invalid PC. Could be anything. The other benchmarks in that directory all work well and produce the correct output.
Assignee | ||
Comment 1•8 years ago
|
||
The first bug is that the callout path for mod64 in the baseline compiler does not properly read the operands from the instruction stream; this is easily fixed. The second bug is something to do with float register management (assertion). Since that is a very knotty matter on ARM it's not surprising that there might be some issues here on the box2D benchmark, which is FP-heavy. (There might be more problems.) This suggests we should run embenchen wasm benchmarks on the emulator, maybe? They don't take forever to run, less than the test suite I would expect, for a single pass. Also run the wasm benchmarks in a debug build on x86 and x64, probably, ditto.
Assignee | ||
Comment 2•8 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #1) > The second bug is something to do with float register management > (assertion). Since that is a very knotty matter on ARM it's not surprising > that there might be some issues here on the box2D benchmark, which is > FP-heavy. At first blush this looks like the problem is that takeRegisterIndex(FloatRegister) ignores the 'kind' of the register, which is wrong, becasue that will result in the wrong bits being selected. On ARM, we can't handle the "code" of a register without always also considering its "kind".
Assignee | ||
Comment 3•8 years ago
|
||
A simple oversight in how we process the input stream, apparently we don't have a test case for this. No test case included here yet - if the running time isn't too awful I'll probably try to submit the box2D benchmark as the test case, but if that isn't workable I'll create a test before I close this bug.
Attachment #8815851 -
Flags: review?(bbouvier)
Assignee | ||
Comment 4•8 years ago
|
||
BTW, this problem repros on emulator, as does the regalloc problem I mentioned earlier.
Comment 5•8 years ago
|
||
Comment on attachment 8815851 [details] [diff] [review] bug1320956-read-operands.patch Review of attachment 8815851 [details] [diff] [review]: ----------------------------------------------------------------- Thanks! ::: js/src/wasm/WasmBaselineCompile.cpp @@ +6704,5 @@ > iter_.readConversion(inType, outType, &unused_a) && \ > (deadCode_ || doEmit(symbol, inType, outType)) > > +#define emitIntDivCallout(doEmit, symbol, type) \ > + iter_.readBinary(type, &unused_a, &unused_b) && (deadCode_ || (doEmit(symbol, type), true)) nit: > 100 chars?
Attachment #8815851 -
Flags: review?(bbouvier) → review+
Assignee | ||
Comment 6•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/df0d879c4b7c6b301a829994aca72630289bdb95 Bug 1320956 - wasm baseline, read operands also when calling out to i64 division. r=bbouvier
Assignee | ||
Comment 7•8 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #2) > (In reply to Lars T Hansen [:lth] from comment #1) > > The second bug is something to do with float register management > > (assertion). Since that is a very knotty matter on ARM it's not surprising > > that there might be some issues here on the box2D benchmark, which is > > FP-heavy. > > At first blush this looks like the problem is that > takeRegisterIndex(FloatRegister) ignores the 'kind' of the register, which > is wrong, becasue that will result in the wrong bits being selected. On > ARM, we can't handle the "code" of a register without always also > considering its "kind". Spun off as bug 1321521, it's a bigger job than that.
Comment 8•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/df0d879c4b7c
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox53:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Comment 9•8 years ago
|
||
A second fix: Flush the ARM assembler's constant table before emitting the jump table. This fixes wasm_box2d, provided there is a fix for the register allocator (forthcoming, see bug 1321521). See bug 1321521 comment 3 for analysis. Based on my reading of the assembler code, flushing should do the trick here because it clears the constant pool and what we do in jumpTable() does not create any new entries, so we should not risk another situation like this, but if you know otherwise I'd love to hear about it.
Attachment #8816515 -
Flags: review?(bbouvier)
Comment 10•8 years ago
|
||
Comment on attachment 8816515 [details] [diff] [review] bug1320956-flush-constants.patch Review of attachment 8816515 [details] [diff] [review]: ----------------------------------------------------------------- I agree with your analysis, thank you for the patch. ::: js/src/wasm/WasmBaselineCompile.cpp @@ +2425,5 @@ > > + void jumpTable(LabelVector& labels, Label* theTable) { > +#if defined(JS_CODEGEN_ARM) > + /* Do not risk interrupting the table with other constants */ > + masm.flush(); fwiw, i think other archs implement flush() as a no-op, so in theory this needs not be guarded. @@ +5552,4 @@ > > // Emit indirect jump. rc is live here. > > + tableSwitch(&theTable, rc, &dispatchCode); Should the constant pool be flushed at the end of tableSwitch too? I see a sub there that uses an immediate integer, and there might be at least one way to trigger a constant pool in there...
Attachment #8816515 -
Flags: review?(bbouvier) → review+
Assignee | ||
Comment 11•8 years ago
|
||
(In reply to Benjamin Bouvier [:bbouvier] from comment #10) > > @@ +5552,4 @@ > > > > // Emit indirect jump. rc is live here. > > > > + tableSwitch(&theTable, rc, &dispatchCode); > > Should the constant pool be flushed at the end of tableSwitch too? I see a > sub there that uses an immediate integer, and there might be at least one > way to trigger a constant pool in there... In fact, unless it's a *small* immediate integer the instruction sequence generated won't be four bytes long as assumed, so this is an actual bug in the compiler for sufficiently large tables, when the constant has to be loaded from memory or synthesized. I even suspect that "large" is not all that large: probably 1KB, given how immediates can be encoded on ARM.
Assignee | ||
Comment 12•8 years ago
|
||
The bug becomes visible on two test cases if I add an assertion that the dispatch sequence is 8 bytes: wasm/spec/br_table.wast.js asm.js/testControlFlow.js (also see bug 1322397) Neither of those test cases *failed* before despite this problem because the test cases are mostly trivial -- sad, really. For the big test in br_table.wast.js it's a 50-50 chance that an error will not be detected at all.
Assignee | ||
Comment 13•8 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #11) > (In reply to Benjamin Bouvier [:bbouvier] from comment #10) > > > > @@ +5552,4 @@ > > > > > > // Emit indirect jump. rc is live here. > > > > > > + tableSwitch(&theTable, rc, &dispatchCode); > > > > Should the constant pool be flushed at the end of tableSwitch too? I see a > > sub there that uses an immediate integer, and there might be at least one > > way to trigger a constant pool in there... > > In fact, unless it's a *small* immediate integer the instruction sequence > generated won't be four bytes long as assumed, so this is an actual bug in > the compiler for sufficiently large tables, when the constant has to be > loaded from memory or synthesized. I even suspect that "large" is not all > that large: probably 1KB, given how immediates can be encoded on ARM. OK, so that's overstated, but a flush is really needed prior to the bind() of the label. The way the switch code works is that it computes the address of the table by loading the PC and offsetting that value by the distance from the load to the start of the table. (It needs to add 8 to the offset because the loading of the PC does not load the PC of the mov, but PC+8, but that has nothing to do with the generated code.) We must flush constant pools before the bind of 'here' because that label carries the address of the load instruction, and nothing must come between the bind and the ma_mov. I still think wasm/spec/br_table.wast.js could be stronger but I won't pursue that now.
Assignee | ||
Comment 14•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2dbf4912c2cf
Assignee | ||
Comment 15•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/3193092d26a96dddb7f112a08d7277c83d9e8f3e Bug 1320956 - wasm baseline, do not interrupt tableswitch's jump table with other constants. r=bbouvier
Assignee | ||
Updated•8 years ago
|
Keywords: leave-open
Comment 16•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3193092d26a9
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•