Closed Bug 580515 Opened 10 years ago Closed 9 years ago
_cmovd mishandled with X86 _FORCE _SSE2=no
Nanojit's i386 back-end has two modes. You can avoid generating SSE code (ie. use the x87 stack only for FP computation) by specifying the env var X86_FORCE_SSE2=no. I'm probably the only person who tests this configuration regularly. The following three trace-tests are failing X86_FORCE_SSE2=no debug32/js -j -e "const platform='linux2'; const libdir='/home/njn/moz/ws0/js/src/trace-test/lib/';" -f /home/njn/moz/ws0/js/src/trace-test/lib/prolog.js -f /home/njn/moz/ws0/js/src/trace-test/tests/closures/set-outer-trace-3.js X86_FORCE_SSE2=no debug32/js -j -e "const platform='linux2'; const libdir='/home/njn/moz/ws0/js/src/trace-test/lib/';" -f /home/njn/moz/ws0/js/src/trace-test/lib/prolog.js -f /home/njn/moz/ws0/js/src/trace-test/tests/closures/t004.js X86_FORCE_SSE2=no debug32/js -j -e "const platform='linux2'; const libdir='/home/njn/moz/ws0/js/src/trace-test/lib/';" -f /home/njn/moz/ws0/js/src/trace-test/lib/prolog.js -f /home/njn/moz/ws0/js/src/trace-test/tests/closures/t027.js The failure is this: Assertion failure: (_allocator.active[FST0] && _fpuStkDepth == -1) || (!_allocator.active[FST0] && _fpuStkDepth == 0) (../nanojit/Assembler.cpp:352) It could well be a latent bug in the i386 backend that fatvals has exposed.
(In reply to comment #0) > It could well be a latent bug in the i386 backend that fatvals has exposed. That error is unlike anything I've ever seen caused by the tracer. Do you know what situation it is asserting?
(In reply to comment #1) > > That error is unlike anything I've ever seen caused by the tracer. Probably because you've never run with X86_FORCE_SSE2=no before. > Do you know > what situation it is asserting? In x87 mode the backend is really simplistic, it only ever pushes a single FP value onto the x87 stack at a time. Sometimes you have to pop what's on the stack explicitly, and sometimes you consume it naturally. There's 'bool pop' parameters in various places in Nativei386.cpp, eg. see the first arg of FSTQ. It also bleeds into Assembler.cpp, esp. see prepareResultReg() which contains a big comment about this stuff. What the assertion means is that something has gone wrong with the stack management -- eg. it's trying to push a second value on the stack, or pop an empty stack (I'm not sure which). The _fpuStkDepth variable is how the back-end tracks how many elements the stack has. I've never quite understood how it works, though.
I heard on IRC that we can assume SSE2 for Firefox 4.0. In which case this bug is moot. Shaver, can you confirm this?
The three tests mentioned in comment 0 are no longer failing. wmaddox fixed up some problems in the x87 assertion checking, that's probably what fixed it. We do have one new failure: X86_FORCE_SSE2=no /home/njn/moz/ws0/js/src/debug32/shell/js -j -e "const p latform='linux2'; const libdir='/home/njn/moz/ws0/js/src/trace-test/lib/';" -f /home/njn/moz/ws0/js/src/trace-test/lib/prolog.js -f /home/njn/moz/ws0/js/src/trace-test/tests/basic/test586387.js /home/njn/moz/ws0/js/src/trace-test/tests/basic/test586387.js:14: Error: Assertion failed: got 37, expected 496 A regression range is needed.
For the bug 586387 test: The first bad revision is: changeset: 48822:e866db9165ef user: Luke Wagner <firstname.lastname@example.org> date: Tue Aug 03 22:06:44 2010 -0700 summary: Bug 584158 - ensure that typed arrays cannot produce non-canonical nans (r=gal)
(In reply to comment #5) > > The first bad revision is: > changeset: 48822:e866db9165ef > user: Luke Wagner <email@example.com> > date: Tue Aug 03 22:06:44 2010 -0700 > summary: Bug 584158 - ensure that typed arrays cannot produce non-canonical > nans (r=gal) This revision added uses of LIR_cmovd, so I think it very likely that LIR_cmovd isn't handled properly on i386 without SSE2.
Summary: TM: tests failing with X86_FORCE_SSE2=no due to fatvals → TM: LIR_cmovd mishandled with X86_FORCE_SSE2=no
Here's a smaller test case: var v = new Float32Array(32); var t = 0; for (var i = 0; i < v.length; ++i) t += v[i]; print(t); Run normally, it prints zero. Run with X86_FORCE_SSE2=no it prints a bogus value, often -Infinity, often a very large negative double. Here's the relevant LIR and native code: ldf2d1 = ldf2d.tdata addi1 ...... fld32 0(eax) eqd1 = eqd ldf2d1, ldf2d1 immd1 = immd nan cmovd1 = cmovd eqd1 ? ldf2d1 : immd1/*nan*/ ...... fld f0 ...... fcompp ...... fnstsw_ax ...... test ah, 68 ...... fldq -40(ebp) <= restore cmovd1 ...... mov eax,-28(ebp) <= restore eos ...... jnp ...... ...... ffree f0 ...... fincstp ...... fldq (......) The LIR is from TM's canonicalizeNaNs(). It compares ldf2d1 to itself (a NaN) test. If it fails (ie. it's a NaN), it replaces it with a canonical NaN value. I haven't yet worked out the problem with the generated code, my x87 knowledge is weak.
Status: NEW → ASSIGNED
(In reply to comment #7) > > eqd1 = eqd ldf2d1, ldf2d1 > immd1 = immd nan > cmovd1 = cmovd eqd1 ? ldf2d1 : immd1/*nan*/ > ...... fld f0 > ...... fcompp > ...... fnstsw_ax > ...... test ah, 68 > ...... fldq -40(ebp) <= restore cmovd1 > ...... mov eax,-28(ebp) <= restore eos > ...... jnp ...... > ...... ffree f0 > ...... fincstp > ...... fldq (......) The |fldq -40(ebp)| looks wrong. When generating code backwards we have these steps: - After the cmovd1 instruction, cmovd1 is in FST0. - When we get back to the comparison part (in asm_cmpd()) we need to use FST0 to hold ldf2d1 to do the comparison. - So we have to "spill" cmovd1 out of FST0; because we're going backwards, this generates a restore (the |fldq -40(ebp)|) - The problem is that -40(ebp), the spill slot assigned for cmovd1, doesn't hold something meaningful at this point. We're restoring garbage, because we're restoring cmovd1 before it's even been computed! Man, I hate x87 code. Not sure what the fix is.
BTW, I think the |ffree f0; fincstp| sequence can be changed to just |fstp f0|. It just needs to pop the stack.
blocking2.0: --- → ?
This ended up being a bit more complicated than I thought it would. The best explanation of what this patch does comes from the following comment I added to asm_cmov(): // The obvious way to handle this is as follows: // // mov rr, rt # only needed if rt is live afterwards // do comparison // jt end // mov rr, rf // end: // // The problem with this is that doing the comparison can cause // registers to be evicted, possibly including 'rr', which holds // 'ins'. And that screws things up. So instead we do this: // // do comparison // mov rr, rt # only needed if rt is live afterwards // jt end // mov rr, rf // end: // // Putting the 'mov' between the comparison and the jump is ok // because move instructions don't modify the condition codes. The "obvious way" was how things were done prior to the patch. The rest of the patch is just a lot of moving stuff around to allow this to happen. In particular: - I had to allow separation of conditional tests and conditional branches, which required adding asm_branch_helper() et al. - In NativeX64.cpp, this required operand swapping performed for some double comparisons to be done in asm_cmpd() instead of asm_branchd() and asm_condd(). This makes the X64 back-end more like the i386 back-end, which I think is a good thing. - In Nativei386.cpp, I used the commonly-used FSTP(FST0) instruction to pop the stack instead of the unusual FINCSTP()/FFREE() sequence. This allowed FINCSTP() and FFREE() to be removed. - I added a lirasm test. Without the patch, it asserts when in non-SSE2 mode. The failing test mentioned in comment 4 now works, too.
Attachment #493853 - Flags: review?(edwsmith)
Comment on attachment 493853 [details] [diff] [review] patch (agasint TM 58013:2000c21b7910) Tamarin doesn't currently use cmovd, so I haven't tested on the sandbox. Bug 606561 actually worked around the x87 bug; we can re-try a cmovd version of Math.abs() once this lands.
Attachment #493853 - Flags: review?(edwsmith) → review+
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.