Closed
Bug 580534
Opened 14 years ago
Closed 14 years ago
Implement LIR_cmovd
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: gal, Assigned: gal)
References
Details
(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)
Attachments
(1 file, 7 obsolete files)
23.07 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
I added cmovd for 386 and x64. ARM is still missing. This is a 5% win for a couple SS cases and a mild loss elsewhere (might be noise). I will measure more precisely.
Assignee | ||
Comment 1•14 years ago
|
||
Assignee: general → gal
Assignee | ||
Comment 2•14 years ago
|
||
This isn't as much of a speedup as I was hoping for because guards tend to keep the cmov alive: a[0] = a[1] for example: 37: ldq2 = ldq.o addq1[0] 38: JSVAL_UPPER_EXCL_SHIFTED_TAG_OF_NUMBER_SET = immq 0xfff9000000000000 39: ltuq1 = ltuq ldq2, JSVAL_UPPER_EXCL_SHIFTED_TAG_OF_NUMBER_SET/*0xfff9000000000000*/ 40: xf4: xf ltuq1 -> pc=0x100515a5a imacpc=0x0 sp+32 rp+0 (GuardID=004) 41: qasd1 = qasd ldq2 42: q2i1 = q2i ldq2 43: i2d1 = i2d q2i1 44: JSVAL_SHIFTED_TAG_MAX_DOUBLE = immq 0xfff8000000000000 45: gtuq1 = gtuq ldq2, JSVAL_SHIFTED_TAG_MAX_DOUBLE/*0xfff8000000000000*/ 46: cmovd1 = cmovd gtuq1 ? i2d1 : qasd1 47: std.s sp[16] = cmovd1 48: js_Array_dense_setelem1 = calli.sro #js_Array_dense_setelem ( cx $global0 immi2/*1*/ ldq2 ) 49: eqi2 = eqi js_Array_dense_setelem1, immi1/*0*/ Here the std.s sp[16] keeps the cmov alive, even though we successfully look past the unbox/rebox code and use ldq directly in the setelem. The guard here is the OOM guard on setelem (right after the setelem). This is a bit of a bummer. Time to sink stores I guess.
Assignee | ||
Comment 3•14 years ago
|
||
Attachment #458969 -
Attachment is obsolete: true
Assignee | ||
Comment 4•14 years ago
|
||
Attachment #459950 -
Attachment is obsolete: true
Assignee | ||
Updated•14 years ago
|
Attachment #460356 -
Flags: review?(nnethercote)
Comment 6•14 years ago
|
||
Comment on attachment 460356 [details] [diff] [review] patch Even though there's no speed-up, it's nice to get rid of js_UnboxDouble and js_UnboxInt32. IIRC there used to be js_TryUnboxInt32, but it seems to have disappeared. >- return resultCanBeImpreciseIfFractional >- ? lir->ins1(LIR_d2i, f) >- : lir->insCall(&js_DoubleToInt32_ci, &f); >+ if (resultCanBeImpreciseIfFractional) >+ return lir->ins1(LIR_d2i, f); >+ return lir->insCall(&js_DoubleToInt32_ci, &f); What's wrong with the ternary operator? This change is unnecessary. >--- a/js/src/nanojit/LIRopcode.tbl >+++ b/js/src/nanojit/LIRopcode.tbl >@@ -240,17 +240,17 @@ OP_64(geuq, 71, Op2, I, 1) // u > OP_UN(72) > > OP___(eqd, 73, Op2, I, 1) // double equality > OP___(ltd, 74, Op2, I, 1) // double less-than > OP___(gtd, 75, Op2, I, 1) // double greater-than > OP___(led, 76, Op2, I, 1) // double less-than-or-equal > OP___(ged, 77, Op2, I, 1) // double greater-than-or-equal > >-OP_UN(78) >+OP___(cmovd, 78, Op3, D, 1) // conditional move double Put cmovd lower down, after cmovq, please. You'll have to decrement the opcode number of a bunch of opcodes before it, but keeping a sane opcode order is more important than minimizing diffs. I'm getting this on 3d-raytrace.js: Assertion failure: bad asm_nongp_copy(xmm1, f0): false (../nanojit/Nativei386.cpp:2545) Which makes me wonder how extensively you've tested this -- trace-tests doesn't pass. If I change the occurrence of 'FpRegs' in asm_cmov() to 'XmmRegs' it does pass. A big problem is that all this is totally broken on non-SSE2-capable i386 machines. Run tests with X86_FORCE_SSE2=no to see this. Such machines don't have cmov either, so I tried allowing LIR_cmovd to go through untouched in insChoose(), but then I got an assertion failure in findVictim().
Comment 7•14 years ago
|
||
(In reply to comment #0) > I added cmovd for 386 and x64. ARM is still missing. This is a 5% win for a > couple SS cases and a mild loss elsewhere (might be noise). I will measure more > precisely. Is this on both x86 and x64? I agree that its weird for the benefit to be so small; Shark shows more than 1% total SS time spent in js_UnboxDouble.
Assignee | ||
Comment 8•14 years ago
|
||
Attachment #460356 -
Attachment is obsolete: true
Attachment #460356 -
Flags: review?(nnethercote)
Assignee | ||
Updated•14 years ago
|
Attachment #460944 -
Flags: review?(nnethercote)
Comment 9•14 years ago
|
||
Comment on attachment 460944 [details] [diff] [review] patch Thanks for adding code the non-SSE2 code for LIR_cmovd. It still is broken, though, I get this assertion failure for trace-test/tests/basic/equalInt.js: Assertion failure: index >= 0, at ../jsarray.cpp:450 and trace-tests hangs shortly after that one, I'm not sure on which test. >+inline LIns* >+TraceRecorder::cmov(LIns *cond, LIns *iftrue, LIns *iffalse) >+{ >+ return lir->insChoose(cond, iftrue, iffalse, iftrue->isD() || avmplus::AvmCore::use_cmov()); >+} The meaning and effect of the 4th argument is non-obvious. I'd like a comment explaining how LIR_cmovd doesn't cause the machine instruction 'cmov' to be used on x86 and so we can always use LIR_cmovd safely. Alternatively, if it's true that we can now rely on SSE2 being present, the 4th arg can be changed to 'true' and I'd like a comment saying "we assume SSE2 is present". >diff --git a/js/src/nanojit/LIRopcode.tbl b/js/src/nanojit/LIRopcode.tbl >--- a/js/src/nanojit/LIRopcode.tbl >+++ b/js/src/nanojit/LIRopcode.tbl >@@ -333,13 +333,15 @@ OP_SF(ii2d, 126, Op2, D, 1) // j > > // LIR_hcalli is a hack that's only used on 32-bit platforms that use > // SoftFloat. Its operand is always a LIR_calli, but one that specifies a > // function that returns a double. It indicates that the double result is > // returned via two 32-bit integer registers. The result is always used as the > // second operand of a LIR_ii2d. > OP_SF(hcalli, 127, Op1, I, 1) > >+OP___(cmovd, 128, Op3, D, 1) // conditional move double >+ Nope! LIR_cmovd needs to be opcode 107, and then all the ones before it down to 79 need to be decremented by one. The NJ part will need to be landed separately on nanojit-central first, of course. Also, should the NJ part be reviewed by an Adobe person? I thought that's what the procedure was for NJ patches written by Mozilla people.
Assignee | ||
Comment 10•14 years ago
|
||
Are you sure you have a clean build? Can you please post what configuration exactly fails? and maybe a stack? host-5-73:Darwin_DBG.OBJ gal$ cd .. host-5-73:src gal$ cd trace-test host-5-73:trace-test gal$ python trace-test.py ../Darwin_DBG.OBJ/js [600| 0|600] 100% ==================================================>| 16.0s PASSED ALL host-5-73:trace-test gal$ python trace-test.py ../Darwin_DBG32/js [600| 0|600] 100% ==================================================>| 16.6s PASSED ALL host-5-73:trace-test gal$ hg qtop cmovd host-5-73:trace-test gal$ X86_FORCE_SSE2=1 python trace-test.py ../Darwin_DBG32/js [600| 0|600] 100% ==================================================>| 16.6s PASSED ALL host-5-73:trace-test gal$
Comment 11•14 years ago
|
||
> host-5-73:trace-test gal$ X86_FORCE_SSE2=1 python trace-test.py
> ../Darwin_DBG32/js
> [600| 0|600] 100% ==================================================>| 16.6s
> PASSED ALL
You want "X86_FORCE_SSE2=no".
Assignee | ||
Comment 12•14 years ago
|
||
Attachment #460944 -
Attachment is obsolete: true
Attachment #460944 -
Flags: review?(nnethercote)
Assignee | ||
Comment 13•14 years ago
|
||
The new version of the patch passes on 386 (FPU and SSE) and x64.
Updated•14 years ago
|
Attachment #462543 -
Flags: review+
Assignee | ||
Comment 14•14 years ago
|
||
Attachment #462543 -
Attachment is obsolete: true
Assignee | ||
Comment 15•14 years ago
|
||
Attachment #462632 -
Attachment is obsolete: true
Attachment #462635 -
Flags: review?(nnethercote)
Assignee | ||
Comment 16•14 years ago
|
||
Attachment #462635 -
Attachment is obsolete: true
Attachment #462660 -
Flags: review?(nnethercote)
Attachment #462635 -
Flags: review?(nnethercote)
Updated•14 years ago
|
Attachment #462660 -
Flags: review?(nnethercote) → review+
Comment 17•14 years ago
|
||
NJ: http://hg.mozilla.org/projects/nanojit-central/rev/186f3f376d66 TM: http://hg.mozilla.org/tracemonkey/rev/62e0a9c01829
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey
Comment 18•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/62e0a9c01829
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 19•14 years ago
|
||
Since this bug got split into two pieces, I opened bug 585525 for the second half.
Comment 20•14 years ago
|
||
TR: http://hg.mozilla.org/tamarin-redux/rev/e2c9cd7aa674
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Updated•14 years ago
|
Summary: TM: inline UnboxDouble and UnboxInt32 → Implement LIR_cmovd
You need to log in
before you can comment on or make changes to this bug.
Description
•