Closed Bug 580534 Opened 10 years ago Closed 9 years ago

Implement LIR_cmovd

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: gal, Assigned: gal)

References

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(1 file, 7 obsolete files)

I added cmovd for 386 and x64. ARM is still missing. This is a 5% win for a couple SS cases and a mild loss elsewhere (might be noise). I will measure more precisely.
Attached patch patch (obsolete) — Splinter Review
Assignee: general → gal
This isn't as much of a speedup as I was hoping for because guards tend to keep the cmov alive:

a[0] = a[1] for example:

  37: ldq2 = ldq.o addq1[0]
  38: JSVAL_UPPER_EXCL_SHIFTED_TAG_OF_NUMBER_SET = immq 0xfff9000000000000
  39: ltuq1 = ltuq ldq2, JSVAL_UPPER_EXCL_SHIFTED_TAG_OF_NUMBER_SET/*0xfff9000000000000*/
  40: xf4: xf ltuq1 -> pc=0x100515a5a imacpc=0x0 sp+32 rp+0 (GuardID=004)
  41: qasd1 = qasd ldq2
  42: q2i1 = q2i ldq2
  43: i2d1 = i2d q2i1
  44: JSVAL_SHIFTED_TAG_MAX_DOUBLE = immq 0xfff8000000000000
  45: gtuq1 = gtuq ldq2, JSVAL_SHIFTED_TAG_MAX_DOUBLE/*0xfff8000000000000*/
  46: cmovd1 = cmovd gtuq1 ? i2d1 : qasd1
  47: std.s sp[16] = cmovd1
  48: js_Array_dense_setelem1 = calli.sro #js_Array_dense_setelem ( cx $global0 immi2/*1*/ ldq2 )
  49: eqi2 = eqi js_Array_dense_setelem1, immi1/*0*/

Here the std.s sp[16] keeps the cmov alive, even though we successfully look past the unbox/rebox code and use ldq directly in the setelem.

The guard here is the OOM guard on setelem (right after the setelem).

This is a bit of a bummer. Time to sink stores I guess.
Attached patch patch (obsolete) — Splinter Review
Attachment #458969 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Attachment #459950 - Attachment is obsolete: true
Attachment #460356 - Flags: review?(nnethercote)
Duplicate of this bug: 578917
Comment on attachment 460356 [details] [diff] [review]
patch

Even though there's no speed-up, it's nice to get rid of js_UnboxDouble and
js_UnboxInt32.  IIRC there used to be js_TryUnboxInt32, but it seems to have
disappeared.


>-    return resultCanBeImpreciseIfFractional
>-         ? lir->ins1(LIR_d2i, f)
>-         : lir->insCall(&js_DoubleToInt32_ci, &f);
>+    if (resultCanBeImpreciseIfFractional)
>+        return lir->ins1(LIR_d2i, f);
>+    return lir->insCall(&js_DoubleToInt32_ci, &f);

What's wrong with the ternary operator?  This change is unnecessary.


>--- a/js/src/nanojit/LIRopcode.tbl
>+++ b/js/src/nanojit/LIRopcode.tbl
>@@ -240,17 +240,17 @@ OP_64(geuq,     71, Op2,  I,    1)  // u
> OP_UN(72)
> 
> OP___(eqd,      73, Op2,  I,    1)  // double equality
> OP___(ltd,      74, Op2,  I,    1)  // double less-than
> OP___(gtd,      75, Op2,  I,    1)  // double greater-than
> OP___(led,      76, Op2,  I,    1)  // double less-than-or-equal
> OP___(ged,      77, Op2,  I,    1)  // double greater-than-or-equal
> 
>-OP_UN(78)
>+OP___(cmovd,    78, Op3,  D,    1)  // conditional move double

Put cmovd lower down, after cmovq, please.  You'll have to decrement the
opcode number of a bunch of opcodes before it, but keeping a sane opcode
order is more important than minimizing diffs.


I'm getting this on 3d-raytrace.js:

  Assertion failure: bad asm_nongp_copy(xmm1, f0): false (../nanojit/Nativei386.cpp:2545)

Which makes me wonder how extensively you've tested this -- trace-tests
doesn't pass.  If I change the occurrence of 'FpRegs' in asm_cmov() to
'XmmRegs' it does pass.


A big problem is that all this is totally broken on non-SSE2-capable i386
machines.  Run tests with X86_FORCE_SSE2=no to see this.  Such machines
don't have cmov either, so I tried allowing LIR_cmovd to go through
untouched in insChoose(), but then I got an assertion failure in
findVictim().
(In reply to comment #0)
> I added cmovd for 386 and x64. ARM is still missing. This is a 5% win for a
> couple SS cases and a mild loss elsewhere (might be noise). I will measure more
> precisely.

Is this on both x86 and x64?  I agree that its weird for the benefit to be so small; Shark shows more than 1% total SS time spent in js_UnboxDouble.
Attached patch patch (obsolete) — Splinter Review
Attachment #460356 - Attachment is obsolete: true
Attachment #460356 - Flags: review?(nnethercote)
Attachment #460944 - Flags: review?(nnethercote)
Blocks: 580752
Comment on attachment 460944 [details] [diff] [review]
patch

Thanks for adding code the non-SSE2 code for LIR_cmovd.  It still is 
broken, though, I get this assertion failure for
trace-test/tests/basic/equalInt.js: 

  Assertion failure: index >= 0, at ../jsarray.cpp:450

and trace-tests hangs shortly after that one, I'm not sure on which test.


>+inline LIns*
>+TraceRecorder::cmov(LIns *cond, LIns *iftrue, LIns *iffalse)
>+{
>+    return lir->insChoose(cond, iftrue, iffalse, iftrue->isD() || avmplus::AvmCore::use_cmov());
>+}

The meaning and effect of the 4th argument is non-obvious.  I'd like a
comment explaining how LIR_cmovd doesn't cause the machine instruction
'cmov' to be used on x86 and so we can always use LIR_cmovd safely.

Alternatively, if it's true that we can now rely on SSE2 being present, the
4th arg can be changed to 'true' and I'd like a comment saying "we assume
SSE2 is present".


>diff --git a/js/src/nanojit/LIRopcode.tbl b/js/src/nanojit/LIRopcode.tbl
>--- a/js/src/nanojit/LIRopcode.tbl
>+++ b/js/src/nanojit/LIRopcode.tbl
>@@ -333,13 +333,15 @@ OP_SF(ii2d,    126, Op2,  D,    1)  // j
> 
> // LIR_hcalli is a hack that's only used on 32-bit platforms that use
> // SoftFloat.  Its operand is always a LIR_calli, but one that specifies a
> // function that returns a double.  It indicates that the double result is
> // returned via two 32-bit integer registers.  The result is always used as the
> // second operand of a LIR_ii2d.
> OP_SF(hcalli,  127, Op1,  I,    1)
> 
>+OP___(cmovd,   128, Op3,  D,    1)  // conditional move double
>+

Nope!  LIR_cmovd needs to be opcode 107, and then all the ones before it
down to 79 need to be decremented by one.


The NJ part will need to be landed separately on nanojit-central first, of
course.  Also, should the NJ part be reviewed by an Adobe person?  I thought
that's what the procedure was for NJ patches written by Mozilla people.
Are you sure you have a clean build? Can you please post what configuration exactly fails? and maybe a stack?

host-5-73:Darwin_DBG.OBJ gal$ cd ..
host-5-73:src gal$ cd trace-test
host-5-73:trace-test gal$ python trace-test.py ../Darwin_DBG.OBJ/js
[600|  0|600] 100% ==================================================>|   16.0s
PASSED ALL
host-5-73:trace-test gal$ python trace-test.py ../Darwin_DBG32/js
[600|  0|600] 100% ==================================================>|   16.6s
PASSED ALL
host-5-73:trace-test gal$ hg qtop
cmovd
host-5-73:trace-test gal$ X86_FORCE_SSE2=1 python trace-test.py ../Darwin_DBG32/js
[600|  0|600] 100% ==================================================>|   16.6s
PASSED ALL
host-5-73:trace-test gal$
> host-5-73:trace-test gal$ X86_FORCE_SSE2=1 python trace-test.py
> ../Darwin_DBG32/js
> [600|  0|600] 100% ==================================================>|   16.6s
> PASSED ALL

You want "X86_FORCE_SSE2=no".
Attached patch patch (obsolete) — Splinter Review
Attachment #460944 - Attachment is obsolete: true
Attachment #460944 - Flags: review?(nnethercote)
Depends on: 584214
The new version of the patch passes on 386 (FPU and SSE) and x64.
Attachment #462543 - Flags: review+
Attachment #462543 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Attachment #462632 - Attachment is obsolete: true
Attachment #462635 - Flags: review?(nnethercote)
Attached patch patchSplinter Review
Attachment #462635 - Attachment is obsolete: true
Attachment #462660 - Flags: review?(nnethercote)
Attachment #462635 - Flags: review?(nnethercote)
Attachment #462660 - Flags: review?(nnethercote) → review+
http://hg.mozilla.org/mozilla-central/rev/62e0a9c01829
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Since this bug got split into two pieces, I opened bug 585525 for the second half.
TR: http://hg.mozilla.org/tamarin-redux/rev/e2c9cd7aa674
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Summary: TM: inline UnboxDouble and UnboxInt32 → Implement LIR_cmovd
You need to log in before you can comment on or make changes to this bug.