Closed Bug 580534 Opened 14 years ago Closed 14 years ago

Implement LIR_cmovd

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: gal, Assigned: gal)

References

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(1 file, 7 obsolete files)

patch 14 years ago Andreas Gal :gal 28.62 KB, patch		Details \| Diff \| Splinter Review
patch 14 years ago Andreas Gal :gal 32.82 KB, patch		Details \| Diff \| Splinter Review
patch 14 years ago Andreas Gal :gal 31.11 KB, patch		Details \| Diff \| Splinter Review
patch 14 years ago Andreas Gal :gal 38.86 KB, patch		Details \| Diff \| Splinter Review
patch 14 years ago Andreas Gal :gal 45.47 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review
nanojit patch with lirasm changes 14 years ago Andreas Gal :gal 22.46 KB, patch		Details \| Diff \| Splinter Review
patch 14 years ago Andreas Gal :gal 22.89 KB, patch		Details \| Diff \| Splinter Review
patch 14 years ago Andreas Gal :gal 23.07 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review

Andreas Gal :gal

Assignee

Description

•

14 years ago

I added cmovd for 386 and x64. ARM is still missing. This is a 5% win for a couple SS cases and a mild loss elsewhere (might be noise). I will measure more precisely.

Andreas Gal :gal

Assignee

Comment 1

•

14 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Assignee: general → gal

Andreas Gal :gal

Assignee

Comment 2

•

14 years ago

This isn't as much of a speedup as I was hoping for because guards tend to keep the cmov alive:

a[0] = a[1] for example:

  37: ldq2 = ldq.o addq1[0]
  38: JSVAL_UPPER_EXCL_SHIFTED_TAG_OF_NUMBER_SET = immq 0xfff9000000000000
  39: ltuq1 = ltuq ldq2, JSVAL_UPPER_EXCL_SHIFTED_TAG_OF_NUMBER_SET/*0xfff9000000000000*/
  40: xf4: xf ltuq1 -> pc=0x100515a5a imacpc=0x0 sp+32 rp+0 (GuardID=004)
  41: qasd1 = qasd ldq2
  42: q2i1 = q2i ldq2
  43: i2d1 = i2d q2i1
  44: JSVAL_SHIFTED_TAG_MAX_DOUBLE = immq 0xfff8000000000000
  45: gtuq1 = gtuq ldq2, JSVAL_SHIFTED_TAG_MAX_DOUBLE/*0xfff8000000000000*/
  46: cmovd1 = cmovd gtuq1 ? i2d1 : qasd1
  47: std.s sp[16] = cmovd1
  48: js_Array_dense_setelem1 = calli.sro #js_Array_dense_setelem ( cx $global0 immi2/*1*/ ldq2 )
  49: eqi2 = eqi js_Array_dense_setelem1, immi1/*0*/

Here the std.s sp[16] keeps the cmov alive, even though we successfully look past the unbox/rebox code and use ldq directly in the setelem.

The guard here is the OOM guard on setelem (right after the setelem).

This is a bit of a bummer. Time to sink stores I guess.

Andreas Gal :gal

Assignee

Comment 3

•

14 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Attachment #458969 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Comment 4

•

14 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Attachment #459950 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Updated

•

14 years ago

Attachment #460356 - Flags: review?(nnethercote)

Nicholas Nethercote [inactive]

Comment 6

•

14 years ago

Comment on attachment 460356 [details] [diff] [review]
patch

Even though there's no speed-up, it's nice to get rid of js_UnboxDouble and
js_UnboxInt32.  IIRC there used to be js_TryUnboxInt32, but it seems to have
disappeared.


>-    return resultCanBeImpreciseIfFractional
>-         ? lir->ins1(LIR_d2i, f)
>-         : lir->insCall(&js_DoubleToInt32_ci, &f);
>+    if (resultCanBeImpreciseIfFractional)
>+        return lir->ins1(LIR_d2i, f);
>+    return lir->insCall(&js_DoubleToInt32_ci, &f);

What's wrong with the ternary operator?  This change is unnecessary.


>--- a/js/src/nanojit/LIRopcode.tbl
>+++ b/js/src/nanojit/LIRopcode.tbl
>@@ -240,17 +240,17 @@ OP_64(geuq,     71, Op2,  I,    1)  // u
> OP_UN(72)
> 
> OP___(eqd,      73, Op2,  I,    1)  // double equality
> OP___(ltd,      74, Op2,  I,    1)  // double less-than
> OP___(gtd,      75, Op2,  I,    1)  // double greater-than
> OP___(led,      76, Op2,  I,    1)  // double less-than-or-equal
> OP___(ged,      77, Op2,  I,    1)  // double greater-than-or-equal
> 
>-OP_UN(78)
>+OP___(cmovd,    78, Op3,  D,    1)  // conditional move double

Put cmovd lower down, after cmovq, please.  You'll have to decrement the
opcode number of a bunch of opcodes before it, but keeping a sane opcode
order is more important than minimizing diffs.


I'm getting this on 3d-raytrace.js:

  Assertion failure: bad asm_nongp_copy(xmm1, f0): false (../nanojit/Nativei386.cpp:2545)

Which makes me wonder how extensively you've tested this -- trace-tests
doesn't pass.  If I change the occurrence of 'FpRegs' in asm_cmov() to
'XmmRegs' it does pass.


A big problem is that all this is totally broken on non-SSE2-capable i386
machines.  Run tests with X86_FORCE_SSE2=no to see this.  Such machines
don't have cmov either, so I tried allowing LIR_cmovd to go through
untouched in insChoose(), but then I got an assertion failure in
findVictim().

Luke Wagner [:luke]

Comment 7

•

14 years ago

(In reply to comment #0)
> I added cmovd for 386 and x64. ARM is still missing. This is a 5% win for a
> couple SS cases and a mild loss elsewhere (might be noise). I will measure more
> precisely.

Is this on both x86 and x64?  I agree that its weird for the benefit to be so small; Shark shows more than 1% total SS time spent in js_UnboxDouble.

Andreas Gal :gal

Assignee

Comment 8

•

14 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Attachment #460356 - Attachment is obsolete: true

Attachment #460356 - Flags: review?(nnethercote)

Andreas Gal :gal

Assignee

Updated

•

14 years ago

Attachment #460944 - Flags: review?(nnethercote)

Andreas Gal :gal

Assignee

Updated

•

14 years ago

Blocks: 580752

Nicholas Nethercote [inactive]

Comment 9

•

14 years ago

Comment on attachment 460944 [details] [diff] [review]
patch

Thanks for adding code the non-SSE2 code for LIR_cmovd.  It still is 
broken, though, I get this assertion failure for
trace-test/tests/basic/equalInt.js: 

  Assertion failure: index >= 0, at ../jsarray.cpp:450

and trace-tests hangs shortly after that one, I'm not sure on which test.


>+inline LIns*
>+TraceRecorder::cmov(LIns *cond, LIns *iftrue, LIns *iffalse)
>+{
>+    return lir->insChoose(cond, iftrue, iffalse, iftrue->isD() || avmplus::AvmCore::use_cmov());
>+}

The meaning and effect of the 4th argument is non-obvious.  I'd like a
comment explaining how LIR_cmovd doesn't cause the machine instruction
'cmov' to be used on x86 and so we can always use LIR_cmovd safely.

Alternatively, if it's true that we can now rely on SSE2 being present, the
4th arg can be changed to 'true' and I'd like a comment saying "we assume
SSE2 is present".


>diff --git a/js/src/nanojit/LIRopcode.tbl b/js/src/nanojit/LIRopcode.tbl
>--- a/js/src/nanojit/LIRopcode.tbl
>+++ b/js/src/nanojit/LIRopcode.tbl
>@@ -333,13 +333,15 @@ OP_SF(ii2d,    126, Op2,  D,    1)  // j
> 
> // LIR_hcalli is a hack that's only used on 32-bit platforms that use
> // SoftFloat.  Its operand is always a LIR_calli, but one that specifies a
> // function that returns a double.  It indicates that the double result is
> // returned via two 32-bit integer registers.  The result is always used as the
> // second operand of a LIR_ii2d.
> OP_SF(hcalli,  127, Op1,  I,    1)
> 
>+OP___(cmovd,   128, Op3,  D,    1)  // conditional move double
>+

Nope!  LIR_cmovd needs to be opcode 107, and then all the ones before it
down to 79 need to be decremented by one.


The NJ part will need to be landed separately on nanojit-central first, of
course.  Also, should the NJ part be reviewed by an Adobe person?  I thought
that's what the procedure was for NJ patches written by Mozilla people.

Andreas Gal :gal

Assignee

Comment 10

•

14 years ago

Are you sure you have a clean build? Can you please post what configuration exactly fails? and maybe a stack?

host-5-73:Darwin_DBG.OBJ gal$ cd ..
host-5-73:src gal$ cd trace-test
host-5-73:trace-test gal$ python trace-test.py ../Darwin_DBG.OBJ/js
[600|  0|600] 100% ==================================================>|   16.0s
PASSED ALL
host-5-73:trace-test gal$ python trace-test.py ../Darwin_DBG32/js
[600|  0|600] 100% ==================================================>|   16.6s
PASSED ALL
host-5-73:trace-test gal$ hg qtop
cmovd
host-5-73:trace-test gal$ X86_FORCE_SSE2=1 python trace-test.py ../Darwin_DBG32/js
[600|  0|600] 100% ==================================================>|   16.6s
PASSED ALL
host-5-73:trace-test gal$

Nicholas Nethercote [inactive]

Comment 11

•

14 years ago

> host-5-73:trace-test gal$ X86_FORCE_SSE2=1 python trace-test.py
> ../Darwin_DBG32/js
> [600|  0|600] 100% ==================================================>|   16.6s
> PASSED ALL

You want "X86_FORCE_SSE2=no".

Andreas Gal :gal

Assignee

Comment 12

•

14 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Attachment #460944 - Attachment is obsolete: true

Attachment #460944 - Flags: review?(nnethercote)

Andreas Gal :gal

Assignee

Updated

•

14 years ago

Depends on: 584214

Andreas Gal :gal

Assignee

Comment 13

•

14 years ago

The new version of the patch passes on 386 (FPU and SSE) and x64.

Nicholas Nethercote [inactive]

Updated

•

14 years ago

Attachment #462543 - Flags: review+

Andreas Gal :gal

Assignee

Comment 14

•

14 years ago

Attached patch nanojit patch with lirasm changes (obsolete) — Details — Splinter Review

Attachment #462543 - Attachment is obsolete: true

Andreas Gal :gal

Assignee

Comment 15

•

14 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Attachment #462632 - Attachment is obsolete: true

Attachment #462635 - Flags: review?(nnethercote)

Andreas Gal :gal

Assignee

Comment 16

•

14 years ago

Attached patch patch — Details — Splinter Review

Attachment #462635 - Attachment is obsolete: true

Attachment #462660 - Flags: review?(nnethercote)

Attachment #462635 - Flags: review?(nnethercote)

Nicholas Nethercote [inactive]

Updated

•

14 years ago

Attachment #462660 - Flags: review?(nnethercote) → review+

Nicholas Nethercote [inactive]

Comment 17

•

14 years ago

NJ: http://hg.mozilla.org/projects/nanojit-central/rev/186f3f376d66
TM: http://hg.mozilla.org/tracemonkey/rev/62e0a9c01829

Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey

Robert Sayre

Comment 18

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/62e0a9c01829

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Nicholas Nethercote [inactive]

Comment 19

•

14 years ago

Since this bug got split into two pieces, I opened bug 585525 for the second half.

Edwin Smith

Comment 20

•

14 years ago

TR: http://hg.mozilla.org/tamarin-redux/rev/e2c9cd7aa674

Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin

Nicholas Nethercote [inactive]

Updated

•

14 years ago

Summary: TM: inline UnboxDouble and UnboxInt32 → Implement LIR_cmovd

You need to log in before you can comment on or make changes to this bug.