Closed Bug 477779 Opened 15 years ago Closed 6 years ago

Add LIR_ned opcode and use for double->bool conversion

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P3)

x86_64
All
defect

Tracking

(Not tracked)

RESOLVED WONTFIX
Future

People

(Reporter: brbaker, Unassigned)

References

Details

(Whiteboard: PACMAN must-fix-candidate)

Attachments

(4 files, 2 obsolete files)

Comparing MIR vs LIR on windows 64 (only platform that has both 64bit versions of the jit) shows a performance increase across the board for LIR except for 2 testcases. 

canaries/simpleflexapp.as
MIR: 327.8 
LIR: 403.1
%diff: -18.7%

misc/primes.as
MIR: 12731.1
LIR: 14720.1
%diff: -13.5%

These results were repeatable every time I ran the tests.
Flags: flashplayer-triage+
Flags: flashplayer-qrb?
Blocks: 477785
Out of curiosity, what's the typical performance increase on the other cases?
~5%
Blocks: 478870
No longer blocks: 477785
Blocks: 481413
No longer blocks: 478870
No longer blocks: 481413
Target Milestone: --- → flash10.x
Assignee: nobody → edwsmith
Flags: flashplayer-qrb? → flashplayer-qrb+
Priority: -- → P3
Priority: P3 → P4
Status: NEW → ASSIGNED
for coerce_bool <double>, MIR emits:

    MIR_ne(d, 0.0)   =>  ucomisd, setne

and nanojit emits:

    LIR_calli doubleToBool(d)  => push (d), call  (plus spillage)

the MIR_ne instruction has no analog in LIR; MIR_ne actually implements the ecma double->boolean conversion correctly in two instructions.  It is not the opposite of MIR_eq, and is only used for double->boolean conversions!
implements slightly different (and simpler) logic than feq

             fune    feq
unordered    0       0
equal        0       1
less         1       0
greater      1       0

which means it's suitable for double->bool:

X            fune(X,0.0)
NaN          0
+0.0         0
-0.0         0
otherwise    1

posting for prototyping and early feedback.  LIR_fune might be an obscure instruction.  LIR_f2b (float to bool) would be more consise and less obscure, but would also hardcode the operand of 0.0.

Net speedup is 11-12% on primes on X64 (mac)
testing simpleflexapp.abc:

$AVM -Ojit simpleflexapputil/avmglue_abc simpleflexapputil/hello_frame1_abc simpleflexapputil/hello_frame2_abc simpleflexapp.abc |grep metric

MIR: metric time 218
LIR: metric time 109
=> 50% speedup

testing primes.abc

$AVM -Ojit primes.abc

MIR: metric time 10452
LIR: metric time 9704
=> 7% speedup
In retrospect, the 'u' distinction between fune and feq seemed unimportant.  all the float comparisons (including fne) return false for unordered values.

- added LIns::isFcmp() (write things once).
- SSE2 x86 and x64 implemented so far.

directional feedback?  the main alternative to LIR_fne, would be LIR_f2b.  (f2b(x) == fne(x,0.0)).  f2b is the only thing we need for now, but it feels too special purpose and I fear we'd just need to add fne later.  on the other hand, why be overly general?  opinions welcome.

ARM note:  on ARM, FCMPDZ apparently corresponds exactly to LIR_f2b, so perhaps f2b is not too special purpose.  (ARM fne => FCMPD).  on other cpu's, implementing f2b in the backend would require a temp scratch register to hold the value 0.0; fne is thus more friendly to register allocation by making the immediate 0.0 visible in the IR.
Attachment #417606 - Attachment is obsolete: true
Attachment #417743 - Flags: review?(nnethercote)
Comment on attachment 417743 [details] [diff] [review]
adds new LIR_fne (float not equal), use it for double->bool

Looks good, although you'll have to split into NJ and TR parts.

I wonder if this will be useful for TM -- Gal, do we have a doubleToBool builtin?  I couldn't find one.

Question:  what is MIR?
Attachment #417743 - Flags: review?(nnethercote) → review+
(In reply to comment #7)
> Question:  what is MIR?

Err... a Soviet Space Station?

(Actually, I think it was an acronym for Macromedia Intermediate Representation... basically, the JIT that FP9/FP10 used, pre-nanojit)
I searched tracemonkey for a while, found js_ValueToBoolean() (which corresponds to tamarin's AvmCore::boolean()), but never did find the code that traces it.  but i'd be surprised if it wasn't traced, somewhere.  

it's apparently invoked by POP_BOOLEAN macro in jsinterp, if that's any help.
I think we do the conversion in some C code. Never really appeared in any benchmarks for us (I think, I might be wrong) as a hot spot.
(In reply to comment #7)
> (From update of attachment 417743 [details] [diff] [review])
> Looks good, although you'll have to split into NJ and TR parts.

yeah, i'm going to do the ppc32/64, arm, and sparc ports first.  the asm_fcmp cleanup you did in the x86 backend made that one easy going.

the benchmark in question is tamarin-redux/test/performance/misc/primes.as, you could rip off the type annotations and try it in TM if interested.

if tracemonkey manages to make the mod in primes.as always be an int mod, then double->bool probably never shows up hot.
Yeah, we speculate aggressively on integer modulo. This might show up in the double slow path that gets hit in case of an overflow.
Suprisingly, this patch actually makes primes.as slower on PPC.  Posting here so we don't lose it, but instead of spending more time on it, I plan to just add a flag and only use LIR_fne on backends that have it.  initially, sparc and ppc won't have it, so we'll just use the function helper there.

on x86, x64, and ARM, LIR_fne provides about 10% speedup on primes.as
Using LIR_eq(cond,0) is equivalent to LIR_xor(cond,1), but ExprFilter knows how to optimize branches using eq(x,0) by inverting the sense of the branch, which results in better optimization across the board.

in the PPC backend, quad constants were not 8-aligned, this patch is a fix.
Attachment #419033 - Flags: review?(rreitmai)
Attachment #419033 - Flags: review?(stejohns)
Attachment #419033 - Flags: review?(stejohns) → review+
Attachment #419033 - Flags: review?(rreitmai) → review+
Comment on attachment 419033 [details] [diff] [review]
Fix alignment bug in PPC asm_quad(), use eq0 instead of xor for OP_not

pushed http://hg.mozilla.org/tamarin-redux/rev/1b59cdfc8946
Attachment #419033 - Attachment is obsolete: true
Marking this closed/fixed now on the assumption that TM does not plan to use LIR_fne immediately and thus doesn't need to track this bug in the near term.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
TM ends up taking all the NJ commits, even the ones that don't really affect TM, and we like to have the full paper trail between Bugzilla and the repositories... so I'll reopen this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: fixed-in-nanojit, fixed-in-tamarin
http://hg.mozilla.org/tracemonkey/rev/408f0bf6fb1c
Whiteboard: fixed-in-nanojit, fixed-in-tamarin → fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey
Target Milestone: flash10.1 → flash10.2
I don't know how i missed this, but the above LIR_fne patch has never landed; what landed were some precursor patches.  Good thing it was reopened!
Whiteboard: fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey, PACMAN
clearing whiteboard, updated subject line for clarity.  The scope of this bug is to add a "fne" opcode with the truth table from comment #4 (which happily matches the truth table for ToBool(double) from ECMAScript.
Summary: x64 LIR is slower than x64 MIR on 2 performance testcases, needs investigation → Add LIR_fne opcode and use for double->bool conversion
Whiteboard: fixed-in-nanojit, fixed-in-tamarin, fixed-in-tracemonkey, PACMAN → PACMAN
FWIW, I have a patch to add LIR_nei and LIR_neq that I've been thinking about resurrecting.  Not having LIR_ne* always felt strange to me.
(In reply to comment #22)
> FWIW, I have a patch to add LIR_nei and LIR_neq that I've been thinking about
> resurrecting.  Not having LIR_ne* always felt strange to me.

If the condition is feeding a branch, LIR_eqi and LIR_eqq plus jt/jf cover all the cases.  But, LIR_nei/neq are better when you want the 0/1 value of the condition.  so, +1 for completeness, we'd use those if available.
Blocks: 552542
Assignee: edwsmith → wsharp
OS: Windows Vista → All
Priority: P4 → P3
From primes.as:

function prime(n:int) {
	var m : int = n;
	var limit : int = n/2;
	for (var i : int = 2; i<=limit;++i) {
		if(!(m%i)) {
			return false;
		}
	}
	return true;
}

It would be great to have enough info to do an integer modulo there and completely avoid the (int,int)->(double,double)->modulo->negate->doubleToBool roundtrip.  (Outside the scope of this LIR_fne work item)
Add LIR_ned support for i386, i386 w/o SSE, x64 and ARM.  NJ_NED_SUPPORTED controls whether fast path is supported on various platforms.
Attachment #452048 - Flags: superreview?(edwsmith)
Attachment #452048 - Flags: review?(nnethercote)
Comment on attachment 452048 [details] [diff] [review]
Updated patch for latest code (i386, i386 w/o SSE, x64, ARM)

one bug in this patch.  disregard.
Attachment #452048 - Flags: superreview?(edwsmith)
Attachment #452048 - Flags: review?(nnethercote)
Attached patch updated patchSplinter Review
fix last patch which incorrectly used NJ_JTBL_SUPPORTED instead of NJ_NED_SUPPORTED
Attachment #452254 - Flags: superreview?(edwsmith)
Attachment #452254 - Flags: review?(nnethercote)
Comment on attachment 452254 [details] [diff] [review]
updated patch

Werner, can you add this to your ~/.hgrc file?

    [defaults]
    diff=-p -U 8

It gives 8 lines of context instead of 4, which makes patches easier to review.

Also, do you know about committing to the nanojit-central repository?  See 
https://developer.mozilla.org/en/NanojitMerge for details.


>@@ -2507,6 +2510,7 @@
>     static double FASTCALL muld(double a, double b) { return a * b; }
>     static double FASTCALL divd(double a, double b) { return a / b; }
>     static int32_t FASTCALL eqd(double a, double b) { return a == b; }
>+    static int32_t FASTCALL ned(double a, double b) { return a == a && a != b; }

I think this is wrong.  "NaN != 0" should give true.



>diff -r 7c9acb06a202 nanojit/LIRopcode.tbl
>--- a/nanojit/LIRopcode.tbl	Thu Jun 17 15:51:37 2010 +0200
>+++ b/nanojit/LIRopcode.tbl	Fri Jun 18 06:53:01 2010 -0700
>@@ -244,8 +244,7 @@
> OP___(gtd,      75, Op2,  I,    1)  // double greater-than
> OP___(led,      76, Op2,  I,    1)  // double less-than-or-equal
> OP___(ged,      77, Op2,  I,    1)  // double greater-than-or-equal
>-
>-OP_UN(78)
>+OP___(ned,		78, Op2,  I,    1)  // double inequality (unordered|equal => false; else true)

I'd prefer 'ned' to follow just after 'eqd'.  This will require shifting the
others down, but that's ok.  You may need get some static assertions because
some of the comparisons must have even opcodes, some must have odd, this is
for the x^1 inversion, I can't remember the exact details.  So you may
need to shift the opcodes around some more.

Hmm, there are some static assertions regarding the order of the comparison
operations in LIR.h around line 104.  They'll need to be updated too.


>     NIns* Assembler::asm_branchd(bool onFalse, LIns *cond, NIns *target) {
>         LOpcode condop = cond->opcode();
>         NIns *patch;
>         LIns *a = cond->oprnd1();
>         LIns *b = cond->oprnd2();
>-        if (condop == LIR_eqd) {
>+        if (condop == LIR_ned) {
>+            // unordered and equal are "false", otherwise "true" (no need to check parity bit)
>+            if (onFalse) {
>+                JE(0, target);
>+            } else {
>+                JNE(0, target);
>+            }
>+            patch = _nIns;
>+        }
>+        else if (condop == LIR_eqd) {
>             if (onFalse) {
>                 // branch if unordered or !=
>                 JP(16, target);     // underrun of 12 needed, round up for overhang --> 16

Can you put the 'eqd' code before the 'ned' code?



>+                // LIR_ned:
>+                //   ucomisd       ZPC   outcome (SETNE/JNE succeeds if Z==0)
>+                //   -------       ---   -------
>+                //   UNORDERED     111   SETNE/JNE fails
>+                //   EQUAL         100   SETNE/JNE fails
>+                //   GREATER_THAN  000   SETNE/JNE succeeds
>+                //   LESS_THAN     001   SETNE/JNE succeeds

As above, AIUI the UNORDERED case should succeed.  Eg. "NaN != 0" gives
true.  This problem sinks the patch as it stands.


I'll file a follow-up bug for LIR_nei/LIR_neq, they'd be good to have as
well for completeness.
Attachment #452254 - Flags: review?(nnethercote) → review-
the truth table I'm going for with fne (now named ned) is:

comparison    eqd      ned
----------    ---      ---
x == y        true     false
x < y         false    true
x > y         false    true
unordered     false    false

This is what x86/arm do naturally, and its what works for ecma-compatible double->bool conversion.  It is not symmetric with eqd; but to me that's its attraction, since otherwise it'd be easy to just flip the sense of jt/jf used with eqd.

(however for nei/neq, they should be symmetric with eqi/eqq).

(in retrospect, we should write down these tables for the LIR float-comparison instructions somewhere in LIR.h or LIRopcodes.h).
Summary: Add LIR_fne opcode and use for double->bool conversion → Add LIR_ned opcode and use for double->bool conversion
(In reply to comment #30)

fwiw, i'd be open to a better name for an opcode with the above truth table.  the way the proposed truth table makes sense to me, and why its not the inverse of eqd, is if you read "ned" as "<>"... i.e (x <> y).   in that light, (NaN <> 0) = false since (x <> y) => (x < y || x > y) =>  (NaN < 0 || NaN > 0) => false||false.
If what you want is really double-to-bool conversion, then this should be a unary op called LIR_d2b.  We can add LIR_ned (the inverse of LIR_eqd) as part of bug 573416.

So what are the semantics of LIR_d2b?  I guess 0.0, -0.0, and NaN convert to 0 and everything else to 1?
We usually stay away from adding LIR opcodes for things that have a specific JS semantics.
(In reply to comment #32)
> If what you want is really double-to-bool conversion, then this should be a
> unary op called LIR_d2b.  We can add LIR_ned (the inverse of LIR_eqd) as part
> of bug 573416.
> 
> So what are the semantics of LIR_d2b?  I guess 0.0, -0.0, and NaN convert to 0
> and everything else to 1?

Correct.  I considered tradeoffs of f2b in comment #4 and comment #6, decided to go with a binary op to simplify the backend implementations and make the instruction less special purpose.  (Only ARM has a FCMPZ instruction with an implicit 0.0 argument). I'm not opposed to a LIR_ned that's the inverse of eqd, but I have no need for it either.

I dont feel strongly either way; d2b and ned both will serve the need just fine.  the apparent issues are backend complexity and LIR semantics. here's how i'd summarize:

argument against special puropse ops: "support LIR_ned, maybe under a different name."

argument against over generalization: "support LIR_d2b".

In support of the first argument, here's another proposal:

1. Change LIR_ned's name to LIR_gld meaning "greater or less than", i.e. "x<>y".  this more accurately reflects its truth table since NaN is not greater or less than anything (gld(NaN,x)=false).  its inverse would be 'ngld' if we ever wanted it.

2. Inverses.  Inverses of the floating point operators get a prepended "n".  neqd, nltd, etc.  These are strictly computed as !(op).  Thus, nltd(x,y) != ged(x,y), since the unordered cases are different.  For integer operations, LIR_nlti == LIR_gti, so we could optionally add those as aliases.  LIR_nei/q would be new, there's no existing equivalent.
I'm happier if it's not called LIR_ned.

Re the over/undergeneralization question:  under what circumstances would you want the 2nd operand to be something other than 0.0?

Also, would isCmp() be true for LIR_gld?  In my mind there are six obvious comparison operators, ==/!=/</>/<=/>= and going beyond that is a bit weird.


(In reply to comment #33)
>
> We usually stay away from adding LIR opcodes for things that have a specific JS
> semantics.

Gal, which opcode has JS-specific semantics?
(In reply to comment #35)
> I'm happier if it's not called LIR_ned.
> 
> Re the over/undergeneralization question:  under what circumstances would you
> want the 2nd operand to be something other than 0.0?

I can't think of any, my only concrete use case is with 0.0.  A guard for div or mod (testing for nonzero/non-nan divisor) might use this opcode, but also would still use 0.0 on the RHS.  I can imagine cases when, mathematically, you might want to compare X with a constant and exclude NaN, however since its not possible in C/Java/JS, there's not much justification for it.

if our optimizers were able to recognize the pattern (X == X && X != C)  or (!isNaN(X) && X != C) then they could use this single opcode as part of a strength reduction.  That's pretty far fetched, but it is a code pattern we could look for in real code, if we wanted to search for more use cases.  doing so would stray deeply into "solution looking for a problem" territory.

> Also, would isCmp() be true for LIR_gld?  In my mind there are six obvious
> comparison operators, ==/!=/</>/<=/>= and going beyond that is a bit weird.

Yes, I think isCmp(LIR_gld) would be true.  NaN's make floating point comparisons weird, for sure.  that there's two variants of inequality doesn't bother me though.

> (In reply to comment #33)
> >
> > We usually stay away from adding LIR opcodes for things that have a specific JS
> > semantics.
> 
> Gal, which opcode has JS-specific semantics?

I think he meant that a LIR_d2b would have JS-specific semantics.  If we went with that opcode, we should just specify its behavior as a truth table, with the happy coincidence that it matched JS semantics for ToBool(Number).

Sounds like you're advocating LIR_d2b(x) more strongly than I'm advocating LIR_gld(x,0.0).  That works for me.  There's time for more discussion/opinions, too; this isn't an urgent bug.
(In reply to comment #36)
> > Also, would isCmp() be true for LIR_gld?  In my mind there are six obvious
> > comparison operators, ==/!=/</>/<=/>= and going beyond that is a bit weird.

Strictly speaking, there are 12 comparison operators for floats, due to NaN; the six above, and six more with inverted truth tables (prefixed with 'n' probably), which treat NaN the opposite way.  For ints, the 12 collapse down to 6 since half of the truth tables work out the same, because of no NaN.
Attachment #452254 - Flags: superreview?(edwsmith)
status?
Patch needs updating for the LIR_d2b change and the feedback from Nick.  Also, it did not work on ARM for some reason so that needs investigation after I get set up wth some ARM hardware.
Per team mtg, removing my name from bugs I'm not actively working on.
Assignee: wsharp → nobody
Assignee: nobody → rreitmai
Whiteboard: PACMAN PACMAN → PACMAN must–fix-candidate PACMAN must–fix-candidate
Whiteboard: PACMAN must–fix-candidate PACMAN must–fix-candidate → PACMAN must-fix-candidate PACMAN must-fix-candidate
Flags: flashplayer-injection-
Target Milestone: Q3 11 - Serrano → Future
Depends on: Andre
Rick, is this bug still active? Assign to nobody if you are not working on it. Removing Andre blocker.
No longer depends on: Andre
Assignee: rreitmai → nobody
Tamarin is a dead project now. Mass WONTFIX.
Status: REOPENED → RESOLVED
Closed: 15 years ago6 years ago
Resolution: --- → WONTFIX
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: