Closed Bug 523251 Opened 12 years ago Closed 12 years ago

TM: add LIR_f2i [nanojit]

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: gal, Assigned: vlad)

References

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(5 files, 5 obsolete files)

We would like to fast-path DoubleToECMAint32. For this LIR_f2i will provide an approximate answer, which is compared via LIR_i2f with the original, in which case we can bypass the call to the builtin.
Attached patch patch (obsolete) — Splinter Review
Assignee: general → gal
Tested and works for i386 sse and no sse. lirasm is super handy here.
Attachment #407187 - Attachment is obsolete: true
Blocks: 522593
What do you mean by "approximate"? On ARM, we can use the VCVT (VFP) instruction to perform double-to-int conversions, but it is hard to know exactly how it should behave for LIR_f2i. I don't know what your SSE magic does :-)

Also, does LIR_f2i need to deal with NaN or infinite inputs? IEEE754 says that such conversions are invalid, and ARM VFP will accordingly raise an "Invalid Operation" exception. There are a few ways around this, but the best solution will probably depend on exactly what LIR_f2i needs to do with wonky inputs (such as NaNs).
The exact semantics of LIR_f2i is still up for negotiations. I was thinking roughly along these lines:

- if input is a signed 32-bit integer, the output will be correct
- if the input is not a whole number, NaN or infinite, the output is anything
- no exceptions raised

How well would this work with the VFP? And could we hack up a fast path code without calling a method for softfloat?
> - if input is a signed 32-bit integer, the output will be correct

Yep, that bit's free using VCVT.

> - if the input is not a whole number, NaN or infinite, the output is anything

That's easy (of course).

> - no exceptions raised

That's the tricky bit. We might need to reconfigure the VFP coprocessor to do this, and the overhead of doing this _might_ hurt. (I don't know the details at the moment.)

> How well would this work with the VFP?

No problems, other than the exceptions.

> And could we hack up a fast path code
> without calling a method for softfloat?

Yes, this should be possible. If we can ignore all the tricky bits, a short inline soft-float routine should be easy enough to design.
Attached patch freshened patch (obsolete) — Splinter Review
Updated-to-tm patch
Ok, here's the state of play on ARM (based on Cortex-A8 tests):

 - I've written a very fast ARM routine to perform the double-to-int32_t conversion.

 - If the source (double) is already in a VFP register (e.g. from the result of a previous operation), use VFP's VCVT instruction. (This was previously called FTOSID.)

 - If the source is in memory, and must be loaded anyway, the ARM routine wins — at least on A8 — because the result ends up in an ARM register (where you'd want integers to be).

 - Cortex-A9 will be different. It has faster VFP for one thing, and it can move stuff from VFP to ARM quite efficiently. Unfortunately, I can't try this right now.

----

The best solution, then, depends on where the data starts off. If it starts in memory, I'd go for the ARM solution, but otherwise go with VCVT. Unfortunately, I don't think we have that information when we're emitting code (because we do it backwards), so I vote for using VCVT and "findRegFor(FpRegs)" to implement this.
OS: Mac OS X → All
Hardware: x86 → All
Attached patch another updated patch (obsolete) — Splinter Review
Attachment #417006 - Attachment is obsolete: true
Attached patch updated to tm tip (obsolete) — Splinter Review
Updated to tip again; has some ARM code, but it's probably not correct in the non-VFP case, and it doesn't do the right thing with exceptions.
Attachment #418925 - Attachment is obsolete: true
Jacob, can I get you to integrate the ARM pieces here?  I'm getting ready to land this for x86, though I'll disable the code that will use it for ARM until we have that code fixed.
Yep, no problem.

"[...] it doesn't do the right thing with exceptions [...]"

The VFP conversion instructions may throw exceptions unless you turn them off (as in RunFast mode). Is that what you mean? The result should be what you expect in either case, and this is in fact what GCC emits, but it might not be usefully fast if you have to handle exceptions. Based on my memory of a previous bug, I believe that we use RunFast mode, so the existing VFP implementation should be fine.

If you think it'll be worthwhile, I can integrate my pure ARM version (as I mentioned before); this will obviously never generate VFP exceptions. I'd guess that the VFP version would be better in general, however.

----

"[...] it's probably not correct in the non-VFP case [...]"

What do you mean by that? Do we need a soft-float implementation? The reason I ask is that most FP-related asm_* methods are never called when in soft-float mode, and the asm_*2f functions don't support soft-float.
Oh, right, I forgot we ran in RunFast mode.  So this code is ok to go in, I think?  If you want to stick the pure ARM code in as a followup that'd be great, but I'm not too worried about the non-VFP case.
Attached patch added softfp (obsolete) — Splinter Review
Added the softfloat bit, missed that earlier; should be good to go.
Attachment #420196 - Attachment is obsolete: true
Attachment #421146 - Flags: review?(gal)
Comment on attachment 421146 [details] [diff] [review]
added softfp

>+void Assembler::asm_f2i(LInsp ins)
>+{
>+    // where our result goes
>+    Register rr = prepResultReg(ins, GpRegs);
>+    Register sr = findRegFor(ins->oprnd1(), FpRegs);
>+
>+    // XXX correct exception/rounding handling?

What does this mean? Is that a todo? Want to file a bug and reference it here?

>+
>+    FMRS(rr, S14);
>+    FTOSID(S14, sr);
>+}
>+
Attachment #421146 - Flags: review?(gal) → review+
Attached patch updatedSplinter Review
Updated with fixed softfp
Comment on attachment 421149 [details] [diff] [review]
updated

Bonus points if you test this with softfloat enabled (you can do it on linux, just flip the config bit), and also maybe move the _argtypes & ARGSIZE_MASK_ANY thing into Callinfo as ->returnType() or so.
Attachment #421149 - Flags: review+
Attachment #421146 - Attachment is obsolete: true
(In reply to comment #17)
> maybe move the _argtypes & ARGSIZE_MASK_ANY
> thing into Callinfo as ->returnType() or so.

Bug 507089 covers this, but no work has been done on it.
Attached patch one moreSplinter Review
One more -- asking for a re-review, added returnType() and argType() to CallInfo struct  and added correct softfp handling.
Attachment #421188 - Flags: review?(gal)
Comment on attachment 421188 [details] [diff] [review]
one more

Nick might want to actually use the new member functions in nj. Let's cc him.
Attachment #421188 - Flags: review?(gal) → review+
Yeah, there's a pile of places where they could be used, but I figured that would be covered by 507089.
Vlad, can you add an argument to argType() indicating that it indexes in reverse order, like the comment for LIns::arg()?
Done.  Added a few more bits to make lirasm and the lirasm test work as well.

http://hg.mozilla.org/tracemonkey/rev/fea4da580994
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Alas, this got backed out to land on nj-central, and nj-central is having test-passing problems.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'm making a right mess of the i2f function on x86; njn said he'd help tomorrow, because I'm totally lost.

An SSE2 portion that looks like this:

    void Assembler::asm_f2i(LInsp ins)
    {
        LIns *lhs = ins->oprnd1();

        if (config.sse2) {
            Register rr = prepareResultReg(ins, GpRegs);
            Register ra = findRegFor(lhs, XmmRegs);
            SSE_CVTSD2SI(rr, ra);
        } else {

            Register ra = findRegFor(lhs, FpRegs);
            NanoAssert(ra == FST0);

            Register rr = ins->getReg();
            if (isKnownReg(rr))
                evict(ins);

            int d = findMemFor(ins);
            FISTP(d, FP);

        }

        freeResrouceOf(ins);
    }

works.  I can't figure out the right non-SSE2 code though, as I keep hitting this assertion:

Assertion failed: (_allocator.active[FST0] && _fpuStkDepth == -1) || (!_allocator.active[FST0] && _fpuStkDepth == 0) (../nanojit/Assembler.cpp:339)

On entry to the ... portion _fpuStkDepth is -1 if we just came from a fp op (often the arg is fadd or similar), FST0 is active (with lhs)... when FISTP() is called, fpu_pop() is called, and I end up with FST0 active still with lhs and _fpuStkDepth becomes -2.  Help!
I saw that too, when I tested it here, but only on the --random test. I didn't fix it.

In addition, f2i.in doesn't look right to me. Do you not want to use the floating-point opcodes?

a = alloc 8
d = float 5.0
stfi d a 0
x = ldf a 0
i = f2i x
ret i

That clears up a "bad lexical cast" error for me (on both x86 and ARM). The "stfi" and "ldf" changes make no difference, but make more semantic sense (and they might have FP-specific side effects that I don't know about).

----

(In reply to comment #13)
> Oh, right, I forgot we ran in RunFast mode.  So this code is ok to go in, I
> think?  If you want to stick the pure ARM code in as a followup that'd be
> great, but I'm not too worried about the non-VFP case.

Yep, it's good to go. I'm not worried about the non-VFP case either; pretty much every target device will have VFP so there's little gain to be had from optimizing for non-VFP.
Status: REOPENED → NEW
Attached patch nanojit patchSplinter Review
Ok, now really updated and fixed.  Passes all tests, had to rework f2i with njn's help.
Assignee: gal → vladimir
Attachment #421323 - Flags: review?(nnethercote)
Comment on attachment 421323 [details] [diff] [review]
nanojit patch


Nit: can you add LIR_f2i to the comment on the LOP_I_F line in lirasm/LInsClasses.tbl?


>@@ -1363,6 +1367,8 @@ namespace nanojit
>     {
>         LInsp args[MAXARGS];
>         uint32_t argc = ins->argc();
>+        if (argc >= MAXARGS)
>+            printf ("argc: %d MAXARGS: %d\n", argc, MAXARGS);

Nit: debugging code?


>-OPDEF(__94,     94, None, Void)
>+OPDEF(f2i,      94, Op1,  I32)  // approximate f2i conversion (no exception raised)

What exactly does "approximate" mean?  Eg. if it's constant folded via ExprFilter could the result be different to the result if it's computed at run-time?  What happens if an exceptional value occurs?  Comment 4 and comment 5 discussed this, but the outcome wasn't entirely clear.  A comment explaining this a bit more would be good, esp. the relation between f2i and i2f and any identities there.


>+#define FISTP(d,b)  FIST(1,d,b)

Nit: this is unused.

r+ from me with the nits fixed.  But we should run this past one of the Adobe guys.  Asking edwsmith for a review.
Attachment #421323 - Flags: review?(nnethercote)
Attachment #421323 - Flags: review?(edwsmith)
Attachment #421323 - Flags: review+
Attachment #421323 - Flags: review?(edwsmith) → review+
http://hg.mozilla.org/projects/nanojit-central/rev/95e6284a8725
Status: NEW → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Whiteboard: fixed-in-nanojit
We can't pull this into TR until we get the remaining backends (PPC and Sparc).
We should either provide implementations for all existing backends, or add a conditional compiletime flag (a la NJ_EXPANDED_LOADSTORE_SUPPORTED).
Add stub version of asm_f2i on Sparc and PPC to allow compilation. Also add NJ_F2I_SUPPORTED compiletime flag so that code can determine if the backend actually supports this opcode. (Intent is that the flag will go away when real PPC/Sparc implementations arrive.)
Attachment #422263 - Flags: review?(gal)
Attachment #422263 - Flags: review?(rreitmai)
Attachment #422263 - Flags: review?(rreitmai) → review+
pushed in advance of gal's review to fix build break: http://hg.mozilla.org/projects/nanojit-central/rev/387c73e51e77
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
added bug 540495 and bug 540496 to track implementation of PPC and Sparc versions.
Attachment #422263 - Flags: review?(gal) → review+
You need to log in before you can comment on or make changes to this bug.