Closed Bug 523251 Opened 15 years ago Closed 15 years ago

TM: add LIR_f2i [nanojit]

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: gal, Assigned: vlad)

References

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(5 files, 5 obsolete files)

patch 15 years ago Andreas Gal :gal 9.04 KB, patch		Details \| Diff \| Splinter Review
amd64 backend support, I ♥ lirasm 15 years ago Andreas Gal :gal 11.71 KB, patch		Details \| Diff \| Splinter Review
freshened patch 15 years ago Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond) 6.48 KB, patch		Details \| Diff \| Splinter Review
another updated patch 15 years ago Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond) 8.67 KB, patch		Details \| Diff \| Splinter Review
updated to tm tip 15 years ago Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond) 8.67 KB, patch		Details \| Diff \| Splinter Review
added softfp 15 years ago Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond) 9.26 KB, patch	gal : review+	Details \| Diff \| Splinter Review
updated 15 years ago Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond) 10.19 KB, patch	gal : review+	Details \| Diff \| Splinter Review
one more 15 years ago Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond) 10.75 KB, patch	gal : review+	Details \| Diff \| Splinter Review
nanojit patch 15 years ago Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond) 11.74 KB, patch	n.nethercote : review+ edwsmith : review+	Details \| Diff \| Splinter Review
Provide ppc/sparc stubs, compile-time flag 15 years ago Steven Johnson 3.30 KB, patch	gal : review+ rreitmai : review+	Details \| Diff \| Splinter Review

Andreas Gal :gal

Reporter

Description

•

15 years ago

We would like to fast-path DoubleToECMAint32. For this LIR_f2i will provide an approximate answer, which is compared via LIR_i2f with the original, in which case we can bypass the call to the builtin.

Andreas Gal :gal

Reporter

Comment 1

•

15 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Assignee: general → gal

Andreas Gal :gal

Reporter

Comment 2

•

15 years ago

Tested and works for i386 sse and no sse. lirasm is super handy here.

Andreas Gal :gal

Reporter

Comment 3

•

15 years ago

Attached patch amd64 backend support, I ♥ lirasm — Details — Splinter Review

Attachment #407187 - Attachment is obsolete: true

Andreas Gal :gal

Reporter

Updated

•

15 years ago

Blocks: 522593

Jacob Bramley [:jbramley]

Comment 4

•

15 years ago

What do you mean by "approximate"? On ARM, we can use the VCVT (VFP) instruction to perform double-to-int conversions, but it is hard to know exactly how it should behave for LIR_f2i. I don't know what your SSE magic does :-)

Also, does LIR_f2i need to deal with NaN or infinite inputs? IEEE754 says that such conversions are invalid, and ARM VFP will accordingly raise an "Invalid Operation" exception. There are a few ways around this, but the best solution will probably depend on exactly what LIR_f2i needs to do with wonky inputs (such as NaNs).

Andreas Gal :gal

Reporter

Comment 5

•

15 years ago

The exact semantics of LIR_f2i is still up for negotiations. I was thinking roughly along these lines:

- if input is a signed 32-bit integer, the output will be correct
- if the input is not a whole number, NaN or infinite, the output is anything
- no exceptions raised

How well would this work with the VFP? And could we hack up a fast path code without calling a method for softfloat?

Jacob Bramley [:jbramley]

Comment 6

•

15 years ago

> - if input is a signed 32-bit integer, the output will be correct

Yep, that bit's free using VCVT.

> - if the input is not a whole number, NaN or infinite, the output is anything

That's easy (of course).

> - no exceptions raised

That's the tricky bit. We might need to reconfigure the VFP coprocessor to do this, and the overhead of doing this _might_ hurt. (I don't know the details at the moment.)

> How well would this work with the VFP?

No problems, other than the exceptions.

> And could we hack up a fast path code
> without calling a method for softfloat?

Yes, this should be possible. If we can ignore all the tricky bits, a short inline soft-float routine should be easy enough to design.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 7

•

15 years ago

Attached patch freshened patch (obsolete) — Details — Splinter Review

Updated-to-tm patch

Jacob Bramley [:jbramley]

Comment 8

•

15 years ago

Ok, here's the state of play on ARM (based on Cortex-A8 tests):

 - I've written a very fast ARM routine to perform the double-to-int32_t conversion.

 - If the source (double) is already in a VFP register (e.g. from the result of a previous operation), use VFP's VCVT instruction. (This was previously called FTOSID.)

 - If the source is in memory, and must be loaded anyway, the ARM routine wins — at least on A8 — because the result ends up in an ARM register (where you'd want integers to be).

 - Cortex-A9 will be different. It has faster VFP for one thing, and it can move stuff from VFP to ARM quite efficiently. Unfortunately, I can't try this right now.

----

The best solution, then, depends on where the data starts off. If it starts in memory, I'd go for the ARM solution, but otherwise go with VCVT. Unfortunately, I don't think we have that information when we're emitting code (because we do it backwards), so I vote for using VCVT and "findRegFor(FpRegs)" to implement this.

OS: Mac OS X → All

Hardware: x86 → All

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 9

•

15 years ago

Attached patch another updated patch (obsolete) — Details — Splinter Review

Attachment #417006 - Attachment is obsolete: true

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 10

•

15 years ago

Attached patch updated to tm tip (obsolete) — Details — Splinter Review

Updated to tip again; has some ARM code, but it's probably not correct in the non-VFP case, and it doesn't do the right thing with exceptions.

Attachment #418925 - Attachment is obsolete: true

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Updated

•

15 years ago

Blocks: 533659

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 11

•

15 years ago

Jacob, can I get you to integrate the ARM pieces here?  I'm getting ready to land this for x86, though I'll disable the code that will use it for ARM until we have that code fixed.

Jacob Bramley [:jbramley]

Comment 12

•

15 years ago

Yep, no problem.

"[...] it doesn't do the right thing with exceptions [...]"

The VFP conversion instructions may throw exceptions unless you turn them off (as in RunFast mode). Is that what you mean? The result should be what you expect in either case, and this is in fact what GCC emits, but it might not be usefully fast if you have to handle exceptions. Based on my memory of a previous bug, I believe that we use RunFast mode, so the existing VFP implementation should be fine.

If you think it'll be worthwhile, I can integrate my pure ARM version (as I mentioned before); this will obviously never generate VFP exceptions. I'd guess that the VFP version would be better in general, however.

----

"[...] it's probably not correct in the non-VFP case [...]"

What do you mean by that? Do we need a soft-float implementation? The reason I ask is that most FP-related asm_* methods are never called when in soft-float mode, and the asm_*2f functions don't support soft-float.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 13

•

15 years ago

Oh, right, I forgot we ran in RunFast mode.  So this code is ok to go in, I think?  If you want to stick the pure ARM code in as a followup that'd be great, but I'm not too worried about the non-VFP case.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 14

•

15 years ago

Attached patch added softfp (obsolete) — Details — Splinter Review

Added the softfloat bit, missed that earlier; should be good to go.

Attachment #420196 - Attachment is obsolete: true

Attachment #421146 - Flags: review?(gal)

Andreas Gal :gal

Reporter

Comment 15

•

15 years ago

Comment on attachment 421146 [details] [diff] [review]
added softfp

>+void Assembler::asm_f2i(LInsp ins)
>+{
>+    // where our result goes
>+    Register rr = prepResultReg(ins, GpRegs);
>+    Register sr = findRegFor(ins->oprnd1(), FpRegs);
>+
>+    // XXX correct exception/rounding handling?

What does this mean? Is that a todo? Want to file a bug and reference it here?

>+
>+    FMRS(rr, S14);
>+    FTOSID(S14, sr);
>+}
>+

Attachment #421146 - Flags: review?(gal) → review+

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 16

•

15 years ago

Attached patch updated — Details — Splinter Review

Updated with fixed softfp

Andreas Gal :gal

Reporter

Comment 17

•

15 years ago

Comment on attachment 421149 [details] [diff] [review]
updated

Bonus points if you test this with softfloat enabled (you can do it on linux, just flip the config bit), and also maybe move the _argtypes & ARGSIZE_MASK_ANY thing into Callinfo as ->returnType() or so.

Attachment #421149 - Flags: review+

Andreas Gal :gal

Reporter

Updated

•

15 years ago

Attachment #421146 - Attachment is obsolete: true

Nicholas Nethercote [inactive]

Comment 18

•

15 years ago

(In reply to comment #17)
> maybe move the _argtypes & ARGSIZE_MASK_ANY
> thing into Callinfo as ->returnType() or so.

Bug 507089 covers this, but no work has been done on it.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 19

•

15 years ago

Attached patch one more — Details — Splinter Review

One more -- asking for a re-review, added returnType() and argType() to CallInfo struct  and added correct softfp handling.

Attachment #421188 - Flags: review?(gal)

Andreas Gal :gal

Reporter

Comment 20

•

15 years ago

Comment on attachment 421188 [details] [diff] [review]
one more

Nick might want to actually use the new member functions in nj. Let's cc him.

Attachment #421188 - Flags: review?(gal) → review+

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 21

•

15 years ago

Yeah, there's a pile of places where they could be used, but I figured that would be covered by 507089.

Nicholas Nethercote [inactive]

Comment 22

•

15 years ago

Vlad, can you add an argument to argType() indicating that it indexes in reverse order, like the comment for LIns::arg()?

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 23

•

15 years ago

Done.  Added a few more bits to make lirasm and the lirasm test work as well.

http://hg.mozilla.org/tracemonkey/rev/fea4da580994

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 24

•

15 years ago

Alas, this got backed out to land on nj-central, and nj-central is having test-passing problems.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 25

•

15 years ago

I'm making a right mess of the i2f function on x86; njn said he'd help tomorrow, because I'm totally lost.

An SSE2 portion that looks like this:

    void Assembler::asm_f2i(LInsp ins)
    {
        LIns *lhs = ins->oprnd1();

        if (config.sse2) {
            Register rr = prepareResultReg(ins, GpRegs);
            Register ra = findRegFor(lhs, XmmRegs);
            SSE_CVTSD2SI(rr, ra);
        } else {

            Register ra = findRegFor(lhs, FpRegs);
            NanoAssert(ra == FST0);

            Register rr = ins->getReg();
            if (isKnownReg(rr))
                evict(ins);

            int d = findMemFor(ins);
            FISTP(d, FP);

        }

        freeResrouceOf(ins);
    }

works.  I can't figure out the right non-SSE2 code though, as I keep hitting this assertion:

Assertion failed: (_allocator.active[FST0] && _fpuStkDepth == -1) || (!_allocator.active[FST0] && _fpuStkDepth == 0) (../nanojit/Assembler.cpp:339)

On entry to the ... portion _fpuStkDepth is -1 if we just came from a fp op (often the arg is fadd or similar), FST0 is active (with lhs)... when FISTP() is called, fpu_pop() is called, and I end up with FST0 active still with lhs and _fpuStkDepth becomes -2.  Help!

Jacob Bramley [:jbramley]

Comment 26

•

15 years ago

I saw that too, when I tested it here, but only on the --random test. I didn't fix it.

In addition, f2i.in doesn't look right to me. Do you not want to use the floating-point opcodes?

a = alloc 8
d = float 5.0
stfi d a 0
x = ldf a 0
i = f2i x
ret i

That clears up a "bad lexical cast" error for me (on both x86 and ARM). The "stfi" and "ldf" changes make no difference, but make more semantic sense (and they might have FP-specific side effects that I don't know about).

----

(In reply to comment #13)
> Oh, right, I forgot we ran in RunFast mode.  So this code is ok to go in, I
> think?  If you want to stick the pure ARM code in as a followup that'd be
> great, but I'm not too worried about the non-VFP case.

Yep, it's good to go. I'm not worried about the non-VFP case either; pretty much every target device will have VFP so there's little gain to be had from optimizing for non-VFP.

Status: REOPENED → NEW

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 27

•

15 years ago

Attached patch nanojit patch — Details — Splinter Review

Ok, now really updated and fixed.  Passes all tests, had to rework f2i with njn's help.

Assignee: gal → vladimir

Attachment #421323 - Flags: review?(nnethercote)

Nicholas Nethercote [inactive]

Comment 28

•

15 years ago

Comment on attachment 421323 [details] [diff] [review]
nanojit patch


Nit: can you add LIR_f2i to the comment on the LOP_I_F line in lirasm/LInsClasses.tbl?


>@@ -1363,6 +1367,8 @@ namespace nanojit
>     {
>         LInsp args[MAXARGS];
>         uint32_t argc = ins->argc();
>+        if (argc >= MAXARGS)
>+            printf ("argc: %d MAXARGS: %d\n", argc, MAXARGS);

Nit: debugging code?


>-OPDEF(__94,     94, None, Void)
>+OPDEF(f2i,      94, Op1,  I32)  // approximate f2i conversion (no exception raised)

What exactly does "approximate" mean?  Eg. if it's constant folded via ExprFilter could the result be different to the result if it's computed at run-time?  What happens if an exceptional value occurs?  Comment 4 and comment 5 discussed this, but the outcome wasn't entirely clear.  A comment explaining this a bit more would be good, esp. the relation between f2i and i2f and any identities there.


>+#define FISTP(d,b)  FIST(1,d,b)

Nit: this is unused.

r+ from me with the nits fixed.  But we should run this past one of the Adobe guys.  Asking edwsmith for a review.

Attachment #421323 - Flags: review?(nnethercote)

Attachment #421323 - Flags: review?(edwsmith)

Attachment #421323 - Flags: review+

Edwin Smith

Updated

•

15 years ago

Attachment #421323 - Flags: review?(edwsmith) → review+

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Comment 29

•

15 years ago

http://hg.mozilla.org/projects/nanojit-central/rev/95e6284a8725

Status: NEW → RESOLVED

Closed: 15 years ago → 15 years ago

Resolution: --- → FIXED

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Assignee

Updated

•

15 years ago

Whiteboard: fixed-in-nanojit

Nicholas Nethercote [inactive]

Comment 30

•

15 years ago

Follow-up ARM fix:
http://hg.mozilla.org/projects/nanojit-central/rev/0628e6475f1f

http://hg.mozilla.org/tracemonkey/rev/b5fe3850f255
http://hg.mozilla.org/tracemonkey/rev/1bf96abef84d

Whiteboard: fixed-in-nanojit → fixed-in-nanojit, fixed-in-tracemonkey

Steven Johnson

Comment 31

•

15 years ago

We can't pull this into TR until we get the remaining backends (PPC and Sparc).

Steven Johnson

Comment 32

•

15 years ago

We should either provide implementations for all existing backends, or add a conditional compiletime flag (a la NJ_EXPANDED_LOADSTORE_SUPPORTED).

Steven Johnson

Comment 33

•

15 years ago

Attached patch Provide ppc/sparc stubs, compile-time flag — Details — Splinter Review

Add stub version of asm_f2i on Sparc and PPC to allow compilation. Also add NJ_F2I_SUPPORTED compiletime flag so that code can determine if the backend actually supports this opcode. (Intent is that the flag will go away when real PPC/Sparc implementations arrive.)

Attachment #422263 - Flags: review?(gal)

Steven Johnson

Updated

•

15 years ago

Attachment #422263 - Flags: review?(rreitmai)

Rick Reitmaier

Updated

•

15 years ago

Attachment #422263 - Flags: review?(rreitmai) → review+

Steven Johnson

Comment 34

•

15 years ago

pushed in advance of gal's review to fix build break: http://hg.mozilla.org/projects/nanojit-central/rev/387c73e51e77

Steven Johnson

Updated

•

15 years ago

Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin

Steven Johnson

Comment 35

•

15 years ago

added bug 540495 and bug 540496 to track implementation of PPC and Sparc versions.

Andreas Gal :gal

Reporter

Updated

•

15 years ago

Attachment #422263 - Flags: review?(gal) → review+

Nicholas Nethercote [inactive]

Comment 36

•

15 years ago

http://hg.mozilla.org/tracemonkey/rev/8dfd412fcf59

Robert Sayre

Comment 37

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/fea4da580994

You need to log in before you can comment on or make changes to this bug.