Closed
Bug 523251
Opened 15 years ago
Closed 15 years ago
TM: add LIR_f2i [nanojit]
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: gal, Assigned: vlad)
References
Details
(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)
Attachments
(5 files, 5 obsolete files)
11.71 KB,
patch
|
Details | Diff | Splinter Review | |
10.19 KB,
patch
|
gal
:
review+
|
Details | Diff | Splinter Review |
10.75 KB,
patch
|
gal
:
review+
|
Details | Diff | Splinter Review |
11.74 KB,
patch
|
n.nethercote
:
review+
edwsmith
:
review+
|
Details | Diff | Splinter Review |
3.30 KB,
patch
|
gal
:
review+
rreitmai
:
review+
|
Details | Diff | Splinter Review |
We would like to fast-path DoubleToECMAint32. For this LIR_f2i will provide an approximate answer, which is compared via LIR_i2f with the original, in which case we can bypass the call to the builtin.
Reporter | ||
Comment 1•15 years ago
|
||
Assignee: general → gal
Reporter | ||
Comment 2•15 years ago
|
||
Tested and works for i386 sse and no sse. lirasm is super handy here.
Reporter | ||
Comment 3•15 years ago
|
||
Attachment #407187 -
Attachment is obsolete: true
Comment 4•15 years ago
|
||
What do you mean by "approximate"? On ARM, we can use the VCVT (VFP) instruction to perform double-to-int conversions, but it is hard to know exactly how it should behave for LIR_f2i. I don't know what your SSE magic does :-) Also, does LIR_f2i need to deal with NaN or infinite inputs? IEEE754 says that such conversions are invalid, and ARM VFP will accordingly raise an "Invalid Operation" exception. There are a few ways around this, but the best solution will probably depend on exactly what LIR_f2i needs to do with wonky inputs (such as NaNs).
Reporter | ||
Comment 5•15 years ago
|
||
The exact semantics of LIR_f2i is still up for negotiations. I was thinking roughly along these lines: - if input is a signed 32-bit integer, the output will be correct - if the input is not a whole number, NaN or infinite, the output is anything - no exceptions raised How well would this work with the VFP? And could we hack up a fast path code without calling a method for softfloat?
Comment 6•15 years ago
|
||
> - if input is a signed 32-bit integer, the output will be correct Yep, that bit's free using VCVT. > - if the input is not a whole number, NaN or infinite, the output is anything That's easy (of course). > - no exceptions raised That's the tricky bit. We might need to reconfigure the VFP coprocessor to do this, and the overhead of doing this _might_ hurt. (I don't know the details at the moment.) > How well would this work with the VFP? No problems, other than the exceptions. > And could we hack up a fast path code > without calling a method for softfloat? Yes, this should be possible. If we can ignore all the tricky bits, a short inline soft-float routine should be easy enough to design.
Assignee | ||
Comment 7•15 years ago
|
||
Updated-to-tm patch
Comment 8•15 years ago
|
||
Ok, here's the state of play on ARM (based on Cortex-A8 tests): - I've written a very fast ARM routine to perform the double-to-int32_t conversion. - If the source (double) is already in a VFP register (e.g. from the result of a previous operation), use VFP's VCVT instruction. (This was previously called FTOSID.) - If the source is in memory, and must be loaded anyway, the ARM routine wins — at least on A8 — because the result ends up in an ARM register (where you'd want integers to be). - Cortex-A9 will be different. It has faster VFP for one thing, and it can move stuff from VFP to ARM quite efficiently. Unfortunately, I can't try this right now. ---- The best solution, then, depends on where the data starts off. If it starts in memory, I'd go for the ARM solution, but otherwise go with VCVT. Unfortunately, I don't think we have that information when we're emitting code (because we do it backwards), so I vote for using VCVT and "findRegFor(FpRegs)" to implement this.
OS: Mac OS X → All
Hardware: x86 → All
Assignee | ||
Comment 9•15 years ago
|
||
Attachment #417006 -
Attachment is obsolete: true
Assignee | ||
Comment 10•15 years ago
|
||
Updated to tip again; has some ARM code, but it's probably not correct in the non-VFP case, and it doesn't do the right thing with exceptions.
Attachment #418925 -
Attachment is obsolete: true
Assignee | ||
Comment 11•15 years ago
|
||
Jacob, can I get you to integrate the ARM pieces here? I'm getting ready to land this for x86, though I'll disable the code that will use it for ARM until we have that code fixed.
Comment 12•15 years ago
|
||
Yep, no problem. "[...] it doesn't do the right thing with exceptions [...]" The VFP conversion instructions may throw exceptions unless you turn them off (as in RunFast mode). Is that what you mean? The result should be what you expect in either case, and this is in fact what GCC emits, but it might not be usefully fast if you have to handle exceptions. Based on my memory of a previous bug, I believe that we use RunFast mode, so the existing VFP implementation should be fine. If you think it'll be worthwhile, I can integrate my pure ARM version (as I mentioned before); this will obviously never generate VFP exceptions. I'd guess that the VFP version would be better in general, however. ---- "[...] it's probably not correct in the non-VFP case [...]" What do you mean by that? Do we need a soft-float implementation? The reason I ask is that most FP-related asm_* methods are never called when in soft-float mode, and the asm_*2f functions don't support soft-float.
Assignee | ||
Comment 13•15 years ago
|
||
Oh, right, I forgot we ran in RunFast mode. So this code is ok to go in, I think? If you want to stick the pure ARM code in as a followup that'd be great, but I'm not too worried about the non-VFP case.
Assignee | ||
Comment 14•15 years ago
|
||
Added the softfloat bit, missed that earlier; should be good to go.
Attachment #420196 -
Attachment is obsolete: true
Attachment #421146 -
Flags: review?(gal)
Reporter | ||
Comment 15•15 years ago
|
||
Comment on attachment 421146 [details] [diff] [review] added softfp >+void Assembler::asm_f2i(LInsp ins) >+{ >+ // where our result goes >+ Register rr = prepResultReg(ins, GpRegs); >+ Register sr = findRegFor(ins->oprnd1(), FpRegs); >+ >+ // XXX correct exception/rounding handling? What does this mean? Is that a todo? Want to file a bug and reference it here? >+ >+ FMRS(rr, S14); >+ FTOSID(S14, sr); >+} >+
Attachment #421146 -
Flags: review?(gal) → review+
Assignee | ||
Comment 16•15 years ago
|
||
Updated with fixed softfp
Reporter | ||
Comment 17•15 years ago
|
||
Comment on attachment 421149 [details] [diff] [review] updated Bonus points if you test this with softfloat enabled (you can do it on linux, just flip the config bit), and also maybe move the _argtypes & ARGSIZE_MASK_ANY thing into Callinfo as ->returnType() or so.
Attachment #421149 -
Flags: review+
Reporter | ||
Updated•15 years ago
|
Attachment #421146 -
Attachment is obsolete: true
Comment 18•15 years ago
|
||
(In reply to comment #17) > maybe move the _argtypes & ARGSIZE_MASK_ANY > thing into Callinfo as ->returnType() or so. Bug 507089 covers this, but no work has been done on it.
Assignee | ||
Comment 19•15 years ago
|
||
One more -- asking for a re-review, added returnType() and argType() to CallInfo struct and added correct softfp handling.
Attachment #421188 -
Flags: review?(gal)
Reporter | ||
Comment 20•15 years ago
|
||
Comment on attachment 421188 [details] [diff] [review] one more Nick might want to actually use the new member functions in nj. Let's cc him.
Attachment #421188 -
Flags: review?(gal) → review+
Assignee | ||
Comment 21•15 years ago
|
||
Yeah, there's a pile of places where they could be used, but I figured that would be covered by 507089.
Comment 22•15 years ago
|
||
Vlad, can you add an argument to argType() indicating that it indexes in reverse order, like the comment for LIns::arg()?
Assignee | ||
Comment 23•15 years ago
|
||
Done. Added a few more bits to make lirasm and the lirasm test work as well. http://hg.mozilla.org/tracemonkey/rev/fea4da580994
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 24•15 years ago
|
||
Alas, this got backed out to land on nj-central, and nj-central is having test-passing problems.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 25•15 years ago
|
||
I'm making a right mess of the i2f function on x86; njn said he'd help tomorrow, because I'm totally lost. An SSE2 portion that looks like this: void Assembler::asm_f2i(LInsp ins) { LIns *lhs = ins->oprnd1(); if (config.sse2) { Register rr = prepareResultReg(ins, GpRegs); Register ra = findRegFor(lhs, XmmRegs); SSE_CVTSD2SI(rr, ra); } else { Register ra = findRegFor(lhs, FpRegs); NanoAssert(ra == FST0); Register rr = ins->getReg(); if (isKnownReg(rr)) evict(ins); int d = findMemFor(ins); FISTP(d, FP); } freeResrouceOf(ins); } works. I can't figure out the right non-SSE2 code though, as I keep hitting this assertion: Assertion failed: (_allocator.active[FST0] && _fpuStkDepth == -1) || (!_allocator.active[FST0] && _fpuStkDepth == 0) (../nanojit/Assembler.cpp:339) On entry to the ... portion _fpuStkDepth is -1 if we just came from a fp op (often the arg is fadd or similar), FST0 is active (with lhs)... when FISTP() is called, fpu_pop() is called, and I end up with FST0 active still with lhs and _fpuStkDepth becomes -2. Help!
Comment 26•15 years ago
|
||
I saw that too, when I tested it here, but only on the --random test. I didn't fix it. In addition, f2i.in doesn't look right to me. Do you not want to use the floating-point opcodes? a = alloc 8 d = float 5.0 stfi d a 0 x = ldf a 0 i = f2i x ret i That clears up a "bad lexical cast" error for me (on both x86 and ARM). The "stfi" and "ldf" changes make no difference, but make more semantic sense (and they might have FP-specific side effects that I don't know about). ---- (In reply to comment #13) > Oh, right, I forgot we ran in RunFast mode. So this code is ok to go in, I > think? If you want to stick the pure ARM code in as a followup that'd be > great, but I'm not too worried about the non-VFP case. Yep, it's good to go. I'm not worried about the non-VFP case either; pretty much every target device will have VFP so there's little gain to be had from optimizing for non-VFP.
Status: REOPENED → NEW
Assignee | ||
Comment 27•15 years ago
|
||
Ok, now really updated and fixed. Passes all tests, had to rework f2i with njn's help.
Assignee: gal → vladimir
Attachment #421323 -
Flags: review?(nnethercote)
Comment 28•15 years ago
|
||
Comment on attachment 421323 [details] [diff] [review] nanojit patch Nit: can you add LIR_f2i to the comment on the LOP_I_F line in lirasm/LInsClasses.tbl? >@@ -1363,6 +1367,8 @@ namespace nanojit > { > LInsp args[MAXARGS]; > uint32_t argc = ins->argc(); >+ if (argc >= MAXARGS) >+ printf ("argc: %d MAXARGS: %d\n", argc, MAXARGS); Nit: debugging code? >-OPDEF(__94, 94, None, Void) >+OPDEF(f2i, 94, Op1, I32) // approximate f2i conversion (no exception raised) What exactly does "approximate" mean? Eg. if it's constant folded via ExprFilter could the result be different to the result if it's computed at run-time? What happens if an exceptional value occurs? Comment 4 and comment 5 discussed this, but the outcome wasn't entirely clear. A comment explaining this a bit more would be good, esp. the relation between f2i and i2f and any identities there. >+#define FISTP(d,b) FIST(1,d,b) Nit: this is unused. r+ from me with the nits fixed. But we should run this past one of the Adobe guys. Asking edwsmith for a review.
Attachment #421323 -
Flags: review?(nnethercote)
Attachment #421323 -
Flags: review?(edwsmith)
Attachment #421323 -
Flags: review+
Updated•15 years ago
|
Attachment #421323 -
Flags: review?(edwsmith) → review+
Assignee | ||
Comment 29•15 years ago
|
||
http://hg.mozilla.org/projects/nanojit-central/rev/95e6284a8725
Status: NEW → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•15 years ago
|
Whiteboard: fixed-in-nanojit
Comment 30•15 years ago
|
||
Follow-up ARM fix: http://hg.mozilla.org/projects/nanojit-central/rev/0628e6475f1f http://hg.mozilla.org/tracemonkey/rev/b5fe3850f255 http://hg.mozilla.org/tracemonkey/rev/1bf96abef84d
Whiteboard: fixed-in-nanojit → fixed-in-nanojit, fixed-in-tracemonkey
Comment 31•15 years ago
|
||
We can't pull this into TR until we get the remaining backends (PPC and Sparc).
Comment 32•15 years ago
|
||
We should either provide implementations for all existing backends, or add a conditional compiletime flag (a la NJ_EXPANDED_LOADSTORE_SUPPORTED).
Comment 33•15 years ago
|
||
Add stub version of asm_f2i on Sparc and PPC to allow compilation. Also add NJ_F2I_SUPPORTED compiletime flag so that code can determine if the backend actually supports this opcode. (Intent is that the flag will go away when real PPC/Sparc implementations arrive.)
Attachment #422263 -
Flags: review?(gal)
Updated•15 years ago
|
Attachment #422263 -
Flags: review?(rreitmai)
Updated•15 years ago
|
Attachment #422263 -
Flags: review?(rreitmai) → review+
Comment 34•15 years ago
|
||
pushed in advance of gal's review to fix build break: http://hg.mozilla.org/projects/nanojit-central/rev/387c73e51e77
Updated•15 years ago
|
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Comment 35•15 years ago
|
||
added bug 540495 and bug 540496 to track implementation of PPC and Sparc versions.
Reporter | ||
Updated•15 years ago
|
Attachment #422263 -
Flags: review?(gal) → review+
Comment 36•15 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/8dfd412fcf59
You need to log in
before you can comment on or make changes to this bug.
Description
•