Closed
Bug 601135
Opened 15 years ago
Closed 15 years ago
Change LIR_d2i to prefer truncation over rounding
Categories
(Core Graveyard :: Nanojit, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: wsharp, Unassigned)
References
Details
(Whiteboard: fixed-in-nanojit,fixed-in-tracemonkey,fixed-in-tamarin)
Attachments
(1 file, 2 obsolete files)
10.35 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
The Tamarin project wants to use asm_d2i (only on SSE capable machines) and prefers if d2i does truncation instead of rounding. 1.9 should become 1.0 instead of 2.0.
This requires changing CVTSD2SI into CVTTSD2SI for asm_d2i on i386 + x64.
Reporter | ||
Comment 1•15 years ago
|
||
Attachment #480121 -
Flags: review?(nnethercote)
Reporter | ||
Comment 2•15 years ago
|
||
Untested ARM code for truncating d2i (Tamarin never calls this):
+#define FTOSIZD(_Sd,_Dm) do { \
+ underrunProtect(4); \
+ NanoAssert(ARM_VFP); \
+ NanoAssert(((_Sd) == S14) && IsFpReg(_Dm)); \
+ *(--_nIns) = (NIns)( COND_AL | (0xEBD<<16) | (0x7<<12) | (0xBC<<4) | FpRegNum(_Dm) ); \
+ asm_output("ftosizd s14, %s", gpn(_Dm)); \
+ } while (0)
+
![]() |
||
Comment 3•15 years ago
|
||
Comment on attachment 480121 [details] [diff] [review]
Switch asm_d2i to call CVTTSD2SI
>-OP___(d2i, 113, Op1, I, 1) // convert double to int (no exceptions raised, platform rounding rules)
>+OP___(d2i, 113, Op1, I, 1) // convert double to int (no exceptions raised, platform truncation rules preferred but not guaranteed)
Sorry, "preferred but not guaranteed" is not acceptable. What you really
mean is "I've made it truncate on i386, X64 and ARM and I don't want to
think about the other back-ends right now".
If you want truncation, *require* truncation. As we discussed, there's
nothing stopping you from doing so, because no-one's relying on the rounding
behaviour currently. When you do that, please specify exactly what
"truncation" means using some positive and negative examples. That
obviously won't fit in a single line, so use a multi-line comment above the
d2i line (see 'modi' for a similar example).
The rest of the patch looks fine, but you'll need to consider the other
back-ends; at the very least, make them assert in asm_d2i() so the
maintainers can fix them later.
(An aside: AFAICT, changing d2i to truncate removes the one bit of undefined
behaviour in LIR, which I think is a good thing.)
Attachment #480121 -
Flags: review?(nnethercote) → review-
Reporter | ||
Comment 4•15 years ago
|
||
As we discussed (in email), there is no clean way to support truncate on non-SSE x86. Do we want to emit FP control word logic for non-SSE?
"The main stumbling point I see is non-SSE x86 support which would need to deal with FP control word logic. I don't think we want to support this:
uint16_t oldcw, newcw;
int32_t intval;
_asm fnstcw [oldcw];
_asm mov ax,[oldcw];
_asm or ax,0xc3f;
_asm mov [newcw],ax;
_asm fldcw [newcw];
_asm fld [value];
_asm fistp [intval];
_asm fldcw [oldcw];
return intval;"
Since Tamarin only uses the non-SSE variant, that is currently all we care about it. If i assert in the x86 non-SSE variant, is someone really going to fix it somehow? Or just remove the assert? I realize this is not the cleanest opcode design.
x64 - truncates
x86+SSE - truncates
x86+x97 - rounds
ARM - rounds
No other back ends support this opcode currently.
![]() |
||
Comment 5•15 years ago
|
||
So you are saying that Tamarin will only rely on the truncation behaviour if the machine is a SSE2-capable x86/x86_64 machine?
Probably the most sensible thing is to specify the varying rounding semantics clearly in LIRopcode.tbl, ie. this:
> x64 - truncates
> x86+SSE - truncates
> x86+x97 - rounds
> ARM - rounds
But fleshed out with the remaining platforms, some of which are not-yet-implemented, and with examples/clear descriptions of exactly what "rounding" and "truncating" means.
It's kind of gross specifying the semantics of LIR on a platform-by-platform basis, but that's what you want, right?
Comment 6•15 years ago
|
||
(In reply to comment #5)
> It's kind of gross specifying the semantics of LIR on a platform-by-platform
> basis, but that's what you want, right?
This would be an icky precedent to set. Unless there's a compelling performance reason to have different semantics on different platforms, settling on one everywhere is much preferable IMHO.
![]() |
||
Comment 7•15 years ago
|
||
(In reply to comment #6)
> (In reply to comment #5)
> > It's kind of gross specifying the semantics of LIR on a platform-by-platform
> > basis, but that's what you want, right?
>
> This would be an icky precedent to set. Unless there's a compelling performance
> reason to have different semantics on different platforms, settling on one
> everywhere is much preferable IMHO.
I agree! It's Adobe that wants this change :P
Another possibility: we already have some x86/x86_64-only opcodes (divi, modi). We could create d2itrunc (truncd2i?) and only define it on x86/x86_64. Not sure it's much better than per-platform semantics, though, and adding new opcodes requires additional work.
Comment 8•15 years ago
|
||
(In reply to comment #5)
> So you are saying that Tamarin will only rely on the truncation behaviour if
> the machine is a SSE2-capable x86/x86_64 machine?
Yes. The piece of code we want to inline is already a big ifdef nest, so its okay. The next platform of interest is ARM, but nobody has looked yet.
> Probably the most sensible thing is to specify the varying rounding semantics
> clearly in LIRopcode.tbl, ie. this:
>
> > x64 - truncates
> > x86+SSE - truncates
> > x86+x97 - rounds
> > ARM - rounds
>
> But fleshed out with the remaining platforms, some of which are
> not-yet-implemented, and with examples/clear descriptions of exactly what
> "rounding" and "truncating" means.
+1. Putting the cpu-specific instruction in the comment would make it really specific. The SSE2 comment should say that the escape value is 0x80000000 since frontends have to test for it.
> It's kind of gross specifying the semantics of LIR on a platform-by-platform
> basis, but that's what you want, right?
Yep. Nailing down LIR_d2i's behavior on SSE2 and leaving the other platforms as-is is less work than adding a new LIR instruction just for CVTTSD2SI. If we had to generalize to more CPUs, then I think adding new instructions (maybe d2itrunc) would be worth the work.
Comment 9•15 years ago
|
||
I jumped through essentially the same hoops when doing arch neutral IR
representations for floating point conversions in Valgrind. My
2 Euro-cents / random comments:
> It's kind of gross specifying the semantics of LIR on a platform-by-platform
> basis, but that's what you want, right?
Urr, don't go there. Semantic swamp ahead.
regarding
> uint16_t oldcw, newcw;
> int32_t intval;
> _asm fnstcw [oldcw];
> _asm mov ax,[oldcw];
> _asm or ax,0xc3f;
> _asm mov [newcw],ax;
> _asm fldcw [newcw];
> _asm fld [value];
> _asm fistp [intval];
> _asm fldcw [oldcw];
> return intval;"
as a Plan B. One thing nobody has said (afaics) is that fldcw is
an expensive operation (like, pipeline-flushingly-expensive),
especially on older hardware. You really don't want to be doing
that too often.
-----------
IEEE754 defines 4 rounding modes: to +inf, to -inf, to nearest, to
zero (P, M, N, Z). 'to nearest' is the default.
One thing you could do is clone d2i for as many of these as required,
viz: d2i_P, d2i_M, d2i_N, d2i_Z. That keeps the LIR arch-neutral
w.r.t. rounding, whilst allowing you to specify whatever kind you
want.
The backends should assume that the processor runs in N (to nearest)
by default. So for a d2i_N you can just emit a conversion insn and
nothing else. For a d2i_M (which is what Werner wants) you can emit
either CVTTSD2SI or the long, expensive sequence above, if that's
what's necessary.
The tricky bit is that this assumes that any helper functions called
from JITted code don't change the rounding mode away from _N. On at
least some platforms I've seen, the ABI supposedly guarantees this
but hmm ... Valgrind actually checks the rounding mode every time it
leaves JIT generated code, to catch any snafus, and we have caught
one, although it was a bug in the Linux kernel not preserving the mode
across context switches (amazingly).
It also means that _M (truncate) conversions are expensive unless the
hardware supports a _M-style conversion instruction that doesn't involve
messing with the FPU control word. These are becoming more common but
by no means universal and don't exist on remotely oldish hardware or
on some lower-end hardware where maxing out FP performance isn't a
priority.
Rather than "trunc" etc prefixes, I'd suggest using the 754
terminology (+Inf/-Inf/Zero/Nearest, P/M/Z/N), so we don't have to
keep remembering which of those "trunc" (and the unadorned version)
map to.
------
The above proposal tidies up the rounding issue a bit. It still
fudges on the overflow detection, because some platforms produce
0x80..00 for overflows, and others produce 0x80..00 or 0x7F..FF
depending on the sign.
Reporter | ||
Comment 10•15 years ago
|
||
The basic stumbling point is pre-SSE2 support on x86. Would it be possible to just drop support for this architecture for asm_d2i? We could put in an assert and/or just do rounding and all other platforms will truncate. Or support it with rounding mode and a very large warning comment.
If we want consistency for one opcode, whether or not we use d2i or introduce a new opcode, we need to somehow support x87/non-SSE2. If we need to emit those 8 instructions (which are terribly slow as Julian mentions), we will need to add a bunch of new instruction support the to native i386 backend. I can write this for d2i but the caveat is it will be much slower than rounding mode for your i2d(d2i() check.
Comment 11•15 years ago
|
||
(In reply to comment #10)
> The basic stumbling point is pre-SSE2 support on x86.
Does Mozilla still support pre-SSE2? Flash/AIR still do, but they are becoming a pretty small percentage of the market, so part of me feels like we don't need to use heroic measures for hardware that is generally >5 years old and no longer made...
Comment 12•15 years ago
|
||
There's no more stumbling block.
All we want (and Nick and I agree on this) is for the SSE2 case to use CVTTSD2SI, and for the other existing cases (x87, and other cpu's), to stay the way they are, and for the comments to be super clear about what you get in each case.
As for what the comments say, I liked Julian's suggested terminology (and opcode names) since it derives from the IEEE specs. I hadn't made the connection that CVTTSD2SI == truncate == round-towards -Inf, but its nice if the comments are grounded in spec-based specifics like that.
Comment 13•15 years ago
|
||
Maybe I'm just more unread in the ways of IEEE754 than I thought, but what njn deems N for nearest I'd always heard described as round to nearest even, which N does not immediately imply to me. "even" is the key word there for me, not nearest -- just saying this to keep it in mind for naming.
Comment 14•15 years ago
|
||
(In reply to comment #12)
> I hadn't made the
> connection that CVTTSD2SI == truncate == round-towards -Inf
Oops, I misspoke. CVTTSD2SI = truncate = round-towards-zero. (d2i_Z, rather than _M, if i'm reading the CVTTSD2SI spec right).
Comment 15•15 years ago
|
||
(In reply to comment #14)
> (In reply to comment #12)
> > I hadn't made the
> > connection that CVTTSD2SI == truncate == round-towards -Inf
>
> Oops, I misspoke. CVTTSD2SI = truncate = round-towards-zero. (d2i_Z, rather
> than _M, if i'm reading the CVTTSD2SI spec right).
Yeh, I was just going to say that. Except that it was my mistake
to begin with :) I must say, the Intel spec is a bit loosely worded,
but I think _Z is right.
Comment 16•15 years ago
|
||
(In reply to comment #13)
The 754 doc I have ("Reaffirmed May 21, 1991") says
Round to Nearest: [...] In this mode the representable value
nearest to the infinitely precise result shall be delivered; if
the two nearest representable values are equally near, the one
with its least significant bit zero shall be delivered.
So the even thing is kinda in there, but only for tiebreaker
purposes.
(I must say, I really like '754. It's a model of conciseness.)
Comment 17•15 years ago
|
||
Ah, my slight mis-memory. I think the sticking point for me is that *all* the rounding modes round to nearest in general. It's only in the equally-close special case that they differ, and how they differ there (up, down, even, zero) is what's most important when thinking about rounding modes.
Reporter | ||
Comment 18•15 years ago
|
||
I talked to Ed over IM who also talked to Andreas. The current proposal:
1. Leave LIR_d2i as is.
2. Add LIR_d2i_z which supports round to zero behavior. Specifically document CVTTSD2SI usage and also 0x80000000 failure value which we depend upon.
3. LIR_d2i_z will start with support on x86+SSE and x64. Non-SSE x86 will assert. (ARM should be easy with code from comment 2 but I have no way to test)
4. Add NJ_D2I_Z_SUPPORTED for x86 and x86
5. Rename NJ_F2I_SUPPORTED to NJ_D2I_SUPPORTED.
6. While I'm creating a patch, add NJ_DIVI_SUPPORTED define for platforms that support LIR_divi. I need this for bug 570476.
Reporter | ||
Comment 19•15 years ago
|
||
This patch contains my proposal in comment #18. I hadn't realize that LIR_d2i was supported on a bunch of platforms besides x86, arm, x64. LIR_d2i_z is currently only supported on x86+SSE and x64.
I did remove a bit of LIR_d2i support in ExprFilter::ins1.
case LIR_d2i:
- if (oprnd->isImmD())
- return insImmI(int32_t(oprnd->immD()));
A int32_t(double) cast is definitely different than what d2i performs in the JIT. It is closer to what d2i_z performs (maybe identical?) but seems iffy to leave this as-is. Tamarin does not currently depend upon this code.
Attachment #480121 -
Attachment is obsolete: true
Attachment #481337 -
Flags: review?(nnethercote)
Comment 20•15 years ago
|
||
(In reply to comment #18)
> I talked to Ed over IM who also talked to Andreas. The current proposal:
I'm guilty of waffling on this today, after IM'ing with Nick last night and Julian this afternoon. The new proposed name isn't firm, but the idea is to get something specific that becomes CVTTSD2SI on SSE2, at least. Also, ARM has a dedicated round-to-zero conversion instruction, which we will probably want soonish.
The clincher (for me at least) comes after re-reading Nick's most excellent post:
http://blog.mozilla.com/nnethercote/2010/02/01/a-win-for-code-hygiene/
The whole post is spot-on, but these two "lessons learned" stand out:
* Clean IR semantics are worth the effort...
* Listen to Julian, especially when he talks about correctness...
> I did remove a bit of LIR_d2i support in ExprFilter::ins1.
>
> case LIR_d2i:
> - if (oprnd->isImmD())
> - return insImmI(int32_t(oprnd->immD()));
>
> A int32_t(double) cast is definitely different than what d2i performs in the
> JIT. It is closer to what d2i_z performs (maybe identical?) but seems iffy to
> leave this as-is. Tamarin does not currently depend upon this code.
Speaking of getting it right (!). Could we preserve the optimization when int32_t(d) == d?
Comment 21•15 years ago
|
||
A bit of heresy but would it make sense to have a callout to a helper function on platforms that aren't currently supported?
This way we don't have to play macro games (esp. in the front end) and the behaviour is consistent across all platforms.
![]() |
||
Comment 22•15 years ago
|
||
I think this is all overkill. What Julian says in comment 9 is great in the Valgrind context, when an arch-neutral IR needs to represent every possible type of rounding instruction. And if we found ourselves in the position of needing multiple rounding modes, that's the right direction in which to head.
But we only need one specific rounding mode, and it's not even needed on all platforms.
As for naming ("truncate" vs. "round towards zero" or whatever) I'm happy to go with IEEE terminology or whatever others deem appropriate. But that's moot if we go with the original simple suggestion, as we'd just stick with 'd2i'.
> * Clean IR semantics are worth the effort...
> * Listen to Julian, especially when he talks about correctness...
IMO, that case was a bit different to this one. If we go with the simple solution here, we'll have semantics that are a bit odd but still well-defined. That blog post was talking about the old LIR_ov instruction which subverted the semantics by involving the condition codes which weren't modelled in the IR.
As for non-SSE2 machines, AFAIK it's not yet decided if Mozilla will support them for Firefox 4.0, so we should assume for the moment that we will suppport them. I don't care if the non-SSE2 code is bad, though.
Reporter | ||
Comment 23•15 years ago
|
||
Nick, what happened to your thought that "It's kind of gross specifying the semantics of LIR on a platform-by-platform basis".
Is that your proposed solution? Or should we add an entirely new opcode? Those seem like the two choices and either way is fine with me. There are a lot of platforms that support d2i (more than I thought originally) and I don't know exactly what their "platform rounding rules" are currently so LIR_d2i is already somewhat vague.
Who can make a final decision on how to proceed with this bug?
![]() |
||
Comment 24•15 years ago
|
||
(In reply to comment #23)
> Nick, what happened to your thought that "It's kind of gross specifying the
> semantics of LIR on a platform-by-platform basis".
I think it's the least worst solution.
> Who can make a final decision on how to proceed with this bug?
Ed, I guess!
![]() |
||
Comment 25•15 years ago
|
||
Comment on attachment 481337 [details] [diff] [review]
2nd round
As I said earlier, I think a new opcode is needless complexity. Ed, do you want to weigh in again?
Attachment #481337 -
Flags: review?(nnethercote) → review-
Comment 26•15 years ago
|
||
I thought I already did, but I obviously don't see my own comment here (oops! dang.)
Earlier this week I wrote that I value consensus more than the extra opcode, and we should just go with the first patch that just uses CVTTSD2SI on SSE2 cpus with no changes to other cpus, and comments fixed as suggested to clarify the behavior for each cpu. I also suggest a comment pointing to this bug saying something to the effect of: "If rounding-mode-specific d2i opcodes are needed in the future, bug XXX has discussion and a proof of concept"
![]() |
||
Comment 27•15 years ago
|
||
(In reply to comment #26)
>
> and comments fixed as suggested to clarify
> the behavior for each cpu.
Mentioning the exact machine instruction used on each platform would be a good thing to include in those comments.
Reporter | ||
Comment 28•15 years ago
|
||
Attachment #481337 -
Attachment is obsolete: true
Attachment #487951 -
Flags: review?(nnethercote)
![]() |
||
Comment 29•15 years ago
|
||
Comment on attachment 487951 [details] [diff] [review]
change x64/x86, add comments for all platforms
Looks good, thanks for persevering.
Attachment #487951 -
Flags: review?(nnethercote) → review+
Reporter | ||
Comment 30•15 years ago
|
||
![]() |
||
Comment 31•15 years ago
|
||
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey
Comment 32•15 years ago
|
||
Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey → fixed-in-nanojit,fixed-in-tracemonkey,fixed-in-tamarin
Comment 33•15 years ago
|
||
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•11 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•