Closed
Bug 570476
Opened 15 years ago
Closed 14 years ago
Specializer could support integer division for non-zero integer constants.
Categories
(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
Q3 11 - Serrano
People
(Reporter: wsharp, Unassigned)
References
Details
(Whiteboard: PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)
Attachments
(1 file, 3 obsolete files)
4.41 KB,
patch
|
wmaddox
:
review+
|
Details | Diff | Splinter Review |
No description provided.
Reporter | ||
Comment 1•15 years ago
|
||
We can not promote all division operations to integer division without supporting divide by zero:
a:int
b:int
c:int
c = a / b; // if b is zero, we need to handle this:
but it would easy enough to handle when b was a non-zero constant:
c = a / 5;
Changing the specializer code to handle this case doubles the performance of division on my x86 XEON processor. Code for patch:
else if (op == LIR_divd)
{
LIns *a = v->oprnd1();
LIns *b = v->oprnd2();
a = isPromote(a->opcode()) ? a->oprnd1() : imm2Int(a);
b = imm2Int(b);
if (a && b && b->immI() != 0)
return out->ins2(LIR_divi, a, b);
}
Whiteboard: PACMAN
Comment 2•15 years ago
|
||
Sounds nice. I wonder if it would be profitable to handle non-constant cases here too, with a check-for-zero at runtime? (Results would all have to be promoted to Number since 0 -> Inf but the time savings might still be worthwhile)
Reporter | ||
Comment 3•15 years ago
|
||
The case I was optimize for was when the result stays as integer. In that case, a divide by zero equals zero. To simulate a runtime check, I added AS3 code in my loop...
if (!d)
c = 0
else
c = a / d;
This adds a slight bit of overhead to the optimization that we could theoretically emit in the JITed code. One interesting thing is integer division on my Xeon is about 2x faster for small numbers (99/5), about 30% faster for larger numbers and only 10% faster for very large numbers (1000000+). So the big win in really only with small integer values, at least on x86.
Assignee: nobody → wsharp
Status: NEW → ASSIGNED
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.2
Reporter | ||
Updated•14 years ago
|
Assignee: wsharp → nobody
Reporter | ||
Updated•14 years ago
|
Summary: Specializer could support integer division for positive integer constants. → Specializer could support integer division for non-zero integer constants.
Reporter | ||
Comment 4•14 years ago
|
||
Brightspot data from the specializer where we have an integer_d following a LIR_divd
0/1 values are:
a->isPromote(), b->isPromote(), a->imm2Int() != 0, b->imm2Int() != 0, a && b, our integer constant is non-zero.
So 216 divide ops have a int->double promotion on one side plus a non-zero constant on the other. These we can optimize to an integer divide:
216 DIVD 1 0 0 1 1 (nonzero const=1) int / constant int - we can optimize
208 DIVD 0 0 0 1 0 (nonzero const=1) float / constant int
149 DIVD 1 0 0 0 0 (nonzero const=0) int / float
111 DIVD 0 0 0 0 0 (nonzero const=0) float / float
20 DIVD 0 1 0 0 0 (nonzero const=0) float / int
19 DIVD 1 1 0 0 1 (nonzero const=0) int but not a constant divisor
9 DIVD 0 0 1 0 0 (nonzero const=0) constant int / float
6 DIVD 0 1 1 0 1 (nonzero const=0) constant / int
And a dump of what our constants are:
142 DIVD 1 0 0 1 1 (nonzero const=2)
11 DIVD 1 0 0 1 1 (nonzero const=1000)
8 DIVD 1 0 0 1 1 (nonzero const=3)
7 DIVD 1 0 0 1 1 (nonzero const=4)
7 DIVD 1 0 0 1 1 (nonzero const=1024)
5 DIVD 1 0 0 1 1 (nonzero const=8)
4 DIVD 1 0 0 1 1 (nonzero const=60)
3 DIVD 1 0 0 1 1 (nonzero const=16)
1 DIVD 1 0 0 1 1 (nonzero const=108)
1 DIVD 1 0 0 1 1 (nonzero const=18)
1 DIVD 1 0 0 1 1 (nonzero const=10)
1 DIVD 1 0 0 1 1 (nonzero const=60000)
1 DIVD 1 0 0 1 1 (nonzero const=5)
Perhaps we should specialize the /2 as >>1 beyond optimizing the integer division itself.
Reporter | ||
Comment 5•14 years ago
|
||
If our divisor is a non-zero integer constant, optimize it to an integer divide. If our dividend is unsigned and our divisor is 2, use a rshlui. Out of 249 divide by 2 calls, 36 of them were an unsigned dividend. right-shift does not work for signed variables since (-1>>1 != -1/2).
Attachment #478843 -
Flags: superreview?(edwsmith)
Attachment #478843 -
Flags: review?(wmaddox)
Comment 6•14 years ago
|
||
(In reply to comment #4)
> Perhaps we should specialize the /2 as >>1 beyond optimizing the integer
> division itself.
Not too hard to make this more general: if exactlyOneBit(rhs) == true, then the RHS is a power of 2. Then, nanojit::msbSet32 or lsbSet32() will give you the shift amount.
Comment 7•14 years ago
|
||
Also beware LIR_divi is currently only implemented on x86-32
Comment 8•14 years ago
|
||
Comment on attachment 478843 [details] [diff] [review]
support for faster integer divide specialization
Looks good. R+
It would be possible to avoid the call imm2Int(b) if a == NULL,
rather than computing both a and b and testing a && b. The same
could be said for the preceding addd/subd/muld case as well, though.
Do any of the performance suite benchmarks show significant improvement?
Attachment #478843 -
Flags: review?(wmaddox) → review+
Reporter | ||
Comment 9•14 years ago
|
||
Made changes suggested from both Bill and Ed. IA32 and AMD64 are the two backends that support LIR_divi currently.
I have only tested this with micro benchmarks:
int/2 - float 497 msec, int 456 msec
unsigned/2 - float 614 msec, int shift 266 msec
int/3 - float 1172, int 758 msec
uint/1024 - float 614 msec, int shift 268 msec
Attachment #478843 -
Attachment is obsolete: true
Attachment #479044 -
Flags: superreview?(edwsmith)
Attachment #479044 -
Flags: review?(wmaddox)
Attachment #478843 -
Flags: superreview?(edwsmith)
Comment 10•14 years ago
|
||
Comment on attachment 479044 [details] [diff] [review]
incorporate review suggestions. #ifdef ia32 and amd64
Other CPU-specific features have things like NJ_EXPANDED_LOADSTORE_SUPPORTED, NJ_F2I_SUPPORTED, etc... if there isn't one for LIR_divi, we should add one and use it, rather than adding CPU-specific ifdefs here
Reporter | ||
Comment 11•14 years ago
|
||
Attachment #479044 -
Attachment is obsolete: true
Attachment #479521 -
Flags: superreview?(edwsmith)
Attachment #479521 -
Flags: review?(wmaddox)
Attachment #479044 -
Flags: superreview?(edwsmith)
Attachment #479044 -
Flags: review?(wmaddox)
Comment 12•14 years ago
|
||
Comment on attachment 479521 [details] [diff] [review]
Update patch with NJ_DIVI_SUPPORTED flag
clearing SR flag until rebased (Specializer went away in bug 600649).
Attachment #479521 -
Flags: superreview?(edwsmith)
Comment 13•14 years ago
|
||
See also https://bugzilla.mozilla.org/show_bug.cgi?id=600459, "better codegen for div/mod by a constant"
Comment 14•14 years ago
|
||
Comment on attachment 479521 [details] [diff] [review]
Update patch with NJ_DIVI_SUPPORTED flag
R-, since the patch needs rebasing as Edwin noted above.
While you are at it, you might consider multiplication by powers of two.
Also, the integer-specific opcode multiply_i would benefit from the same
optimization. (There is no divide_i.)
I suspect we are more likely to see int rather than uint arguments, so
it might be slightly preferable to reverse the operands to the '&&' below:
if (exactlyOneBit(intConst) && aOpcode == LIR_ui2d) {
Attachment #479521 -
Flags: review?(wmaddox) → review-
Reporter | ||
Comment 15•14 years ago
|
||
Rebased to current codebase. Please review in isolation from bug 600585 which if it lands first will modify this patch ever so slightly. (a trivial change that I don't think requires a re-review).
Attachment #479521 -
Attachment is obsolete: true
Attachment #482396 -
Flags: superreview?(edwsmith)
Attachment #482396 -
Flags: review?(wmaddox)
Comment 16•14 years ago
|
||
Comment on attachment 482396 [details] [diff] [review]
rebase patch
R+
Nit:
You rely on the default in Native.h to define NJ_DIVI_SUPPORTED on unsupported platforms. In the machine-specific NativeXXX.h files, it appears that the convention is to enumerate the unsuported options as well, explicitly giving them a value of zero. I have not verified whether this is done with complete consistency. Following this convention while having a transparent fallback to a default seems to be just asking for inconsistent application of the convention to creep int. I'm not sure what the policy should be, but perhaps it is best to go with the flow for the purposes of this patch.
Attachment #482396 -
Flags: review?(wmaddox) → review+
Reporter | ||
Updated•14 years ago
|
Attachment #482396 -
Flags: superreview?(edwsmith)
Reporter | ||
Comment 17•14 years ago
|
||
Reporter | ||
Comment 18•14 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 19•14 years ago
|
||
Is it silly to ask why only the x86 backend supports LIR_divi? (Surely ARM has an integer division instruction?)
Reporter | ||
Comment 20•14 years ago
|
||
ARM has no hardware support for integer division. It's done in a C++ helper.
Comment 21•14 years ago
|
||
Whiteboard: PACMAN → PACMAN, fixed-in-nanojit, fixed-in-tracemonkey
Comment 22•14 years ago
|
||
Whiteboard: PACMAN, fixed-in-nanojit, fixed-in-tracemonkey → PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Updated•14 years ago
|
Flags: flashplayer-bug+
You need to log in
before you can comment on or make changes to this bug.
Description
•