Closed
Bug 562653
Opened 15 years ago
Closed 6 years ago
modulo op (%) is slow
Categories
(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P3)
Tamarin Graveyard
Baseline JIT (CodegenLIR)
Tracking
(Not tracked)
RESOLVED
WONTFIX
Q2 12 - Cyril
People
(Reporter: pnkfelix, Assigned: wmaddox)
References
Details
(Whiteboard: PACMAN, has-patch)
Attachments
(1 file, 1 obsolete file)
7.07 KB,
patch
|
wmaddox
:
review-
|
Details | Diff | Splinter Review |
From the lego mailing list:
"Quick question for the player engineers: Any ideas why the performance of the modulus operator is so poor? For example:
a = i%5;
b = i-(i/5|0)*5;
// b is about 50% faster"
Comment 1•15 years ago
|
||
Perhaps because the modulo call generates a function call (with associated overhead) vs inlined SSE2 instructions for the latter?
Updated•15 years ago
|
Comment 2•15 years ago
|
||
Plenty of room for specialization of % (nonzero-int rhs, in this case for example)
Updated•15 years ago
|
Assignee: nobody → wsharp
Comment 3•15 years ago
|
||
Added code in Specializer to handle int(fmod(double(a), double(b)) where a and b are integers. Integer modulo is over 10x faster than FP. Tweaked MathUtils::mod so it does not have to call modInternal. About 10% faster. Also added constant folding to Specializer code...my first experiments to get in there were constant folding away a subtract operation but still leaving the int->double->int roundtrip.
Further support would be to inline mod routine directly: if (b = 0) result=0 else load a, cdq, idiv b, get mod result from x86 register directly.
Attachment #449319 -
Flags: superreview?(edwsmith)
Attachment #449319 -
Flags: review?(stejohns)
Comment 4•15 years ago
|
||
This is cool, but I'm not sure C++ '%' always (ie on all platforms and compilers) has the same semantics as ActionScript '%'. That needs investigation. Specifically, the C++ spec says that if either operand is negative then the sign of the remainder is implementation-defined (C++ 1998 5.6 subsection 4). In ECMAScript, "the sign of the result equals the sign of the dividend" (ES5 11.5.3). The ES5 spec also makes the point that the semantics of remainder are those of fmod() in the C library, and of Java. If we've been using fmod() for % up until now then ActionScript probably has those semantics in fact, as well.
I think the C++ rule comes from C. I don't have a copy of the C rationale but I believe the rule is what it is to cater to different behavior on different CPUs. The compiler also figures into it; SPARC has a divide operation that does not provide the remainder, for example. x86 follows the ES spec.
Status: NEW → ASSIGNED
Comment 5•15 years ago
|
||
What is the process for verifying results on all supported platforms?
1. Get QE (or myself) to write a test case for negative mod values?
2. Run a buildbot verifying correct results on all platforms?
3. Add the change to a user depot?
4. Run buildbot on my user depot?
Simple test with x86 result with and without changelist applied:
function modTest(a:int,b:int)
{
var c:int = a % b;
trace(a + " % " + b + " = " + c);
}
modTest(-12,-5); // -2
modTest(12,-5); // 2
modTest(-12,5); // -2
modTest(12,5); // 2
modTest(-10,3); // -1
modTest(10,3); // 1
modTest(10,-3) // 1
modTest(-10,-3) // -1
Comment 6•15 years ago
|
||
(In reply to comment #5)
> What is the process for verifying results on all supported platforms?
>
> 1. Get QE (or myself) to write a test case for negative mod values?
> 2. Run a buildbot verifying correct results on all platforms?
> 3. Add the change to a user depot?
> 4. Run buildbot on my user depot?
basically, yes. If you review the test suite and find that all the necessary tests are already done, then super. but odds are they won't be and its probably more expedient to write them.
While you're at it, consider adding tests with fractions on the lhs and rhs.
>
> Simple test with x86 result with and without changelist applied:
>
> function modTest(a:int,b:int)
> {
> var c:int = a % b;
> trace(a + " % " + b + " = " + c);
> }
> modTest(-12,-5); // -2
> modTest(12,-5); // 2
> modTest(-12,5); // -2
> modTest(12,5); // 2
> modTest(-10,3); // -1
> modTest(10,3); // 1
> modTest(10,-3) // 1
> modTest(-10,-3) // -1
Comment 7•15 years ago
|
||
Comment on attachment 449319 [details] [diff] [review]
Add integer specialized mod call, tweak Win32 mod routine
Switching primary reviewer to Bill since he's working on similar optimizations for other operators. Steven - feedback/review welcome of course.
Attachment #449319 -
Attachment is patch: true
Attachment #449319 -
Flags: review?(stejohns) → review?(wmaddox)
Comment 8•15 years ago
|
||
Patch uses naked "int" types for the new function, rather than C99 types; new code should always use C99 types unless there's a compelling reason otherwise.
Comment 9•15 years ago
|
||
Comment on attachment 449319 [details] [diff] [review]
Add integer specialized mod call, tweak Win32 mod routine
cancelling my SR until cross-platform issues of C++ % are addressed.
Attachment #449319 -
Flags: superreview?(edwsmith) → superreview?
Updated•15 years ago
|
Attachment #449319 -
Flags: superreview?
Updated•15 years ago
|
Assignee: wsharp → nobody
Component: Virtual Machine → JIT Compiler (NanoJIT)
Updated•15 years ago
|
Attachment #449319 -
Flags: review?(wmaddox)
Comment 10•14 years ago
|
||
Lars/Ed, I'd like to push this patch to my sandbox and run the tryserver on all platforms to version cross-platform C++ issues. Do you have any specific integer mod tests should I test besides the following?
> function modTest(a:int,b:int)
> {
> var c:int = a % b;
> trace(a + " % " + b + " = " + c);
> }
> modTest(-12,-5); // -2
> modTest(12,-5); // 2
> modTest(-12,5); // -2
> modTest(12,5); // 2
> modTest(-10,3); // -1
> modTest(10,3); // 1
> modTest(10,-3) // 1
> modTest(-10,-3) // -1
Comment 11•14 years ago
|
||
It's always good to toss in maximal and minimal int values, and zeroes just to make sure they work (they need to) - both with and without an explicit type annotation on 'c'. And test uint in addition to int, perhaps.
Comment 12•14 years ago
|
||
Interestingly, inlining the modi(div(a,b)) call for x86 is no faster than calling out to a helper function to perform the same thing. NJ is not quite as efficient because of the two sti() calls which write the intermediate result from a register to memory.
FASTCALL is also no faster because it puts the registers in the wrong location for the x86 DIV instruction.
Calling int helper is still 3x faster than float variant.
Attachment #449319 -
Attachment is obsolete: true
Updated•14 years ago
|
Assignee: nobody → wmaddox
Updated•14 years ago
|
Flags: flashplayer-bug-
Whiteboard: PACMAN → PACMAN, has-patch, must-fix-candidate
Updated•14 years ago
|
Attachment #485766 -
Flags: review?(wmaddox)
Updated•14 years ago
|
Flags: flashplayer-injection-
Assignee | ||
Comment 13•14 years ago
|
||
(In reply to comment #12)
R-, mostly because the patch is seriously stale.
> Interestingly, inlining the modi(div(a,b)) call for x86 is no faster than
> calling out to a helper function to perform the same thing.
This is a bit surprising, as the arguments have to go to memory for the helper, but perhaps we're seeing x86 microarchitectural trickery.
In any case, we should remove the code in the '#if 0' branch if not useful.
The patch will not apply against the current TR due to significant reorganization of the specialization code, including some changes made to keep specialization from breaking CSE. In the new design, it is awkward to specialize on both arguments and results simultaneously. Instead, we should specialize on the integer arguments in inlineBuiltinFunctions(), producing a double result, and then specialize the resulting call on the integer result in coerceNumberToInt(). This may be improve peformance as well in cases where a Number result is needed, as doing an integer modulo and then converting the result to double gives the correct answer in this case.
A comment is in order in MathUtils::modInteger() explaining the result of zero when the divisor is zero. The reason is that the specialization is taking place in a context where the result is coerced to int, hence the actual modulus result NaN is coerced to zero.
Assignee | ||
Updated•14 years ago
|
Attachment #485766 -
Flags: review?(wmaddox) → review-
Updated•14 years ago
|
Target Milestone: Q3 11 - Serrano → Future
Updated•14 years ago
|
Priority: P2 → P3
Whiteboard: PACMAN, has-patch, must-fix-candidate → PACMAN, has-patch
Target Milestone: Future → Q1 12 - Brannan
Flags: flashplayer-qrb+
Target Milestone: Q1 12 - Brannan → Q2 12 - Cyril
Comment 14•6 years ago
|
||
Tamarin is a dead project now. Mass WONTFIX.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Comment 15•6 years ago
|
||
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in
before you can comment on or make changes to this bug.
Description
•