Closed Bug 562653 Opened 15 years ago Closed 6 years ago

modulo op (%) is slow

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

Milestone:

Q2 12 - Cyril

People

(Reporter: pnkfelix, Assigned: wmaddox)

References

Details

(Whiteboard: PACMAN, has-patch)

Attachments

(1 file, 1 obsolete file)

Add integer specialized mod call, tweak Win32 mod routine 15 years ago Werner Sharp 3.87 KB, patch		Details \| Diff \| Splinter Review
updated patch including inlining mod on x86 14 years ago Werner Sharp 7.07 KB, patch	wmaddox : review-	Details \| Diff \| Splinter Review

Felix S. Klock II [:pnkfelix, :fklock]

Reporter

Description

•

15 years ago

From the lego mailing list: "Quick question for the player engineers: Any ideas why the performance of the modulus operator is so poor? For example: a = i%5; b = i-(i/5|0)*5; // b is about 50% faster"

Steven Johnson

Comment 1

•

15 years ago

Perhaps because the modulo call generates a function call (with associated overhead) vs inlined SSE2 instructions for the latter?

Edwin Smith

Updated

•

15 years ago

Blocks: 552542

Priority: -- → P2

Hardware: x86 → All

Edwin Smith

Comment 2

•

15 years ago

Plenty of room for specialization of % (nonzero-int rhs, in this case for example)

Werner Sharp

Updated

•

15 years ago

Assignee: nobody → wsharp

Werner Sharp

Comment 3

•

15 years ago

Attached patch Add integer specialized mod call, tweak Win32 mod routine (obsolete) — Details — Splinter Review

Added code in Specializer to handle int(fmod(double(a), double(b)) where a and b are integers. Integer modulo is over 10x faster than FP. Tweaked MathUtils::mod so it does not have to call modInternal. About 10% faster. Also added constant folding to Specializer code...my first experiments to get in there were constant folding away a subtract operation but still leaving the int->double->int roundtrip. Further support would be to inline mod routine directly: if (b = 0) result=0 else load a, cdq, idiv b, get mod result from x86 register directly.

Attachment #449319 - Flags: superreview?(edwsmith)

Attachment #449319 - Flags: review?(stejohns)

Lars T Hansen

Comment 4

•

15 years ago

This is cool, but I'm not sure C++ '%' always (ie on all platforms and compilers) has the same semantics as ActionScript '%'. That needs investigation. Specifically, the C++ spec says that if either operand is negative then the sign of the remainder is implementation-defined (C++ 1998 5.6 subsection 4). In ECMAScript, "the sign of the result equals the sign of the dividend" (ES5 11.5.3). The ES5 spec also makes the point that the semantics of remainder are those of fmod() in the C library, and of Java. If we've been using fmod() for % up until now then ActionScript probably has those semantics in fact, as well. I think the C++ rule comes from C. I don't have a copy of the C rationale but I believe the rule is what it is to cater to different behavior on different CPUs. The compiler also figures into it; SPARC has a divide operation that does not provide the remainder, for example. x86 follows the ES spec.

Status: NEW → ASSIGNED

Werner Sharp

Comment 5

•

15 years ago

What is the process for verifying results on all supported platforms? 1. Get QE (or myself) to write a test case for negative mod values? 2. Run a buildbot verifying correct results on all platforms? 3. Add the change to a user depot? 4. Run buildbot on my user depot? Simple test with x86 result with and without changelist applied: function modTest(a:int,b:int) { var c:int = a % b; trace(a + " % " + b + " = " + c); } modTest(-12,-5); // -2 modTest(12,-5); // 2 modTest(-12,5); // -2 modTest(12,5); // 2 modTest(-10,3); // -1 modTest(10,3); // 1 modTest(10,-3) // 1 modTest(-10,-3) // -1

Edwin Smith

Comment 6

•

15 years ago

(In reply to comment #5) > What is the process for verifying results on all supported platforms? > > 1. Get QE (or myself) to write a test case for negative mod values? > 2. Run a buildbot verifying correct results on all platforms? > 3. Add the change to a user depot? > 4. Run buildbot on my user depot? basically, yes. If you review the test suite and find that all the necessary tests are already done, then super. but odds are they won't be and its probably more expedient to write them. While you're at it, consider adding tests with fractions on the lhs and rhs. > > Simple test with x86 result with and without changelist applied: > > function modTest(a:int,b:int) > { > var c:int = a % b; > trace(a + " % " + b + " = " + c); > } > modTest(-12,-5); // -2 > modTest(12,-5); // 2 > modTest(-12,5); // -2 > modTest(12,5); // 2 > modTest(-10,3); // -1 > modTest(10,3); // 1 > modTest(10,-3) // 1 > modTest(-10,-3) // -1

Edwin Smith

Comment 7

•

15 years ago

Comment on attachment 449319 [details] [diff] [review] Add integer specialized mod call, tweak Win32 mod routine Switching primary reviewer to Bill since he's working on similar optimizations for other operators. Steven - feedback/review welcome of course.

Attachment #449319 - Attachment is patch: true

Attachment #449319 - Flags: review?(stejohns) → review?(wmaddox)

Steven Johnson

Comment 8

•

15 years ago

Patch uses naked "int" types for the new function, rather than C99 types; new code should always use C99 types unless there's a compelling reason otherwise.

Edwin Smith

Comment 9

•

15 years ago

Comment on attachment 449319 [details] [diff] [review] Add integer specialized mod call, tweak Win32 mod routine cancelling my SR until cross-platform issues of C++ % are addressed.

Attachment #449319 - Flags: superreview?(edwsmith) → superreview?

Edwin Smith

Updated

•

15 years ago

Attachment #449319 - Flags: superreview?

Edwin Smith

Updated

•

15 years ago

Assignee: wsharp → nobody

Component: Virtual Machine → JIT Compiler (NanoJIT)

Werner Sharp

Updated

•

15 years ago

Attachment #449319 - Flags: review?(wmaddox)

Werner Sharp

Comment 10

•

14 years ago

Lars/Ed, I'd like to push this patch to my sandbox and run the tryserver on all platforms to version cross-platform C++ issues. Do you have any specific integer mod tests should I test besides the following? > function modTest(a:int,b:int) > { > var c:int = a % b; > trace(a + " % " + b + " = " + c); > } > modTest(-12,-5); // -2 > modTest(12,-5); // 2 > modTest(-12,5); // -2 > modTest(12,5); // 2 > modTest(-10,3); // -1 > modTest(10,3); // 1 > modTest(10,-3) // 1 > modTest(-10,-3) // -1

Lars T Hansen

Comment 11

•

14 years ago

It's always good to toss in maximal and minimal int values, and zeroes just to make sure they work (they need to) - both with and without an explicit type annotation on 'c'. And test uint in addition to int, perhaps.

Werner Sharp

Comment 12

•

14 years ago

Attached patch updated patch including inlining mod on x86 — Details — Splinter Review

Interestingly, inlining the modi(div(a,b)) call for x86 is no faster than calling out to a helper function to perform the same thing. NJ is not quite as efficient because of the two sti() calls which write the intermediate result from a register to memory. FASTCALL is also no faster because it puts the registers in the wrong location for the x86 DIV instruction. Calling int helper is still 3x faster than float variant.

Attachment #449319 - Attachment is obsolete: true

Andre Kruetzfeldt

Updated

•

14 years ago

Assignee: nobody → wmaddox

Trevor Baker

Updated

•

14 years ago

Flags: flashplayer-bug-

Whiteboard: PACMAN → PACMAN, has-patch, must-fix-candidate

Trevor Baker

Updated

•

14 years ago

Attachment #485766 - Flags: review?(wmaddox)

Andre Kruetzfeldt

Updated

•

14 years ago

Flags: flashplayer-injection-

William Maddox

Assignee

Comment 13

•

14 years ago

(In reply to comment #12) R-, mostly because the patch is seriously stale. > Interestingly, inlining the modi(div(a,b)) call for x86 is no faster than > calling out to a helper function to perform the same thing. This is a bit surprising, as the arguments have to go to memory for the helper, but perhaps we're seeing x86 microarchitectural trickery. In any case, we should remove the code in the '#if 0' branch if not useful. The patch will not apply against the current TR due to significant reorganization of the specialization code, including some changes made to keep specialization from breaking CSE. In the new design, it is awkward to specialize on both arguments and results simultaneously. Instead, we should specialize on the integer arguments in inlineBuiltinFunctions(), producing a double result, and then specialize the resulting call on the integer result in coerceNumberToInt(). This may be improve peformance as well in cases where a Number result is needed, as doing an integer modulo and then converting the result to double gives the correct answer in this case. A comment is in order in MathUtils::modInteger() explaining the result of zero when the divisor is zero. The reason is that the specialization is taking place in a context where the result is coerced to int, hence the actual modulus result NaN is coerced to zero.

William Maddox

Assignee

Updated

•

14 years ago

Attachment #485766 - Flags: review?(wmaddox) → review-

Andre Kruetzfeldt

Updated

•

14 years ago

Target Milestone: Q3 11 - Serrano → Future

Andre Kruetzfeldt

Updated

•

14 years ago

Priority: P2 → P3

Dan Smith

Updated

•

13 years ago

Whiteboard: PACMAN, has-patch, must-fix-candidate → PACMAN, has-patch

Target Milestone: Future → Q1 12 - Brannan

Dan Smith

Updated

•

13 years ago

Flags: flashplayer-qrb+

Target Milestone: Q1 12 - Brannan → Q2 12 - Cyril

Sylvestre Ledru [:Sylvestre]

Comment 14

•

6 years ago

Tamarin is a dead project now. Mass WONTFIX.

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

Resolution: --- → WONTFIX

Sylvestre Ledru [:Sylvestre]

Comment 15

•

6 years ago

Tamarin isn't maintained anymore. WONTFIX remaining bugs.

You need to log in before you can comment on or make changes to this bug.