Closed Bug 606561 Opened 14 years ago Closed 13 years ago

optimize various Math calls

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Q4 11 - Anza

People

(Reporter: wsharp, Assigned: wmaddox)

References

Details

(Whiteboard: PACMAN WE:3001900)

Attachments

(6 files, 5 obsolete files)

add support for a few math routines on the list 14 years ago Werner Sharp 15.49 KB, patch		Details \| Diff \| Splinter Review
optimize math calls, string.length, cleanup inlineBuiltinFunctions 14 years ago Werner Sharp 31.83 KB, patch		Details \| Diff \| Splinter Review
slight update to last patch 14 years ago Werner Sharp 31.84 KB, patch	wmaddox : review+	Details \| Diff \| Splinter Review
dont rely on buggy cmovd 14 years ago Werner Sharp 26.95 KB, patch	wmaddox : review-	Details \| Diff \| Splinter Review
fix math.abs(0x80000000) 14 years ago Werner Sharp 28.16 KB, patch	edwsmith : superreview+	Details \| Diff \| Splinter Review
(v3) tighten up inlined Math.abs code 14 years ago Edwin Smith 24.59 KB, patch		Details \| Diff \| Splinter Review
(v4) Simplify emitFunction calling convention with member pointer, don't inline non-integral min/max 13 years ago William Maddox 25.40 KB, patch	rreitmai : review+	Details \| Diff \| Splinter Review
Math.abs(0) ensure is +0 13 years ago Brent Baker 2.13 KB, patch		Details \| Diff \| Splinter Review
v2. Math.abs(0) ensure is +0 13 years ago Brent Baker 3.06 KB, patch		Details \| Diff \| Splinter Review
Produce correctly-signed result for Math.abs(-0), add test case 13 years ago William Maddox 3.97 KB, patch	rreitmai : review+	Details \| Diff \| Splinter Review
Another try at fixing abs(). Specialize to direct MathUtils call only. 13 years ago William Maddox 7.21 KB, patch	rreitmai : review+	Details \| Diff \| Splinter Review

Werner Sharp

Reporter

Description

•

14 years ago

From the brightspot data from bug 588922, we should optimize various Math routines to either emit custom JIT code or to directly bind to the routine (no thunk). The potential list: Math.round Math.floor Math._max (2 arg) Math.abs Math._min (2 arg) Math.sin Math.random Math.ceil Math.cos Math.pow Math.sqrt

Werner Sharp

Reporter

Updated

•

14 years ago

Whiteboard: PACMAN

Werner Sharp

Reporter

Comment 1

•

14 years ago

Attached patch add support for a few math routines on the list (obsolete) — Details — Splinter Review

Patch which changes inlineBuiltinFunction to be table driven. Add helper functions for isNaN, Math.min, Math.max. Add table entries for charCodeAt, Math.sin. Looking for feedback on the basic approach related to discussion in bug 598322.

Attachment #485380 - Flags: feedback?(wmaddox)

Attachment #485380 - Flags: feedback?(rreitmai)

Edwin Smith

Updated

•

14 years ago

Blocks: 606956

William Maddox

Assignee

Comment 2

•

14 years ago

(In reply to comment #1) > Created attachment 485380 [details] [diff] [review] > Looking for feedback on the basic approach related to discussion in bug 598322. We need not search the entirety of specializedFunctions[], as we have already done a switch on the method id and dispatched to a case label specific to that id, and can simply invoke a specialization method with the subset of the table that is applicable to that id. It is then necessary only to iterate over the possible argument matches, making the lookup time insensitive to the number of functions for which specializations are defined. It is probably possible to simplify the argument matching and specialization code a bit. Taking advantage of the the initial switch on the method id, we could invoke a separate specialization method for each arity, reducing the internal branchiness of the specialization method(s). I have an inkling that further streamlining is possible -- I'll take a closer look and get back to you with with a more careful analysis.

Werner Sharp

Reporter

Comment 3

•

14 years ago

I don't think as this work progresses that we want to have a large switch statement with 30+ cases to handle each specialization, even if they are table driven. (Or 30 individual tables, one for each native ID.) It would be better to generate a HashTable to find a match. We definitely can't linearly search for a match on each native which is why I started the top-level case to begin with. HashKey could be (id, argCount, argType0, argType1) but the multiple matches for argType could be tricky. An arg could match an int/uint/Number function signature and we want the most optimized one to be hit first. Maybe a bit on MethodInfo can be set for a native method that has a specialization entry. Some investigation is still required.

Werner Sharp

Reporter

Updated

•

14 years ago

Assignee: nobody → wsharp

Werner Sharp

Reporter

Comment 4

•

14 years ago

Attached patch optimize math calls, string.length, cleanup inlineBuiltinFunctions (obsolete) — Details — Splinter Review

Follow up to last patch, remove the specializeOneArgFunction function and switch to a table driven approach. A HashMap of <method id, argCount> keys points to starting entries in our static table. We iterate over all matching entries in the table looking for matching argument types. This adds direct calls support for nearly all the Math routines, String.length plus the existing charAt/charCodeAt/isNaN. Math.abs, min, max are emitted as pure LIR ops. Remove Math._min/_max and Verifier::findMathFunction logic which is obsolete now. Faster Windows MathUtils::round which used to just call MathUtils.floor(value+0.5) Microbenchmarks: wsharp@wsharp-PC ~ $ $AVM test.abc int = Math.abs(int) elapsed time is 803 Number = Math.abs(Number) elapsed time is 912 Math.max(int,int) elapsed time is 831 Math.max(Number,Number) elapsed time is 830 Math.sqrt(Number) elapsed time is 1056 Math.round(Number) elapsed time is 1830 Math.cos(Number) elapsed time is 4533 Math.sin(Number) elapsed time is 5024 Math.floor(Number) elapsed time is 1791 Math.ceil(Number) elapsed time is 1823 wsharp@wsharp-PC ~ $ $AVM2 test.abc int = Math.abs(int) elapsed time is 2720 Number = Math.abs(Number) elapsed time is 2229 Math.max(int,int) elapsed time is 2671 Math.max(Number,Number) elapsed time is 2951 Math.sqrt(Number) elapsed time is 2558 Math.round(Number) elapsed time is 3244 Math.cos(Number) elapsed time is 6683 Math.sin(Number) elapsed time is 6791 Math.floor(Number) elapsed time is 3936 Math.ceil(Number) elapsed time is 3840

Attachment #485380 - Attachment is obsolete: true

Attachment #487662 - Flags: superreview?(edwsmith)

Attachment #487662 - Flags: review?(wmaddox)

Attachment #485380 - Flags: feedback?(wmaddox)

Attachment #485380 - Flags: feedback?(rreitmai)

Werner Sharp

Reporter

Comment 5

•

14 years ago

Attached patch slight update to last patch (obsolete) — Details — Splinter Review

Running acceptance after last minute change showed two problems. One typo and also that we should not treat negative zero as an integer.

Attachment #487680 - Flags: superreview?(edwsmith)

Attachment #487680 - Flags: review?(wmaddox)

Dan Smith

Updated

•

14 years ago

Flags: flashplayer-qrb+

Priority: -- → P2

Target Milestone: --- → flash10.x - Serrano

Werner Sharp

Reporter

Updated

•

14 years ago

Attachment #487662 - Flags: superreview?(edwsmith)

Attachment #487662 - Flags: review?(wmaddox)

William Maddox

Assignee

Comment 6

•

14 years ago

Comment on attachment 487680 [details] [diff] [review] slight update to last patch R+, with comments added/fixed per notes below: The second part of the psuedocode comment in emitMathMin() doesn't make sense, and doesn't agree with the expression it is supposed to be equivalent to. It does not appear that the 'y' argument can ever be the result. It may be cheaper to explicitly OR the result of the comparision and the null check and use only one insChoose() on targets that do not implement cmov. Then again, you may lose on targets that can't generate a boolean result without a branch. Just something to consider. Similarly, for emitMathMax(). Need header comment on specializedFunctions[] array describing the interpretation of the fields of each entry.

Attachment #487680 - Flags: review?(wmaddox) → review+

Werner Sharp

Reporter

Comment 7

•

14 years ago

Attached patch dont rely on buggy cmovd — Details — Splinter Review

LIR_cmovd (and/or multiple uses of it in a row) is terribly buggy on various platforms. Switch our abs/min/max code to use if/else cases to get all sandboxes happy (including -Dnosse).

Attachment #487662 - Attachment is obsolete: true

Attachment #487680 - Attachment is obsolete: true

Attachment #489208 - Flags: superreview?(edwsmith)

Attachment #489208 - Flags: review?(wmaddox)

Attachment #487680 - Flags: superreview?(edwsmith)

Edwin Smith

Comment 8

•

14 years ago

In case I missed an earlier discussion -- what's the problem with cmovd? is it buggy implementations in Nanojit backends? fine if so, but we should write bugs when possible.

Werner Sharp

Reporter

Comment 9

•

14 years ago

PowerPC - not implemented -DnoSSE - can't find a float register error (two cmovd's active at once?) MIPS - unknown problems (failed sandbox acceptance tests) SH4 - unknown problems (failed sandbox acceptance tests)

Edwin Smith

Comment 10

•

14 years ago

Cool. I copied these notes over to bug 584270.

William Maddox

Assignee

Comment 11

•

14 years ago

Comment on attachment 489208 [details] [diff] [review] dont rely on buggy cmovd 1) In emitMathMin() and emitMathMax(), you no longer use insChoose() for the non-integer case. Based on some remarks from Moh a while back, I would not be the least bit surprised if this code is faster on Intel CPUs. Intel's performance recommendations prefer conditional control over CMOV for cases that predict well. The situation may likely be different on other CPUs, e.g., ARM. If the change was motivated strictly to work around buggy cmov implementations, and has not been experimentally validated for better performance across platforms, please leave a comment so we might revisit the issue when the backends are fixed. 2) Math.abs() is inlined as follows: if (value >= 0) return i2d(value); else return i2d(-value); This will fail for MIN_INT, as the expression -value will then overflow. You should convert value as-is, and negate the resulting float. 3) I am concerned that testing did not catch this, perhaps because, in the absence of this optimization, Math.abs(MIN_INT) is not an interesting boundary case. New white-box tests should be added to cover such cases. While on the subject of tests, note also the usage of JIT_EVENT() in the addition fastpath and elswhere. This can be used to validate tests for branch coverage in the generated code.

Attachment #489208 - Flags: review?(wmaddox) → review-

Edwin Smith

Updated

•

14 years ago

Attachment #489208 - Flags: superreview?(edwsmith)

Werner Sharp

Reporter

Comment 12

•

14 years ago

Attached patch fix math.abs(0x80000000) — Details — Splinter Review

Remove integer path from Math.abs. Adding checks for 0x80000000 make it just as slow as the number path. Add test case to catch this case in ecma3 Math.abs test. Related to cmovd, x86+x64 both use branches for cmovd (see Assembler::asm_cmov). I'm not sure if there is a conditional move instruction for SSE that we could use but currently nanojit is just a branch. The new code is a couple percent slower than the older code in micobenchmarks. (win32 x86). Previously I did try to simplify the code to only use one insChoose instruction but that did not work correctly with ORing the intermediate boolean checks. Also re-enable Math.min/max inlining for Math_private__min and Math_private__max. Interpreter execution requires we keep our _min and _max code for negative zero bug so JIT code will see those as well as regular min/max functions.

Attachment #491529 - Flags: superreview?(edwsmith)

Attachment #491529 - Flags: review?(wmaddox)

Edwin Smith

Comment 13

•

14 years ago

(In reply to comment #12) > Related to cmovd, x86+x64 both use branches for cmovd (see > Assembler::asm_cmov). I'm not sure if there is a conditional move instruction > for SSE that we could use but currently nanojit is just a branch. The new code > is a couple percent slower than the older code in micobenchmarks. (win32 x86). > Previously I did try to simplify the code to only use one insChoose instruction > but that did not work correctly with ORing the intermediate boolean checks. Aside from bugs in the backend, frontends should never hesitate to use a LIR_cmov in situations when it can. A simple pure 3-ary operator is always better than a diamond or something. (see bug 586472).

William Maddox

Assignee

Comment 14

•

14 years ago

Werner: RE review request - I'm in the middle of a release "firefight", and will not be able to get to this for a few more days.

Werner Sharp

Reporter

Updated

•

14 years ago

Assignee: wsharp → nobody

Andre Kruetzfeldt

Updated

•

14 years ago

Assignee: nobody → wmaddox

Edwin Smith

Comment 15

•

14 years ago

(In reply to comment #9) > PowerPC - not implemented > -DnoSSE - can't find a float register error (two cmovd's active at once?) > MIPS - unknown problems (failed sandbox acceptance tests) > SH4 - unknown problems (failed sandbox acceptance tests) I think the -Dnosse bug is addressed by bug 580515.

Edwin Smith

Comment 16

•

14 years ago

Attached patch (v3) tighten up inlined Math.abs code — Details — Splinter Review

Here's the modified code (emitMathAbs() is the only change). Not requesting review for now, in case there is feedback.

Edwin Smith

Comment 17

•

14 years ago

Comment on attachment 491529 [details] [diff] [review] fix math.abs(0x80000000) This all looks basically okay. I uploaded a modified patch that is slightly smaller for Math.abs, but haven't tested performance. (take it or leave it). I didn't review the other cases in detail for correctness.

Attachment #491529 - Flags: superreview?(edwsmith) → superreview+

William Maddox

Assignee

Comment 18

•

14 years ago

Comment on attachment 491529 [details] [diff] [review] fix math.abs(0x80000000) Removing myself as reviewer, since I'm now the owner of the bug.

Attachment #491529 - Flags: review?(wmaddox)

Trevor Baker

Updated

•

14 years ago

Flags: flashplayer-bug-

Andre Kruetzfeldt

Updated

•

14 years ago

Blocks: 645018

Andre Kruetzfeldt

Updated

•

13 years ago

Attachment #494492 - Flags: review?(wmaddox)

Trevor Baker

Updated

•

13 years ago

Flags: flashplayer-injection-

William Maddox

Assignee

Comment 19

•

13 years ago

Attached patch (v4) Simplify emitFunction calling convention with member pointer, don't inline non-integral min/max — Details — Splinter Review

By using a member function pointer, the emit functions can be written as ordinary non-static CodegenLIR methods, with implicit access to 'this'. The CodegenLIR* argument, and the repeated references to it, looked ugly. The simple inline code emitted for min and max in the v3 patch for the Number/Number case did not handle -0 correctly, intentionally replicating a bug in the builtin class. That bug (bug 551587) has now been fixed, however, and the corrected code is likely more than we want to inline. I have thus removed the inlining specialization for that case. It would probably be worthwhile to consider a direct call to an optimized out-of-line handler, however.

William Maddox

Assignee

Updated

•

13 years ago

Attachment #494492 - Flags: review?(wmaddox)

William Maddox

Assignee

Comment 20

•

13 years ago

Comment on attachment 527712 [details] [diff] [review] (v4) Simplify emitFunction calling convention with member pointer, don't inline non-integral min/max Werner and Edwin's v3 patch looked good to me at the time, so what needs to be reviewed here for v4 are primarily the changes noted above.

Attachment #527712 - Flags: review?(rreitmai)

Rick Reitmaier

Comment 21

•

13 years ago

Comment on attachment 527712 [details] [diff] [review] (v4) Simplify emitFunction calling convention with member pointer, don't inline non-integral min/max Review of attachment 527712 [details] [diff] [review]: r+ nicely done! ::: core/CodegenLIR.cpp @@ +3865,5 @@ + } + else { + localSet(op1-1, binaryIns(LIR_eqi, binaryIns(LIR_eqd, f, f), InsConst(0)), result); + } + } I realize that this code existed previously, but it might be a good time to add comments on the eqi/eqd usage here, as its probably not obvious to new-comers what is going on. @@ +3958,5 @@ + } + else if (bt == BUILTIN_int) { + LIns *arg = localGet(argOffset); + if (arg->isImmI() && arg->immI() >= 0) + btMask |= 1 << BUILTIN_uint; could we not set btMask BUILTIN_int if >= 0 fails? @@ +4071,5 @@ + funcKey = newFuncKey; + builtinFunctionOptimizerHashMap->put(funcKey, i); + } + } + } Maybe beyond the scope of this fix , but I wonder if we should start thinking about partitioning the 'static' portion of CodegenLIR's data. E.g. this map could be built once and shared for all CodegenLIR's.

Attachment #527712 - Flags: review?(rreitmai) → review+

Tamarin Bot

Comment 22

•

13 years ago

changeset: 6220:e5e1279cf7a4 user: William Maddox <wmaddox@adobe.com> summary: Bug 606561 - Specialization framework, including optimizing various Math calls http://hg.mozilla.org/tamarin-redux/rev/e5e1279cf7a4

William Maddox

Assignee

Comment 23

•

13 years ago

(In reply to comment #21) > Comment on attachment 527712 [details] [diff] [review] > r+ nicely done! This was mostly Werner's work, with a contribution by Edwin as well. > I realize that this code existed previously, but it might be a good time to add > comments on the eqi/eqd usage here, as its probably not obvious to new-comers > what is going on. Done. > could we not set btMask BUILTIN_int if >= 0 fails? This case is handled by the earlier line: int32_t btMask = 1 << bt; > Maybe beyond the scope of this fix , but I wonder if we should start thinking > about partitioning the 'static' portion of CodegenLIR's data. > > E.g. this map could be built once and shared for all CodegenLIR's. Good idea! I don't want to hold up getting this in for the integration, but there really is no need to duplicate the initialization for every method compiled. Will follow up.

William Maddox

Assignee

Comment 24

•

13 years ago

(In reply to comment #21) > Comment on attachment 527712 [details] [diff] [review] > E.g. this map could be built once and shared for all CodegenLIR's. At first blush, it looked like this might be as easy as simply making builtinFunctionOptimizerHashMap static. The real problem is that HashMap requires a nanojit::Allocator, and the only one conveniently at hand is an arena allocator whose storage is reclaimed along with the CodegenLIR object.

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Brent Baker

Comment 25

•

13 years ago

Attached patch Math.abs(0) ensure is +0 (obsolete) — Details — Splinter Review

Reopening this bug since this injected an issue with Math.abs(0)

Attachment #567701 - Flags: review?(gchaney)

Brent Baker

Updated

•

13 years ago

Status: RESOLVED → REOPENED

Flags: flashplayer-qrb+ → flashplayer-qrb?

Resolution: FIXED → ---

Whiteboard: PACMAN → PACMAN WE:3001900

Brent Baker

Comment 26

•

13 years ago

Attached patch v2. Math.abs(0) ensure is +0 (obsolete) — Details — Splinter Review

tweak the failconfig.txt so that it is not an expectedfail on debugger runs and skip during differential testing

Attachment #567701 - Attachment is obsolete: true

Attachment #567709 - Flags: review?(gchaney)

Attachment #567701 - Flags: review?(gchaney)

William Maddox

Assignee

Comment 27

•

13 years ago

Attached patch Produce correctly-signed result for Math.abs(-0), add test case — Details — Splinter Review

Do not negate a zero-valued argument unless it is -0, not +0.

Attachment #567889 - Flags: review?(rreitmai)

Brent Baker

Comment 28

•

13 years ago

Comment on attachment 567709 [details] [diff] [review] v2. Math.abs(0) ensure is +0 patch obsoleted by attachment# 567889 [details] [diff] [review]

Attachment #567709 - Attachment is obsolete: true

Attachment #567709 - Flags: review?(gchaney)

Dan Smith

Updated

•

13 years ago

Flags: flashplayer-qrb? → flashplayer-qrb+

Target Milestone: Q3 11 - Serrano → Q4 11 - Anza

Rick Reitmaier

Updated

•

13 years ago

Attachment #567889 - Flags: review?(rreitmai) → review+

William Maddox

Assignee

Comment 29

•

13 years ago

Attached patch Another try at fixing abs(). Specialize to direct MathUtils call only. — Details — Splinter Review

Somehow, I convinced myself that the last patch worked. It showed up quite broken in a sandbox build, however, and the reason is fundamental: The standard floating point comparisons do not distinguish -0 and 0, despite the intuition that -0 < 0, even if -0 == 0. Correct inline code would need to examine the bitwise representation, or use a hardware instruction such as FABS on x86. This patch removes the inlining, and simply specializes Math::abs() to a direct call into MathUtils, as we do for trig, etc.

Attachment #568335 - Flags: review?(rreitmai)

Edwin Smith

Comment 30

•

13 years ago

Do we have a handle on what release branches already shipped with the bug? (backports? ug)

William Maddox

Assignee

Comment 31

•

13 years ago

The original bug was injected into FRMain in April. It looks like the bug is in Serrano, but not in Wasabi.

Rick Reitmaier

Updated

•

13 years ago

Attachment #568335 - Flags: review?(rreitmai) → review+

William Maddox

Assignee

Comment 32

•

13 years ago

Pushed to TR: http://hg.mozilla.org/tamarin-redux/rev/28ea922b25b9

Status: REOPENED → RESOLVED

Closed: 13 years ago → 13 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.