Closed Bug 606561 Opened 14 years ago Closed 13 years ago

optimize various Math calls

Categories

(Tamarin Graveyard :: Virtual Machine, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED
Q4 11 - Anza

People

(Reporter: wsharp, Assigned: wmaddox)

References

Details

(Whiteboard: PACMAN WE:3001900)

Attachments

(6 files, 5 obsolete files)

From the brightspot data from bug 588922, we should optimize various Math routines to either emit custom JIT code or to directly bind to the routine (no thunk). The potential list: Math.round Math.floor Math._max (2 arg) Math.abs Math._min (2 arg) Math.sin Math.random Math.ceil Math.cos Math.pow Math.sqrt
Whiteboard: PACMAN
Patch which changes inlineBuiltinFunction to be table driven. Add helper functions for isNaN, Math.min, Math.max. Add table entries for charCodeAt, Math.sin. Looking for feedback on the basic approach related to discussion in bug 598322.
Attachment #485380 - Flags: feedback?(wmaddox)
Attachment #485380 - Flags: feedback?(rreitmai)
Blocks: 606956
(In reply to comment #1) > Created attachment 485380 [details] [diff] [review] > Looking for feedback on the basic approach related to discussion in bug 598322. We need not search the entirety of specializedFunctions[], as we have already done a switch on the method id and dispatched to a case label specific to that id, and can simply invoke a specialization method with the subset of the table that is applicable to that id. It is then necessary only to iterate over the possible argument matches, making the lookup time insensitive to the number of functions for which specializations are defined. It is probably possible to simplify the argument matching and specialization code a bit. Taking advantage of the the initial switch on the method id, we could invoke a separate specialization method for each arity, reducing the internal branchiness of the specialization method(s). I have an inkling that further streamlining is possible -- I'll take a closer look and get back to you with with a more careful analysis.
I don't think as this work progresses that we want to have a large switch statement with 30+ cases to handle each specialization, even if they are table driven. (Or 30 individual tables, one for each native ID.) It would be better to generate a HashTable to find a match. We definitely can't linearly search for a match on each native which is why I started the top-level case to begin with. HashKey could be (id, argCount, argType0, argType1) but the multiple matches for argType could be tricky. An arg could match an int/uint/Number function signature and we want the most optimized one to be hit first. Maybe a bit on MethodInfo can be set for a native method that has a specialization entry. Some investigation is still required.
Assignee: nobody → wsharp
Follow up to last patch, remove the specializeOneArgFunction function and switch to a table driven approach. A HashMap of <method id, argCount> keys points to starting entries in our static table. We iterate over all matching entries in the table looking for matching argument types. This adds direct calls support for nearly all the Math routines, String.length plus the existing charAt/charCodeAt/isNaN. Math.abs, min, max are emitted as pure LIR ops. Remove Math._min/_max and Verifier::findMathFunction logic which is obsolete now. Faster Windows MathUtils::round which used to just call MathUtils.floor(value+0.5) Microbenchmarks: wsharp@wsharp-PC ~ $ $AVM test.abc int = Math.abs(int) elapsed time is 803 Number = Math.abs(Number) elapsed time is 912 Math.max(int,int) elapsed time is 831 Math.max(Number,Number) elapsed time is 830 Math.sqrt(Number) elapsed time is 1056 Math.round(Number) elapsed time is 1830 Math.cos(Number) elapsed time is 4533 Math.sin(Number) elapsed time is 5024 Math.floor(Number) elapsed time is 1791 Math.ceil(Number) elapsed time is 1823 wsharp@wsharp-PC ~ $ $AVM2 test.abc int = Math.abs(int) elapsed time is 2720 Number = Math.abs(Number) elapsed time is 2229 Math.max(int,int) elapsed time is 2671 Math.max(Number,Number) elapsed time is 2951 Math.sqrt(Number) elapsed time is 2558 Math.round(Number) elapsed time is 3244 Math.cos(Number) elapsed time is 6683 Math.sin(Number) elapsed time is 6791 Math.floor(Number) elapsed time is 3936 Math.ceil(Number) elapsed time is 3840
Attachment #485380 - Attachment is obsolete: true
Attachment #487662 - Flags: superreview?(edwsmith)
Attachment #487662 - Flags: review?(wmaddox)
Attachment #485380 - Flags: feedback?(wmaddox)
Attachment #485380 - Flags: feedback?(rreitmai)
Attached patch slight update to last patch (obsolete) — Splinter Review
Running acceptance after last minute change showed two problems. One typo and also that we should not treat negative zero as an integer.
Attachment #487680 - Flags: superreview?(edwsmith)
Attachment #487680 - Flags: review?(wmaddox)
Flags: flashplayer-qrb+
Priority: -- → P2
Target Milestone: --- → flash10.x - Serrano
Attachment #487662 - Flags: superreview?(edwsmith)
Attachment #487662 - Flags: review?(wmaddox)
Comment on attachment 487680 [details] [diff] [review] slight update to last patch R+, with comments added/fixed per notes below: The second part of the psuedocode comment in emitMathMin() doesn't make sense, and doesn't agree with the expression it is supposed to be equivalent to. It does not appear that the 'y' argument can ever be the result. It may be cheaper to explicitly OR the result of the comparision and the null check and use only one insChoose() on targets that do not implement cmov. Then again, you may lose on targets that can't generate a boolean result without a branch. Just something to consider. Similarly, for emitMathMax(). Need header comment on specializedFunctions[] array describing the interpretation of the fields of each entry.
Attachment #487680 - Flags: review?(wmaddox) → review+
LIR_cmovd (and/or multiple uses of it in a row) is terribly buggy on various platforms. Switch our abs/min/max code to use if/else cases to get all sandboxes happy (including -Dnosse).
Attachment #487662 - Attachment is obsolete: true
Attachment #487680 - Attachment is obsolete: true
Attachment #489208 - Flags: superreview?(edwsmith)
Attachment #489208 - Flags: review?(wmaddox)
Attachment #487680 - Flags: superreview?(edwsmith)
In case I missed an earlier discussion -- what's the problem with cmovd? is it buggy implementations in Nanojit backends? fine if so, but we should write bugs when possible.
PowerPC - not implemented -DnoSSE - can't find a float register error (two cmovd's active at once?) MIPS - unknown problems (failed sandbox acceptance tests) SH4 - unknown problems (failed sandbox acceptance tests)
Cool. I copied these notes over to bug 584270.
Comment on attachment 489208 [details] [diff] [review] dont rely on buggy cmovd 1) In emitMathMin() and emitMathMax(), you no longer use insChoose() for the non-integer case. Based on some remarks from Moh a while back, I would not be the least bit surprised if this code is faster on Intel CPUs. Intel's performance recommendations prefer conditional control over CMOV for cases that predict well. The situation may likely be different on other CPUs, e.g., ARM. If the change was motivated strictly to work around buggy cmov implementations, and has not been experimentally validated for better performance across platforms, please leave a comment so we might revisit the issue when the backends are fixed. 2) Math.abs() is inlined as follows: if (value >= 0) return i2d(value); else return i2d(-value); This will fail for MIN_INT, as the expression -value will then overflow. You should convert value as-is, and negate the resulting float. 3) I am concerned that testing did not catch this, perhaps because, in the absence of this optimization, Math.abs(MIN_INT) is not an interesting boundary case. New white-box tests should be added to cover such cases. While on the subject of tests, note also the usage of JIT_EVENT() in the addition fastpath and elswhere. This can be used to validate tests for branch coverage in the generated code.
Attachment #489208 - Flags: review?(wmaddox) → review-
Attachment #489208 - Flags: superreview?(edwsmith)
Remove integer path from Math.abs. Adding checks for 0x80000000 make it just as slow as the number path. Add test case to catch this case in ecma3 Math.abs test. Related to cmovd, x86+x64 both use branches for cmovd (see Assembler::asm_cmov). I'm not sure if there is a conditional move instruction for SSE that we could use but currently nanojit is just a branch. The new code is a couple percent slower than the older code in micobenchmarks. (win32 x86). Previously I did try to simplify the code to only use one insChoose instruction but that did not work correctly with ORing the intermediate boolean checks. Also re-enable Math.min/max inlining for Math_private__min and Math_private__max. Interpreter execution requires we keep our _min and _max code for negative zero bug so JIT code will see those as well as regular min/max functions.
Attachment #491529 - Flags: superreview?(edwsmith)
Attachment #491529 - Flags: review?(wmaddox)
(In reply to comment #12) > Related to cmovd, x86+x64 both use branches for cmovd (see > Assembler::asm_cmov). I'm not sure if there is a conditional move instruction > for SSE that we could use but currently nanojit is just a branch. The new code > is a couple percent slower than the older code in micobenchmarks. (win32 x86). > Previously I did try to simplify the code to only use one insChoose instruction > but that did not work correctly with ORing the intermediate boolean checks. Aside from bugs in the backend, frontends should never hesitate to use a LIR_cmov in situations when it can. A simple pure 3-ary operator is always better than a diamond or something. (see bug 586472).
Werner: RE review request - I'm in the middle of a release "firefight", and will not be able to get to this for a few more days.
Assignee: wsharp → nobody
Assignee: nobody → wmaddox
(In reply to comment #9) > PowerPC - not implemented > -DnoSSE - can't find a float register error (two cmovd's active at once?) > MIPS - unknown problems (failed sandbox acceptance tests) > SH4 - unknown problems (failed sandbox acceptance tests) I think the -Dnosse bug is addressed by bug 580515.
Here's the modified code (emitMathAbs() is the only change). Not requesting review for now, in case there is feedback.
Comment on attachment 491529 [details] [diff] [review] fix math.abs(0x80000000) This all looks basically okay. I uploaded a modified patch that is slightly smaller for Math.abs, but haven't tested performance. (take it or leave it). I didn't review the other cases in detail for correctness.
Attachment #491529 - Flags: superreview?(edwsmith) → superreview+
Comment on attachment 491529 [details] [diff] [review] fix math.abs(0x80000000) Removing myself as reviewer, since I'm now the owner of the bug.
Attachment #491529 - Flags: review?(wmaddox)
Flags: flashplayer-bug-
Blocks: 645018
Attachment #494492 - Flags: review?(wmaddox)
Flags: flashplayer-injection-
By using a member function pointer, the emit functions can be written as ordinary non-static CodegenLIR methods, with implicit access to 'this'. The CodegenLIR* argument, and the repeated references to it, looked ugly. The simple inline code emitted for min and max in the v3 patch for the Number/Number case did not handle -0 correctly, intentionally replicating a bug in the builtin class. That bug (bug 551587) has now been fixed, however, and the corrected code is likely more than we want to inline. I have thus removed the inlining specialization for that case. It would probably be worthwhile to consider a direct call to an optimized out-of-line handler, however.
Attachment #494492 - Flags: review?(wmaddox)
Comment on attachment 527712 [details] [diff] [review] (v4) Simplify emitFunction calling convention with member pointer, don't inline non-integral min/max Werner and Edwin's v3 patch looked good to me at the time, so what needs to be reviewed here for v4 are primarily the changes noted above.
Attachment #527712 - Flags: review?(rreitmai)
Comment on attachment 527712 [details] [diff] [review] (v4) Simplify emitFunction calling convention with member pointer, don't inline non-integral min/max Review of attachment 527712 [details] [diff] [review]: r+ nicely done! ::: core/CodegenLIR.cpp @@ +3865,5 @@ + } + else { + localSet(op1-1, binaryIns(LIR_eqi, binaryIns(LIR_eqd, f, f), InsConst(0)), result); + } + } I realize that this code existed previously, but it might be a good time to add comments on the eqi/eqd usage here, as its probably not obvious to new-comers what is going on. @@ +3958,5 @@ + } + else if (bt == BUILTIN_int) { + LIns *arg = localGet(argOffset); + if (arg->isImmI() && arg->immI() >= 0) + btMask |= 1 << BUILTIN_uint; could we not set btMask BUILTIN_int if >= 0 fails? @@ +4071,5 @@ + funcKey = newFuncKey; + builtinFunctionOptimizerHashMap->put(funcKey, i); + } + } + } Maybe beyond the scope of this fix , but I wonder if we should start thinking about partitioning the 'static' portion of CodegenLIR's data. E.g. this map could be built once and shared for all CodegenLIR's.
Attachment #527712 - Flags: review?(rreitmai) → review+
changeset: 6220:e5e1279cf7a4 user: William Maddox <wmaddox@adobe.com> summary: Bug 606561 - Specialization framework, including optimizing various Math calls http://hg.mozilla.org/tamarin-redux/rev/e5e1279cf7a4
(In reply to comment #21) > Comment on attachment 527712 [details] [diff] [review] > r+ nicely done! This was mostly Werner's work, with a contribution by Edwin as well. > I realize that this code existed previously, but it might be a good time to add > comments on the eqi/eqd usage here, as its probably not obvious to new-comers > what is going on. Done. > could we not set btMask BUILTIN_int if >= 0 fails? This case is handled by the earlier line: int32_t btMask = 1 << bt; > Maybe beyond the scope of this fix , but I wonder if we should start thinking > about partitioning the 'static' portion of CodegenLIR's data. > > E.g. this map could be built once and shared for all CodegenLIR's. Good idea! I don't want to hold up getting this in for the integration, but there really is no need to duplicate the initialization for every method compiled. Will follow up.
(In reply to comment #21) > Comment on attachment 527712 [details] [diff] [review] > E.g. this map could be built once and shared for all CodegenLIR's. At first blush, it looked like this might be as easy as simply making builtinFunctionOptimizerHashMap static. The real problem is that HashMap requires a nanojit::Allocator, and the only one conveniently at hand is an arena allocator whose storage is reclaimed along with the CodegenLIR object.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Attached patch Math.abs(0) ensure is +0 (obsolete) — Splinter Review
Reopening this bug since this injected an issue with Math.abs(0)
Attachment #567701 - Flags: review?(gchaney)
Status: RESOLVED → REOPENED
Flags: flashplayer-qrb+ → flashplayer-qrb?
Resolution: FIXED → ---
Whiteboard: PACMAN → PACMAN WE:3001900
Attached patch v2. Math.abs(0) ensure is +0 (obsolete) — Splinter Review
tweak the failconfig.txt so that it is not an expectedfail on debugger runs and skip during differential testing
Attachment #567701 - Attachment is obsolete: true
Attachment #567709 - Flags: review?(gchaney)
Attachment #567701 - Flags: review?(gchaney)
Do not negate a zero-valued argument unless it is -0, not +0.
Attachment #567889 - Flags: review?(rreitmai)
Comment on attachment 567709 [details] [diff] [review] v2. Math.abs(0) ensure is +0 patch obsoleted by attachment# 567889 [details] [diff] [review]
Attachment #567709 - Attachment is obsolete: true
Attachment #567709 - Flags: review?(gchaney)
Flags: flashplayer-qrb? → flashplayer-qrb+
Target Milestone: Q3 11 - Serrano → Q4 11 - Anza
Attachment #567889 - Flags: review?(rreitmai) → review+
Somehow, I convinced myself that the last patch worked. It showed up quite broken in a sandbox build, however, and the reason is fundamental: The standard floating point comparisons do not distinguish -0 and 0, despite the intuition that -0 < 0, even if -0 == 0. Correct inline code would need to examine the bitwise representation, or use a hardware instruction such as FABS on x86. This patch removes the inlining, and simply specializes Math::abs() to a direct call into MathUtils, as we do for trig, etc.
Attachment #568335 - Flags: review?(rreitmai)
Do we have a handle on what release branches already shipped with the bug? (backports? ug)
The original bug was injected into FRMain in April. It looks like the bug is in Serrano, but not in Wasabi.
Attachment #568335 - Flags: review?(rreitmai) → review+
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: