Closed Bug 552542 Opened 15 years ago Closed 6 years ago

Specialize and Inline partly-typed numeric operators

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), enhancement, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX
Q2 12 - Cyril

People

(Reporter: wmaddox, Assigned: wmaddox)

Details

(Whiteboard: PACMAN, Tracking)

Attachments

(8 files)

User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.8) Gecko/20100202 Firefox/3.5.8
Build Identifier:

I have been experimenting with improving the performance of numeric code by more aggressive inlining and specialization of JIT-ed code. Benchmark results show significant speedups, particularly on the numerically-intensive jsbench benchmarks. The results are a bit uneven: some optimizations that might be expected to yield consistent benefits cause significant regressions on some benchmarks. A subset of the optimizations yields solid gains with acceptable consistency.

The optimizations have been grouped under control of a set of configuration variables in core/avmbuild.h of the form ARITH_FASTPATH_xxx and ARITH_FASTPATH_TARGET_xxx. The former work with standard bytecode files as produced by ASC. The latter investigate gains that would be possible if the arithmetic bytecodes folded certain result type coercions into the operator rather than producing a canonical result that is subsequently converted. These experiments use a version of ASC modified to insert additional semantically-redundant coerce opcodes following arithmetic operations; these serve effectively as postfix modifiers on the opcode. See Verifier::targetType().

The configuration variables are as follows:

ARITH_FASTPATH_ADD
    For addition of two atoms, speculatively inline the intptr/intptr case. Where the type of one argument is known statically, partially inline the numeric cases, and invoke type-specialized out-of-line handlers that do not repeat the work done inline.

ARITH_FASTPATH_CMP
    Perform similar optimizations for relational comparisons, equality, and strict equality.

ARITH_FASTPATH_TOATOM
    Inline simple cases of conversion of a numeric type to an atom.

ARITH_FASTPATH_FROMATOM
    Inline simple cases of conversion from an atom to a numeric type.

ARITH_FASTPATH_FROMNUMBER
    Inline simple cases of conversion from Number (double) to another type.
ARITH_FASTPATH_CHECKNULL
    Inline fastpath for null check.

ARITH_FASTPATH_INCDECLOCAL
    (should be removed)

ARITH_FASTPATH_TARGET_ADD
    Specialize addition based on a target type preference, anticipating a subsequent coercion.

ARITH_FASTPATH_TARGET_INCDEC
    Specialize increment and decrement based on a target type preference, anticipating a subsequent coercion.

ARITH_FASTPATH_TARGET_BINOPS
    Specialize subtract and multiply based on a target type preference, anticipating a subsequent coercion.

ARITH_FASTPATH_TARGET_BINOPS_INTPTR
    Speculate in favor of the intptr/intptr case when both arguments are atoms, in addition to the specializations of the previous case.

ARITH_FASTPATH_TARGET_NEGNOT
    Specialize logical not and arithmetic negation based on a target type preference, anticipating a subsequent coercion.

The code is a bit messy, as it grew organically as I discovered additional optimization opportunities, took feedback from benchmarking into account, and provided extensive conditionalization to support experiments. I am therefore not submitting this patch as a candidate to land in its present form, but invite comments on the approach and the results.

The code has been tested on 32-bit MacOS X and, to a lesser extent, on 32-bit Windows. Support for 64-bit platforms is incomplete and untested, and, in some cases, will require the LIR instruction set to be further fleshed out. The patch applies cleanly to revision 3882. Recent changes to the verifier and the removal of the LIR_ov instruction will require updates to the patch.

Based on the benchmark results, it appears that the ARITH_FASTPATH_ADD, ARITH_FASTPATH_CMP, ARITH_FASTPATH_CHECKNULL, and ARITH_FASTPATH_FROMATOM cases are good candidates for inclusion in the near term, as they yield significant benefits, do not require bytecode or compiler changes, and show fairly consistent improvement or neutral behavior.
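For readers unfamiliar with the intptr/intptr speculation named under ARITH_FASTPATH_ADD, the following is a minimal sketch of the idea in plain C++ rather than emitted LIR. It is not taken from the patch. The tag layout (low three bits, tag value 6), the names Atom, addAtoms, addSlow, and the placeholder returned by the slow path are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical tagged-value layout, for illustration only: the low three
// bits carry a type tag and a small integer lives in the remaining bits.
using Atom = std::intptr_t;
constexpr Atom kTagMask = 7;
constexpr Atom kIntTag  = 6;   // assumed tag value for small integers

inline bool isIntAtom(Atom a) { return (a & kTagMask) == kIntTag; }

inline Atom makeIntAtom(std::intptr_t v) {
    // Shift as unsigned to avoid undefined behavior on negative values.
    return static_cast<Atom>(static_cast<std::uintptr_t>(v) << 3) | kIntTag;
}

// Relies on arithmetic right shift of negative values (guaranteed since
// C++20, and the behavior of mainstream compilers before that).
inline std::intptr_t intValue(Atom a) {
    assert(isIntAtom(a));
    return a >> 3;
}

// Stand-in for the out-of-line generic handler (strings, doubles, objects).
// Here it only records that it was reached and returns a placeholder.
int slowPathCalls = 0;
constexpr Atom kSlowPathPlaceholder = 0;
Atom addSlow(Atom, Atom) { ++slowPathCalls; return kSlowPathPlaceholder; }

// True when x + y would overflow intptr_t.
inline bool addOverflows(std::intptr_t x, std::intptr_t y) {
    using L = std::numeric_limits<std::intptr_t>;
    return (y > 0 && x > L::max() - y) || (y < 0 && x < L::min() - y);
}

// Speculative fast path: if both operands are tagged integers and the sum
// does not overflow, the result is produced without leaving this function.
Atom addAtoms(Atom a, Atom b) {
    if (isIntAtom(a) && isIntAtom(b)) {
        std::intptr_t x = a - kIntTag;   // strip one tag so the sum carries exactly one
        if (!addOverflows(x, b))
            return x + b;                // still a validly tagged integer atom
    }
    return addSlow(a, b);
}
```

With this layout, addAtoms(makeIntAtom(2), makeIntAtom(3)) stays on the fast path and decodes to 5, while a non-integer operand or an overflowing sum falls through to the out-of-line handler.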
I am hesitant to consider optimizations that trade off performance gains on some applications against significant regressions on others without a better benchmarking methodology, as there is little reason to believe that our standard performance suite accurately reflects the instruction mix and other behavioral characteristics of the real-world ActionScript codebase.

Reproducible: Always
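The ARITH_FASTPATH_TARGET_xxx idea of folding a result coercion into the operator can be illustrated for one simple case: adding two 32-bit integers whose result is then coerced to int. This is a sketch under stated assumptions, not code from the patch. toInt32 here is a simplified model of the ECMAScript-style conversion, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Simplified ECMAScript-style ToInt32: truncate, reduce modulo 2^32, and
// reinterpret as signed. NaN and infinities map to 0.
std::int32_t toInt32(double d) {
    if (!std::isfinite(d)) return 0;
    double m = std::fmod(std::trunc(d), 4294967296.0);   // in (-2^32, 2^32)
    if (m < 0) m += 4294967296.0;                         // in [0, 2^32)
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(m));
}

// Canonical form: add as Number (double), then coerce the result to int.
std::int32_t addThenCoerce(std::int32_t a, std::int32_t b) {
    return toInt32(static_cast<double>(a) + static_cast<double>(b));
}

// Target-specialized form: a wrapping 32-bit add with no double involved.
// Unsigned arithmetic is used so that wraparound is well defined.
std::int32_t addTargetInt(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                     static_cast<std::uint32_t>(b));
}
```

The two forms agree for addition because the sum of two 32-bit integers is always exactly representable as a double. The same shortcut is not automatically valid for multiplication, where the exact product can exceed 2^53 and be rounded in the double computation.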
The patch implements an extension to the vprof profiling mechanism to count events in code generated by the JIT. The arithmetic optimizations are instrumented with calls of the form JIT_EVENT(jit_xxx), which emits profiling code into the JIT-generated methods. This attachment shows the results of such profiling on the performance suite. Note that vprof is essentially a value profiling mechanism, and is used in a degenerate way to count events. The relevant number, the count, is the last number on each line.
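As a rough illustration of counting named events, here is a minimal sketch in ordinary C++. It is not the vprof API and not the JIT_EVENT implementation, which emits profiling code into JIT-generated methods. The COUNT_EVENT macro, the eventCounts registry, and the event names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical registry of named counters (not the vprof API).
inline std::map<std::string, std::uint64_t>& eventCounts() {
    static std::map<std::string, std::uint64_t> counts;
    return counts;
}

// Hypothetical macro: bump the counter named after the event.
#define COUNT_EVENT(name) (++eventCounts()[#name])

// Example of an instrumented routine: each call records which path it took,
// so a later report shows how often the fast path was actually hit.
int addWithCounting(int a, int b, bool bothInts) {
    if (bothInts)
        COUNT_EVENT(jit_add_fast);   // hypothetical event name
    else
        COUNT_EVENT(jit_add_slow);   // hypothetical event name
    return a + b;
}
```

Comparing the two counters after a benchmark run gives the hit rate of the speculated case, which is the kind of evidence the attached profiling results are meant to provide.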
Assignee: nobody → wmaddox
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: flashplayer-qrb+
Priority: -- → P2
Target Milestone: --- → flash10.2
Whiteboard: PACMAN
Work in progress toward a landable patch for arithmetic inlining. This patch brings forward the following optimizations from the earlier patch: ARITH_FASTPATH_ADD, ARITH_FASTPATH_CMP, and ARITH_FASTPATH_CHECKNULL. These performed well in benchmarking, and would be desirable to land in the near term.

Recent changes to CodegenLIR.cpp and NanoJIT required extensive changes to the patch, though mostly of a clerical/mechanical nature. More substantively, the removal of the LIR_ov instruction required the implementation of combined operate-and-branch-on-overflow instructions. These have been implemented for platforms where the corresponding operate-and-exit-on-overflow instructions currently exist. Additionally, the JIT code generation for inlined arithmetic now supports 64-bit platforms.

Built and tested on 32-bit and 64-bit x86 (MacOS X). Built but not tested on MinMo/ARM. Status on other platforms currently unknown.
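The semantics of a combined operate-and-branch-on-overflow instruction can be modeled in plain C++ as below. This is only a sketch of the behavior, assuming 32-bit operands. The real instructions are LIR emitted through NanoJIT, and the function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when a 64-bit intermediate result fits in 32 bits.
inline bool fits32(std::int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; }

// Each helper returns true when the 32-bit result overflowed (the "branch
// taken" case) and leaves `out` untouched; otherwise it stores the result
// and returns false. The operation and the overflow test form one unit, so
// no separate overflow-flag query is needed afterwards.
bool addOrBranch(std::int32_t a, std::int32_t b, std::int32_t& out) {
    std::int64_t wide = static_cast<std::int64_t>(a) + b;
    if (!fits32(wide)) return true;
    out = static_cast<std::int32_t>(wide);
    return false;
}

bool subOrBranch(std::int32_t a, std::int32_t b, std::int32_t& out) {
    std::int64_t wide = static_cast<std::int64_t>(a) - b;
    if (!fits32(wide)) return true;
    out = static_cast<std::int32_t>(wide);
    return false;
}

bool mulOrBranch(std::int32_t a, std::int32_t b, std::int32_t& out) {
    // The product of two 32-bit values always fits in 64 bits.
    std::int64_t wide = static_cast<std::int64_t>(a) * b;
    if (!fits32(wide)) return true;
    out = static_cast<std::int32_t>(wide);
    return false;
}
```

A caller would use the boolean result to jump to the out-of-line handler, which matches the role the fused instructions play in the inlined fast paths.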
Attachment #432670 - Attachment is patch: true
Attachment #432672 - Attachment is patch: true
cool, LIR_add/sub/muljov is exactly what i was imagining.
Attachment #439420 - Flags: feedback?(edwsmith)
I also added qiaddjov to satisfy an immediate need in JIT-ing OP_add. An intptr/intptr fastpath for all operations will need 64-bit variants of all three ops. We are running tight on opcode space now, however. What other future claims on opcode space have been floated or are in play?
Once upon a time, the opcode field was 7 bits, but now it's 8 bits, and it's not hard to shuffle opcodes around without breaking things. Note: see bug 504506; we're in the middle of a refactoring arc that renamed all the opcodes, and introduced a consistent suffix notation (based on Intel's de facto standard suffixes). LIR_qiadd is just an alias for LIR_addq now. The new opcodes you added should be called addjovl, etc.
Comment on attachment 439420
Patch for arithmetic optimzations (applies to rev 4491)

from CodegenLIR-inlines.h:
* there is no LIR_qisub (really, subq) opcode because we haven't needed one before. You could add one, and then create a LIR_subp alias to subl/subq.
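The pointer-width alias suggested above could look roughly like the following. This is a sketch under assumptions: the enum and constant names are hypothetical and are not NanoJIT's actual opcode definitions, and only the l/q suffix convention and the 8-bit opcode field are taken from the discussion.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcode names following the l (32-bit) / q (64-bit) suffix
// convention. The underlying type models an 8-bit opcode field.
enum class Op : std::uint8_t { addl, addq, subl, subq };

static_assert(sizeof(Op) == 1, "opcode must fit the 8-bit field");

// Pointer-sized aliases resolve to the 32-bit or 64-bit opcode at compile
// time, so code working on pointer-width values names a single opcode.
constexpr bool kIs64Bit = sizeof(void*) == 8;
constexpr Op kAddP = kIs64Bit ? Op::addq : Op::addl;
constexpr Op kSubP = kIs64Bit ? Op::subq : Op::subl;
```

Code that manipulates intptr-tagged atoms can then be written once against kAddP and kSubP and pick up the right width on each platform.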
Depends on: 562458
Depends on: 561963
Summary: Experiments with inlining and specialization of arithmetic in the JIT → Specialize and Inline partly-typed numeric operators in the JIT
Assignee: wmaddox → nobody
Component: Virtual Machine → JIT Compiler (NanoJIT)
OS: Mac OS X → All
QA Contact: vm → nanojit
Hardware: x86 → All
Summary: Specialize and Inline partly-typed numeric operators in the JIT → Specialize and Inline partly-typed numeric operators
Comment on attachment 439420
Patch for arithmetic optimzations (applies to rev 4491)

We're on the right track. We should plan on fixing the side effect that new branchover labels cause VarTracker and CSEFilter to completely reset, before this is fully enabled for real.

We can mitigate the risk from these new labels several ways (not mutually exclusive):

1. keep the #ifdefs in place so we can land patches w/out enabling inlining
2. test a much wider set of benchmarks and apps with the patches applied, and with just labels inserted, to measure the impact. (reduce fear of the unknown)
3. enable calls to specialized helpers, without introducing labels, even before we enable inlining of the fast paths in those helpers. (i.e. separate specialization from inlining).
Attachment #439420 - Flags: feedback?(edwsmith) → feedback+
Assignee: nobody → wmaddox
Depends on: 562744
Depends on: 562653
Benchmark results for patches proposed in the dependent bugs. These also include the VarTracker extension patch in bug 564580.
No longer depends on: 562458
Depends on: 477779
Whiteboard: PACMAN → PACMAN, Tracking
Cumulative work-in-progress patch integrating inlining of addition, comparisons, and atom2double, with the VarTracker patch to extend tracking across forward branches, and disabling of CSE within synthetic control-flow diamonds.
Attachment #447404 - Attachment is patch: true
Attachment #447404 - Attachment is patch: false
Flags: flashplayer-bug-
Depends on: Andre
No longer depends on: Andre
Flags: flashplayer-injection-
Priority: P2 → P3
Target Milestone: Q3 11 - Serrano → Q1 12 - Brannan
Depends on: Andre
Target Milestone: Q1 12 - Brannan → Q2 12 - Cyril
Removing Andre blocker, as Dan assigned it to Cyril.
No longer depends on: Andre
Tamarin is a dead project now. Mass WONTFIX.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Tamarin isn't maintained anymore. WONTFIX remaining bugs.