Closed Bug 552542 Opened 15 years ago Closed 6 years ago

Specialize and Inline partly-typed numeric operators

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

Milestone:

Q2 12 - Cyril

People

(Reporter: wmaddox, Assigned: wmaddox)

References

Details

(Whiteboard: PACMAN, Tracking)

Attachments

(8 files)

Patch for arithmetic optimizations (applies to rev 3882) 15 years ago William Maddox 179.60 KB, patch
Patch to ASC to generate coercions for target-type preferencing 15 years ago William Maddox 2.94 KB, patch
Benchmark results (MacOS X) 15 years ago William Maddox 427.98 KB, text/plain
Event profiling results for performance benchmark suite 15 years ago William Maddox 144.35 KB, text/plain
Patch for arithmetic optimizations (applies to rev 4491) 15 years ago William Maddox 100.93 KB, patch (edwsmith: feedback+)
Benchmark results for inlining patches 15 years ago William Maddox 72.68 KB, text/plain
Cumulative work-in-progress patch for benchmarking 14 years ago William Maddox 86.54 KB, patch
Performance suite run on MacOS X 10.6, i386 14 years ago William Maddox 25.52 KB, text/plain

William Maddox

Assignee

Description

15 years ago

User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.8) Gecko/20100202 Firefox/3.5.8
Build Identifier:

I have been experimenting with improving the performance of numeric code by more aggressive inlining and specialization of JIT-ed code. Benchmark results show significant speedups, particularly on the numerically-intensive jsbench benchmarks. The results are a bit uneven: some optimizations that might be expected to yield consistent benefits cause significant regressions on some benchmarks. A subset of the optimizations yields solid gains with acceptable consistency.

The optimizations have been grouped under control of a set of configuration variables in core/avmbuild.h of the form ARITH_FASTPATH_xxx and ARITH_FASTPATH_TARGET_xxx. The former work with standard bytecode files as produced by ASC. The latter investigate gains that would be possible if the arithmetic bytecodes folded certain result type coercions into the operator rather than producing a canonical result that is subsequently converted. These experiments use a version of ASC modified to insert additional semantically-redundant coerce opcodes following arithmetic operations; these serve effectively as a postfix modifier on the opcode. See Verifier::targetType().

The configuration variables are as follows:

ARITH_FASTPATH_ADD
    For addition of two atoms, speculatively inline the intptr/intptr case. Where the type of one argument is known statically, partially inline the numeric cases, and invoke type-specialized out-of-line handlers that do not repeat the work done inline.

ARITH_FASTPATH_CMP
    Perform similar optimizations for relational comparisons, equality, and strict equality.

ARITH_FASTPATH_TOATOM
    Inline simple cases of conversion of a numeric type to an atom.

ARITH_FASTPATH_FROMATOM
    Inline simple cases of conversion from an atom to a numeric type.

ARITH_FASTPATH_FROMNUMBER
    Inline simple cases of conversion from Number (double) to another type.

ARITH_FASTPATH_CHECKNULL
    Inline fastpath for null check.

ARITH_FASTPATH_INCDECLOCAL
    (should be removed)

ARITH_FASTPATH_TARGET_ADD
    Specialize addition based on a target type preference, anticipating a subsequent coercion.

ARITH_FASTPATH_TARGET_INCDEC
    Specialize increment and decrement based on a target type preference, anticipating a subsequent coercion.

ARITH_FASTPATH_TARGET_BINOPS
    Specialize subtract and multiply based on a target type preference, anticipating a subsequent coercion.

ARITH_FASTPATH_TARGET_BINOPS_INTPTR
    Speculate in favor of the intptr/intptr case when both arguments are atoms, in addition to the specializations of the previous case.

ARITH_FASTPATH_TARGET_NEGNOT
    Specialize logical not and arithmetic negation based on a target type preference, anticipating a subsequent coercion.

The code is a bit messy, as it grew organically as I discovered additional optimization opportunities, took feedback from benchmarking into account, and provided extensive conditionalization to support experiments. I am therefore not submitting this patch as a candidate to land in its present form, but invite comments on the approach and the results.

The code has been tested on 32-bit MacOS X and, to a lesser extent, on 32-bit Windows. Support for 64-bit platforms is incomplete and untested, and, in some cases, will require the LIR instruction set to be further fleshed out.

The patch applies cleanly to revision 3882. Recent changes to the verifier and the removal of the LIR_ov instruction will require updates to the patch.

Based on the benchmark results, it appears that the ARITH_FASTPATH_ADD, ARITH_FASTPATH_CMP, ARITH_FASTPATH_CHECKNULL, and ARITH_FASTPATH_FROMATOM cases are good candidates for inclusion in the near term, as they yield significant benefits, do not require bytecode or compiler changes, and show fairly consistent improvement or neutral behavior.

I am hesitant to consider optimizations that trade off performance gains on some applications against significant regressions on others without a better benchmarking methodology, as there is little reason to believe that our standard performance suite accurately reflects the instruction mix and other behavioral characteristics of the real-world ActionScript codebase.

Reproducible: Always
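
For readers unfamiliar with the technique, the intptr/intptr speculation described under ARITH_FASTPATH_ADD can be sketched in plain C++ roughly as follows. This is an illustration only, not code from the patch: the atom layout (a 3-bit tag in the low bits), the tag value, and the names isIntptr, makeIntptr, intptrValue, addSlow and addAtoms are all assumptions made for the example, and the real fast path is emitted as LIR by the JIT rather than written as a C++ function.

```cpp
#include <cstdint>

// Hypothetical atom layout, for illustration only: the low 3 bits hold a type
// tag and the remaining bits hold a signed integer payload.
using Atom = std::intptr_t;
constexpr Atom kTagMask   = 7;
constexpr Atom kIntptrTag = 6;  // assumed tag value for small integers

inline bool isIntptr(Atom a) { return (a & kTagMask) == kIntptrTag; }

// Shift in the unsigned domain so that negative payloads do not invoke
// undefined behaviour; the final conversion wraps on common compilers.
inline Atom makeIntptr(std::intptr_t v) {
    return static_cast<Atom>((static_cast<std::uintptr_t>(v) << 3) |
                             static_cast<std::uintptr_t>(kIntptrTag));
}

// Assumes an arithmetic right shift for negative values, which is what
// mainstream compilers provide.
inline std::intptr_t intptrValue(Atom a) { return a >> 3; }

// Stand-in for the generic out-of-line handler (doubles, strings, objects).
// It only counts calls so the example stays self-contained.
static int g_slowPathCalls = 0;
Atom addSlow(Atom, Atom) { ++g_slowPathCalls; return 0; }

// Speculative fast path: handle intptr + intptr inline, defer everything
// else, including results too large for the payload, to the slow path.
Atom addAtoms(Atom lhs, Atom rhs) {
    if (isIntptr(lhs) && isIntptr(rhs)) {
        // Payloads are 3 bits narrower than intptr_t, so this sum cannot
        // overflow the machine word.
        std::intptr_t sum = intptrValue(lhs) + intptrValue(rhs);
        Atom boxed = makeIntptr(sum);
        if (intptrValue(boxed) == sum)  // result still fits in the payload
            return boxed;
    }
    return addSlow(lhs, rhs);
}
```

With these definitions, addAtoms(makeIntptr(2), makeIntptr(40)) stays entirely on the inline path, while an operand with any other tag, or a sum that no longer fits in the payload, falls through to addSlow.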

William Maddox

Assignee

Comment 1

15 years ago

Attached patch: Patch for arithmetic optimizations (applies to rev 3882)

William Maddox

Assignee

Comment 2

15 years ago

Attached patch: Patch to ASC to generate coercions for target-type preferencing

William Maddox

Assignee

Comment 3

15 years ago

Attached file: Benchmark results (MacOS X)

William Maddox

Assignee

Comment 4

15 years ago

Attached file: Event profiling results for performance benchmark suite

The patch implements an extension to the vprof profiling mechanism to count events in code generated by the JIT. The arithmetic optimizations are instrumented with calls of the form JIT_EVENT(jit_xxx), which emit profiling code into the JIT-generated methods. This attachment shows the results of such profiling on the performance suite. Note that vprof is essentially a value profiling mechanism, and is used here in a degenerate way to count events. The relevant number, the count, is the last number on each line.
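
To make the "degenerate" use of a value profiler concrete, here is a minimal C++ sketch. It does not reproduce the vprof implementation or the JIT_EVENT macro; the ValueProfile record, the profileValue and countEvent functions, and the event names used with them are hypothetical. The idea shown is only that a profiler which summarizes observed values can double as an event counter if every event records the same constant value.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical value-profile record: a running summary of every value
// observed at one named profiling site. Not the actual vprof structure.
struct ValueProfile {
    std::int64_t  min = 0;
    std::int64_t  max = 0;
    std::int64_t  sum = 0;
    std::uint64_t count = 0;
};

static std::map<std::string, ValueProfile> g_profiles;

// Ordinary value profiling: fold one observed value into the site's summary.
void profileValue(const std::string& site, std::int64_t value) {
    ValueProfile& p = g_profiles[site];
    if (p.count == 0) {
        p.min = p.max = value;
    } else {
        if (value < p.min) p.min = value;
        if (value > p.max) p.max = value;
    }
    p.sum += value;
    ++p.count;
}

// Degenerate use: every event records the constant 1, so min, max and the
// average carry no information and only the count is meaningful.
void countEvent(const std::string& event) { profileValue(event, 1); }
```

Calling countEvent("jit_add_intptr") three times, for example, leaves that site with a count of 3 and a minimum and maximum of 1, which is why only the count column is worth reading in such output.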