Closed Bug 511873 Opened 15 years ago Closed 6 years ago

Reduce call overhead of jit'd functions

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

Milestone:

Future

People

(Reporter: edwsmith, Unassigned)

References

Details

(Whiteboard: PACMAN, Tracking)

Attachments

(6 files, 1 obsolete file)

test to measure pure 0-arg jit call overhead 15 years ago Edwin Smith 2.30 KB, text/plain		Details
measures 0-arg jit call overhead (update 1) 15 years ago Edwin Smith 2.50 KB, text/plain		Details
measures 3-arg jit call overhead 15 years ago Edwin Smith 2.50 KB, patch		Details \| Diff \| Splinter Review
ABC, LIR, and x64 code for boidshack.as magnitude2() 15 years ago Edwin Smith 3.76 KB, text/plain		Details
Example of overhead of a plug-in architecture and attempt to simulate advantages of tail call optimization 15 years ago Alex Harui 21.71 KB, application/zip		Details
Patch which analyzes in verifier pass to mark functions as simple and avoid methodFrame/stackOverflow 15 years ago Werner Sharp 67.98 KB, patch		Details \| Diff \| Splinter Review
hacked together perl script for analyzing debug output from patch 15 years ago Werner Sharp 6.48 KB, text/plain		Details

Edwin Smith

Reporter

Description

•

15 years ago

Attached file test to measure pure 0-arg jit call overhead (obsolete) — Details

A call to a trivial function (every function) involves: * get MethodEnv* of callee * store args to stack * pass env, argc, argp (even when env/argc/argv may not be used) * call function indirectly (even when target might be known) * stack overflow check (add ebp+const, load stkbottom, compare, branch) * interrupt check (load interrupt flag, compare, branch) * create MethodFrame (load mf*, store, store 0 dxns, store env, store next) ... body * destroy MethodFrame (load cmf->next, store in core->currentMethodFrame) we can chip away at this overhead. Dependent bugs will address different cases.

Edwin Smith

Reporter

Comment 1

•

15 years ago

Attached file measures 0-arg jit call overhead (update 1) — Details

Attachment #395831 - Attachment is obsolete: true

Edwin Smith

Reporter

Updated

•

15 years ago

Attachment #395831 - Attachment is patch: false

Edwin Smith

Reporter

Comment 2

•

15 years ago

Attached patch measures 3-arg jit call overhead — Details — Splinter Review

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 511875

Edwin Smith

Reporter

Comment 3

•

15 years ago

total-call-counts for calloverhead.as after removing the stack overflow check entirely (higher is better): before after % change t60 4701438000 5677923000 +21% beagleboard 492521000 572454000 +16%

Edwin Smith

Reporter

Comment 4

•

15 years ago

total call counts with and without interrupt check with-interrupts no-interrupts %change beagleboard 83165000 98560000 +18%

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 512282

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 512286

Steven Johnson

Comment 5

•

15 years ago

(In reply to comment #0) > * pass env, argc, argp (even when env/argc/argv may not be used) Here's a thought: if relatively few callees use the 'env' parameter, we could now rely on the fact we have a reliable MethodFrame and obtain 'env' via core->currentMethodFrame when necessary.

Edwin Smith

Reporter

Comment 6

•

15 years ago

i like where you're going with this, but we'd have to pass env so the callee can set up currentMethodFrame. come to think of it, we don't even link currentMethodFrame for native methods. bug?

Steven Johnson

Comment 7

•

15 years ago

(In reply to comment #6) > i like where you're going with this, but we'd have to pass env so the callee > can set up currentMethodFrame. doh. yeah. I'm seduced by the idea of getting to argc/argv so we can go FASTCALL everywhere :-) > come to think of it, we don't even link > currentMethodFrame for native methods. bug? hmm -- that's a good question. if it's a bug, it hasn't caused any obvious misbehavior yet.

Steven Johnson

Comment 8

•

15 years ago

> * call function indirectly (even when target might be known) So which targets can we know reliably enough to make constant? final native methods seem safe enough...

Edwin Smith

Reporter

Comment 9

•

15 years ago

We probably can do (env, args) in the common cases. Only methods with optional args or rest args need to be passed argc. The rest have an argc that is known from MethodInfo.

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 512685

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 512972

Edwin Smith

Reporter

Comment 10

•

15 years ago

(In reply to comment #7) > hmm -- that's a good question. if it's a bug, it hasn't caused any obvious > misbehavior yet. Imagine you call a native method from AS3 and the native method calls back into more AS3, and then triggers an exception. the stack trace will be missing the native method. Not because of the missing MethodFrame, but because of the missing CallStackNode.

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 513007

Lars T Hansen

Updated

•

15 years ago

Summary: reduce call overhead of jit'd functions (tracking) → reduce call overhead of jit'd functions

Whiteboard: Tracking

Target Milestone: --- → Future

Edwin Smith

Reporter

Updated

•

15 years ago

OS: Mac OS X → All

Hardware: x86 → All

Lars T Hansen

Updated

•

15 years ago

Whiteboard: Tracking → PACMAN, Tracking

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 560120

Edwin Smith

Reporter

Updated

•

15 years ago

Blocks: 557926

Edwin Smith

Reporter

Comment 11

•

15 years ago

Attached file ABC, LIR, and x64 code for boidshack.as magnitude2() — Details

Here is a quick study of the jit code prolog for this as3 function: class Vector { var x:Number, y:Number function magnitude2() { return x*x + y*y } } observe the prologue is much longer and has more loads and stores than the actual method body. Also some quality problems in the jit'd code * setting up core->methodFrame is expensive. restoring it is done with another load from core, instead of the saved pointer from the prolog * &core[currentMethodFrame] is const-folded in one case, but not another, leading to extra instructions and register pressure. (better if it's not const folded at all, so we only need a register for &core) * register assignments around the call to stackOverflow (cold path) are unfortunate and cause spilling on the hot pat

Edwin Smith

Reporter

Updated

•

15 years ago

Attachment #442905 - Attachment mime type: application/octet-stream → text/plain

Steven Johnson

Comment 12

•

15 years ago

(In reply to comment #11) > * setting up core->methodFrame is expensive. I spent a day thinking about alternate ways to do the MethodFrame stuff -- unfortunately it's hard to model the Flash security model otherwise. Lars informs me that the deoptimization work that Bernd has in the pipeline will make this much easier, but that's a ways down the road. > restoring it is done with another load from core, instead of the saved pointer from the prolog That sounds wrong.

Edwin Smith

Reporter

Comment 13

•

15 years ago

work splinters into two paths: 1. don't use MF when we don't need it (this is a leaf function, so we're just churning core->currentMethodFrame pointer for no good reason) 2. make it cheaper to maintain the call stack. some random ideas: - in large-ish methods, preserving a pointer to the previous MF would just spill, which would be more expensive than the explicit loads & stores we have now. in small-ish mehtods, it wouldn't spill, so keeping the load instruction live is better. only the Assembler knows which way is better, and we can't express this stuff in LIR right now. - on x86 (e.g.) if we can walk the machine call stack and get the MethodEnv/dnxns pointers along each stop, maybe we dont need explicit MF structs for every call frame. - our current calling conventions require (MethodEnv*, argc, argv). a different convention could pass along MF*, which makes it cheaper to establish/teardown MF structures. (much like using EBP to maintain a walkable callstack)

Edwin Smith

Reporter

Comment 14

•

15 years ago

More about leaf functions: The VM already keeps a stack space reserve (R). A call that detects stack overflow (crossing into the reserve) calls AvmCore::stackOverflow(), which uses the reserve to construct and throw an actual stack overflow error object. If the true stack is exhaused while constructing a StackOverflow object, it's a bug in the VM (reserve too small). All the code involved in that construction is vm code. Any user leaf function is therefore guaranteed to have at R bytes of stack available. Any VM leaf function involved in constructing a stack overflow error isn't guaranteed to have R bytes, but *is* guaranteed to have *enough* bytes, assuming R is large enough to avoid true stack overflow. Therefore, leaf function's don't need an explicit stack overflow check. Leaf functions also don't need to bother with creating or destroying AvmCore::currentMethodFrame, since they won't call any internal VM code that inspects currentMethodFrame. A leaf function *can* call into VM helper code (bending the idea of a leaf function), as long as that VM helper code doesn't violate the "leaf function" semantics: - no inspection of MethodFrame, no calls to unbounded user code, plus anything else we want to constrain it with. of course a true leaf function doesn't make *any* calls (vm helpers or user code) and doesn't need any such caveats. todo: what % of functions are leaf functions? statically? dynamically? what about invocations that could call out to something, but don't? (prologue/epilogue sinking?)

Lars T Hansen

Comment 15

•

15 years ago

In untyped code, or probably even losely typed code, virtually no methods (statically) will be leaf functions due to invocation of valueOf / toString semantics for '+' and so on. (Just a note, not meant to skew the discussion in any particular direction.)

Edwin Smith

Reporter

Comment 16

•

15 years ago

I'll add that if generic VM helpers like doubleToAtom are able to throw AS3 exceptions (maybe OutOfMemory?) then even very simple almost-leaf functions, with no implicit AS3 calls, are also not leaf functions.

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 414870

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 566769

Edwin Smith

Reporter

Updated

•

15 years ago

Component: Virtual Machine → JIT Compiler (NanoJIT)

Edwin Smith

Reporter

Updated

•

15 years ago

Depends on: 529832

Edwin Smith

Reporter

Updated

•

15 years ago

Summary: reduce call overhead of jit'd functions → Reduce call overhead of jit'd functions

Edwin Smith

Reporter

Comment 17

•

15 years ago

Data point: a full performace test run on x86-32 2.0ghz core-2-duo with stack overflow checking disabled showed no changes outside "noise" (all changes < 1 std dev).

Werner Sharp

Comment 18

•

15 years ago

I looked at this one a bit this week with no great progress. 1. It would be great to skip this overhead for leaf functions as has been suggested. Unfortunately, we don't really know we are calling out to a helper function until we are JITing the main function after the prologue has been written. This is tricky to get right before that time. A simple coerce_to_number opcode might do nothing but until we JIT we don't exactly know. Lars simple Object constructor benchmark could show some improvement if we could skip all this overhead though. 2. Could we make handleStackOverflow a custom ASM function that does its own register preservation? So instead of the prologue having to worry about any register saving/restoring, the stack overflow routine itself does all the work. All it needs is 'env' in the most convenient register for the prologue code. 3. On 32-bit, I see the 'ap' param getting spilled even though this is already safely saved on the stack. On first try I thought I could init this after the rest of the prologue code. This works okay on 32-bit x86 because its on the stack and does not get overwritten. It does not work on 64-bit because it comes in on a register. But maybe the 32-bit JIT could understand that param types on the stack don't really need spilling somehow?

Edwin Smith

Reporter

Comment 19

•

15 years ago

(In reply to comment #18) > I looked at this one a bit this week with no great progress. > > 1. It would be great to skip this overhead for leaf functions as has been > suggested. Unfortunately, we don't really know we are calling out to a helper > function until we are JITing the main function after the prologue has been > written. This is tricky to get right before that time. A simple > coerce_to_number opcode might do nothing but until we JIT we don't exactly > know. Lars simple Object constructor benchmark could show some improvement if > we could skip all this overhead though. two ways to do this: 1. do a conservative analysis in verifier phase 1; it would "fail" if it saw any operation disqualifying the method as a leaf method. 2. generate parts of the prologue later; we do a bit of this now (we have two LIR buffers) we could do more, with some care. > 2. Could we make handleStackOverflow a custom ASM function that does its own > register preservation? So instead of the prologue having to worry about any > register saving/restoring, the stack overflow routine itself does all the work. > All it needs is 'env' in the most convenient register for the prologue code. We could also use the jit to generate register-preserving thunks to existing helpers (saving us lots of wrapper-writing). however, in a previous experiment, I got no measurable benefit from completely eliminating the stack overflow check. so any gains here may not show up until other things are faster. (x86-32 result; other cpu's may vary) > But maybe the 32-bit JIT could understand that param > types on the stack don't really need spilling somehow? The x86-32 backend does understand that parameters can be rematerialized without spilling, perhaps a bug has crept in. All other backends assign it to a register parameter.

Alex Harui

Comment 20

•

15 years ago

Attached file Example of overhead of a plug-in architecture and attempt to simulate advantages of tail call optimization — Details

Werner Sharp

Updated

•

15 years ago

Assignee: nobody → wsharp

Werner Sharp

Comment 21

•

15 years ago

Attached patch Patch which analyzes in verifier pass to mark functions as simple and avoid methodFrame/stackOverflow — Details — Splinter Review

This patch adds a NullWriter subclass to the verification stage that determines whether a function is simple - no throwing exceptions, no calling subfunctions or unknown AS3 code, allowing various coercions, opcode checking, etc. Also included is some disabled debugging code to generate statistics for each file run. Acceptance and performance suite statistics will be posted next.

Attachment #463207 - Flags: review?(lhansen)

Werner Sharp

Comment 22

•

15 years ago

Attached file hacked together perl script for analyzing debug output from patch — Details

Werner Sharp

Comment 23

•

15 years ago

On my 2.66 GHz Xeon 5150 running Windows 7, the above patch speeds up simple function calls by approximately 20-30%. Calling Vector.magnitude2 100 million times: 740 vs 945 msec Calling a simple getter returning an int 100 million times: 490 vs 700 msec This is the perl output for the acceptance test suite: Total avmcore instances: 2418 Total functions verified: 52238 Total simple functions verified: 7838 (15.0044029250737%) Total complex functions run: 25729865 Total simple functions run: 614090 (2.33104710359549%) Normalized simple functions run (percent per file): 3.22175867818345 opcode: 0x03 (throw) count: 4 opcode: 0x04 (getsuper) count: 5 opcode: 0x05 (setsuper) count: 2 opcode: 0x1e (nextname) count: 6 opcode: 0x41 (call) count: 19 opcode: 0x42 (construct) count: 4 opcode: 0x45 (callsuper) count: 32 opcode: 0x46 (callproperty) count: 1163 opcode: 0x49 (constructsuper) count: 453 opcode: 0x4c (callproplex) count: 13 opcode: 0x57 (newactivation) count: 3086 opcode: 0x58 (newclass) count: 90 opcode: 0x59 (getdescendants) count: 1 opcode: 0x5b (findpropglobalstrict) count: 57 opcode: 0x5c (findpropglobal) count: 187 opcode: 0x5d (findpropstrict) count: 15 opcode: 0x5e (findproperty) count: 2 opcode: 0x5f (finddef) count: 21056 opcode: 0x61 (setproperty) count: 2558 opcode: 0x66 (getproperty) count: 2988 opcode: 0x68 (initproperty) count: 21 opcode: 0xa0 (add) count: 162 opcode: 0xab (equals) count: 305 opcode: 0xac (strictequals) count: 1 opcode: 0xb1 (instanceof) count: 2 opcode: 0xb4 (in) count: 1 opcode: 0x100 (complex coercion) count: 9622 opcode: 0x102 (writeMethodCall) count: 2545 This is the perl output for the performance test suite: Total avmcore instances: 326 Total functions verified: 4186 Total simple functions verified: 994 (23.7458193979933%) Total complex functions run: 614368424 Total simple functions run: 398783973 (39.3607096208647%) Normalized simple functions run (percent per file): 11.8256777869279 opcode: 0x03 (throw) count: 2 opcode: 0x1e (nextname) count: 4 opcode: 0x41 (call) count: 6 opcode: 0x42 (construct) count: 56 opcode: 0x45 (callsuper) count: 10 opcode: 0x46 (callproperty) count: 301 opcode: 0x49 (constructsuper) count: 44 opcode: 0x57 (newactivation) count: 207 opcode: 0x58 (newclass) count: 1 opcode: 0x5c (findpropglobal) count: 18 opcode: 0x5f (finddef) count: 1109 opcode: 0x61 (setproperty) count: 78 opcode: 0x66 (getproperty) count: 471 opcode: 0xa0 (add) count: 32 opcode: 0xab (equals) count: 38 opcode: 0xb4 (in) count: 2 opcode: 0x100 (complex coercion) count: 414 opcode: 0x102 (writeMethodCall) count: 399

Steven Johnson

Comment 24

•

15 years ago

Comment on attachment 463207 [details] [diff] [review] Patch which analyzes in verifier pass to mark functions as simple and avoid methodFrame/stackOverflow Note that we can't skip the MethodFrame initialization for any methods that can call back into native code: native code might want to examine the current CodeContext, and skipping this step would make it invalid. (Exception: we could skip it for callbacks that are known to be safe, eg, MathUtils, but in the general case it's risky.)

Werner Sharp

Comment 25

•

15 years ago

Yes, any call out is forbidden at the moment except for coercions that I have carefully analyzed and know will be simple. For example, string->bool, string->number, number->string, etc. are okay but object->string are not (calls toString) or object->number (calls defaultValue). A blessed list could be expanded to include things in Math, etc. but I have not down this. The main failing opcodes are OP_finddef/OP_callproperty pairs (finding builtin trace followed by calling the routine for example) and we could theoretically track these if they showed some benefit.

Steven Johnson

Comment 26

•

15 years ago

(In reply to comment #25) > Yes, any call out is forbidden at the moment except for coercions that I have > carefully analyzed and know will be simple. Cool. If we do add a "blessed list" of some sort, let's be sure to capture the MethodFrame/CodeContext warning in a comment near it.

Werner Sharp

Updated

•

15 years ago

Depends on: 584834

Werner Sharp

Comment 27

•

15 years ago

Moving simple function analyzer patch to bug 584834.

Werner Sharp

Updated

•

15 years ago

Attachment #463207 - Flags: review?(lhansen)

Lars T Hansen

Comment 28

•

15 years ago

(In reply to comment #16) > I'll add that if generic VM helpers like doubleToAtom are able to throw AS3 > exceptions (maybe OutOfMemory?) then even very simple almost-leaf functions, > with no implicit AS3 calls, are also not leaf functions. We currently (10.1) do not throw OutOfMemory or ObjectTooLarge exceptions - we had something like that but the semantics were gloriously ill-defined. I hope to keep it this way.

Edwin Smith

Reporter

Updated

•

14 years ago

Depends on: 570521

Edwin Smith

Reporter

Updated

•

14 years ago

Depends on: 563146

Werner Sharp

Updated

•

14 years ago

Assignee: wsharp → nobody

Edwin Smith

Reporter

Updated

•

14 years ago

Depends on: 588750

Edwin Smith

Reporter

Updated

•

14 years ago

Depends on: 599120

Andre Kruetzfeldt

Updated

•

14 years ago

Flags: flashplayer-qrb+

Edwin Smith

Reporter

Updated

•

14 years ago

No longer depends on: 529832

Edwin Smith

Reporter

Updated

•

14 years ago

Attachment #395832 - Attachment mime type: application/octet-stream → text/plain

Edwin Smith

Reporter

Updated

•

13 years ago

Depends on: 686698

Edwin Smith

Reporter

Updated

•

13 years ago

Depends on: 512686

Edwin Smith

Reporter

Updated

•

13 years ago

Depends on: 724631

Edwin Smith

Reporter

Updated

•

13 years ago

Depends on: 735530

Sylvestre Ledru [:Sylvestre]

Comment 29

•

6 years ago

Tamarin is a dead project now. Mass WONTFIX.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → WONTFIX

Sylvestre Ledru [:Sylvestre]

Comment 30

•

6 years ago

Tamarin isn't maintained anymore. WONTFIX remaining bugs.

You need to log in before you can comment on or make changes to this bug.