Closed Bug 511873 Opened 15 years ago Closed 6 years ago

Reduce call overhead of jit'd functions

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX
Future

People

(Reporter: edwsmith, Unassigned)

References

Details

(Whiteboard: PACMAN, Tracking)

Attachments

(6 files, 1 obsolete file)

A call to a trivial function (every function) involves: * get MethodEnv* of callee * store args to stack * pass env, argc, argp (even when env/argc/argv may not be used) * call function indirectly (even when target might be known) * stack overflow check (add ebp+const, load stkbottom, compare, branch) * interrupt check (load interrupt flag, compare, branch) * create MethodFrame (load mf*, store, store 0 dxns, store env, store next) ... body * destroy MethodFrame (load cmf->next, store in core->currentMethodFrame) we can chip away at this overhead. Dependent bugs will address different cases.
Attachment #395831 - Attachment is obsolete: true
Attachment #395831 - Attachment is patch: false
Depends on: 511875
total-call-counts for calloverhead.as after removing the stack overflow check entirely (higher is better): before after % change t60 4701438000 5677923000 +21% beagleboard 492521000 572454000 +16%
total call counts with and without interrupt check with-interrupts no-interrupts %change beagleboard 83165000 98560000 +18%
Depends on: 512282
Depends on: 512286
(In reply to comment #0) > * pass env, argc, argp (even when env/argc/argv may not be used) Here's a thought: if relatively few callees use the 'env' parameter, we could now rely on the fact we have a reliable MethodFrame and obtain 'env' via core->currentMethodFrame when necessary.
i like where you're going with this, but we'd have to pass env so the callee can set up currentMethodFrame. come to think of it, we don't even link currentMethodFrame for native methods. bug?
(In reply to comment #6) > i like where you're going with this, but we'd have to pass env so the callee > can set up currentMethodFrame. doh. yeah. I'm seduced by the idea of getting to argc/argv so we can go FASTCALL everywhere :-) > come to think of it, we don't even link > currentMethodFrame for native methods. bug? hmm -- that's a good question. if it's a bug, it hasn't caused any obvious misbehavior yet.
> * call function indirectly (even when target might be known) So which targets can we know reliably enough to make constant? final native methods seem safe enough...
We probably can do (env, args) in the common cases. Only methods with optional args or rest args need to be passed argc. The rest have an argc that is known from MethodInfo.
Depends on: 512685
Depends on: 512972
(In reply to comment #7) > hmm -- that's a good question. if it's a bug, it hasn't caused any obvious > misbehavior yet. Imagine you call a native method from AS3 and the native method calls back into more AS3, and then triggers an exception. the stack trace will be missing the native method. Not because of the missing MethodFrame, but because of the missing CallStackNode.
Depends on: 513007
Summary: reduce call overhead of jit'd functions (tracking) → reduce call overhead of jit'd functions
Whiteboard: Tracking
Target Milestone: --- → Future
OS: Mac OS X → All
Hardware: x86 → All
Whiteboard: Tracking → PACMAN, Tracking
Depends on: 560120
Blocks: 557926
Here is a quick study of the jit code prolog for this as3 function: class Vector { var x:Number, y:Number function magnitude2() { return x*x + y*y } } observe the prologue is much longer and has more loads and stores than the actual method body. Also some quality problems in the jit'd code * setting up core->methodFrame is expensive. restoring it is done with another load from core, instead of the saved pointer from the prolog * &core[currentMethodFrame] is const-folded in one case, but not another, leading to extra instructions and register pressure. (better if it's not const folded at all, so we only need a register for &core) * register assignments around the call to stackOverflow (cold path) are unfortunate and cause spilling on the hot pat
Attachment #442905 - Attachment mime type: application/octet-stream → text/plain
(In reply to comment #11) > * setting up core->methodFrame is expensive. I spent a day thinking about alternate ways to do the MethodFrame stuff -- unfortunately it's hard to model the Flash security model otherwise. Lars informs me that the deoptimization work that Bernd has in the pipeline will make this much easier, but that's a ways down the road. > restoring it is done with another load from core, instead of the saved pointer from the prolog That sounds wrong.
work splinters into two paths: 1. don't use MF when we don't need it (this is a leaf function, so we're just churning core->currentMethodFrame pointer for no good reason) 2. make it cheaper to maintain the call stack. some random ideas: - in large-ish methods, preserving a pointer to the previous MF would just spill, which would be more expensive than the explicit loads & stores we have now. in small-ish mehtods, it wouldn't spill, so keeping the load instruction live is better. only the Assembler knows which way is better, and we can't express this stuff in LIR right now. - on x86 (e.g.) if we can walk the machine call stack and get the MethodEnv/dnxns pointers along each stop, maybe we dont need explicit MF structs for every call frame. - our current calling conventions require (MethodEnv*, argc, argv). a different convention could pass along MF*, which makes it cheaper to establish/teardown MF structures. (much like using EBP to maintain a walkable callstack)
More about leaf functions: The VM already keeps a stack space reserve (R). A call that detects stack overflow (crossing into the reserve) calls AvmCore::stackOverflow(), which uses the reserve to construct and throw an actual stack overflow error object. If the true stack is exhaused while constructing a StackOverflow object, it's a bug in the VM (reserve too small). All the code involved in that construction is vm code. Any user leaf function is therefore guaranteed to have at R bytes of stack available. Any VM leaf function involved in constructing a stack overflow error isn't guaranteed to have R bytes, but *is* guaranteed to have *enough* bytes, assuming R is large enough to avoid true stack overflow. Therefore, leaf function's don't need an explicit stack overflow check. Leaf functions also don't need to bother with creating or destroying AvmCore::currentMethodFrame, since they won't call any internal VM code that inspects currentMethodFrame. A leaf function *can* call into VM helper code (bending the idea of a leaf function), as long as that VM helper code doesn't violate the "leaf function" semantics: - no inspection of MethodFrame, no calls to unbounded user code, plus anything else we want to constrain it with. of course a true leaf function doesn't make *any* calls (vm helpers or user code) and doesn't need any such caveats. todo: what % of functions are leaf functions? statically? dynamically? what about invocations that could call out to something, but don't? (prologue/epilogue sinking?)
In untyped code, or probably even losely typed code, virtually no methods (statically) will be leaf functions due to invocation of valueOf / toString semantics for '+' and so on. (Just a note, not meant to skew the discussion in any particular direction.)
I'll add that if generic VM helpers like doubleToAtom are able to throw AS3 exceptions (maybe OutOfMemory?) then even very simple almost-leaf functions, with no implicit AS3 calls, are also not leaf functions.
Depends on: 414870
Depends on: 566769
Component: Virtual Machine → JIT Compiler (NanoJIT)
Depends on: 529832
Summary: reduce call overhead of jit'd functions → Reduce call overhead of jit'd functions
Data point: a full performace test run on x86-32 2.0ghz core-2-duo with stack overflow checking disabled showed no changes outside "noise" (all changes < 1 std dev).
I looked at this one a bit this week with no great progress. 1. It would be great to skip this overhead for leaf functions as has been suggested. Unfortunately, we don't really know we are calling out to a helper function until we are JITing the main function after the prologue has been written. This is tricky to get right before that time. A simple coerce_to_number opcode might do nothing but until we JIT we don't exactly know. Lars simple Object constructor benchmark could show some improvement if we could skip all this overhead though. 2. Could we make handleStackOverflow a custom ASM function that does its own register preservation? So instead of the prologue having to worry about any register saving/restoring, the stack overflow routine itself does all the work. All it needs is 'env' in the most convenient register for the prologue code. 3. On 32-bit, I see the 'ap' param getting spilled even though this is already safely saved on the stack. On first try I thought I could init this after the rest of the prologue code. This works okay on 32-bit x86 because its on the stack and does not get overwritten. It does not work on 64-bit because it comes in on a register. But maybe the 32-bit JIT could understand that param types on the stack don't really need spilling somehow?
(In reply to comment #18) > I looked at this one a bit this week with no great progress. > > 1. It would be great to skip this overhead for leaf functions as has been > suggested. Unfortunately, we don't really know we are calling out to a helper > function until we are JITing the main function after the prologue has been > written. This is tricky to get right before that time. A simple > coerce_to_number opcode might do nothing but until we JIT we don't exactly > know. Lars simple Object constructor benchmark could show some improvement if > we could skip all this overhead though. two ways to do this: 1. do a conservative analysis in verifier phase 1; it would "fail" if it saw any operation disqualifying the method as a leaf method. 2. generate parts of the prologue later; we do a bit of this now (we have two LIR buffers) we could do more, with some care. > 2. Could we make handleStackOverflow a custom ASM function that does its own > register preservation? So instead of the prologue having to worry about any > register saving/restoring, the stack overflow routine itself does all the work. > All it needs is 'env' in the most convenient register for the prologue code. We could also use the jit to generate register-preserving thunks to existing helpers (saving us lots of wrapper-writing). however, in a previous experiment, I got no measurable benefit from completely eliminating the stack overflow check. so any gains here may not show up until other things are faster. (x86-32 result; other cpu's may vary) > But maybe the 32-bit JIT could understand that param > types on the stack don't really need spilling somehow? The x86-32 backend does understand that parameters can be rematerialized without spilling, perhaps a bug has crept in. All other backends assign it to a register parameter.
Assignee: nobody → wsharp
This patch adds a NullWriter subclass to the verification stage that determines whether a function is simple - no throwing exceptions, no calling subfunctions or unknown AS3 code, allowing various coercions, opcode checking, etc. Also included is some disabled debugging code to generate statistics for each file run. Acceptance and performance suite statistics will be posted next.
Attachment #463207 - Flags: review?(lhansen)
On my 2.66 GHz Xeon 5150 running Windows 7, the above patch speeds up simple function calls by approximately 20-30%. Calling Vector.magnitude2 100 million times: 740 vs 945 msec Calling a simple getter returning an int 100 million times: 490 vs 700 msec This is the perl output for the acceptance test suite: Total avmcore instances: 2418 Total functions verified: 52238 Total simple functions verified: 7838 (15.0044029250737%) Total complex functions run: 25729865 Total simple functions run: 614090 (2.33104710359549%) Normalized simple functions run (percent per file): 3.22175867818345 opcode: 0x03 (throw) count: 4 opcode: 0x04 (getsuper) count: 5 opcode: 0x05 (setsuper) count: 2 opcode: 0x1e (nextname) count: 6 opcode: 0x41 (call) count: 19 opcode: 0x42 (construct) count: 4 opcode: 0x45 (callsuper) count: 32 opcode: 0x46 (callproperty) count: 1163 opcode: 0x49 (constructsuper) count: 453 opcode: 0x4c (callproplex) count: 13 opcode: 0x57 (newactivation) count: 3086 opcode: 0x58 (newclass) count: 90 opcode: 0x59 (getdescendants) count: 1 opcode: 0x5b (findpropglobalstrict) count: 57 opcode: 0x5c (findpropglobal) count: 187 opcode: 0x5d (findpropstrict) count: 15 opcode: 0x5e (findproperty) count: 2 opcode: 0x5f (finddef) count: 21056 opcode: 0x61 (setproperty) count: 2558 opcode: 0x66 (getproperty) count: 2988 opcode: 0x68 (initproperty) count: 21 opcode: 0xa0 (add) count: 162 opcode: 0xab (equals) count: 305 opcode: 0xac (strictequals) count: 1 opcode: 0xb1 (instanceof) count: 2 opcode: 0xb4 (in) count: 1 opcode: 0x100 (complex coercion) count: 9622 opcode: 0x102 (writeMethodCall) count: 2545 This is the perl output for the performance test suite: Total avmcore instances: 326 Total functions verified: 4186 Total simple functions verified: 994 (23.7458193979933%) Total complex functions run: 614368424 Total simple functions run: 398783973 (39.3607096208647%) Normalized simple functions run (percent per file): 11.8256777869279 opcode: 0x03 (throw) count: 2 opcode: 0x1e (nextname) count: 4 opcode: 0x41 (call) count: 6 opcode: 0x42 (construct) count: 56 opcode: 0x45 (callsuper) count: 10 opcode: 0x46 (callproperty) count: 301 opcode: 0x49 (constructsuper) count: 44 opcode: 0x57 (newactivation) count: 207 opcode: 0x58 (newclass) count: 1 opcode: 0x5c (findpropglobal) count: 18 opcode: 0x5f (finddef) count: 1109 opcode: 0x61 (setproperty) count: 78 opcode: 0x66 (getproperty) count: 471 opcode: 0xa0 (add) count: 32 opcode: 0xab (equals) count: 38 opcode: 0xb4 (in) count: 2 opcode: 0x100 (complex coercion) count: 414 opcode: 0x102 (writeMethodCall) count: 399
Comment on attachment 463207 [details] [diff] [review] Patch which analyzes in verifier pass to mark functions as simple and avoid methodFrame/stackOverflow Note that we can't skip the MethodFrame initialization for any methods that can call back into native code: native code might want to examine the current CodeContext, and skipping this step would make it invalid. (Exception: we could skip it for callbacks that are known to be safe, eg, MathUtils, but in the general case it's risky.)
Yes, any call out is forbidden at the moment except for coercions that I have carefully analyzed and know will be simple. For example, string->bool, string->number, number->string, etc. are okay but object->string are not (calls toString) or object->number (calls defaultValue). A blessed list could be expanded to include things in Math, etc. but I have not down this. The main failing opcodes are OP_finddef/OP_callproperty pairs (finding builtin trace followed by calling the routine for example) and we could theoretically track these if they showed some benefit.
(In reply to comment #25) > Yes, any call out is forbidden at the moment except for coercions that I have > carefully analyzed and know will be simple. Cool. If we do add a "blessed list" of some sort, let's be sure to capture the MethodFrame/CodeContext warning in a comment near it.
Depends on: 584834
Moving simple function analyzer patch to bug 584834.
Attachment #463207 - Flags: review?(lhansen)
(In reply to comment #16) > I'll add that if generic VM helpers like doubleToAtom are able to throw AS3 > exceptions (maybe OutOfMemory?) then even very simple almost-leaf functions, > with no implicit AS3 calls, are also not leaf functions. We currently (10.1) do not throw OutOfMemory or ObjectTooLarge exceptions - we had something like that but the semantics were gloriously ill-defined. I hope to keep it this way.
Depends on: 570521
Depends on: 563146
Assignee: wsharp → nobody
Depends on: 588750
Depends on: 599120
Flags: flashplayer-qrb+
No longer depends on: 529832
Attachment #395832 - Attachment mime type: application/octet-stream → text/plain
Depends on: 686698
Depends on: 512686
Depends on: 724631
Depends on: 735530
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: