Closed
Bug 511873
Opened 15 years ago
Closed 6 years ago
Reduce call overhead of jit'd functions
Categories
(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect)
Tamarin Graveyard
Baseline JIT (CodegenLIR)
Tracking
(Not tracked)
RESOLVED
WONTFIX
Future
People
(Reporter: edwsmith, Unassigned)
References
Details
(Whiteboard: PACMAN, Tracking)
Attachments
(6 files, 1 obsolete file)
2.50 KB,
text/plain
|
Details | |
2.50 KB,
patch
|
Details | Diff | Splinter Review | |
3.76 KB,
text/plain
|
Details | |
21.71 KB,
application/zip
|
Details | |
67.98 KB,
patch
|
Details | Diff | Splinter Review | |
6.48 KB,
text/plain
|
Details |
A call to a trivial function (every function) involves:
* get MethodEnv* of callee
* store args to stack
* pass env, argc, argp (even when env/argc/argv may not be used)
* call function indirectly (even when target might be known)
* stack overflow check (add ebp+const, load stkbottom, compare, branch)
* interrupt check (load interrupt flag, compare, branch)
* create MethodFrame (load mf*, store, store 0 dxns, store env, store next)
... body
* destroy MethodFrame (load cmf->next, store in core->currentMethodFrame)
we can chip away at this overhead. Dependent bugs will address different cases.
Reporter | ||
Comment 1•15 years ago
|
||
Attachment #395831 -
Attachment is obsolete: true
Reporter | ||
Updated•15 years ago
|
Attachment #395831 -
Attachment is patch: false
Reporter | ||
Comment 2•15 years ago
|
||
Reporter | ||
Comment 3•15 years ago
|
||
total-call-counts for calloverhead.as after removing the stack overflow check entirely (higher is better):
before after % change
t60 4701438000 5677923000 +21%
beagleboard 492521000 572454000 +16%
Reporter | ||
Comment 4•15 years ago
|
||
total call counts with and without interrupt check
with-interrupts no-interrupts %change
beagleboard 83165000 98560000 +18%
Comment 5•15 years ago
|
||
(In reply to comment #0)
> * pass env, argc, argp (even when env/argc/argv may not be used)
Here's a thought: if relatively few callees use the 'env' parameter, we could now rely on the fact we have a reliable MethodFrame and obtain 'env' via core->currentMethodFrame when necessary.
Reporter | ||
Comment 6•15 years ago
|
||
i like where you're going with this, but we'd have to pass env so the callee can set up currentMethodFrame. come to think of it, we don't even link currentMethodFrame for native methods. bug?
Comment 7•15 years ago
|
||
(In reply to comment #6)
> i like where you're going with this, but we'd have to pass env so the callee
> can set up currentMethodFrame.
doh. yeah. I'm seduced by the idea of getting to argc/argv so we can go FASTCALL everywhere :-)
> come to think of it, we don't even link
> currentMethodFrame for native methods. bug?
hmm -- that's a good question. if it's a bug, it hasn't caused any obvious misbehavior yet.
Comment 8•15 years ago
|
||
> * call function indirectly (even when target might be known)
So which targets can we know reliably enough to make constant? final native methods seem safe enough...
Reporter | ||
Comment 9•15 years ago
|
||
We probably can do (env, args) in the common cases. Only methods with optional args or rest args need to be passed argc. The rest have an argc that is known from MethodInfo.
Reporter | ||
Comment 10•15 years ago
|
||
(In reply to comment #7)
> hmm -- that's a good question. if it's a bug, it hasn't caused any obvious
> misbehavior yet.
Imagine you call a native method from AS3 and the native method calls back into more AS3, and then triggers an exception. the stack trace will be missing the native method. Not because of the missing MethodFrame, but because of the missing CallStackNode.
Updated•15 years ago
|
Summary: reduce call overhead of jit'd functions (tracking) → reduce call overhead of jit'd functions
Whiteboard: Tracking
Target Milestone: --- → Future
Reporter | ||
Updated•15 years ago
|
OS: Mac OS X → All
Hardware: x86 → All
Updated•15 years ago
|
Whiteboard: Tracking → PACMAN, Tracking
Reporter | ||
Comment 11•15 years ago
|
||
Here is a quick study of the jit code prolog for this as3 function:
class Vector {
var x:Number, y:Number
function magnitude2() {
return x*x + y*y
}
}
observe the prologue is much longer and has more loads and stores than the actual method body. Also some quality problems in the jit'd code
* setting up core->methodFrame is expensive. restoring it is done with another load from core, instead of the saved pointer from the prolog
* &core[currentMethodFrame] is const-folded in one case, but not another, leading to extra instructions and register pressure. (better if it's not const folded at all, so we only need a register for &core)
* register assignments around the call to stackOverflow (cold path) are unfortunate and cause spilling on the hot pat
Reporter | ||
Updated•15 years ago
|
Attachment #442905 -
Attachment mime type: application/octet-stream → text/plain
Comment 12•15 years ago
|
||
(In reply to comment #11)
> * setting up core->methodFrame is expensive.
I spent a day thinking about alternate ways to do the MethodFrame stuff -- unfortunately it's hard to model the Flash security model otherwise. Lars informs me that the deoptimization work that Bernd has in the pipeline will make this much easier, but that's a ways down the road.
> restoring it is done with another load from core, instead of the saved pointer from the prolog
That sounds wrong.
Reporter | ||
Comment 13•15 years ago
|
||
work splinters into two paths:
1. don't use MF when we don't need it (this is a leaf function, so we're just churning core->currentMethodFrame pointer for no good reason)
2. make it cheaper to maintain the call stack. some random ideas:
- in large-ish methods, preserving a pointer to the previous MF would just spill, which would be more expensive than the explicit loads & stores we have now. in small-ish mehtods, it wouldn't spill, so keeping the load instruction live is better. only the Assembler knows which way is better, and we can't express this stuff in LIR right now.
- on x86 (e.g.) if we can walk the machine call stack and get the MethodEnv/dnxns pointers along each stop, maybe we dont need explicit MF structs for every call frame.
- our current calling conventions require (MethodEnv*, argc, argv). a different convention could pass along MF*, which makes it cheaper to establish/teardown MF structures. (much like using EBP to maintain a walkable callstack)
Reporter | ||
Comment 14•15 years ago
|
||
More about leaf functions:
The VM already keeps a stack space reserve (R). A call that detects stack overflow (crossing into the reserve) calls AvmCore::stackOverflow(), which uses the reserve to construct and throw an actual stack overflow error object. If the true stack is exhaused while constructing a StackOverflow object, it's a bug in the VM (reserve too small). All the code involved in that construction is vm code.
Any user leaf function is therefore guaranteed to have at R bytes of stack available. Any VM leaf function involved in constructing a stack overflow error isn't guaranteed to have R bytes, but *is* guaranteed to have *enough* bytes, assuming R is large enough to avoid true stack overflow.
Therefore, leaf function's don't need an explicit stack overflow check.
Leaf functions also don't need to bother with creating or destroying AvmCore::currentMethodFrame, since they won't call any internal VM code that inspects currentMethodFrame.
A leaf function *can* call into VM helper code (bending the idea of a leaf function), as long as that VM helper code doesn't violate the "leaf function" semantics: - no inspection of MethodFrame, no calls to unbounded user code, plus anything else we want to constrain it with.
of course a true leaf function doesn't make *any* calls (vm helpers or user code) and doesn't need any such caveats.
todo:
what % of functions are leaf functions?
statically?
dynamically?
what about invocations that could call out to something, but don't? (prologue/epilogue sinking?)
Comment 15•15 years ago
|
||
In untyped code, or probably even losely typed code, virtually no methods (statically) will be leaf functions due to invocation of valueOf / toString semantics for '+' and so on. (Just a note, not meant to skew the discussion in any particular direction.)
Reporter | ||
Comment 16•15 years ago
|
||
I'll add that if generic VM helpers like doubleToAtom are able to throw AS3 exceptions (maybe OutOfMemory?) then even very simple almost-leaf functions, with no implicit AS3 calls, are also not leaf functions.
Reporter | ||
Updated•15 years ago
|
Component: Virtual Machine → JIT Compiler (NanoJIT)
Reporter | ||
Updated•15 years ago
|
Summary: reduce call overhead of jit'd functions → Reduce call overhead of jit'd functions
Reporter | ||
Comment 17•15 years ago
|
||
Data point: a full performace test run on x86-32 2.0ghz core-2-duo with stack overflow checking disabled showed no changes outside "noise" (all changes < 1 std dev).
Comment 18•15 years ago
|
||
I looked at this one a bit this week with no great progress.
1. It would be great to skip this overhead for leaf functions as has been suggested. Unfortunately, we don't really know we are calling out to a helper function until we are JITing the main function after the prologue has been written. This is tricky to get right before that time. A simple coerce_to_number opcode might do nothing but until we JIT we don't exactly know. Lars simple Object constructor benchmark could show some improvement if we could skip all this overhead though.
2. Could we make handleStackOverflow a custom ASM function that does its own register preservation? So instead of the prologue having to worry about any register saving/restoring, the stack overflow routine itself does all the work. All it needs is 'env' in the most convenient register for the prologue code.
3. On 32-bit, I see the 'ap' param getting spilled even though this is already safely saved on the stack. On first try I thought I could init this after the rest of the prologue code. This works okay on 32-bit x86 because its on the stack and does not get overwritten. It does not work on 64-bit because it comes in on a register. But maybe the 32-bit JIT could understand that param types on the stack don't really need spilling somehow?
Reporter | ||
Comment 19•15 years ago
|
||
(In reply to comment #18)
> I looked at this one a bit this week with no great progress.
>
> 1. It would be great to skip this overhead for leaf functions as has been
> suggested. Unfortunately, we don't really know we are calling out to a helper
> function until we are JITing the main function after the prologue has been
> written. This is tricky to get right before that time. A simple
> coerce_to_number opcode might do nothing but until we JIT we don't exactly
> know. Lars simple Object constructor benchmark could show some improvement if
> we could skip all this overhead though.
two ways to do this:
1. do a conservative analysis in verifier phase 1; it would "fail" if it saw any operation disqualifying the method as a leaf method.
2. generate parts of the prologue later; we do a bit of this now (we have two LIR buffers) we could do more, with some care.
> 2. Could we make handleStackOverflow a custom ASM function that does its own
> register preservation? So instead of the prologue having to worry about any
> register saving/restoring, the stack overflow routine itself does all the work.
> All it needs is 'env' in the most convenient register for the prologue code.
We could also use the jit to generate register-preserving thunks to existing helpers (saving us lots of wrapper-writing). however, in a previous experiment, I got no measurable benefit from completely eliminating the stack overflow check. so any gains here may not show up until other things are faster. (x86-32 result; other cpu's may vary)
> But maybe the 32-bit JIT could understand that param
> types on the stack don't really need spilling somehow?
The x86-32 backend does understand that parameters can be rematerialized without spilling, perhaps a bug has crept in. All other backends assign it to a register parameter.
Comment 20•15 years ago
|
||
Updated•15 years ago
|
Assignee: nobody → wsharp
Comment 21•15 years ago
|
||
This patch adds a NullWriter subclass to the verification stage that determines whether a function is simple - no throwing exceptions, no calling subfunctions or unknown AS3 code, allowing various coercions, opcode checking, etc. Also included is some disabled debugging code to generate statistics for each file run. Acceptance and performance suite statistics will be posted next.
Attachment #463207 -
Flags: review?(lhansen)
Comment 22•15 years ago
|
||
Comment 23•15 years ago
|
||
On my 2.66 GHz Xeon 5150 running Windows 7, the above patch speeds up simple function calls by approximately 20-30%.
Calling Vector.magnitude2 100 million times: 740 vs 945 msec
Calling a simple getter returning an int 100 million times: 490 vs 700 msec
This is the perl output for the acceptance test suite:
Total avmcore instances: 2418
Total functions verified: 52238
Total simple functions verified: 7838 (15.0044029250737%)
Total complex functions run: 25729865
Total simple functions run: 614090 (2.33104710359549%)
Normalized simple functions run (percent per file): 3.22175867818345
opcode: 0x03 (throw) count: 4
opcode: 0x04 (getsuper) count: 5
opcode: 0x05 (setsuper) count: 2
opcode: 0x1e (nextname) count: 6
opcode: 0x41 (call) count: 19
opcode: 0x42 (construct) count: 4
opcode: 0x45 (callsuper) count: 32
opcode: 0x46 (callproperty) count: 1163
opcode: 0x49 (constructsuper) count: 453
opcode: 0x4c (callproplex) count: 13
opcode: 0x57 (newactivation) count: 3086
opcode: 0x58 (newclass) count: 90
opcode: 0x59 (getdescendants) count: 1
opcode: 0x5b (findpropglobalstrict) count: 57
opcode: 0x5c (findpropglobal) count: 187
opcode: 0x5d (findpropstrict) count: 15
opcode: 0x5e (findproperty) count: 2
opcode: 0x5f (finddef) count: 21056
opcode: 0x61 (setproperty) count: 2558
opcode: 0x66 (getproperty) count: 2988
opcode: 0x68 (initproperty) count: 21
opcode: 0xa0 (add) count: 162
opcode: 0xab (equals) count: 305
opcode: 0xac (strictequals) count: 1
opcode: 0xb1 (instanceof) count: 2
opcode: 0xb4 (in) count: 1
opcode: 0x100 (complex coercion) count: 9622
opcode: 0x102 (writeMethodCall) count: 2545
This is the perl output for the performance test suite:
Total avmcore instances: 326
Total functions verified: 4186
Total simple functions verified: 994 (23.7458193979933%)
Total complex functions run: 614368424
Total simple functions run: 398783973 (39.3607096208647%)
Normalized simple functions run (percent per file): 11.8256777869279
opcode: 0x03 (throw) count: 2
opcode: 0x1e (nextname) count: 4
opcode: 0x41 (call) count: 6
opcode: 0x42 (construct) count: 56
opcode: 0x45 (callsuper) count: 10
opcode: 0x46 (callproperty) count: 301
opcode: 0x49 (constructsuper) count: 44
opcode: 0x57 (newactivation) count: 207
opcode: 0x58 (newclass) count: 1
opcode: 0x5c (findpropglobal) count: 18
opcode: 0x5f (finddef) count: 1109
opcode: 0x61 (setproperty) count: 78
opcode: 0x66 (getproperty) count: 471
opcode: 0xa0 (add) count: 32
opcode: 0xab (equals) count: 38
opcode: 0xb4 (in) count: 2
opcode: 0x100 (complex coercion) count: 414
opcode: 0x102 (writeMethodCall) count: 399
Comment 24•15 years ago
|
||
Comment on attachment 463207 [details] [diff] [review]
Patch which analyzes in verifier pass to mark functions as simple and avoid methodFrame/stackOverflow
Note that we can't skip the MethodFrame initialization for any methods that can call back into native code: native code might want to examine the current CodeContext, and skipping this step would make it invalid. (Exception: we could skip it for callbacks that are known to be safe, eg, MathUtils, but in the general case it's risky.)
Comment 25•15 years ago
|
||
Yes, any call out is forbidden at the moment except for coercions that I have carefully analyzed and know will be simple. For example, string->bool, string->number, number->string, etc. are okay but object->string are not (calls toString) or object->number (calls defaultValue).
A blessed list could be expanded to include things in Math, etc. but I have not down this. The main failing opcodes are OP_finddef/OP_callproperty pairs (finding builtin trace followed by calling the routine for example) and we could theoretically track these if they showed some benefit.
Comment 26•15 years ago
|
||
(In reply to comment #25)
> Yes, any call out is forbidden at the moment except for coercions that I have
> carefully analyzed and know will be simple.
Cool. If we do add a "blessed list" of some sort, let's be sure to capture the MethodFrame/CodeContext warning in a comment near it.
Comment 27•15 years ago
|
||
Moving simple function analyzer patch to bug 584834.
Updated•15 years ago
|
Attachment #463207 -
Flags: review?(lhansen)
Comment 28•15 years ago
|
||
(In reply to comment #16)
> I'll add that if generic VM helpers like doubleToAtom are able to throw AS3
> exceptions (maybe OutOfMemory?) then even very simple almost-leaf functions,
> with no implicit AS3 calls, are also not leaf functions.
We currently (10.1) do not throw OutOfMemory or ObjectTooLarge exceptions - we had something like that but the semantics were gloriously ill-defined. I hope to keep it this way.
Updated•14 years ago
|
Assignee: wsharp → nobody
Updated•14 years ago
|
Flags: flashplayer-qrb+
Reporter | ||
Updated•14 years ago
|
Attachment #395832 -
Attachment mime type: application/octet-stream → text/plain
Comment 29•6 years ago
|
||
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Comment 30•6 years ago
|
||
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in
before you can comment on or make changes to this bug.
Description
•