IonGenericCall stub is misattributed to caller in JIT profiles
Categories: Core :: JavaScript Engine: JIT, enhancement, P1
People: Reporter: mstange, Assigned: jrmuizel
References: Blocks 1 open bug
Attachments: 1 file (patch, 10.80 KB)
Profile: https://share.firefox.dev/46VMGDl

In this profile, callHooks calls callHook, which calls various functions via the new IonGenericCall stub from bug 1829411. However, in profiles, the time spent in the IonGenericCall stub looks like it was part of the execution of callHooks, when it should have been part of callHook instead.

This causes misleading comparison reports.
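As a rough illustration of why a stub that never pushes its own frame record gets folded into an existing JS frame (a minimal sketch, not SpiderMonkey code; all names are made up): a frame-pointer-based sampler only sees frames that pushed a frame record, so the stub's self time lands on whichever frame the live frame-pointer chain points at when the sample is taken.

```cpp
// Hypothetical model of frame-pointer-based stack walking; not SpiderMonkey code.
#include <cstdio>
#include <string>
#include <vector>

struct Frame {
    const char* label;  // function that pushed this frame record
    Frame* parent;      // saved frame pointer of the caller
};

// The sampler walks the chain of saved frame pointers. Code running without
// its own frame record never shows up as a separate row.
std::vector<std::string> walk(Frame* fp) {
    std::vector<std::string> stack;
    for (; fp; fp = fp->parent)
        stack.push_back(fp->label);
    return stack;
}

int main() {
    Frame outer{"callHooks", nullptr};
    Frame inner{"callHook", &outer};
    // A sample taken while the generic call stub runs only unwinds through
    // the frames below, so the stub's self time is booked to one of them.
    for (const auto& name : walk(&inner))
        std::printf("%s\n", name.c_str());
}
```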
Comment 1 • 2 years ago
It's not clear to me that this is possible to solve.
The obvious answer, pushing a frame pointer in the call stub, is impossible. We enter the stub with a bunch of arguments already on the stack. The stub is responsible for doing a small amount of additional work to finish laying out the stack the way the callee expects it. There's no room on the stack for us to shove a new frame pointer in the middle of our careful work.
If there's a fix for this, it may have to be on the profiler end. Maybe you can assign IonGenericCall self-time to its own bucket instead of attributing it to the caller?
Comment 2 (Reporter) • 2 years ago
Would the following work?
1. For the common case of a scripted call, copy the code for doing such a call into the Ion-generated code, so that the stub doesn't have to be hit at all.
2. For all the other cases, call into the stub.
3. In the stub, push a frame pointer and copy the arguments to a different spot on the stack (see the sketch below).
4. Do the rest of the adjustment / calling.
Jeff and I have been discussing step 1 and hypothesized that the branch predictor might like it more.
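A minimal sketch of what steps 3 and 4 would mean, assuming a toy stack model (illustrative C++, not MacroAssembler code; the function and type names are invented): the stub establishes its own frame record and relocates the already-pushed arguments before finishing the call setup, so a sample taken inside the stub can unwind through the stub's own frame.

```cpp
// Toy model of steps 3 and 4; illustrative only, not SpiderMonkey code.
#include <cstdint>
#include <cstring>
#include <vector>

struct FrameRecord {
    uint64_t savedFP;        // caller's frame pointer
    uint64_t returnAddress;  // where the stub returns to
};

// Hypothetical stub body: push a frame record (step 3), copy the arguments
// the caller already laid out to a different spot (step 3), then finish the
// remaining adjustment and call (step 4, omitted here).
void genericCallStubSketch(const std::vector<uint64_t>& argsPushedByCaller,
                           uint64_t callerFP, uint64_t returnAddress) {
    FrameRecord frame{callerFP, returnAddress};

    std::vector<uint64_t> relocatedArgs(argsPushedByCaller.size());
    std::memcpy(relocatedArgs.data(), argsPushedByCaller.data(),
                argsPushedByCaller.size() * sizeof(uint64_t));

    // On the way out, the stub would also have to pop |frame| again, which is
    // the extra cleanup cost comment 3 points out.
    (void)frame;
    (void)relocatedArgs;
}

int main() {
    genericCallStubSketch({1, 2, 3}, 0xF00D, 0xBEEF);
}
```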
Comment 3 • 2 years ago
I spent 30 seconds starting to write a prototype for #1 and quickly ran into problems. The scripted call path can branch off to three separate places in the rest of the trampoline, so we need a bunch of extra trampoline pointers, multiplied by two for the call and construct stubs. Those jumps aren't short relative jumps, so we have to generate less efficient code for the branches. And since the branches aren't the same, we have to duplicate a bunch of logic instead of implementing it in one place, and so on. Long story short, I really don't think it's a good idea. I am dubious that the branch predictor would care in any realistic workload; I think the branch history table should already do a pretty good job of disambiguating the various paths through this code.
Doing steps 3 and 4 in every case is maybe possible, although a) it will be less efficient than the real code path, and b) it is pretty invasive code-wise, because all the code that currently tail-calls the callee would have to be rewritten to sometimes do a real call so that we can clean up our new frame on the way out.
So I revise my assessment from "maybe impossible" to "likely possible, but quite unpleasant".
Note that, unlike baseline ICs, this should only affect self time for the trampoline. The trampoline won't leave any traces of itself on the stack after doing the call (except in the case of native calls, where we set up a real frame pointer already).
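To illustrate point (b), here is a toy contrast between the current tail-call shape and a real call with frame cleanup (a sketch only; the real code is generated JIT code, not C++ functions):

```cpp
// Illustrative contrast for point (b) above; not real trampoline code.
#include <cstdio>

static int callee(int x) { return x + 1; }

// Today (roughly): the stub hands control straight to the callee and leaves
// nothing of itself on the stack, so there is no stub frame to clean up.
static int stubTailCallStyle(int x) {
    return callee(x);  // conceptually a jump: the stub is gone once the callee runs
}

// With a stub-owned frame: every exit path has to become a real call so the
// stub regains control afterwards and can pop its frame before returning.
static int stubRealCallStyle(int x) {
    // ... push stub frame (omitted in this toy model) ...
    int result = callee(x);
    // ... pop stub frame on every path out ...
    return result;
}

int main() {
    std::printf("%d %d\n", stubTailCallStyle(1), stubRealCallStyle(1));
}
```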
Comment 4 (Reporter) • 2 years ago
Thank you for giving it a try!
(In reply to Iain Ireland [:iain] from comment #3)
> The scripted call path can branch off to three separate places in the rest of the trampoline, so we need a bunch of extra trampoline pointers, multiplied by two for the call and construct stubs,

How was this working before? Could this be solved by duplicating more of the trampoline code?
Comment 5 • 2 years ago
Prior to the trampoline, we had something like:

    if the callee is not a function, jump to out-of-line VM call
    if the callee function can only be called/constructed, and we're constructing/calling, jump to out-of-line VM call
    if the callee function does not have a JIT entry, jump to OOL VM call
    call jit entry
    <ool VM call rejoins here>

With the trampoline, the Ion code just calls the trampoline, and in the trampoline we have (somewhat simplified):

    if the callee is not a function, jump to <bound function>
    if the callee function can only be called/constructed, and we're constructing/calling, jump to <vmcall>
    if the callee function does not have a JIT entry, jump to <native function>
    call jit entry

    <bound function>
      do the bound function thing
    <native function>
      do the native function thing
    <vmcall>
      call into the VM
The important thing to notice is that there are several checks that we have to do before taking the jit entry path, and now that we support more cases, all the checks jump to different places. If we want to duplicate the "call jit entry" part in the Ion code, then those jumps all need to end up in different parts of the trampoline. Because the trampoline is not being compiled at the same time as the Ion code, we don't know the relative offset between the branch and the destination, so we have to generate less efficient code for the jumps.
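For orientation, the dispatch shape described above can be summarized as a toy C++ sketch (illustrative only; the real trampoline is generated MacroAssembler code, and these names are invented):

```cpp
// Toy model of the trampoline's dispatch shape; illustrative only.
#include <cstdio>

enum class CalleeKind { NotAFunction, WrongCallOrConstruct, Native, Scripted };

// Each early check branches to a *different* out-of-line target, which is why
// inlining only the "call jit entry" fast path into Ion code would still need
// several long-range jumps back into different parts of the trampoline.
static const char* genericCall(CalleeKind kind) {
    if (kind == CalleeKind::NotAFunction)         return "<bound function> path";
    if (kind == CalleeKind::WrongCallOrConstruct) return "<vmcall> path";
    if (kind == CalleeKind::Native)               return "<native function> path";
    return "call jit entry";  // the common scripted-call path
}

int main() {
    std::printf("%s\n", genericCall(CalleeKind::Scripted));
    std::printf("%s\n", genericCall(CalleeKind::Native));
}
```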
We could duplicate more of the trampoline code, but there's no clear place to draw a line other than "generate the entire trampoline out of line" (the status quo) or "generate all of this code inline" (which bloats the code a lot). It also increases the cost of adding new paths to the trampoline in the future.
Note that on average V8 is more likely to implement something by having it call a shared stub, while (with a few exceptions like this one) we tend to generate more code inline. That choice hasn't hurt V8 particularly badly, and I don't think it's likely to hurt us here.
Comment 6 (Assignee) • 2 years ago
We discovered that V8 has the same unwinding problem with its generic call stub (Builtins_Call_ReceiverIsNotNullOrUndefined), which reduces the value of fixing this in SM until we fix V8.
Comment 7 • 2 years ago
Would it work to create a separate bucket for this and the V8 generic call stub, and assign self time for both call stubs to that bucket instead of to the caller?
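A minimal sketch of how that profiler-side bucketing could look, assuming the profiler knows the address ranges of the generic call stubs (hypothetical code; this is not the Gecko Profiler's actual categorization API, and the addresses are made up):

```cpp
// Hypothetical sketch of the profiler-side bucketing idea; not real profiler code.
#include <cstdint>
#include <cstdio>

struct CodeRange { uintptr_t start; uintptr_t end; };

// Ranges for the known generic call stubs (SpiderMonkey's IonGenericCall and
// V8's Builtins_Call_* would each register one); addresses here are invented.
static const CodeRange kGenericCallStubs[] = {
    {0x1000, 0x1400},
    {0x2000, 0x2300},
};

static bool isGenericCallStubPC(uintptr_t pc) {
    for (const CodeRange& r : kGenericCallStubs)
        if (pc >= r.start && pc < r.end)
            return true;
    return false;
}

int main() {
    uintptr_t sampledPC = 0x1234;  // pretend sample
    // Self time for samples inside a stub range goes to a dedicated bucket
    // rather than being folded into whichever JS frame is the unwind parent.
    std::printf("%s\n", isGenericCallStubPC(sampledPC)
                            ? "bucket: generic call stub"
                            : "bucket: caller frame");
}
```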
Comment 8 (Assignee) • 1 year ago
I wrote a patch for V8 that fixes this. I'll try SM next.
Comment 9 (Reporter) • 1 year ago
Another option would be to make a new copy of the stub for every Ion-compiled function when profiling, for example when the "make interpreter stubs" flag is set.
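A minimal sketch of that idea, under the assumption that the stub can be cloned per Ion compilation while profiling is on (hypothetical names throughout; a real implementation would clone JIT code rather than bump addresses):

```cpp
// Hypothetical sketch of the per-compilation stub-copy idea; not real SpiderMonkey code.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

using ScriptId = uint64_t;
using StubAddress = uintptr_t;

static bool profilingEnabled = true;     // stands in for the "make interpreter stubs" flag
static StubAddress sharedStub = 0x1000;  // the single shared stub (made-up address)
static std::unordered_map<ScriptId, StubAddress> perScriptStubs;

// When profiling, hand each Ion compilation its own stub copy so samples inside
// the stub resolve to a per-caller address; otherwise everyone shares one stub.
static StubAddress stubForScript(ScriptId script) {
    if (!profilingEnabled)
        return sharedStub;
    auto [it, inserted] = perScriptStubs.try_emplace(script, 0);
    if (inserted)
        it->second = sharedStub + 0x100 * perScriptStubs.size();  // pretend "clone"
    return it->second;
}

int main() {
    std::printf("%#zx\n", (size_t)stubForScript(42));
    std::printf("%#zx\n", (size_t)stubForScript(43));
}
```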
Comment 10 (Assignee) • 9 months ago