We should investigate the most commonly used self-hosted functions and make sure that we compile them with the Baseline JIT as early as possible. The most commonly used functions could be compiled at first run, instead of waiting for the standard Baseline warm-up procedure.
We should measure how much time we spend interpreting self-hosted code, because I'm not sure this is worth it compared to the bigger perf issues that show up in profiles. It could easily backfire when we call a self-hosted function only once, for instance.
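A minimal sketch of the kind of instrumentation this measurement would need (this is illustrative C++, not the real SpiderMonkey profiler API; the type and function names are assumptions): aggregate per-function call counts and interpreter time, so we can see whether the cost is concentrated in a few hot self-hosted functions or spread too thin to matter.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch: per-function statistics for self-hosted code
// executed in the interpreter.
struct SelfHostedStats {
    uint64_t calls = 0;
    std::chrono::nanoseconds interpTime{0};
};

class SelfHostedProfiler {
    std::map<std::string, SelfHostedStats> stats_;

  public:
    // Imagined hook: called when a self-hosted interpreter frame exits.
    void recordCall(const std::string& name, std::chrono::nanoseconds t) {
        SelfHostedStats& s = stats_[name];
        s.calls += 1;
        s.interpTime += t;
    }

    const SelfHostedStats& get(const std::string& name) { return stats_[name]; }
};
```

Dumping this table after loading each of the target sites would directly answer whether the interpreted self-hosted time is worth chasing.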
I'm not suggesting that we compile all of the self-hosted functions. I'd like a focused approach to this, based on information gleaned from our target websites. Basically, we take our top 5 sites and look at them individually. There should be a subset of self-hosted functions that are used heavily by some or all of these sites. For these self-hosted functions only, we compile on first run (or, if possible, schedule for background compilation immediately after the engine starts up). So we would only pre-compile (or first-use compile) self-hosted functions that we _know_ are going to be hit heavily on the sites we are trying to target.
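The proposed tuning change could look roughly like this. The function names and the threshold below are illustrative placeholders, not measured data or real SpiderMonkey values:

```cpp
#include <cstdint>
#include <set>
#include <string>

// Illustrative warm-up threshold, not the engine's real value.
constexpr uint32_t kBaselineWarmUpThreshold = 10;

// Hypothetical subset of self-hosted functions measured to be hot on the
// target sites; the names here are made up for the example.
static const std::set<std::string> kHotSelfHosted = {
    "ArrayMap", "ArrayForEach", "String_replace"
};

bool shouldBaselineCompile(bool isSelfHosted, const std::string& name,
                           uint32_t warmUpCount) {
    // Measured-hot self-hosted functions skip warm-up: compile on first call.
    if (isSelfHosted && kHotSelfHosted.count(name) != 0)
        return true;
    // Everything else keeps the normal warm-up heuristic.
    return warmUpCount >= kBaselineWarmUpThreshold;
}
```

The point is that the list is driven entirely by measurement, so functions outside it see no behavior change at all.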
Brian, would you be able to take a look at this? Seems like it would be a relatively straightforward patch - a bit of instrumentation, measurement, and a small change to our baseline compilation tuning.
(In reply to Kannan Vijayan [:djvj] from comment #2)
> For these self-hosted functions only, we compile on first run (or, if
> possible, schedule for background compilation immediately after the engine
> starts up).

Each compartment has its own copy of these functions + JIT code though.

> So we would only pre-compile (or first-use compile) self-hosted functions
> that we _know_ are going to be hit heavily on the sites we are trying to
> target.

IIUC this means that when foo.com calls array.map() once, it would also be compiled immediately, even though right now it would just run in the interpreter? To me it still feels like something that won't make a measurable difference, so I'd love to see measurements suggesting the opposite.
(In reply to Jan de Mooij [:jandem] from comment #4)
> Each compartment has its own copy of these functions + JIT code though.

Ah, so maybe it should be scheduled for background compilation when a new main-thread compartment is created. That said, let's forget about that for now and just do the compile-on-first-run bit.

> IIUC this means that when foo.com calls array.map() once, it would also be
> compiled immediately, even though right now it would just run in the
> interpreter?

The top sites we're measuring with are typical of the new class of JS-heavy sites: typically built on React, Ember, jQuery, or other frameworks that will definitely hit _some_ subset of these functions heavily, and those subsets will be reliable. The potential minuscule hit to other sites that don't behave that way is less of a user-experience issue than the benefit to the top-tier sites that most of our users hit (the ones we're optimizing against). Sstangl also commented that V8 already does this (pre-compiles its self-hosted functions); one would assume they looked into it and found it a net benefit.

> To me it still feels like something that won't make a measurable difference,
> so I'd love to see measurements suggesting the opposite.

It may not make a measurable difference individually. Even so, if we can reason that it will improve things even by a fraction of a percent that is lost in the noise, adding it to all the other incremental, minuscule perf improvements should yield a net benefit. The thing is, this is a small enough change, and we're under enough pressure, that the amount of pre-evaluation time we have available is limited. If we see high call counts for a particular self-hosted function F() across a number of our top sites, we can easily use our intuition to conclude that just baseline-compiling F() on first use will be a net benefit on those sites, period. We might not know exactly how _much_ of a benefit, but we can be sure it'll be _something_.
I agree with Jan; I don't think this will make a measurable difference, and it is as likely as not to backfire and hurt both our throughput (baseline compilation isn't free) and our memory usage. The warmup heuristics exist for a reason.

One way to get around this problem, though, would be the ability to share baseline jitcode between different baseline scripts. Then we could compile the baseline code once for the entire runtime and reuse it in each compartment. Because the incremental time and memory cost of creating a BaselineScript and attaching an existing JitCode would be negligible, we could immediately "compile" the self-hosted script on first use.

The same infrastructure could potentially be used to do the same thing for content scripts. If we had a way to determine whether a content script is the same, with respect to compilation, as a previous script, then we could immediately reuse both the script bytecode and the baseline jitcode when we parse that second script. (I tried to do the former --- reuse bytecode without going through full parsing --- a while ago but got hung up on the sameness-determination issue above; it'd be worth revisiting and might make a difference within a single tab if the same code keeps appearing in different iframes, but it's getting a little far afield for this bug.)

I don't know of any obvious hangups preventing this from working (the baseline jitcode would clearly need to change when compiling in this mode) and can look into implementing it.
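The sharing idea sketched above amounts to a runtime-level cache keyed by script identity. This is a toy model, not the real SpiderMonkey BaselineScript/JitCode types: compile once, then hand the same code object to every compartment, so per-compartment "compilation" is just a lookup plus a cheap wrapper allocation.

```cpp
#include <map>
#include <memory>
#include <string>

// Stand-in for the executable code blob a baseline compile produces.
struct JitCode {
    std::string machineCode;
};

// Hypothetical runtime-level cache of baseline jitcode for self-hosted
// scripts, shared across all compartments.
class SharedBaselineCache {
    std::map<std::string, std::shared_ptr<JitCode>> cache_;

  public:
    std::shared_ptr<JitCode> getOrCompile(const std::string& scriptId) {
        auto it = cache_.find(scriptId);
        if (it != cache_.end())
            return it->second;  // later compartments reuse the same code

        // First use anywhere in the runtime: "compile" once and cache.
        auto code = std::make_shared<JitCode>();
        code->machineCode = "baseline:" + scriptId;  // pretend compilation
        cache_.emplace(scriptId, code);
        return code;
    }
};
```

For self-hosted code the key is trivial (the script is identical everywhere); the hard sameness-determination problem only arises when extending this to arbitrary content scripts.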
Sharing the baseline jitcode is a great idea. Doing it just for self-hosted code should also be straightforward (since the "same code" issue is trivial to resolve there). If you're interested in taking a stab at that, I'd be down. Relatedly, I just r+ed a patch from Jan in bug 1358047 to share Baseline CacheIR stubcode at the zone level instead of per-compartment. I can file a separate bug for that.

On topic: I have to strongly dispute the backfire potential of first-use compilation for a measured subset of self-hosted functions. The warmup heuristics were developed on a different set of code (all JSScripts as used across the engine in response to external code). The right heuristics for the subset that is self-hosted code (and, more relevantly, the subset of self-hosted code that we explicitly measure to be heavily used on top websites) are going to be different. The memory usage issues are less of a concern given Jason's work on moving to CPZ instead of CPG.
Let's reprioritize if there's data -- seems like not a lot of bang for the buck here, even if implemented.
Whiteboard: [qf:p1] → [qf]