Open
Bug 847160
Opened 12 years ago
Updated 3 months ago
IonMonkey: Better code for self-hosted functions and aggressive inlining
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
NEW
People
(Reporter: evilpie, Unassigned)
References
(Blocks 3 open bugs)
Details
Attachments
(2 files)
Julian Viereck just showed me this http://jsperf.com/iter-array testcase on twitter. The forEach case is unbelievable slow compared to everything else. We have multiple problems here:
- We don't inline ArrayForEach, which might be worthwhile
- The generated code for ForEach is very bad, we don't hoist bounds checks in the inner loop and use a CallGeneric for the callback
- The call to the callback in ArrayForEach should be completely inlined.
I attached a shell testcase that shows this problem.
Comment 1•12 years ago
|
||
Also, from the results of that test there is a big performance regression on the different simple for loops, counting up/down:
browser (runs) -- down / up / up with length cache
FF18 (3) -- 839,777 / 839,777 / 839,777
FF19 (5) -- 821,707 / 821,707 / 821,707
FF20 (1) -- 147,997 / 50,489 / 50,489
FF22 (4) -- 207,446 / 119,751 / 199,751
I am not sure what is causing the slowdown from 19 to 20 on those tests.
Comment 2•12 years ago
|
||
That's interesting. I wonder if this regressed since landing the self-hosting or if the slight differences to my performance tests make all the difference here.
Back when I last tested this in December, there wasn't any speed difference at all between the forEach and the numeric for loop versions of my tests.
Comment 3•12 years ago
|
||
Yeah, so this regressed. Strongly.
The attached testcase takes ~10x as long as it does when commenting out the the forEach version and activating the numeric loop version.
With methodjit, (i.e., --no-ion), it runs slightly faster, but the numeric loop version is only ~2x faster.
Maybe the regression is somehow related to the changes for ParallelArrays?
Flags: needinfo?(nmatsakis)
Comment 4•12 years ago
|
||
When did this regress? Should we be tracking this for some release?
Keywords: regression
Updated•12 years ago
|
Keywords: regressionwindow-wanted
Comment 5•12 years ago
|
||
I am not sure what changes would have caused the regression, but of course I wouldn't rule it out.
Flags: needinfo?(nmatsakis)
Comment 6•12 years ago
|
||
Mmh, so this didn't regress after all: the very first Nightly containing the self-hosted array forEach implementation[1] is just as fast as today's.
Having said that, my testing before the landing clearly showed speed-parity with numeric for loops at some point. Unfortunately, I only did ad hoc testing without really documenting the results or making them reproducible. Luckily, evilpie already identified what needs to be improved to speed this up.
[1]: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/11/2012-11-23-03-08-27-mozilla-central/
Comment 7•12 years ago
|
||
Um. Both today's build and the builds right after landing show self-hosted forEach comparable to manual numeric loops on the testcase in bug 602132, say.
So what's special about the testcase(s) here?
Comment 8•12 years ago
|
||
Furthermore, if I take the attachments in this bug and use a manually self-hosted forEach, I don't see them get much faster. That is, the self-hosted forEach does just as well as a hand-rolled forEach _function_ but not nearly as well as numerical iteration with everything inlined.
It's entirely possible that we used to only compare to hand-rolled forEach.....
The main hit is in fact that we don't inline the call to the callback in ArrayForEach.
Comment 9•12 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #7)
> So what's special about the testcase(s) here?
The answer to that is surprisingly simple: these testcases run inside a test() function whereas the ones in bug 602132 run in the top-level script. If you change the latter to run inside a function, times roughly quadruple.
Evilpie, do you know of any heuristics in IM that cause these differences?
Comment 10•12 years ago
|
||
Oh, just did another quick variation I hadn't thought of before: Moving just the forEach callback out of the test() function to the top-level script restores the speed. So it looks like inner functions aren't inlined into ArrayForEach and somehow throw off some optimizations like hoisted bounds checks.
Comment 11•12 years ago
|
||
Did some investigation for why forEach is so slow (in particular, why we don't inline) beyond the obvious "we don't have enough TI resolution to inline". That is, using callsite cloning or the landing of bug 804676 do *not* solve these problems. This comment also applies for bug 851120.
There are 2 things we want to be inlineable for self-hosted higher-order operations like forEach to be fast: the call to forEach itself (for nested iteration, like the one in bug 851120), and the call to the kernel.
The call to forEach is not inlineable because:
- For semantic reasons (like keeping functions' .length property spec-compliant), forEach uses |arguments[i]|, which is NYI in Ion.
- jsanalyze.cpp marks functions that contains loops as not inlineable. bhackett and h4writer have said on IRC that this should be enabled sometime in the future?
- Heavyweight functions are not inlineable.
The call to the kernel function is not inlineable because:
- TypeObjects are not inlineable. See bug 847694.
- Closures aren't inlineable. Related to heavyweight functions not being inlineable.
Comment 12•12 years ago
|
||
Inlining closures probably go hand in hand with some kind of closure conversion, accounting for deopting on |f.caller| and all that.
FWIW, I don't think there's any technical reason why we couldn't inline heavyweight functions.
Comment 14•11 years ago
|
||
So, I looked at this again, and inlining doesn't seem to make much of a difference, for some reason. Adding `SetScriptHints(ArrayForEach, {cloneAtCallsite: true, inline: true});` to builtins/Array.js causes both forEach and the kernel to be inlined, but doesn't change the runtime of the first testcase at all.
Interestingly, an opt shell is only a few percent faster than a debug one for the benchmark, and I have no idea why.
Will look at it with the tracelogger.
Comment 15•11 years ago
|
||
Is this still a major regression from for-loops? I worked on optimizing these a while back.
Inlining ArrayForEach directly is not that big of a deal, for most hot loops, we'll spend most of our time inside the ForEach loop, so a higher overhead for entering and leaving the loop is not such a big issue. OTOH, inlining the kernel would have a bigger impact. I do think we are able to inline the kernel these days, given that we're cloning ArrayForEach on site, and TI will be able to pick up the kernel as a singleton type.
If you're seeing that opt is only a few percent faster than debug, it just means that we're spending most of our time in tight, jitted code.
Comment 16•11 years ago
|
||
(In reply to Kannan Vijayan [:djvj] from comment #15)
> Is this still a major regression from for-loops? I worked on optimizing
> these a while back.
It is, sadly. Changing the first attached testcase to use a numeric loop makes it ~3x faster. More importantly, in the jsperf benchmark from comment 0, the forEach version is 97%(!) slower than the other versions.
Assignee | ||
Updated•10 years ago
|
Assignee: general → nobody
Comment 17•9 years ago
|
||
On the jsperf, Nightly / Chrome 43:
forEach: 249,016 / 9,064
for, counting up: 1,070,746 / 401,057
for, counting up, caching length: 1,062,803 / 426,321
for, counting down: 1,065,931 / 145,648
forEach is 78% slower on Firefox and 98% on Chrome.
Comment 18•9 years ago
|
||
While this isn't fully solved, it's gotten much better:
On the jsperf test, forEach is half as fast (~800kOps/s vs 1600kOps/s) in Nightly, but 1% as fast (~7kOps/s vs. 670kOps/s) on Chrome. Pretty sure the loop version is just completely optimized out for us, making the benchmark largely meaningless.
The shell test cases are both as fast with forEach as they are with for loops: 600ms and 62ms for the first and second test, respectively. Other engines don't fare that well, with d8's 62 seconds(!) for the first test being the worst.
Is there much left to do here, I wonder?
Comment 19•9 years ago
|
||
> On the jsperf test, forEach is half as fast (~800kOps/s vs 1600kOps/s) in Nightly,
Fwiw, it's ~465kOps/s vs ~3400kOps/s for me on a Mac nightly....
Comment 20•9 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #19)
> Fwiw, it's ~465kOps/s vs ~3400kOps/s for me on a Mac nightly....
Huh, interesting. My results hold up in a fresh profile on a 2012 rMBP with current Nightly.
Comment 21•9 years ago
|
||
Oh, in a clean profile with a single tab I see numbers more like yours.
In a clean profile with 10 copies of gmail opened in background tabs I see numbers a lot more like mine.
Updated•2 years ago
|
Severity: normal → S3
Comment 22•3 months ago
|
||
We are about 6.5x slower here.
Nightly: https://share.firefox.dev/3yAfBAD (5.5s)
Chrome: https://share.firefox.dev/3Av8D0j (841ms)
cc :iain, bthrall
Comment 23•3 months ago
|
||
Looks like the problem here is that we don't inline forEach because we had previously done OSR inside it.
You need to log in
before you can comment on or make changes to this bug.
Description
•