Closed
Bug 961434
Opened 11 years ago
Closed 9 years ago
Consider self-hosting Array.prototype.push
Categories
(Core :: JavaScript Engine: JIT, enhancement)
RESOLVED
FIXED
People
(Reporter: isk, Assigned: isk)
Details
Attachments
(4 files, 4 obsolete files)
Shell test case.
I'm attaching a shell test case based on http://www.peacekeeper.therichins.net/test14.html.
This shell test case focuses on Array.push().
d8: 135
SM: 8076
SM is slower than d8.
v8's Array.push() is self-hosted (https://code.google.com/p/v8/source/browse/trunk/src/array.js#464).
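For reference, a reduced sketch of this kind of microbenchmark (the names and iteration counts here are illustrative, not taken from the attachment):

```javascript
// Hypothetical reduction of the Peacekeeper-style push benchmark:
// a fresh array is created on every call and filled with push().
function pushBench(iterations) {
  var total = 0;
  for (var i = 0; i < iterations; i++) {
    var arr = [];               // re-created each iteration, as in the test
    for (var j = 0; j < 100; j++) {
      arr.push(j);              // the operation under test
    }
    total += arr.length;
  }
  return total;
}

var start = Date.now();
var result = pushBench(1000);
var elapsed = Date.now() - start;  // compare across shells (d8, SM)
```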
Comment 1 • 11 years ago
This benchmark doesn't really test Array#push performance: try commenting out all the pushes. For me, that makes the test execute in a couple of ms in both v8 and jsc, whereas our time goes down from a horrific 1750ms to a terrible 1300ms.
So, there are interesting things going on here:
- apparently we do expensive things with these arrays each time the function is called, whereas other engines do nothing with them
- now, if I move the creation of all the arrays out of the test function so it happens only once, our time goes down to 410ms. That would be somewhat closer to other engines, if they didn't improve a lot, too: v8 goes from 134ms to 79ms, jsc from 365ms to 117ms. It seems like other engines detect that the arrays aren't required to be re-initialized during each call, whereas we don't. Actually using them seems to cause more work to happen, but far from the amount we do.
So, it still does look like our push implementation is a lot slower than other engines': the 410ms vs 79ms or 117ms. The bigger fish to fry here is the array creation (or cloning, or ... something) on each call, though.
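The hoisting described above can be illustrated with a sketch (the arrays and values here are placeholders, not the benchmark's actual data):

```javascript
// Variant A: arrays re-created on every call, which is what the
// benchmark does and where most of our time goes.
function testPerCall() {
  var weights = [1, 2, 3, 4];
  return weights.length;
}

// Variant B: arrays created once, outside the hot function -- the
// hand-applied transformation that cut our time from ~1750ms to ~410ms.
var sharedWeights = [1, 2, 3, 4];
function testHoisted() {
  return sharedWeights.length;
}
```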
Comment 2 • 11 years ago
For the original benchmark, main-thread time is about 25% jitcode, about 35% NewInitArray and about 32% is array_push, 6% is DoNewArray, 2% is CheckOverRecursed.
Overall, the main thread is about 65% of the samples. The other 35% are the free() calls from Arena::finalize on the GCHelperThread.
For the NewInitArray time, there's a bunch of malloc under growElements (11% of the main-thread time), updateMallocCounter (7% of the main-thread time???), and 11% NewObjectCache::newObjectFromHit, mostly under ArenaLists::refillFreeList. There's a bunch of PR_Lock/PR_Unlock under there. :(
Under array_push, the vast majority of the time is spent calling JSObject::growElements, which does a realloc.
I wonder whether our main and GC threads are running into malloc lock contention here...
Comment 3 • 11 years ago
Just so I can test again and compare in the future, using the original testcase:
Nightly 29.0a1 (2014-01-18) - 72700 op/s
Chrome 32 - 267500 op/s
IE 11 - 174800 op/s
Comment 4 (Assignee) • 11 years ago
Sorry for uploading an incomplete benchmark.
I've fixed it.
Attachment #8362162 - Attachment is obsolete: true
Comment 5 (Assignee) • 11 years ago
This patch self-hosts Array#push.
The results are as follows:
before patch: 458ms
after patch: 2002ms
The patched version is much slower.
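For context, a self-hosted push is essentially the ES spec algorithm written in JavaScript. A plain-JS approximation (this is a sketch of the general shape, not the actual patch) looks like:

```javascript
// Plain-JS approximation of a spec-style Array.prototype.push.
// A real self-hosted version would use engine intrinsics (e.g. for
// ToLength and callable checks); this only mirrors the overall shape.
function arrayPush(obj /*, ...items */) {
  var len = obj.length >>> 0;          // crude ToLength stand-in
  for (var i = 1; i < arguments.length; i++) {
    obj[len++] = arguments[i];         // append each argument in order
  }
  obj.length = len;                    // spec: set length, then...
  return len;                          // ...return the new length
}
```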
Comment 6 • 11 years ago
Self-hosting Array.prototype.push won't help much because Ion inlines push already, and that's a faster path than the self-hosted version.
This is Peacekeeper's arrayWeighted test, bug 918746 has all the details. In short, this mainly needs GGC so that we can allocate elements in the nursery and avoid all this malloc slowness.
Comment 7 • 11 years ago
Also, with GGC, the push calls inlined in Ion could reallocate the elements inline if the capacity check fails, instead of falling back to a VM call. We could do this now by calling malloc from JIT code, but that's probably not worth it with GGC on the horizon.
We could also optimize an MNewArray followed by X MArrayPush instructions to allocate space for X elements eagerly. That's a pretty narrow optimization, though, but it shouldn't be too hard.
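At the source level, the MNewArray + X MArrayPush pattern corresponds to a rewrite like this (a hand-applied sketch of the proposed optimization):

```javascript
// What scripts typically write: the compiler sees one MNewArray
// followed by three MArrayPush instructions, each of which may have
// to grow the elements storage.
function build() {
  var a = [];
  a.push(1);
  a.push(2);
  a.push(3);
  return a;
}

// The optimization applied by hand: the capacity (3) is statically
// known, so space can be reserved up front and no push needs to grow.
function buildPrealloc() {
  var a = new Array(3);
  a[0] = 1;
  a[1] = 2;
  a[2] = 3;
  return a;
}
```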
Comment 8 (Assignee) • 11 years ago
This is the IonGraph after this patch.
Comment 9 • 11 years ago
Comment on attachment 8362466 [details]
func04-pass00-BuildSSA-mir.gv.pdf
This attachment contains only the code for ArrayPush, but not the code of the function that calls it.
What you should look for is whether ArrayPush is inlined into the calling function. You can try other IONFLAGS options to check why it might not be inlined.
Also, is the caller hot enough to be Ion compiled?
Comment 10 (Assignee) • 11 years ago
Thank you for your comment.
Sorry for uploading an incomplete IonGraph.
I ran the following command:
IONFLAGS=logs,scripts,osi,bailouts ./js/src/js --ion-eager --ion-parallel-compile=off
I think the IONFLAGS are fine.
The previous IonGraph was from the first call of Array#push, so it was not hot code.
I'm marking the previous IonGraph as obsolete:
Attachment #8362466 - Attachment is obsolete: true
Updated (Assignee) • 11 years ago
Attachment #8362471 - Attachment filename: func07-pass00-BuildSSA-mir.gv.pdf → func07-pass00-BuildSSA-mir.pdf
Updated (Assignee) • 11 years ago
Attachment #8362471 - Attachment description: func07-pass00-BuildSSA-mir.gv.pdf → func07-pass00-BuildSSA-mir.pdf
Comment 11 • 11 years ago
You should try using a for loop in the script that uses push, so that it becomes hot enough to be both compiled and inlined.
Although, you are not running the benchmark, right? At least, this does not look like the run() function used in the benchmark.
Comment 12 (Assignee) • 11 years ago
I misunderstood. I had run the following code:
var arr = ['foo']
arr.push('bar')
So the IonGraph didn't look like the run() function.
I'll try using the benchmark.
The resume points in this IonGraph have parent block numbers, so the self-hosted function is inlined.
Yesterday, jandem explained to me why Ion's inlined push is faster than the self-hosted version.
Ion's inline caches behave like monomorphic inline caches (MICs), whereas Baseline's inline caches behave like polymorphic inline caches (PICs).
A PIC has to handle all cases, so searching the cache has a cost.
A MIC handles only one case, so there is no search cost and the code can be optimized further.
I have a question.
I think the PIC and MIC could cooperate, e.g. first check the MIC and, on a miss, fall back to the PIC.
Is that possible in this engine, and is it worth the cost?
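The monomorphic/polymorphic distinction can be sketched at the JS level (a simplified illustration; the object shapes here are made up for the example):

```javascript
function getX(o) { return o.x; }

// Monomorphic call site: every object has the same shape {x}, so a
// MIC-style cache with a single entry always hits.
var mono = 0;
for (var i = 0; i < 100; i++) {
  mono += getX({ x: 1 });
}

// Polymorphic call site: shapes {x} and {x, y} alternate, so a
// single-entry cache would keep missing; a PIC instead searches a
// small list of cached shapes, which costs a little on every hit.
var poly = 0;
for (var j = 0; j < 100; j++) {
  poly += getX(j % 2 ? { x: 1 } : { x: 1, y: 2 });
}
```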
Attachment #8362471 - Attachment is obsolete: true
Comment 13 (Assignee) • 11 years ago
jandem explained to me why the Ion inline path is faster than the self-hosted version.
Comment 14 (Assignee) • 11 years ago
I've updated the WIP patch.
I had misunderstood: IonMonkey already cooperates with the Baseline JIT, so if we can't compile something in IonMonkey, we try to compile it in Baseline. In other words, if we don't hit in the MIC, we check whether we hit in the PIC.
So if Ion's inlined push is faster than the self-hosted version, a self-hosted push should still end up faster than what we have now.
But the results say otherwise.
Attachment #8362360 - Attachment is obsolete: true
Updated (Assignee) • 11 years ago
Assignee: nobody → iseki.m.aa
Comment 15 • 9 years ago
We have generational GC now.
Running this benchmark in the browser on my MacBook Pro in Firefox 49 (with hundreds of tabs open, sorry) and Chrome 53, I get (typical numbers):
v8: 65ms
SM: 98ms
SM is still 50% slower on this microbenchmark, but as the discussion above has indicated this is not because we're not self-hosting push.
Since bug 918746, which tracks the original benchmark, has already been resolved, I will close this bug as well.
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED