Closed Bug 961434 Opened 11 years ago Closed 9 years ago

Consider self-hosting Array.prototype.push

Categories

(Core :: JavaScript Engine: JIT, enhancement)

x86_64
macOS
enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: isk, Assigned: isk)

Details

Attachments

(4 files, 4 obsolete files)

Attached file array_push_benchmark.js (obsolete) —
Shell test case. I'm attaching a shell test case based on http://www.peacekeeper.therichins.net/test14.html. This test case focuses on Array#push.

d8: 135
SM: 8076

SM is much slower than d8. v8's Array.push() is self-hosted (https://code.google.com/p/v8/source/browse/trunk/src/array.js#464).
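For readers without the attachment, here is a minimal sketch of the *shape* of such a microbenchmark; the function name and iteration counts are illustrative, not the actual attachment's contents:

```javascript
// Hypothetical sketch of an Array#push microbenchmark in the spirit of the
// attached array_push_benchmark.js (names and counts are illustrative).
function benchPush(iterations, size) {
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    var arr = [];               // arrays are re-created on every iteration
    for (var j = 0; j < size; j++)
      arr.push(j);              // the operation under measurement
  }
  return Date.now() - start;    // elapsed milliseconds
}
console.log("push: " + benchPush(1000, 1000) + "ms");
```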
This benchmark doesn't really test Array#push performance: try commenting out all the pushes. For me, that makes the test execute in a couple of ms in both v8 and jsc, whereas our time goes down from a horrific 1750ms to a terrible 1300ms. So, there are interesting things going on here:

- Apparently we do expensive things with these arrays each time the function is called, whereas other engines do nothing with them.
- Now, if I move the creation of all the arrays out of the test function so it happens only once, our time goes down to 410ms. That would be somewhat closer to other engines, if they didn't improve a lot, too: v8 goes from 134ms to 79ms, jsc from 365ms to 117ms.

It seems like other engines detect that the arrays aren't required to be re-initialized during each call, whereas we don't. Actually using them seems to cause more work to happen, but far from the amount we do.

So, it still does look like our push implementation is a lot slower than other engines': the 410ms vs 79ms or 117ms. The bigger fish to fry here is the array creation (or cloning, or ... something) on each call, though.
For the original benchmark, main-thread time is about 25% jitcode, about 35% NewInitArray, about 32% array_push, 6% DoNewArray, and 2% CheckOverRecursed. Overall, the main thread is about 65% of the samples. The other 35% are the free() calls from Arena::finalize on the GCHelperThread.

For the NewInitArray time, there's a bunch of malloc under growElements (11% of the main-thread time), updateMallocCounter (7% of the main-thread time???), and 11% NewObjectCache::newObjectFromHit, mostly under ArenaLists::refillFreeList. Bunch of PR_Lock/PR_Unlock under there. :(

Under array_push, the vast majority of the time is spent calling JSObject::growElements, which does a realloc. I wonder whether our main and GC threads are running into malloc lock contention here...
Just so I can test again and compare in the future, using the original testcase:

Nightly 29.0a1 (2014-01-18) - 72700 op/s
Chrome 32 - 267500 op/s
IE 11 - 174800 op/s
Attached file array_push_bench.js
Sorry for uploading an incomplete benchmark. I've fixed it.
Attachment #8362162 - Attachment is obsolete: true
Attached patch Self-hosted Array#push.WIP (obsolete) — Splinter Review
This patch self-hosts Array#push. The results are as follows:

before patch: 458ms
after patch: 2002ms

This patch is very slow.
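For context, a self-hosted push written in plain JavaScript follows the spec steps roughly like this. This is an illustrative sketch, not the attached patch (the real self-hosted builtins use engine intrinsics this sketch omits):

```javascript
// Illustrative sketch of Array.prototype.push written in plain JavaScript,
// loosely following the ES spec steps (NOT the attached WIP patch).
function ArrayPush(/* ...items */) {
  var O = Object(this);            // spec: ToObject(this value)
  var len = O.length >>> 0;        // rough stand-in for ToLength(O.length)
  var argCount = arguments.length;
  for (var i = 0; i < argCount; i++)
    O[len + i] = arguments[i];     // append each argument in order
  O.length = len + argCount;       // update the length property
  return O.length;                 // the new length is the return value
}
```

Called as `ArrayPush.call([1, 2], 3)`, the sketch appends 3 and returns the new length, 3.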
Self-hosting Array.prototype.push won't help much because Ion inlines push already, and that's a faster path than the self-hosted version. This is Peacekeeper's arrayWeighted test, bug 918746 has all the details. In short, this mainly needs GGC so that we can allocate elements in the nursery and avoid all this malloc slowness.
Also, with GGC, the push calls inlined in Ion could reallocate the elements inline if the capacity check fails, instead of falling back to a VM call. We could do this now by calling malloc from JIT code, but that's probably not worth it with GGC on the horizon. We could also optimize an MNewArray followed by X MArrayPush instructions to allocate space for X elements eagerly. That's a pretty narrow optimization, but it shouldn't be too hard.
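The MNewArray-followed-by-MArrayPush pattern mentioned above corresponds to source code like this (an illustrative example; the function name is made up):

```javascript
// The code shape the proposed MNewArray + MArrayPush optimization targets:
// Ion can see the three pushes at compile time, so it could allocate
// capacity for three elements when creating the array and avoid any
// growElements (realloc) calls during the pushes.
function makeTriple(x, y, z) {
  var coords = [];    // MNewArray: could eagerly allocate capacity 3
  coords.push(x);     // MArrayPush #1
  coords.push(y);     // MArrayPush #2
  coords.push(z);     // MArrayPush #3
  return coords;
}
```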
Attached file func04-pass00-BuildSSA-mir.gv.pdf (obsolete) —
This is IonGraph after this patch.
Comment on attachment 8362466 [details] func04-pass00-BuildSSA-mir.gv.pdf

This attachment contains only the code for ArrayPush, not the code of the function that is calling it. What you should check is whether ArrayPush is inlined into the function that calls it. You can try other IONFLAGS options to see why it might not be inlined. Also, is the caller hot enough to be Ion-compiled?
Attached file func07-pass00-BuildSSA-mir.pdf (obsolete) —
Thank you for your comment, and sorry for uploading an incomplete IonGraph. I ran the following command:

IONFLAGS=logs,scripts,osi,bailouts ./js/src/js --ion-eager --ion-parallel-compile=off

so I don't think IONFLAGS is the problem. The previous IonGraph was from the first call of Array#push, so it was not hot code.
Attachment #8362466 - Attachment is obsolete: true
Attachment #8362471 - Attachment filename: func07-pass00-BuildSSA-mir.gv.pdf → func07-pass00-BuildSSA-mir.pdf
Attachment #8362471 - Attachment description: func07-pass00-BuildSSA-mir.gv.pdf → func07-pass00-BuildSSA-mir.pdf
You should use a for loop in the script that calls push, so that the code becomes hot enough to be both compiled and inlined. Although, you are not running the benchmark, right? At least, this does not look like the run() function used in the benchmark.
I misunderstood. I ran the following code:

var arr = ['foo']
arr.push('bar')

so the IonGraph didn't look like the run() function. I'll try using the benchmark. The resume points in this IonGraph have parent block numbers, so the self-hosted function is inlined.

Yesterday, jandem explained to me why Ion's inlined push is faster than the self-hosted version. Ion's inline cache works like a monomorphic inline cache (MIC), whereas Baseline's inline cache works like a polymorphic inline cache (PIC). A PIC has to handle all cases, so there is a cost to searching the cache. A MIC handles only one case, so there is no search cost and the code can be optimized further.

I have a question. I think the PIC and MIC could collaborate, e.g. first check the MIC and, on a miss, check the PIC. Is that possible in this engine, and would it be worth the cost?
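The monomorphic/polymorphic distinction can be illustrated with call sites like these (a hypothetical sketch; the function names are made up):

```javascript
// Illustrative: a monomorphic site only ever sees one object shape, while a
// polymorphic site sees several. A MIC can guard a single shape cheaply; a
// PIC must search its list of (shape, stub) entries on each attempt.
function getX(o) { return o.x; }   // the property access being cached

function monomorphicSum() {
  var sum = 0;
  for (var i = 0; i < 100; i++)
    sum += getX({ x: i });         // always the same shape: { x }
  return sum;
}

function polymorphicSum() {
  var sum = 0;
  for (var i = 0; i < 100; i++)    // alternates between two shapes:
    sum += getX(i % 2 ? { x: i } : { x: i, y: 0 });  // { x } and { x, y }
  return sum;
}
```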
Attachment #8362471 - Attachment is obsolete: true
Attached file IRC-jandem-isk.txt
jandem explained to me why Ion's inlined push is faster than the self-hosted version.
I've updated the WIP patch. I had misunderstood: IonMonkey already cooperates with the Baseline JIT, so if we can't compile in IonMonkey, we fall back to Baseline. In other words, if we miss in the MIC, we check whether we hit in the PIC. So if Ion's inlined push is faster than the self-hosted version, a self-hosted push should still be faster than what we have now. But the results say otherwise.
Attachment #8362360 - Attachment is obsolete: true
Assignee: nobody → iseki.m.aa
We have generational GC now. Running this benchmark in the browser on my MacBook Pro in Firefox 49 (with hundreds of tabs open, sorry) and Chrome 53, I get these typical numbers:

v8: 65ms
SM: 98ms

SM is still 50% slower on this microbenchmark, but as the discussion above indicated, this is not because we aren't self-hosting push. Since bug 918746, which tracks the original benchmark, has already been resolved, I will close this bug as well.
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED