Last Comment Bug 723728 - 15% slowdown on Emscripten-generated code since bug 718128
: 15% slowdown on Emscripten-generated code since bug 718128
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
-- normal (vote)
: mozilla13
Assigned To: David Mandelin [:dmandelin]
: Jason Orendorff [:jorendorff]
Depends on:
Blocks: 718128
  Show dependency treegraph
Reported: 2012-02-02 14:58 PST by Alon Zakai (:azakai)
Modified: 2012-02-04 02:41 PST (History)
5 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

primes benchmark (17.78 KB, application/javascript)
2012-02-02 14:58 PST, Alon Zakai (:azakai)
no flags Details
Patch (1.08 KB, patch)
2012-02-03 13:44 PST, David Mandelin [:dmandelin]
jwalden+bmo: review+
Details | Diff | Splinter Review

Description User image Alon Zakai (:azakai) 2012-02-02 14:58:41 PST
Created attachment 593981 [details]
primes benchmark

The changeset where bug 718128 landed regressed the Emscripten benchmarks (almost all of them). Attached is an example benchmark. The revision before that bug landed in m-c takes 0.460 seconds (-m -n), while the revision where it lands takes 0.525, which is 15% slower.

Bug 718128 if I understand correctly implements ArrayBuffer.slice. Note that the attached benchmark doesn't use that function (if it did, it wouldn't run at all on the previous revision).
Comment 1 User image Jeff Walden [:Waldo] (remove +bmo to email) 2012-02-02 16:51:01 PST
Odd.  There's not much change outside the new method there.  An extra parameter gets passed to a couple internal methods, but that shouldn't explain that much time difference.  A couple extra fields get set in newly-created ArrayBuffers, but that's a few extra words' writing, adjacent to a value that was previously written, so that seems unlikely too.  That leaves the calloc->malloc+memset-to-contents-or-0 change.  If I had to put money on something, I'd guess it was that, but really this needs profiling.
Comment 2 User image Boris Zbarsky [:bz] (still a bit busy) 2012-02-02 18:37:27 PST
Just did a profile.  The old code ends up 99.4% in jitcode.

The new code ends up 91.3 in jitcode, 6% in the kernel under vm_fault, and 2% under __bzero from allocateArrayBufferSlots.

So yes, it's memset and the ensuing VM faults, looks like.  at least for me and on Mac.
Comment 3 User image David Mandelin [:dmandelin] 2012-02-03 12:34:09 PST
Weird, I had assumed that malloc+memset was equivalent to calloc, and it was more convenient to factor that way, so I did it. Maybe the OS can provide pre-zeroed pages or something. I'll try that out.
Comment 4 User image Boris Zbarsky [:bz] (still a bit busy) 2012-02-03 12:40:54 PST first answer is an interesting read in this context.

So basically, calloc followed by not touching the memory is in fact way faster than malloc+memset.
Comment 5 User image David Mandelin [:dmandelin] 2012-02-03 13:44:55 PST
Created attachment 594294 [details] [diff] [review]

Alon, could you test with this patch? The scores on the primes benchmark are pretty noisy on this machine (Windows 7 laptop at home) so it's hard for me to tell if it's helping or not.
Comment 6 User image Alon Zakai (:azakai) 2012-02-03 13:55:05 PST
Tested, works perfectly! Same speed as before the slowdown.
Comment 7 User image David Mandelin [:dmandelin] 2012-02-03 15:05:00 PST
Comment 8 User image Marco Bonardo [::mak] 2012-02-04 02:41:22 PST

Note You need to log in before you can comment on or make changes to this bug.