15% slowdown on Emscripten-generated code since bug 718128

RESOLVED FIXED in mozilla13

Status

Core
JavaScript Engine
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: azakai, Assigned: dmandelin)

Tracking

unspecified
mozilla13
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

5 years ago
Created attachment 593981 [details]
primes benchmark

The changeset where bug 718128 landed regressed the Emscripten benchmarks (almost all of them). Attached is an example benchmark. The revision before that bug landed in m-c takes 0.460 seconds (-m -n), while the revision where it lands takes 0.525, which is 15% slower.

Bug 718128, if I understand correctly, implements ArrayBuffer.slice. Note that the attached benchmark doesn't use that function (if it did, it wouldn't run at all on the previous revision).
(Reporter)

Updated

5 years ago
Blocks: 718128
Odd.  There's not much change outside the new method there.  An extra parameter gets passed to a couple internal methods, but that shouldn't explain that much time difference.  A couple extra fields get set in newly-created ArrayBuffers, but that's a few extra words' writing, adjacent to a value that was previously written, so that seems unlikely too.  That leaves the calloc->malloc+memset-to-contents-or-0 change.  If I had to put money on something, I'd guess it was that, but really this needs profiling.
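For readers unfamiliar with the change being discussed, the two allocation strategies can be sketched roughly as follows. This is only an illustration: the function names `alloc_zeroed_calloc` and `alloc_malloc_fill` are hypothetical and are not taken from the SpiderMonkey source or from the patch in bug 718128.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the older path: ask calloc for zeroed storage. */
static uint8_t *alloc_zeroed_calloc(size_t nbytes) {
    return calloc(nbytes, 1);
}

/* Hypothetical stand-in for the newer path: malloc, then either copy the
 * caller's contents or explicitly zero the buffer with memset. */
static uint8_t *alloc_malloc_fill(size_t nbytes, const uint8_t *contents) {
    uint8_t *p = malloc(nbytes);
    if (p == NULL) {
        return NULL;
    }
    if (contents != NULL) {
        memcpy(p, contents, nbytes);  /* initialize from existing data */
    } else {
        memset(p, 0, nbytes);         /* writes to every byte of the buffer */
    }
    return p;
}
```

Both functions hand back a buffer with the same contents when no initial data is supplied, so any difference between them is in cost, not in result.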
Just did a profile.  The old code ends up 99.4% in jitcode.

The new code ends up 91.3% in jitcode, 6% in the kernel under vm_fault, and 2% under __bzero from allocateArrayBufferSlots.

So yes, it looks like it's the memset and the ensuing VM faults, at least for me on Mac.
(Assignee)

Comment 3

5 years ago
Weird, I had assumed that malloc+memset was equivalent to calloc, and it was more convenient to factor that way, so I did it. Maybe the OS can provide pre-zeroed pages or something. I'll try that out.
Assignee: general → dmandelin
http://stackoverflow.com/questions/2688466/why-mallocmemset-slower-than-calloc first answer is an interesting read in this context.

So basically, calloc followed by not touching the memory is in fact way faster than malloc+memset.
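A minimal sketch for checking this on a given machine, assuming a large allocation that is never touched afterwards. The helper names are made up for illustration, and `clock()` measures CPU time only coarsely, so the numbers are indicative at best. Some optimizing compilers may also fold a malloc+memset pair into a calloc call, so building without optimization is advisable for this comparison.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* CPU seconds spent obtaining nbytes of zeroed memory from calloc,
 * without ever reading or writing the buffer. */
static double time_calloc_untouched(size_t nbytes) {
    clock_t start = clock();
    void *p = calloc(nbytes, 1);
    clock_t end = clock();
    free(p);
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* CPU seconds spent on malloc followed by an explicit memset to zero,
 * which writes to (and so faults in) every page of the buffer. */
static double time_malloc_memset(size_t nbytes) {
    clock_t start = clock();
    void *p = malloc(nbytes);
    if (p != NULL) {
        memset(p, 0, nbytes);
    }
    clock_t end = clock();
    free(p);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling both with something like `(size_t)256 << 20` and printing the results should show whether the allocator on that platform can hand out pre-zeroed pages cheaply. Results will vary by OS and allocator.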
(Assignee)

Comment 5

5 years ago
Created attachment 594294 [details] [diff] [review]
Patch

Alon, could you test with this patch? The scores on the primes benchmark are pretty noisy on this machine (Windows 7 laptop at home) so it's hard for me to tell if it's helping or not.
(Reporter)

Comment 6

5 years ago
Tested, works perfectly! Same speed as before the slowdown.
(Assignee)

Updated

5 years ago
Attachment #594294 - Flags: review?(jwalden+bmo)
Attachment #594294 - Flags: review?(jwalden+bmo) → review+
(Assignee)

Comment 7

5 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/881f035164ac
Target Milestone: --- → mozilla12
https://hg.mozilla.org/mozilla-central/rev/881f035164ac
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: mozilla12 → mozilla13