Created attachment 593981 [details]
The changeset where bug 718128 landed regressed almost all of the Emscripten benchmarks. Attached is an example benchmark. The revision before that bug landed in m-c takes 0.460 seconds (-m -n), while the revision where it lands takes 0.525 seconds, which is about 14% slower.
Bug 718128, if I understand correctly, implements ArrayBuffer.slice. Note that the attached benchmark doesn't use that function (if it did, it wouldn't run at all on the previous revision).
Odd. There's not much change outside the new method there. An extra parameter gets passed to a couple of internal methods, but that shouldn't account for that much of a time difference. A couple of extra fields get set in newly created ArrayBuffers, but that's only a few extra words' worth of writes, adjacent to a value that was previously written, so that seems unlikely too. That leaves the calloc -> malloc+memset-to-contents-or-zero change. If I had to put money on something, I'd guess it was that, but really this needs profiling.
Just did a profile. The old code spends 99.4% of its time in jitcode.
The new code spends 91.3% in jitcode, 6% in the kernel under vm_fault, and 2% under __bzero called from allocateArrayBufferSlots.
So yes, it looks like it's the memset and the ensuing VM faults, at least for me, on Mac.
Weird; I had assumed that malloc+memset was equivalent to calloc, and it was more convenient to factor it that way, so I did. Maybe the OS can provide pre-zeroed pages or something. I'll try that out.
The first answer at http://stackoverflow.com/questions/2688466/why-mallocmemset-slower-than-calloc is an interesting read in this context.
So basically, calloc followed by not touching the memory is in fact much faster than malloc+memset.
Created attachment 594294 [details] [diff] [review]
Alon, could you test with this patch? The scores on the primes benchmark are pretty noisy on this machine (a Windows 7 laptop at home), so it's hard for me to tell whether it's helping or not.
Tested, works perfectly! Same speed as before the slowdown.