Closed Bug 699571 Opened 14 years ago Closed 2 years ago

Emscripten: optimize translated memset/memcpy

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: terrence, Unassigned)

Details

Emscripten's translations of memset and memcpy are trivial loops that we should be able to recognize and optimize relatively easily, for a huge performance boost on all Emscripten-translated code.
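For illustration, the kind of loop in question looks roughly like the sketch below. The names `HEAP8`, `naiveMemset`, and `naiveMemcpy` are hypothetical stand-ins chosen for this example; they are not taken from Emscripten's actual generated code.

```javascript
// Hypothetical heap: one typed array standing in for the translated program's memory.
var HEAP8 = new Uint8Array(1024);

// Byte-by-byte fill: the shape of loop an engine would need to recognize.
function naiveMemset(ptr, value, len) {
  for (var i = 0; i < len; i++) {
    HEAP8[ptr + i] = value;
  }
  return ptr;
}

// Byte-by-byte copy within the same heap (non-overlapping ranges assumed).
function naiveMemcpy(dst, src, len) {
  for (var i = 0; i < len; i++) {
    HEAP8[dst + i] = HEAP8[src + i];
  }
  return dst;
}

naiveMemset(0, 7, 16);   // fill bytes 0..15 with the value 7
naiveMemcpy(32, 0, 16);  // copy them to bytes 32..47
```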
As Terrence pointed out to me, it's also possible to do memcpy using a typed array's .set() method with a view on the same underlying buffer: memcpy(dst, src, len) becomes MEM.set(MEM.subarray(src, src+len), dst). This seems slower than a normal loop, though; I guess it hasn't been optimized? Perhaps it would be easier to optimize than a loop, however (for example, by making it use memcpy() internally when possible)?
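A minimal runnable sketch of the .set() approach follows. `MEM` is the name used above; the wrapper name `memcpyViaSet` and the buffer size are assumptions made for this example.

```javascript
// Stand-in for the program's memory; the size is arbitrary for this sketch.
var MEM = new Uint8Array(64);

function memcpyViaSet(dst, src, len) {
  // subarray() allocates a new view on the same buffer; no bytes are copied yet.
  // set() then copies the viewed bytes to offset dst.
  MEM.set(MEM.subarray(src, src + len), dst);
  return dst;
}

// Write 1..8 into bytes 0..7, then copy them to bytes 16..23.
for (var i = 0; i < 8; i++) {
  MEM[i] = i + 1;
}
memcpyViaSet(16, 0, 8);
```

Each call allocates a temporary view object, which is the cost the loop version avoids.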
I did some micro-benchmark profiling. For large numbers of tiny memcpys, the majority of the overhead is in creating the subarray objects. I think what is going on here is that going into C++ to do the memcpy incurs a relatively large amount of overhead. First, C++ calls from the methodjit aren't cheap (we do two here), and second, we have to hit the allocator for our temporary subarray object, which is also not terribly fast. The test I ran confirms this:

/* Common preamble */
var SIZE = 128 * 1024 * 1024;
var COPY_SIZE = ___;
var buf = new ArrayBuffer(SIZE);
var view = new Uint8Array(buf);
var view2 = new Uint8Array(buf);

/* Test 1: TypedArray.set on a temporary subarray */
for (var i = 1; i < SIZE / COPY_SIZE; i++) {
    /* subarray(begin, end): end is exclusive, so this view covers COPY_SIZE bytes */
    var q = view.subarray(i * COPY_SIZE, i * COPY_SIZE + COPY_SIZE);
    view2.set(q, i * COPY_SIZE - COPY_SIZE);
}

/* Test 2: simple loop */
for (var i = 1; i < SIZE / COPY_SIZE; i++) {
    for (var j = 0; j < COPY_SIZE; j++) {
        view2[i * COPY_SIZE - COPY_SIZE + j] = view[i * COPY_SIZE + j];
    }
}

/* *** */

For COPY_SIZE = 4:
    TypedArray.set: 0m5.262s
    simple loop:    0m1.741s

For COPY_SIZE = 1024 * 1024:
    TypedArray.set: 0m0.013s
    simple loop:    0m1.721s

In the long run, the right solution here is to teach the JITs about typed arrays. In the meantime, Alon, would it be possible to try adding a branch so that we only use TypedArray.set when len is large and do the simple loop for small len? It will take a bit of profiling to find the right tradeoff for the average C program, but this should give immediate wins until we can get a better solution into SM.
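The suggested branch could be sketched as follows. The names `HEAP`, `SET_THRESHOLD`, and `hybridMemcpy` are hypothetical, and the cutoff of 512 bytes is a placeholder rather than a measured value; the right number would have to come from profiling.

```javascript
// Stand-in for the program's memory; the size is arbitrary for this sketch.
var HEAP = new Uint8Array(4096);

// Hypothetical cutoff in bytes. This is a placeholder, not a profiled value.
var SET_THRESHOLD = 512;

function hybridMemcpy(dst, src, len) {
  if (len >= SET_THRESHOLD) {
    // Large copy: the cost of the call and the temporary view is amortized.
    HEAP.set(HEAP.subarray(src, src + len), dst);
  } else {
    // Small copy: a plain loop avoids allocating a subarray object.
    for (var i = 0; i < len; i++) {
      HEAP[dst + i] = HEAP[src + i];
    }
  }
  return dst;
}

// Fill 1024 source bytes with a repeating 0..255 pattern.
for (var k = 0; k < 1024; k++) {
  HEAP[k] = k & 0xff;
}
hybridMemcpy(2048, 0, 1024); // takes the TypedArray.set path
hybridMemcpy(3500, 0, 8);    // takes the simple-loop path
```

Keeping the cutoff in a single variable makes it easy to re-tune as engines change.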
Thanks, I tested that, and it turns out that in the Emscripten benchmark suite it is never beneficial to use TypedArray.set over a simple loop. The biggest memcpys there are less than 100 bytes, which I guess isn't enough to make it worthwhile. This is now an option, though, so it can be tweaked later to whatever works better.
Assignee: general → nobody
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE