Bug 699571
Opened 14 years ago
Closed 2 years ago
Emscripten: optimize translated memset/memcpy
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
INCOMPLETE
People
(Reporter: terrence, Unassigned)
Details
Emscripten translates memset and memcpy into trivial loops that we should be able to recognize and optimize relatively easily, for a huge performance boost on all Emscripten-translated code.
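For illustration, here is a minimal sketch of the kind of byte-at-a-time loops meant here. It assumes the C heap is a single Uint8Array; the names HEAP, translatedMemset and translatedMemcpy are hypothetical and this is not Emscripten's actual output.

```javascript
// Illustrative sketch only: byte-at-a-time loops of the shape a compiler
// might emit for memset/memcpy when the C heap is one Uint8Array.
// HEAP and both function names are hypothetical.
var HEAP = new Uint8Array(1024);

// memset(ptr, value, num): store the low byte of value into num bytes.
function translatedMemset(ptr, value, num) {
  for (var i = 0; i < num; i++) {
    HEAP[ptr + i] = value; // a Uint8Array stores value modulo 256
  }
  return ptr;
}

// memcpy(dest, src, num): copy num bytes forward, one at a time.
// As in C, callers must not rely on any particular overlap behavior.
function translatedMemcpy(dest, src, num) {
  for (var i = 0; i < num; i++) {
    HEAP[dest + i] = HEAP[src + i];
  }
  return dest;
}
```

A JIT that recognized loops of this shape could replace each one with a single native memset or memcpy call.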
Comment 1•14 years ago
As Terrence pointed out to me, it's also possible to implement memcpy with the typed array .set() method (passing it a subarray view on the same underlying buffer), basically:
memcpy(dst, src, len)
becomes
MEM.set(MEM.subarray(src, src+len), dst)
This seems slower than a normal loop, though; I guess it hasn't been optimized yet. Perhaps it would be easier to optimize than a loop, however (e.g., by making it use memcpy() internally when possible)?
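A minimal sketch of that .set()-based memcpy next to a plain byte loop, assuming the whole heap is one Uint8Array named MEM as in the example; memcpySet and memcpyLoop are hypothetical names.

```javascript
// Sketch: MEM stands in for the C heap (a single Uint8Array).
var MEM = new Uint8Array(64);

// memcpy(dst, src, len) via a temporary view on the same buffer.
// subarray() allocates a new view object but copies no bytes;
// set() then copies len bytes from that view into MEM at dst.
function memcpySet(dst, src, len) {
  MEM.set(MEM.subarray(src, src + len), dst);
  return dst;
}

// A plain forward byte loop, for comparison.
function memcpyLoop(dst, src, len) {
  for (var i = 0; i < len; i++) {
    MEM[dst + i] = MEM[src + i];
  }
  return dst;
}
```

For non-overlapping ranges the two produce the same bytes. One difference: when both views share a buffer, set() behaves as if the source were first copied to a temporary buffer, so it also handles overlapping ranges (memmove semantics), whereas the forward loop can corrupt data when dst > src and the ranges overlap.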
Comment 2•14 years ago (Reporter)
I did some micro-benchmark profiling. For large numbers of tiny memcpys, most of the overhead is in creating the subarray objects. I think what is going on here is that going into C++ to do the memcpy incurs a relatively large fixed cost. First, C++ calls from the methodjit aren't cheap (we make two here: subarray() and set()), and second, we have to hit the allocator for the temporary subarray object, which is also not terribly fast.
The test I ran confirms this:
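/* Common preamble */
var SIZE = 128 * 1024 * 1024;
var COPY_SIZE = 4; // the timings below were measured with 4 and with 1024 * 1024
var buf = new ArrayBuffer(SIZE);
var view = new Uint8Array(buf);  // source view
var view2 = new Uint8Array(buf); // destination view on the same buffer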
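/* Test 1: TypedArray.set from a subarray view */
for (var i = 1; i < SIZE / COPY_SIZE; i++) {
  // Copy block i (COPY_SIZE bytes) down into block i - 1.
  var q = view.subarray(i * COPY_SIZE, (i + 1) * COPY_SIZE);
  view2.set(q, i * COPY_SIZE - COPY_SIZE);
}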
/* Test 2: simple byte loop */
for (var i = 1; i < SIZE / COPY_SIZE; i++) {
  // Same copy as Test 1, one byte at a time.
  for (var j = 0; j < COPY_SIZE; j++) {
    view2[i * COPY_SIZE - COPY_SIZE + j] = view[i * COPY_SIZE + j];
  }
}
/* *** */
For COPY_SIZE = 4:
  TypedArray.set: 0m5.262s
  simple loop:    0m1.741s
For COPY_SIZE = 1024 * 1024:
  TypedArray.set: 0m0.013s
  simple loop:    0m1.721s
In the long run, the right solution here is to teach the JITs about typed arrays. In the meantime, Alon, would it be possible to try adding a branch so that we use TypedArray.set only when len is large and fall back to the simple loop for small len? It will take a bit of profiling to find the right threshold for the average C program, but this should give immediate wins until we can get a better solution into SpiderMonkey.
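A minimal sketch of the proposed branch. SET_THRESHOLD and memcpyHybrid are hypothetical; the threshold value is a placeholder that would have to come from profiling, not a measured number.

```javascript
// Sketch: plain loop for small copies, TypedArray.prototype.set for
// large ones. SET_THRESHOLD is a hypothetical tuning parameter.
var SET_THRESHOLD = 1024;

function memcpyHybrid(heap, dst, src, len) {
  if (len >= SET_THRESHOLD) {
    // Large copy: pay for one subarray allocation plus the native call.
    heap.set(heap.subarray(src, src + len), dst);
  } else {
    // Small copy: avoid the allocation and C++ call overhead entirely.
    for (var i = 0; i < len; i++) {
      heap[dst + i] = heap[src + i];
    }
  }
  // Both branches agree for non-overlapping ranges, which is all
  // memcpy promises.
  return dst;
}
```

If most copies in real programs are small, a threshold above their typical size means the set() branch is rarely taken, so the threshold choice decides whether the branch helps at all.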
Comment 3•14 years ago
Thanks, I tested that, and it turns out that in the Emscripten benchmark suite it is never beneficial to use TypedArray.set over a simple loop. The biggest memcpys there are under 100 bytes, which I guess isn't enough to make it worthwhile. This is now an option, though, so it can be tweaked later to whatever works better.
Updated•11 years ago (Assignee)
Assignee: general → nobody
Updated•3 years ago
Severity: normal → S3
Updated•2 years ago
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE