Faster interrupt checks via mprotect() + load
Categories
(Core :: JavaScript: WebAssembly, enhancement, P5)
People
(Reporter: lth, Unassigned)
Attachments
(1 file)
Currently our interrupt check at the loop header is this:
ld tmp, *(tls + offs)
cmp tmp, 0
je continue
trap
and on x86/x64 Ion even folds the load and compare, avoiding a register here. This is not too awful: the branch should be well-predicted in any hot loop, so it's mostly a matter of fetching and processing instructions, plus minor concerns about code size.
It is possible to do better in principle by using an mprotect trick. In this scenario, the interrupt "flag" is an entire page that is normally readable but not writable. The check performs a load from that page and discards the result:
ld tmp, *loc
where loc is a run-time constant. To signal an interrupt, we mprotect the page, making it unreadable and making the load trap.
Not without some issues:
(1) Sadly we need a destination register for the load, but that's fixable. On ARM64 we can target the zero register, I think. On x86 and x64 we could instead guarantee that the page contains only zero values and then do this (rax is the destination; adding zero leaves it unchanged):
add *loc, %rax
(2) Encoding a constant address in the code is fine only so long as the code is not saved to disk to be reloaded in another process later, or shared among runtimes that don't all interrupt at the same time. A constant address is also sometimes expensive to encode in the instruction stream.
In practice, we may prefer to use TLS-relative offsets to sidestep all these problems. Suppose for the sake of argument that the tls could start on a page boundary. Then the page before the tls could be the trap page:
ld tmp, *($tls - offs)
where offs might depend on what's most convenient in the instruction set; any location within the page might do.
Comment 1•4 years ago
Comment 2•4 years ago
Measured the above WIP with the following code:
timeout(3, function() {
const buf = new BigUint64Array(i.exports.memory.buffer, 0, 8);
print("timeout! " + buf[0]);
quit(1);
});
const b = wasmTextToBinary(`(module
  (memory (export "memory") 1)
  (func (export "run")
    i32.const 0
    i64.const 0
    i64.store
    loop
      i32.const 0
      i32.const 0
      i64.load
      i64.const 1
      i64.add
      i64.store
      br 0
    end))`);
const i = new WebAssembly.Instance(new WebAssembly.Module(b));
//wasmDis(i.exports.run);
i.exports.run();
Without patch the numbers are (on x64): 1969886744 1926480976 1992487502 1964049245 1945432197
With patch: 1995686307 1981681606 1972536904 1977942344 1930216667 (about 0.6% speed up?)
Reporter
Comment 3•4 years ago
Taking this since it's blocked on me to run some tests. I think we should test loops that terminate (so we don't pay for the interrupt) but also loops that interrupt much more often (to gauge the cost of the interrupt).
Reporter
Comment 4•3 years ago
OK, so on this program with the patch I'm seeing a 0.4% speedup on my dual Xeon (with taskset to pin the job to one of the CPUs) and a small slowdown (in the same range; I didn't bother to run the numbers) on the M1. Both of these with Ion.
I also take back what I said in comment 3 about "paying for the interrupt", clearly that is not an issue here.
In short, this is not worth doing at this time. The patch is interesting, though, and ties into some things I'm doing with bounds checking on memory64, so I'll P5 it for now.
Reporter
Updated•3 years ago