Use branchSub32 to optimize some generated loops
Categories
(Core :: JavaScript Engine: JIT, task)
Tracking
Release | Tracking | Status |
---|---|---|
firefox71 | --- | fixed |
People
(Reporter: jandem, Assigned: jandem)
I was telling Benjamin about branchSub32, and that made me wonder about some loops in the Baseline Interpreter and IC code where we count down to zero, like this:
L0:
  test reg, reg
  jz L1
  ...
  sub 1, reg
  jmp L0
L1:
With branchSub32 we could do this a bit more efficiently:
  test reg, reg
  jz L1
L0:
  ...
  sub 1, reg
  jnz L0
L1:
This is the loop shape C++ compilers typically emit. The loop control shrinks from four instructions per iteration (test/jz/sub/jmp) to two (sub/jnz), and it also allows CPUs to do macro-op fusion: Intel CPUs at least can fuse the sub/jnz pair.
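To make the instruction-count argument concrete, here is a minimal sketch that models both loop shapes and counts only the loop-control instructions each one executes. The function names and the counting model are hypothetical illustrations, not SpiderMonkey code, and the model ignores effects such as macro-op fusion and branch prediction.

```javascript
// Hypothetical model of the two loop shapes; `n` is the initial counter value.
// Only loop-control instructions are counted, not the loop body.

// Shape 1: test at the top, unconditional jump back at the bottom.
function controlInstrsTopTested(n) {
  let reg = n;
  let count = 0;
  for (;;) {
    count += 2;              // test reg, reg ; jz L1
    if (reg === 0) break;
    // ... loop body would run here ...
    reg -= 1;
    count += 2;              // sub 1, reg ; jmp L0
  }
  return count;
}

// Shape 2: one guard before the loop, then sub/jnz at the bottom.
function controlInstrsBottomTested(n) {
  let reg = n;
  let count = 2;             // test reg, reg ; jz L1 (executed once)
  if (reg === 0) return count;
  do {
    // ... loop body would run here ...
    reg -= 1;
    count += 2;              // sub 1, reg ; jnz L0
  } while (reg !== 0);
  return count;
}
```

Under this model, a loop over 100 items costs 402 control instructions in the first shape and 202 in the second; for an empty loop (n = 0) both cost 2, since only the initial test/jz runs.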
The perf difference is quite measurable. On my MBP with --no-baseline, I get the following for the contrived micro-benchmark below (it exercises the blinterp emitInitializeLocals loop):
before: 85 ms
after: 68 ms
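// Build a function body that declares 100 local variables.
var s = "";
for (var i = 0; i < 100; i++) {
  s += "var x" + i + ";";
}

function f() {
  var g = Function(s);
  var t = new Date;
  // Each call to g has to initialize all 100 locals.
  for (var i = 0; i < 1000000; i++) {
    g();
  }
  // print is the JS shell's output function; this reports elapsed milliseconds.
  print(new Date - t);
}
f();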
Comment 1 • 6 years ago (Assignee)
This is a bit more efficient. Bug 1582772 comment 0 has more data.
Comment 3 • 6 years ago (Assignee)
Hm pushCallArguments can be optimized and simplified a bit more. NI myself to fix next week.
Comment 4 • 6 years ago (bugherder)
Comment 5 • 6 years ago (Assignee)
(In reply to Jan de Mooij [:jandem] from comment #3)
> Hm pushCallArguments can be optimized and simplified a bit more. NI myself to fix next week.