Closed Bug 1582772 Opened 6 years ago Closed 6 years ago

Use branchSub32 to optimize some generated loops

Categories

(Core :: JavaScript Engine: JIT, task)

Type: task
Priority: Not set
Severity: normal

Tracking

RESOLVED FIXED
mozilla71
Tracking Status
firefox71 --- fixed

People

(Reporter: jandem, Assigned: jandem)

Attachments

(1 file)

I was telling Benjamin about branchSub32, and that made me wonder about some loops in the Baseline Interpreter and IC code where we count down to zero, like this:

L0:
  test reg, reg
  jz L1
  ...
  sub 1, reg
  jmp L0
L1:

With branchSub32 we could do this a bit more efficiently:

  test reg, reg
  jz L1
L0:
  ...
  sub 1, reg
  jnz L0
L1:

This is what C++ compilers typically do. The loop executes fewer instructions per iteration (only the sub and the conditional jump remain as loop overhead), and it also allows CPUs to do macro-op fusion: Intel CPUs, at least, can fuse the sub/jnz pair.
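To make the instruction-count argument concrete, here is a rough JavaScript model of the two loop shapes. This is only a sketch: topTested and rotated are hypothetical helpers, not SpiderMonkey code, and the "control" counter simply tallies the loop-control instructions from the pseudo-assembly above rather than measuring anything on real hardware.

```javascript
// Hypothetical model, not SpiderMonkey code: each function mirrors one of the
// loop shapes above and counts the loop-control instructions it would execute.

// Original shape: test/jz at the top of the loop, sub/jmp at the bottom.
function topTested(n, body) {
    var control = 0;
    var reg = n;
    while (true) {
        control += 2;           // test reg, reg ; jz L1
        if (reg === 0) {
            break;
        }
        body(reg);              // the "..." part of the loop
        control += 2;           // sub 1, reg ; jmp L0
        reg -= 1;
    }
    return control;
}

// Rotated shape: one guard before the loop, then sub/jnz at the bottom.
function rotated(n, body) {
    var control = 2;            // test reg, reg ; jz L1 (executed once)
    var reg = n;
    if (reg === 0) {
        return control;
    }
    do {
        body(reg);              // the "..." part of the loop
        control += 2;           // sub 1, reg ; jnz L0
        reg -= 1;
    } while (reg !== 0);
    return control;
}
```

Both functions call body with the same sequence of counter values. In this model the original shape costs 4n + 2 control instructions for n iterations and the rotated shape costs 2n + 2, so the saving grows with the trip count and disappears when n is 0. The model says nothing about macro-op fusion, which is a separate hardware effect.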

The perf difference is measurable: on my MBP with --no-baseline, I get the following for the contrived micro-benchmark below (this tests the blinterp emitInitializeLocals loop):

before: 85 ms
after:  68 ms

var s = "";
for (var i = 0; i < 100; i++) {
    s += "var x" + i + ";";
}
function f() {
    var g = Function(s);
    var t = new Date;
    for (var i = 0; i < 1000000; i++) {
        g();
    }
    print(new Date - t);
}
f();

Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4d81f578aa35
Use branchSub32 to micro-optimize some loops in generated code. r=iain

Hm pushCallArguments can be optimized and simplified a bit more. NI myself to fix next week.

Flags: needinfo?(jdemooij)
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

(In reply to Jan de Mooij [:jandem] from comment #3)

> Hm pushCallArguments can be optimized and simplified a bit more. NI myself to fix next week.

Bug 1583104.

Flags: needinfo?(jdemooij)