Closed Bug 1944761 Opened 7 months ago Closed 4 months ago

Optimize comparisons with constants better

Categories

(Core :: JavaScript Engine, task, P3)

task

Tracking

()

RESOLVED FIXED
139 Branch
Tracking Status
firefox139 --- fixed

People

(Reporter: jandem, Assigned: debadree333)

References

(Blocks 4 open bugs)

Details

(Keywords: perf-alert, Whiteboard: [sp3][js-perf-next])

Attachments

(1 file)

Sp3 Angular-Complex-DOM has this function:

function At(e) {
  return Array.isArray(e) && !0 === e[1]
}

Ion is using an IC for the strict-equality op even though comparing against true is basically a single compare instruction. The JS shell test case below has the same issue.

We could fix this in the Ion backend but ideally we'd also emit more optimized bytecode for these trivial cases. Either "Value is constant" or "Eq but LHS is always <type>". See also TypeOfEq/TypeOfEqOperand.

function f() {
  var arr = [true, 1];
  var res = 0;
  for (var i = 0; i < 100_000; i++) {
    res = arr[i & 1] === !0;
  }
  return res;
}
f();

Oh and the IC is fairly hot: I see one instance with 59024 hits in Ion and there might be other compilations of the same function.

It's also a basic operation that we can't afford getting wrong.

When I added polymorphic type snapshots for TypeOf/ToBool (bug 1712030), I vaguely planned to extend it to compare ICs, but never got around to it. I think it would be relatively easy to track which combinations of input types we'd seen at a Compare IC, although turning that information into optimal MIR might be a bit of work.

It does seem like we could generate much better CacheIR to start with if we knew that one of the arguments to a CompareIC was constant, and it naively doesn't seem too hard for the frontend to track that.

Severity: -- → N/A
Priority: -- → P3

I am trying to take a small attempt at this if i run the script:

function At(e) {
  return Array.isArray(e) && !0 === e[1]
}

for (let i = 0; i < 1e6; i++) {
  At([1, true]);
}

and running that with export CACHEIR_LOGS=1 ./mach run

and running the cache ir logs result through http://tomschuster.name/cacheir-tools/ i see the following:

IC attached
Compare test.js:5 (pc: 10a009460)	1
ToBool test.js:5 (pc: 10a009461)	1
GetName test.js:6 (pc: 10a00946b)	1
NewArray test.js:6 (pc: 10a009471)	1
Call test.js:6 (pc: 10a009482)	1
UnaryArith test.js:5 (pc: 10a00948a)	1
UnaryArith test.js:5 (pc: 10a00948b)	1
GetName test.js:2 (pc: 10a080a21)	1
GetProp test.js:2 (pc: 10a080a27)	1
Call test.js:2 (pc: 10a080a30)	1
ToBool test.js:2 (pc: 10a080a33)	1
GetElem test.js:2 (pc: 10a080a43)	1
Compare test.js:2 (pc: 10a080a44)	1


IC Kinds
Compare.Int32	2
ToBool.Bool	2
GetName.GlobalNameValue	2
NewArray.Object	1
Call.CallScripted	1
UnaryArith.Int32ToNumeric	1
UnaryArith.Int32Inc	1
GetProp.NativeSlot	1
ArrayIsArray	1
GetProp.DenseElement	1

So iiuc there is no Compare cacheir for the boolean case looking at https://searchfox.org/mozilla-central/source/js/src/jit/CacheIRGenerator.h#830? would adding one help here?

Thank you!

Flags: needinfo?(jdemooij)

Somewhat misleadingly, the boolean case is handled here. We treat boolean comparison as a special case of int32 comparison.

Flags: needinfo?(jdemooij)
Assignee: nobody → debadree333
Status: NEW → ASSIGNED

Hello!

I have attached a patch of what I am trying to do (nearly exact copy of the approach taken in TypeofEq πŸ˜…)

  • Check if one of the ParseNodes if they contain a literal
  • Encode that info into the operand
  • (WIP) Store the constant side in the CacheIRGenerator and just guard one side and check if equal to the constant
  • (WIP) need to do this for constants ParseNodes too if they have constant initialisers

"Eq but LHS is always <type>"

I do not understand this point we can only know the types if they are literals right in the parser? any other approach to figuring out types?

Thank you!!!

(This is a clunky week, but I'll try to make sure you get feedback by next week)

(This is a clunky week, but I'll try to make sure you get feedback by next week)

Thank you so much

Just an update!
I have updated the patch and it does eliminate the guards if theres a constant in the expression but it doesnt look like there was that much improvement example script i used:

function At(e) {
  return e === 1;
}

console.log(At(1)); // true
console.log(At(2)); // false

const start = performance.now();
let at1;
let at2;
for (let i = 0; i < 1e8; i++) {
  at1 = At(1);
  at2 = At(2);
}
console.log(at1, at2);
const end = performance.now();
console.log(end - start);

heres what i see in the benchmark results (first one is one with the patch):

debadreechatterjee@Debadrees-MacBook-Pro mozilla-unified % hyperfine --warmup 3 "./mach run test.js"
Benchmark 1: ./mach run test.js
  Time (mean Β± Οƒ):      1.616 s Β±  0.382 s    [User: 1.203 s, System: 0.579 s]
  Range (min … max):    1.140 s …  2.378 s    10 runs
 
debadreechatterjee@Debadrees-MacBook-Pro mozilla-unified % hyperfine --warmup 3 "./mach run test.js"
Benchmark 1: ./mach run test.js
  Time (mean Β± Οƒ):      1.609 s Β±  0.321 s    [User: 1.178 s, System: 0.529 s]
  Range (min … max):    1.223 s …  2.343 s    10 runs

Right now looking into codegen to see if something can be better!

Thank you!

I was thinking of another approach (i dont know if this makes much sense) i notice the file https://searchfox.org/mozilla-central/source/js/src/jit/BytecodeAnalysis.cpp it seems to loop over bytecodes and attach some associated inferred metadata presently used for (async fns/generators analysis i believe) could we maybe extend this and store type infos and use that to build up type infos for the bytecodes wherever possible? the base case would be opcodes like Int8 which wouldn't have to look further but for any other opcode we look back nuses number of opcodes and see what types they are? and maybe pass this analysis on to the baseline compiler etc etc dunno if this makes sense just an observation!

Thank you!

(In reply to Debadree Chatterjee from comment #9)

I was thinking of another approach (i dont know if this makes much sense) i notice the file https://searchfox.org/mozilla-central/source/js/src/jit/BytecodeAnalysis.cpp it seems to loop over bytecodes and attach some associated inferred metadata presently used for (async fns/generators analysis i believe) could we maybe extend this and store type infos and use that to build up type infos for the bytecodes wherever possible? the base case would be opcodes like Int8 which wouldn't have to look further but for any other opcode we look back nuses number of opcodes and see what types they are? and maybe pass this analysis on to the baseline compiler etc etc dunno if this makes sense just an observation!

Thank you!

Left some comments on your patch. I think that's a better direction than trying to do bytecode analysis. Hopefully my patch comments make sense, if not feel free to ask.

Thank you so much for the review! i am working so far have understood, i am working on addressing them!

I had one question regarding the implementation right now we are dealing with int32 constants in our minimal implementation now for both baseline compiler and baseline interpreter we have code something like this (without implementing ICs):

  switch (data.type()) {
    case JS::ValueType::Int32: {
      int32_t val = data.toInt32();

      masm.branchTestInt32(Assembler::NotEqual, value,
                           op == JSOp::StrictEq ? &fail : &pass);
      masm.branch32(JSOpToCondition(op, true), value.payloadOrValueReg(),
                    Imm32(val), &pass);
      masm.jump(&fail);
      break;
    }

similar in baseline interpreter

now theres the unfortunate case of -0 === 0 and -0 is a double hence this code would be wrong hence my question on what can be a good approach here? back to implementing ICs? or maybe doing branchTestNumber and branchDouble? or just specially checking for 0 and emitting code for that?

Thank you!

Thinking aloud as I type:

So the concern here being that you have a constant Int32 0 and you're trying to code generate the correct comparison that will produce the right result for Int32(0) === Double(-0).

So I guess the way I would intuitively do this is basically (in pseudocode, asuming RHS is the non-constant value, and assuming Int32(0) lhs)

tag = extractTag(rhs) 
if (tag == int32) goto int; 
if (tag == double) goto double
goto no_match; 
return;
int: 

branch_if_equal(rhs, tagged_immediate_zero, match)
goto no_match

double: 
branch_if_double_eq(rhs, tagged_imm_zero, match)
goto no_match

no_match: 
res = false
goto end
match: 
res = true
end: 

Skimming it sort of seems like DoubleCoundition is already correctly dealing with 0/-0.

I suppose another option would be to simply always box int32 into doubles and do double compare? Maybe?

Attachment #9466925 - Attachment description: (WIP) Bug 1944761 - Introduce special opcode for comparison with constants. r=#spidermonkey-reviewers → Bug 1944761 - Introduce special opcode for comparison with constants. r=#spidermonkey-reviewers
Blocks: 1958722
Pushed by mgaudet@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/22bb7791d65d Introduce special opcode for comparison with constants. r=mgaudet,iain,spidermonkey-reviewers
Backout by chorotan@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2859337b779d Backed out changeset 22bb7791d65d for causing SM bustages at BytecodeUtil.cpp

Working on fixing!

Pushed by mgaudet@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b25bee683580 Introduce special opcode for comparison with constants. r=mgaudet,iain,spidermonkey-reviewers
Flags: needinfo?(debadree333)
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 139 Branch

Improved Octane-earleyBoyer by 13% (!)

For some cases like this https://hg-edge.mozilla.org/mozilla-central/rev/b25bee683580#l8.100, I wonder if it would be more efficient to compare the 64bit value directly instead of unboxing first.

Regressions: 1961587

(In reply to Pulsebot from comment #20)

Pushed by mgaudet@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b25bee683580
Introduce special opcode for comparison with constants.
r=mgaudet,iain,spidermonkey-reviewers

Perfherder has detected a browsertime performance change from push b25bee6835808e315edd8b927088012d113ff248.

If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
5% speedometer3 TodoMVC-Angular-Complex-DOM/CompletingAllItems/Sync linux1804-64-nightlyasrelease-qr fission webrender 18.91 -> 17.97 Before/After
5% speedometer3 TodoMVC-Angular-Complex-DOM/CompletingAllItems/Sync linux1804-64-shippable-qr fission webrender 18.38 -> 17.48 Before/After
4% speedometer3 TodoMVC-Angular-Complex-DOM/CompletingAllItems/total linux1804-64-nightlyasrelease-qr fission webrender 24.73 -> 23.74 Before/After
3% speedometer3 TodoMVC-Angular-Complex-DOM/DeletingAllItems/Sync linux1804-64-nightlyasrelease-qr fission webrender 20.09 -> 19.51 Before/After
3% speedometer3 TodoMVC-Angular-Complex-DOM/DeletingAllItems/total linux1804-64-nightlyasrelease-qr fission webrender 23.23 -> 22.62 Before/After

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 44764

The following documentation link provides more information about this command.

Keywords: perf-alert

(In reply to Acasandrei Beatrice (needinfo me) from comment #24)

Perfherder has detected a browsertime performance change from push b25bee6835808e315edd8b927088012d113ff248.

It's satisfying that the test where I noticed this problem in comment 0 (TodoMVC-Angular-Complex-DOM) is the one that measurably improved by this. I didn't expect the improvement would be this large.

Great work, Debadree!

Thank you so to you all too for the guidance and reviews! looking forward to more such! wohoo :-)

(In reply to Pulsebot from comment #20)

Pushed by mgaudet@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b25bee683580
Introduce special opcode for comparison with constants.
r=mgaudet,iain,spidermonkey-reviewers

Perfherder has detected a talos performance change from push b25bee6835808e315edd8b927088012d113ff248.

If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
7% pdfpaint PDFJS-9279-reduced.pdf linux1804-64-shippable-qr e10s fission stylo webrender-sw 1,950.89 -> 1,806.28
7% pdfpaint issue8795_reduced.pdf linux1804-64-shippable-qr e10s fission stylo webrender-sw 3,543.33 -> 3,310.53
6% pdfpaint issue8795_reduced.pdf linux1804-64-shippable-qr e10s fission stylo webrender 3,619.62 -> 3,393.85
6% pdfpaint PDFJS-9279-reduced.pdf linux1804-64-shippable-qr e10s fission stylo webrender 1,983.86 -> 1,860.49

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 44791

The following documentation link provides more information about this command.

QA Whiteboard: [qa-triage-done-c140/b139]
Regressions: 1969395
Blocks: 1969947
Blocks: 1969950
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: