Optimize comparisons with constants better
Categories
(Core :: JavaScript Engine, task, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox139 | --- | fixed |
People
(Reporter: jandem, Assigned: debadree333)
References
(Blocks 4 open bugs)
Details
(Keywords: perf-alert, Whiteboard: [sp3][js-perf-next])
Attachments
(1 file)
Sp3 Angular-Complex-DOM has this function:
function At(e) {
return Array.isArray(e) && !0 === e[1]
}
Ion is using an IC for the strict-equality op even though comparing against true
is basically a single compare instruction. The JS shell test case below has the same issue.
We could fix this in the Ion backend but ideally we'd also emit more optimized bytecode for these trivial cases. Either "Value is constant" or "Eq but LHS is always <type>". See also TypeOfEq
/TypeOfEqOperand
.
function f() {
var arr = [true, 1];
var res = 0;
for (var i = 0; i < 100_000; i++) {
res = arr[i & 1] === !0;
}
return res;
}
f();
Updated•7 months ago
|
Reporter | ||
Comment 1•7 months ago
|
||
Oh and the IC is fairly hot: I see one instance with 59024 hits
in Ion and there might be other compilations of the same function.
It's also a basic operation that we can't afford getting wrong.
Comment 2•7 months ago
|
||
When I added polymorphic type snapshots for TypeOf/ToBool (bug 1712030), I vaguely planned to extend it to compare ICs, but never got around to it. I think it would be relatively easy to track which combinations of input types we'd seen at a Compare IC, although turning that information into optimal MIR might be a bit of work.
It does seem like we could generate much better CacheIR to start with if we knew that one of the arguments to a CompareIC was constant, and it naively doesn't seem too hard for the frontend to track that.
Updated•7 months ago
|
Assignee | ||
Comment 3•6 months ago
|
||
I am trying to take a small attempt at this if i run the script:
function At(e) {
return Array.isArray(e) && !0 === e[1]
}
for (let i = 0; i < 1e6; i++) {
At([1, true]);
}
and running that with export CACHEIR_LOGS=1 ./mach run
and running the cache ir logs result through http://tomschuster.name/cacheir-tools/ i see the following:
IC attached
Compare test.js:5 (pc: 10a009460) 1
ToBool test.js:5 (pc: 10a009461) 1
GetName test.js:6 (pc: 10a00946b) 1
NewArray test.js:6 (pc: 10a009471) 1
Call test.js:6 (pc: 10a009482) 1
UnaryArith test.js:5 (pc: 10a00948a) 1
UnaryArith test.js:5 (pc: 10a00948b) 1
GetName test.js:2 (pc: 10a080a21) 1
GetProp test.js:2 (pc: 10a080a27) 1
Call test.js:2 (pc: 10a080a30) 1
ToBool test.js:2 (pc: 10a080a33) 1
GetElem test.js:2 (pc: 10a080a43) 1
Compare test.js:2 (pc: 10a080a44) 1
IC Kinds
Compare.Int32 2
ToBool.Bool 2
GetName.GlobalNameValue 2
NewArray.Object 1
Call.CallScripted 1
UnaryArith.Int32ToNumeric 1
UnaryArith.Int32Inc 1
GetProp.NativeSlot 1
ArrayIsArray 1
GetProp.DenseElement 1
So iiuc there is no Compare cacheir for the boolean case looking at https://searchfox.org/mozilla-central/source/js/src/jit/CacheIRGenerator.h#830? would adding one help here?
Thank you!
Comment 4•6 months ago
|
||
Somewhat misleadingly, the boolean case is handled here. We treat boolean comparison as a special case of int32 comparison.
Assignee | ||
Comment 5•6 months ago
|
||
Updated•6 months ago
|
Assignee | ||
Comment 6•6 months ago
|
||
Hello!
I have attached a patch of what I am trying to do (nearly exact copy of the approach taken in TypeofEq π )
- Check if one of the ParseNodes if they contain a literal
- Encode that info into the operand
- (WIP) Store the constant side in the CacheIRGenerator and just guard one side and check if equal to the constant
- (WIP) need to do this for constants ParseNodes too if they have constant initialisers
"Eq but LHS is always <type>"
I do not understand this point we can only know the types if they are literals right in the parser? any other approach to figuring out types?
Thank you!!!
Comment 7•6 months ago
|
||
(This is a clunky week, but I'll try to make sure you get feedback by next week)
Assignee | ||
Comment 8•6 months ago
•
|
||
(This is a clunky week, but I'll try to make sure you get feedback by next week)
Thank you so much
Just an update!
I have updated the patch and it does eliminate the guards if theres a constant in the expression but it doesnt look like there was that much improvement example script i used:
function At(e) {
return e === 1;
}
console.log(At(1)); // true
console.log(At(2)); // false
const start = performance.now();
let at1;
let at2;
for (let i = 0; i < 1e8; i++) {
at1 = At(1);
at2 = At(2);
}
console.log(at1, at2);
const end = performance.now();
console.log(end - start);
heres what i see in the benchmark results (first one is one with the patch):
debadreechatterjee@Debadrees-MacBook-Pro mozilla-unified % hyperfine --warmup 3 "./mach run test.js"
Benchmark 1: ./mach run test.js
Time (mean Β± Ο): 1.616 s Β± 0.382 s [User: 1.203 s, System: 0.579 s]
Range (min β¦ max): 1.140 s β¦ 2.378 s 10 runs
debadreechatterjee@Debadrees-MacBook-Pro mozilla-unified % hyperfine --warmup 3 "./mach run test.js"
Benchmark 1: ./mach run test.js
Time (mean Β± Ο): 1.609 s Β± 0.321 s [User: 1.178 s, System: 0.529 s]
Range (min β¦ max): 1.223 s β¦ 2.343 s 10 runs
Right now looking into codegen to see if something can be better!
Thank you!
Assignee | ||
Comment 9•6 months ago
|
||
I was thinking of another approach (i dont know if this makes much sense) i notice the file https://searchfox.org/mozilla-central/source/js/src/jit/BytecodeAnalysis.cpp it seems to loop over bytecodes and attach some associated inferred metadata presently used for (async fns/generators analysis i believe) could we maybe extend this and store type infos and use that to build up type infos for the bytecodes wherever possible? the base case would be opcodes like Int8 which wouldn't have to look further but for any other opcode we look back nuses number of opcodes and see what types they are? and maybe pass this analysis on to the baseline compiler etc etc dunno if this makes sense just an observation!
Thank you!
Comment 10•6 months ago
|
||
(In reply to Debadree Chatterjee from comment #9)
I was thinking of another approach (i dont know if this makes much sense) i notice the file https://searchfox.org/mozilla-central/source/js/src/jit/BytecodeAnalysis.cpp it seems to loop over bytecodes and attach some associated inferred metadata presently used for (async fns/generators analysis i believe) could we maybe extend this and store type infos and use that to build up type infos for the bytecodes wherever possible? the base case would be opcodes like Int8 which wouldn't have to look further but for any other opcode we look back nuses number of opcodes and see what types they are? and maybe pass this analysis on to the baseline compiler etc etc dunno if this makes sense just an observation!
Thank you!
Left some comments on your patch. I think that's a better direction than trying to do bytecode analysis. Hopefully my patch comments make sense, if not feel free to ask.
Assignee | ||
Comment 11•6 months ago
|
||
Thank you so much for the review! i am working so far have understood, i am working on addressing them!
Assignee | ||
Comment 12•6 months ago
|
||
I had one question regarding the implementation right now we are dealing with int32 constants in our minimal implementation now for both baseline compiler and baseline interpreter we have code something like this (without implementing ICs):
switch (data.type()) {
case JS::ValueType::Int32: {
int32_t val = data.toInt32();
masm.branchTestInt32(Assembler::NotEqual, value,
op == JSOp::StrictEq ? &fail : &pass);
masm.branch32(JSOpToCondition(op, true), value.payloadOrValueReg(),
Imm32(val), &pass);
masm.jump(&fail);
break;
}
similar in baseline interpreter
now theres the unfortunate case of -0 === 0 and -0 is a double hence this code would be wrong hence my question on what can be a good approach here? back to implementing ICs? or maybe doing branchTestNumber and branchDouble? or just specially checking for 0 and emitting code for that?
Thank you!
Comment 13•6 months ago
|
||
Thinking aloud as I type:
So the concern here being that you have a constant Int32 0 and you're trying to code generate the correct comparison that will produce the right result for Int32(0) === Double(-0).
So I guess the way I would intuitively do this is basically (in pseudocode, asuming RHS is the non-constant value, and assuming Int32(0) lhs)
tag = extractTag(rhs)
if (tag == int32) goto int;
if (tag == double) goto double
goto no_match;
return;
int:
branch_if_equal(rhs, tagged_immediate_zero, match)
goto no_match
double:
branch_if_double_eq(rhs, tagged_imm_zero, match)
goto no_match
no_match:
res = false
goto end
match:
res = true
end:
Skimming it sort of seems like DoubleCoundition is already correctly dealing with 0/-0.
Comment 14•6 months ago
|
||
I suppose another option would be to simply always box int32 into doubles and do double compare? Maybe?
Updated•6 months ago
|
Comment 15•4 months ago
|
||
Comment 16•4 months ago
|
||
Comment 17•4 months ago
•
|
||
Backed out for causing multiple failures
Backout link
Failure log SM
Failure log runtests.py
Failure log SM p
Failure log win crash
Failure log Xpcshell
Failure log Gtest
Comment 18•4 months ago
|
||
Worth also looking at the windows failures: https://treeherder.mozilla.org/logviewer?job_id=503598574&repo=autoland&lineNumber=1707
Assignee | ||
Comment 19•4 months ago
|
||
Working on fixing!
Comment 20•4 months ago
|
||
Assignee | ||
Updated•4 months ago
|
Comment 21•4 months ago
|
||
bugherder |
Comment 23•4 months ago
|
||
For some cases like this https://hg-edge.mozilla.org/mozilla-central/rev/b25bee683580#l8.100, I wonder if it would be more efficient to compare the 64bit value directly instead of unboxing first.
Comment 24•4 months ago
|
||
(In reply to Pulsebot from comment #20)
Pushed by mgaudet@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b25bee683580
Introduce special opcode for comparison with constants.
r=mgaudet,iain,spidermonkey-reviewers
Perfherder has detected a browsertime performance change from push b25bee6835808e315edd8b927088012d113ff248.
If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
---|---|---|---|---|---|
5% | speedometer3 TodoMVC-Angular-Complex-DOM/CompletingAllItems/Sync | linux1804-64-nightlyasrelease-qr | fission webrender | 18.91 -> 17.97 | Before/After |
5% | speedometer3 TodoMVC-Angular-Complex-DOM/CompletingAllItems/Sync | linux1804-64-shippable-qr | fission webrender | 18.38 -> 17.48 | Before/After |
4% | speedometer3 TodoMVC-Angular-Complex-DOM/CompletingAllItems/total | linux1804-64-nightlyasrelease-qr | fission webrender | 24.73 -> 23.74 | Before/After |
3% | speedometer3 TodoMVC-Angular-Complex-DOM/DeletingAllItems/Sync | linux1804-64-nightlyasrelease-qr | fission webrender | 20.09 -> 19.51 | Before/After |
3% | speedometer3 TodoMVC-Angular-Complex-DOM/DeletingAllItems/total | linux1804-64-nightlyasrelease-qr | fission webrender | 23.23 -> 22.62 | Before/After |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.
You can run all of these tests on try with ./mach try perf --alert 44764
The following documentation link provides more information about this command.
Reporter | ||
Comment 25•4 months ago
|
||
(In reply to Acasandrei Beatrice (needinfo me) from comment #24)
Perfherder has detected a browsertime performance change from push b25bee6835808e315edd8b927088012d113ff248.
It's satisfying that the test where I noticed this problem in comment 0 (TodoMVC-Angular-Complex-DOM) is the one that measurably improved by this. I didn't expect the improvement would be this large.
Great work, Debadree!
Assignee | ||
Comment 26•4 months ago
|
||
Thank you so to you all too for the guidance and reviews! looking forward to more such! wohoo :-)
Comment 27•4 months ago
|
||
(In reply to Pulsebot from comment #20)
Pushed by mgaudet@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b25bee683580
Introduce special opcode for comparison with constants.
r=mgaudet,iain,spidermonkey-reviewers
Perfherder has detected a talos performance change from push b25bee6835808e315edd8b927088012d113ff248.
If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
7% | pdfpaint PDFJS-9279-reduced.pdf | linux1804-64-shippable-qr | e10s fission stylo webrender-sw | 1,950.89 -> 1,806.28 |
7% | pdfpaint issue8795_reduced.pdf | linux1804-64-shippable-qr | e10s fission stylo webrender-sw | 3,543.33 -> 3,310.53 |
6% | pdfpaint issue8795_reduced.pdf | linux1804-64-shippable-qr | e10s fission stylo webrender | 3,619.62 -> 3,393.85 |
6% | pdfpaint PDFJS-9279-reduced.pdf | linux1804-64-shippable-qr | e10s fission stylo webrender | 1,983.86 -> 1,860.49 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.
You can run all of these tests on try with ./mach try perf --alert 44791
The following documentation link provides more information about this command.
Updated•4 months ago
|
Description
•