Don't use two 32-bit moves in MacroAssemblerX64::storeValue
Categories
(Core :: JavaScript Engine: JIT, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox139 | --- | fixed |
People
(Reporter: iain, Assigned: iain)
References
(Blocks 2 open bugs)
Details
(Keywords: perf-alert)
Attachments
(1 file)
There is a microoptimization in MacroAssemblerX64::storeValue to store Int32 and boolean values using two 32-bit stores (one storing the payload and one storing the tag) instead of combining the two halves in a 64-bit register and storing that. This saves a scratch register and a couple of instructions.
However, it turns out that in some case, it's quite bad for performance. Specifically, if we store an Int32 value into an argument slot using two 32-bit stores, call a function, and use a 64-bit load to read the argument value, the store forwarding logic in many x86 chips gets very sad. Calling the identity function (function id(x) { return x; }
) in a loop is 2x slower when passing an int32 argument compared to an argument that would be stored as 64 bits. This has been tested on my own Raptor Lake machine, along with multiple AMD Threadripper machines and nbp's 7-year old Intel Thinkpad, so it seems to be a reasonably widespread problem.
Removing this microoptimization is a 2% improvement on Jetstream 2 on Windows 11. (I don't see the same improvement on Linux or Windows 10, which are apparently running on Xeon, so not all x86 chips are affected.)
I think this is also the explanation for the mysteriously slow move I was investigating in perf-dashboard starting with this comment.
Assignee | ||
Comment 1•3 months ago
|
||
I dispatched a fresh set of Speedometer runs, and the weird regression I thought I might be seeing on Speedometer went away, so I don't see any reason to limit this to rbp/rsp.
Comment 3•3 months ago
|
||
bugherder |
Comment 5•3 months ago
•
|
||
(In reply to Narcis Beleuzu [:NarcisB] from comment #3)
Perfherder has detected a browsertime performance change from push bbbc53f6330ff032a11d82683972751a2c1d1b2c. As author of one of the patches included in that push, we need your help to address this regression.
Please acknowledge, and begin investigating this alert within 3 business days, or the patch(es) may be backed out in accordance with our regression policy. Our guide to handling regression bugs has information about how you can proceed with this investigation.
If you have any questions or need any help with the investigation, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
---|---|---|---|---|---|
20% | addkBLN time_duration | windows11-64-24h2-shippable | fission webrender | 2,999.02 -> 2,387.32 | Before/After |
8% | ares6 | windows11-64-24h2-shippable | fission webrender | 14.54 -> 13.32 | Before/After |
3% | jetstream3 | windows11-64-24h2-shippable | fission webrender | 168.94 -> 173.85 | Before/After |
2% | jetstream2 | windows11-64-24h2-shippable | fission webrender | 233.16 -> 238.16 | Before/After |
2% | sunspider | windows11-64-24h2-shippable | fission webrender | 64.04 -> 62.75 | Before/After |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.
You can run all of these tests on try with ./mach try perf --alert 44584
The following documentation link provides more information about this command.
Updated•3 months ago
|
Comment 6•3 months ago
|
||
(In reply to Narcis Beleuzu [:NarcisB] from comment #3)
Perfherder has detected a talos performance change from push bbbc53f6330ff032a11d82683972751a2c1d1b2c.
If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
13% | pdfpaint issue2129.pdf | windows11-64-24h2-shippable | e10s fission stylo webrender-sw | 341.06 -> 295.10 |
13% | pdfpaint issue2129.pdf | windows11-64-24h2-shippable | e10s fission stylo webrender | 341.60 -> 297.51 |
12% | pdfpaint issue1810.pdf | windows11-64-24h2-shippable | e10s fission stylo webrender | 334.14 -> 292.59 |
12% | pdfpaint issue1810.pdf | windows11-64-24h2-shippable | e10s fission stylo webrender-sw | 333.48 -> 293.40 |
11% | pdfpaint issue13794.pdf | windows11-64-24h2-shippable | e10s fission stylo webrender | 474.15 -> 423.71 |
... | ... | ... | ... | ... |
6% | pdfpaint issue10989.pdf | windows11-64-24h2-shippable | e10s fission stylo webrender-sw | 3,547.12 -> 3,320.64 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.
You can run all of these tests on try with ./mach try perf --alert 44585
The following documentation link provides more information about this command.
Comment 7•3 months ago
•
|
||
OT (rather than a new issue): I follow mountains of perf stuff, but I focus on fingerprinting :) Loving the thousand paper cuts and occasional bug big wins. Anyway, just wondering what has happened to https://arewefastyet.com/win11/benchmarks/overview?numDays=90 or if you are aware of it - seems like half the tests are no longer show since Feb 28th and the other half since Mar 18th
Comment hidden (abusive-reviewed) |
Updated•2 months ago
|
Description
•