Closed Bug 1956655 Opened 3 months ago Closed 3 months ago

Don't use two 32-bit moves in MacroAssemblerX64::storeValue

Categories

(Core :: JavaScript Engine: JIT, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
139 Branch
Tracking Status
firefox139 --- fixed

People

(Reporter: iain, Assigned: iain)

References

(Blocks 2 open bugs)

Details

(Keywords: perf-alert)

Attachments

(1 file)

There is a microoptimization in MacroAssemblerX64::storeValue to store Int32 and boolean values using two 32-bit stores (one storing the payload and one storing the tag) instead of combining the two halves in a 64-bit register and storing that. This saves a scratch register and a couple of instructions.

However, it turns out that in some case, it's quite bad for performance. Specifically, if we store an Int32 value into an argument slot using two 32-bit stores, call a function, and use a 64-bit load to read the argument value, the store forwarding logic in many x86 chips gets very sad. Calling the identity function (function id(x) { return x; }) in a loop is 2x slower when passing an int32 argument compared to an argument that would be stored as 64 bits. This has been tested on my own Raptor Lake machine, along with multiple AMD Threadripper machines and nbp's 7-year old Intel Thinkpad, so it seems to be a reasonably widespread problem.

Removing this microoptimization is a 2% improvement on Jetstream 2 on Windows 11. (I don't see the same improvement on Linux or Windows 10, which are apparently running on Xeon, so not all x86 chips are affected.)

I think this is also the explanation for the mysteriously slow move I was investigating in perf-dashboard starting with this comment.

I dispatched a fresh set of Speedometer runs, and the weird regression I thought I might be seeing on Speedometer went away, so I don't see any reason to limit this to rbp/rsp.

Pushed by iireland@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/bbbc53f6330f Don't use 32-bit moves in MacroAssemblerX64::storeValue r=jandem
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 139 Branch

Looks to have improved Jetstream2-stanford-crypto-aes on Linux by 3%.

(In reply to Narcis Beleuzu [:NarcisB] from comment #3)

https://hg.mozilla.org/mozilla-central/rev/bbbc53f6330f

Perfherder has detected a browsertime performance change from push bbbc53f6330ff032a11d82683972751a2c1d1b2c. As author of one of the patches included in that push, we need your help to address this regression.

Please acknowledge, and begin investigating this alert within 3 business days, or the patch(es) may be backed out in accordance with our regression policy. Our guide to handling regression bugs has information about how you can proceed with this investigation.

If you have any questions or need any help with the investigation, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
20% addkBLN time_duration windows11-64-24h2-shippable fission webrender 2,999.02 -> 2,387.32 Before/After
8% ares6 windows11-64-24h2-shippable fission webrender 14.54 -> 13.32 Before/After
3% jetstream3 windows11-64-24h2-shippable fission webrender 168.94 -> 173.85 Before/After
2% jetstream2 windows11-64-24h2-shippable fission webrender 233.16 -> 238.16 Before/After
2% sunspider windows11-64-24h2-shippable fission webrender 64.04 -> 62.75 Before/After

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 44584

The following documentation link provides more information about this command.

Keywords: perf-alert

(In reply to Narcis Beleuzu [:NarcisB] from comment #3)

https://hg.mozilla.org/mozilla-central/rev/bbbc53f6330f

Perfherder has detected a talos performance change from push bbbc53f6330ff032a11d82683972751a2c1d1b2c.

If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
13% pdfpaint issue2129.pdf windows11-64-24h2-shippable e10s fission stylo webrender-sw 341.06 -> 295.10
13% pdfpaint issue2129.pdf windows11-64-24h2-shippable e10s fission stylo webrender 341.60 -> 297.51
12% pdfpaint issue1810.pdf windows11-64-24h2-shippable e10s fission stylo webrender 334.14 -> 292.59
12% pdfpaint issue1810.pdf windows11-64-24h2-shippable e10s fission stylo webrender-sw 333.48 -> 293.40
11% pdfpaint issue13794.pdf windows11-64-24h2-shippable e10s fission stylo webrender 474.15 -> 423.71
... ... ... ... ...
6% pdfpaint issue10989.pdf windows11-64-24h2-shippable e10s fission stylo webrender-sw 3,547.12 -> 3,320.64

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 44585

The following documentation link provides more information about this command.

OT (rather than a new issue): I follow mountains of perf stuff, but I focus on fingerprinting :) Loving the thousand paper cuts and occasional bug big wins. Anyway, just wondering what has happened to https://arewefastyet.com/win11/benchmarks/overview?numDays=90 or if you are aware of it - seems like half the tests are no longer show since Feb 28th and the other half since Mar 18th

Also improved kraken-standford-crypto-aes by 7%

Blocks: 1729507
QA Whiteboard: [qa-triage-done-c140/b139]
Regressions: 1963757
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: