Closed Bug 1956655 Opened 3 months ago Closed 3 months ago

Don't use two 32-bit moves in MacroAssemblerX64::storeValue

Tracking

()

Status:

RESOLVED FIXED

Milestone:

139 Branch

Tracking Flags:

Tracking

Status

firefox139

---

fixed

People

(Reporter: iain, Assigned: iain)

References

(Blocks 2 open bugs)

Details

(Keywords: perf-alert)

Attachments

(1 file)

Bug 1956655: Don't use 32-bit moves in MacroAssemblerX64::storeValue r=jandem 3 months ago Iain Ireland [:iain] 48 bytes, text/x-phabricator-request		Details \| Review

Iain Ireland [:iain]

Assignee

Description

•

3 months ago

There is a microoptimization in MacroAssemblerX64::storeValue to store Int32 and boolean values using two 32-bit stores (one storing the payload and one storing the tag) instead of combining the two halves in a 64-bit register and storing that. This saves a scratch register and a couple of instructions.

However, it turns out that in some case, it's quite bad for performance. Specifically, if we store an Int32 value into an argument slot using two 32-bit stores, call a function, and use a 64-bit load to read the argument value, the store forwarding logic in many x86 chips gets very sad. Calling the identity function (function id(x) { return x; }) in a loop is 2x slower when passing an int32 argument compared to an argument that would be stored as 64 bits. This has been tested on my own Raptor Lake machine, along with multiple AMD Threadripper machines and nbp's 7-year old Intel Thinkpad, so it seems to be a reasonably widespread problem.

Removing this microoptimization is a 2% improvement on Jetstream 2 on Windows 11. (I don't see the same improvement on Linux or Windows 10, which are apparently running on Xeon, so not all x86 chips are affected.)

I think this is also the explanation for the mysteriously slow move I was investigating in perf-dashboard starting with this comment.

Iain Ireland [:iain]

Assignee

Comment 1

•

3 months ago

Attached file Bug 1956655: Don't use 32-bit moves in MacroAssemblerX64::storeValue r=jandem — Details

I dispatched a fresh set of Speedometer runs, and the weird regression I thought I might be seeing on Speedometer went away, so I don't see any reason to limit this to rbp/rsp.

Pulsebot

Comment 2

•

3 months ago

Pushed by iireland@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/bbbc53f6330f Don't use 32-bit moves in MacroAssemblerX64::storeValue r=jandem

Narcis Beleuzu [:NarcisB]

Comment 3

•

3 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/bbbc53f6330f

Status: NEW → RESOLVED

Closed: 3 months ago

status-firefox139: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 139 Branch

Mayank Bansal

Comment 4

•

3 months ago

Looks to have improved Jetstream2-stanford-crypto-aes on Linux by 3%.

Florin Bilt

Comment 5

•

3 months ago

•

Edited

(In reply to Narcis Beleuzu [:NarcisB] from comment #3)

https://hg.mozilla.org/mozilla-central/rev/bbbc53f6330f

Perfherder has detected a browsertime performance change from push bbbc53f6330ff032a11d82683972751a2c1d1b2c. As author of one of the patches included in that push, we need your help to address this regression.

Please acknowledge, and begin investigating this alert within 3 business days, or the patch(es) may be backed out in accordance with our regression policy. Our guide to handling regression bugs has information about how you can proceed with this investigation.

If you have any questions or need any help with the investigation, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Improvements:

Ratio	Test	Platform	Options	Absolute values (old vs new)	Performance Profiles
20%	addkBLN time_duration	windows11-64-24h2-shippable	fission webrender	2,999.02 -> 2,387.32	Before/After
8%	ares6	windows11-64-24h2-shippable	fission webrender	14.54 -> 13.32	Before/After
3%	jetstream3	windows11-64-24h2-shippable	fission webrender	168.94 -> 173.85	Before/After
2%	jetstream2	windows11-64-24h2-shippable	fission webrender	233.16 -> 238.16	Before/After
2%	sunspider	windows11-64-24h2-shippable	fission webrender	64.04 -> 62.75	Before/After

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 44584

The following documentation link provides more information about this command.

Florin Bilt

Updated

•

3 months ago

Keywords: perf-alert

Florin Bilt

Comment 6

•

3 months ago

(In reply to Narcis Beleuzu [:NarcisB] from comment #3)

https://hg.mozilla.org/mozilla-central/rev/bbbc53f6330f

Perfherder has detected a talos performance change from push bbbc53f6330ff032a11d82683972751a2c1d1b2c.

If you have any questions, please reach out to a performance sheriff. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Improvements:

Ratio	Test	Platform	Options	Absolute values (old vs new)
13%	pdfpaint issue2129.pdf	windows11-64-24h2-shippable	e10s fission stylo webrender-sw	341.06 -> 295.10
13%	pdfpaint issue2129.pdf	windows11-64-24h2-shippable	e10s fission stylo webrender	341.60 -> 297.51
12%	pdfpaint issue1810.pdf	windows11-64-24h2-shippable	e10s fission stylo webrender	334.14 -> 292.59
12%	pdfpaint issue1810.pdf	windows11-64-24h2-shippable	e10s fission stylo webrender-sw	333.48 -> 293.40
11%	pdfpaint issue13794.pdf	windows11-64-24h2-shippable	e10s fission stylo webrender	474.15 -> 423.71
...	...	...	...	...
6%	pdfpaint issue10989.pdf	windows11-64-24h2-shippable	e10s fission stylo webrender-sw	3,547.12 -> 3,320.64

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a performance sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 44585

The following documentation link provides more information about this command.

Thorin [:thorin]

Comment 7

•

3 months ago

•

Edited

OT (rather than a new issue): I follow mountains of perf stuff, but I focus on fingerprinting :) Loving the thousand paper cuts and occasional ~~bug~~ big wins. Anyway, just wondering what has happened to https://arewefastyet.com/win11/benchmarks/overview?numDays=90 or if you are aware of it - seems like half the tests are no longer show since Feb 28th and the other half since Mar 18th

Comment hidden (abusive-reviewed)

Mayank Bansal

Comment 9

•

2 months ago

Also improved kraken-standford-crypto-aes by 7%

Mayank Bansal

Updated

•

2 months ago

Blocks: 1729507

Mihai Boldan, Desktop QA [:mboldan]

Updated

•

2 months ago

QA Whiteboard: [qa-triage-done-c140/b139]

Mayank Bansal

Updated

•

2 months ago

Regressions: 1963757

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Don't use two 32-bit moves in MacroAssemblerX64::storeValue

Categories

(Core :: JavaScript Engine: JIT, enhancement, P1)

Tracking

()

People

(Reporter: iain, Assigned: iain)

References

(Blocks 2 open bugs)

Details

(Keywords: perf-alert)

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Improvements:

Updated

Comment 6

Improvements:

Comment 7

Comment 8

Comment 9

Updated

Updated

Updated

Attachment

General

Description

File Name

Content Type