Closed Bug 1815069 Opened 1 year ago Closed 1 year ago

Make it possible to tweak mozjemalloc's max dirty page sizes dynamically and increase them on foreground content processes

Categories

(Core :: Performance, defect, P2)

Tracking

RESOLVED FIXED
112 Branch
Performance Impact high
Tracking Status
firefox112 --- fixed

People

(Reporter: smaug, Assigned: smaug)

References

(Blocks 4 open bugs)

Details

(Keywords: perf-alert, Whiteboard: [sp3:p1])

Attachments

(4 files)

Some background for this is in https://bugzilla.mozilla.org/show_bug.cgi?id=1805644#c16

AWSY doesn't show much difference, possibly because of the 3rd patch in the patch queue I'm about to upload:
https://treeherder.mozilla.org/perfherder/compare?originalProject=mozilla-central&originalRevision=6424e727b6adc253c38b7d7d3c861c9360de74da&newProject=try&newRevision=5f6ae029066462d2c4e4c1ceb8d10aa77c41ebae&page=1&framework=4

BrowserTime really likes this
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=b3dfdc2535422cf85769eef6bb762befaeb9ffb6&newProject=try&newRevision=336b8407412e68151f1afa447bea363288f4b861&page=1&framework=13&showOnlyImportant=1
especially the Speedometer* part of those tests:
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=336b8407412e68151f1afa447bea363288f4b861&originalSignature=4586009&newSignature=4586009&framework=13&originalRevision=b3dfdc2535422cf85769eef6bb762befaeb9ffb6&page=1&showOnlyImportant=1

The tweaks to the cache sizes in the 2nd patch are based on experimenting with different numbers. The non-default arenas we have really need larger page caches than they currently have when running busy code, like Speedometer.
It is probably worth experimenting with the limits some more, but it is a bit time-consuming to get all the performance and AWSY numbers.
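
To illustrate why a larger dirty-page cap helps busy code, here is a toy C++ model. It is not mozjemalloc code: `ToyArena`, `SetMaxDirtyModifier`, `CountPurges`, the shift-based modifier, and the purge-down-to-half policy are all assumptions made for this sketch, and the page counts are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only: none of these names or policies are taken from mozjemalloc.
class ToyArena {
 public:
  explicit ToyArena(std::size_t aBaseMaxDirty) : mBaseMaxDirty(aBaseMaxDirty) {
    assert(aBaseMaxDirty > 0);
  }

  // Assumption: the modifier is a power-of-two shift applied to the base cap.
  // The [-8, 8] clamp is an arbitrary limit for this sketch.
  void SetMaxDirtyModifier(std::int32_t aModifier) {
    if (aModifier < -8) aModifier = -8;
    if (aModifier > 8) aModifier = 8;
    mModifier = aModifier;
  }

  std::size_t EffectiveMaxDirty() const {
    if (mModifier >= 0) {
      return mBaseMaxDirty << static_cast<unsigned>(mModifier);
    }
    return mBaseMaxDirty >> static_cast<unsigned>(-mModifier);
  }

  // Freed pages stay cached as dirty pages. Once the cache exceeds the cap,
  // it is purged down to half the cap (assumed policy).
  void FreePages(std::size_t aPages) {
    mDirty += aPages;
    if (mDirty > EffectiveMaxDirty()) {
      mDirty = EffectiveMaxDirty() / 2;
      ++mPurgeCount;
    }
  }

  // Allocations reuse cached dirty pages first.
  void AllocPages(std::size_t aPages) {
    mDirty -= (aPages < mDirty ? aPages : mDirty);
  }

  std::size_t PurgeCount() const { return mPurgeCount; }

 private:
  std::size_t mBaseMaxDirty;
  std::int32_t mModifier = 0;
  std::size_t mDirty = 0;
  std::size_t mPurgeCount = 0;
};

// Simulates a busy workload that repeatedly frees and reallocates 300 pages
// against a base cap of 256 pages, and reports how often a purge happened.
std::size_t CountPurges(std::int32_t aModifier, std::size_t aRounds) {
  ToyArena arena(256);
  arena.SetMaxDirtyModifier(aModifier);
  for (std::size_t i = 0; i < aRounds; ++i) {
    arena.FreePages(300);
    arena.AllocPages(300);
  }
  return arena.PurgeCount();
}
```

In this model, `CountPurges(0, 100)` purges on every round because 300 freed pages always exceed the 256-page cap, while `CountPurges(2, 100)` (cap of 1024 pages) never purges. Real workloads are far less regular, so this only shows the direction of the effect, not its size.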

Not sure what would be the right component for this, but since this is all about performance, Core: Performance it is :)
I could have also split this into several bugs, but the patches are small and all go together.

I still need to run this on tryserver to ensure I didn't break any checks on debug builds. I wouldn't be surprised if I missed something.

jemalloc_free_dirty_pages is surprisingly fast. One may see it in profiles after running Speedometer, but even then it is basically just one sample or so.

Depends on D168901

Attachment #9315992 - Attachment description: WIP: Bug 1815069 - Make it possible to tweak mozjemalloc's max dirty page sizes dynamically, r=glandium → Bug 1815069 - Make it possible to tweak mozjemalloc's max dirty page sizes dynamically, r=glandium
Attachment #9315993 - Attachment description: WIP: Bug 1815069 - Increase page caches on foreground content processes, r=pbone → Bug 1815069 - Increase page caches on foreground content processes, r=pbone
Attachment #9315994 - Attachment description: WIP: Bug 1815069 - purge page caches once the CC/GC cycle ends, r=mccr8 → Bug 1815069 - purge page caches once the CC/GC cycle ends, r=mccr8
Attachment #9316133 - Attachment description: Bug 1815069, add a pref, r=mccr8 → Bug 1815069, add dom.memory.foreground_content_processes_have_larger_page_cache pref to control page cache behavior in content processes, r=mccr8
Severity: -- → S3
Priority: -- → P2

I'm thinking of yet another way to purge. It might not need to use idle tasks but low-priority tasks instead, and it would perhaps purge only down to the default levels rather than purge everything. That might make this less risky from a memory usage point of view. Investigating... but feel free to review the patches anyhow. That extra purging would be a followup patch or bug and would be needed only if memory usage turned out to be too high.
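
The difference between purging everything and purging only to the default level can be sketched as follows. This is a minimal illustration, not the planned patch: `ToyCache`, `PurgeAll`, and `PurgeToDefault` are hypothetical names, and the scheduling side (idle versus low-priority tasks) is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an arena's dirty page cache.
struct ToyCache {
  std::size_t defaultMaxDirty;  // cap the process would have without the boost
  std::size_t dirty;            // dirty pages currently cached
};

// Releases every cached dirty page and returns how many were released.
std::size_t PurgeAll(ToyCache& aCache) {
  std::size_t released = aCache.dirty;
  aCache.dirty = 0;
  return released;
}

// Releases only the pages above the default cap, so the process keeps a
// default-sized cache for its next burst of allocations.
std::size_t PurgeToDefault(ToyCache& aCache) {
  assert(aCache.defaultMaxDirty > 0);
  std::size_t target = std::min(aCache.dirty, aCache.defaultMaxDirty);
  std::size_t released = aCache.dirty - target;
  aCache.dirty = target;
  return released;
}
```

With a default cap of 256 pages and 900 dirty pages cached, `PurgeToDefault` releases 644 pages and keeps 256, whereas `PurgeAll` would release all 900. The partial purge bounds memory at the default level while avoiding a completely cold cache.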

I've also triggered Speedometer 2 and some other Browsertime benchmarks on those pushes. Some of my own work had some page load regressions and I'd like to know if yours is affected too.

Blocks: 1809610
Blocks: 1814817

See the first comment. At least on Windows this gives a massive boost to page load.

Blocks: 1817640
Whiteboard: sp3:p1
Whiteboard: sp3:p1 → [sp3:p1]
Performance Impact: --- → high
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5216330caa3c
Make it possible to tweak mozjemalloc's max dirty page sizes dynamically, r=glandium,pbone
https://hg.mozilla.org/integration/autoland/rev/4b1f970d7dfa
Increase page caches on foreground content processes, r=pbone
https://hg.mozilla.org/integration/autoland/rev/cd6590c2d0c1
purge page caches once the CC/GC cycle ends, r=mccr8,pbone
https://hg.mozilla.org/integration/autoland/rev/2cc01888068f
add dom.memory.foreground_content_processes_have_larger_page_cache pref to control page cache behavior in content processes, r=mccr8

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:smaug, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to "?"?

For more information, please visit auto_nag documentation.

Flags: needinfo?(smaug)
Blocks: 1820136
Flags: needinfo?(smaug)

(In reply to Pulsebot from comment #10)

== Change summary for alert #37493 (as of Fri, 03 Mar 2023 18:23:14 GMT) ==

Improvements:

Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles
4% | speedometer3 | linux1804-64-shippable-qr | fission webrender | 101.63 -> 105.52 | Before/After

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37493

(In reply to Pulsebot from comment #10)

== Change summary for alert #37572 (as of Wed, 08 Mar 2023 02:12:09 GMT) ==

Improvements:

Ratio | Test | Platform | Options | Absolute values (old vs new)
37% | perf_reftest_singletons link-style-cache-1.html | windows10-64-shippable-qr | e10s fission stylo webrender | 452.43 -> 284.55
35% | perf_reftest_singletons link-style-cache-1.html | linux1804-64-shippable-qr | e10s fission stylo webrender | 500.32 -> 324.13
33% | perf_reftest_singletons inline-style-cache-1.html | linux1804-64-shippable-qr | e10s fission stylo webrender | 1,425.74 -> 955.07
31% | perf_reftest_singletons inline-style-cache-1.html | windows10-64-shippable-qr | e10s fission stylo webrender | 1,425.34 -> 977.16
8% | pdfpaint | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 563.19 -> 515.41
... | ... | ... | ... | ...
2% | tp5o | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 227.03 -> 222.02

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37572

(In reply to Iulian Moraru from comment #12)

https://hg.mozilla.org/mozilla-central/rev/5216330caa3c
https://hg.mozilla.org/mozilla-central/rev/4b1f970d7dfa
https://hg.mozilla.org/mozilla-central/rev/cd6590c2d0c1
https://hg.mozilla.org/mozilla-central/rev/2cc01888068f

== Change summary for alert #37518 (as of Sat, 04 Mar 2023 14:04:49 GMT) ==

Improvements:

Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles
6% | outlook fcp | windows10-64-shippable-qr | fission warm webrender | 98.85 -> 93.33 | Before/After
5% | outlook SpeedIndex | windows10-64-shippable-qr | cold fission webrender | 1,322.12 -> 1,262.00 | Before/After
4% | outlook fcp (geomean) | windows10-64-shippable-qr | fission warm webrender | 98.53 -> 94.37 | Before/After
4% | outlook fcp (mean) | windows10-64-shippable-qr | fission warm webrender | 98.71 -> 94.55 | Before/After
4% | wikia ContentfulSpeedIndex | windows10-64-shippable-qr | fission warm webrender | 868.95 -> 834.42 |
... | ... | ... | ... | ... | ...
3% | nytimes LastVisualChange | windows10-64-shippable-qr | cold fission webrender | 2,048.42 -> 1,989.00 | Before/After

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37518

Good news coming from DevTools performance tests as well.

Some raw JS computations have been made faster:

  • A 20% improvement on server.protocoljs means that overall DevTools Client <=> Server communication has become significantly faster! I imagine this particular stress test allocates tons of objects.
  • A 6 to 12% improvement on webconsole.open. The console opens faster against our various test pages.
  • A 2% improvement on source-map. This is pure JS computation, but generating source maps could be faster overall.

== Change summary for alert #37501 (as of Sat, 04 Mar 2023 01:59:15 GMT) ==

Regressions:

Ratio | Test | Platform | Options | Absolute values (old vs new)
15% | damp console.log-in-loop-content-process-node | linux1804-64-shippable-qr | e10s fission stylo webrender-sw | 47.81 -> 54.85
4% | damp console.log-in-loop-content-process-date | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 66.34 -> 68.98

Improvements:

Ratio | Test | Platform | Options | Absolute values (old vs new)
14% | damp server.protocoljs.DAMP | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 1,579.23 -> 1,358.87
14% | damp server.protocoljs.DAMP | windows10-64-shippable-qr | e10s fission stylo webrender | 1,581.24 -> 1,364.75
12% | damp server.protocoljs.DAMP | linux1804-64-shippable-qr | e10s fission stylo webrender | 1,180.21 -> 1,034.21
12% | damp custom.webconsole.open.DAMP | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 654.64 -> 577.45
12% | damp custom.webconsole.open.DAMP | windows10-64-shippable-qr | e10s fission stylo webrender | 653.73 -> 577.32
... | ... | ... | ... | ...
2% | damp console.log-in-loop-content-process-window | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 213.56 -> 208.48

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37501

== Change summary for alert #37456 (as of Thu, 02 Mar 2023 18:41:12 GMT) ==

Improvements:

Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles
5% | reddit fcp | linux1804-64-shippable-qr | fission warm webrender | 171.62 -> 162.88 | Before/After
5% | cnn SpeedIndex | android-hw-a51-11-0-aarch64-shippable-qr | cold webrender | 3,397.38 -> 3,228.50 |
5% | cnn FirstVisualChange | android-hw-a51-11-0-aarch64-shippable-qr | warm webrender | 2,390.08 -> 2,278.08 |
5% | cnn SpeedIndex | android-hw-a51-11-0-aarch64-shippable-qr | warm webrender | 2,493.58 -> 2,377.17 |
5% | cnn PerceptualSpeedIndex | android-hw-a51-11-0-aarch64-shippable-qr | warm webrender | 2,466.00 -> 2,353.33 |
... | ... | ... | ... | ... | ...
4% | cnn FirstVisualChange | android-hw-a51-11-0-aarch64-shippable-qr | cold webrender | 2,972.18 -> 2,847.67 |

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37456
