Make it possible to tweak mozjemalloc's max dirty page sizes dynamically and increase them on foreground content processes
Categories
(Core :: Performance, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox112 | --- | fixed |
People
(Reporter: smaug, Assigned: smaug)
References
(Blocks 4 open bugs)
Details
(Keywords: perf-alert, Whiteboard: [sp3:p1])
Attachments
(4 files)
Some background for this is in https://bugzilla.mozilla.org/show_bug.cgi?id=1805644#c16
AWSY doesn't show much difference, possibly because of the 3rd patch in the patch queue I'm about to upload:
https://treeherder.mozilla.org/perfherder/compare?originalProject=mozilla-central&originalRevision=6424e727b6adc253c38b7d7d3c861c9360de74da&newProject=try&newRevision=5f6ae029066462d2c4e4c1ceb8d10aa77c41ebae&page=1&framework=4
BrowserTime really likes this:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=b3dfdc2535422cf85769eef6bb762befaeb9ffb6&newProject=try&newRevision=336b8407412e68151f1afa447bea363288f4b861&page=1&framework=13&showOnlyImportant=1
especially the Speedometer* parts of those tests:
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=336b8407412e68151f1afa447bea363288f4b861&originalSignature=4586009&newSignature=4586009&framework=13&originalRevision=b3dfdc2535422cf85769eef6bb762befaeb9ffb6&page=1&showOnlyImportant=1
The tweaks to the cache sizes in the 2nd patch are based on experimenting with different numbers. The non-default arenas we have really need to be larger than they currently are when running busy code, like Speedometer.
It is probably worth experimenting with the limits some more, but it is a bit time-consuming to get all the performance and AWSY numbers.
Not sure what would be the right component for this, but since this is all about performance, Core: Performance it is :)
I could have also split this into several bugs, but the patches are small and all go together.
I still need to run this on tryserver to ensure I didn't break any checks on debug builds. I wouldn't be surprised if I missed something.
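To make the idea in the title concrete, here is a minimal C++ sketch, assuming a simplified arena that only counts dirty pages (freed pages kept cached instead of being returned to the OS). The names `ToyArena`, `SetMaxDirtyModifier` and `gMaxDirtyModifier`, the power-of-two scaling, and purging down to half the limit are all assumptions for illustration, not mozjemalloc's actual code or API. It shows how one process-wide knob can raise every arena's cache limit at runtime, so busy code triggers fewer purges.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical process-wide modifier. In this sketch a value of N scales
// every arena's dirty-page limit by 2^N (a negative N shrinks it).
// The shift-based scaling is an assumption, not mozjemalloc's code.
static int32_t gMaxDirtyModifier = 0;

void SetMaxDirtyModifier(int32_t aModifier) {
  // Keep the shift well inside the width of size_t.
  assert(aModifier > -16 && aModifier < 16);
  gMaxDirtyModifier = aModifier;
}

// A toy arena that only tracks how many dirty pages it holds.
struct ToyArena {
  size_t mMaxDirty;      // base limit, in pages
  size_t mNumDirty = 0;  // dirty pages currently cached
  size_t mPurged = 0;    // total pages handed back to the OS so far

  explicit ToyArena(size_t aMaxDirty) : mMaxDirty(aMaxDirty) {}

  // The limit actually enforced, after applying the process-wide modifier.
  size_t EffectiveMaxDirty() const {
    return gMaxDirtyModifier >= 0 ? mMaxDirty << gMaxDirtyModifier
                                  : mMaxDirty >> -gMaxDirtyModifier;
  }

  // Called when pages are freed. A purge happens only once the cache
  // overflows, so a larger limit means fewer purges.
  void FreePages(size_t aPages) {
    mNumDirty += aPages;
    if (mNumDirty > EffectiveMaxDirty()) {
      // Assumption: purge down to half the limit to leave some headroom.
      Purge(EffectiveMaxDirty() / 2);
    }
  }

  // Return dirty pages to the OS until at most aTarget remain.
  void Purge(size_t aTarget) {
    if (mNumDirty > aTarget) {
      mPurged += mNumDirty - aTarget;
      mNumDirty = aTarget;
    }
  }
};
```

With a base limit of 256 pages, freeing 300 pages purges down to 128 at modifier 0, but nothing is purged at modifier 2 (limit 1024). The trade-off is memory: a larger limit keeps more freed pages resident.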
Comment 1•1 year ago (Assignee)
Comment 2•1 year ago (Assignee)
Depends on D168900
Comment 3•1 year ago (Assignee)
jemalloc_free_dirty_pages is surprisingly fast. One may see it in profiles after running Speedometer, but even then it is basically just one sample or so.
Depends on D168901
Comment 4•1 year ago (Assignee)
Depends on D168902
Comment 5•1 year ago (Assignee)
I'm thinking about yet another way to purge. It might not need to use idle tasks, but rather low-priority tasks, and it would perhaps purge only down to the default levels instead of purging everything. That might make this less risky from a memory usage point of view. Investigating... but feel free to review the patches anyhow. That extra purging would be a follow-up patch or bug, and would be needed only if memory usage turned out to be too high.
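The two purging strategies contrasted in this comment can be sketched as follows, assuming each arena records its default limit and its current dirty-page count. `Arena`, `PurgeAll` and `PurgeToDefault` are hypothetical names, and the low-priority task that would call them is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical arena: mBaseMaxDirty is the default limit, mNumDirty the
// current number of dirty pages (which may exceed the default while an
// enlarged foreground limit is in effect).
struct Arena {
  size_t mBaseMaxDirty;
  size_t mNumDirty;
};

// "Purge all": drop every cached dirty page. Returns pages released.
size_t PurgeAll(std::vector<Arena>& aArenas) {
  size_t released = 0;
  for (Arena& arena : aArenas) {
    released += arena.mNumDirty;
    arena.mNumDirty = 0;
  }
  return released;
}

// "Purge to default levels": trim only what the enlarged limit allowed on
// top of the default, so some warm pages stay cached for reuse.
size_t PurgeToDefault(std::vector<Arena>& aArenas) {
  size_t released = 0;
  for (Arena& arena : aArenas) {
    if (arena.mNumDirty > arena.mBaseMaxDirty) {
      released += arena.mNumDirty - arena.mBaseMaxDirty;
      arena.mNumDirty = arena.mBaseMaxDirty;
    }
    assert(arena.mNumDirty <= arena.mBaseMaxDirty);
  }
  return released;
}
```

For arenas holding 250 and 40 dirty pages with a default limit of 100 each, trimming to the default releases 150 pages and keeps 140 cached, while purging everything would release all 290.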
Comment 6•1 year ago (Assignee)
I rebased the patches and ran some more tests (triggered many times, but the confidence is still low), and memory usage looks quite reasonable:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=968eb82a48a71de0a415a75d1a70f080af18074b&newProject=try&newRevision=a13b303e3131db00545d312caa00c025fc3d0a07&page=1&framework=4
Comment 7•1 year ago (Assignee)
New numbers, since the API changed and I tweaked some modifiers:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=a6bcb63372e5a145b7bc09d3a05ec569aad16b1e&newProject=try&newRevision=94ccd7acb12e00c1d0ca0f975b82475b18565abe
AWSY looks reasonable, and so does Speedometer3.
Comment 8•1 year ago
I've triggered Speedometer 2 and some other browsertime benchmarks on those pushes as well. Some of my own work had some page load regressions, and I'd like to know whether yours is affected too.
Comment 9•1 year ago (Assignee)
See the first comment. At least on Windows, this gives a massive boost to page load.
Comment 10•1 year ago
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5216330caa3c
Make it possible to tweak mozjemalloc's max dirty page sizes dynamically, r=glandium,pbone
https://hg.mozilla.org/integration/autoland/rev/4b1f970d7dfa
Increase page caches on foreground content processes, r=pbone
https://hg.mozilla.org/integration/autoland/rev/cd6590c2d0c1
purge page caches once the CC/GC cycle ends, r=mccr8,pbone
https://hg.mozilla.org/integration/autoland/rev/2cc01888068f
add dom.memory.foreground_content_processes_have_larger_page_cache pref to control page cache behavior in content processes, r=mccr8
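The pref added by the fourth patch suggests a simple policy: only foreground content processes get the enlarged cache, and only while the pref is on. A minimal sketch of that decision, where `ChooseMaxDirtyModifier`, `ContentProcessState` and the value of `kForegroundModifier` are hypothetical, and the pref value is passed in as a plain bool rather than read through Firefox's preference system:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative value only, not the one that landed.
constexpr int32_t kForegroundModifier = 2;

// aPrefEnabled stands in for
// dom.memory.foreground_content_processes_have_larger_page_cache.
// Returns 0 (default limits) unless both conditions hold.
int32_t ChooseMaxDirtyModifier(bool aIsForeground, bool aPrefEnabled) {
  return (aIsForeground && aPrefEnabled) ? kForegroundModifier : 0;
}

// Hypothetical per-process state, recomputed whenever the process moves
// between foreground and background.
struct ContentProcessState {
  bool mIsForeground = false;
  int32_t mMaxDirtyModifier = 0;

  void SetForeground(bool aIsForeground, bool aPrefEnabled) {
    mIsForeground = aIsForeground;
    mMaxDirtyModifier = ChooseMaxDirtyModifier(aIsForeground, aPrefEnabled);
  }
};
```

A pref like this also gives a way to turn the larger caches off if memory usage turns out too high, which is the risk discussed in comment 5.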
Comment 11•1 year ago
The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:smaug, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to "?"?
For more information, please visit auto_nag documentation.
Comment 12•1 year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/5216330caa3c
https://hg.mozilla.org/mozilla-central/rev/4b1f970d7dfa
https://hg.mozilla.org/mozilla-central/rev/cd6590c2d0c1
https://hg.mozilla.org/mozilla-central/rev/2cc01888068f
Comment 13•1 year ago
(In reply to Pulsebot from comment #10)
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5216330caa3c
Make it possible to tweak mozjemalloc's max dirty page sizes dynamically,
r=glandium,pbone
https://hg.mozilla.org/integration/autoland/rev/4b1f970d7dfa
Increase page caches on foreground content processes, r=pbone
https://hg.mozilla.org/integration/autoland/rev/cd6590c2d0c1
purge page caches once the CC/GC cycle ends, r=mccr8,pbone
https://hg.mozilla.org/integration/autoland/rev/2cc01888068f
add dom.memory.foreground_content_processes_have_larger_page_cache pref to
control page cache behavior in content processes, r=mccr8
== Change summary for alert #37493 (as of Fri, 03 Mar 2023 18:23:14 GMT) ==
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
---|---|---|---|---|---|
4% | speedometer3 | linux1804-64-shippable-qr | fission webrender | 101.63 -> 105.52 | Before/After |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37493
Comment 14•1 year ago
(In reply to Pulsebot from comment #10)
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5216330caa3c
Make it possible to tweak mozjemalloc's max dirty page sizes dynamically,
r=glandium,pbone
https://hg.mozilla.org/integration/autoland/rev/4b1f970d7dfa
Increase page caches on foreground content processes, r=pbone
https://hg.mozilla.org/integration/autoland/rev/cd6590c2d0c1
purge page caches once the CC/GC cycle ends, r=mccr8,pbone
https://hg.mozilla.org/integration/autoland/rev/2cc01888068f
add dom.memory.foreground_content_processes_have_larger_page_cache pref to
control page cache behavior in content processes, r=mccr8
== Change summary for alert #37572 (as of Wed, 08 Mar 2023 02:12:09 GMT) ==
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
37% | perf_reftest_singletons link-style-cache-1.html | windows10-64-shippable-qr | e10s fission stylo webrender | 452.43 -> 284.55 |
35% | perf_reftest_singletons link-style-cache-1.html | linux1804-64-shippable-qr | e10s fission stylo webrender | 500.32 -> 324.13 |
33% | perf_reftest_singletons inline-style-cache-1.html | linux1804-64-shippable-qr | e10s fission stylo webrender | 1,425.74 -> 955.07 |
31% | perf_reftest_singletons inline-style-cache-1.html | windows10-64-shippable-qr | e10s fission stylo webrender | 1,425.34 -> 977.16 |
8% | pdfpaint | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 563.19 -> 515.41 |
... | ... | ... | ... | ... |
2% | tp5o | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 227.03 -> 222.02 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37572
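The Ratio column in these alerts can be checked against the absolute values: it matches the magnitude of the change relative to the old value, rounded to a whole percent. A small C++ check, where `RatioPercent` is a hypothetical helper and the rounding mode is an assumption about how Perfherder displays the number:

```cpp
#include <cassert>
#include <cmath>

// Magnitude of the change relative to the old value, in whole percent.
// Rounding to nearest is an assumption about the alert display.
int RatioPercent(double aOld, double aNew) {
  return static_cast<int>(std::lround(std::fabs(aNew - aOld) / aOld * 100.0));
}
```

For example, link-style-cache-1.html on Windows goes from 452.43 to 284.55, and 167.88 / 452.43 ≈ 0.371, i.e. the 37% shown above. Whether a change counts as an improvement or a regression depends on whether lower or higher is better for that test, which the ratio alone does not tell you.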
Comment 15•1 year ago
(In reply to Iulian Moraru from comment #12)
https://hg.mozilla.org/mozilla-central/rev/5216330caa3c
https://hg.mozilla.org/mozilla-central/rev/4b1f970d7dfa
https://hg.mozilla.org/mozilla-central/rev/cd6590c2d0c1
https://hg.mozilla.org/mozilla-central/rev/2cc01888068f
== Change summary for alert #37518 (as of Sat, 04 Mar 2023 14:04:49 GMT) ==
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
---|---|---|---|---|---|
6% | outlook fcp | windows10-64-shippable-qr | fission warm webrender | 98.85 -> 93.33 | Before/After |
5% | outlook SpeedIndex | windows10-64-shippable-qr | cold fission webrender | 1,322.12 -> 1,262.00 | Before/After |
4% | outlook fcp (geomean) | windows10-64-shippable-qr | fission warm webrender | 98.53 -> 94.37 | Before/After |
4% | outlook fcp (mean) | windows10-64-shippable-qr | fission warm webrender | 98.71 -> 94.55 | Before/After |
4% | wikia ContentfulSpeedIndex | windows10-64-shippable-qr | fission warm webrender | 868.95 -> 834.42 | |
... | ... | ... | ... | ... | ... |
3% | nytimes LastVisualChange | windows10-64-shippable-qr | cold fission webrender | 2,048.42 -> 1,989.00 | Before/After |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37518
Comment 16•1 year ago
Good news coming from DevTools performance tests as well.
Some raw JS computations have been made faster:
- 20% improvement on server.protocoljs, which means that overall DevTools Client <=> Server communication has been made significantly faster! I imagine that this particular stress test allocates tons of objects.
- 6 to 12% improvement on webconsole.open. The console opens faster against our various test pages.
- 2% improvement on source-map. This is pure JS computation, but generating source maps could be faster overall.
== Change summary for alert #37501 (as of Sat, 04 Mar 2023 01:59:15 GMT) ==
Regressions:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
15% | damp console.log-in-loop-content-process-node | linux1804-64-shippable-qr | e10s fission stylo webrender-sw | 47.81 -> 54.85 |
4% | damp console.log-in-loop-content-process-date | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 66.34 -> 68.98 |
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
14% | damp server.protocoljs.DAMP | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 1,579.23 -> 1,358.87 |
14% | damp server.protocoljs.DAMP | windows10-64-shippable-qr | e10s fission stylo webrender | 1,581.24 -> 1,364.75 |
12% | damp server.protocoljs.DAMP | linux1804-64-shippable-qr | e10s fission stylo webrender | 1,180.21 -> 1,034.21 |
12% | damp custom.webconsole.open.DAMP | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 654.64 -> 577.45 |
12% | damp custom.webconsole.open.DAMP | windows10-64-shippable-qr | e10s fission stylo webrender | 653.73 -> 577.32 |
... | ... | ... | ... | ... |
2% | damp console.log-in-loop-content-process-window | windows10-64-shippable-qr | e10s fission stylo webrender-sw | 213.56 -> 208.48 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37501
Comment 17•1 year ago
== Change summary for alert #37456 (as of Thu, 02 Mar 2023 18:41:12 GMT) ==
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
---|---|---|---|---|---|
5% | reddit fcp | linux1804-64-shippable-qr | fission warm webrender | 171.62 -> 162.88 | Before/After |
5% | cnn SpeedIndex | android-hw-a51-11-0-aarch64-shippable-qr | cold webrender | 3,397.38 -> 3,228.50 | |
5% | cnn FirstVisualChange | android-hw-a51-11-0-aarch64-shippable-qr | warm webrender | 2,390.08 -> 2,278.08 | |
5% | cnn SpeedIndex | android-hw-a51-11-0-aarch64-shippable-qr | warm webrender | 2,493.58 -> 2,377.17 | |
5% | cnn PerceptualSpeedIndex | android-hw-a51-11-0-aarch64-shippable-qr | warm webrender | 2,466.00 -> 2,353.33 | |
... | ... | ... | ... | ... | ... |
4% | cnn FirstVisualChange | android-hw-a51-11-0-aarch64-shippable-qr | cold webrender | 2,972.18 -> 2,847.67 | |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37456