12.98 - 3.82% motionmark-htmlsuite-1-3 / motionmark-1-3 + 1 more (Windows) regression on Wed September 18 2024
Categories
(Core :: Graphics: WebRender, defect, P2)
Tracking
Tracking | Status
---|---
firefox-esr115 | unaffected
firefox-esr128 | unaffected
firefox130 | unaffected
firefox131 | unaffected
firefox132 | wontfix
firefox133 | fixed
People
(Reporter: intermittent-bug-filer, Assigned: nical)
References
(Regressed 1 open bug, Regression)
Details
(Keywords: perf, perf-alert, regression)
Attachments
(2 files, 1 obsolete file)
Perfherder has detected a browsertime performance regression from push 5c0408e2fee20cddfdec5fe5dd81d461b2787d10. As the author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles
---|---|---|---|---|---
13% | motionmark-htmlsuite-1-3 | windows11-64-shippable-qr | fission webrender | 953.92 -> 830.09 |
5% | motionmark-1-3 | windows11-64-shippable-qr | fission webrender | 1,253.38 -> 1,189.79 | Before/After
4% | motionmark-1-3 | windows11-64-shippable-qr | fission webrender | 1,244.86 -> 1,197.31 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 2212
For more information on performance sheriffing please see our FAQ.
Comment 1•4 months ago
Set release status flags based on info from the regressing bug 1912019
Assignee
Comment 2•4 months ago
Profiles and some local testing show that parts of the MotionMark test end up allocating large per-frame bump allocators. Most frames use a handful of bump allocator chunks, but some frames have allocation spikes of up to 80-90 chunks (chunks are 128KB). When these chunks are dropped on the renderer thread, two expensive things happen:
- 2/3 of the time is spent in very expensive memset operations to poison the memory. This constitutes the bulk of the regression but is a Nightly-only issue: in release builds we only poison a small fixed amount of each large allocation, so this cost should be negligible for non-Nightly builds.
- The remaining 1/3 of the time goes into decommitting memory, because the allocation spike pushed us above the dirty page threshold. That affects all users.
In addition, these things happen in jemalloc while it holds its arena lock, so there is some extra lock contention on other threads allocating from the same arena (I'm not sure whether all threads use a single arena for huge allocations or if it's just the unlucky threads sharing an arena with the renderer thread).
To mitigate these we could:
- Recycle the bump allocator chunks instead of freeing them each frame.
- Ensure that the memory is dropped outside of the critical path.
- Raise jemalloc's dirty page threshold some more (as of bug 1915146, it is already temporarily raised while WebRender is active).
Or a combination of all three.
I looked into avoiding wasted bytes in the frame allocator (caused by generous alignment and vector reallocation), but as far as I can tell, while running MotionMark, wasted space represents a small (~3%) proportion of the allocated memory, so it does not appear to be worth the work.
I'll try a combination of the solutions above. In the meantime, the regression is not as severe as reported for non-Nightly users, since most of it goes into the Nightly-only large memsets when dropping large allocations.
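To illustrate the first mitigation (recycling the bump allocator chunks through a pool rather than freeing them every frame), here is a minimal sketch. The names (`ChunkPool`, `CHUNK_SIZE`, `MAX_POOLED_CHUNKS`) and the cap value are illustrative assumptions for this example, not WebRender's actual API:

```rust
use std::sync::Mutex;

/// Size of each bump-allocator chunk (128 KB, matching the numbers above).
const CHUNK_SIZE: usize = 128 * 1024;
/// Hypothetical cap on how many spare chunks are kept between frames.
const MAX_POOLED_CHUNKS: usize = 32;

/// A shared pool of chunk buffers. Handing a frame's chunks back to the pool
/// avoids paying the poison/decommit cost of freeing them on the renderer
/// thread; subsequent frames reuse the buffers instead of reallocating.
#[derive(Default)]
struct ChunkPool {
    chunks: Mutex<Vec<Box<[u8]>>>,
}

impl ChunkPool {
    /// Take a recycled chunk if one is available, otherwise allocate a new one.
    fn acquire(&self) -> Box<[u8]> {
        self.chunks
            .lock()
            .unwrap()
            .pop()
            .unwrap_or_else(|| vec![0u8; CHUNK_SIZE].into_boxed_slice())
    }

    /// Return a chunk to the pool; chunks beyond the cap are actually freed.
    fn release(&self, chunk: Box<[u8]>) {
        let mut chunks = self.chunks.lock().unwrap();
        if chunks.len() < MAX_POOLED_CHUNKS {
            chunks.push(chunk);
        }
    }

    /// Drop every pooled chunk, returning the memory to the allocator.
    fn purge(&self) {
        self.chunks.lock().unwrap().clear();
    }
}
```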
Assignee
Comment 3•4 months ago
Adding a pool to recycle the memory chunks seems to fix the regression: https://perf.compare/compare-results?baseRev=18774299b4c2a2977e0485a9eb27e36d5f32346e&newRev=66fef37627001d0a56941fc70f4266c4ea99e55f&baseRepo=try&newRepo=try&framework=13
Assignee
Comment 4•4 months ago
Instead of dropping the frame's memory chunks when a frame is replaced, place the chunks into a global pool. This mostly helps when running stress tests like MotionMark's HTML suite that cause intense allocation spikes (around 90 chunks per frame).
We should revisit whether this is needed if the cost of deallocating large regions of memory in mozjemalloc improves in the future.
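As a rough illustration of how the two landed patches fit together, here is a hedged sketch that continues the hypothetical `ChunkPool` example from comment 2; the `Frame`, `global_chunk_pool`, and `on_renderer_idle` names are invented for this example and are not WebRender's real code. Chunks go back to a global pool when a frame is dropped, and the pool is purged when the renderer goes idle so the memory does not linger indefinitely:

```rust
use std::sync::OnceLock;

/// Lazily-initialized global pool shared by all frames (illustrative only).
fn global_chunk_pool() -> &'static ChunkPool {
    static POOL: OnceLock<ChunkPool> = OnceLock::new();
    POOL.get_or_init(ChunkPool::default)
}

/// Stand-in for a frame's per-frame bump allocator storage.
struct Frame {
    chunks: Vec<Box<[u8]>>,
}

impl Drop for Frame {
    fn drop(&mut self) {
        // Instead of freeing the chunks (poison + decommit on the renderer
        // thread), hand them back to the global pool for reuse.
        for chunk in self.chunks.drain(..) {
            global_chunk_pool().release(chunk);
        }
    }
}

/// Called when the renderer has been idle for a while: release the pooled
/// memory back to the allocator rather than holding it forever.
fn on_renderer_idle() {
    global_chunk_pool().purge();
}
```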
Assignee
Comment 5•4 months ago
Assignee
Comment 6•4 months ago
Comment 8•4 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6862a5bda1cf
https://hg.mozilla.org/mozilla-central/rev/ce5e1c419ca0
Comment 9•4 months ago
Numbers:
- Pre-regression: ~970
- Regression: 870
- Post-fix: 990
So the numbers are slightly better than pre-regression.
Comment 10•4 months ago
The patch landed in nightly and beta is affected.
:nical, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox132 to wontfix.
For more information, please visit BugBot documentation.
Assignee
Comment 11•4 months ago
Let's let it bake for a week before considering an uplift.
Comment 12•4 months ago
(In reply to Pulsebot from comment #7)
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6862a5bda1cf
Add a pool for the memory chunks used by per-frame allocators.
r=gfx-reviewers,lsalzman
https://hg.mozilla.org/integration/autoland/rev/ce5e1c419ca0
Purge WebRender's chunk pool when idle. r=gfx-reviewers,lsalzman
Perfherder has detected a browsertime performance change from push ce5e1c419ca0ff17b1c630156c836607f7391ffa.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new)
---|---|---|---|---
14% | motionmark-htmlsuite-1-3 | windows11-64-shippable-qr | fission webrender | 878.13 -> 1,000.03
13% | motionmark-htmlsuite-1-3 | linux1804-64-shippable-qr | fission webrender | 524.27 -> 591.29
9% | motionmark-htmlsuite-1-3 | linux1804-64-shippable-qr | fission webrender | 528.23 -> 574.37
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 2371
For more information on performance sheriffing please see our FAQ.
Comment 13•4 months ago
Is the plan to move D223926 to a new bug at this point? Also, just a reminder that we're going into our final week of Beta for the cycle before building the Fx132 RC, so if we intend to uplift, the request should come sooner rather than later.
Assignee
Comment 14•4 months ago
I closed D223926 since it has issues.
Regarding the other two patches, in my opinion they aren't worth uplifting since they fix a regression for a rather niche situation (stress tests). Unless we care about the continuity of motionmark scores, users won't notice if the fix comes one train later.