12.98 - 3.82% motionmark-htmlsuite-1-3 / motionmark-1-3 + 1 more (Windows) regression on Wed September 18 2024
Categories
(Core :: Graphics: WebRender, defect, P2)
Tracking
Tracking | Status
---|---
firefox-esr115 | unaffected
firefox-esr128 | unaffected
firefox130 | unaffected
firefox131 | unaffected
firefox132 | wontfix
firefox133 | fixed
People
(Reporter: intermittent-bug-filer, Assigned: nical)
References
(Regressed 1 open bug, Regression)
Details
(Keywords: perf, perf-alert, regression)
Attachments
(2 files, 1 obsolete file)
Perfherder has detected a browsertime performance regression from push 5c0408e2fee20cddfdec5fe5dd81d461b2787d10. As the author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles
---|---|---|---|---|---
13% | motionmark-htmlsuite-1-3 | windows11-64-shippable-qr | fission webrender | 953.92 -> 830.09 |
5% | motionmark-1-3 | windows11-64-shippable-qr | fission webrender | 1,253.38 -> 1,189.79 | Before/After
4% | motionmark-1-3 | windows11-64-shippable-qr | fission webrender | 1,244.86 -> 1,197.31 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 2212
For more information on performance sheriffing please see our FAQ.
Comment 1•4 months ago
Set release status flags based on info from the regressing bug 1912019
Assignee
Comment 2•4 months ago
Profiles and some local testing show that parts of the MotionMark test end up allocating large per-frame bump allocators. Most frames use a handful of bump allocator chunks, but some frames have allocation spikes of up to 80-90 chunks (chunks are 128KB). When these chunks are dropped on the renderer thread, two expensive things happen:
- 2/3 of the time is spent in very expensive memset operations to poison the memory. This constitutes the bulk of the regression but is a Nightly-only issue: in release builds we only poison a small fixed amount of each large allocation, so this cost should be negligible for non-Nightly builds.
- The remaining 1/3 of the time goes into decommitting memory, because the allocation spike pushed us above the dirty page threshold. That affects all users.
In addition, these things happen in jemalloc while it holds its arena lock, so there is some extra lock contention on other threads allocating from the same arena (I'm not sure whether all threads use a single arena for huge allocations or if it's just the unlucky threads sharing an arena with the renderer thread).
To mitigate these we could:
- Recycle the bump allocator chunks instead of freeing them each frame.
- Ensure that the memory is dropped outside of the critical path.
- Raise jemalloc's dirty page threshold some more (as of bug 1915146, it is already temporarily raised while WebRender is active).
Or a combination of all three.
I looked into avoiding wasted bytes in the frame allocator (caused by generous alignment and vector reallocation), but as far as I can tell, while running MotionMark, wasted space represents a small (~3%) proportion of the allocated memory, so it does not appear to be worth the work.
I'll try a combination of the solutions above. In the meantime, the regression is not as severe as reported for non-Nightly users, since most of it goes into the Nightly-only large memsets when dropping large allocations.
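To illustrate the first mitigation (recycling the bump allocator chunks through a pool rather than freeing them every frame), here is a minimal sketch. The names (`ChunkPool`, `CHUNK_SIZE`, `MAX_POOLED_CHUNKS`) and the cap value are illustrative assumptions for this example, not WebRender's actual API:

```rust
use std::sync::Mutex;

/// Size of each bump-allocator chunk (128 KB, matching the numbers above).
const CHUNK_SIZE: usize = 128 * 1024;
/// Hypothetical cap on how many spare chunks are kept between frames.
const MAX_POOLED_CHUNKS: usize = 32;

/// A shared pool of chunk buffers. Handing a frame's chunks back to the pool
/// avoids paying the poison/decommit cost of freeing them on the renderer
/// thread; subsequent frames reuse the buffers instead of reallocating.
#[derive(Default)]
struct ChunkPool {
    chunks: Mutex<Vec<Box<[u8]>>>,
}

impl ChunkPool {
    /// Take a recycled chunk if one is available, otherwise allocate a new one.
    fn acquire(&self) -> Box<[u8]> {
        self.chunks
            .lock()
            .unwrap()
            .pop()
            .unwrap_or_else(|| vec![0u8; CHUNK_SIZE].into_boxed_slice())
    }

    /// Return a chunk to the pool; chunks beyond the cap are actually freed.
    fn release(&self, chunk: Box<[u8]>) {
        let mut chunks = self.chunks.lock().unwrap();
        if chunks.len() < MAX_POOLED_CHUNKS {
            chunks.push(chunk);
        }
    }

    /// Drop every pooled chunk, returning the memory to the allocator.
    fn purge(&self) {
        self.chunks.lock().unwrap().clear();
    }
}
```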
Assignee
Comment 3•4 months ago
Adding a pool to recycle the memory chunks seems to fix the regression: https://perf.compare/compare-results?baseRev=18774299b4c2a2977e0485a9eb27e36d5f32346e&newRev=66fef37627001d0a56941fc70f4266c4ea99e55f&baseRepo=try&newRepo=try&framework=13
Assignee
Comment 4•4 months ago
Instead of dropping the frame's memory chunks when a frame is replaced, place the chunks into a global pool. This mostly helps when running stress tests like MotionMark's HTML suite that cause intense allocation spikes (around 90 chunks per frame).
We should revisit whether this is needed if the cost of deallocating large regions of memory in mozjemalloc improves in the future.
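As a rough illustration of how the two landed patches fit together, here is a hedged sketch that continues the hypothetical `ChunkPool` example from comment 2; the `Frame`, `global_chunk_pool`, and `on_renderer_idle` names are invented for this example and are not WebRender's real code. Chunks go back to a global pool when a frame is dropped, and the pool is purged when the renderer goes idle so the memory does not linger indefinitely:

```rust
use std::sync::OnceLock;

/// Lazily-initialized global pool shared by all frames (illustrative only).
fn global_chunk_pool() -> &'static ChunkPool {
    static POOL: OnceLock<ChunkPool> = OnceLock::new();
    POOL.get_or_init(ChunkPool::default)
}

/// Stand-in for a frame's per-frame bump allocator storage.
struct Frame {
    chunks: Vec<Box<[u8]>>,
}

impl Drop for Frame {
    fn drop(&mut self) {
        // Instead of freeing the chunks (poison + decommit on the renderer
        // thread), hand them back to the global pool for reuse.
        for chunk in self.chunks.drain(..) {
            global_chunk_pool().release(chunk);
        }
    }
}

/// Called when the renderer has been idle for a while: release the pooled
/// memory back to the allocator rather than holding it forever.
fn on_renderer_idle() {
    global_chunk_pool().purge();
}
```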
Assignee
Comment 5•4 months ago
Assignee
Comment 6•4 months ago
Comment 8•4 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6862a5bda1cf
https://hg.mozilla.org/mozilla-central/rev/ce5e1c419ca0
Comment 9•4 months ago
Numbers:
- Pre-regression: ~970
- Regression: 870
- Post-fix: 990
So the numbers are slightly better than pre-regression.
Comment 10•4 months ago
The patch landed in nightly and beta is affected.
:nical, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox132 to wontfix.
For more information, please visit BugBot documentation.
Assignee
Comment 11•4 months ago
Let's let it bake for a week before considering an uplift.
Comment 12•4 months ago
(In reply to Pulsebot from comment #7)
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6862a5bda1cf
Add a pool for the memory chunks used by per-frame allocators.
r=gfx-reviewers,lsalzman
https://hg.mozilla.org/integration/autoland/rev/ce5e1c419ca0
Purge WebRender's chunk pool when idle. r=gfx-reviewers,lsalzman
Perfherder has detected a browsertime performance change from push ce5e1c419ca0ff17b1c630156c836607f7391ffa.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new)
---|---|---|---|---
14% | motionmark-htmlsuite-1-3 | windows11-64-shippable-qr | fission webrender | 878.13 -> 1,000.03
13% | motionmark-htmlsuite-1-3 | linux1804-64-shippable-qr | fission webrender | 524.27 -> 591.29
9% | motionmark-htmlsuite-1-3 | linux1804-64-shippable-qr | fission webrender | 528.23 -> 574.37
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 2371
For more information on performance sheriffing please see our FAQ.
Comment 13•4 months ago
Is the plan to move D223926 to a new bug at this point? Also, just a reminder that we're going into our final week of Beta for the cycle before building the Fx132 RC, so if we intend to uplift, the request should come sooner rather than later.
Assignee
Comment 14•4 months ago
I closed D223926 since it has issues.
Regarding the other two patches, in my opinion they aren't worth uplifting since they fix a regression for a rather niche situation (stress tests). Unless we care about the continuity of motionmark scores, users won't notice if the fix comes one train later.