Closed Bug 1951925 Opened 1 year ago Closed 1 year ago

7.27 - 3.55% Base Content Heap Unclassified / Base Content Resident Unique Memory + 1 more (Windows) regression on Wed March 5 2025

Tracking

()

Status:

RESOLVED FIXED

Milestone:

138 Branch

Tracking Flags:

Tracking

Status

firefox-esr128

---

unaffected

firefox136

---

unaffected

firefox137

---

unaffected

firefox138

---

fixed

People

(Reporter: intermittent-bug-filer, Assigned: afranchuk)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression)

Attachments

(1 file)

Bug 1951925 - Fix base content memory usage regressions r=nika 1 year ago Alex Franchuk [:afranchuk] 48 bytes, text/x-phabricator-request		Details \| Review

Treeherder Bug Filer

Reporter

Description

•

1 year ago

treeherder

Perfherder has detected a awsy performance regression from push 7a43eb20fb7c112e190821dec3c58fb8727e7b25. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio	Test	Platform	Options	Absolute values (old vs new)
7%	Base Content Heap Unclassified	windows11-64-24h2-shippable	fission	1,934,130.67 -> 2,074,806.00
5%	Base Content Explicit	windows11-64-24h2-shippable	fission	8,173,794.00 -> 8,576,967.33
4%	Base Content Resident Unique Memory	windows11-64-24h2-shippable	fission	11,453,781.33 -> 11,860,821.33

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 44199

The following documentation link provides more information about this command.

For more information on performance sheriffing please see our FAQ.

If you have any questions, please do not hesitate to reach out to fbilt@mozilla.com.

Flags: needinfo?(afranchuk)

BugBot [:suhaib / :marco/ :calixte]

Comment 1

•

1 year ago

Set release status flags based on info from the regressing bug 1942129

status-firefox136: --- → unaffected

status-firefox137: --- → unaffected

status-firefox138: --- → affected

status-firefox-esr128: --- → unaffected

Alex Franchuk [:afranchuk]

Assignee

Updated

•

1 year ago

Duplicate of this bug: 1951923

Alex Franchuk [:afranchuk]

Assignee

Comment 3

•

1 year ago

heap-unclassified and layout/style-sheet-cache/unshared seem to account for the increased sizes. At least the latter may be due to some subtlety in the changed layout/style/GlobalStyleSheetCache, though superficially the code is identical to the old code. I am investigating the cause. Given that the absolute deltas aren't prohibitively large, I'd say hold off on backing out the patch while we get down to the bottom of this regression.

Flags: needinfo?(afranchuk)

Alex Franchuk [:afranchuk]

Assignee

Comment 4

•

1 year ago

Attached file Bug 1951925 - Fix base content memory usage regressions r=nika — Details

This disarms a footgun in the FreezableMapping::Freeze API, which was
causing read-only mappings in the global stylesheet cache to fail. These
failures caused content processes to re-load the style sheets,
increasing memory usage.

Phabricator Automation

Updated

•

1 year ago

Assignee: nobody → afranchuk

Status: NEW → ASSIGNED

Alex Franchuk [:afranchuk]

Assignee

Comment 5

•

1 year ago

Both heap-unclassified and layout/style-sheet-cache/{un,}shared seem to have returned to prior values with the given patch.

Pulsebot

Comment 6

•

1 year ago

Pushed by afranchuk@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dca810679260 Fix base content memory usage regressions r=nika

agoloman

Comment 7

•

1 year ago

Backed out for causing build bustages @TestSharedMemory.cpp.

Flags: needinfo?(afranchuk)

Alex Franchuk [:afranchuk]

Assignee

Comment 8

•

1 year ago

Ugh, that was fixed but I didn't push it.

Flags: needinfo?(afranchuk)

Pulsebot

Comment 9

•

1 year ago

Pushed by afranchuk@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/017c9f9793ec Fix base content memory usage regressions r=nika

Cristina Horotan [:chorotan]

Comment 10

•

1 year ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/017c9f9793ec

Status: ASSIGNED → RESOLVED

Closed: 1 year ago

status-firefox138: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → 138 Branch

Mayank Bansal

Comment 11

•

1 year ago

Looks like atleast some of the numbers are back to pre-regression

Alex Franchuk [:afranchuk]

Assignee

Comment 12

•

1 year ago

•

Edited

The bug here was subtle:

The FreezableMapping::Freeze method returned a tuple of a MutableMapping and a ReadOnlyHandle.
In GlobalStyleSheetCache.cpp, we were doing a destructured binding, binding only the ReadOnlyHandle. However, c++ bindings have different semantics than value assignment, and the returned tuple (including the MutableMapping) was only destructed at the end of the enclosing scope, even though the mapping wasn't bound to an identifier.
In the same scope, we tried to map the ReadOnlyHandle to the same fixed address to which the MutableMapping was mapped. Since the MutableMapping was not destructed yet, this mapping failed.

This isn't an immediate error, but it prevents the mapping from being shared with child processes, so the child processes were all loading the stylesheets themselves rather than using the shared-memory mapping holding the loaded stylesheets from the parent process. The child processes loading the stylesheets is what caused the memory regression (and performance regressions of bug 1952917.

Ray Kraesig [:rkraesig]

Comment 13

•

1 year ago

(In reply to Alex Franchuk [:afranchuk] from comment #12)

In GlobalStyleSheetCache.cpp, we were doing a destructured binding, binding only the ReadOnlyHandle. However, c++ bindings have different semantics than value assignment, and the returned tuple (including the MutableMapping) was only destructed at the end of the enclosing scope, even though the mapping wasn't bound to an identifier.

This is subtly incorrect: the mapping was bound to an identifier. An example:

  auto [_, readOnlyHandle] = std::move(mMem).Freeze();

As of C++17, _ has no special semantics; it's a perfectly valid identifier. (This contrasts with, probably most relevantly, Rust.)

(In C++26 that may change somewhat; there is a proposal to allow use of _ as a special dummy identifier. Unfortunately, it still wouldn't change the semantics of the above code: it allows redeclaring _ without hassle, but any values bound to it are still live until end-of-scope — they're just occluded.)

Alex Franchuk [:afranchuk]

Assignee

Comment 14

•

1 year ago

(In reply to Ray Kraesig [:rkraesig] from comment #13)

(In reply to Alex Franchuk [:afranchuk] from comment #12)

In GlobalStyleSheetCache.cpp, we were doing a destructured binding, binding only the ReadOnlyHandle. However, c++ bindings have different semantics than value assignment, and the returned tuple (including the MutableMapping) was only destructed at the end of the enclosing scope, even though the mapping wasn't bound to an identifier.

This is subtly incorrect: the mapping was bound to an identifier. An example:
  auto [_, readOnlyHandle] = std::move(mMem).Freeze();
As of C++17, _ has no special semantics; it's a perfectly valid identifier. (This contrasts with, probably most relevantly, Rust.)

(In C++26 that may change somewhat; there is a proposal to allow use of _ as a special dummy identifier. Unfortunately, it still wouldn't change the semantics of the above code: it allows redeclaring _ without hassle, but any values bound to it are still live until end-of-scope — they're just occluded.)

Ah of course :) Thanks for clarifying. But yeah, I figure if that proposal is accepted it'll always be a footgun :/

Acasandrei Beatrice (needinfo me)

Comment 15

•

1 year ago

(In reply to Pulsebot from comment #9)

Pushed by afranchuk@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/017c9f9793ec
Fix base content memory usage regressions r=nika

Perfherder has detected a awsy performance change from push 017c9f9793ecaed3435ad4fd2674fb1d6a9e6358.

Improvements:

Ratio	Test	Platform	Options	Absolute values (old vs new)
7%	Base Content Heap Unclassified	windows11-64-24h2-shippable	fission	2,076,175.33 -> 1,935,502.67
5%	Base Content Explicit	windows11-64-24h2-shippable	fission	8,582,291.33 -> 8,181,934.00
3%	Base Content Resident Unique Memory	windows11-64-24h2-shippable	fission	11,829,930.67 -> 11,485,354.67

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 44317

For more information on performance sheriffing please see our FAQ.

Florin Bilt

Comment 16

•

1 year ago

(In reply to Cristina Horotan [:chorotan] from comment #10)

https://hg.mozilla.org/mozilla-central/rev/017c9f9793ec

Perfherder has detected a talos performance change from push 017c9f9793ecaed3435ad4fd2674fb1d6a9e6358.

Improvements:

Ratio	Test	Platform	Options	Absolute values (old vs new)
4%	cpstartup content-process-startup	windows11-64-24h2-shippable	e10s fission stylo webrender	44.12 -> 42.42

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 44342

For more information on performance sheriffing please see our FAQ.

Treeherder Bug Filer

Reporter

Updated

•

1 year ago

Regressions: 1954740

You need to log in before you can comment on or make changes to this bug.