Bug 1865799 (Closed): opened 2 years ago, closed 2 years ago

282.8 - 6.95% Base Content Heap Unclassified / Resident Memory + 5 more (OSX) regression on Tue November 14 2023

Categories
Component: Core :: Layout: Text and Fonts
Type: defect

Tracking
Status: RESOLVED FIXED
Target Milestone: 122 Branch
Tracking Status:
firefox-esr115 --- unaffected
firefox120 --- unaffected
firefox121 + fixed
firefox122 --- fixed

People
Reporter: bacasandrei
Assignee: jfkthame

References
Regression

Details
Keywords: perf, perf-alert, regression

Perfherder has detected an AWSY performance regression from push 19d87a3a7a39edf262e54cb7dc683a7e71fcad51. As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio | Test | Platform | Options | Absolute values, old -> new (bytes)
283% | Base Content Heap Unclassified | macosx1015-64-shippable-qr | fission | 2,631,070.33 -> 10,071,639.67
77% | Base Content Explicit | macosx1015-64-shippable-qr | fission | 10,541,072.00 -> 18,642,789.33
56% | Base Content Resident Unique Memory | macosx1015-64-shippable-qr | fission | 16,139,605.33 -> 25,223,338.67
55% | Base Content Resident Unique Memory | macosx1015-64-shippable-qr | fission | 16,404,394.67 -> 25,505,621.33
53% | Heap Unclassified | macosx1015-64-shippable-qr | fission tp6 | 155,221,560.52 -> 237,937,303.52
11% | Explicit Memory | macosx1015-64-shippable-qr | fission tp6 | 646,421,533.99 -> 715,948,550.14
7% | Resident Memory | macosx1015-64-shippable-qr | fission tp6 | 1,024,006,116.81 -> 1,095,123,381.45
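
The Ratio column appears to be the relative change, (new - old) / old, rather than new / old. The bug title's 282.8% and 6.95% bounds also appear to come from the same formula. The minimal sketch below recomputes the column from the absolute values in the table. The helper `percent_change` is hypothetical and illustrative only; it is not Perfherder's own code. Rounded to whole percent, the results match every row, and the first and last rows reproduce the two figures in the title.

```python
# Relative change as a percentage: (new - old) / old * 100.
# Hypothetical helper for illustration; Perfherder's own implementation is not shown here.
def percent_change(old: float, new: float) -> float:
    return (new - old) / old * 100.0


# (test, old, new) rows transcribed from the alert table; values are bytes.
ROWS = [
    ("Base Content Heap Unclassified", 2_631_070.33, 10_071_639.67),
    ("Base Content Explicit", 10_541_072.00, 18_642_789.33),
    ("Base Content Resident Unique Memory", 16_139_605.33, 25_223_338.67),
    ("Base Content Resident Unique Memory", 16_404_394.67, 25_505_621.33),
    ("Heap Unclassified (tp6)", 155_221_560.52, 237_937_303.52),
    ("Explicit Memory (tp6)", 646_421_533.99, 715_948_550.14),
    ("Resident Memory (tp6)", 1_024_006_116.81, 1_095_123_381.45),
]

if __name__ == "__main__":
    for name, old, new in ROWS:
        print(f"{percent_change(old, new):7.2f}%  {name}")
```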

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need profiling jobs, you can trigger them yourself from the Treeherder job view or ask a sheriff to do so for you.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(jfkthame)

I suspect this is an unavoidable result of registering additional fonts in the content process; doing so no doubt causes CoreGraphics/CoreText to load some metadata and map the font files into the address space. /System/Library/Fonts/Supplemental contains ~290 font files, some of which are multi-megabyte (e.g. CJK) fonts, so the RAM footprint seems unsurprising.
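
To gauge how much font data is involved, one option is to list the font files in that directory with their on-disk sizes. The minimal sketch below does this. The function name `survey_fonts` is hypothetical, and the set of extensions it matches is an assumption. On-disk size is at most a rough proxy for the regression. Mapping a file does not make all of it resident, and heap-unclassified counts malloc'd memory rather than mapped files. The sketch does not measure what CoreText allocates internally.

```python
from pathlib import Path

# Assumed set of font-file extensions; adjust if the directory holds other types.
FONT_SUFFIXES = {".ttf", ".ttc", ".otf", ".otc"}


def survey_fonts(directory, top=5):
    """Return (file count, total bytes, [(size, name), ...] for the largest files)."""
    files = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in FONT_SUFFIXES
    ]
    # Sort largest first so the multi-megabyte (e.g. CJK) fonts come out on top.
    sizes = sorted(((p.stat().st_size, p.name) for p in files), reverse=True)
    total = sum(size for size, _ in sizes)
    return len(files), total, sizes[:top]


if __name__ == "__main__":
    target = Path("/System/Library/Fonts/Supplemental")
    if target.is_dir():  # only present on macOS
        count, total, largest = survey_fonts(target)
        print(f"{count} font files, {total / 2**20:.1f} MiB on disk")
        for size, name in largest:
            print(f"  {size / 2**20:6.1f} MiB  {name}")
    else:
        print(f"{target} not found (not running on macOS?)")
```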

We need to register the fonts from this directory in order to have coverage for a wide range of scripts/languages that are only supported via the Noto & other fonts in that directory; and as we have seen in bug 1803406, registering them only in the parent process is not reliable and can result in broken rendering for users.

Flags: needinfo?(jfkthame)

At a bare minimum, it looks like more memory reporters would be useful here so this doesn't all end up bucketed into heap-unclassified. That said, it's a pretty big regression :\

That would be good, but I'm not sure we can do much by way of reporting here... :-\ This isn't memory we're allocating ourselves, AFAIK; it'll be happening deep inside the macOS system libraries.

To confirm this, I looked at the about:memory report for a preallocated content process, immediately after launching Nightly. On my (arm64) mac running macOS 14.1, I see:

16.72 MB (100.0%) -- explicit
├───7.69 MB (46.03%) -- heap-overhead
│   ├──5.25 MB (31.38%) -- bin-unused
│   │  ├──1.93 MB (11.55%) ++ (40 tiny)
│   │  ├──0.99 MB (05.90%) ── bin-112
│   │  ├──0.57 MB (03.41%) ── bin-8192
│   │  ├──0.42 MB (02.54%) ── bin-96
│   │  ├──0.40 MB (02.38%) ── bin-128
│   │  ├──0.31 MB (01.86%) ── bin-80
│   │  ├──0.23 MB (01.36%) ── bin-4096
│   │  ├──0.21 MB (01.28%) ── bin-144
│   │  └──0.19 MB (01.12%) ── bin-3328
│   ├──1.36 MB (08.13%) ── page-cache
│   ├──0.73 MB (04.36%) ── bookkeeping
│   └──0.36 MB (02.15%) -- phc
│      ├──0.33 MB (01.96%) ── metadata
│      └──0.03 MB (00.19%) ── fragmentation
├───6.06 MB (36.23%) ── heap-unclassified
├───1.70 MB (10.15%) -- js-non-window
│   ├──0.85 MB (05.09%) -- runtime
│   │  ├──0.53 MB (03.17%) ++ (13 tiny)
│   │  └──0.32 MB (01.93%) -- gc
│   │     ├──0.25 MB (01.50%) ── nursery-committed
│   │     └──0.07 MB (00.43%) ++ (3 tiny)
│   ├──0.82 MB (04.89%) -- zones
│   │  ├──0.43 MB (02.56%) ++ zone(0x106a5e000)
│   │  ├──0.22 MB (01.30%) -- zone(0x106a5c600)
│   │  │  ├──0.21 MB (01.23%) -- strings/string(<non-notable strings>)
│   │  │  │  ├──0.20 MB (01.20%) ── gc-heap/latin1
│   │  │  │  └──0.01 MB (00.04%) ── malloc-heap/latin1
│   │  │  └──0.01 MB (00.07%) ++ sundries
│   │  └──0.17 MB (01.03%) ++ zone(0x106a5d300)
│   └──0.03 MB (00.17%) ++ (2 tiny)
├───0.53 MB (03.16%) ++ (16 tiny)
├───0.29 MB (01.73%) -- gfx
│   ├──0.29 MB (01.73%) ── font-list
│   └──0.00 MB (00.00%) ++ (4 tiny)
├───0.23 MB (01.39%) -- threads/overhead
│   ├──0.22 MB (01.31%) ── kernel
│   └──0.01 MB (00.08%) ++ (2 tiny)
└───0.22 MB (01.32%) -- atoms
    ├──0.22 MB (01.31%) ── table
    └──0.00 MB (00.01%) ── dynamic-objects-and-chars

If I effectively undo the fix from bug 1803406 by disabling the CTFontManagerRegisterFontURLs call in content processes (while still allowing it to happen in the parent), the about:memory report for a preallocated content process now shows:

9.97 MB (100.0%) -- explicit
├──4.17 MB (41.80%) -- heap-overhead
│  ├──2.83 MB (28.41%) -- bin-unused
│  │  ├──1.35 MB (13.53%) ++ (43 tiny)
│  │  ├──0.77 MB (07.76%) ── bin-8192
│  │  ├──0.23 MB (02.35%) ── bin-4096
│  │  ├──0.19 MB (01.91%) ── bin-3328
│  │  ├──0.17 MB (01.70%) ── bin-2048
│  │  └──0.12 MB (01.16%) ── bin-2816
│  ├──0.58 MB (05.86%) ── bookkeeping
│  ├──0.41 MB (04.08%) ── page-cache
│  └──0.34 MB (03.45%) -- phc
│     ├──0.33 MB (03.29%) ── metadata
│     └──0.02 MB (00.16%) ── fragmentation
├──2.83 MB (28.40%) ── heap-unclassified
├──1.70 MB (17.03%) -- js-non-window
│  ├──0.85 MB (08.54%) -- runtime
│  │  ├──0.32 MB (03.23%) -- gc
│  │  │  ├──0.25 MB (02.51%) ── nursery-committed
│  │  │  └──0.07 MB (00.72%) ++ (3 tiny)
│  │  ├──0.20 MB (02.05%) ++ (11 tiny)
│  │  ├──0.16 MB (01.65%) ── atoms-table
│  │  └──0.16 MB (01.61%) ── temporary
│  ├──0.82 MB (08.21%) -- zones
│  │  ├──0.43 MB (04.30%) -- zone(0x10035e000)
│  │  │  ├──0.32 MB (03.20%) ++ (9 tiny)
│  │  │  └──0.11 MB (01.10%) ++ realm([System Principal], shared JSM global)
│  │  ├──0.22 MB (02.18%) -- zone(0x10035c600)
│  │  │  ├──0.21 MB (02.07%) -- strings/string(<non-notable strings>)
│  │  │  │  ├──0.20 MB (02.00%) ── gc-heap/latin1
│  │  │  │  └──0.01 MB (00.06%) ── malloc-heap/latin1
│  │  │  └──0.01 MB (00.11%) ++ sundries
│  │  └──0.17 MB (01.73%) -- zone(0x10035d300)
│  │     ├──0.13 MB (01.25%) ++ code
│  │     └──0.05 MB (00.48%) ++ (2 tiny)
│  └──0.03 MB (00.28%) ++ (2 tiny)
├──0.29 MB (02.90%) -- gfx
│  ├──0.29 MB (02.90%) ── font-list
│  └──0.00 MB (00.00%) ++ (4 tiny)
├──0.23 MB (02.33%) -- threads/overhead
│  ├──0.22 MB (02.19%) ── kernel
│  └──0.01 MB (00.13%) ++ (2 tiny)
├──0.22 MB (02.21%) -- atoms
│  ├──0.22 MB (02.19%) ── table
│  └──0.00 MB (00.02%) ── dynamic-objects-and-chars
├──0.16 MB (01.57%) ++ preferences
├──0.13 MB (01.35%) -- script-preloader/heap
│  ├──0.13 MB (01.34%) ── saved-scripts
│  └──0.00 MB (00.01%) ++ (2 tiny)
├──0.13 MB (01.33%) ++ (13 tiny)
└──0.11 MB (01.09%) ++ telemetry

So that one macOS API call, which is the heart of the fix, has caused heap-unclassified to jump from 2.83 MB to 6.06 MB, and heap-overhead from 4.17 MB to 7.69 MB. But AFAIK we have no visibility into what's happening internally when we call CTFontManagerRegisterFontURLs.
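
The two reports can be compared entry by entry. Every top-level entry other than heap-overhead and heap-unclassified is unchanged. Those two therefore account for the entire 6.75 MB growth in explicit (3.52 MB + 3.23 MB). The minimal sketch below checks this arithmetic using top-level figures transcribed from the reports, in MB as printed. The "other (small entries)" key is a grouping introduced here. It pairs the first report's "(16 tiny)" line with the sum of the second report's preferences, script-preloader/heap, "(13 tiny)", and telemetry lines.

```python
# Top-level "explicit" entries (MB) from the two about:memory reports.
WITH_REGISTRATION = {
    "explicit": 16.72,
    "heap-overhead": 7.69,
    "heap-unclassified": 6.06,
    "js-non-window": 1.70,
    "gfx": 0.29,
    "threads/overhead": 0.23,
    "atoms": 0.22,
    "other (small entries)": 0.53,  # "(16 tiny)"
}
WITHOUT_REGISTRATION = {
    "explicit": 9.97,
    "heap-overhead": 4.17,
    "heap-unclassified": 2.83,
    "js-non-window": 1.70,
    "gfx": 0.29,
    "threads/overhead": 0.23,
    "atoms": 0.22,
    # preferences 0.16 + script-preloader/heap 0.13 + (13 tiny) 0.13 + telemetry 0.11
    "other (small entries)": 0.53,
}


def deltas(new, old):
    """Per-entry change in MB, rounded to the reports' 0.01 MB precision."""
    return {key: round(new[key] - old[key], 2) for key in new}


if __name__ == "__main__":
    d = deltas(WITH_REGISTRATION, WITHOUT_REGISTRATION)
    for key, change in d.items():
        print(f"{change:+6.2f} MB  {key}")
    accounted = round(d["heap-overhead"] + d["heap-unclassified"], 2)
    print(f"heap-overhead + heap-unclassified = {accounted:+.2f} MB "
          f"of {d['explicit']:+.2f} MB explicit growth")
```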

(Aside: it's interesting to me that the figures I see locally are substantially lower than what AWSY reports. I wonder if that's an x86_64 vs aarch64 memory-management difference? Or more generally a macOS version difference?)

The bug is marked as tracked for firefox121 (beta). However, the bug still isn't assigned.

:fgriffith, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply back out the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please see the BugBot documentation.

Flags: needinfo?(fgriffith)

I think we can safely assign this to jfkthame to sort out what, if anything, should be done here.

Assignee: nobody → jfkthame
Flags: needinfo?(fgriffith)
Severity: -- → S3

All of the numbers have recovered since bug 1866105 landed.

Status: NEW → RESOLVED
Closed: 2 years ago
Depends on: 1866105
Resolution: --- → FIXED
Target Milestone: --- → 122 Branch