Closed Bug 1625590 Opened 3 years ago Closed 2 years ago

Huge amount of heap-unclassified memory for mozilla::gfx::NativeFontResourceMac::Create() when using drawSnapshot

Categories

(Core :: Graphics: Text, defect, P3)

Desktop
macOS
defect

Tracking

()

VERIFIED FIXED
85 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox82 --- wontfix
firefox83 --- wontfix
firefox84 --- wontfix
firefox85 --- verified

People

(Reporter: whimboo, Assigned: bradwerth)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: memory-footprint, regression)

Attachments

(7 files)

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:76.0) Gecko/20100101 Firefox/76.0 ID:20200326130624

Via bug 1611467 I reported a massive heap-unclassified usage for fonts when using Google docs including Emojis. Lee fixed everything what he was able to on the Font side. But even after the fix I still see a huge amount of heap-unclassified memory even with no tab for Google docs open for a while, or even forcing a minimize memory in about:memory.

Also clearing the font cache by flipping the gfx.downloadable_fonts.enabled preference as Lee suggested doesn't help.

So this might indicate that something in CrossProcessPaint keeps that font data alive. I'm running a DMD build and here is one unreported entry of the processed DMD of the parent process:

Unreported {
  93 blocks in heap block record 1 of 24,152
  248,586,240 bytes (248,398,568 requested / 187,672 slop)
  Individual block sizes: 207,929,344; 2,170,880 x 7; 774,144 x 6; 753,664 x 5; 749,568 x 2; 741,376 x 2; 630,784 x 2; 606,208; 438,272 x 3; 421,888 x 2; 417,792; 405,504 x 3; 401,408 x 3; 397,312; 393,216 x 3; 389,120 x 3; 233,472 x 3; 217,088 x 3; 212,992 x 4; 196,608; 167,936 x 3; 139,264 x 3; 106,496; 102,400; 40,960 x 10; 32,768 x 6; 24,576 x 13
  23.13% of the heap (23.13% cumulative)
  36.34% of unreported (36.34% cumulative)
  Allocated at {
    #01: __CFDataInit (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x12326a)
    #02: mozilla::gfx::NativeFontResourceMac::Create(unsigned char*, unsigned int) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0xd9621e)
    #03: mozilla::gfx::RecordedFontData::PlayEvent(mozilla::gfx::Translator*) const (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0xd30b73)
    #04: std::__1::__function::__func<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0, std::__1::allocator<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0>, bool (mozilla::gfx::RecordedEvent*)>::operator( (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0xd36e6c)
    #05: bool mozilla::gfx::RecordedEvent::DoWithEvent<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader>(mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader&, mozilla::gfx::RecordedEvent::EventType (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0xd275f9)
    #06: mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0xd23c55)
    #07: mozilla::gfx::CrossProcessPaint::ResolveInternal(mozilla::dom::IdType<mozilla::dom::BrowserParent>, nsRefPtrHashtable<nsUint64HashKey, mozilla::gfx::SourceSurface>*) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x10ca7f2)
    #08: mozilla::gfx::CrossProcessPaint::MaybeResolve() (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x10ca238)
    #09: mozilla::gfx::CrossProcessPaint::ReceiveFragment(mozilla::dom::WindowGlobalParent*, mozilla::gfx::PaintFragment&&) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x10c9e71)
    #10: mozilla::MozPromise<mozilla::gfx::PaintFragment, mozilla::ipc::ResponseRejectReason, true>::ThenValue<mozilla::dom::WindowGlobalParent::DrawSnapshotInternal(mozilla::gfx::CrossProcessPaint*, mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownU (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x27d44b6)
    #11: mozilla::MozPromise<mozilla::gfx::PaintFragment, mozilla::ipc::ResponseRejectReason, true>::ThenValueBase::ResolveOrRejectRunnable::Run() (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x9908c5)
    #12: nsThread::ProcessNextEvent(bool, bool*) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x104bd7)
    #13: NS_ProcessPendingEvents(nsIThread*, unsigned int) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x103082)
    #14: nsBaseAppShell::NativeEventCallback() (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x2a6f281)
    #15: nsAppShell::ProcessGeckoEvents(void*) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x2ace1dd)
    #16: __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x84b21)
    #17: __CFRunLoopDoSource0 (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x84ac0)
    #18: __CFRunLoopDoSources0 (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x848d4)
    #19: __CFRunLoopRun (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x83740)
    #20: CFRunLoopRunSpecific (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x82bd3)
    #21: RunCurrentEventLoopInMode (/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox + 0x2f65d)
  }
}

Matt, do you have an idea what's wrong with that, and why the memory is not getting released?

Flags: needinfo?(matt.woodrow)

Can you send me the full DMD report file please?

I can't see anything obvious that would be leaking from that code.

Flags: needinfo?(matt.woodrow)

Sure, I sent it to you via Matrix. Thanks.

Flags: needinfo?(matt.woodrow)
Priority: -- → P3

Hey Matt. Have you already had the time to check my sent DMD logs?

Here some new numbers after a week of usage. It's 2.4GB of heap-unclassified again!

Unreported {
  626 blocks in heap block record 1 of 26,864
  2,310,172,672 bytes (2,308,983,516 requested / 1,189,156 slop)
  Individual block sizes: 207,929,344 x 10; 7,340,032 x 3; 2,170,880 x 20; 1,142,784 x 4; 811,008 x 3; 806,912 x 2; 774,144 x 53; 753,664 x 29; 749,568 x 4; 741,376 x 5; 630,784 x 6; 606,208 x 2; 557,056; 552,960; 536,576 x 2; 438,272 x 37; 421,888 x 35; 417,792 x 34; 405,504 x 11; 401,408; 393,216 x 13; 389,120 x 13; 323,584 x 3; 233,472 x 8; 217,088 x 6; 212,992 x 7; 208,896; 196,608; 180,224 x 2; 176,128; 172,032 x 2; 167,936 x 6; 139,264 x 13; 131,072 x 2; 110,592 x 4; 106,496 x 5; 102,400; 49,152 x 2; 40,960 x 107; 32,768 x 26; 24,576 x 136; 20,480 x 4
  77.76% of the heap (77.76% cumulative)
  89.71% of unreported (89.71% cumulative)
  Allocated at {
    #01: __CFDataInit (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x12326a)
    #02: mozilla::gfx::NativeFontResourceMac::Create(unsigned char*, unsigned int) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0xdc005e)
    #03: mozilla::gfx::RecordedFontData::PlayEvent(mozilla::gfx::Translator*) const (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0xd56983)
    #04: std::__1::__function::__func<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0, std::__1::allocator<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0>, bool (mozilla::gfx::RecordedEvent*)>::operator( (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0xd5cc9c)
    #05: bool mozilla::gfx::RecordedEvent::DoWithEvent<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader>(mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader&, mozilla::gfx::RecordedEvent::EventType (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0xd4d4ab)
    #06: mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0xd49ac5)
    #07: mozilla::gfx::CrossProcessPaint::ResolveInternal(mozilla::dom::IdType<mozilla::dom::BrowserParent>, nsRefPtrHashtable<nsUint64HashKey, mozilla::gfx::SourceSurface>*) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x10ed7f2)
    #08: mozilla::gfx::CrossProcessPaint::MaybeResolve() (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x10ed19b)
    #09: mozilla::gfx::CrossProcessPaint::ReceiveFragment(mozilla::dom::WindowGlobalParent*, mozilla::gfx::PaintFragment&&) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x10ecdd1)
    #10: mozilla::MozPromise<mozilla::gfx::PaintFragment, mozilla::ipc::ResponseRejectReason, true>::ThenValue<mozilla::dom::WindowGlobalParent::DrawSnapshotInternal(mozilla::gfx::CrossProcessPaint*, mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownU (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x27d0b16)
    #11: mozilla::MozPromise<mozilla::gfx::PaintFragment, mozilla::ipc::ResponseRejectReason, true>::ThenValueBase::ResolveOrRejectRunnable::Run() (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x99dee5)
    #12: nsThread::ProcessNextEvent(bool, bool*) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x1052eb)
    #13: NS_ProcessPendingEvents(nsIThread*, unsigned int) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x103782)
    #14: nsBaseAppShell::NativeEventCallback() (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x2a6bfe1)
    #15: nsAppShell::ProcessGeckoEvents(void*) (/Users/henrik/code/gecko/Firefox Nightly.app/Contents/MacOS/XUL + 0x2acf26d)
    #16: __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x84b21)
    #17: __CFRunLoopDoSource0 (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x84ac0)
    #18: __CFRunLoopDoSources0 (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x848d4)
    #19: __CFRunLoopRun (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x83740)
    #20: CFRunLoopRunSpecific (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x82bd3)
    #21: RunCurrentEventLoopInMode (/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox + 0x2f65d)
  }
}

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #4)

Here some new numbers after a week of usage. It's 2.4GB of heap-unclassified again!

2,310,172,672 bytes (2,308,983,516 requested / 1,189,156 slop)

4 days later this is now at:

4,754.40 MB (100.0%) -- explicit
├──3,845.64 MB (80.89%) ── heap-unclassified

Matt mentioned to me that this might be related to the screenshot code, which might be relevant to thumbnail generation. What I can say is that I'm using the tab panel (browser.ctrlTab.mostRecentlyUsed=true), which shows thumbnails for the pages when switching between tabs. Maybe that might be related here.

Since I'm using a new DMD nightly build of Firefox, which contains the fixes from bug 1628419, it seems that I'm no longer able to reproduce this problem. I will keep poking the next days.

The problem still persists:

1,685.49 MB (100.0%) -- explicit
├──1,030.86 MB (61.16%) ── heap-unclassified

I'm going to disable ctrlTab now to see if that is causing the problem.

I'm running this build for 5 days now without having Ctrl+Tab enabled, and here is the result:

1,005.84 MB (100.0%) -- explicit
[..]
├────172.16 MB (17.12%) ── heap-unclassified

I will now re-enable it just to verify that memory usage for heap-unclassified goes up again.

Dao, maybe you already have an idea?

Flags: needinfo?(dao+bmo)
Summary: Font data is not being released by CrossProcessPaint → Huge amount of heap-unclassified memory because screenshot related memory not being released by CrossProcessPaint

Since I enabled the Ctrl+Tab feature about 3h ago the heap unclassified usage has increased a lot:

1,219.09 MB (100.0%) -- explicit
├────383.33 MB (31.44%) ── heap-unclassified

Note that this feature has been enabled by default on bug 1473595 for Firefox 63.

Summary: Huge amount of heap-unclassified memory because screenshot related memory not being released by CrossProcessPaint → "Ctrl+Tab cycles through tabs in recently used order" leaks huge amount of heap-unclassified memory

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #10)

Since I enabled the Ctrl+Tab feature about 3h ago the heap unclassified usage has increased a lot:

1,219.09 MB (100.0%) -- explicit
├────383.33 MB (31.44%) ── heap-unclassified

How many open tabs did you have at that point?

Flags: needinfo?(dao+bmo) → needinfo?(hskupin)

I have about 30 open tabs in that browser instance. Since my last comment I mostly opened new ones (mainly Bugzilla), made comments and switching between tabs, and closed those new tabs.

Flags: needinfo?(hskupin)

I had a play with this locally, and I do see growing heap-unclassified memory (with browser.ctrlTab.recentlyUsedOrder=true, and ctrl-tabbing lots, plus tab changes).

I don't see us accumulating FontResourceMac or UnscaledFontMac though, so I don't think this is the screenshot code holding onto those objects longer than necessary.

Lee, is this possibly related to other bugs where we're holding onto Mac fonts for too long?

Flags: needinfo?(matt.woodrow) → needinfo?(lsalzman)

(In reply to Matt Woodrow (:mattwoodrow) from comment #14)

I had a play with this locally, and I do see growing heap-unclassified memory (with browser.ctrlTab.recentlyUsedOrder=true, and ctrl-tabbing lots, plus tab changes).

I don't see us accumulating FontResourceMac or UnscaledFontMac though, so I don't think this is the screenshot code holding onto those objects longer than necessary.

Lee, is this possibly related to other bugs where we're holding onto Mac fonts for too long?

The problem is CrossProcessPaint which keeps alive draw target recordings indefinitely, which have to hold onto the fonts because the recording is still alive.

Flags: needinfo?(lsalzman)

(In reply to Lee Salzman [:lsalzman] from comment #15)

The problem is CrossProcessPaint which keeps alive draw target recordings indefinitely, which have to hold onto the fonts because the recording is still alive.

Do you have STR for reproducing that?

In my testing all CrossProcessPaint objects are deleted immediately after we take a snapshot, which releases all contained FontResourceMac/UnscaledFontMac objects that were created during their lifetime.

Flags: needinfo?(lsalzman)

Henrik, do you have a quick STR that reliably reproduces the heap-unclassified bloat?

Flags: needinfo?(lsalzman) → needinfo?(hskupin)

My steps are similar to those from Matt in comment 14. I'm using my daily profile for that, so with sessionstore I already have about 30 tabs open. Then I just hold Ctrl+Shift so that the panel opens, keep it holding to let it cycle through the displayed tabs, and select one. Maybe repeat by opening and closing tabs, so that the list of tabs gets updated.

I just triggered another DMD build in case someone also wants to use one:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0ebcd12d44561f35ce58b6d873101b1b1ba7f172

And that is the command I use to run it:

DMD=1 Firefox\ Nightly.app/Contents/MacOS/firefox --mode=dark-matter --stack=full

So what do I have to look out for? Any line of mozilla::gfx::NativeFontResourceMac::Create in the unreported section? I mean the amount of memory is low at the beginning and raises, so what would be a threshold?

Flags: needinfo?(hskupin)

You can check how many thumbnails are kept in memory running this in the browser console:

gBrowser.tabs.filter(t => t.__thumbnail).length

(In reply to Dão Gottwald [::dao] from comment #19)

gBrowser.tabs.filter(t => t.__thumbnail).length

Checked that and there are currently 14 thumbnails cached. When I checked the type of __thumbnail I can see that those are not only img but also of type canvas. Is that expected?

This is for:

Unreported {
  357 blocks in heap block record 1 of 24,886
  523,554,816 bytes (522,863,264 requested / 691,552 slop)
 ..
Allocated at {
    #01: __CFDataInit (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x12326a)
    #02: mozilla::gfx::NativeFontResourceMac::Create(unsigned char*, unsigned int) (/Users/henrik/code/gecko/Firefox Nightly DMD.app/Contents/MacOS/XUL + 0xdcf01e)
    #03: mozilla::gfx::RecordedFontData::PlayEvent(mozilla::gfx::Translator*) const (/Users/henrik/code/gecko/Firefox Nightly DMD.app/Contents/MacOS/XUL + 0xd65323)
    #04: std::__1::__function::__func<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0, std::__1::allocator<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0>, bool (mozilla::gfx::RecordedEvent*)>::operator( (/Users/henrik/code/gecko/Firefox Nightly DMD.app/Contents/MacOS/XUL + 0xd6b63c)
    #05: bool mozilla::gfx::RecordedEvent::DoWithEvent<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader>(mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader&, mozilla::gfx::RecordedEvent::EventType (/Users/henrik/code/gecko/Firefox Nightly DMD.app/Contents/MacOS/XUL + 0xd5be4b)
    #06: mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long) (/Users/henrik/code/gecko/Firefox Nightly DMD.app/Contents/MacOS/XUL + 0xd58465)
    #07: mozilla::gfx::CrossProcessPaint::ResolveInternal(mozilla::dom::IdType<mozilla::dom::BrowserParent>, nsRefPtrHashtable<nsUint64HashKey, mozilla::gfx::SourceSurface>*) (/Users/henrik/code/gecko/Firefox Nightly DMD.app/Contents/MacOS/XUL + 0x10f6862)
    #08: mozilla::gfx::CrossProcessPaint::MaybeResolve() (/Users/henrik/code/gecko/Firefox Nightly DMD.app/Contents/MacOS/XUL + 0x10f620b)
    #09: mozilla::gfx::CrossProcessPaint::ReceiveFragment(mozilla::dom::WindowGlobalParent*, mozilla::gfx::PaintFragment&&) (/Users/henrik/code/gecko/Firefox Nightly DMD.app/Contents/MacOS/XUL + 0x10f5e41)
Flags: needinfo?(dao+bmo)

For what it is worth, I'm also seeing gigs of heap-unclassified in the main process on OSX.

I also have large heap-unclassified on OSX on the main process, but I don't use ctrl-tab.

Thanks. Maybe the part with Ctrl+Tab is a red herring. But my question for Dao still stands above. I don't think that we should store canvas elements as thumbnails...

Otherwise could one of you also use a DMD build of Firefox for a while? I just triggered a new build for MacOS with DMD enabled:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8b0c3c0957236388c8ed8089856c30f51e249308

It would be good to see if you all can see the same unreported memory as I do.

Because this bug's Severity has not been changed from the default since it was filed, and it's Priority is P3 (Backlog,) indicating it has been triaged, the bug's Severity is being updated to S3 (normal.)

Severity: normal → S3
See Also: → 1635681

Bug 1635681 sounds interesting because I'm also keeping Firefox open for days and use the sleep mode a lot.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #20)

(In reply to Dão Gottwald [::dao] from comment #19)

gBrowser.tabs.filter(t => t.__thumbnail).length

Checked that and there are currently 14 thumbnails cached. When I checked the type of __thumbnail I can see that those are not only img but also of type canvas. Is that expected?

Yes, that's expected. If you want to give it a try, here's a build with the canvas caching disabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ff2f7d149a18e64baefe5f8bf7e1ce079a07ba9

Flags: needinfo?(dao+bmo)

I'm seeing something similar, with about 200MB of heap-unclassified under mozilla::gfx::NativeFontResourceMac::Create(). I don't have any Ctrl-Tab thing enabled. I do use Command-` to switch between windows, if that matters. This is with WebRender and Fission force-enabled.

I also have hundreds of MB of web render heap-unclassified, under WebRenderBridgeParent::UpdateResources and the like, but maybe that should be a separate issue.

Also, it would be nice if somebody wrote a memory reporter for this font data so people don't have to bend over backwards with a DMD build to understand what is happening. I've seen a number of reports of high main process memory on OSX, and I can't tell if they are related or not.

(In reply to Andrew McCreight [:mccr8] from comment #27)

I'm seeing something similar, with about 200MB of heap-unclassified under mozilla::gfx::NativeFontResourceMac::Create().

Filed bug 1639385 for getting this reported.

Depends on: 1639385

Ok, so given that no-one else uses Ctrl+Tab but still see this problem, we should update the bug's summary to not include this specific feature in Firefox.

Component: Graphics: Layers → Graphics: Text
Summary: "Ctrl+Tab cycles through tabs in recently used order" leaks huge amount of heap-unclassified memory → Huge amount of heap-unclassified memory for mozilla::gfx::NativeFontResourceMac::Create()

Is there a way to give an web extension access to the heap unclassified usage? Even through an experimental API. That way we could easily show the value in the toolbar for easier inspection. So something similar as what my https://github.com/mozilla/memchaser extension did in the past. This might help to further investigate possible steps because the value is always in the view, and it doesn't force you to go to about:memory and create a report.

No longer blocks: ctrl-tab-panel

With having the memory reporter for NativeFontResourceMac now, the verbose logging shows 0% for it, while there is still a good amount of heap unclassified:

1,019,141,922 B (100.0%) -- explicit
├────307,251,713 B (30.15%) ── heap-unclassified
├────257,523,664 B (25.27%) -- gfx
│    ├──257,204,224 B (25.24%) ── heap-textures
│    ├──────178,864 B (00.02%) ── font-list
│    ├───────87,408 B (00.01%) ── font-charmaps
│    ├───────34,416 B (00.00%) ── font-cache
│    ├───────18,752 B (00.00%) ── font-shaped-words
│    └────────────0 B (00.00%) ── native-font-resource-data

What could be done next here?

I don't know if it is the same issue, but I'm still seeing very high heap-unclassified on OSX with WebRender enabled. Yesterday I had about 3.4GB of it. I'll try to run with DMD at some point.

I had 3.6GB of heap-unclassified in the parent process on OSX just now. I've restarted with DMD enabled so we'll see if that turns up anything informative.

Lee, is there any further work that can be done here? This is a ton of memory that's being used, and your other patch doesn't seem to have fixed it. Let me know if I can provide further information. Thanks.

Flags: needinfo?(lsalzman)

(In reply to Andrew McCreight [:mccr8] from comment #34)

Lee, is there any further work that can be done here? This is a ton of memory that's being used, and your other patch doesn't seem to have fixed it. Let me know if I can provide further information. Thanks.

I've already spent a lot of time doing a ton of investigation and speculative fixes for this by now that I don't know there's anything left that can be done. If someone else wants to dig further and see if they can find something I missed, they are welcome to try.

Flags: needinfo?(lsalzman)

I'll see if I can come up with some actual steps to reproduce.

I tried disabling Fission to see if that makes it worse. I've now had my Firefox session open for about a day, and there's 1.5GB of heap-unclassified in the parent process.

I updated this morning to the latest Nightly. And since then (~12h) I put my MacBook a couple of times into hibernation mode (similar to bug 1635681?), and used Gmail, and Treeherder quite a fair amount of time (note that the latter is a memory / performance hog for me). Maybe one of these is related.

I wished that my memchaser extension would work with the WebExtension API and a button in the toolbar could visualize the memory usage. I always forget to regularly check about:memory.

Since my last comment no hibernation took place and I basically continued work in Gmail and Bugzilla. Unclassified memory went further up 200MB in that short time-span and now has reached: 954.01 MB (62.02%) ── heap-unclassified

FYI I was running Firefox in safe mode today with a single hibernation, and I can still see the same behavior.

As Jonathan pointed out on Matrix one of my recent observations shows sizes of 6 times the Apple Color Emoji font. Maybe we stash the file instead of the file descriptor into the shared memory?

Unreported {
104 blocks in heap block record 1 of 32,100
1,297,989,632 bytes (1,297,792,900 requested / 196,732 slop)
Individual block sizes: 207,929,344 x 6; 2,170,880 x 9; 819,200; 811,008 x 4; 

Here the font file on the local disk:

➜ ls -l /System/Library/Fonts/Apple\ Color\ Emoji.ttc
-rw-r--r--  1 root  wheel  207931344 Apr 17 23:10 /System/Library/Fonts/Apple Color Emoji.ttc
See Also: → 1659142
Attached file DMD report

I just experienced that issue again:

5,159.63 MB (100.0%) -- explicit
├──4,256.90 MB (82.50%) ── heap-unclassified

This is unbelievable high and also kinda freezes not only Firefox but any other application on my MBP too. I see the beach ball a lot and responses of the ui take forever.

I attached a DMD report of the unreported memory and as it can be seen we have the 208MB memory block 12 times in memory! If it's really the emoticon font file I might know where it actually comes from. Earlier today I tried to find some emojis on Slack. After a restart I will try again and clearly keep an eye out for this website.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #41)

This is unbelievable high and also kinda freezes not only Firefox but any other application on my MBP too. I see the beach ball a lot and responses of the ui take forever.

This sounds very much like what I see when I encounter bug 1652631. If you run a copy of Firefox from the command line during this period it would output the error message to the terminal if it was happening on your system.

No, I don't see this failure listed in my console and I always run Nightly through it to enable DMD mode.

See Also: → 1667535

Bug 1669286 should help diagnose this a bit.

Depends on: 1669286

Also, it looks like the OS will hold on to font stuff without us wanting it. The private API CTFontRemoveFromCaches fixes this.

Henrik the symbols in your DMD report look wrong. Also, can you check your about:memory in a Nightly at least as new as 20201007094223 and see if heap unclassified is still high? If it's high what's amount in gfx.webrender.weak-fonts?

Flags: needinfo?(hskupin)

I was running without webrender the last days. So had to restart Firefox. I will let you know early next week how it looks like for me.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #47)

Henrik the symbols in your DMD report look wrong.

What specifically do you mean with wrong? That's the output from running dmd.py, so maybe there's a problem with the script?

Also, can you check your about:memory in a Nightly at least as new as 20201007094223 and see if heap unclassified is still high? If it's high what's amount in gfx.webrender.weak-fonts?

I'm now using Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:83.0) Gecko/20100101 Firefox/83.0 ID:20201009041754 with webrender enabled for a while, and I still see high heap unclassified memory. Right now it's around 50% and not 80% - which is still not less:

1,579.78 MB (100.0%) -- explicit
├────777.15 MB (49.19%) ── heap-unclassified
├────255.91 MB (16.20%) -- gfx
│ ├──245.82 MB (15.56%) -- webrender
│ │ ├──230.35 MB (14.58%) -- resource-cache
│ │ │ ├──221.20 MB (14.00%) ── weak-fonts

Note that minimizing memory doesn't free up the memory as used by weak-fonts.

Here the details for the largest unreported memory block from the DMD build:

Unreported {
  112 blocks in heap block record 1 of 30,133
  454,107,136 bytes (453,890,748 requested / 216,388 slop)
  Individual block sizes: 207,929,344 x 2; 2,170,880 x 4; 819,200; 811,008 x 2; 806,912; 774,144 x 9; 753,664 x 4; 749,568; 741,376 x 2; 630,784 x 2; 557,056; 552,960; 536,576; 438,272 x 6; 421,888 x 2; 417,792 x 2; 405,504 x 4; 233,472 x 2; 217,088 x 3; 212,992 x 3; 196,608 x 2; 167,936 x 3; 139,264 x 3; 131,072; 110,592; 106,496 x 3; 102,400 x 2; 40,960 x 18; 32,768 x 6; 24,576 x 20
  34.49% of the heap (34.49% cumulative)
  55.41% of unreported (55.41% cumulative)
  Allocated at {
    #01: __CFDataInit (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x1221f9)
    #02: DumpCompleteHeap (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x3250376)
    #03: DumpCompleteHeap (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x320f123)
    #04: DumpCompleteHeap (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x32125bf)
    #05: DumpCompleteHeap (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x320776c)
    #06: DumpCompleteHeap (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x3205aed)
    #07: mozilla_dump_image (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x341b640)
    #08: mozilla_dump_image (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x341b12b)
    #09: mozilla_dump_image (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x341ac73)
    #10: mac_plugin_interposing_child_OnShowCursor (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x4759ddd)
    #11: NS_NewLocalFileWithCFURL (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x30c24d0)
    #12: NS_NewLocalFileWithFSRef (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x4019f7)
    #13: NS_NewLocalFileWithFSRef (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x3fdb5c)
    #14: nsXPTCStubBase::Stub249() (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0xfda050)
    #15: __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x83d52)
    #16: __CFRunLoopDoSource0 (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x83cf1)
    #17: __CFRunLoopDoSources0 (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x83b0b)
    #18: __CFRunLoopRun (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x8283a)
    #19: CFRunLoopRunSpecific (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x81e3e)
    #20: RunCurrentEventLoopInMode (/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox + 0x2fabd)
    #21: ReceiveNextEventCommon (/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox + 0x2f7d5)
  }
}

But I question this output...

Flags: needinfo?(hskupin) → needinfo?(jmuizelaar)

Yeah, it looks like you're missing symbols for XUL. Still it's interesting to see weak-fonts relatively high. I wonder if there's something we can do to encourage the OS to release them.

Flags: needinfo?(jmuizelaar)

Nightlies no longer come with symbols, see bug 1667170. (I'm surprised DMD can be used on regular Nightlies, I thought you still needed a custom local build for that. Cool!)

Blocks: wr-mac-block

So here's my understanding of what's happening:

  • When we create a CTFont from a CGFont, a descriptor that holds on to the CGFont gets put in an NSCache
  • We create a lot of CTFonts from CGFonts with WebRender (we have a separate CGFont for each thread that does glyph rasterization)
  • This cache is cleared when you call CTFontRemoveFromCaches(nullptr)
  • The cache will also be pruned on memory pressure. This prevents it totally exhausting memory on the system.

If we create the CGFont from the CTFont we don't have this problem. WebKit and Chromium both do things this way from what I understand.

Depends on: 1671034

Henrik, can you try today's nightly and report how much heap-unclassified and weak-fonts you have?

Flags: needinfo?(hskupin)
Regressions: 1671660
No longer regressions: 1671660

(In reply to Jeff Muizelaar [:jrmuizel] from comment #53)

Henrik, can you try today's nightly and report how much heap-unclassified and weak-fonts you have?

After running for a day I still have a lot of heap-unclassified memory, but at least no weak-fonts! \o/

972.03 MB (100.0%) -- explicit
├──390.65 MB (40.19%) ── heap-unclassified
[..]
├───32.55 MB (03.35%) -- gfx
│   ├──27.30 MB (02.81%) -- webrender
│   │  ├──18.32 MB (01.88%) -- resource-cache
│   │  │  ├──18.32 MB (01.88%) ── fonts
│   │  │  └───0.00 MB (00.00%) ++ (3 tiny)

When I find the time today I will create a new DMD report.

Flags: needinfo?(hskupin)

Here the current DMD report:

Unreported {
  97 blocks in heap block record 1 of 27,158
  246,390,784 bytes (246,203,824 requested / 186,960 slop)
  Individual block sizes: 207,929,344; 7,340,032; 2,170,880 x 4; 774,144 x 4; 753,664 x 3; 688,128; 630,784; 438,272 x 8; 421,888 x 7; 417,792 x 7; 405,504 x 7; 389,120; 372,736; 233,472; 217,088; 212,992; 167,936; 139,264 x 4; 40,960 x 16; 32,768 x 8; 28,672; 24,576 x 18
  34.84% of the heap (34.84% cumulative)
  59.10% of unreported (59.10% cumulative)
  Allocated at {
    #01: __CFDataInit (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x1221f9)
    #02: mozilla::gfx::NativeFontResourceMac::Create(unsigned char*, unsigned int) (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x3262166)
    #03: mozilla::gfx::RecordedFontData::PlayEvent(mozilla::gfx::Translator*) const (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x3220b73)
    #04: std::__1::__function::__func<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0, std::__1::allocator<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0>, bool (mozilla::gfx::RecordedEvent*)>::operator( (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x3223f8f)
    #05: bool mozilla::gfx::RecordedEvent::DoWithEvent<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader>(mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader&, mozilla::gfx::RecordedEvent::EventType (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x3218f8c)

Ok, yeah, that's probably more of the same problem. It's the blob image version.

After 5 more days it still looks better than before:

1,164.78 MB (100.0%) -- explicit
├────461.66 MB (39.64%) ── heap-unclassified

(In reply to Jeff Muizelaar [:jrmuizel] from comment #56)

Ok, yeah, that's probably more of the same problem. It's the blob image version.

Mind explaining that a bit more? What is the blob image version?

Flags: needinfo?(jmuizelaar)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #57)

Mind explaining that a bit more? What is the blob image version?

Bug 1672088, which has the blob image version of this. Can you get a DMD report from today's Nighty and check if things have improved further?

Depends on: 1672088
Flags: needinfo?(jmuizelaar) → needinfo?(hskupin)

Sure. Give me some time. I will report on Friday.

Tested with Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0 ID:20201021095128

935.64 MB (100.0%) -- explicit
├──367.44 MB (39.27%) ── heap-unclassified

The DMD report still shows the same stack. Here the top 5 frames of the report with updated addresses:

    #01: __CFDataInit (/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation + 0x1221f9)
    #02: mozilla::gfx::NativeFontResourceMac::Create(unsigned char*, unsigned int) (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x2bf3369)
    #03: mozilla::gfx::RecordedFontData::PlayEvent(mozilla::gfx::Translator*) const (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x2bb22f3)
    #04: std::__1::__function::__func<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0, std::__1::allocator<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0>, bool (mozilla::gfx::RecordedEvent*)>::operator( (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x2bb570f)
    #05: bool mozilla::gfx::RecordedEvent::DoWithEvent<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader>(mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader&, mozilla::gfx::RecordedEvent::EventType (/Applications/FirefoxNightly.app/Contents/MacOS/XUL + 0x2baa70c)

Note that it was working fine for about a day but right now when I checked again the heap-unclassified jumped from around 20 to these 40%.

Flags: needinfo?(hskupin) → needinfo?(jmuizelaar)

I also see this stack as the top "unreported" stack in a DMD report, from a build that was only running for a few hours.

Depends on: 1672899

I did some investigation and it looks like we are leaking fonts in a similar way with WebRender off. That being said, the leaks only happen in the content processes instead of with WebRender where it happens in child and parent processes.

This morning while working a while without AC the heap-unclassified raised again to a huge value:

2,258.29 MB (100.0%) -- explicit
├──1,439.03 MB (63.72%) ── heap-unclassified

There is the mozilla::gfx::NativeFontResourceMac::Create block reported twice with each around 650MB. Not sure what actually triggered that.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #63)

This morning while working a while without AC the heap-unclassified raised again to a huge value:

2,258.29 MB (100.0%) -- explicit
├──1,439.03 MB (63.72%) ── heap-unclassified

There is the mozilla::gfx::NativeFontResourceMac::Create block reported twice with each around 650MB. Not sure what actually triggered that.

I'd be really interested to know what triggers this jump. Do you have any guess what it could be?

Flags: needinfo?(jmuizelaar) → needinfo?(hskupin)

I did a fair amount of test failure investigation via Treeherder today including filtering jobs, opening the log viewer, and copying/pasting content to Bugzilla. That's what I usually do only on Monday due to the incoming intermittent bug comments. Otherwise no idea.

Flags: needinfo?(hskupin)

And with a restarted instance of Firefox yesterday, I'm back at over 70% heap-unclassified again (webrender enabled):

1,378.27 MB (100.0%) -- explicit
├──1,035.95 MB (75.16%) ── heap-unclassified

Sadly I missed to start Firefox with DMD enabled, so that I cannot provide more details.

Maybe related here... I had https://addons.mozilla.org/en-US/firefox/addon/no-transition/ installed all the last months and disabled it yesterday because lots of sites are broken when CSS transitions are disabled. Maybe that opened up some other leaky code paths for me.

Depends on: 1674175

I can confirm that with the changes from bug 1672899 nothing has changed. Having a memory reporter would be indeed very helpful.

Jeff, do you have an idea meanwhile when this could have been regressed, especially for non-webrender? I wonder if some of the already landed patches might need an uplift to at least 83. Probably not given that the still to fix case seems to be the most important one.

Flags: needinfo?(jmuizelaar)

Do we have evidence that this regressed for non-WebRender?

Flags: needinfo?(jmuizelaar) → needinfo?(hskupin)

I'm fairly sure that I haven't seen around 80% of heap-unclassified usage in the past. But it would be hard to do such a regression check given that it's a fair amount of time someone would have to spent.

Flags: needinfo?(hskupin)

Based on when I saw high heap-unclassified I suspect that changes in macOS caused my problem. My recollection is that I started seeing the problem as soon as I started using a new macbook pro, which has a newer version of macOS on it then my older macbook pro (I'm a little slow/cautious to update macOS). I guess the other possibility is that changes in the hardware (gfx drivers?) could be another change between the good and bad configurations that I observed.

Henrik, are you running a new enough build that you have bug 1674175? If so can you share how much heap-unclassified you have, a long with native-font-resource-mac?

Flags: needinfo?(hskupin)

Yeah, I'm running it for a while and just this evening I now hit the following:

1,320.11 MB (100.0%) -- explicit
├────450.01 MB (34.09%) -- gfx
│    ├──420.90 MB (31.88%) ── native-font-resource-mac
│    ├───23.22 MB (01.76%) ++ webrender
│    └────5.88 MB (00.45%) ++ (5 tiny)
├────370.71 MB (28.08%) ── heap-unclassified

I'm fairly sure that this might be related to IRCCloud or Matrix. Both tabs I haven't used at all over the weekend and earlier today. Until then I didn't see a jump of both. But now after reading through the notifications in both tabs, the memory usage seems to have been jumped.

And as it can be seen a quite fair amount of memory is used by native font resources.

Flags: needinfo?(hskupin)

Do we actually have the capability to add the kind of native font that has been created? Similar to user-fonts, which can be expanded and show the font family, the URL, and the principal.

Flags: needinfo?(jmuizelaar)

We could possibly do that.

Flags: needinfo?(jmuizelaar)
Depends on: 1674868

(In reply to Jeff Muizelaar [:jrmuizel] from comment #76)

We could possibly do that.

Filed as bug 1674868. Until then I will keep an eye on the Matrix and IRCCloud tabs.

It looks CoreText has an LRU cache of 12 descriptors per thread.

For the numbers to get as big as they are it suggests that fonts are very large instead of us accumulating lots of fonts.

So this is almost certainly caused by the tab thumbnails used during ctrl+tab. If one of the tabs uses the apple emjoi font we can end up with it in a a recording.

Oh wow. So my assumption from around 7 months ago is actually correct, and indeed I was just able to reproduce the problem. I measured memory before and after opening the Ctrl+Tab panel, and native font resources climbed from 0 bytes up to 204 MB:

1,007.35 MB (100.0%) -- explicit
├────231.37 MB (22.97%) -- gfx
│    ├──204.50 MB (20.30%) ── native-font-resource-mac
Summary: Huge amount of heap-unclassified memory for mozilla::gfx::NativeFontResourceMac::Create() → Huge amount of heap-unclassified memory for mozilla::gfx::NativeFontResourceMac::Create() when using Ctrl+Tab

The tab that is actually causing this problem is the notification center from Github:
https://github.com/notifications?query=is%3Aunread

I assume the emojis as used under filter are causing this.

This is actually a regression that I can perfectly replicate now! By using ESR68 it works all fine, but 78ESR is broken. I will run mozregression to find out the causing change.

Keywords: regression

This is a year old regression:

7:46.35 INFO: Narrowed nightly regression window from [2019-11-04, 2019-11-07] (3 days) to [2019-11-06, 2019-11-07] (1 days) (~0 steps left)
7:46.35 INFO: Got as far as we can go bisecting nightlies...
7:46.35 INFO: Last good revision: 96b58f95ed7333672e6dba134d091015328d299b (2019-11-06)
7:46.35 INFO: First bad revision: 7748cc7e9b63d86a40eb4799c0830172da579c84 (2019-11-07)
7:46.35 INFO: Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=96b58f95ed7333672e6dba134d091015328d299b&tochange=7748cc7e9b63d86a40eb4799c0830172da579c84

There is only bug 1576911 (Page thumbnailer should use Fission for messaging) for thumbnail related changes that could have caused that. Gijs, given that you mentored that bug, what do you think? Sadly mozregression cannot continue given that it cannot find any further builds from autoland (bug 1674970).

Flags: needinfo?(gijskruitbosch+bugs)

By manually testing d271c572a9bcd008ed14bf104b2eb81949952e4c it's indeed the Nightly containing the above change:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d271c572a9bcd008ed14bf104b2eb81949952e4c&tochange=7748cc7e9b63d86a40eb4799c0830172da579c84

Regressed by: 1576911

Also note that with builds having this regression the generation of the thumbnail takes a moment, and there is a delay after opening Ctrl-Tab and getting the thumbnail displayed. Before the change the thumbnail was presented right away.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #84)

There is only bug 1576911 (Page thumbnailer should use Fission for messaging) for thumbnail related changes that could have caused that.

That change actually switched from drawImage() to drawSnapshot(). So I assume some code in GFX is leaking here.

Matt, can you have a look again?

Blocks: 1559592
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(matt.woodrow)
Attached file marionette test

The same also happens when using Marionette. So it's not related to Ctrl+Tab but the drawSnapshot API.

I spoke with Matt on Matrix and as he mentioned we should wait for Jeff's investigation.

Flags: needinfo?(matt.woodrow) → needinfo?(jmuizelaar)
Summary: Huge amount of heap-unclassified memory for mozilla::gfx::NativeFontResourceMac::Create() when using Ctrl+Tab → Huge amount of heap-unclassified memory for mozilla::gfx::NativeFontResourceMac::Create() when using drawSnapshot

Updating to bug 1499845 which actually covered the implementation of the drawSnapshot API.

Regressed by: 1499845
No longer regressed by: 1576911
Has Regression Range: --- → yes

I was able to get the memory usage as high as 5,182.07MB which is more than the 12x200MB that I would expect as the maximum from the descriptor cache. This suggest that there's something else going on to cause the leak.

Flags: needinfo?(jmuizelaar) → needinfo?(hskupin)

Matt, the best way to mitigate this problem right now is to avoid including system fonts in the recording that we use for thumbnails. This is what we do with WebRender and blob images.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #91)

I was able to get the memory usage as high as 5,182.07MB which is more than the 12x200MB that I would expect as the maximum from the descriptor cache. This suggest that there's something else going on to cause the leak.

That being said, commenting out the body of DrawTargetSkia::DrawGlyphs is still sufficient to avoid the leak.

If I comment out DrawTargetSkia::DrawGlyphs but add a CFRelease(CTFontCopyFontDescriptor(ctFont)); to NativeFontResourceMac::Create I can get the leak to go as high as 12x200MB but not higher as would be expected from the descriptor cache. I suspect the other cache is the Skia strike cache. If I "Minimize memory usage" (which clears that cache) my native-font-resource-mac memory usage drops down to the more expected 1,986.84 MB.

All of this means, we're probably not actually leaking, we just have a couple of caches that keep copies of a 200MB font file alive.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #91)

I was able to get the memory usage as high as 5,182.07MB which is more than the 12x200MB that I would expect as the maximum from the descriptor cache. This suggest that there's something else going on to cause the leak.

Not sure what I'm about to do here. Let me know specifics in case you still need some feedback from me.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #94)

All of this means, we're probably not actually leaking, we just have a couple of caches that keep copies of a 200MB font file alive.

Is that only a problem on Mac, or do other platforms also have those caches? Also why do we cache the same font multiple times? Is it because of the different content processes? If yes, could it be maybe cached via shmem?

Flags: needinfo?(hskupin)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #95)

Is that only a problem on Mac, or do other platforms also have those caches? Also why do we cache the same font multiple times? Is it because of the different content processes? If yes, could it be maybe cached via shmem?

Mac is the only place where we we're likely to run into a commonly used 200MB font.

The mitigation to this high memory usage is to record the system fonts in the recording by name instead of by contents. This way we'll only have one Apple Emoji font.

OS: Unspecified → macOS
Hardware: Unspecified → Desktop

Jeff, would bug 1625476 be similar to the caching as you described above? At least it sounds like to me.

Some background:
There are two systems in place for handling fonts when making a recording.

  1. External fonts - This is what WebRender uses. WebRender creates a font key when a font is first used and this key is maintained between the content process and the gpu process. This key persists beyond the lifetime of the recording so that if we record the same content again we don't need to recreate the key.
  2. Internal fonts - This is the default and what printing and screenshots use. A copy of the font data is included in the recording. Nothing persists beyond the lifetime of the recording.

To minimize this particular issue it's probably ok to only worry about avoiding copying the font data for system fonts. This avoids the complexity of managing external fonts like WebRender does. This means that we'll still end up with duplicate copies of the data for data fonts, but they tend to be smaller so hopefully it won't be as much of an issue. Plus we have better memory tracking infrastructure to keep an eye on the problem now.

The font data serialization happens on demand when we try to draw glyphs with a particular font here:
https://searchfox.org/mozilla-central/source/gfx/2d/DrawTargetRecording.cpp#252

WebRender takes the WantsExternalFonts() branch, screenshots takes the else. Currently it tries to get the data for the font and record that, falling back to the name if it's unable to get the data. We should be able to swap the order here and try to use the name first falling back to grabbing the data otherwise.

For the curious, here's how WebRender handles managing the external keys:
We set up a callback to handle external keys here:
https://searchfox.org/mozilla-central/source/gfx/layers/wr/WebRenderCommandBuilder.cpp#636

That code calls into:
https://searchfox.org/mozilla-central/source/gfx/layers/wr/WebRenderBridgeChild.cpp#313
which calls GetFontDescriptor().

GetFontDescriptor() will return false for data fonts and otherwise return the font-name.

I'll try to move this forward.

Assignee: nobody → bwerth

I think attachment 9186397 [details] implements the strategy in comment 98. I'm not sure how to test it definitively, but it does seem prevent the Emoji font from remaining in-cache when I create a Ctrl-Tab snapshot of the Apple Emoji font page. I'm also not sure how to create a test for it.

I don't think there's a great way to test it so I wouldn't worry about it.

This seems to cause a failure on Windows for test browser_ext_tabs_saveAsPDF.js. I'll see if I can understand the logs. I'm not developing on a Windows machine:

Assertion failure: [GFX1 22]: Failed creating UnscaledFont of type 0 from font descriptor, at /builds/worker/workspace/obj-build/dist/include/mozilla/gfx/Logging.h:754
[task 2020-11-06T23:32:42.417Z] 23:32:42     INFO - GECKO(3864) | #01: mozilla::gfx::RecordedFontDescriptor::PlayEvent(mozilla::gfx::Translator*) const [gfx/2d/RecordedEventImpl.h:3459]
[task 2020-11-06T23:32:42.417Z] 23:32:42     INFO - GECKO(3864) | #02: static mozilla::gfx::RecordedEvent::DoWithEvent<mozilla::gfx::EventStream>(mozilla::gfx::EventStream&, mozilla::gfx::RecordedEvent::EventType, std::function<bool (mozilla::gfx::RecordedEvent *)> const&) [gfx/2d/RecordedEventImpl.h:3912]
[task 2020-11-06T23:32:42.417Z] 23:32:42     INFO - GECKO(3864) | #03: mozilla::layout::PrintTranslator::TranslateRecording(mozilla::layout::PRFileDescStream&) [layout/printing/PrintTranslator.cpp:50]
[task 2020-11-06T23:32:42.418Z] 23:32:42     INFO - GECKO(3864) | #04: mozilla::layout::RemotePrintJobParent::PrintPage(mozilla::layout::PRFileDescStream&, nsRefPtrHashtable<nsUint64HashKey,mozilla::gfx::RecordedDependentSurface>*) [layout/printing/ipc/RemotePrintJobParent.cpp:167]
[task 2020-11-06T23:32:42.418Z] 23:32:42     INFO - GECKO(3864) | #05: mozilla::layout::RemotePrintJobParent::FinishProcessingPage(nsRefPtrHashtable<nsUint64HashKey,mozilla::gfx::RecordedDependentSurface>*) [layout/printing/ipc/RemotePrintJobParent.cpp:146]
[task 2020-11-06T23:32:42.419Z] 23:32:42     INFO - GECKO(3864) | #06: mozilla::layout::RemotePrintJobParent::RecvProcessPage(nsTArray<unsigned long long>&&) [layout/printing/ipc/RemotePrintJobParent.cpp:121]
[task 2020-11-06T23:32:42.419Z] 23:32:42     INFO - GECKO(3864) | #07: mozilla::layout::PRemotePrintJobParent::OnMessageReceived(IPC::Message const&) [s3:gecko-generated-sources-l1:46f001374024ac4ec33b95387c7bb26f8ae1f460b841cfcca8bb518d1e03ea9260db5e5539b1942f4e7b2117a2a5a1d6432c73e7711b401d67ad5080b527bc58/ipc/ipdl/PRemotePrintJobParent.cpp::295]

(In reply to Brad Werth [:bradwerth] from comment #104)

This seems to cause a failure on Windows for test browser_ext_tabs_saveAsPDF.js. I'll see if I can understand the logs. I'm not developing on a Windows machine:

I'd suggest adding some assertions to UnscaledFontDWrite::CreateFromFontDescriptor to get a better idea where it's failing and what the name of the font is.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #103)

I don't think there's a great way to test it so I wouldn't worry about it.

Isn't having a minimal HTML page with emojis enough and internally call the drawSnapshot command? After that we could request a memory dump and analyze the native-font-resource-mac sub entries (with bug 1674868 fixed).

(In reply to Jeff Muizelaar [:jrmuizel] from comment #105)

(In reply to Brad Werth [:bradwerth] from comment #104)

This seems to cause a failure on Windows for test browser_ext_tabs_saveAsPDF.js. I'll see if I can understand the logs. I'm not developing on a Windows machine:

I'd suggest adding some assertions to UnscaledFontDWrite::CreateFromFontDescriptor to get a better idea where it's failing and what the name of the font is.

Looking at the UnscaledFontDWrite code, I see that CreateFromFontDescriptor expects the aData parameter to be a null-terminated WCHAR string (note that IDWriteFactory::CreateFontFileReference accepts a WCHAR const * parameter but no corresponding length).

However, it's not clear to me that what we pass it is guaranteed to be null-terminated. In UnscaledFontDWrite::GetFontDescriptor, we get the filename via GetFontFileName, which reads the name into an array with space for a null terminator, checks that the terminator is in fact still null (so the name didn't overflow), and then pops off the terminator before returning.

What gets serialized, then, is the unterminated WCHAR string together with its size (in bytes). When deserializing, though, we just check that the size is non-zero, but we don't actually use it to determine the length of the string.

(We don't have this problem when the descriptor is passed to webrender, because the Rust code explicitly adds a null terminator to the path before passing it to IDWriteFactory::CreateFontFileReference.)

So I think a simple fix is to always serialize the path with a null terminator, so that it can be used directly when deserializing (without needing to copy it into a new temporary string in order to add a terminator). The extra null-termination that the Rust code adds will then be redundant, but harmless.

I pushed a try run with your patch plus this potential fix, and this seems to have resolved the browser_ext_tabs_saveAsPDF.js failure:
https://treeherder.mozilla.org/jobs?repo=try&revision=bb1b9f7c6a5c0bff63e33dec49420667a2b39c7b

This ensures that CreateFromFontDescriptor gets the null-terminated WCHAR
that it expects.

Depends on D96274

(In reply to Jonathan Kew (:jfkthame) from comment #107)

I pushed a try run with your patch plus this potential fix, and this seems to have resolved the browser_ext_tabs_saveAsPDF.js failure:
https://treeherder.mozilla.org/jobs?repo=try&revision=bb1b9f7c6a5c0bff63e33dec49420667a2b39c7b

Thank you, Jonathan for your insight into this. I pushed an updated two-part patch to try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=82d238106e36c427392787e750522334900da402.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #106)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #103)

I don't think there's a great way to test it so I wouldn't worry about it.

Isn't having a minimal HTML page with emojis enough and internally call the drawSnapshot command? After that we could request a memory dump and analyze the native-font-resource-mac sub entries (with bug 1674868 fixed).

Do we have any example tests that look at the results of about:memory?

I suppose we could ensure that the font doesn't show up in native-font-resource-mac. That kind of tests does feel pretty targeted at the specific solution though. i.e. it doesn't feel that likely to catch regressions. I'll leave it up to Brad though.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #110)

I suppose we could ensure that the font doesn't show up in native-font-resource-mac. That kind of tests does feel pretty targeted at the specific solution though. i.e. it doesn't feel that likely to catch regressions. I'll leave it up to Brad though.

I'll attempt to build such a test.

toolkit/components/aboutmemory/tests/chrome.ini has some tests that appear to look at about:memory

Bred, https://phabricator.services.mozilla.com/D96478 needs review but no reviewer is set. I assume that's the only patch which blocks us from getting all landed? Or does it wait for a test?

Status: NEW → ASSIGNED
Flags: needinfo?(bwerth)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #113)

Bred, https://phabricator.services.mozilla.com/D96478 needs review but no reviewer is set. I assume that's the only patch which blocks us from getting all landed? Or does it wait for a test?

I'm trying to put a test together that does the drawSnapshot call + parse the about:memory output. I'll set review flags for everything when I have that ready -- hopefully soon.

Flags: needinfo?(bwerth)

I think this will need the GDI change that I talked about to make sure we always return false from GetDescriptor.

Attachment #9188187 - Attachment description: Bug 1625590 Part 3: Add a test of native font cache behavior on macOS. → Bug 1625590 Part 4: Add a test of native font cache behavior on macOS.
Pushed by bwerth@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cfe4fe8f1bee
Part 1: Make DrawTargetRecording::FillGlyphs prefer sending font description over font data. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/0f878ffc3020
Part 2: Make ScaledFontDWrite::GetFontFileName include the null terminator in the returned file name. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/e27e9c12a5e7
Part 3: Make UnscaledFontGDI::GetFontDescriptor unilaterally return false. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/42b9dbf35ce8
Part 4: Add a test of native font cache behavior on macOS. r=jrmuizel

That works pretty great with Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:85.0) Gecko/20100101 Firefox/85.0 ID:20201120094511. I no longer see native fonts allocated anymore after using Ctrl+Tab with the Github notification page captured.

Given that this patch has a massive impact on memory usage, I wonder if we should uplift to 84 beta.

Status: RESOLVED → VERIFIED
Flags: needinfo?(bwerth)

Yes, let's do that.

Comment on attachment 9188187 [details]
Bug 1625590 Part 4: Add a test of native font cache behavior on macOS.

Beta/Release Uplift Approval Request

  • User impact if declined: Snapshots taken of pages with native fonts will cause memory bloat for macOS users.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): This has impacts for all users because it favors sending font descriptions over font data when transporting across processes. We have identified cases (covered by tests) where font descriptors were handled differently than the font data. We believe we have patched these cases, but we may not have adequate test coverage to cover all the ways fonts are transferred between processes.
  • String changes made/needed:
Flags: needinfo?(bwerth)
Attachment #9188187 - Flags: approval-mozilla-beta?
Attachment #9186397 - Flags: approval-mozilla-beta?
Attachment #9186775 - Flags: approval-mozilla-beta?
Attachment #9188907 - Flags: approval-mozilla-beta?

Comment on attachment 9186397 [details]
Bug 1625590 Part 1: Make DrawTargetRecording::FillGlyphs prefer sending font description over font data.

This is an old bug, it seems better to let it ride the trains.

Attachment #9186397 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9186775 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9188187 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #9188907 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

(In reply to Julien Cristau [:jcristau] from comment #124)

Comment on attachment 9186397 [details]
Bug 1625590 Part 1: Make DrawTargetRecording::FillGlyphs prefer sending font description over font data.

This is an old bug, it seems better to let it ride the trains.

Well, it's old but the situation is even worse with Webrender enabled, or am I wrong? Jeff, can you give details? If that is the case we have to take into account that we recently started to ship WebRender enabled by default on MacOS.

Flags: needinfo?(jmuizelaar)

Setting as firefox-84 affected again, to keep it on our radar.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #125)

(In reply to Julien Cristau [:jcristau] from comment #124)

Comment on attachment 9186397 [details]
Bug 1625590 Part 1: Make DrawTargetRecording::FillGlyphs prefer sending font description over font data.

This is an old bug, it seems better to let it ride the trains.

Well, it's old but the situation is even worse with Webrender enabled, or am I wrong? Jeff, can you give details? If that is the case we have to take into account that we recently started to ship WebRender enabled by default on MacOS.

No, WebRender already avoided the code path fixed by this patch. The problem should be the same with WebRender on vs off.

Flags: needinfo?(jmuizelaar)

Sorry for the bug spam, but I wanted to say thank you as this made memory usage so much better! No need to restart FF anymore 1-2 per day \o/

Regressions: 1690235
You need to log in before you can comment on or make changes to this bug.