Closed Bug 1611467 Opened 6 months ago Closed 4 months ago

Massive amount of heap-unclassified memory related to fonts when using Google Docs

Categories

(Core :: Graphics: Text, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla76
Tracking Status
firefox75 --- wontfix
firefox76 --- fixed

People

(Reporter: whimboo, Assigned: lsalzman)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(3 files)

I recently have seen a memory increase of Firefox which finally ended up with 8GB of used memory whereby nearly half of it was listed as heap-unclassified. As such I have built a DMD enabled Firefox Nightly build for figuring out what it is.

As result here some unreported data from a Firefox session half a day long:

Unreported {
100 blocks in heap block record 1 of 10,477
381,874,176 bytes (381,683,404 requested / 190,772 slop)
Individual block sizes: 159,969,280 x 2; 7,380,992 x 3; 1,798,144 x 7; 1,388,544 x 3; 802,816 x 2; 774,144 x 7; 761,856 x 2; 753,664; 688,128 x 2; 630,784 x 3; 606,208 x 2; 438,272; 417,792 x 2; 413,696; 405,504; 393,216 x 5; 389,120 x 5; 212,992; 196,608 x 2; 139,264 x 5; 106,496 x 2; 102,400 x 2; 98,304; 86,016 x 2; 65,536 x 2; 49,152 x 4; 40,960 x 8; 32,768 x 10; 24,576 x 8; 20,480 x 4
33.29% of the heap (33.29% cumulative)
57.19% of unreported (57.19% cumulative)
Allocated at {
#01: replace_malloc(unsigned long) (in libmozglue.dylib) + 106
#02: malloc_zone_malloc (in libsystem_malloc.dylib) + 103
#03: malloc (in libsystem_malloc.dylib) + 24
#04: _CFBundleCreate (in CoreFoundation) + 1566
#05: mozilla::gfx::NativeFontResourceMac::Create(unsigned char*, unsigned int) (in XUL) + 33
#06: mozilla::gfx::RecordedFontData::PlayEvent(mozilla::gfx::Translator*) const (in XUL) + 71
#07: std::__1::__function::__func<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0, std::__1::allocator<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::$_0>, bool (mozilla::gfx::RecordedEvent*)>::operator()(mozilla::gfx::RecordedEvent*&&) (in XUL) + 44
#08: bool mozilla::gfx::RecordedEvent::DoWithEvent<mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader>(mozilla::gfx::InlineTranslator::TranslateRecording(char*, unsigned long)::MemReader&, mozilla::gfx::RecordedEvent::EventType, std::__1::function<bool (mozilla::gfx::RecordedEvent*)> const&) (in XUL) + 14261
}
}

Unreported {
1 block in heap block record 2 of 10,477
159,420,416 bytes (159,417,388 requested / 3,028 slop)
13.90% of the heap (47.19% cumulative)
23.87% of unreported (81.06% cumulative)
Allocated at {
#01: replace_malloc(unsigned long) (in libmozglue.dylib) + 106
#02: malloc_zone_malloc (in libsystem_malloc.dylib) + 103
#03: malloc (in libsystem_malloc.dylib) + 24
#04: _CFBundleCreate (in CoreFoundation) + 1566
#05: TFPFont::CopyTable(unsigned int) const (in libFontParser.dylib) + 572
#06: TBaseFont::CopyTable(unsigned int) const (in CoreText) + 148
#07: TBaseFont::CreateTraitsValuesPerFontInfo(MetadataFlag*) const (in CoreText) + 2465
#08: TBaseFont::CopyTraitsInternal() const (in CoreText) + 111
}
}

I did some work on Google doc in the last two hours, and especially here the MB as reported as unclassified went up a lot.

Component: Graphics → Graphics: Text
Whiteboard: [MemShrink:P2]

After observing this issue for a couple more days I can clearly say this is related to Google docs.

The priority flag is not set for this bug.
:lsalzman, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(lsalzman)
Summary: Massive amount of heap-unclassified memory maybe related to fonts → Massive amount of heap-unclassified memory related to fonts when using Google Docs

Do you have the full DMD log so I can take a look?

Flags: needinfo?(lsalzman) → needinfo?(hskupin)

Sorry I was on PTO last week. So I created a new DMD build for MacOS:
https://treeherder.allizom.org/#/jobs?repo=try&revision=957810aa341ed9206288d4364b308234093350e2&selectedJob=290249888

I will make use of it for the next few days, and report back once I have the full DMD log.

Priority: -- → P3

Lee, do you need all the DMD files (for each process) or is the one from the parent enough?

Flags: needinfo?(hskupin) → needinfo?(lsalzman)

I would need the full log, as there's not quite enough information there to go on right now.

Flags: needinfo?(lsalzman)

So here the raw files for all processes (alive to download for 1 day):
https://send.firefox.com/download/4e56b675a62513cf/#5tz9GKIgZgnF1qfHOWkrcQ

The parent process is PID 56186.

Thanks

Flags: needinfo?(lsalzman)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #7)

So here the raw files for all processes (alive to download for 1 day):
https://send.firefox.com/download/4e56b675a62513cf/#5tz9GKIgZgnF1qfHOWkrcQ

The parent process is PID 56186.

Thanks

Sigh, the link expired before I saw this. Can you please put it up for longer than 1 day?

Flags: needinfo?(lsalzman)

Sorry, I uploaded again and sent you a direct email.

Hey Lee, have you had the time to check my DMD files? Thanks

Flags: needinfo?(lsalzman)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #10)

Hey Lee, have you had the time to check my DMD files? Thanks

Sorry, I haven't had time to get to this yet. I have been preempted by a few other important issues at the moment. I will try to get to this soon.

Flags: needinfo?(lsalzman)

Just had it again in a non-DMD build. Here the parent process takes 7.4GB with 75% of unclassified memory:

7,393.10 MB (100.0%) -- explicit
├──5,575.36 MB (75.41%) ── heap-unclassified

Jonathan, do you have any idea what might be going on here inside NativeFontResourceMac.cpp? The DMD logs are not really showing anything more than what he posted above, and we are properly releasing things as far as I can see...

Flags: needinfo?(jfkthame)

Henrik, is WebRender off or on when you see these leaks?

Flags: needinfo?(hskupin)

Both are off in my daily profile for Nightly.

Flags: needinfo?(hskupin)

I'm surprised that 'InlineTranslator::TranslateRecording' is being called. Can you get a full stack for a call to that?

Flags: needinfo?(hskupin)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #17)

I'm surprised that 'InlineTranslator::TranslateRecording' is being called. Can you get a full stack for a call to that?

I don't know what to do here. Can you please guide?

Flags: needinfo?(hskupin) → needinfo?(jmuizelaar)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #18)

I don't know what to do here. Can you please guide?

You can try calling dmd.py with --max-frames 24 and -a. That will increase the number of frames it outputs. By default, the number of frames is capped at 8. Then look for similarish stacks in the output.

Flags: needinfo?(jmuizelaar)

Hm, after I sent my DMD logs to Lee I deleted those locally. Lee, do you still have them and could you send those back to me? Otherwise it would take me a couple of days to get into a similar state, and requires me to build also new builds first via try. Thanks!

Flags: needinfo?(lsalzman)
Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/14b59d4adc95
disable Skia typeface cache to prevent it from leaking fonts. r=jrmuizel
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76

Thank you for that fix. I assume disabling the cache has no performance implications? Is it a bug in Skia which we would have to report?

To verify the fix I created a new DMD build based on 14b59d4adc95, which I will use over the next couple of days:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7c4768f18f9aa80b199cd22337fd4d21095a5ef7

Should we consider uplifting this Skia change to 75 Beta? Firefox memory usage, especially on Google Docs, is a common user complaint.

OTOH, 75 Beta rides to the Release channel on 2020-04-07, just two weeks from now. Relman might be nervous that the change wouldn't have much beta bake time.

Please note that the problem hasn't been fixed yet. I will upload an updated DMD output file shortly.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Given that the file is too huge to attach here, I've sent the processed DMD file for the parent process to Lee directly.

Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5f5128e54484
unify UnscaledFont::GetFontDescriptor and GetWRFontDescriptor implementations. r=jrmuizel
Status: REOPENED → ASSIGNED
Target Milestone: mozilla76 → ---
Status: ASSIGNED → RESOLVED
Closed: 4 months ago4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
Flags: needinfo?(lsalzman)

As it looks like there are memory reporters for fonts already implemented. So why is this extensive memory usage still be listed under heap-unclassified? Do we need some kind of addition?

Given that I will continue the investigation for other heap-unclassified memory uses it would be great to get this out. Lee, should I file a new bug to get this added? Or does something prevent us from adding it?

Flags: needinfo?(jfkthame) → needinfo?(lsalzman)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #31)

As it looks like there are memory reporters for fonts already implemented. So why is this extensive memory usage still be listed under heap-unclassified? Do we need some kind of addition?

Given that I will continue the investigation for other heap-unclassified memory uses it would be great to get this out. Lee, should I file a new bug to get this added? Or does something prevent us from adding it?

DrawTargetRecording and such don't deal with the responsibility of reporting memory usage. Nor, apparently, does CrossProcessPaint, which is the real offender here. These are outside the normal gfxFont infrastructure, which does do memory reporting, but only applies to gfxFonts created in the content-process, nowhere else. Memory reporting would be better dealt with as a separate bug.

Flags: needinfo?(lsalzman)

Please note that with Lee's patch I still see ~260MB of unclassified heap memory staying alive. Which is way lesser than before. So as given in comment 32 I filed bug 1625590 for further investigations. Beside that there is also 600MB of unclassified heap from over the weekend, which doesn't contain any stack information and needs to be investigated separately.

QA Whiteboard: [qa-76b-p2]
You need to log in before you can comment on or make changes to this bug.