Closed Bug 469909 Opened 16 years ago Closed 13 years ago

Mac gfxQuartzFontCache consumes 10MB when international fonts are present

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set
normal

RESOLVED WONTFIX

People

(Reporter: matt, Unassigned)

Details

Poking around in ObjectAlloc I noticed gfxQuartzFontCache holding a fair amount of memory. Measuring with Activity Monitor on several machines, I consistently see 10MB less RAM usage when international fonts are installed and character preloading is disabled.
Steps to reproduce:
1. Open Font Book and ensure that you have international fonts installed (STSong, Hiragino, etc.)
2. Open Activity Monitor
3. Open Firefox with a single about:blank tab
4. Note memory usage, then watch it climb 8 seconds after launch
5. Comment out cmap preloading at http://mxr.mozilla.org/mozilla/source/gfx/thebes/src/gfxQuartzFontCache.mm#1319 then rebuild
6. Launch Firefox with the same setup
7. Note the same initial memory usage, but no significant increase after 8 seconds
This is OSX 10.5.4, with Firefox 3.0.4. Since international fonts are installed by default, this means that the base RAM footprint after startup is 10MB higher than necessary (assuming font loading is cheap, and very few fonts are actually used in a session). This is a clone of http://bugzilla.songbirdnest.com/show_bug.cgi?id=14448
Matt, I'm not sure what you're noting as a bug here. Are you thinking we (1) shouldn't preload cmaps at all (2) should do it based on a pref (3) should be storing cmaps much more efficiently? In general the reason for preloading cmaps is so that when system-fallback occurs, it doesn't cause a big pause in rendering. System fallback actually occurs relatively easily, even when only loading Latin-1 pages, because of things like accents and symbols. If I know more about what you consider a bug I can suggest ways that we can work around those problems.
If we really are using 10MB for the character coverage bitmaps, it probably is worth doing another round of optimization on them. And it's possible that apps like Songbird should run in a mode where CMAPs aren't preloaded (although I suspect they'll discover they don't like unpredictable hangs when system-fallback is hit). But we definitely have to preload CMAPs in general.
Interesting. For the record, I know next to nothing about this stuff. I came across the code yesterday. I mentioned it to bent, who talked to vlad, who asked me to file a bug. I'm not suggesting anything in particular, other than that someone may want to look into it. How would I go about trying to hit the system-fallback case? I tried loading some Japanese websites and didn't notice any problems, but then I have a fast-ish system, and OSX likes to keep fonts in shared memory. Also, how large a pause is expected, worst case? The preload code tries to do 10 fonts every 150ms. For Songbird, where browsing the web is secondary to playback and collection browsing, I think a slight hiccup the first time you hit a new font may be acceptable.
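For a rough sense of how long the preloader runs, the batch rate quoted above (10 fonts every 150ms) can be turned into a back-of-the-envelope estimate. This is only a sketch: the 8 second start delay is taken from the behaviour observed in the report rather than from the source, and the font count of 200 is a made-up example.

```python
import math

FONTS_PER_BATCH = 10       # from the comment above
BATCH_INTERVAL_MS = 150    # from the comment above
START_DELAY_MS = 8000      # assumption: matches the ~8 second delay seen in the report


def last_batch_start_ms(font_count: int) -> int:
    """Estimate when the final preload batch starts, in ms after launch."""
    if font_count <= 0:
        return START_DELAY_MS
    batches = math.ceil(font_count / FONTS_PER_BATCH)
    # The first batch runs at the start delay; each later batch waits one interval.
    return START_DELAY_MS + (batches - 1) * BATCH_INTERVAL_MS


# Hypothetical machine with 200 installed fonts.
print(last_batch_start_ms(200))
```

Under these assumptions the whole preload finishes within a few seconds of starting, which fits the report of memory climbing shortly after the 8 second mark.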
Japanese pages are likely to declare themselves as Japanese, and so we'll default to appropriate Japanese fonts. You could try something like http://www.alanwood.net/unicode/unicode_samples.html, which will almost certainly have some characters that aren't supported in any font you have loaded, and so the fallback will have to search through all available fonts.
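To illustrate why preloaded coverage data makes fallback cheap, here is a minimal sketch of a fallback search. It is not the Gecko implementation: the font names and code point ranges are invented, and plain Python sets stand in for the character coverage bitmaps mentioned earlier in this bug.

```python
from typing import Dict, Optional, Set

# Hypothetical preloaded coverage: font name -> supported code points.
COVERAGE: Dict[str, Set[int]] = {
    "LatinOnly": set(range(0x20, 0x7F)),
    "Symbols": {0x2603, 0x2665},
    "CJK": set(range(0x4E00, 0x4E10)),
}


def find_fallback_font(ch: str, coverage: Dict[str, Set[int]]) -> Optional[str]:
    """Return the first font whose coverage includes ch, or None."""
    code_point = ord(ch)
    for name, code_points in coverage.items():
        # With coverage already in memory this is a cheap membership test;
        # without it, each font's cmap would have to be read at this point.
        if code_point in code_points:
            return name
    return None


print(find_fallback_font("\u2603", COVERAGE))
print(find_fallback_font("\U0001F600", COVERAGE))
```

The second lookup is the expensive case described above: a character no font supports forces a pass over every installed font, so any per-font loading cost is paid for all of them at once.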
Jonathan: Thanks. The unicode test page loads smoothly on my machine, but I'll track down a mac mini and try that. Also, I looked into it a bit more, and realized that loading the cmaps really only adds 2-3MB to the heap, but the loading process uses/fragments quite a bit of memory, making the perceived RAM usage 10MB higher.
Output from the 'heap' command with preloading disabled:
  Overall size: 24766KB; 193471 nodes malloced for 20367KB (82% of capacity); largest unused: [0xf35600-810KB]
  Overall size: 24882KB; 193920 nodes malloced for 20529KB (82% of capacity); largest unused: [0xf3fa00-769KB]
Output with preloading enabled:
  Overall size: 34142KB; 218415 nodes malloced for 22830KB (66% of capacity); largest unused: [0x32855e00-7848KB]
  Overall size: 35850KB; 212185 nodes malloced for 24169KB (67% of capacity); largest unused: [0x3082ac00-8020KB]
Maybe reusing a buffer rather than creating a new one for every font would help? See http://mxr.mozilla.org/mozilla/source/gfx/thebes/src/gfxQuartzFontCache.mm#189
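The 2-3MB and 10MB figures can be checked directly against the four 'heap' summary lines quoted above. The sketch below parses those lines and averages the two runs in each configuration; the function names are illustrative only.

```python
import re

# Matches the part of a 'heap' summary line that carries the sizes.
HEAP_SUMMARY = re.compile(
    r"Overall size: (\d+)KB; (\d+) nodes malloced for (\d+)KB"
)


def parse_heap_summary(line):
    """Extract overall size, node count and in-use size from one summary line."""
    match = HEAP_SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a heap summary line: {line!r}")
    overall_kb, nodes, used_kb = (int(group) for group in match.groups())
    return {"overall_kb": overall_kb, "nodes": nodes, "used_kb": used_kb}


def mean_delta_kb(before, after, key):
    """Difference between the mean of `key` in `after` and in `before`."""
    def mean(lines):
        values = [parse_heap_summary(line)[key] for line in lines]
        return sum(values) / len(values)
    return mean(after) - mean(before)


# Summary lines copied from the comment above.
DISABLED = [
    "Overall size: 24766KB; 193471 nodes malloced for 20367KB (82% of capacity)",
    "Overall size: 24882KB; 193920 nodes malloced for 20529KB (82% of capacity)",
]
ENABLED = [
    "Overall size: 34142KB; 218415 nodes malloced for 22830KB (66% of capacity)",
    "Overall size: 35850KB; 212185 nodes malloced for 24169KB (67% of capacity)",
]

print(mean_delta_kb(DISABLED, ENABLED, "used_kb"))     # growth in allocated memory
print(mean_delta_kb(DISABLED, ENABLED, "overall_kb"))  # growth in total zone size
```

On these numbers the allocated memory grows by about 3MB while the zone as a whole grows by about 10MB, which is the gap attributed to fragmentation in the comment.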
That looks like a great idea! Any volunteers?
As long as the stack-based buffer allocated by nsAutoTArray is big enough, the function won't cause any heap fragmentation. So to see what difference this might make, you could start by increasing the auto-buffer size from 16384 to (say) 64K, so that it's less likely that it will spill over into the heap, even for large fonts. If that helps, then it suggests that reusing a single buffer for large fonts (assuming we don't really want to use that much stack in production) would be helpful too.
I ran this test on my system --- basically just Leopard with few or no extra fonts installed, but it does have STSong and the Hiragino fonts --- with a trunk Firefox debug build. Here's what I get:
No preloading:
  Zone DefaultMallocZone_0x34000: Overall size: 18802KB; 128453 nodes malloced for 14168KB (75% of capacity); largest unused: [0x1520600-2942KB]
With preloading:
  Zone DefaultMallocZone_0x34000: Overall size: 18822KB; 140946 nodes malloced for 15324KB (81% of capacity); largest unused: [0x15c6c00-2276KB]
I know preloading happened because I'm dumping the cmap sizes for each font. So why are my numbers so much smaller than Matt's --- both the total memory usage and the extra usage for preloading? Jonathan, can you try to reproduce this? I set my pref for "When Firefox starts" to "show a blank page" --- the Google start page seems to trigger system font fallback which makes it impossible to avoid loading all the fonts. My test profile has a few extensions installed but they could hardly reduce memory usage.
BTW the largest CMAP size is 370726 bytes for some of the Hiragino fonts. I'd be uncomfortable declaring that much space on the stack. If we had to do something, preallocating a heap block bigger than that and sharing it until the preloader is done would be my preference.
I just tried a couple of runs with preloading disabled:
  Zone DefaultMallocZone_0x34000: Overall size: 19330KB; 117209 nodes malloced for 12711KB (65% of capacity); largest unused: [0xcd2400-3254KB]
  Zone DefaultMallocZone_0x34000: Overall size: 17823KB; 116278 nodes malloced for 13086KB (73% of capacity); largest unused: [0xc93000-3507KB]
And a couple with it enabled:
  Zone DefaultMallocZone_0x34000: Overall size: 17327KB; 121038 nodes malloced for 13071KB (75% of capacity); largest unused: [0xd03c00-3056KB]
  Zone DefaultMallocZone_0x34000: Overall size: 17347KB; 124682 nodes malloced for 13337KB (76% of capacity); largest unused: [0xd16000-2983KB]
It seems there's a certain amount of "random" variation, presumably depending on the exact moment when the heap gets sampled, as well as various factors in the environment. These runs were as close to "identical" as I can easily do by hand (launching to a blank page) on a fairly clean Leopard system. (The machine does have the large Asian fonts installed.) Interestingly, the largest overall heap size comes from one of the runs WITHOUT preloading, though the actual space in use is slightly less. But there's no real sign here that cmap preloading results in major heap fragmentation or excessive memory use. Matt, I wonder if there's some other factor in play for you - maybe an extension that is also doing memory allocation while preloading is going on, and this is leading to different heap behavior?
I retested everything with a fresh build of Minefield, and while I see the same behaviour (10MB spike after 8 seconds), I see no difference in heap size/fragmentation. Loading all the fonts at once with a single buffer also makes no difference. I'm not sure what was up with my initial heap measurements. It turns out what I'm seeing is cache files memory mapped by the ATSFontGetTable call that hang around for about 10 minutes. This is unfortunate, since I doubt reviewers will wait 10 minutes before comparing memory usage, but since the memory is eventually freed I assume this is INVALID/WONTFIX. Thanks for your time though.
----------
Snippet from 'vmmap -resident' with preloading disabled:
  ==== Summary for process 32418
  ReadOnly portion of Libraries: Total=118.6M resident=84.4M(71%) swapped_out_or_unallocated=34.1M(29%)
  Writable regions: Total=610.4M written=18.1M(3%) resident=33.1M(5%) swapped_out=0K(0%) unallocated=577.2M(95%)
  REGION TYPE         [ VIRTUAL/RESIDENT]
  ===========         [ =======/========]
  ATS (font support)  [   33.4M/   1920K]
  CG backing stores   [   6104K/   6104K]
  CG shared images    [   3208K/    976K]
  Carbon              [   2536K/   2536K]
  CoreGraphics        [    160K/    100K]
  IOKit               [  512.0M/      0K]
  MALLOC              [   23.4M/   19.8M]
  STACK GUARD         [   56.1M/      0K]
  Stack               [   12.5M/    220K]
  VM_ALLOCATE ?       [   2428K/   1388K]
  __DATA              [   7040K/   6184K]
  __IMAGE             [   1240K/    536K]
  __IMPORT            [    720K/    720K]
  __LINKEDIT          [   12.0M/   11.9M]
  __OBJC              [   1308K/   2184K]
  __OBJC/__DATA       [     36K/     36K]
  __PAGEZERO          [      4K/      0K]
  __TEXT              [  106.6M/   72.5M]
  __UNICODE           [    532K/    532K]
  mapped file         [   60.6M/   19.8M]
  shared memory       [   16.0M/     32K]
  shared pmap         [   3196K/   2876K]
Snippet with preloading enabled:
  ==== Summary for process 15158
  ReadOnly portion of Libraries: Total=118.6M resident=83.3M(70%) swapped_out_or_unallocated=35.3M(30%)
  Writable regions: Total=611.4M written=23.2M(4%) resident=32.9M(5%) swapped_out=0K(0%) unallocated=578.5M(95%)
  REGION TYPE         [ VIRTUAL/RESIDENT]
  ===========         [ =======/========]
  ATS (font support)  [   33.4M/   1800K]
  CG backing stores   [   6104K/   6104K]
  CG shared images    [   3208K/    908K]
  Carbon              [   2536K/   2536K]
  CoreGraphics        [    160K/    100K]
  IOKit               [  512.0M/      0K]
  MALLOC              [   24.6M/   21.5M]
  STACK GUARD         [   56.1M/      0K]
  Stack               [   12.5M/    220K]
  VM_ALLOCATE ?       [   2660K/   1620K]
  __DATA              [   6980K/   4948K]
  __IMAGE             [   1240K/    536K]
  __IMPORT            [    720K/    720K]
  __LINKEDIT          [   12.0M/   11.9M]
  __OBJC              [   1260K/   1260K]
  __OBJC/__DATA       [     24K/     24K]
  __PAGEZERO          [      4K/      0K]
  __TEXT              [  106.6M/   71.4M]
  __UNICODE           [    532K/    532K]
  mapped file         [   69.3M/   28.3M]
  shared memory       [   16.0M/     32K]
  shared pmap         [   3316K/   2988K]
Note the additional 10MB of mapped files. The full vmmap output shows 60 additional mapped files like "/private/var/folders/hX/hXIGOrbKEQ0RXryG-HW4tU+++TI/-Caches-/com.apple.ATS/annex_aux" when firefox preloads fonts. Some of these are over 1MB.
BUT, about 10 minutes later, all these files are tossed out, presumably by some ATS caching mechanism, and the vmmap output looks like:
  ==== Summary for process 49765
  ReadOnly portion of Libraries: Total=118.6M resident=84.6M(71%) swapped_out_or_unallocated=34.0M(29%)
  Writable regions: Total=609.6M written=23.0M(4%) resident=32.9M(5%) swapped_out=0K(0%) unallocated=576.7M(95%)
  REGION TYPE         [ VIRTUAL/RESIDENT]
  ===========         [ =======/========]
  ATS (font support)  [   33.4M/   1888K]
  CG backing stores   [   6104K/   6104K]
  CG shared images    [   3208K/   1044K]
  Carbon              [   2536K/   2536K]
  CoreGraphics        [    160K/    112K]
  IOKit               [  512.0M/      0K]
  MALLOC              [   24.5M/   21.5M]
  STACK GUARD         [   56.0M/      0K]
  Stack               [   11.0M/    160K]
  VM_ALLOCATE ?       [   2408K/   1368K]
  __DATA              [   6980K/   4960K]
  __IMAGE             [   1240K/    536K]
  __IMPORT            [    720K/    720K]
  __LINKEDIT          [   12.0M/   11.9M]
  __OBJC              [   1260K/   1260K]
  __OBJC/__DATA       [     24K/     24K]
  __PAGEZERO          [      4K/      0K]
  __TEXT              [  106.6M/   72.6M]
  __UNICODE           [    532K/    532K]
  mapped file         [   25.5M/   12.6M]
  shared memory       [   16.0M/     36K]
  shared pmap         [   3316K/   3052K]
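The comparison between the three vmmap summaries can be made mechanical. The following sketch parses region rows in the format shown above and reports the change in resident size per region; only the MALLOC and mapped file rows are copied in, and the helper names are illustrative.

```python
import re

# One table row looks like: "mapped file   [   60.6M/   19.8M]"
ROW = re.compile(
    r"^(?P<name>.+?)\s*\[\s*(?P<virt>[\d.]+[KM])/\s*(?P<res>[\d.]+[KM])\]$"
)


def to_mb(size):
    """Convert a vmmap size such as '1920K' or '33.4M' to megabytes."""
    value, unit = float(size[:-1]), size[-1]
    return value / 1024 if unit == "K" else value


def parse_regions(text):
    """Map region name -> (virtual_mb, resident_mb) for every table row."""
    regions = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header and separator rows carry no sizes and are skipped
            regions[match["name"]] = (to_mb(match["virt"]), to_mb(match["res"]))
    return regions


def resident_delta_mb(before, after, region):
    """Change in resident size of one region between two snapshots."""
    return parse_regions(after)[region][1] - parse_regions(before)[region][1]


# Rows copied from the three summaries in the comment above.
PRELOAD_OFF = "MALLOC [ 23.4M/ 19.8M]\nmapped file [ 60.6M/ 19.8M]"
PRELOAD_ON = "MALLOC [ 24.6M/ 21.5M]\nmapped file [ 69.3M/ 28.3M]"
TEN_MINUTES_LATER = "MALLOC [ 24.5M/ 21.5M]\nmapped file [ 25.5M/ 12.6M]"

for region in ("MALLOC", "mapped file"):
    print(region, round(resident_delta_mb(PRELOAD_OFF, PRELOAD_ON, region), 1))
print("later", round(resident_delta_mb(PRELOAD_ON, TEN_MINUTES_LATER, "mapped file"), 1))
```

On the quoted numbers, resident mapped-file memory grows by roughly 8.5M with preloading while MALLOC grows by under 2M, and the mapped-file figure later drops well below its starting value, which is consistent with the explanation that the spike comes from ATS cache mappings rather than the heap.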
It seems unlikely we can do anything about that, unless we can somehow tell ATS that we're done using the fonts and it should release the cache memory mappings ... which seems unlikely.
Right, I don't think we can realistically do anything about this for now -- but I also don't think it really matters. FYI, I'm currently starting to restructure gfxQuartzFontCache in order to abstract a platform-independent font family/style management component (as part of the solution for bug 469656 and bug 454514; also in preparation for using HarfBuzz some day). I doubt, though, that it'll make any significant difference to this issue; this is internal ATS stuff, so it will stay this way as long as we look at the cmaps at all. I suppose we could consider caching the character-coverage data between runs, if we could reliably detect when a font might have changed; that might give us a slight performance win, as well as reducing this (temporary) memory footprint. I'm far from convinced that it'd be worthwhile, though.
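As a sketch of what caching character-coverage data between runs might involve, the example below keys each cache entry on a fingerprint of the font file, so an entry is ignored once the file changes. Everything here is hypothetical: the class, the JSON file format, and the use of path, size and modification time as the change detector are assumptions, and a changed font that kept the same size and timestamp would not be noticed.

```python
import json
import os


def font_fingerprint(path):
    """Identify a font file by absolute path, size and modification time."""
    info = os.stat(path)
    return f"{os.path.abspath(path)}:{info.st_size}:{info.st_mtime_ns}"


class CoverageCache:
    """Hypothetical on-disk cache of per-font character coverage."""

    def __init__(self, cache_file):
        self.cache_file = cache_file
        try:
            with open(cache_file, "r", encoding="utf-8") as handle:
                self.entries = json.load(handle)
        except (FileNotFoundError, json.JSONDecodeError):
            self.entries = {}  # missing or corrupt cache: start empty

    def get(self, font_path, load_coverage):
        """Return cached coverage, calling load_coverage only on a miss."""
        key = font_fingerprint(font_path)
        if key not in self.entries:
            # The expensive step (reading the cmap) happens only here.
            self.entries[key] = sorted(load_coverage(font_path))
        return set(self.entries[key])

    def save(self):
        with open(self.cache_file, "w", encoding="utf-8") as handle:
            json.dump(self.entries, handle)
```

A caller would pass its real cmap reader as load_coverage and call save() once preloading finishes. Entries for changed fonts are never matched again, so a production version would also need to prune stale keys.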
From the comments it sounds like this is WONTFIX. Please re-open if I'm wrong.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX