Closed Bug 648417 Opened 13 years ago Closed 7 months ago

Memshrink: Investigate sharing font-related caches among Gecko processes

Categories

(Core :: Graphics, enhancement)

RESOLVED FIXED

People

(Reporter: cjones, Unassigned)

References

(Depends on 3 open bugs, Blocks 2 open bugs)

Details

(Whiteboard: [MemShrink:P2][tech-p2][layout:p1])

It's not entirely trivial to share the font cache.  Before doing so, let's find out how big it gets during normal browsing, to see if sharing is worth the engineering cost.  I'm not sure how to measure it or what constitutes "normal" browsing wrt fonts.

Joe: who's the right person to work on this?
As noted in the e10s meeting, this is going to be much more interesting/important for the multiple-content-process case than for the single chrome/content split, because the chrome process *ought* to have only the UI fonts loaded and cached. Although we should probably verify that as well.
The text rendering system in gfx has a set of caches:

  - "font cache" == cache of gfxFont objects (one per style/size combination)
  - "font table cache" == cache of tables pulled in per face
  - other system-wide font list info (e.g. cmaps, names)
  - "word cache" == cache of already computed text runs per word (temporal)

The first three would be candidates for sharing across content processes, probably not the word cache.  This would also be a good time to come up with better metrics for how efficient/needed these caches are.
Summary: Investigate sharing font cache among Gecko processes → Investigate sharing font-related caches among Gecko processes
I suspect that John would like to work on this, if and when we decide it's what we want to do. That can change at any time though :-)
Assignee: nobody → jdaggett
Should probably get them into about:memory to see what the amounts look like in the field.
Whiteboard: [MemShrink]
Not clear yet what the cost/benefit ratio is for this work, but sharing font resources efficiently is definitely a major technical issue.
Whiteboard: [MemShrink] → [MemShrink][tech-p2]
(In reply to Chris Jones [:cjones] [:warhammer] from comment #5)
> Not clear yet what the cost/benefit ratio is for this work, but sharing font
> resources efficiently is definitely a major technical issue.

And I think it needs to be worked out whether sharing these across processes is the right approach.  Other options are simply caching less and/or reducing the lifespan of what's cached.  Those are *definitely* easier to consider than trying to do cross-process caches.  There may also be platform-specific options where the underlying platform provides less support for efficient font resource loading (e.g. Android/FreeType).
Whiteboard: [MemShrink][tech-p2] → [MemShrink:P2][tech-p2]
Nicholas, I'm not sure this is really a P2 anything.  Labeling as such I think is a distraction from other ways of decreasing font-related memory usage.
(In reply to John Daggett (:jtd) from comment #7)
> Nicholas, I'm not sure this is really a P2 anything.  Labeling as such I
> think is a distraction from other ways of decreasing font-related memory
> usage.

We triage all MemShrink-tagged bugs as P1, P2 or P3.  See 
https://wiki.mozilla.org/Performance/MemShrink#Bug_Tracking
for details.
> The text rendering system in gfx has a set of caches:
> 
>   - "font cache" == cache of gfxFont objects (one per style/size combination)
>   - "font table cache" == cache of tables pulled in per face
>   - other system-wide font list info (e.g. cmaps, names)
>   - "word cache" == cache of already computed text runs per word (temporal)
> 
> The first three would be candidates for sharing across content processes,
> probably not the word cache.  This would also be a good time to come up with
> better metrics for how efficient/needed these caches are.

The font/text things reported in about:memory that I know about are:

- "gfx/font-shaped-words"
- "gfx/font-cache" (presumably this is the "font cache" mentioned above)
- "window(...)/layout/text-runs" (which is per-window)

Do we need memory reporters for the latter three items in the list above?
John, did the last 9 months of development provide any illumination as to whether this is worth investigating still?
Flags: needinfo?(jdaggett)
gfxFont cache is an nsExpirationTracker, which via ExpirationTrackerObserver listens for memory-pressure events and purges the cache. That's not ideal, but it should make the font cache irrelevant under memory pressure. Sharing it might improve performance of concurrently rendering processes. In the case of FFOS, where memory is tightest, only one process renders at any given time, so at most this is an app-switching latency problem (the process needs some time to recover from having its font cache purged).
The FrameTextRunCache is an nsExpirationTracker as well so same applies there.
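
For illustration, here is a minimal standalone sketch of the pattern described above: a cache whose entries age out over time and which is dropped wholesale on a memory-pressure notification. It does not use the real nsExpirationTracker API; `ExpiringFontCache`, `Touch`, `AgeOneGeneration` and `OnMemoryPressure` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a generation-based expiration cache.
// This is NOT the real nsExpirationTracker interface.
class ExpiringFontCache {
 public:
  explicit ExpiringFontCache(uint32_t maxAge) : mMaxAge(maxAge) {
    assert(maxAge > 0);  // an age limit of 0 would expire entries immediately
  }

  // Insert or refresh an entry, stamping it with the current generation.
  void Touch(const std::string& key, std::size_t bytes) {
    mEntries[key] = Entry{bytes, mGeneration};
  }

  // Meant to be driven by a periodic timer: advance the generation counter
  // and drop entries that have not been touched for mMaxAge generations.
  void AgeOneGeneration() {
    ++mGeneration;
    for (auto it = mEntries.begin(); it != mEntries.end();) {
      if (mGeneration - it->second.lastUsed >= mMaxAge) {
        it = mEntries.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Meant to be called on a memory-pressure notification: drop everything.
  void OnMemoryPressure() { mEntries.clear(); }

  bool Contains(const std::string& key) const {
    return mEntries.count(key) != 0;
  }
  std::size_t Count() const { return mEntries.size(); }

 private:
  struct Entry {
    std::size_t bytes;
    uint32_t lastUsed;
  };
  uint32_t mMaxAge;
  uint32_t mGeneration = 0;
  std::unordered_map<std::string, Entry> mEntries;
};
```

The app-switching latency mentioned above corresponds to the state right after `OnMemoryPressure()`: the cache is empty and every lookup misses until the process has rebuilt its entries.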
(In reply to Andreas Gal :gal from comment #11)
> gfxFont cache is an nsExpirationTracker, which via ExpirationTrackerObserver
> listens for memory-pressure events and purges the cache. That's not ideal, but
> it should make the font cache irrelevant under memory pressure. Sharing it
> might improve performance of concurrently rendering processes. In the case of
> FFOS, where memory is tightest, only one process renders at any given time, so
> at most this is an app-switching latency problem (the process needs some
> time to recover from having its font cache purged).

Hmmm. Rather than introducing relatively complex machinery for sharing cached text- and font-related data at various granularities, I think it's simpler just to look at what data is retained and focus first on how to reduce the need to retain that data.  That starts way up in layout text frame code and runs all the way down to harfbuzz shaping code.

Specifically, I don't think derived data objects like gfxFonts (which own size-specific metrics, extents and a word cache) or gfxTextRuns can be shared across processes in an efficient manner.  The code is just too complex to refactor easily into a form that could be shared across processes.

I do think there might be a benefit to having some way of sharing lower-level system font data (e.g. font tables, enumerated lists of font data like localized family names and postscript ==> font mappings, and the cmap sets needed for font fallback).

To varying degrees, desktop OSes provide services that already effectively share font data across processes.  For example, DirectWrite and CoreText cache font family lists and provide fallback methods that eliminate the need to load cmaps for determining fallback fonts.  FFOS and Android lack this, but the number of fonts supplied on the system is much lower.  I imagine Android will eventually evolve to have some form of system font cache; that's the evolution most systems take.  For FFOS, I think it would be simpler just to figure out a way of distilling font metadata (e.g. fallback lists, names) as part of the build process, in such a way that the amount of per-process data is very small when loaded.
Flags: needinfo?(jdaggett)
(In reply to Dietrich Ayala (:dietrich) from comment #10)
> John, did the last 9 months of development provide any illumination as to
> whether this is worth investigating still?

Basically, no. While there might be room for improvement in areas such as lower-level font table data caching on Android and FFOS, overall cross-process sharing seems like a big gun applied to a tiny problem.
Assignee: jd.bugzilla → nobody
Depends on: 1439412
Depends on: 1469980
Depends on: 1470015
This came up again in the context of fission, see bug 1471309 comment 13.

In that bug we discussed two caches - the cache of loaded fonts from the system, and the word cache. Jonathan, are there others of interest? I'm not sure how well comment 2 has aged over seven years.

The first cache (loaded fonts from the system) seems like something we're going to need to share. We can end up doing some pretty expensive work the first time common fonts are loaded. This work generally gets amortized across process lifetime, but that will matter less in a fission world. I'd also imagine the data structures aren't small. Jonathan, can you measure?

Another question is whether any of the APIs we use to load fonts on Windows are going away in the content process. Jonathan, can you give a rough outline of the API surface to jimm, who can then answer this question?

The next topic is the shaped word caches. These are high-traffic, and so from a performance and complexity standpoint it would likely be simpler to avoid sharing them. The question is how much it will cost us memory-wise to avoid sharing shaped words across domains, and whether that overhead is acceptable. Jonathan, can you comment/measure?
Flags: needinfo?(jfkthame)
We also have bug 1258781 on file for reducing the skia glyph cache; I'm not sure if that's the same as the shaped word cache being mentioned.
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #16)
> We also have bug 1258781 on file for reducing the skia glyph cache; I'm
> not sure if that's the same as the shaped word cache being mentioned.

This is about mWordCache [1], which caches gfxShapedText, which is used during layout. So I think it's different.

[1] https://searchfox.org/mozilla-central/rev/28daa2806c89684b3dfa4f0b551db1d099dda7c2/gfx/thebes/gfxFont.h#2258
(In reply to Bobby Holley (:bholley) from comment #15)
> The next topic is the shaped word caches. These are high-traffic, and so
> from a performance and complexity standpoint it would likely be simpler to
> avoid sharing them. The question is how much it will cost us memory-wise to
> avoid sharing shaped words across domains, and whether that overhead is
> acceptable. Jonathan, can you comment/measure?

Depending on how big they get and how many of the entries wind up shared between origins, we might be able to get away with an approach similar to bug 1471025, where we have a base cache shared between processes, and a per-process dynamic cache on top of that. We'd have to do some work to periodically send heavily-used cache entries to the parent processes during idle slices, and then send updated snapshots to all children. But if we can save more than 100K/child that way, it's probably worth it.

One other benefit of the snapshot approach is that we have much more opportunity to optimize the layout of the snapshot than we do of a pure dynamic hashtable, which means we can probably fit many more entries in the same amount of memory, and we can make sure lookups always have an optimal probe length (assuming a static hashtable rather than a binary tree, as our other snapshots currently use).

And as a bonus, if we're doing multi-threaded layout, the static caches can be probed without any locking, as long as each thread keeps its own reference to the cache.
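
As a rough sketch of the layering described above, the following standalone example puts a per-process dynamic table on top of an immutable shared snapshot. All names (`Snapshot`, `LayeredWordCache`, `AdoptSnapshot`) are hypothetical, and `ShapedWord` is a plain string standing in for real shaped-glyph data. For brevity the snapshot is a sorted array searched with binary search, not the static hashtable suggested above, and no actual IPC or shared memory is involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using ShapedWord = std::string;  // placeholder for real shaped-glyph records

// Immutable base layer. Once built it is never modified, so any number of
// threads may probe it without locking.
class Snapshot {
 public:
  explicit Snapshot(std::vector<std::pair<std::string, ShapedWord>> entries)
      : mEntries(std::move(entries)) {
    std::sort(mEntries.begin(), mEntries.end());
  }

  const ShapedWord* Lookup(const std::string& key) const {
    auto it = std::lower_bound(
        mEntries.begin(), mEntries.end(), key,
        [](const auto& entry, const std::string& k) { return entry.first < k; });
    return (it != mEntries.end() && it->first == key) ? &it->second : nullptr;
  }

 private:
  std::vector<std::pair<std::string, ShapedWord>> mEntries;
};

// Per-process cache: consult the shared snapshot first, then the local table.
class LayeredWordCache {
 public:
  explicit LayeredWordCache(std::shared_ptr<const Snapshot> base)
      : mBase(std::move(base)) {}

  std::optional<ShapedWord> Lookup(const std::string& key) const {
    if (mBase) {
      if (const ShapedWord* hit = mBase->Lookup(key)) return *hit;
    }
    auto it = mLocal.find(key);
    if (it != mLocal.end()) return it->second;
    return std::nullopt;
  }

  void Insert(const std::string& key, ShapedWord word) {
    mLocal.emplace(key, std::move(word));
  }

  // Switch to an updated snapshot and drop local entries it now covers.
  void AdoptSnapshot(std::shared_ptr<const Snapshot> base) {
    mBase = std::move(base);
    for (auto it = mLocal.begin(); it != mLocal.end();) {
      if (mBase && mBase->Lookup(it->first)) {
        it = mLocal.erase(it);
      } else {
        ++it;
      }
    }
  }

  std::size_t LocalCount() const { return mLocal.size(); }

 private:
  std::shared_ptr<const Snapshot> mBase;
  std::unordered_map<std::string, ShapedWord> mLocal;
};
```

Each cache holds its own `shared_ptr` to the snapshot, which mirrors the point about every thread keeping its own reference so that the static layer can be probed without locks.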
David, do you know who can work on this? It has a pretty big impact across the board for per content process overhead.
Flags: needinfo?(dbolter)
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #19)
> David, do you know who can work on this? It has a pretty big impact across
> the board for per content process overhead.

I think jfkthame is probably the right person.
(Will chat with him about it once he's back from PTO, removing NI from dbolter for now).
Flags: needinfo?(dbolter)
There are a couple of things that may be worth exploring here, with potentially varying levels of complexity, performance trade-offs, etc.

(1) The system font list (managed by platform-specific implementations of gfxPlatformFontList). This holds a list of all the installed font families, created at startup (and rarely updated during a session, though the possibility has to be handled).

On some platforms, we already build this list "from scratch" (via platform APIs) only in the chrome process, and pass it via IPC to content processes; this was done because using platform APIs to iterate over all available fonts can be a bit expensive.

Note, however, that the objects in the font list are not entirely static; they're subject to lazy initialization (again for performance reasons; it's too expensive to populate everything during startup). So we start with a simple list of font families. On demand, we populate the families with their list of available faces; and on demand, we examine individual faces to determine their character coverage. These details are retrieved on first use, and then cached in the font list.

So a shared font list will need to allow for this progressive enhancement as various font families and faces get used and therefore fully initialized.
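
To make the lazy initialization concrete, here is a small standalone sketch (not the gfxPlatformFontList code). It starts from a bare list of family names and fills in faces and character coverage only on first use. `LazyFontList`, `FamilySupports` and the two loader callbacks are hypothetical; the callbacks stand in for the expensive platform queries.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Face {
  std::string name;
  std::optional<std::set<char32_t>> coverage;  // empty until first needed
};

struct Family {
  std::string name;
  std::optional<std::vector<Face>> faces;  // empty until first needed
};

class LazyFontList {
 public:
  // Hypothetical hooks standing in for expensive platform font queries.
  using FaceLoader =
      std::function<std::vector<std::string>(const std::string& family)>;
  using CoverageLoader =
      std::function<std::set<char32_t>(const std::string& face)>;

  LazyFontList(const std::vector<std::string>& familyNames,
               FaceLoader loadFaces, CoverageLoader loadCoverage)
      : mLoadFaces(std::move(loadFaces)),
        mLoadCoverage(std::move(loadCoverage)) {
    // Startup cost is only the list of names; nothing else is populated.
    for (const auto& name : familyNames) {
      mFamilies.push_back(Family{name, std::nullopt});
    }
  }

  // Returns true if any face of the family covers `ch`, loading the face
  // list and per-face coverage on demand and caching both in the list.
  bool FamilySupports(const std::string& familyName, char32_t ch) {
    for (Family& family : mFamilies) {
      if (family.name != familyName) continue;
      if (!family.faces) {
        std::vector<Face> faces;
        for (const auto& faceName : mLoadFaces(family.name)) {
          faces.push_back(Face{faceName, std::nullopt});
        }
        family.faces = std::move(faces);
      }
      for (Face& face : *family.faces) {
        if (!face.coverage) face.coverage = mLoadCoverage(face.name);
        if (face.coverage->count(ch) != 0) return true;
      }
      return false;
    }
    return false;  // unknown family
  }

 private:
  FaceLoader mLoadFaces;
  CoverageLoader mLoadCoverage;
  std::vector<Family> mFamilies;
};
```

In a shared list, the two "fill in on first use" steps are the writes that every process would need to observe, which is why the structure cannot simply be treated as read-only after startup.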

In my current session, I'm seeing memory usage in the 150-200K range for the font list (this will depend on the user's installed fonts -- I have quite a lot), so there's a substantial memory win to be had if we can take this out of the individual content processes.

(2) When we actually measure/render text with a given font, we cache the shaped glyph records that represent each word, as this gives a substantial perf win in text layout (most of the time, given that common words occur repeatedly). The memory used for this caching varies greatly depending on the kind of content a process is handling; it may be negligible, or just a few kilobytes, or it may be hundreds of KB. (I currently have about 1.3MB in a content process that has loaded the NYTimes home page, for example.)

Because this data is specific to text in a given font face and size, it's unclear how often we'd get a big win by sharing it between processes. This would only happen if multiple content processes are using the exact same styles. In a world of short-lived content processes and repeated visits to certain sites, though, it might become pretty significant.
Flags: needinfo?(jfkthame)
A potentially tricky part of sharing the system font list is that it currently holds platform font references (e.g. Core Text font references, DirectWrite font faces, FreeType faces, etc) that aren't likely to be usable across processes. So I guess we'll end up needing some kind of cache for these on the content-process side, even if we can keep the bulk of the metadata in a shared list.
> On demand, we populate the families with their list of available faces; and on demand, we examine individual
> faces to determine their character coverage.

Does this happen in the content process, or get remoted to the chrome process and then shipped back down to the content process?
Currently, in the content process.
Ok. Sounds to me like sharing the system font stuff across processes is something we probably need to do.

Sharing the shaped words seems less likely to be an obvious memory win. That said, I think we should measure. I propose the following experiment:
(1) Instrument the shaped word cache such that entries also contain an array of RefPtr<BasePrincipal>.
(2) New cache entries have a 1-element array with the document principal. Cache hits scan the list for matches (via BasePrincipal::FastEquals) and insert the document principal if no matches are found.
(3) Set dom.ipc.processCount to 1. Set the word cache size to unbounded (if a cap exists, not sure).
(4) Load the sites in [1], one per tab.
(5) Iterate over the cache. For each entry, add size_of(Word) * (array.length - 1) to the total count. Dump the total count. Also dump the total size of the words in the table, ignoring the principal array.
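
The bookkeeping in steps (1), (2) and (5) could be sketched roughly as follows. This is a standalone illustration, not a patch against the real word cache: `InstrumentedWordCache` is a hypothetical name, origins are plain strings standing in for RefPtr<BasePrincipal>, and string equality stands in for BasePrincipal::FastEquals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct WordEntry {
  std::size_t bytes;                    // size of the cached shaped word
  std::vector<std::string> principals;  // distinct origins that used it
};

class InstrumentedWordCache {
 public:
  // Steps (1)/(2): a new entry starts with a one-element principal list;
  // a cache hit appends the principal only if it is not already present.
  void Record(const std::string& wordKey, std::size_t bytes,
              const std::string& principal) {
    auto [it, inserted] =
        mEntries.try_emplace(wordKey, WordEntry{bytes, {principal}});
    if (!inserted) {
      auto& seen = it->second.principals;
      if (std::find(seen.begin(), seen.end(), principal) == seen.end()) {
        seen.push_back(principal);
      }
    }
  }

  // Step (5), second figure: total size of the words, ignoring principals.
  std::size_t TotalBytes() const {
    std::size_t total = 0;
    for (const auto& kv : mEntries) total += kv.second.bytes;
    return total;
  }

  // Step (5), first figure: extra bytes needed if every principal kept its
  // own copy of each word, i.e. bytes * (number of principals - 1).
  std::size_t DuplicatedBytes() const {
    std::size_t total = 0;
    for (const auto& kv : mEntries) {
      total += kv.second.bytes * (kv.second.principals.size() - 1);
    }
    return total;
  }

 private:
  std::unordered_map<std::string, WordEntry> mEntries;
};
```

Comparing `DuplicatedBytes()` with `TotalBytes()` gives the ratio of interest: how much memory unshared per-origin caches would add relative to a single shared cache.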

WDYT Jonathan?


[1] https://docs.google.com/document/d/1I5MlrMgNTMjicHgauWa0zltS9tMBxcgN-hCPgNBxZA8/edit#
Flags: needinfo?(jfkthame)
Just to update here, I've started to think about how we can create a system font list that resides in shared memory (meaning we can't do it with our existing hashtables/arrays/strings/etc that use pointers all over the place!), as it seems clear there's a substantial memory win to be had there, particularly in the Fission world. (I'd expect there should be a minor startup-time win for content processes, too.)
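
To illustrate why pointer-based containers don't work here, the sketch below lays a font family list out in one flat byte block, using offsets from the start of the block where ordinary code would use pointers. That way the data stays valid at whatever address a process happens to map it. This is only a toy layout under assumed names (`FamilyRecord`, `FontListBuilder`, `FontListView`), not the format used by the actual shared font list, and it uses an ordinary vector where a real implementation would use a shared memory segment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Fixed-size record that refers to its name by offset, never by pointer.
struct FamilyRecord {
  uint32_t nameOffset;  // byte offset of the name from the block start
  uint32_t nameLength;  // length of the name in bytes
};

// Builds a block laid out as: uint32 count, FamilyRecord[count], name bytes.
class FontListBuilder {
 public:
  void AddFamily(std::string_view name) { mNames.emplace_back(name); }

  std::vector<uint8_t> Build() const {
    const uint32_t count = static_cast<uint32_t>(mNames.size());
    const uint32_t stringsAt =
        static_cast<uint32_t>(sizeof(uint32_t) + count * sizeof(FamilyRecord));
    std::size_t total = stringsAt;
    for (const auto& name : mNames) total += name.size();

    std::vector<uint8_t> block(total);
    std::memcpy(block.data(), &count, sizeof(count));
    uint32_t cursor = stringsAt;
    for (uint32_t i = 0; i < count; ++i) {
      FamilyRecord rec{cursor, static_cast<uint32_t>(mNames[i].size())};
      std::memcpy(block.data() + sizeof(uint32_t) + i * sizeof(FamilyRecord),
                  &rec, sizeof(rec));
      std::memcpy(block.data() + cursor, mNames[i].data(), mNames[i].size());
      cursor += rec.nameLength;
    }
    return block;
  }

 private:
  std::vector<std::string> mNames;
};

// Read-only view over a block; works at whatever address the block lives.
class FontListView {
 public:
  explicit FontListView(const uint8_t* base) : mBase(base) {}

  uint32_t Count() const {
    uint32_t count;
    std::memcpy(&count, mBase, sizeof(count));
    return count;
  }

  std::string_view NameAt(uint32_t index) const {
    assert(index < Count());
    FamilyRecord rec;
    std::memcpy(&rec, mBase + sizeof(uint32_t) + index * sizeof(FamilyRecord),
                sizeof(rec));
    return std::string_view(
        reinterpret_cast<const char*>(mBase + rec.nameOffset), rec.nameLength);
  }

 private:
  const uint8_t* mBase;
};
```

Copying the block to a second buffer and reading it through `FontListView` there gives the same names, which is the property a list mapped at different addresses in different processes relies on.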

As noted in comment 26, it's less clear whether sharing shaped-word caches would be a significant win in typical cases (I'm also more uncertain whether it could be made sufficiently performant). Some measurement along the lines suggested above would help give us more insight here. My suspicion is that many of the high-profile sites of interest will be using site-specific webfonts, which would limit the scope for cache-sharing.
Flags: needinfo?(jfkthame)
Whiteboard: [MemShrink:P2][tech-p2] → [MemShrink:P2][tech-p2][layout:p1]
Fission Milestone: --- → M2
Type: defect → enhancement
Summary: Investigate sharing font-related caches among Gecko processes → Memshrink: Investigate sharing font-related caches among Gecko processes

Jonathan, are you still looking into this?

Flags: needinfo?(jfkthame)

The shared-fontlist work is part of this, so yes, that's in progress; bug 1533462 intends to start preffing it on once the dependencies there are resolved.

There's no current work on sharing the fonts' shaped-word caches between processes, which would be the other potential memory win (although it's unclear how big of a win in practice, and I'm pretty doubtful it could be made sufficiently performant).

Flags: needinfo?(jfkthame)

There are no obvious and easily attainable memshrink wins identified here. I don't think the Fission team needs to track what's left.

Jonathan, it'd help to know what could be the memory win with this work. Would you possibly have time to investigate this? Or maybe you already have findings to share? :)

Flags: needinfo?(jfkthame)
Severity: normal → S3

I think it's time to close this. The font-list metadata (lists of font families & faces with their properties, and character coverage) is now shared across processes (bug 1514869 and related work).

The memory win will depend greatly on the number/size of installed fonts, the number of processes we're running, and on the font usage by content. On my MBPro, I'm seeing an initial 200K reduction in the size of each preallocated content process. The win will be larger -- potentially as much as multiple MB/process -- as content is rendered, if large fonts (e.g. for CJK usage) are involved.

Status: NEW → RESOLVED
Closed: 7 months ago
Flags: needinfo?(jfkthame)
Resolution: --- → FIXED