Closed Bug 834079 Opened 13 years ago Closed 12 years ago

Investigate effects of font loading on startup time

Categories

(Core :: General, defect)

x86_64
Linux
defect
Not set
normal

RESOLVED WONTFIX

People

(Reporter: cjones, Unassigned)

Details

(Keywords: perf)

Attachments

(1 file)

We don't have a shared cross-process cmap or glyph cache, so each process will load, parse, and rasterize glyphs afresh. That can take some time. For post-v1, a shared cache would be fantastic, but that's a *lot* of work.

To test the effect of font rendering on startup time, I did a simple experiment
 - measured startup times for the in-tree template app
 - removed all the text and made it just paint a pink background (attached patch)
 - measured startup times again

With stopwatch timing, the results I see are (in seconds)

Text: 0.97, 0.91, 0.97, 1.00, 0.87
Color: 0.88, 0.85, 0.81, 0.84, 0.81

So we may be spending anywhere from 50-200ms in font-related code. That's more stuff we can potentially move into the prelaunch phase.

It would be nice to get a differential profile to see where the time is really going.
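As a rough check on the stopwatch numbers above, the following sketch compares the two sets of runs. The timings are copied from the comment; the function name `summarize` is only illustrative, and five hand-timed runs per configuration is too small a sample to treat the result as more than an estimate.

```python
from statistics import mean

# Stopwatch timings in seconds, copied from the comment above.
text_runs = [0.97, 0.91, 0.97, 1.00, 0.87]
color_runs = [0.88, 0.85, 0.81, 0.84, 0.81]


def summarize(with_text, without_text):
    """Return (mean difference, smallest gap, largest gap) in milliseconds."""
    mean_diff = (mean(with_text) - mean(without_text)) * 1000
    # Most pessimistic pairing: fastest text run vs. slowest color run.
    smallest = (min(with_text) - max(without_text)) * 1000
    # Most optimistic pairing: slowest text run vs. fastest color run.
    largest = (max(with_text) - min(without_text)) * 1000
    return mean_diff, smallest, largest


mean_diff_ms, smallest_ms, largest_ms = summarize(text_runs, color_runs)
print(f"mean difference: {mean_diff_ms:.0f} ms")
print(f"gap between individual runs: {smallest_ms:.0f} to {largest_ms:.0f} ms")
```

The mean difference comes out at about 106 ms, inside the 50-200ms estimate. The spread between individual runs is wide enough that the two sets nearly overlap, which is one more reason a differential profile would be more informative than stopwatch timing.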
With vivien's patch from bug 819000 applied, I don't see any difference with this patch on the Template app.

without: 510ms
with: 512ms

Is it worth testing without vivien's patch anymore?
What are you measuring?
appwillload till mozbrowserloadend
Chris, could you classify this bug a bit better? Graphics|Text maybe? Or just Graphics? Does it really only apply to Linux? There are actually a whole slew of caches per-process, I think you need to isolate better what perf win you want to get.
(In reply to Zbigniew Braniecki [:gandalf] from comment #3)
> appwillload till mozbrowserloadend

I don't know what mozbrowserloadend is, but if it doesn't measure painting time it's not interesting for this experiment (since font stuff should only happen when we start drawing).
Actually I take that back ... font loading and metrics would show up as part of reflow, but glyph rasterization probably wouldn't show up.
(In reply to John Daggett (:jtd) from comment #4)
> Chris, could you classify this bug a bit better? Graphics|Text maybe? Or
> just Graphics? Does it really only apply to Linux? There are actually a
> whole slew of caches per-process, I think you need to isolate better what
> perf win you want to get.

For a bit of context, one issue we're fighting on b2g right now is that out-of-process pages take longer to load than in-process pages. We have a major cheat that we do to equalize those load times: we "prelaunch" a gecko content process and pre-initialize some things in it. For example, we load the system stylesheets, initialize the security manager, and some other stuff.

Then when we get ready to launch a new app, we just load it into the "prelaunch" process, instead of fork()ing/exec()ing/and initializing a new process on the critical startup path.

As of the end of last week, we were seeing something like an 850ms overhead for pages loaded OOP into the prelaunch process, vs. those loaded in-process. With some hacks, we've got that shaved down by at least 500ms. But that still leaves 300ms of overhead that shouldn't be there.

What this bug proposes to investigate is what overhead we might be incurring on startup from having to load fonts anew in the "prelaunch" process. There are no cross-process font caches for gecko or for gonk, so each b2g process will have to load its own font resources, each time. That's bad, because there are a small number of fonts we know we'll definitely need, for basically every gecko process. If we can load those fonts as much as possible while in the "prelaunch" phase, before apps are loaded into the process, we should theoretically be able to shorten the critical startup path.

I don't have a good feel for how expensive that is, but I wouldn't be surprised if it were 100ms or so. Maybe you have a better guess?
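The idea of warming the per-process font cache during prelaunch can be modeled with a toy sketch like the one below. It is not Gecko code: `ContentProcess`, its methods, and the cost figures are all hypothetical, and the costs are arbitrary units rather than measurements. It only shows why work done before the app is loaded drops off the critical startup path.

```python
# Hypothetical per-font load costs in arbitrary units (NOT measured values).
FONT_LOAD_COST = {"MozTT Light": 3, "MozTT Medium": 3, "Roboto": 4}


class ContentProcess:
    """Toy model of a process with its own (unshared) font cache."""

    def __init__(self):
        self.loaded_fonts = set()

    def load_font(self, name):
        """Load a font if needed; return the cost paid (0 if already cached)."""
        if name in self.loaded_fonts:
            return 0
        self.loaded_fonts.add(name)
        return FONT_LOAD_COST[name]

    def prelaunch(self, fonts):
        """Warm the per-process cache before any app is loaded."""
        for name in fonts:
            self.load_font(name)

    def launch_app(self, fonts_needed):
        """Return the font cost paid on the critical startup path."""
        return sum(self.load_font(name) for name in fonts_needed)


needed = ["MozTT Light", "MozTT Medium", "Roboto"]

cold = ContentProcess()
cold_cost = cold.launch_app(needed)

warm = ContentProcess()
warm.prelaunch(needed)
warm_cost = warm.launch_app(needed)

print(f"cold launch pays {cold_cost} units, prelaunched launch pays {warm_cost}")
```

How much this saves in practice depends entirely on how expensive the real loads are, which is the question this bug asks.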
There are several aspects that might be relevant:

(1) building the list of available fonts
(2) reading 'cmap' tables to determine character coverage (so that we can do fallback where needed)
(3) reading glyph metrics (needed in order to do text layout)
(4) rasterizing glyph outlines to bitmaps (needed for actual rendering)

I've tried to look into (1) and (2) a bit, by instrumenting the relevant code and testing on a Unagi device. Some observations so far:

Building the font list
======================
(gfxFT2FontList::InitFontList)

In the chrome process, we do this by iterating over the font files in the system fonts directory, so it's dependent on "disk" i/o speed. After a cold start, this takes around 16-18ms.

In the content process, things are more variable. Here, we don't iterate over the font files; instead, we ask the chrome process for the list. Most of the time, this takes less than 2ms. However, on the first content-process startup after a cold start, it often takes more like 50ms for InitFontList() to return. This is because it has to wait for the chrome process to construct and return the list.

Instrumenting the GetFontList() function in the chrome process shows that it takes less than 0.2ms to actually construct the list that it's going to return, but it takes ~50ms for SendReadFontList() in the content process to return. What this suggests to me is that startup of the first content process is being blocked by other busy-ness of the chrome process; I guess it is busy with other startup-related stuff and so it takes a long time to respond to the request for the font list. Plausible? Anything we can do to improve it?

Reading cmap tables
===================
(FT2FontEntry::ReadCMAP)

I was expecting this might be a significant issue, but it doesn't appear to be. We load the cmaps of a few fonts during startup of each content process; typically MozTT Light, Medium and/or Bold. Depending on the process, sometimes also Roboto. This is done separately by each process. However, reading each cmap typically takes around 0.1-0.2ms, so the cost of reading a handful of them is only around 1ms altogether. (If an app were to need Droid Sans Fallback during startup, that'd be a bit more expensive, but even its much-larger cmap only takes 1.3ms to load.)

I notice that the first time the on-screen keyboard is displayed - e.g., when I tap in the browser's URL bar - we appear to read the cmaps of all the available fonts (except those that were read earlier during the initial launch of the app). I'm not sure why we do this, but in any case it seems to take not much more than 5ms total, so it's not too worrisome. (Though it does suggest that if we start adding significantly more fonts to the device, that first keyboard display will gradually become more sluggish. So we might want to examine why it's causing them all to be read, and figure out how to avoid this.)
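To illustrate why cmaps get read at all (item (2) above), here is a minimal sketch of coverage-based font fallback. It is not how Gecko implements it: the coverage ranges are made up for illustration, and `pick_font` and `fonts_for_text` are hypothetical names. Real cmap data is parsed from each font file.

```python
# Hypothetical coverage data: font name -> set of supported code points.
# These ranges are invented for illustration; real cmaps come from the font.
FONT_COVERAGE = {
    "MozTT Light": set(range(0x20, 0x7F)),
    "Roboto": set(range(0x20, 0x250)),
    "Droid Sans Fallback": set(range(0x4E00, 0x9FA6)),
}


def pick_font(ch, preferred, coverage=FONT_COVERAGE):
    """Return the first font in `preferred` whose cmap covers `ch`, else None."""
    code_point = ord(ch)
    for name in preferred:
        if code_point in coverage[name]:
            return name
    return None


def fonts_for_text(text, preferred):
    """Map each distinct character in `text` to the font chosen for it."""
    return {ch: pick_font(ch, preferred) for ch in text}


order = ["MozTT Light", "Roboto", "Droid Sans Fallback"]
print(fonts_for_text("A\u00e9\u4e2d", order))
```

In this model a font's coverage only has to be consulted once a character falls through the fonts ahead of it in the preference order, so text that the first font covers never touches the others.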
(In reply to Chris Jones [:cjones] [:warhammer] from comment #7)
> For a bit of context, one issue we're fighting on b2g right now is that
> out-of-process pages take longer to load than in-process pages. We have a
> major cheat that we do to equalize those load times: we "prelaunch" a gecko
> content process and pre-initialize some things in it. For example, we load
> the system stylesheets, initialize the security manager, and some other
> stuff.
>
> Then when we get ready to launch a new app, we just load it into the
> "prelaunch" process, instead of fork()ing/exec()ing/and initializing a new
> process on the critical startup path.

For the record, while I definitely see the difference in OOP cost vs. in-process, I tested yesterday with dom.ipc.processPrelaunch.enabled set to false and it didn't affect startup speed.

I didn't test too extensively, but I believe based on your description above that it should be slower without it.
Keywords: perf
From jkew's and gandalf's comments, it sounds like this is WONTFIX. Reopen if new information shows this is worth continuing to investigate.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX