Last Comment Bug 602792 - dwrite initfontlist causes large IO load on cold startup
: dwrite initfontlist causes large IO load on cold startup
Status: RESOLVED FIXED
[hardblocker][january 21]
:
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: Trunk
: x86 Windows 7
: P2 critical with 14 votes (vote)
: ---
Assigned To: John Daggett (:jtd)
:
: Milan Sreckovic [:milan]
Mentors:
: 618831 (view as bug list)
Depends on: 605048 636957 624970 628117 628152 629386 630201 636968
Blocks: 593614 617266 633615 843585
  Show dependency treegraph
 
Reported: 2010-10-07 23:16 PDT by John Daggett (:jtd)
Modified: 2013-02-25 01:43 PST (History)
54 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
betaN+


Attachments
debug patch, delay font cmap loading and neuter system font fallback (14.86 KB, patch)
2010-10-07 23:16 PDT, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
test app (6.18 KB, text/plain)
2010-10-12 01:59 PDT, John Daggett (:jtd)
no flags Details
timing results running test app after reboot (5.65 KB, text/plain)
2010-10-13 23:27 PDT, John Daggett (:jtd)
no flags Details
another debug patch, neuter font fallback and dump table read info (21.56 KB, patch)
2010-10-19 07:56 PDT, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
FontEnumeration.cpp, with table reading code (7.25 KB, text/plain)
2010-10-20 05:35 PDT, John Daggett (:jtd)
no flags Details
FontEnumeration.cpp, time font enumeration with DirectWrite (11.06 KB, text/plain)
2010-10-26 02:10 PDT, John Daggett (:jtd)
no flags Details
FontEnumeration.cpp, time font enumeration with DirectWrite or GDI font table loading (14.61 KB, text/plain)
2010-11-21 19:41 PST, John Daggett (:jtd)
no flags Details
part 1v3, rework code for doing localized font name lookup (5.70 KB, patch)
2010-11-22 18:59 PST, John Daggett (:jtd)
jfkthame: review-
Details | Diff | Splinter Review
part 2v3, reftests to localized name lookups (73.11 KB, patch)
2010-11-22 19:01 PST, John Daggett (:jtd)
jfkthame: review-
Details | Diff | Splinter Review
part 3v1, use GDI table loading in the dwrite case (14.04 KB, patch)
2010-11-22 19:10 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part 1v4, rework code for doing localized font name lookup (5.95 KB, patch)
2010-11-23 22:01 PST, John Daggett (:jtd)
jfkthame: review+
Details | Diff | Splinter Review
part 3v1a, use GDI table loading in the dwrite case (14.04 KB, patch)
2010-11-23 22:02 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part 3v1b, use GDI table loading in the dwrite case (14.98 KB, patch)
2010-11-24 22:54 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part 2v4, reftests to localized name lookups (50.96 KB, patch)
2010-11-25 22:49 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part 3v1c, use GDI table loading in the dwrite case (15.11 KB, patch)
2010-11-25 22:51 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part 2v5, alternate version of patch to add the reftests (8.91 KB, patch)
2010-11-26 05:30 PST, Jonathan Kew (:jfkthame)
no flags Details | Diff | Splinter Review
part 2v6, reftests to localized name lookups (10.99 KB, patch)
2010-11-28 19:03 PST, John Daggett (:jtd)
jfkthame: review+
Details | Diff | Splinter Review
part 3v2, use GDI table loading in the dwrite case (with debug logging) (18.35 KB, patch)
2010-11-29 00:59 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part 3v3, use GDI table loading in the dwrite case (with debug logging) (17.78 KB, patch)
2010-11-30 00:30 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part 3v4, use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily (with debug logging) (27.27 KB, patch)
2010-11-30 21:52 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
patch, load dwrite dll at startup to start FntCache service earlier (2.17 KB, patch)
2010-12-06 01:01 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
Resource monitor: FF cold start with FontCacheService running (167.04 KB, image/png)
2010-12-23 04:30 PST, Luke Usherwood
no flags Details
System info corresponding to previous attachment (75.23 KB, image/png)
2010-12-23 05:48 PST, Luke Usherwood
no flags Details
part 3v5, use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily (with debug logging) (30.26 KB, patch)
2011-01-04 23:52 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part 3v6, use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily (with debug logging) (36.18 KB, patch)
2011-01-06 21:41 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part 3v7, use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily (30.49 KB, patch)
2011-01-11 00:07 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
FontEnumeration.cpp, time font enumeration with DirectWrite or GDI font table loading (20.98 KB, text/plain)
2011-01-13 00:52 PST, John Daggett (:jtd)
no flags Details
part 3v7 (refresh) use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily (32.26 KB, patch)
2011-01-17 04:14 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part4 clean up font logs (38.59 KB, patch)
2011-01-17 04:18 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part5, load dwrite early to initiate font cache service initialization (10.14 KB, patch)
2011-01-17 04:24 PST, John Daggett (:jtd)
no flags Details | Diff | Splinter Review
part 3v8 use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily (31.80 KB, patch)
2011-01-21 00:33 PST, John Daggett (:jtd)
bas: review+
Details | Diff | Splinter Review
part4 clean up font logs (36.99 KB, patch)
2011-01-21 00:38 PST, John Daggett (:jtd)
bas: review+
Details | Diff | Splinter Review
part5v5 load dwrite early to initiate font cache service initialization (11.30 KB, patch)
2011-01-21 00:47 PST, John Daggett (:jtd)
bas: review+
Details | Diff | Splinter Review
part 6 avoid font face creation in harfbuzz shaper until needed (3.55 KB, patch)
2011-01-21 07:16 PST, John Daggett (:jtd)
jfkthame: review+
Details | Diff | Splinter Review

Description John Daggett (:jtd) 2010-10-07 23:16:31 PDT
Created attachment 481766 [details] [diff] [review]
debug patch, delay font cmap loading and neuter system font fallback

Steps to reproduce:

1. Install xperf, see notes here:

2. Build with attached patch (this delays cmap loading until 60secs after startup and stubs out system font fallback until these load)
3. Set default profile to startup with several tabs, including www.facebook.com
4. Reboot
5. Run cmd shell with Admin privs
6. Start xperf:

  xperf -on latency -stackwalk profile+hardfault

7. Run Minefield build
8. Wait 30 secs
9. Stop xperf:

  xperf -d coldstartup-dwrite.etl

10. View results:

  xperf coldstartup-dwrite.etl

Result: on my Dell i7 dev machine this took 13.9 secs to startup of which 6.7 secs was spent in gfxDWriteFontList::InitFontList.  99% of this time was spent in the GetSystemFontCollection method.  

The xperf trace indicates that this call is iterating over all fonts and reading in the *entire* font file contents for each (so all 8MB of Meiryo).

This is *really* problematic, the FindFontFamily method of the system font collection is DWrite's equivalent to CreateFontIndirect, the main API used look up a font.  So even if we never cached a whiff of font data we still have to take the hit at some point of calling GetSystemFontCollection.  

What's extremely curious is that I did the same cold startup trace on IE9.  Their code also ends up iterating over all fonts at startup but only a small fraction of each file is read in.  Maybe they're using the GDI interop interface? Or some private API (that would truly suck...)?
Comment 1 Jonathan Kew (:jfkthame) 2010-10-08 01:38:35 PDT
(In reply to comment #0)
> What's extremely curious is that I did the same cold startup trace on IE9. 
> Their code also ends up iterating over all fonts at startup but only a small
> fraction of each file is read in.  Maybe they're using the GDI interop
> interface? Or some private API (that would truly suck...)?

The behavior of "font-family: arial black" in IE9 (see bug 584769 comment 2 and 4) seems to suggest that they may well be using GDI-based font enumeration/lookup.
Comment 2 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-10-08 03:36:25 PDT
Maybe we can create a simple standalone test program that shows the performance issue (i.e. just fire up DirectWrite and call GetSystemFontCollection), and ask our friends at Microsoft what to do?
Comment 3 Bas Schouten (:bas.schouten) 2010-10-08 07:13:31 PDT
In theory they could also be using their own font collection code.
Comment 4 John Daggett (:jtd) 2010-10-12 01:59:04 PDT
Created attachment 482478 [details]
test app

This is based on the FontEnumeration sample code from the Windows 7.1 SDK.  The code simply creates a font collection object and iterates over available font families.  The majority of the time is spent creating the dwrite factory and the system font collection objects.
Comment 5 John Daggett (:jtd) 2010-10-12 02:15:34 PDT
Running the test app repeatedly to compare static linked version to
dynamically linked version.  Reboot between each run.  Times are
overall time and (dwrite factory/system font collection) times:

  static   982 ms (575/ 406)
  static   781 ms (625/ 155)
  static   784 ms (628/ 156)
  dynamic 6934 ms (673/6260)
  dynamic 5088 ms (704/4383)
  dynamic  820 ms (551/ 269)*
  dynamic 5358 ms (704/4653)
  static   789 ms (622/ 167)
  static   795 ms (639/ 155)
  static   794 ms (630/ 163)

*waited 15mins after startup to run test
  
Looking at the i/o trace in xperf, in both static and dynamic cases a
similar amount of i/o is done but for some reason the
dynamically-linked case takes 3x longer (wtf?!?).  Dwrite code
iterates over all fonts on the system, reading small chunks of data in
(e.g. Meiryo: 392K).  I still can't reproduce the diabolical behavior
of Firefox4/dwrite where dwrite code appears to be reading entire font
files (e.g. Meiryo: 8MB!!).
Comment 6 Jonathan Kew (:jfkthame) 2010-10-12 07:50:01 PDT
These figures suggest that we could get a substantial win if we linked directly with DWrite, instead of using LoadLibrary and GetProcAddress at runtime (though I don't know WHY it should be so expensive). However, that would prevent the browser running on systems earlier than Win7 or Vista with Platform Update.

(Note that IE 9 Platform Preview has those system requirements, and therefore escapes the DWrite-loading penalty that we're seeing here.)
Comment 7 John Daggett (:jtd) 2010-10-12 08:07:21 PDT
(In reply to comment #6)
> These figures suggest that we could get a substantial win if we linked directly
> with DWrite, instead of using LoadLibrary and GetProcAddress at runtime (though
> I don't know WHY it should be so expensive). However, that would prevent the
> browser running on systems earlier than Win7 or Vista with Platform Update.

Argh, I forgot to mention this, I did another test where I directly linked to the dwrite library and called GetSystemFontCollection within XREMain:

http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsAppRunner.cpp#2744

The behavior was equivalent to the dynamically linked case in comment 5.  xperf showed all fonts getting read but only a small amount, not entire fonts.

I'm a little suspicious that this has something to do with some prefetch flag somewhere, I need to do more experimentation with flushing prefetch files before rebooting to see if that has any impact (i.e. is our behavior with no prefetch file better than with one?).
Comment 8 John Daggett (:jtd) 2010-10-13 23:27:29 PDT
Created attachment 483094 [details]
timing results running test app after reboot

Set up a script to run the test app and reboot multiple times.  First 8 times, calling GetSystemFontCollection takes less that 1000ms (for either the dynamically loaded or static linked case), then subsequent runs average 6000ms.  There's clearly some sort of heuristic being applied somewhere by DWrite code but it's not at all clear to me yet what the basis of that heuristic is.

I'm going to take these results and ask MS folks about this to see if they have any clever ideas.
Comment 9 John Daggett (:jtd) 2010-10-19 07:51:44 PDT
Ran some tests with debug trunk build to compare cold startup for:

  - dwrite enabled, harfbuzz
  - gdi, harfbuzz
  - gdi, platform shapers

Steps:

1. Build with debug logging patch
  - no font fallback beyond pref fonts
  - cmap loading never initiated
2. Reboot
3. Run with xperf -on DiagEasy+FileIO
4. Run browser with two tabs at startup:

  www.facebook.com (english)
  twitter.com/opentype
  
I/O Counts while firefox running:

dwrite/hb    158MB
gdi/hb        84MB
gdi/platform  87MB


Hardfaults while firefox running (all processes):

dwrite/hb
  ff      129.4MB (78.2MB font folder files)
  svchost  43.2MB (42MB font cache file)

gdi/hb
  ff       37.0MB (3.49MB font folder files)

Same test with Chrome 6.0.466.0 dev
  chrome1  25.3MB (0.4MB fonts)
  chrome2   1.5MB (1.0MB fonts)
  chrome3   1.0MB
  svchost   1.9MB

IE9 Beta 9.0.7930.16406
  ie9      82.9MB (57MB fonts)
  svchost  43.2MB (42MB font cache)
  svchost   6.2MB
  ie9-2     3.1MB
  svchost   2.7MB

The facebook and twitter pages use these fonts, the CJK and Indian fonts
are used on the bottom of the page:

  Arial Bold
  Arial Regular
  Gulim Regular
  MS Gothic Regular
  MS Mincho Regular
  MS PGothic Regular
  MS PMincho Regular
  MS UI Gothic Regular
  Mangal Regular
  Meiryo Bold
  Meiryo Italic
  Meiryo Regular
  SimSun Regular
  Tahoma Bold
  Tahoma Regular
  Times New Roman Regular
  Verdana Bold
  Verdana Regular
  
Total size of the font file with hardfault bytes for dwrite hb vs. gdi hb

  arial.ttf       774476  dwrite:   696,832  gdi:  287,232
  arialbd.ttf     751104  dwrite:   673,280  gdi:  312,832
  gulim.ttc     13524972  dwrite: 6,606,848  gdi:   65,536
  msgothic.ttc   9176636  dwrite: 5,931,008  gdi:  401,408
  msmincho.ttc  10057108  dwrite: 7,837,184  gdi:
  mangal.ttf      177388  dwrite:   177,664  gdi:  173,568
  meiryo.ttc     9533888  dwrite: 8,026,624  gdi:  249,856
  meiryob.ttc    9749256  dwrite: 8,852,480  gdi:  331,776
  simsun.ttc    15323200  dwrite: 9,110,016  gdi:  163,840
  tahoma.ttf      697972  dwrite:   538,624  gdi:  301,056
  tahomabd.ttf    647184  dwrite:   553,472  gdi:  201,216
  times.ttf       836380  dwrite:   762,880  gdi:  131,072
  verdana.ttf     189144  dwrite:   189,440  gdi:   65,536
  verdanab.ttf    156340  dwrite:   156,672  gdi:   65,536

Actual data pulled in:

  font: MeiryoRegular (hhea)     36 bytes (gfxDWriteFont::ComputeMetrics)
  font: MeiryoRegular (OS/2)     96 bytes (gfxDWriteFont::ComputeMetrics)
  font: MeiryoRegular (cmap) 296728 bytes (gfxDWriteFontEntry::ReadCMAP)
  font: MeiryoRegular (GDEF)     54 bytes (hb_blob_t gfxDWriteFont::GetFontTable)
  font: MeiryoRegular (GSUB)  51060 bytes (hb_blob_t gfxDWriteFont::GetFontTable)
  font: MeiryoRegular (GPOS)  29870 bytes (hb_blob_t gfxDWriteFont::GetFontTable)
  font: MeiryoRegular (head)     54 bytes (hb_blob_t gfxDWriteFont::GetFontTable)
  font: MeiryoRegular (cmap) 296728 bytes (hb_blob_t gfxDWriteFont::GetFontTable)
  font: MeiryoRegular (hhea)     36 bytes (hb_blob_t gfxDWriteFont::GetFontTable)
  font: MeiryoRegular (hmtx)  95768 bytes (hb_blob_t gfxDWriteFont::GetFontTable)
  
Conclusions:

The dwrite code is obviously doing some really dumb caching.  At cold
startup it reads in a significant fraction of *all* fonts.  For fonts
where harfbuzz reads relatively small tables, the dwrite code is pulling
in the *entire* font (or almost all of it).  The use of the CJK fonts
explains why this behavior is so deadly in the facebook case, it's
pulling in over 40MB of CJK fonts alone.

IE9 Beta clearly suffers from the same problem (i.e. 100MB of font IO at
startup).  My guess is that their code relies on dwrite shaping which
reduces some of the IO load.

It would be good to get MS to fix this, especially since it appears to
also affect IE9.  As a workaround, I'm going to see if I can write a
version of GetFontTable that pulls the data via GDI calls rather than
DirectWrite.
Comment 10 John Daggett (:jtd) 2010-10-19 07:56:53 PDT
Created attachment 484314 [details] [diff] [review]
another debug patch, neuter font fallback and dump table read info
Comment 11 (dormant account) 2010-10-19 08:57:06 PDT
So what does chrome do to avoid loading fonts?
Comment 12 John Daggett (:jtd) 2010-10-19 16:54:45 PDT
(In reply to comment #11)
> So what does chrome do to avoid loading fonts?

Compared to the dwrite case or the gdi case?  They're obviously not using dwrite API's yet, since they don't suffer any of the weird IO side effects.  Font fallback is hard-wired in Chrome.  Firefox uses pref settings that are tied to the default fonts for Firefox locales, falling back to system fonts when that fails.  Because of our pref font settings, in the GDI case we look at more fonts than they do.  The IO involved is 1.4MB Chrome vs. 3.4MB FF trunk, it's nothing compared to the dwrite case.

The options here are to get MS to fix dwrite caching behavior or to route font table reads to GDI API's or some combination of the two.
Comment 13 (dormant account) 2010-10-20 05:25:45 PDT
(In reply to comment #12)
> (In reply to comment #11)
> > So what does chrome do to avoid loading fonts?
> 
> Compared to the dwrite case or the gdi case?  They're obviously not using
> dwrite API's yet, since they don't suffer any of the weird IO side effects. 
> Font fallback is hard-wired in Chrome.  Firefox uses pref settings that are
> tied to the default fonts for Firefox locales, falling back to system fonts
> when that fails.  Because of our pref font settings, in the GDI case we look at
> more fonts than they do.  The IO involved is 1.4MB Chrome vs. 3.4MB FF trunk,
> it's nothing compared to the dwrite case.

In your table above there is 84mb in gdi/hb case. Unless we are doing a lot of places/urlclassifier IO that sounds like ~30mb fonts are being loaded.  Can you email me the xperf log for that?

What prefs do I have to set to reproduce only 3.4mb of font IO on facebook? I haven't been able to get Firefox to not enumerate tons of fonts with dwrite on or off.
Comment 14 John Daggett (:jtd) 2010-10-20 05:35:50 PDT
Created attachment 484662 [details]
FontEnumeration.cpp, with table reading code

Based on Windows 7.1 SDK sample code.  This version reads in the cmap table from the same fonts used when rendering Facebook/Twitter pages in previous tests.  On cold startup this code reads in roughly 70MB of font data.
Comment 15 John Daggett (:jtd) 2010-10-20 05:48:53 PDT
(In reply to comment #13)

> What prefs do I have to set to reproduce only 3.4mb of font IO on facebook? I
> haven't been able to get Firefox to not enumerate tons of fonts with dwrite on
> or off.

Use the "another debug patch" code, that will disable font fallback and the cmap loading code.

I'll send you the xperf files.
Comment 16 John Daggett (:jtd) 2010-10-20 05:52:12 PDT
(In reply to comment #13)

> In your table above there is 84mb in gdi/hb case. Unless we are doing a lot of
> places/urlclassifier IO that sounds like ~30mb fonts are being loaded.  Can you
> email me the xperf log for that?

This is running a debug build, so 49MB of the 84MB is reading in xul.dll.
Comment 17 Jonathan Kew (:jfkthame) 2010-10-20 05:52:52 PDT
It looks like we'd get a big cold-startup win if our installer configured the Windows FontCache service to auto-start (instead of starting up on demand when the first application uses it). That removes the aggressive reading of fonts out of our process and lets it happen before Firefox is launched at all.

There is sample code on MSDN that shows how to interact with the Service Control Manager to do this; see http://msdn.microsoft.com/en-us/library/bb540473(v=VS.85).aspx. This example includes an "enable" function that sets a service to "enabled" (manual startup on demand). by using the value SERVICE_AUTO_START instead of SERVICE_DEMAND_START in the call to ChangeServiceConfig, this can make the service be auto-started on system boot.

This would have to be done by the installer, not by Firefox, as administrative rights are needed to interact with the Service Control Manager.

A possible downside is that making this change might noticeably increase the general Windows boot time and/or RAM footprint, and it wouldn't be at all clear to users why simply installing Firefox (not necessarily even running it) should have that effect, or how to reverse it if desired.
Comment 18 (dormant account) 2010-10-20 06:51:50 PDT
(In reply to comment #15)
> 
> Use the "another debug patch" code, that will disable font fallback and the
> cmap loading code.
> 
> I'll send you the xperf files.

Thank you. The gdi/hf profile looks good.

Jonathan Kew, windows font cache trick sounds interesting. Main problem is that in the best case this only hides the problem and in the worst case the fonts will get read twice(ie if the cache ever gets flushed in low memory conditions). It's worth suggesting as a workaround for users experiencing this problem while it gets fixed.
Comment 19 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-10-20 13:22:01 PDT
(In reply to comment #18)
> Jonathan Kew, windows font cache trick sounds interesting. Main problem is that
> in the best case this only hides the problem

What do you mean by "hide"? If the user can no longer see a problem, then there is no problem :-)

> and in the worst case the fonts
> will get read twice(ie if the cache ever gets flushed in low memory
> conditions). It's worth suggesting as a workaround for users experiencing this
> problem while it gets fixed.

"Fixing" it (e.g. by persisting the font cache across reboots) is out of our control, unless we reimplement the font cache (in which case we'd be wasting machine resources by duplicating work if the user has run or will run any other D2D or WPF application).

The cache service builds a file, so it probably persists through low memory conditions.

Having the installer set the font cache service to autostart seems like it could be a good approach. We should measure boot time with and without it, and time for cold-starting Firefox immediately after boot with and without it.
Comment 20 Shawn Wilsher :sdwilsh 2010-10-25 07:29:19 PDT
Per bug 600713 comment 61, this is required to fix bug 600713.  As a result, it's not blocking.

(In reply to comment #19)
> Having the installer set the font cache service to autostart seems like it
> could be a good approach. We should measure boot time with and without it, and
> time for cold-starting Firefox immediately after boot with and without it.
However that will not help our users who update via the updater.
Comment 21 Emanuel Hoogeveen [:ehoogeveen] 2010-10-25 11:40:29 PDT
Or anyone using a portable installation of Firefox. It still sounds like a good thing to enable where possible though, given how much computer use nowadays is browser-related.
Comment 22 John Daggett (:jtd) 2010-10-25 13:27:27 PDT
(In reply to comment #20)
> Per bug 600713 comment 61, this is required to fix bug 600713.  As a result,
> it's not blocking.

No, that's not what that comment means.  It means that huge io done within dwrite code is in large part the reason for the problems reported in bug 593614.  The fix needed on 600713 is an associated factor.
Comment 23 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2010-10-25 13:32:57 PDT
(In reply to comment #20)
> Per bug 600713 comment 61, this is required to fix bug 600713.  As a result,
> it's not blocking.
> 
> (In reply to comment #19)
> > Having the installer set the font cache service to autostart seems like it
> > could be a good approach. We should measure boot time with and without it, and
> > time for cold-starting Firefox immediately after boot with and without it.
> However that will not help our users who update via the updater.

We have the ability to run code from the installer on application update.
Comment 24 Shawn Wilsher :sdwilsh 2010-10-25 13:42:18 PDT
(In reply to comment #22)
> No, that's not what that comment means.  It means that huge io done within
> dwrite code is in large part the reason for the problems reported in bug
> 593614.  The fix needed on 600713 is an associated factor.
Then I'm confused why you said we have to figure out this bug first over there...
Comment 25 John Daggett (:jtd) 2010-10-25 13:46:52 PDT
(In reply to comment #24)

> Then I'm confused why you said we have to figure out this bug first over
> there...

The comment was in response to Dietrich asking for an update.  We figure out a fix here, then I work on 600713.  They are only dependent from a scheduling perspective.
Comment 26 Shawn Wilsher :sdwilsh 2010-10-25 13:51:34 PDT
OK then.  My reasons for blocking on this bug no longer apply, so pulling those bits.
Comment 27 John Daggett (:jtd) 2010-10-25 13:53:44 PDT
Well, this blocks a blocking bug so it probably needs to block also.  This is fun.
Comment 28 John Daggett (:jtd) 2010-10-26 02:10:45 PDT
Created attachment 486020 [details]
FontEnumeration.cpp, time font enumeration with DirectWrite

Sample code to test font enumeration timing.

Options: -t    read in font tables from a selection of fonts
         -a    read in font tables from all fonts
         -c    read cmap table
         -m    query the modified date of the font
         -v    verbose output
         -l    test localized family name lookups

Example:

FontEnumeration.exe -a -c -v

Times reading in all cmaps for all families on the system, along with the initial DWrite factory and SystemFontCollection creation times.
Comment 29 Damon Sicore (:damons) 2010-11-09 15:54:12 PST
Adding this back to the blocking list again.  Talking with Taras, this bug and bug 600713 are the two bugs that have the most impact on startup.

To be clear: the concerns/effects/issues addressed in both of these bugs need to be resolved.
Comment 30 Scott A. 2010-11-16 15:58:00 PST
Is this a simple thing to tackle or will these 2 bugs be at risk for FF4?
Comment 31 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-11-16 16:00:24 PST
jtd: did you send the test to Microsoft?  Did they have any feedback?
Comment 32 John Daggett (:jtd) 2010-11-16 16:34:29 PST
(In reply to comment #30)
> Is this a simple thing to tackle or will these 2 bugs be at risk for FF4?

We can adjust our code to avoid problematic calls to DirectWrite to some extent and work with MS to fix the underlying perf problems (e.g. not much we can do about GetSystemFontCollection latency).
Comment 33 John Daggett (:jtd) 2010-11-16 16:37:25 PST
(In reply to comment #31)
> jtd: did you send the test to Microsoft?  Did they have any feedback?

Yes, I sent the sample code with detailed notes and xperf traces to both the manager of the DirectX team and folks on the IE team (since some of this affects them also).  I have a contact now of one of the DirectWrite engineers and I'm waiting to hear from him.  I'm planning to ping him again this week.
Comment 34 (dormant account) 2010-11-18 11:02:34 PST
So while we are waiting for this, can you figure out why chrome does not need to enumerate fonts the way we do?
Comment 35 John Daggett (:jtd) 2010-11-18 20:57:43 PST
(In reply to comment #34)
> So while we are waiting for this, can you figure out why chrome does not need
> to enumerate fonts the way we do?

In the context of this bug, Chrome doesn't use dwrite yet so they don't hit any of the problems here.

For family name lookups, Chrome does lookups with straight GDI calls so it has all the problems of GDI font name lookup and fails tests in the CSS 2.1 test suite because of this.  For example, the page below will render differently between Windows/OSX versions of Chrome even though the same basic fonts exist in both places:

  http://people.mozilla.org/~jdaggett/tests/arialvariations.html

For font fallback, Chrome relies on using hard coded font fallback settings (i.e. "for this codepoint, try these fonts...").  For codepoints that fall outside those it just punts; even with a font that supports a given codepoint it displays a missing character.  This method doesn't really scale as more codepoints are added in Unicode (e.g. Unicode 6.0 "oh joy" addition: codepoints for emoticons, including U+1F4A9 == Pile of Poo).

So the main reason we do font enumeration is for font fallback but we can definitely come up with ways that avoid the need to do expensive operations at startup in the majority of cases, including caching fallback info.  This will be dealt with in bug 600713, not this bug.
Comment 36 John Daggett (:jtd) 2010-11-21 19:41:09 PST
Created attachment 492242 [details]
FontEnumeration.cpp, time font enumeration with DirectWrite or GDI font table loading

Added '-g' flag to sample code to read in font tables using GDI interop calls rather than DirectWrite.

Cold startup I/O with different font table reading methods:

* FontEnumeration.exe -t -c (reads in cmaps from select group of fonts using DirectWrite table loading)

* FontEnumeration.exe -t -c -g (same but using GDI calls instead)

I/O counts in xperf (using 'xperf -on DiagEasy+FileIO'):

  DirectWrite table loads - program 28MB, fontcache service 10MB
  GDI table loads         - program  3MB, no fontcache I/O

The selected set of fonts are close to those that are used in the Facebook/Twitter pages in comment 9.
Comment 37 John Daggett (:jtd) 2010-11-22 18:59:54 PST
Created attachment 492552 [details] [diff] [review]
part 1v3, rework code for doing localized font name lookup

The system font collection contains all the localized name information, so load it in when iterating over the family names.  Avoid loading names on ASCII family name miss, this is currently causing all the name tables to be hit.  On localized systems (e.g. Japanese), for the GDI case the English name will be the "other" case.
Comment 38 John Daggett (:jtd) 2010-11-22 19:01:35 PST
Created attachment 492553 [details] [diff] [review]
part 2v3, reftests to localized name lookups

Test uses localized name, ref case uses English name.  If the font is not present, both cases will use the default font and trivially pass.
Comment 39 John Daggett (:jtd) 2010-11-22 19:10:53 PST
Created attachment 492556 [details] [diff] [review]
part 3v1, use GDI table loading in the dwrite case

Load tables via GDI to try and avoid dwrite font caching behavior (i.e. when reading in a single small table, dwrite reads the entire font sometimes).  Also adds debug timing and pushes out the cmap loads to 120s.  Need to put together a tryserver build of this for wider testing.

The pref 'gfx.font_rendering.directwrite.use_gdi_table_loading' controls table loading behavior.  When enabled, GDI font table loading is used, otherwise dwrite table loading is used.

Part of the problem testing this is that testing the same startup sequence repeatedly appears to "train" the dwrite font cache code to not have huge delays, which means I'm not always sure whether a code change I make is helping/hurting or if the dwrite caching code is behaving different due to system environment changes (e.g. state of system caches).
Comment 40 (dormant account) 2010-11-22 19:16:13 PST
(In reply to comment #39)

> Part of the problem testing this is that testing the same startup sequence
> repeatedly appears to "train" the dwrite font cache code to not have huge
> delays, which means I'm not always sure whether a code change I make is
> helping/hurting or if the dwrite caching code is behaving different due to
> system environment changes (e.g. state of system caches).

Indeed, a reboot is the only reliable way to clear system caches and even then prefetch may be kicking in.

Perhaps running the mmap trick from https://bugzilla.mozilla.org/show_bug.cgi?id=532965#c1 will help evict entries? You can also turn down the amount of ram in the bios, then it will be quicker to clear.
Comment 41 John Daggett (:jtd) 2010-11-22 19:49:05 PST
(In reply to comment #40)
> Indeed, a reboot is the only reliable way to clear system caches and even then
> prefetch may be kicking in.

I was actually talking about *across* reboots, sorry I wasn't clearer.  I'm guessing the font cache service is saving access data across reboots thus affecting caching behavior.  Right now, I don't know of any way of clearing out the font cache data other than explicitly hunting around and deleting cache files.
Comment 42 Emanuel Hoogeveen [:ehoogeveen] 2010-11-22 22:31:50 PST
Would disabling the font cache service altogether for testing be counterproductive? i.e. you could say the aim was to minimize disk I/O on cold startup for a fresh install of Windows. Then investigate optimizing use of the system cache as a secondary goal, if it turns out to be worth it (with Bug 600713 fixed).
Comment 43 Jonathan Kew (:jfkthame) 2010-11-23 09:21:33 PST
Comment on attachment 492552 [details] [diff] [review]
part 1v3, rework code for doing localized font name lookup

>+	    nsRefPtr<gfxFontFamily> fam;
>+
>+        if (mFontFamilies.GetWeak(name))
>+	    continue;
>+        

Several nits: spaces on the blank line; braces needed for the body of the "if", please; tab characters need to be converted to spaces. These things (braces needed for single-statement body, and tab characters) occur quite a bit in this file (thanks to Visual Studio settings?), so please do a global check of the patch.

>+    // set when other family names have been read in
>+    void OtherFamilyNamesInitialized() {
>+        mOtherFamilyNamesInitialized = PR_TRUE;
>+    }

Could we name this "SetOtherFamilyNamesInitialized()", please? Otherwise it sounds like a query rather than a setter, IMO.

>@@ -497,9 +502,6 @@ public:
>     // mark this family as being in the "bad" underline offset blacklist
>     void SetBadUnderlineFamily() {
>         mIsBadUnderlineFamily = PR_TRUE;
>-        if (mHasStyles) {
>-            SetBadUnderlineFonts();
>-        }
>     }

Are you sure this is OK? ISTM that it's still possible at least on some platforms for a family to be populated with styles, and then later we read a new (localized) family name that might be in the blacklist; in that case, we'd need to mark the faces here, wouldn't we?

>-    if ((familyEntry = mOtherFamilyNames.GetWeak(key, &found))) {
>+    if (familyEntry = mOtherFamilyNames.GetWeak(key, &found)) {

This change will probably lead to compiler warnings ("did you mean == ?") in some environments, which is what the extra parens are designed to suppress. If anything, I'd prefer to make it more explicit by writing

    if ((familyEntry = mOtherFamilyNames.GetWeak(key, &found)) != nsnull)

or separating the assignment out into its own statement, before the conditional. Or else just leave it untouched.

>@@ -472,8 +474,6 @@ gfxPlatformFontList::AddOtherFamilyName(
>         PR_LOG(gFontListLog, PR_LOG_DEBUG, ("(fontlist-otherfamily) canonical family: %s, other family: %s\n", 
>                                             NS_ConvertUTF16toUTF8(aFamilyEntry->Name()).get(), 
>                                             NS_ConvertUTF16toUTF8(aOtherFamilyName).get()));
>-        if (mBadUnderlineFamilyNames.Contains(key))
>-            aFamilyEntry->SetBadUnderlineFamily();
>     }
> }

Why don't we need to check the bad-underline blacklist here? Isn't that making assumptions about what names are used in the list?
Comment 44 John Daggett (:jtd) 2010-11-23 22:01:04 PST
Created attachment 492911 [details] [diff] [review]
part 1v4, rework code for doing localized font name lookup

Reworked based on review comments.

For the dwrite code, the SetUnderline poo can all be done in InitFontList but not for the GDI case (grrrr) so I've put back the code that I removed in the last patch.
Comment 45 John Daggett (:jtd) 2010-11-23 22:02:21 PST
Created attachment 492912 [details] [diff] [review]
part 3v1a, use GDI table loading in the dwrite case

Refresh patch to latest trunk & other patches
Comment 46 Jonathan Kew (:jfkthame) 2010-11-24 02:13:35 PST
Comment on attachment 492911 [details] [diff] [review]
part 1v4, rework code for doing localized font name lookup

Looks fine. (I haven't tried measuring the performance impact of this; do you have any figures for that?)
Comment 47 Jonathan Kew (:jfkthame) 2010-11-24 02:25:30 PST
Comment on attachment 492553 [details] [diff] [review]
part 2v3, reftests to localized name lookups

Hmm, basically looks ok but what's up with localized-family-names-005? Seems to be an empty test.

(Also, it seems odd to have a few missing numbers in the sequence.)

Finally, do you really think it's worthwhile testing such a large number of cases? I'd be inclined to just test the behavior using a handful each of the standard fonts normally available on each platform - maybe a few for each script - rather than expand the test suite quite so much.
Comment 48 John Daggett (:jtd) 2010-11-24 22:54:57 PST
Created attachment 493194 [details] [diff] [review]
part 3v1b, use GDI table loading in the dwrite case

trim out tabs, dump timing data to file, comment out system fallback
Comment 49 John Daggett (:jtd) 2010-11-24 22:58:09 PST
(In reply to comment #42)
> Would disabling the font cache service altogether for testing be
> counterproductive? i.e. you could say the aim was to minimize disk I/O on cold
> startup for a fresh install of Windows. Then investigate optimizing use of the
> system cache as a secondary goal, if it turns out to be worth it (with Bug
> 600713 fixed).

Keep in mind, the problem here is mainly related to cold startup.  On warm startup the font cache helps make things much quicker.  I am testing with all modes (font cache disabled, manual and automatic) just to be safe.
Comment 50 John Daggett (:jtd) 2010-11-25 22:49:42 PST
Created attachment 493336 [details] [diff] [review]
part 2v4, reftests to localized name lookups

Revised based on review comments.  I trimmed out some of the duplicates and reordered things but I'd still like to keep this number.  Part of the reason for this is that some names are pulled in because a name is pref font, others by the lazy loader (under GDI/OSX).  These fonts also cover different platforms/versions (e.g. Win7/XP/10.5/10.6).
Comment 51 John Daggett (:jtd) 2010-11-25 22:51:39 PST
Created attachment 493337 [details] [diff] [review]
part 3v1c, use GDI table loading in the dwrite case

Write out timing info to a file 'fonttimings.out' in the bin directory for use with more testing.
Comment 52 Jonathan Kew (:jfkthame) 2010-11-26 05:30:48 PST
Created attachment 493375 [details] [diff] [review]
part 2v5, alternate version of patch to add the reftests

(In reply to comment #50)
> Created attachment 493336 [details] [diff] [review]
> part 2v4, reftests to localized name lookups
> 
> Revised based on review comments.  I trimmed out some of the duplicates and
> reordered things but I'd still like to keep this number.  Part of the reason
> for this is that some names are pulled in because a name is pref font, others
> by the lazy loader (under GDI/OSX).  These fonts also cover different
> platforms/versions (e.g. Win7/XP/10.5/10.6).

OK, but in that case how about merging all the tests into a single file (as in the attached version), rather than a separate file per font. This makes the test run a LOT quicker, and ISTM that it's actually simpler to review and maintain in this form. Any failure would still be easy to pick out in the reftest images. WDYT?
Comment 53 John Daggett (:jtd) 2010-11-28 19:03:55 PST
Created attachment 493621 [details] [diff] [review]
part 2v6, reftests to localized name lookups

(In reply to comment #52)
> OK, but in that case how about merging all the tests into a single file (as in
> the attached version), rather than a separate file per font. This makes the
> test run a LOT quicker, and ISTM that it's actually simpler to review and
> maintain in this form. Any failure would still be easy to pick out in the
> reftest images. WDYT?

I did this to begin with but then broke them out into separate tests.  In the past, I've hit flakey problems with platforms tested in constrained VMs, loading a lot of fonts at once can cause random failures, generally on platforms you never realized we supported... ;)

So as a compromise, I've set it up so there are 4 files which test 10 fonts per page.
Comment 54 John Daggett (:jtd) 2010-11-29 00:59:41 PST
Created attachment 493644 [details] [diff] [review]
part 3v2, use GDI table loading in the dwrite case (with debug logging)
Comment 55 Scott A. 2010-11-29 16:22:23 PST
What kind or realistic numbers are seen on cold start improvements? Does it fix the high HDD activity on cold start? Or are those things still dependent on 600713 or some other bug.  Would like to see this make Beta 8 if it's significant.
Comment 56 John Daggett (:jtd) 2010-11-30 00:30:24 PST
Created attachment 493925 [details] [diff] [review]
part 3v3, use GDI table loading in the dwrite case (with debug logging)

Added logging around CreateFontFace and reverted the background cmap loading code to match trunk behavior.  Using Process Monitor, verified that only I/O for a given table is done rather the entire face.

The code in gfxDWriteFont creates a IDWriteFontFace obj at construction.  Turns out that's a sucky call, that's the precise point at which DirectWrite loads the entire font (i.e. does manual sync I/O on the entire font) if running under cold startup conditions.  On warm startup, it uses data in the system font cache.  Next step is to make this lazier as is done for GDI font objects to avoid I/O unless absolutely necessary.
Comment 57 Jonathan Kew (:jfkthame) 2010-11-30 01:16:17 PST
Comment on attachment 493621 [details] [diff] [review]
part 2v6, reftests to localized name lookups

r=jfkthame, with a request/suggestion: I think it'd be nice to add the English names alongside the localized ones in the actual text that's displayed (like I did in v5). ISTM that will be a convenience for whoever's looking at the images if there's ever a failure.
Comment 58 John Daggett (:jtd) 2010-11-30 21:52:06 PST
Created attachment 494309 [details] [diff] [review]
part 3v4, use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily (with debug logging)

Add in code to only load the metrics (and create the expensive-on-cold-startup IDWriteFontFace) when necessary.
Comment 59 John Daggett (:jtd) 2010-12-01 05:08:12 PST
The DirectWrite code path suffers from two DirectWrite API problems on cold startup:

  1. The GetSystemFontCollection method often decides to scan all fonts in the system after a reboot.  On a slow machine this can cause a 5 second delay, made significantly worse if a file-level virus scanner is running.
  2. When a IDWriteFontFace object is created the DirectWrite code often reads the *entire* font into memory.

The latest patches are primarily focused on reducing the need to create IDWriteFontFace objects, by using GDI API's for font table reads and creating these objects as lazily as possible. 

I set up a tryserver build with the latest set of patches.  There are still reftest failures but the build should be fine for doing perf testing:

  http://bit.ly/gEARUZ

This build should significantly reduce the level of I/O done at startup.  This build includes logging that will spit out timing info on the time spent in InitFontList and logs all font table reads and IDWriteFontFace object creations.  Data is appended to a file 'fonttiming.out' in the bin directory each time the app is run.

The testcase I've been using is:

1. Set up a profile to load two tabs at startup:

  www.facebook.com
  www.twitter.com/sohbunshu

2. Reboot
3. Run browser and wait 30 seconds

The pref 'gfx.font_rendering.directwrite.use_gdi_table_loading' is true by default.  Disable it to use the existing dwrite table loading.

Note that all of this concerns, *cold* startup behavior, there's little change in warm bootup behavior since the browser is able to pull in data from a font cache quickly on warm bootup.

I should also add that cold startup time is *very* adversely affected by virus scanners when GetSystemFontCollection scans all fonts on the system, on my testing netbook the time spent in this method with a virus scanner enabled varied from 25-40 *seconds* (not ms, actual seconds!).  With the virus scanner off, this is reduced to a still unacceptable 5 seconds.

I'll do more testing tomorrow and post startup times and I/O stats, with Chrome/IE9 comparisons for reference.
Comment 60 John Daggett (:jtd) 2010-12-02 00:25:38 PST
Using the tryserver build listed above, I ran some timing/io tests today
on a netbook machine: ASUS Eee PC, Atom N455 1.66GHz, 1GB RAM, Virus
checker disabled.

Cold startup times and font I/O
(font io is over first 2min):

  Minefield      54.7s   font io: 309MB 
  try-dwriteload 40.7s   font io: 300MB
  try-gdiload    35.1s   font io:  57MB
  
  IE9beta        23.6s   font io:  38MB
  FF3.6          18.6s   font io:  19MB
  chrome9dev     19.4s   font io:   4MB
  
Warm startup times (for comparison):

  Minefield       5.2s
  chrome9dev      3.1s

These results generally reflect when the DirectWrite font cache
determined that it needed to scan the font folder.  In some cases it
*doesn't* do this and the startup time 20-25 seconds.
Comment 61 John Daggett (:jtd) 2010-12-06 01:01:22 PST
Created attachment 495468 [details] [diff] [review]
patch, load dwrite dll at startup to start FntCache service earlier

I spoke with a member of the DirectWrite dev team at the end of last week.  When a DirectWrite IDWriteFactory object is created it communicates with the system to start the FntCache service.  For subsequent calls, the client code in dwrite.dll will try to communicate with the system FntCache service but if the service is still starting up it's possible the communication will time out.  If that occurs the client dwrite code then assumes the service isn't running and does the work itself (e.g. building the system font collection by reading all fonts manually).

My hunch is that we're hitting this because we load the dwrite library, call CreateDWriteFactory and immediately ask call the GetSystemFontCollection.  Since the service is still starting up, the communication can time out (since there's lots of I/O at app startup).  So my hunch by creating a factory object earlier to kick off the FntCache service, we'll reduce the chance of our hitting the degenerate case performance-wise.

I'm going to put this together in a tryserver build and test it tomorrow.
Comment 62 (dormant account) 2010-12-06 16:12:05 PST
(In reply to comment #61)
> Created attachment 495468 [details] [diff] [review]
> patch, load dwrite dll at startup to start FntCache service earlier
> 
> I spoke with a member of the DirectWrite dev team at the end of last week. 
> When a DirectWrite IDWriteFactory object is created it communicates with the
> system to start the FntCache service.  For subsequent calls, the client code in
> dwrite.dll will try to communicate with the system FntCache service but if the
> service is still starting up it's possible the communication will time out.  If
> that occurs the client dwrite code then assumes the service isn't running and
> does the work itself (e.g. building the system font collection by reading all
> fonts manually).
> 
> My hunch is that we're hitting this because we load the dwrite library, call
> CreateDWriteFactory and immediately ask call the GetSystemFontCollection. 
> Since the service is still starting up, the communication can time out (since
> there's lots of I/O at app startup).  So my hunch by creating a factory object
> earlier to kick off the FntCache service, we'll reduce the chance of our
> hitting the degenerate case performance-wise.

Can we instead check if the font service is ready before touching dwrite(fallback on gdi and/or cached dwrite metrics in meantime)?
Comment 63 Scott A. 2010-12-12 13:33:00 PST
Any ETA on this? Cold startup is very painful.
Comment 64 Jim Jeffery not reading bug-mail 1/2/11 2010-12-13 08:35:51 PST
*** Bug 618831 has been marked as a duplicate of this bug. ***
Comment 65 Luke Usherwood 2010-12-23 04:30:55 PST
Created attachment 499499 [details]
Resource monitor: FF cold start with FontCacheService running

I started Windows Font Cache Service manually, waited 2 minutes or so, then ran Firefox Beta 8. Presuming this is what the patch (comment #61) will do, it looks like it won't help. The tip from the DirectWrite dev team might be a red-herring.

In this screenshot of Resource Monitor: The orange line shows Firefox in the top throughput graph. Each grid is 5 seconds. When looking at Disk Queue length: 
 - C: is a harddrive with Windows, Fonts and my Firefox profile
 - S: is a SSD holding Firefox program code. 

Firefox displays and becomes responsive around 32s, the third 'blip' on the S: graph.
Comment 66 Bas Schouten (:bas.schouten) 2010-12-23 04:55:33 PST
(In reply to comment #65)
> Created attachment 499499 [details]
> Resource monitor: FF cold start with FontCacheService running
> 
> I started Windows Font Cache Service manually, waited 2 minutes or so, then ran
> Firefox Beta 8. Presuming this is what the patch (comment #61) will do, it
> looks like it won't help. The tip from the DirectWrite dev team might be a
> red-herring.
> 
> In this screenshot of Resource Monitor: The orange line shows Firefox in the
> top throughput graph. Each grid is 5 seconds. When looking at Disk Queue
> length: 
>  - C: is a harddrive with Windows, Fonts and my Firefox profile
>  - S: is a SSD holding Firefox program code. 
> 
> Firefox displays and becomes responsive around 32s, the third 'blip' on the S:
> graph.

So the real question is, why for example on my machine, where the service is always started (for whatever reason, Steam or Windows Live Messenger or other DWrite software that I run presumably). This doesn't happen at all. PErhaps it needs some other prodding than just service start.
Comment 67 Luke Usherwood 2010-12-23 05:48:50 PST
Created attachment 499506 [details]
System info corresponding to previous attachment
Comment 68 Luke Usherwood 2010-12-23 05:58:21 PST
(In reply to comment #66)
> So the real question is, why for example on my machine, where the service is
> always started (for whatever reason, Steam or Windows Live Messenger or other
> DWrite software that I run presumably). This doesn't happen at all. PErhaps it
> needs some other prodding than just service start.

I've attached a screenshot of about:support (Graphics) and dxinfo, in case any of the differences in system hardware/drivers help spark an idea.

My start page is similar to this one: http://trac.edgewall.org/
Comment 69 John Daggett (:jtd) 2011-01-04 23:52:09 PST
Created attachment 501273 [details] [diff] [review]
part 3v5, use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily (with debug logging)
Comment 70 Scott A. 2011-01-05 08:51:17 PST
Any possibility of this making beta9?
Comment 71 Joe Drew (not getting mail) 2011-01-05 13:43:43 PST
Probably not.
Comment 72 John Daggett (:jtd) 2011-01-06 21:41:50 PST
Created attachment 501921 [details] [diff] [review]
part 3v6, use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily (with debug logging)

Fix up some reftest failures.
Comment 74 Scott A. 2011-01-07 16:51:10 PST
What about part 3?
Comment 75 John Daggett (:jtd) 2011-01-09 17:05:15 PST
(In reply to comment #74)
> What about part 3?

Still working on tracking down some reftest failures.

Tryserver build, tryserver changeset 93caff8631f8:
http://people.mozilla.org/~jdaggett/dwrite/
Comment 76 John Daggett (:jtd) 2011-01-11 00:07:45 PST
Created attachment 502747 [details] [diff] [review]
part 3v7, use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily

Cleaned up the patch so that GDI table access is only used at the font entry level.  Once the metrics code is getting called the dwrite code will have already cached the font data so it's better to use the dwrite api at that point.

Also trimmed out the font substitution code from the dwrite fontlist.  This code was to support font aliases from older versions of Windows but since the contents of the fontlist is changing in the dwrite case I think it's better we make a clean break here rather than burden all font lookups with a check for font names from older Windows versions.  GDI still uses this but I reworked the code to read the registry more efficiently.
Comment 77 Bas Schouten (:bas.schouten) 2011-01-11 13:54:44 PST
FWIW, Microsoft released an update which is flagged as 'Important' on my machine and it switches the service startup to 'Automatic (Delayed start)' for me.

http://support.microsoft.com/?kbid=2454826

Light text on dark background is also improved, see:

http://img821.imageshack.us/img821/5480/comparisonlightdark.png
Comment 78 Scott A. 2011-01-11 14:19:07 PST
(In reply to comment #77)
> FWIW, Microsoft released an update which is flagged as 'Important' on my
> machine and it switches the service startup to 'Automatic (Delayed start)' for
> me.
> 
> http://support.microsoft.com/?kbid=2454826
> 
> Light text on dark background is also improved, see:
> 
> http://img821.imageshack.us/img821/5480/comparisonlightdark.png

I can confirm... I checked for updates at 12pm EST today and found nothing, checked again now at 5PM and got a few updates.. got important updates KB976902 KB2419640  and optional update KB2454826  .... The later optional update mentions performance improvement with graphics in the description. Not sure what all of these do but I'll do some testing to see if there is any difference.
Comment 79 Scott A. 2011-01-11 14:44:32 PST
I don't have actual numbers but KB2454826 does seem to improve cold start performance. It seems like it could still be better so maybe the rest of it will be fixed with the last part of the patch in this bug.
Comment 80 John Daggett (:jtd) 2011-01-11 17:05:01 PST
(In reply to comment #79)
> I don't have actual numbers but KB2454826 does seem to improve cold start
> performance. It seems like it could still be better so maybe the rest of it
> will be fixed with the last part of the patch in this bug.

Scott, did you get a chance to test the tryserver build noted in comment 75?
Comment 81 John Daggett (:jtd) 2011-01-11 17:06:52 PST
Note that the optional update KB2454826 replaces KB2028560 which resolved the multiple font file mappings occurring in bug 617266.
Comment 82 Scott A. 2011-01-11 17:51:16 PST
(In reply to comment #80)
> (In reply to comment #79)
> > I don't have actual numbers but KB2454826 does seem to improve cold start
> > performance. It seems like it could still be better so maybe the rest of it
> > will be fixed with the last part of the patch in this bug.
> 
> Scott, did you get a chance to test the tryserver build noted in comment 75?

Yes I did, and it was an improvement, but currently I am testing the latest trunk and the beta9 candidate, I haven't redownloaded the tryserver build to see how it reacts with this patch... Is part3 even still a necessary patch?
Comment 83 John Daggett (:jtd) 2011-01-11 18:24:22 PST
(In reply to comment #82)
> Is part3 even still a necessary patch?

It would be nice if it wasn't necessary (i.e. the latest MS updates fixed the dwrite io problems) but color me skeptical.
Comment 84 Csaba Kozák [:WonderCsabo] 2011-01-11 23:33:37 PST
(In reply to comment #77)
> FWIW, Microsoft released an update which is flagged as 'Important' on my
> machine and it switches the service startup to 'Automatic (Delayed start)' for
> me.
> 
> http://support.microsoft.com/?kbid=2454826
> 
> Light text on dark background is also improved, see:
> 
> http://img821.imageshack.us/img821/5480/comparisonlightdark.png

Could you provide your testcase which is shown by the screenshot?
Comment 85 John Daggett (:jtd) 2011-01-13 00:52:30 PST
Created attachment 503434 [details]
FontEnumeration.cpp, time font enumeration with DirectWrite or GDI font table loading

Added function for looking up dwrite fonts via the GDI interop object as one potential way around looking up fonts via the system font collection.  Even with the latest MS updates, I still see the bad case behavior on my netbook test machine (i.e. GetSystemFontCollection does i/o on all font files).  I tested running the test program 15s after startup.  When the GDI interop is used a delay occurs on the first lookup of a font.  Running Process Monitor shows that the dwrite code is doing the same thing, it's reading a small amount of data from all fonts.  Note that the same behavior happens with IE9 beta.  In both the GetSystemFontCollection case and the GDI interop case, roughly 20MB of font I/O occurs within the first use of dwrite lookup methods.
Comment 86 John Daggett (:jtd) 2011-01-17 04:14:55 PST
Created attachment 504412 [details] [diff] [review]
part 3v7 (refresh) use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily
Comment 87 John Daggett (:jtd) 2011-01-17 04:18:18 PST
Created attachment 504415 [details] [diff] [review]
part4 clean up font logs

clean up font logging and include timing info along with font matching and fallback info when it occurs.
Comment 88 John Daggett (:jtd) 2011-01-17 04:24:55 PST
Created attachment 504418 [details] [diff] [review]
part5, load dwrite early to initiate font cache service initialization

Load dwrite.dll and create a dwrite factory object in a background thread at app startup.  Also, move use of GetSystemFontCollection as far out as possible, in this case until the first time a font is looked up.  The registry calls are purely for debugging use, these provide a simple way of inserting a log event into a run when logging with ProcessMonitor.
Comment 89 Scott A. 2011-01-19 15:08:17 PST
An extension has been posted that brings back basic timing statistics under about:startup What are the numbers with and without p3,4,5 ?

Comment 105: https://bugzilla.mozilla.org/show_bug.cgi?id=593743#c105
Comment 90 Scott A. 2011-01-20 21:13:48 PST
b10 will be branched tomorrow (21st), is the plan to try to get this reviewed and landed before then? As per Whiteboard date of the 21st.
Comment 91 John Daggett (:jtd) 2011-01-21 00:33:41 PST
Created attachment 505726 [details] [diff] [review]
part 3v8 use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily

- read font tables via GDI but via DirectWrite once font face is created
- lazy init dwrite font metrics
- make font substitutes list handling faster
- move cmap loader out to 2min post-startup
Comment 92 John Daggett (:jtd) 2011-01-21 00:38:39 PST
Created attachment 505728 [details] [diff] [review]
part4 clean up font logs

- clean up and centralize logfiles
- runtime init logfiles
- allow dumping of fontlist init timing and contents, textrun and font matching info
Comment 93 John Daggett (:jtd) 2011-01-21 00:47:29 PST
Created attachment 505730 [details] [diff] [review]
part5v5 load dwrite early to initiate font cache service initialization

- load the dwrite dll at app startup and create a factory in bg thread
- defer the call to GetSystemFontCollection for as long as possible

The goal with this is to maximize the time between font cache service startup and the GetSystemFontCollection call.  If the font cache service completes (1) stat'ing system font files and (2) reading in font cache data then the later call to GetSystemFontCollection returns in sub-millisecond time.  If the font cache service hasn't completed it's initialization, then the call to GetSystemFontCollection will initiate file io on all font files.  So the effect of this patch is to lessen the chance of this occurring but it doesn't completely eliminate the chance of this startup delay occurring.  These events only occur on cold startup, on warm startup these calls take sub-millisecond times.
Comment 94 Bas Schouten (:bas.schouten) 2011-01-21 01:52:45 PST
Comment on attachment 505726 [details] [diff] [review]
part 3v8 use GDI table loading in the dwrite case and init gfxDWriteFont metrics lazily

>-static const PRUint32 kDelayBeforeLoadingFonts = 8 * 1000; // 8secs
>-static const PRUint32 kIntervalBetweenLoadingFonts = 150; // 150ms
>+static const PRUint32 kDelayBeforeLoadingFonts = 120 * 1000; // 2 minutes after init
>+static const PRUint32 kIntervalBetweenLoadingFonts = 2000;   // every 2 seconds until complete

I'd like to see some documentation on how these values were chosen. For future reference if we ever revisit this.

>@@ -227,9 +239,34 @@ nsresult
> gfxDWriteFontEntry::GetFontTable(PRUint32 aTableTag,
>                                  FallibleTArray<PRUint8> &aBuffer)
> {
>-    nsRefPtr<IDWriteFontFace> fontFace;
>+    gfxDWriteFontList *pFontList = gfxDWriteFontList::PlatformFontList();
>+
>+    if (mFont && pFontList->UseGDIFontTableAccess()) {
>+        LOGFONTW logfont = { 0 };
>+        if (!InitLogFont(mFont, &logfont))
>+            return NS_ERROR_FAILURE;
>+
>+        AutoDC dc;
>+        AutoSelectFont font(dc.GetDC(), &logfont);
>+        if (font.IsValid()) {
>+            PRInt32 tableSize =
>+                ::GetFontData(dc.GetDC(), NS_SWAP32(aTableTag), 0, NULL, NULL);
>+            if (tableSize != GDI_ERROR) {
>+                if (aBuffer.SetLength(tableSize)) {
>+                    ::GetFontData(dc.GetDC(), NS_SWAP32(aTableTag), 0,
>+                                  aBuffer.Elements(), tableSize);

Although we're checking the return of SetLength here, for future safety in case of things getting moved around I'd prefer passing aBuffer.Length() to the buffer size here.

>+    // if loading via GDI, just use GetFontTable
>+    if (mFont && gfxDWriteFontList::PlatformFontList()->UseGDIFontTableAccess()) {
>+        const PRUint32 kCmapTag = TRUETYPE_TAG('c','m','a','p');
>+        AutoFallibleTArray<PRUint8,16384> buffer;
>+
>+        if (GetFontTable(kCmapTag, buffer) != NS_OK)
>+            return NS_ERROR_FAILURE;
>+        PRUint8 *cmap = buffer.Elements();
>+
>+        PRPackedBool  unicodeFont = PR_FALSE, symbolFont = PR_FALSE;
>+        rv = gfxFontUtils::ReadCMAP(cmap, buffer.Length(),
>+                                             mCharacterMap, mUVSOffset,
>+                                             unicodeFont, symbolFont);

Indentation seems off here.


>@@ -331,6 +391,16 @@ gfxDWriteFontEntry::CreateFontFace(IDWri
>     return NS_OK;
> }
> 
>+PRBool
>+gfxDWriteFontEntry::InitLogFont(IDWriteFont *aFont, LOGFONTW *aLogFont)
>+{
>+    HRESULT hr;
>+
>+    BOOL isInSystemCollection;
>+    hr = gfxDWriteFontList::PlatformFontList()->GetGDIInterop()->ConvertFontToLOGFONT(aFont, aLogFont, &isInSystemCollection);

Let's line-break on one of the dereference operators here.

>+#ifdef PR_LOGGING
>+    gFontInitLog = PR_NewLogModule("fontinit");
>+
>+    LARGE_INTEGER frequency;        // ticks per second
>+    LARGE_INTEGER t1, t2, t3, t4, t5, t6;           // ticks
>+    double elapsedTime, upTime;
>+    char nowTime[256], nowDate[256];
>+
>+    if (LOG_ENABLED()) {    
>+        GetTimeFormat(LOCALE_INVARIANT, TIME_FORCE24HOURFORMAT, 
>+                      NULL, NULL, nowTime, 256);
>+        GetDateFormat(LOCALE_INVARIANT, NULL, NULL, NULL, nowDate, 256);
>+        upTime = (double) GetTickCount();

Not too important but if you store this as double you may want to use GetTickCount64.

>-    // Create the list of FontSubstitutes
>-    nsCOMPtr<nsIWindowsRegKey> regKey = 
>-        do_CreateInstance("@mozilla.org/windows-registry-key;1");
>-    if (!regKey) {
>+    HKEY hKey;
>+    DWORD i, rv, lenAlias, lenActual, valueType;
>+    wchar_t aliasName[MAX_VALUE_NAME];
>+    wchar_t actualName[MAX_VALUE_DATA];

I think the windows APIs use WCHAR, I suggest we use that for consistency, although I don't expect the definitions to ever diverge.

>@@ -494,7 +535,7 @@ gfxDWriteFont::GetHintedGlyphWidth(gfxCo
> 
>     DWRITE_GLYPH_METRICS glyphMetrics;
>     HRESULT hr = mFontFace->GetGdiCompatibleGlyphMetrics(
>-                  GetAdjustedSize(), 1.0f, nsnull, FALSE,
>+                  FLOAT(mAdjustedSize), 1.0f, nsnull, FALSE,

Why this change? Are we 100% sure here the font is initialized? If so we should probably document that.
Comment 95 Bas Schouten (:bas.schouten) 2011-01-21 01:58:03 PST
Comment on attachment 505728 [details] [diff] [review]
part4 clean up font logs

I didn't review this extremely thoroughly since it's mostly logging and seems low-risk, but it looks fine.
Comment 96 Bas Schouten (:bas.schouten) 2011-01-21 02:15:09 PST
Comment on attachment 505730 [details] [diff] [review]
part5v5 load dwrite early to initiate font cache service initialization

We should be clear that this creates a small startup hit for people with a none-DX10 or blacklisted graphics card, where normally Dwrite.dll would never be loaded, nor would a DWrite factory. Since this happens off main-thread it shouldn't be too bad, but some additional IO will occur.

>diff --git a/toolkit/xre/nsAppRunner.cpp b/toolkit/xre/nsAppRunner.cpp
>--- a/toolkit/xre/nsAppRunner.cpp
>+++ b/toolkit/xre/nsAppRunner.cpp
>+static DWORD InitDwriteBG(LPVOID lpdwThreadParam)
>+{
>+  SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_BEGIN);

I've been thinking about this and I'm not sure if this is what we want, on systems which are busy this thread could be starved by our mainthread and simply not execute. I'm thinking we probably want this running at a higher priority than the main thread. This thread is not expected to do a lot of work, but the work -that- it does we want to do it early on and not get deferred by the scheduler. We should double check the 'not do a lot of work' theory though.

>+  LOGREGISTRY(L"loading dwrite.dll");
>+  HMODULE dwdll = LoadLibraryW(L"dwrite.dll");

This LoadLibrary call should technically be balanced here somehow. Not before we load it in platform because then we'd be unloading the library and loading it again, which is a waste. But probably on app destruction.
Comment 97 John Daggett (:jtd) 2011-01-21 07:16:06 PST
Created attachment 505783 [details] [diff] [review]
part 6 avoid font face creation in harfbuzz shaper until needed

The lazy initialization in part 3 was tripped up by changes landed for bug 624310.
Comment 98 John Daggett (:jtd) 2011-01-21 07:34:23 PST
> >     DWRITE_GLYPH_METRICS glyphMetrics;
> >     HRESULT hr = mFontFace->GetGdiCompatibleGlyphMetrics(
> >-                  GetAdjustedSize(), 1.0f, nsnull, FALSE,
> >+                  FLOAT(mAdjustedSize), 1.0f, nsnull, FALSE,
> 
> Why this change? Are we 100% sure here the font is initialized? If so we should
> probably document that.

Yes, mAdjustedSize is calculated at the start of the method!

> >+static DWORD InitDwriteBG(LPVOID lpdwThreadParam)
> >+{
> >+  SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_BEGIN);
> 
> I've been thinking about this and I'm not sure if this is what we want, on
> systems which are busy this thread could be starved by our mainthread and
> simply not execute. I'm thinking we probably want this running at a higher
> priority than the main thread. This thread is not expected to do a lot of work,
> but the work -that- it does we want to do it early on and not get deferred by
> the scheduler. We should double check the 'not do a lot of work' theory though.

My thought was that we would hide the work of this thread in the idle
time among all the io requests that take place at startup.  The code
in that thread doesn't have much to do, it loads the dll and creates a
factory. Most of the work that then takes place is in the font cache
process, in a completely separate process.  I'm going to land it like
this and we can fiddle with it if we find users that are still
hitting the slowdown.
Comment 99 Jonathan Kew (:jfkthame) 2011-01-21 07:44:12 PST
Comment on attachment 505783 [details] [diff] [review]
part 6 avoid font face creation in harfbuzz shaper until needed

Looks ok to me (I didn't test it locally, but I don't see how it could break anything...).
Comment 101 Tomas 2011-01-23 08:08:32 PST
So this bug is fixed?
Comment 102 Joe Drew (not getting mail) 2011-01-23 15:55:19 PST
Let's call this fixed, since all the patches have been pushed, and open a new bug if more work is needed.
Comment 103 John Daggett (:jtd) 2011-01-24 01:01:32 PST
(In reply to comment #102)
> Let's call this fixed, since all the patches have been pushed, and open a new
> bug if more work is needed.

The patches that have landed don't really "fix" the problem they just make it less likely to occur and not as bad when it does occur.  The root problem is the logic in DirectWrite code which (1) starts the FontCache service and then (2) waits a very short time to before deciding whether to start scanning *all* font files.  The FontCache also reads the entire font cache file at startup, which on my test machine had grown to 36MB.  Reading this much data on an underpowered machine takes a long time, so the timeout is much more likely to occur.  There's really no reason the FontCache code needs all that data just to startup.

One thing I tested today was actually whacking the font cache files.  Without a large font cache file around, startup never stalls in GetSystemFontCollection.  

To remove font cache files:

1. Stop the "Windows Font Cache" service
2. Open "C:\Windows\ServiceProfiles\LocalService\AppData\Local"
3. Delete the FontCache* files
4. Restart the font cache service

After running the browser a few times to populate the font cache again, GetSystemFontCollection takes sub-millisecond times on cold startup.
Comment 104 Joe Drew (not getting mail) 2011-01-24 06:59:15 PST
(In reply to comment #103)
> (In reply to comment #102)
> > Let's call this fixed, since all the patches have been pushed, and open a new
> > bug if more work is needed.
> 
> The patches that have landed don't really "fix" the problem they just make it
> less likely to occur and not as bad when it does occur.  The root problem is
> the logic in DirectWrite code which (1) starts the FontCache service and then
> (2) waits a very short time to before deciding whether to start scanning *all*
> font files.  The FontCache also reads the entire font cache file at startup,
> which on my test machine had grown to 36MB.  Reading this much data on an
> underpowered machine takes a long time, so the timeout is much more likely to
> occur.  There's really no reason the FontCache code needs all that data just to
> startup.
> 
> One thing I tested today was actually whacking the font cache files.  Without a
> large font cache file around, startup never stalls in GetSystemFontCollection.  
> 
> To remove font cache files:
> 
> 1. Stop the "Windows Font Cache" service
> 2. Open "C:\Windows\ServiceProfiles\LocalService\AppData\Local"
> 3. Delete the FontCache* files
> 4. Restart the font cache service
> 
> After running the browser a few times to populate the font cache again,
> GetSystemFontCollection takes sub-millisecond times on cold startup.

Let's open a new bug to get Microsoft to look into this.
Comment 105 Scott A. 2011-01-24 12:00:51 PST
Comment 103:  

You are right my font cache file was over 30MB, deleted it, rebuilt browser cache then restart computer and browser started up in about half the time... How long does it take for that file to start getting bad?

Microsoft needs to look into this.
Comment 106 Luke Usherwood 2011-01-26 07:09:41 PST
Cheers for the hard slog on this guys!

I've done a bunch more testing my end. I was hoping to report some good news, but unfortunately the results are mixed.

| Version    | Font Cache, Size | Hardware Accel | Cold Start (approx) |
+------------+------------------+----------------+---------------------+
| 2011-01-25 | Stopped, 39MB    | Disabled       |    2s               |
| 2011-01-25 | Running, 39MB    | Disabled       |    2s               |
| 2011-01-25 | Stopped, 39MB    | Enabled        |   23s               |
| 2011-01-25 | Running, 39MB    | Enabled        |   23s               |
| 2011-01-25 | Stopped,  1MB    | Enabled        |    2s               |
| 2011-01-25 | Running,  1MB    | Enabled        |    2s               |

From comment #103:
> One thing I tested today was actually whacking the font cache files.

Bingo! Looks like that did the trick. Conversely, having the font cache pre-started didn't make any perceivable difference.

>Reading this much data on an underpowered machine takes a long time, so the timeout is much more likely to
occur. 

My key specs: Intel i7-860, Firefox on SSD. Windows on HDD. I suspect the criteria might be more specifically "underpowered hard-drive". I'm also wondering, could having a fast machine actually undermine some of the workarounds here? I.e. if it blats through the startup code and smacks into the faulty MS API call too quickly.

Continuing to test, after purging my font cache:

| Version    | Font Cache, Size | Hardware Accel | Cold Start (approx) |
+------------+------------------+----------------+---------------------+
|  4 beta 7  | Stopped,  1MB    | Enabled        |   14s               |
|  4 beta 9  | Stopped,  1MB    | Enabled        |    2s               |
|  4 beta 9  | Stopped,  7MB    | Enabled        |    2s               |
|  4 beta 10 | Stopped, 13MB    | Enabled        |    7s               |
|  4 beta 10 | Running, 13MB    | Enabled        |    7s               |

On one hand: yay! Purging the font cache manually provides a short-term fix. On the other hand, after around 3 hours of solid testing (and about 30 reboots!) that cold time started creeping up again. :-|  I suspect the speed regression will kick in pretty quickly for users browsing, as the font cache would grow faster than in my tests. I was only visiting a small handful of websites while testing.

I can provide Resource Monitor screenshots and/or Process Monitor runs if either would help, although they pretty much just confirm what's known. Warm-Start: reads about 6 fonts. Cold start: enumerates heaps of font files (not the font CACHE files) over a period of ~4 seconds in the final run listed.
Comment 107 John Daggett (:jtd) 2011-01-26 12:40:54 PST
Luke, thanks for all your testing!  Everything you've described is similar to the behavior I've seen during my testing.  

One thing to note is that with the font cache service set to "Automatic (deferred)" I don't think it does much during the first 60 seconds after startup.  Specifically, I don't think it does any initialization unless forced by an app starting.  I did one round of testing where if you start the browser 15 seconds after the desktop appears, the browser hits the bad case but if you start it 60 seconds after startup it never does.  The key here is "when does the font cache process finish reading font cache files?"

One gotcha with this bug is that as we improve startup time in other ways, we increase the chance of hitting this bad DirectWrite behavior. :(
Comment 108 Csaba Kozák [:WonderCsabo] 2011-01-26 12:42:13 PST
Any feedback from Microsoft about this?
Comment 109 John Daggett (:jtd) 2011-01-26 12:50:02 PST
(In reply to comment #108)
> Any feedback from Microsoft about this?

I've been in contact with some folks on the DirectWrite team for a couple months now.  I sent another mail this week with the detail below but haven't heard anything back from them just yet.

Mail to MS guys:

We've worked around some of the problems with DirectWrite font
I/O at cold startup but there are still cases where DirectWrite
client code will perform I/O on all font files in the font folder
at startup and cause an overall slowdown in app startup.  The
same scenarios affect IE9 also.

I saw the fix that went out via Windows Update (KB2454826) that
updates the DirectWrite and FontCache libraries and sets the
default setting of the Windows Font Cache service to "Automatic
(deferred)".  That's great but it doesn't solve what seems to be
a fairly basic flaw in the way DirectWrite client code
communicates with the Windows Font Cache service.

The key factor seems to be the size of the font cache on disk
(i.e. the size of the files
%systemroot%\ServiceProfiles\LocalService\AppData\Local\FontCache
-*.dat).  The Windows Font Cache service appears to need to
read the *entire* contents of these files before responding to
even very basic DirectWrite client code requests.  If these files
are large enough to take more than a few seconds to read then the
DirectWrite code will skip using the font cache service and do
its own thing, i.e. read data from *all* fonts in the font folder.

During the initial startup of an app that uses DirectWrite, the
sequence of events appears to be:

1. DirectWrite client app calls the DWriteCreateFactory function.
   This function checks whether the Windows Font Cache service is
   running or not and starts it up if it's not.

2. The Windows Font Cache service starts.

2a. The font cache service scans the font list in the registry
    (HKLM\Software\Microsoft\Windows NT\CurrentVersion\Fonts) and
    stat's the individual font files.

2b. The font cache service reads in the *entire* contents of the
    font cache files stored in the system folder
    (%systemroot%\ServiceProfiles\LocalService\AppData\Local\
    FontCache-*.dat).  On my Atom netbook running Win7 Starter
    this file is 36MB, which takes 7 seconds to read entirely.

3. The client app looks up a font, either using via the system
   font collection or the GDI interop object.

3a. If the font cache has completed (2b) then the DirectWrite
    code retrieves font list information via the font cache service.

3b. However, if (2b) has not completed then the DirectWrite code
    will construct font lists manually (i.e. by reading all fonts in
    the font folder).  Depending on the number of fonts in the font
    folder and I/O system performance, this can take anywhere from a
    few seconds to more than 30 seconds.  This is especially true for
    netbook-level processors running file scanning virus checkers.

If (3b) occurs, then the DirectWrite will do direct, uncached
font file I/O for the lifetime of the client app.  It will read
the *entire* font into memory when a IDWriteFontFace object is
created.

As the font cache file grows larger, the probability that (3b)
will occur grows larger, proportional to the relative performance
of the I/O subsystem and the size of the client app, which will
be competing for I/O time with the Font Cache service when both
startup simultaneously.

From my naive point of view, I see a couple ways of fixing this:

1. Upon activation, the font cache service reads the font list in
the registry (2a) and then a small chunk of data containing only
global font list info (think 50K, 100K), reducing greatly the
chance that DirectWrite client code will ever time out.

2. The remainder of the font cache file containing per-glyph data
is memory mapped and read on demand, or at least on a background
thread, rather than entirely at font cache startup.

Some sort of fix like this would definitely be preferable to
telling users who run into slowdowns to clean out font cache
files periodically.
Comment 110 John Daggett (:jtd) 2011-01-26 17:32:39 PST
One more added note on diagnosing this problem.  The patches that landed included additional logging that times different DirectWrite API calls during font initialization (when logging is enabled).

To enable logging of font initialization:

1. Control Panels > System > Advanced Settings > Advanced tab > Environment Variables
2. Define two new user environment variables, replacing 'username' with your local account name:

  NSPR_LOG_MODULES=fontinit:5
  NSPR_LOG_FILE=C:\Users\username\Desktop\fontlog.txt

3. Run Minefield

Result: after pages appear a file 'fontlog.txt' will appear on the desktop.

If the GetSystemFontCollection time is over 100ms, you're hitting the bad scenario described above.  Typical 'bad scenario' times on cold startup are 4-7 seconds, more if you're using a slow I/O system or have a file scanning virus checker enabled.  In the normal scenario, the call to GetSystemFontCollection completes in sub-millisecond time.

Be sure to delete the environment variables when you're done testing to avoid unnecessary I/O at startup.
Comment 111 John Daggett (:jtd) 2011-01-30 18:03:29 PST
Feedback from MS: they're currently testing a fix that will eliminate excessive font I/O at startup for applications using DirectWrite.  The details of when and how this will be released are still under discussion within MS.
Comment 112 Scott A. 2011-01-30 18:35:46 PST
(In reply to comment #111)
> Feedback from MS: they're currently testing a fix that will eliminate excessive
> font I/O at startup for applications using DirectWrite.  The details of when
> and how this will be released are still under discussion within MS.

That's good news, hopefully it will happen before Firefox 4.0 final is released.
Comment 113 (dormant account) 2011-02-09 14:31:31 PST
So I still see a lot of overhead caused by d2d. Can we just use a heuristic to figure out if d2d is started? ie check the process listing for the font cache process?
Comment 114 John Daggett (:jtd) 2011-02-09 17:33:32 PST
(In reply to comment #113)
> So I still see a lot of overhead caused by d2d. 

I think you mean dwrite-related font i/o, right? As described in
comment 103, the fixes that landed only attempt to avoid
unnecessary DirectWrite library initialization I/O as much as
possible, it doesn't fix or completely workaround the problem.

Can you include the info below for the machine you're testing with?

1. DirectWrite version and font cache size (see about:support)
2. Output of enabling the startup logging described in comment 110

The key number in the logging file is the time spent in
GetSystemFontCollection, as that's the main place font I/O takes
place.

> Can we just use a heuristic to figure out if d2d is started? ie
> check the process listing for the font cache process?

It's not a d2d process, it's the font cache service I think you
mean.  With Microsoft's latest updates to DirectWrite (see
comment 77), the font cache service is set to "Automatic
Deferred" which basically seems to mean that the process starts
up but holds off on doing any initialization for a short time
after startup (60 secs?).  If an dwrite app starts during that
time, then initialization happens when a DirectWrite factory
object is created.  Comment 109 provides the details of the
scenario under which this happens.  So the process exists when
Firefox starts, it just hasn't done any initialization yet.
Talking with DirectWrite engineers at Microsoft, there's no easy
way to query whether font cache initialization has
started/completed.  So the answer to your question is "no".

As noted in comment 111, Microsoft says they are testing a fix
for this problem.  The only way to avoid unnecessary font I/O at
startup in all cases is to completely disable D2D/DirectWrite. 
That will disable all hardware acceleration and disable the
better quality font rendering that DirectWrite provides.
Comment 115 (dormant account) 2011-02-09 17:40:35 PST
Sorry about calling it d2d. So can we just not start using directwrite until a few minutes into startup? Otherwise we get into a ridiculous situation of 20second startups even on high end systems. ie https://bugzilla.mozilla.org/show_bug.cgi?id=627591#c80

Another way would be make a dummy executable that we spawn on the side to make sure it takes the hit for reading in fonts without blocking our UI.

I asked Brian to post his log.
Comment 116 John Daggett (:jtd) 2011-02-09 18:02:03 PST
(In reply to comment #115)
> Sorry about calling it d2d. So can we just not start using
> directwrite until a few minutes into startup? Otherwise we get
> into a ridiculous situation of 20second startups even on high
> end systems. ie
> https://bugzilla.mozilla.org/show_bug.cgi?id=627591#c80

Um, to draw text we need a DirectWrite font object.  Any API we
use to get a DirectWrite font, either via the system font
collection or via the GDI interop object suffers from the
DirectWrite initialization problems described in comment 109. 
The part 5 patch tries to make the time between initial startup
and first use as long as possible but if the font cache file is
big enough, the font cache service won't complete reading the
file in time and the DirectWrite code will do file I/O on all
fonts.  Note the results related to font cache file size that
Luke posted in comment 106.
Comment 117 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-02-09 19:18:30 PST
(In reply to comment #115)
> Sorry about calling it d2d. So can we just not start using directwrite until a
> few minutes into startup?

We could do what we do with D3D9 and start up using GDI and switch to D3D10/DWrite "later". The problem is that the DWrite text looks different and has different metrics. So when we switch over, there's going to be a jarring transition. That's not a good option.
Comment 118 Brian Carpenter [:geeknik] 2011-02-10 01:33:34 PST
Here is the fontlog.txt from my system (3.2ghz i7-920, 6GB DDR3, RAID0+1 (4x250 WD Enterprise class SATA drivers) GeForce 8600GT, etc.

0[82d140]: InitFontList
0[82d140]: Start: 02/10/2011 03:28:53
0[82d140]: Uptime: 30882.671 s
0[82d140]: dwrite version: 6.1.7601.17514
0[82d140]: Total time in InitFontList:        0.018 ms
0[82d140]:  --- gfxPlatformFontList init:     0.006 ms
0[82d140]:  --- GdiInterop object:            0.012 ms
0[82d140]: DelayedInitFontList
0[82d140]: Start: 02/10/2011 03:28:54
0[82d140]: Uptime: 30882.906 s
0[82d140]: dwrite version: 6.1.7601.17514
0[82d140]: Total time in DelayedInitFontList:        0.960 ms (families: 196, gdi table access)
0[82d140]:  --- GetSystemFontCollection:      0.036 ms
0[82d140]:  --- iterate over families:        0.923 ms

And

DirectWrite Enabled true (6.1.7601.17514, font cache 8.05 MB)
Comment 119 Brian Carpenter [:geeknik] 2011-02-10 01:40:20 PST
And here is the fontlog.txt from my system after I stopped the Windows Font Cache service, removed all of the cache files and restarted the Windows Font Cache service:

0[72d140]: InitFontList
0[72d140]: Start: 02/10/2011 03:37:54
0[72d140]: Uptime: 31423.343 s
0[72d140]: dwrite version: 6.1.7601.17514
0[72d140]: Total time in InitFontList:        0.019 ms
0[72d140]:  --- gfxPlatformFontList init:     0.006 ms
0[72d140]:  --- GdiInterop object:            0.013 ms
0[72d140]: DelayedInitFontList
0[72d140]: Start: 02/10/2011 03:37:54
0[72d140]: Uptime: 31423.796 s
0[72d140]: dwrite version: 6.1.7601.17514
0[72d140]: Total time in DelayedInitFontList:        0.956 ms (families: 196, gdi table access)
0[72d140]:  --- GetSystemFontCollection:      0.015 ms
0[72d140]:  --- iterate over families:        0.941 ms

And

DirectWrite Enabledtrue (6.1.7601.17514, font cache n/a)
Comment 120 John Daggett (:jtd) 2011-02-10 15:48:48 PST
(In reply to comment #119)

Thanks Brian.  Those times look fine but they appear to be *warm* startup times, the uptime number is the number of seconds since a reboot.  My guess is your cold startup times will be faster after cleaning out the font cache.

When you have a chance, could you post the numbers for cold startup on your machine?
Comment 121 (dormant account) 2011-02-11 10:07:52 PST
(In reply to comment #117)
> (In reply to comment #115)
> > Sorry about calling it d2d. So can we just not start using directwrite until a
> > few minutes into startup?
> 
> We could do what we do with D3D9 and start up using GDI and switch to
> D3D10/DWrite "later". The problem is that the DWrite text looks different and
> has different metrics. So when we switch over, there's going to be a jarring
> transition. That's not a good option.

It's a much better option than making Firefox the slowest browser ever to startup(by like 10seconds)
Comment 122 Robert O'Callahan (:roc) (email my personal email if necessary) 2011-02-11 14:07:03 PST
OK then, you'd better file a blocking bug for that approach.
Comment 123 Luke Usherwood 2011-02-14 08:55:21 PST
(In reply to comment #107)
> Luke, thanks for all your testing!  Everything you've described is similar to
> the behavior I've seen during my testing.  
Cheers, and sorry for late reply (daughter was in hospital).

> One thing to note is that with the font cache service set to "Automatic
> (deferred)" I don't think it does much during the first 60 seconds after
> startup.  Specifically, I don't think it does any initialization unless forced
> by an app starting.  I did one round of testing where if you start the browser
> 15 seconds after the desktop appears, the browser hits the bad case but if you
> start it 60 seconds after startup it never does.  The key here is "when does
> the font cache process finish reading font cache files?"

Yup, it's a pain to test this one reliably huh? :-)  Soooo many factors. 

While I was away two things happened: 
 1. Beta 11 came out. 
 2. Windows Update sent me: http://support.microsoft.com/?kbid=2454826, only last week.

Not sure why my update arrived so much later than for others. Also it was "Optional, Recommended" rather than "Important" as per comment #77. (Did they release it differently for 64bit Windows?)

After this, I started getting "bad" 6 second cold-start times, even with a recently-cleared font cache. A bunch of reboots later, and I'm down to nice 2s cold starts again.  So I assume this was just Windows Precache and/or Superfetch having not "warmed up" to the changed binaries. As you say, it may depend a lot on time-since-boot. (If I have time) in the future, I might use a boot script with fixed delays to help reduce this effect.

One weird behavior caught by Process Monitor in the slower cases was Firefox iterating many font files, calling EngUnmapFontFileFD on each. This took about 0.9 seconds. (Only kernel calls were shown in these call-stacks. Before and after, user calls to DWriteCreateFactory were seen. All on the same thread, BTW.) This no longer appears to happen, whether started soon or many minutes after Windows start. Weird.

(In reply to comment #109)
> Mail to MS guys:
Thanks for posting that! Nice email, BTW.

Initially I had expected that when the font cache file grew past a certain fuzzy threshold, it would jump from the 2s good case right up to 23s for the bad. So I was surprised by my result of 7s for a medium-sized font cache.  I see that the cache service reads the cached data simultaneously with the Direct Write application scanning font files, and non-cached to boot! Still, it's hard to fathom how a 26MB difference in font-cache data equates to a 16s difference in time. Can that just be down to non-cached reads? (Or other locking/contention in DirectWrite? Or it just does more work?) Anyway, no point in me armchair-reasoning further - I look forward to hearing back from the folk at MS!
Comment 124 John Daggett (:jtd) 2011-02-14 22:53:35 PST
(In reply to comment #123)
> Initially I had expected that when the font cache file grew past a certain
> fuzzy threshold, it would jump from the 2s good case right up to 23s for the
> bad. So I was surprised by my result of 7s for a medium-sized font cache.  I
> see that the cache service reads the cached data simultaneously with the Direct
> Write application scanning font files, and non-cached to boot! Still, it's hard
> to fathom how a 26MB difference in font-cache data equates to a 16s difference
> in time. Can that just be down to non-cached reads? (Or other
> locking/contention in DirectWrite? Or it just does more work?

Keep in mind that this takes place at startup when the process is attempting to load in *lots* of data from disk.  So i/o that would normally happen fairly quickly is now contending with other threads and in the case of the font cache service, other processes.
Comment 125 Luke Usherwood 2011-02-15 07:23:45 PST
Any tips on how to quickly grow the font cache? :-) E.g. so it's ready to test back at its "steady-state" level, if the MS fix comes out sooner rather than later.
Comment 126 John Daggett (:jtd) 2011-03-08 13:57:53 PST
Yay!!!!

KB 2505438:

"Slow performance in applications that use the DirectWrite API on a computer that is running Windows 7 or Windows Server 2008 R2"

http://support.microsoft.com/kb/2505438
Comment 127 Ed Morley [:emorley] 2011-03-08 17:55:09 PST
I've looked through the KB article, but there doesn't seem to be any mention of whether the fix is going to appear in Windows Update, or whether it's one of the self-serve ones. Also, unless it gets marked as important, it will only be optional / installed if the user notices. 

If it isn't going to be on WU (or if as optional only), then is it viable to ship the hotfix with the Firefox installer?

(I've checked Windows Updates on my pre-sp1 machine and it hasn't appeared yet)
Comment 128 Joe Drew (not getting mail) 2011-03-08 17:57:09 PST
Doing a manual "Check for updates" shows the fix appears and is selected by default; it'll therefore be pushed out to users. It's all good!
Comment 129 Luke Usherwood 2011-03-09 00:26:32 PST
Yay indeed! I can confirm this issue is busted!

(In reply to comment #128)
> Doing a manual "Check for updates" shows the fix appears and is selected by
> default; it'll therefore be pushed out to users. It's all good!

This got pushed out to my Win7-64 bit machines with the Tuesday updates, which both had SP1 installed. However in both cases it was Optional and *deselected* by default. 

I wonder what the difference is? (Perhaps because I don't have IE9 installed? I hope M$ aren't playing dirty after the help you've given them!!)

Recapping my cold-start times:

| Version    | Font Cache, Size | Hardware Accel | Cold Start (approx) |
+------------+------------------+----------------+---------------------+
| 2011-01-25 | Stopped, 39MB    | Disabled       |    2s               |
| 2011-01-25 | Running, 39MB    | Disabled       |    2s               |
| 2011-01-25 | Stopped, 39MB    | Enabled        |   23s               |
| 2011-01-25 | Running, 39MB    | Enabled        |   23s               |
|  4 beta 7  | Stopped,  1MB    | Enabled        |   14s               |
|  4 beta 9  | Stopped,  7MB    | Enabled        |    2s               |
|  4 beta 10 | Stopped, 13MB    | Enabled        |    7s               |
|  4 beta 10 | Running, 13MB    | Enabled        |    7s               |
|                  ------|  After MS update |-----                     |
|  4 beta 12 | Stopped, 14MB    | Enabled        |    2s (Yay!)        |
|  4 beta 12 | Running, 14MB    | Enabled        |    2s               |

There's about 5 seconds of HDD activity, but FF is responsive within 2-3s of cold-start in my single-tab test. No sign of fonts being touched anymore in Resource Monitor's Disk view.
Comment 130 John Daggett (:jtd) 2011-03-09 05:42:56 PST
(In reply to comment #129)
> Yay indeed! I can confirm this issue is busted!

Thanks for running these tests Luke!
 
> (In reply to comment #128)
> > Doing a manual "Check for updates" shows the fix appears and is selected by
> > default; it'll therefore be pushed out to users. It's all good!
> 
> This got pushed out to my Win7-64 bit machines with the Tuesday updates, which
> both had SP1 installed. However in both cases it was Optional and *deselected*
> by default. 
> 
> I wonder what the difference is? (Perhaps because I don't have IE9 installed? I
> hope M$ aren't playing dirty after the help you've given them!!)

Maybe certain updates are not pushed to all users immediately?  In other words, it's required for some, optional for others, as a way of catching problems before it goes to all users?

MS has actually been pretty good about getting back to us about this issue.  A lot of times it was "can't say yet" but they always said clearly "we want to fix this soon, it's just a matter of when".  With IE9 releasing soon, I'm sure the DirectWrite team was under pressure to get this fixed.
Comment 131 Leman Bennett [Omega] 2011-03-09 10:14:29 PST
That update on my Windows 7 machine was marked "Recommended" and included with the security patches when the install happened.
Comment 132 Harsh86 2011-03-09 10:24:45 PST
I'm using Windows 7 32bit - w/o sp1 and I cant see the update in Windows Update.

Perhaps its only shown to users who have installed SP1?
Comment 133 [not reading bugmail] 2011-03-09 10:38:14 PST
(In reply to comment #132)
> I'm using Windows 7 32bit - w/o sp1 and I cant see the update in Windows
> Update.
> 
> Perhaps its only shown to users who have installed SP1?

No, It shows SP1 and this KB 2505438 as separate items in the windows update list.  I could install it w/o SP1 installed.
Comment 134 Csaba Kozák [:WonderCsabo] 2011-03-09 10:53:13 PST
The update was "important" and automatically checked in on my computer. I have Win7 SP1 with IE9.
Comment 135 Luke Usherwood 2011-03-10 01:07:35 PST
Huh, the differences reported with the two KB updates probably has a very simple explanation:

Under: "Control Panel\System and Security\Windows Update", Change settings:

   [ ] Give me recommended updates the same way I receive important updates

I have this deselected. Others seeing them as "important" probably just have this option on.

(Hangs head in shame at previous off-cuff comment towards M$... I'll still call them M$ though :-)
Comment 136 Virtual_ManPL [:Virtual] - (ni? me) 2011-03-10 01:49:13 PST
(In reply to comment #135)
> Under: "Control Panel\System and Security\Windows Update", Change settings:
> 
>    [ ] Give me recommended updates the same way I receive important updates

This is enabled by default in Windows 7 as far I remember
Comment 137 Harald Langhammer 2011-03-10 04:32:20 PST
Not by default. After installation of Win7 at first start you have to answer the question which update settings you want: "Use recommended settings" (option above is enabled), "Install important updates only" (option above is disabled) or "Ask me later" (I assume here the option is disabled too).
Comment 138 Harsh86 2011-03-10 04:41:44 PST
(In reply to comment #133)
> (In reply to comment #132)
> > I'm using Windows 7 32bit - w/o sp1 and I cant see the update in Windows
> > Update.
> > 
> > Perhaps its only shown to users who have installed SP1?
> 
> No, It shows SP1 and this KB 2505438 as separate items in the windows update
> list.  I could install it w/o SP1 installed.

Weird. I just installed SP1 and as soon as I restart, I have 4 new updates available to install. One of which is KB2505438 and its recommended and checked by default.
Comment 139 Amarjeet Singh Rai 2012-01-27 07:17:12 PST
Needs to be re-opened.

That has been happening to me for a very long time.
I finally found out what it is.

I have a lot of fonts. So I have to wait a long time for this.

Only load the fonts needed please fix this.
Comment 140 Jo Hermans 2012-01-27 09:53:58 PST
What version of Windows are you running ? Is it up-to-date ?

Note You need to log in before you can comment on or make changes to this bug.