Closed Bug 705594 Opened 13 years ago Closed 12 years ago

optimize system font fallback

Categories

(Core :: Graphics, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla13

People

(Reporter: jtd, Assigned: jtd)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Whiteboard: [Snappy:P1])

Attachments

(10 files, 16 obsolete files)

2.75 KB, patch
roc
: review+
Details | Diff | Splinter Review
37.31 KB, patch
roc
: review-
Details | Diff | Splinter Review
3.61 KB, patch
roc
: review+
Details | Diff | Splinter Review
21.80 KB, patch
jtd
: review+
Details | Diff | Splinter Review
13.33 KB, patch
jtd
: review+
Details | Diff | Splinter Review
44.22 KB, patch
jtd
: review+
Details | Diff | Splinter Review
12.64 KB, patch
jfkthame
: review+
Details | Diff | Splinter Review
8.32 KB, patch
roc
: review+
Details | Diff | Splinter Review
8.56 KB, patch
jfkthame
: review+
Details | Diff | Splinter Review
14.38 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
When looking up a font for a given character, the font code in gfx currently:

1. Searches for the character in the fonts in the font list (e.g. "Helvetica, Arial, sans-serif")
2. Searches the pref fonts, based on language and script
3. Performs system font fallback

The third step is obviously the most expensive one, where the code selects a font for a character from among all the fonts on the system.  This results in our code always reading in cmaps for all fonts on the system.  It would be simpler and faster to avoid as much work as possible here and to avoid pulling in cmaps for all fonts on the system, especially close to startup.

My initial proposal would be to:

1. Implement better hard-coded font fallbacks, essentially check specific fonts
after pref fonts are checked.

2. Build and cache bitvectors for codepoint ranges within the union of all codepoints for a given family, since there's a lot of correlation in cmap content across fonts in the same family.  Jonathan already has patches for doing the cmap union operations.

3. Use the codepoint range bitvectors during system font fallback to reduce the number of cmaps that need to be checked.

Hopefully this will both speed up font fallback and reduce the need to do lots of font i/o.  Naturally, we can test different tweaks when we start measuring actual performance.

The tricky part of this is that whatever information is stored related to fonts on the system has to be updated when new fonts are added.  Doing this isn't always easy or 100% reliable on all platforms/APIs.

Note: bug 600713 was filed to simply cache the cmaps in the startup cache to avoid all the i/o caused by that.
(In reply to John Daggett (:jtd) from comment #0)
> 1. Implement better hard-coded font fallbacks, essentially check specific
> fonts
> after pref fonts are checked.

Sounds good. I think we should do this and then measure the effect before we decide whether it's worth doing anything else.

> The tricky part of this is that whatever information is stored related to
> fonts on the system has to be updated when new fonts are added.  Doing this
> isn't always easy or 100% reliable on all platforms/APIs.

I think this could be done without requiring the cache to be reliable and in sync with system font changes. Just treat the cache as a hint to try certain fonts first. If those fonts don't exist, or they don't have the characters the cache said they might have, that's OK. If new fonts have been added that aren't in the cache, that's OK too, we'll fall back to the current enumerate-all-fonts code.
(In reply to John Daggett (:jtd) from comment #0)
> 1. Implement better hard-coded font fallbacks, essentially check specific
> fonts
> after pref fonts are checked.

This sounds like bug 560472.
(In reply to Jonathan Kew (:jfkthame) from comment #2)
> (In reply to John Daggett (:jtd) from comment #0)
> > 1. Implement better hard-coded font fallbacks, essentially check specific
> > fonts
> > after pref fonts are checked.
> 
> This sounds like bug 560472.

Yes, except that's really about poor font choice, this is about performance.  I do think the result would be that we fix a number of the "poor font choice" bugs that we have floating around.
(In reply to John Daggett (:jtd) from comment #3)
> (In reply to Jonathan Kew (:jfkthame) from comment #2)
> > (In reply to John Daggett (:jtd) from comment #0)
> > > 1. Implement better hard-coded font fallbacks, essentially check specific
> > > fonts
> > > after pref fonts are checked.
> > 
> > This sounds like bug 560472.
> 
> Yes, except that's really about poor font choice, this is about performance.
> I do think the result would be that we fix a number of the "poor font
> choice" bugs that we have floating around.

True, that bug was filed for a different reason; however, I think the fix is the same.

Basically, I think we can do this by improving the behavior of gfxFontGroup::WhichPrefFontSupportsChar(). As a first step, we should get rid of the use of FindCharUnicodeRange here (which in any case is not up-to-date with the current Unicode character repertoire), and instead make use of the fact that we've already resolved script runs by the time we do font matching. So we can look up pref fonts based on the script; then all we need to do is ensure that we provide a suitable list of preferred fonts for each script code. Our current font.name-list.* prefs are rather variable; in many cases, we only provide a single name.
(In reply to Jonathan Kew (:jfkthame) from comment #4)
> Basically, I think we can do this by improving the behavior of
> gfxFontGroup::WhichPrefFontSupportsChar(). As a first step, we should get
> rid of the use of FindCharUnicodeRange here (which in any case is not
> up-to-date with the current Unicode character repertoire), and instead make
> use of the fact that we've already resolved script runs by the time we do
> font matching. So we can look up pref fonts based on the script; then all we
> need to do is ensure that we provide a suitable list of preferred fonts for
> each script code. Our current font.name-list.* prefs are rather variable; in
> many cases, we only provide a single name.

I do think it makes sense to do this per-script, not sure I think we need to swizzle the pref font scheme all around but it wouldn't hurt to try that.
Blocks: 705258
Whiteboard: [Snappy]
Whiteboard: [Snappy] → [Snappy:P1]
The idea behind this is to (1) try explicit hard-coded fallbacks and
then (2) iterate over the system font list by family.  For each family
store a very small representation of which blocks/ranges that are supported
by at least some of the fonts in that family.  For a given character, skip all
fonts that don't support a given block or range.

The simplest way to do this would be to simply use a bitvector of blocks
but this turns out to be fairly inefficient, since there are large
ranges in Unicode where there are only a handful of fonts that support
that range and they are the same for the entire range.  So I've designed
this to use a combination of a bitvector of blocks within certain ranges
and flags for the rest.

High-level division:

+        rangeLowerBMP    = 0,      //     U+0000:4DFF
+        rangeHan         = 1,      //     U+4E00:9FFF
+        rangeMidBMP      = 2,      //     U+A000:ABFF
+        rangeHangul      = 3,      //     U+AC00:D7FF
+        rangeSurrogates  = 4,      //     U+D800:DFFF
+        rangePrivateUse  = 5,      //     U+E000:F8FF
+        rangeUpperBMP    = 6,      //     U+F900:FFFF
+        rangeUpperPlane1 = 7,      //   U+10000:1FFFF
+        rangeUpperPlane2 = 8,      //   U+20000:2FFFF
+        rangeHigherPlane = 9,      // U+30000:10FFFFF

For codepoints in the lower, middle and higher BMP ranges above,
individual blocks are flagged in the bitvector.  For other ranges
there's only a single bit for the entire range (e.g. the entire first
upper plane is flagged with just a single bit).  By doing this, we
reduce the storage needed to 4 longs per family and this should be easy
to store in the startup cache.  These are just *hints*, I'm still
assuming these are only used with system font fallback.  Before use the 
cmap is still read but we'll only be doing that for a small subset of
fonts on the system.

I haven't started testing this and the code doesn't yet use the startup
cache, I want to test this code first.  Next step is to flush out the
list of hard-coded fallback fonts per platform, based on fallback data and 
manual selection.
Next step is to complete the font fallback tables for Windows/Android, then add in code to store the data in the startup cache.
Attachment #581555 - Attachment is obsolete: true
Blocks: 710727
Data columns are size of cmap bitvector, checksum, family name.
Comment on attachment 585969 [details]
cmap checksum per font family, Windows 7 JA

Argh, wrong bug...
Attachment #585969 - Attachment is obsolete: true
Depends on: 716762
(In reply to Benoit Girard (:BenWa) from comment #12)
> Here's the profile data I collected with the built in profiler from bug
> 720257:
> http://varium.fantasytalesonline.com/cleopatra/
> ?report=AMIfv96M3X2ql5AOZIX9zieyKiTZTbPEyRwGlYLQ_lBPxNkbgAuA3uwE942LY5XfO2yZT
> 2zRTBrbBXOjuullZ32vnIbOmbW128gimaBqcbH5cw7XPUGmqGKKTUiX2zlpw0Ne6wqkxq3stLtVxI
> Ohb6BGI8e_agBHtQ

That data is somewhat cryptic.  If you can reproduce this run, it would be helpful to have logfile data for the same startup sequence.

  export NSPR_LOG_MODULES=fontlist:5,textrun:5
  export NSPR_LOG_FILE=/path/to/fontdata.out

This will dump system fallback data so that we can confirm that the fix for this bug will eliminate this stall.
This implements several improvements over the current way system font fallback is done.

Current method: iterate over all fonts in all families and pick the one that most closely matches the intended style

Patch method:

1. Use common fallback fonts
  - first try a list of explicit fallbacks for different ranges/scripts

2. Use system fallback

  - on OSX, use CoreText method which seems to have fairly good worse-case upper bound

  - on Windows, iterate over families used a cached list of coverage maps of a given family
    o coverage indicates which blocks the family supports
    o individual codepoints are still checked by reading cmap
    o time spent doing fallback is limited by a timeout (set via pref)
    o goal is to keep the number of cmaps read to an absolute minimum
    o without coverage map (i.e. first time/new font) font family is skipped

On OSX, the CoreText methods seems sufficient.  On Windows, all the codepoints covered by existing fonts are handled via common fallback fonts.  Because the coverage maps are cached there's a potential for these to be out of sync with the underlying font set, that's the compromise about this approach, we fix an upper bound on the time spent reading cmaps in exchange for potentially missing fallback fonts in some situations.

There may be a method to use DirectWrite to detect the existence of a fallback font, I need to experiment with that a bit more.
Attachment #582252 - Attachment is obsolete: true
First system font fallback, OSX cold startup
Loading wikipedia list
http://meta.wikimedia.org/wiki/List_of_Wikipedias

Latest nightly:
time: 961ms
cmaps read: 323

Debug build with patch:
time: 124ms
cmaps read: 33
John, looks like you're making good progress. As this is marked as a Snappy P1, do you have a timeline for when you expect to have a Windows patch ready to test?
Handle non-BMP codepoints in nsUnicodeRange for better mapping to pref
fonts. Adjust the pref fonts to include more fallback fonts.  Add Lao
to the list of fonts that require shaping on OSX.
Attachment #590678 - Attachment is obsolete: true
Attachment #592155 - Flags: review?(jfkthame)
This patch adds in the fallback logic described in comment 6.  Use
hard-coded fallbacks per platform and if that fails, search a select
set of fallback fonts based on block coverage maps stored in the
startup cache.  Fallback is also explicitly time limited, if fallback
times out after a pref-determined number of ms, fallback fails.  This
will generally only happen with non-default fonts.

Compared to the current code, there's a potential for regressions, but
I think these are largely limited to non-default fonts for which block
coverage info hasn't been cached.

I put in a pref 'gfx.font_rendering.fallback.always_use_cmaps', which
is close to what the existing code does, in case a user runs into a
regression with non-default fonts for less-common scripts.

The cache of block coverage data is updated when new fonts are added
and verified once weekly in the existing background loader code.  This
could definitely be smarter, we could stat font files directly on
Windows, run it in a background thread, etc. But I'd prefer to wait on
those changes until we're sure this doesn't cause a large number of
regressions in terms of fallback quality.
Attachment #592157 - Flags: review?(roc)
On OSX, use the CoreText fallback code instead of searching cmaps. 
This uses CoreText to find a font family, then loads the cmap.  Even
after cold startup, the first fallback is on the 100ms, which is much
better than the hit from reading in all cmaps.

is relatively fast without a huge hit on first use (first fallback
after cold startup was under 100ms under 10.6).
Attachment #592158 - Flags: review?(jfkthame)
Comment on attachment 592157 [details] [diff] [review]
patch, part 2 restructure system font fallback

Review of attachment 592157 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.h
@@ +489,2 @@
>  // used when picking fallback font
>  struct FontSearch {

This really needs a better comment describing how it's used.

@@ +491,5 @@
> +    FontSearch(const PRUint32 aCharacter,
> +               PRInt32 aRunScript,
> +               const gfxFontStyle *aStyle,
> +               PRUint32 aRangeFlag = 0,
> +               PRUint32 aBlockIndex = 0,

document aRangeFlag and aBlockIndex

@@ +493,5 @@
> +               const gfxFontStyle *aStyle,
> +               PRUint32 aRangeFlag = 0,
> +               PRUint32 aBlockIndex = 0,
> +               bool aAlwaysUseCmaps = false,
> +               const mozilla::TimeStamp* aStartTime = nsnull) :

Why not just make this
  mozilla::TimeStamp aStartTime = mozilla::TimeStamp::Now()
?

Also, add a typedef mozilla::TimeStamp TimeStamp; to FontSearch so you don't have to keep using the prefix.

@@ +520,5 @@
> +    mozilla::TimeStamp     mStartTime;  // used to time limit search
> +    bool                   mStop;       // true when time limit reached
> +};
> +
> +struct FontCoverage {

This needs a big comment describing how it's used and what ranges it can represent.

I'm not convinced this is the simplest effective approach, but I could be if it was better documented. Some obvious alternatives that we should at least consider (and justify rejecting, if we reject them):

* Using mBlockRanges for everything, i.e., storing 1 bit per 256 characters to indicate whether the font supports a character in that block

* Defining a list of ranges, with an enum value naming each range, and an array giving the limits of each range, and then just storing 1 bit per range, not treating mBlockRanges differently or having per-range code like in the Range() method. Implement Range() with an array of that maps the character's upper byte to the first range overlapping that block.

@@ +522,5 @@
> +};
> +
> +struct FontCoverage {
> +
> +    FontCoverage() : mFlags(0) {

Blank line not needed above

@@ +552,5 @@
> +        rangeUpperBMP    = 6,      //     U+F900:FFEF
> +        rangeSpecials    = 7,      //     U+FFF0:FFFF
> +        rangeUpperPlane1 = 8,      //   U+10000:1FFFF
> +        rangeUpperPlane2 = 9,      //   U+20000:2FFFF
> +        rangeHigherPlane = 10,     // U+30000:10FFFFF

These enum values don't map to any naming scheme I'm familiar with. I'd prefer RANGE_LOWER_BMP etc

@@ +562,5 @@
> +                       (1 << rangeUpperBMP),
> +        countBlockLowerBMP = 0x4D + 1,
> +        countBlockMidBMP   = 0xAB - 0xA0 + 1,
> +        countBlockUpperBMP = 0xFF - 0xF9 + 1,
> +        maxBMPRange = countBlockLowerBMP + countBlockMidBMP + countBlockUpperBMP,

These need to be documented

@@ +581,5 @@
> +        endBlockLowerBMP   = 0x4D,
> +        startBlockMidBMP   = 0xA0,
> +        endBlockMidBMP     = 0xAB,
> +        startBlockUpperBMP = 0xF9,
> +        endBlockUpperBMP   = 0xFF

Putting all these in the same enum seems misleading. They're not really the same thing (e.g. range flags vs limits).

::: gfx/thebes/gfxFontUtils.h
@@ +98,5 @@
>          return ((block->mBits[(aIndex>>3) & (BLOCK_SIZE - 1)]) & (1 << (aIndex & 0x7))) != 0;
>      }
>  
> +    // dump if logging enabled
> +    void dump(const char* aPrefix, eGfxLog aWhichLog) const;

Dump

::: gfx/thebes/gfxPlatform.h
@@ +318,5 @@
> +
> +    /**
> +     * Whether to check all font cmaps during system font fallback
> +     */
> +    PRInt32 SystemFallbackTimeLimit();

Fix comment

@@ +377,5 @@
> +    // these are *possible* matches, no cmap-checking is done at this level
> +    virtual void GetCommonFallbackFonts(const PRUint32 /*aCh*/,
> +                                        PRInt32 /*aRunScript*/,
> +                                        PRUint32 /* aRangeFlag */,
> +                                        PRUint32 /* aBlockIndex */,

Document aRangeFlag and aBlockIndex!

@@ +462,5 @@
>                              eFontPrefLang aCharLang, eFontPrefLang aPageLang);
>                                                 
>      PRInt8  mAllowDownloadableFonts;
>      PRInt8  mDownloadableFontsSanitize;
> +    PRInt8  mFallbackUsesCmaps;

Document this

@@ +471,5 @@
>  
>      PRInt8  mBidiNumeralOption;
>  
> +    // time limit for system font fallback
> +    PRInt32 mFallbackTimeLimit;

And better documentation for this

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +610,5 @@
> +static const char *kFontNyala = "Nyala";
> +static const char *kFontPlantagenetCherokee = "Plantagenet Cherokee";
> +static const char *kFontSegoeUISymbol = "Segoe UI Symbol";
> +static const char *kFontSylfaen = "Sylfaen";
> +static const char *kFontTraditionalArabic = "Traditional Arabic";

It's more efficient to make these like
  static const char kFontArial[] = "Arial";
and use AppendLiteral below.

@@ +615,5 @@
> +
> +void
> +gfxWindowsPlatform::GetCommonFallbackFonts(const PRUint32 aCh,
> +                                       PRInt32 aRunScript,
> +                                       PRUint32 aRangeFlag,

fix indent?
> ::: gfx/thebes/gfxWindowsPlatform.cpp
> @@ +610,5 @@
> > +static const char *kFontNyala = "Nyala";
> > +static const char *kFontPlantagenetCherokee = "Plantagenet Cherokee";
> > +static const char *kFontSegoeUISymbol = "Segoe UI Symbol";
> > +static const char *kFontSylfaen = "Sylfaen";
> > +static const char *kFontTraditionalArabic = "Traditional Arabic";
> 
> It's more efficient to make these like
>   static const char kFontArial[] = "Arial";
> and use AppendLiteral below.

I don't think this works in this case, AppendLiteral only works for
actual literals, not generic char*'s.  The code constructs a list of
families as char*'s, then looks up each searching for a match.  I did
switch the code that code to use familyName.AppendASCII.

> I'm not convinced this is the simplest effective approach, but I could
> be if it was better documented. Some obvious alternatives that we
> should at least consider (and justify rejecting, if we reject them):
> 
> * Using mBlockRanges for everything, i.e., storing 1 bit per 256
> characters to indicate whether the font supports a character in that
> block

This leads to a lot of duplication since Unicode has some fairly large ranges
that are dealt with the same way, the Han range, the Hangul range, and the
Unicode planes beyond the BMP.  Only keeping track of 256-character blocks for a
subset of these ranges reduces the size of FontCoverage to 4 32-bit values.

> * Defining a list of ranges, with an enum value naming each range, and
> an array giving the limits of each range, and then just storing 1 bit
> per range, not treating mBlockRanges differently or having per-range
> code like in the Range() method. Implement Range() with an array of
> that maps the character's upper byte to the first range overlapping
> that block.

Unless the ranges are always evenly spaced (i.e. always 256-characters, or 
16 * 256 characters) you end up spending a lot of time doing range comparisons
this way.  If you define names for all the ranges that ends up being a long list
of ranges.

Using the coarse blocks to classify which blocks need fine-grained classification is essentially a combination of both of the approaches you're
suggesting.
Attachment #592157 - Attachment is obsolete: true
Attachment #592157 - Flags: review?(roc)
Attachment #592408 - Flags: review?(roc)
(In reply to John Daggett (:jtd) from comment #21)
> I don't think this works in this case, AppendLiteral only works for
> actual literals, not generic char*'s.  The code constructs a list of
> families as char*'s, then looks up each searching for a match.  I did
> switch the code that code to use familyName.AppendASCII.

You can still use "static const char kFontArial[] = "Arial";" and it's still a little bit more efficient.

> > * Defining a list of ranges, with an enum value naming each range, and
> > an array giving the limits of each range, and then just storing 1 bit
> > per range, not treating mBlockRanges differently or having per-range
> > code like in the Range() method. Implement Range() with an array of
> > that maps the character's upper byte to the first range overlapping
> > that block.
> 
> Unless the ranges are always evenly spaced (i.e. always 256-characters, or 
> 16 * 256 characters) you end up spending a lot of time doing range
> comparisons
> this way.

In which function?

> If you define names for all the ranges that ends up being a long list of
> ranges.

We don't need to define names for all the ranges. Do we?
This switches to using a list of range limits.  The index associated with a range limit is used as the index into the bitvector.  This cleans up the code a lot, it eliminates a lot of conditionals in various places.  It's only used now in the global font fallback case.
Attachment #592408 - Attachment is obsolete: true
Attachment #592408 - Flags: review?(roc)
Attachment #592629 - Flags: review?(roc)
Updated based on changes in part 2
Attachment #592158 - Attachment is obsolete: true
Attachment #592158 - Flags: review?(jfkthame)
Attachment #592631 - Flags: review?(jfkthame)
Knowing the script code that's causing system font fallback will be handy when trying to analyze where we should focus tuning efforts in the future.
Attachment #592632 - Flags: review?(roc)
Comment on attachment 592629 [details] [diff] [review]
patch, part 2 v3 - restructure system font fallback

Review of attachment 592629 [details] [diff] [review]:
-----------------------------------------------------------------

This looks much better! I'm glad it worked out :-).

I think this patch could be usefully broken up. There seem to be several different things going on:
* interruptible system font searching
* font coverage infrastructure
* using font coverage during system font loading
* saving and restoring coverage in the startup cache
* per-platform hardcoded font lookup
* more logging

Can any of these be broken out into their own patches with minimal extra work?

::: gfx/thebes/gfxFont.cpp
@@ +492,5 @@
> +{
> +    NS_ASSERTION(rangeBMP[255] != 0,
> +                 "font coverage range list not properly initialized");
> +    PRUint32 b = aCh >> 8;
> +    PRUint32 r = b < 256 ? rangeBMP[b] : rangeBMP[255];

rangeBMP[NS_MIN(b, 255)]

@@ +901,5 @@
>          }
>  
> +        aMatchData->mCmapsTested++;
> +        if (aMatchData->mCmapsTested % TIME_CHECK_INTERVAL == 0) {
> +            TimeDuration elapsed = TimeStamp::Now() - aMatchData->mStartTime;

I think we could just always test? Do we have any data showing that TIME_CHECK_INTERVAL is a worthwhile optimization?

::: gfx/thebes/gfxFont.h
@@ +549,5 @@
> +    // xxx - may eventually need flags to capture other font matching related
> +    //       info such as whether IVS cmap exists, GSUB/GPOS tables, etc.
> +
> +    enum {
> +        blockRangeMax = 128

RANGE_MAX?

@@ +552,5 @@
> +    enum {
> +        blockRangeMax = 128
> +    };
> +
> +    typedef struct { PRUint32 c[blockRangeMax]; } BlockCounts;

This needs to be documented.

@@ +558,5 @@
> +    // initialized table used for range lookup
> +    static void Init();
> +
> +    // block range for a given codepoint
> +    static PRUint32 BlockRange(PRUint32 aCh);

Couldn't we call this GetRangeContainingChar?

@@ +562,5 @@
> +    static PRUint32 BlockRange(PRUint32 aCh);
> +
> +    // whether font has any glyphs for a given block range
> +    inline bool HasMatchingBlock(PRUint32 aBlockIndex)
> +    {

The use of the terms "block" and "range" is confusing here. How about calling this HasCharInRange(PRUint32 aRangeIndex)? I think the public interface could be all about ranges and blocks could just be an internal thing.

@@ +568,5 @@
> +        PRUint32 maskBit = aBlockIndex & 0x1F;
> +        return (mBlockRanges[aBlockIndex>>5] & (1 << maskBit)) != 0;
> +    }
> +
> +    inline void SetBlock(PRUint32 aBlockIndex)

SetHasCharInRange?

@@ +575,5 @@
> +        PRUint32 maskBit = aBlockIndex & 0x1f;
> +        mBlockRanges[aBlockIndex >> 5] |= (1 << maskBit);
> +    }
> +
> +    void AppendCmapData(gfxSparseBitSet& aCmap);

I think generally Cmap should be capitalized.

::: gfx/thebes/gfxPlatformFontList.h
@@ +214,5 @@
> +    bool mLoadCoverage;
> +    PRUint32 mCoverageLastUpdate;
> +    bool mCoverageLoaded; // true when coverage loaded from startup cache
> +    nsresult LoadFontCoverage();
> +    nsresult SaveFontCoverage(PRUint32 aUpdateTime);

Move these functions away so that all data members are together

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +762,5 @@
> +            aFontList.AppendElement(fam);
> +        }
> +    }
> +
> +}

Lose blank line
Comment on attachment 592155 [details] [diff] [review]
patch, part 1 update pref lang fallbacks

Review of attachment 592155 [details] [diff] [review]:
-----------------------------------------------------------------

The changes/additions in all.js here are probably fine, but I'm not thrilled at tinkering with nsUnicodeRange.cpp at this point. It's inadequate and obsolete, and we should be dropping it entirely, not fiddling with a few bits of it. We already have a better basis for selecting a list of fonts to try - the run's script, which has been resolved by the time we do font selection - as mentioned in comment 4.

::: gfx/thebes/gfxMacPlatformFontList.mm
@@ +177,5 @@
>  const ScriptRange gScriptsThatRequireShaping[] = {
>      { eComplexScriptArabic, 0x0600, 0x077F },   // Basic Arabic, Syriac, Arabic Supplement
>      { eComplexScriptIndic, 0x0900, 0x0D7F },     // Indic scripts - Devanagari, Bengali, ..., Malayalam
> +    { eComplexScriptTibetan, 0x0F00, 0x0FFF },     // Tibetan
> +    { eComplexScriptLao, 0x0E80, 0x0EFF } // Lao

This change - though it makes sense - doesn't belong in this patch (or even this bug, I guess); it's a separate issue.

And if you want to do this, better extend it to include Myanmar and Khmer, as they certainly need complex shaping support. Also N'Ko. And probably some more SEAsian stuff... at least New Tai Lue, for one, requires re-ordering.
(In reply to John Daggett (:jtd) from comment #18)

>  Fallback is also explicitly time limited, if fallback
> times out after a pref-determined number of ms, fallback fails.  This
> will generally only happen with non-default fonts.

I'm uncomfortable with this concept, which suggests that characters on a page might or might not render successfully, depending (for example) how busy the disk happens to be at that moment....

The goal here should be to make the need for "full-scale" fallback much more of a rarity than it currently is. Once that is done, I'm not sure that failing is better than slowness, in the rare case where we do need to search all the fonts to find something usable.

If we _do_ decide a timeout is desireable, then I think we should let the cmap-loading continue in the background, and flush the word caches and trigger a reflow once all cmaps have been loaded, so that the proper character gets rendered, albeit after a delay. That'd be a similar UX to the use of fallbacks for slow-downloading webfonts, with the correct text appearing later.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #26)
> I think this patch could be usefully broken up. There seem to be several
> different things going on:
> * interruptible system font searching
> * font coverage infrastructure
> * using font coverage during system font loading
> * saving and restoring coverage in the startup cache
> * per-platform hardcoded font lookup
> * more logging
> 
> Can any of these be broken out into their own patches with minimal extra
> work?

To expand on this a bit, I think it's important to split this up, not just for easier review, but also so that the various strategies can be evaluated (and landed?) separately. It may be sufficient to do a couple of relatively simple things that will have a large impact; then we can consider whether additional complexity is justified.

In particular, I think the first thing to do is to refresh the per-platform pref font lists so as to improve the chances that we'll actually find what we need there. (And base these prefs on the script, rather than on our old langGroup concept - that will (a) provide wider coverage, and (b) avoid the need to look up the character's "unicode range" and then map that to a "language group".)

Second, there's a simple win to be had in gfxPlatformFontList::FindFontForChar, which is what implements the "global" system fallback. Currently, this always iterates over the entire font list, looking for the "best" match. That's silly, because we have no real criteria for picking the "best" family (hence the occasional bug reports about "Firefox picked a really weird font for character XXX...."), and so we end up basically using the last family that supports the character in question. What we could do instead is to stop the search as soon as we find the _first_ family that supports the required character.
(In reply to Jonathan Kew (:jfkthame) from comment #27)
> The changes/additions in all.js here are probably fine, but I'm not
> thrilled at tinkering with nsUnicodeRange.cpp at this point. It's
> inadequate and obsolete, and we should be dropping it entirely, not
> fiddling with a few bits of it. We already have a better basis for
> selecting a list of fonts to try - the run's script, which has been
> resolved by the time we do font selection - as mentioned in comment 4.

Yeah, I don't like the langGroup/unicode range/pref font ball of beeswax
either.  But I think it's important that we address problems in the near
term and not postpone some simple improvements to what we have now
because we want to rewrite everything from scratch.  Especially when it involves changes to the UI in prefs.

The changes to nsUnicodeRange are just some minor tweaks to make it
easier to pick up the existing pref fonts.  I don't have a problem with
this being an interim measure but I don't think it makes sense to
postpone this type of simple improvement.
(In reply to Jonathan Kew (:jfkthame) from comment #27)
> >  const ScriptRange gScriptsThatRequireShaping[] = {
> >      { eComplexScriptArabic, 0x0600, 0x077F },   // Basic Arabic, Syriac, Arabic Supplement
> >      { eComplexScriptIndic, 0x0900, 0x0D7F },     // Indic scripts - Devanagari, Bengali, ..., Malayalam
> > +    { eComplexScriptTibetan, 0x0F00, 0x0FFF },     // Tibetan
> > +    { eComplexScriptLao, 0x0E80, 0x0EFF } // Lao
> 
> This change - though it makes sense - doesn't belong in this patch (or
> even this bug, I guess); it's a separate issue.
> 
> And if you want to do this, better extend it to include Myanmar and
> Khmer, as they certainly need complex shaping support. Also N'Ko. And
> probably some more SEAsian stuff... at least New Tai Lue, for one,
> requires re-ordering.

Yes, this is an existing bug that I noticed and fixed because Arial Unicode MS
appears to have some glyphs for Lao.  But I'm happy to file a separate bug.
(In reply to Jonathan Kew (:jfkthame) from comment #28)
> The goal here should be to make the need for "full-scale" fallback much more
> of a rarity than it currently is. Once that is done, I'm not sure that
> failing is better than slowness, in the rare case where we do need to search
> all the fonts to find something usable.
> 
> If we _do_ decide a timeout is desireable, then I think we should let the
> cmap-loading continue in the background, and flush the word caches and
> trigger a reflow once all cmaps have been loaded, so that the proper
> character gets rendered, albeit after a delay. That'd be a similar UX to the
> use of fallbacks for slow-downloading webfonts, with the correct text
> appearing later.

I think you're missing the point here.  If you look at font fallback
timings, there are two distinct scenarios (1) the very first font
fallback that generates a lot of cmap i/o and (2) all other cases.
Here's some of the telemetry data graphed:

  http://mzl.la/ygOtLR
  
The chart on the first page shows the problem, that the very first font
fallback  in some cases takes a *very* long time.  Look at the data (the
third sheet) and you'll see cumulative numbers for first fallback on OSX
and Windows.  On Windows, *9%* of users experience a first fallback of
7secs or more.  Yet normal fallback never takes more than 1ms.  The time
limit is there to put an absolute hard limit on how long font fallback
takes so that the user doesn't experience hangs.

The solution here *is* trying to make this type of first-time fallback
hang a rarity but it can't solve it in 100% of the cases.  The timeout is
there as a simple way of dealing with that 0.2% case.
Yes, it's unacceptable for 9% of Windows users to be seeing a 7-second hang. I suspect that currently, this problem hits users in common scenarios such as having a Gmail login page (with its language-choice menu) open in a tab that loads at startup. Clearly, we need to fix that.

But my point is that before we decide whether to add a timeout that may lead - only in rare cases, we hope - to _failure_ to render a character (for which the user has a font installed), we should implement the strategies that we believe will make those cases really rare, and then see where we stand.
Comment on attachment 592629 [details] [diff] [review]
patch, part 2 v3 - restructure system font fallback

Review of attachment 592629 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxDWriteFontList.cpp
@@ +1003,5 @@
>                              mForceGDIClassicMaxFontSize);
>  
> +    if (mLoadCoverage) {
> +        LoadFontCoverage();
> +    }

AFAICS, mLoadCoverage is always true, so testing it here (and elsewhere) is pointless - just get rid of it?

::: gfx/thebes/gfxFont.cpp
@@ +847,5 @@
>      aLocalizedName = mName;
>  }
>  
> +#define TIME_CHECK_INTERVAL  5
> +#define FALLBACK_TIME_LIMIT 100.0  // 100ms

Looks like this isn't used, there's a literal "500" as the default in the code.

::: gfx/thebes/gfxMacPlatformFontList.mm
@@ +775,5 @@
>      PreloadNamesList();
>  
> +    if (mLoadCoverage) {
> +        LoadFontCoverage();
> +    }

Shouldn't LoadFontCoverage be kept out of the main startup path, for the sake of the case where font fallback is _not_ needed during startup? It could instead be done as the first operation of the delayed font-loader, or on-demand if font fallback _does_ occur before that has had a chance to run. (Applies to the other platform implementation as well.)
Update pref lang fallbacks and update nsUnicodeRange to do better mappings, including upper Unicode planes.  Fix for Lao removed (handled on bug 722584).
Attachment #592155 - Attachment is obsolete: true
Attachment #592155 - Flags: review?(jfkthame)
Attachment #593011 - Flags: review?(roc)
Attachment #593011 - Flags: feedback?(jfkthame)
Dumps out contents of cmap data to a log file
Attachment #593012 - Flags: review?(roc)
Break up system font fallback process to first try a set of hard-coded defaults per OS, then use the existing global font search mechanism.
Attachment #592629 - Attachment is obsolete: true
Attachment #592629 - Flags: review?(roc)
Attachment #593013 - Flags: review?(roc)
Comment on attachment 593012 [details] [diff] [review]
patch, part 1b - add cmap data logging

Review of attachment 593012 [details] [diff] [review]:
-----------------------------------------------------------------

Your sprintfs here could overflow buffers. If this was debug-only, I'd let it go, but you're turning on the logging for release builds. So either change that, or fix the fixed-size char buffers.
For each system font family, computes a small bitvector of Unicode ranges for which a font family contains at least one font that supports a glyph in that range.  Cached in the startup cache, loaded at first system fallback.  Pref added to allow users to always fallback to the existing "load all cmaps" procedure.  Fallback timeout removed (separate patch coming later).
Attachment #593015 - Flags: review?(roc)
Clean up font loader params, make them consistent across platforms.
Attachment #593016 - Flags: review?(roc)
Comment on attachment 593011 [details] [diff] [review]
patch, part 1a - update pref lang fallbacks

Review of attachment 593011 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/libpref/src/init/all.js
@@ +1601,5 @@
>  pref("font.name.sans-serif.ja", "MS PGothic");
>  pref("font.name.monospace.ja", "MS Gothic");
> +pref("font.name-list.serif.ja", "MS PMincho, MS Mincho, MS PGothic, MS Gothic,Meiryo");
> +pref("font.name-list.sans-serif.ja", "MS PGothic, MS Gothic, MS PMincho, MS Mincho,Meiryo");
> +pref("font.name-list.monospace.ja", "MS Gothic, MS Mincho, MS PGothic, MS PMincho,Meiryo");

Put a space before Meiryo to be consistent with the others
Attachment #593011 - Flags: review?(roc) → review+
Use the CoreText system fallback routine on OSX instead of the searching through fonts, revised based on other patch revisions.
Attachment #592631 - Attachment is obsolete: true
Attachment #592631 - Flags: review?(jfkthame)
Attachment #593017 - Flags: review?(jfkthame)
Testing on OSX 10.7 Lion, load www.wikipedia.org:

Nightly:
  776 cmaps loaded
  792ms == first fallback time

Using cached coverage data, dev build (part2 patch): 
  87 cmaps loaded
  60ms == first fallback time

Using CoreText system fallback, dev build (part3 patch):
  42 cmaps loaded
  3ms == first fallback time

The testing times will vary depending on system state.
Comment on attachment 593013 [details] [diff] [review]
patch, part 1c - add hardcoded font fallback

Review of attachment 593013 [details] [diff] [review]:
-----------------------------------------------------------------

Why does WhichSystemFontSupportsChar even exist? Callers should just call gfxPlatform directly, shouldn't they?

What if GetCommonFallbackFonts just returned an array of const char*s, and the caller was responsible for looking up the gfxFontFamily etc? Seems like that would be less per-platform code and would be a bit more efficient if the character is found early in the font array.

::: gfx/thebes/gfxFont.cpp
@@ +3348,5 @@
>          selectedFont = aPrevMatchedFont;
>          return selectedFont.forget();
>      }
>  
> +    // never fallback for characters from unknown scripts

"fall back"

::: gfx/thebes/gfxPlatformFontList.cpp
@@ +412,2 @@
>  
> +    // if didn't find a font, so system-wide fallback (except for specials)

/so/do

@@ +439,5 @@
>      }
>  #endif
>  
>      // no match? add to set of non-matching codepoints
> +    // but only when not time limited

This comment line belongs in another patch, remove it

::: gfx/thebes/gfxPlatformMac.cpp
@@ +293,5 @@
> +            f.AppendElement(kFontGeneva);
> +        }
> +
> +    } else {
> +

Too many blank lines! Not needed between {}s

@@ +365,5 @@
> +            break;
> +        }
> +    }
> +
> +    // Arial Unicode MS has lots of glyphs for obscure, use it as a last resort

"for obscure characters"

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +631,5 @@
> +        }
> +
> +    } else {
> +
> +        PRUint32 b = (aCh >> 8) & 0xff;

More unnecessary blank lines :-)
Revised to limit string length.
Attachment #593021 - Flags: review?(roc)
Attachment #593012 - Attachment is obsolete: true
Attachment #593012 - Flags: review?(roc)
Comment on attachment 593015 [details] [diff] [review]
outdated patch, part 2a - cache font coverage data and use it to speed fallback

Review of attachment 593015 [details] [diff] [review]:
-----------------------------------------------------------------

I don't like the term "block range index" or similar... It's confusing. Perhaps "character coverage range"? The fact that we pack 1-bit-per-range into PRUint32 blocks is an implementation detail and shouldn't affect how things are named. But we should also avoid confusion with nsUnicodeRange.
Comment on attachment 593015 [details] [diff] [review]
outdated patch, part 2a - cache font coverage data and use it to speed fallback

Review of attachment 593015 [details] [diff] [review]:
-----------------------------------------------------------------

So there's nothing here (yet) to automatically run the CMAP loader when we fail system font fallback, and to refresh the layout after the CMAP loader has run?

This nsIObject*Stream stuff can't be the fastest way to serialize/deserialize ... I assume this is standard practice?

::: gfx/thebes/gfxFont.cpp
@@ +529,5 @@
> +    PRUint32 r;
> +
> +    for (r = 0; r < numRanges; r++) {
> +        if (HasCharInRange(r))
> +            aCounts.c[r]++;

{}

::: gfx/thebes/gfxFont.h
@@ +542,5 @@
> +    // xxx - may eventually need flags to capture other font matching related
> +    //       info such as whether IVS cmap exists, GSUB/GPOS tables, etc.
> +
> +    enum {
> +        RANGE_MAX = 96  // size of mBlockRanges in bits

Use sizeof(mBlockRanges)*8

::: gfx/thebes/gfxPlatformFontList.cpp
@@ +713,5 @@
> +
> +  LL_I2L(microSecondsPerSecond, PR_USEC_PER_SEC);
> +  LL_DIV(intermediateResult, prTime, microSecondsPerSecond);
> +  LL_L2UI(seconds, intermediateResult);
> +  return seconds;

We don't need to use this LL stuff. Just use normal operators.

@@ +967,5 @@
> +        rv = stream->WriteBytes(c, sizeof(FontCoverage));
> +    } else {
> +        FontCoverage dummy;
> +        c = reinterpret_cast<char*>(&dummy);
> +        rv = stream->WriteBytes(c, sizeof(FontCoverage));

Why do we want to write out these dummy records?
Comment on attachment 593017 [details] [diff] [review]
patch, part 3 v3 - use CoreText system fallback on OSX

While the idea of using Core Text to find a fallback is attractive - as the OS presumably has already cached lots of font-coverage info, etc - I'm concerned that it could bite us. We've seen issues in the past where it appears that some part of the Apple font infrastructure (ATS? Core Graphics? Core Text?) creates "fake" cmap subtables instead of using the true cmap from the font, to reflect its internal support for "faking" certain characters or sequences using glyph substitution/composition/decomposition/...? So I'm worried that there might be cases where the fallback CT chooses will not actually work for us when we attempt to shape the text through a harfbuzz path.
(See discussion in bug 554820 for some background to my Core Text worries.)
(In reply to Jonathan Kew (:jfkthame) from comment #48)
> So I'm worried that there might
> be cases where the fallback CT chooses will not actually work for us when we
> attempt to shape the text through a harfbuzz path.

But we can try that font first, and carry on doing the rest of our strategy if that fails, right?
Revised based on review comments.

> Why does WhichSystemFontSupportsChar even exist? Callers should just
> call gfxPlatform directly, shouldn't they?

I agree with you but there's code in gfxFT2FontGroup::WhichSystemFontSupportsChar
that I uses this method, so I think we should probably defer this for now.
Attachment #593053 - Flags: review?(roc)
Attachment #593013 - Attachment is obsolete: true
Attachment #593013 - Flags: review?(roc)
Comment on attachment 593017 [details] [diff] [review]
patch, part 3 v3 - use CoreText system fallback on OSX

Review of attachment 593017 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxMacPlatformFontList.mm
@@ +887,5 @@
> +    bool useCmaps = gfxPlatform::GetPlatform()->UseCmapsDuringSystemFallback();
> +
> +    if (useCmaps) {
> +        return gfxPlatformFontList::GlobalFontFallback(aCh, aRunScript,
> +                                            aMatchStyle, aCmapCount);

Please indent wrapped params to line up with aCh.

@@ +896,5 @@
> +    CFIndex len = 1;
> +
> +    if (IS_IN_BMP(aCh)) {
> +        ch[0] = static_cast<UniChar> (aCh);
> +        str = CFStringCreateWithCharactersNoCopy(NULL, ch, 1, kCFAllocatorNull);

Here and throughout, please use the kCFAllocatorDefault constant (rather than NULL) for the Core Foundation default allocator, to make it easier to see what this parameter is.

@@ +902,5 @@
> +        ch[0] = static_cast<UniChar> (H_SURROGATE(aCh));
> +        ch[1] = static_cast<UniChar> (L_SURROGATE(aCh));
> +        str = CFStringCreateWithCharactersNoCopy(NULL, ch, 2, kCFAllocatorNull);
> +        len = 2;
> +    }

CFStringCreate.... is not documented as infallible, so we should check the result.

@@ +917,5 @@
> +    if (fallback) {
> +        CFStringRef familyName = CTFontCopyFamilyName(fallback);
> +        CFRelease(fallback);
> +
> +        char famNameBuf[1024];

Use an nsCAutoString and set its length as needed, rather than a fixed-size buffer.

@@ +922,5 @@
> +        CFIndex buf_len = CFStringGetMaximumSizeForEncoding(
> +                              CFStringGetLength(familyName),
> +                              kCFStringEncodingUTF8) + 1;
> +        CFStringGetCString(familyName, famNameBuf, buf_len, kCFStringEncodingUTF8);
> +        famNameBuf[buf_len] = 0;

....but wait a sec, what's famNameBuf used for anyway? Looks like it's redundant - residue of some debugging printf() stuff, maybe?

@@ +933,5 @@
> +            nsAutoTArray<UniChar, 1024> buffer;
> +            CFIndex len = ::CFStringGetLength(familyName);
> +            buffer.SetLength(len);
> +            ::CFStringGetCharacters(familyName, CFRangeMake(0, len),
> +                                    buffer.Elements());

Please be consistent in using the :: prefix on Core APIs.

@@ +941,5 @@
> +
> +            fontEntry = FindFontForFamily(family, aMatchStyle, needsBold);
> +        }
> +    }
> +

No need for a double blank line here.

As raised in preceding comments, I think we need to do something to handle the case where CT picks a fallback font that turns out to be unusable for us - presumably punt to the cmap-based search code.
> While the idea of using Core Text to find a fallback is attractive - as
> the OS presumably has already cached lots of font-coverage info, etc -
> I'm concerned that it could bite us. We've seen issues in the past where
> it appears that some part of the Apple font infrastructure (ATS? Core
> Graphics? Core Text?) creates "fake" cmap subtables instead of using the
> true cmap from the font, to reflect its internal support for "faking"
> certain characters or sequences using glyph
> substitution/composition/decomposition/...? So I'm worried that there
> might be cases where the fallback CT chooses will not actually work for
> us when we attempt to shape the text through a harfbuzz path.

I'm not sure how this relates to what the patch is doing, it's using
CoreText to figure out the family to use for fallback.  The code then
loads the cmap and tests it directly, same as the code does today.  In
both cases the cmap *will* be the synthetic one, I ran into this when
comparing the cmap data reported to us versus the actual contents when
dumped out using a font tool.  The existing font fallback code is no
different since it would read the same synthetic cmap as the CoreText
API.

The big advantage with this patch is that the first time fallback
performance is *much* better and we don't read as many cmaps.  In my
mind, a performance improvement which affects *all* users far outweighs
concerns about failures in rare situations for non-default fonts.

> As raised in preceding comments, I think we need to do something to
> handle the case where CT picks a fallback font that turns out to be
> unusable for us - presumably punt to the cmap-based search code.

In both cases we see a synthetic cmap, I don't understand the scenario
through which an "unusable" font would result.

The other I thing that should be pointed out is that the Core Text
CTFontCopyTable API has a CTFontTableOptions parameter.  The documentation
describes:

  kCTFontTableOptionExcludeSynthetic

    The font table excludes synthetic font data.
    Available in Mac OS X v10.5 and later.

That seems like the solution to the problem you're concerned about.
(In reply to John Daggett (:jtd) from comment #53)

> I'm not sure how this relates to what the patch is doing, it's using
> CoreText to figure out the family to use for fallback.  The code then
> loads the cmap and tests it directly, same as the code does today.  In
> both cases the cmap *will* be the synthetic one, I ran into this when
> comparing the cmap data reported to us versus the actual contents when
> dumped out using a font tool.  

Ugh - from re-reading comments in bug 554820, it looks like this may be right, though it worries me.

> The other I thing that should be pointed out is that the Core Text
> CTFontCopyTable API has a CTFontTableOptions parameter.  The documentation
> describes:
> 
>   kCTFontTableOptionExcludeSynthetic
> 
>     The font table excludes synthetic font data.
>     Available in Mac OS X v10.5 and later.
> 
> That seems like the solution to the problem you're concerned about.

However, we're not currently using CTFont APIs at all to load font tables (we use ATSFontGetTable on 10.5, and CGFontCopyTableForTag on 10.6+). Maybe that should change, if the CTFont API gives us better control.
(In reply to Jonathan Kew (:jfkthame) from comment #54)
> (In reply to John Daggett (:jtd) from comment #53)
> 
> > I'm not sure how this relates to what the patch is doing, it's using
> > CoreText to figure out the family to use for fallback.  The code then
> > loads the cmap and tests it directly, same as the code does today.  In
> > both cases the cmap *will* be the synthetic one, I ran into this when
> > comparing the cmap data reported to us versus the actual contents when
> > dumped out using a font tool.  
> 
> Ugh - from re-reading comments in bug 554820, it looks like this may be
> right, though it worries me.

Re-testing the example from bug 554820, I do _not_ see the problem there (caused by synthetic cmaps with added codepoints) when using the CGFont code paths that we use on 10.6; the problem only arises if I force the use of the older ATS-based code path (which we still have around for 10.5 support). This implies that CGFontCopyTableForTag is giving us the real cmap, not the "enhanced" synthetic one that ATSFontGetTable was returning.

So depending whether CTFontCreateForString bases its behavior on the real or synthetic cmap (which we may be able to determine by experimentation), there could be a problem here.
(In reply to Jonathan Kew (:jfkthame) from comment #55)
> So depending whether CTFontCreateForString bases its behavior on the real or
> synthetic cmap (which we may be able to determine by experimentation), there
> could be a problem here.

I've been doing some further experimentation with CTFontCreateForString, and it doesn't appear to be basing its font selection on the "synthetic" cmap tables that we saw in bug 554820. So that increases my confidence that we can expect the font it returns to be an appropriate choice for us to use.
Comment on attachment 593017 [details] [diff] [review]
patch, part 3 v3 - use CoreText system fallback on OSX

Review of attachment 593017 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxMacPlatformFontList.mm
@@ +909,5 @@
> +
> +    gfxFontEntry *fontEntry = nsnull;
> +    CTFontRef fallback;
> +    CTFontRef defaultFont =
> +                       CTFontCreateWithName(CFSTR("Lucida Grande"), 12.f, NULL);

Rather than hard-coding Lucida Grande here, it might be better to pass in the first font from the current fontGroup; it's not clear how CT goes about choosing "the best substitute font..." for what's passed in, but giving it the current primary font as a starting point seems like it offers the best chance.

(It's definitely the case that the font returned by CTFontCreateWithName for a given Unicode character can vary depending what is passed in as the currentFont parameter, although I haven't seen any documentation suggesting what heuristics it may use internally.)
Comment on attachment 593011 [details] [diff] [review]
patch, part 1a - update pref lang fallbacks

I've just filed Bug 723045 - get rid of nsUnicodeRange.

It feels wrong to me that we'd make one piecemeal change to nsUnicodeRange, while ignoring all the other parts of Unicode where it's also out of date. I suppose we could attempt a complete overhaul, but I think it'd be much better to get rid of it altogether.
Comment on attachment 593053 [details] [diff] [review]
patch, part 1c v2 - add hardcoded font fallback

Review of attachment 593053 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxPlatformMac.cpp
@@ +273,5 @@
> +static const char kFontHiraginoKakuGothic[] = "Hiragino Kaku Gothic ProN";
> +static const char kFontLucidaGrande[] = "Lucida Grande";
> +static const char kFontMenlo[] = "Menlo";
> +static const char kFontPlantagenetCherokee[] = "Plantagenet Cherokee";
> +static const char kFontSTHeiti[] = "STHeiti";

Rather than hard-coding a bunch of font names here and in gfxWindowsPlatform, I think we should be using a pref (or a group of prefs, keyed on the script tag for aRunScript) that can be set via about:config (with platform-appropriate defaults).
Just FTR: I tried to dig around in the DirectWrite APIs to see if there's some way to get font fallback from DW. But AFAICT, there doesn't appear to be any support for this, despite hints in general introductory text: "The DirectWrite font system provides...font fallback..."[1]

I wondered if we could create an IDWriteTextLayout for the relevant character, and then ask for the FontFamilyName, but AFAICT this doesn't work: it always returns the name that was originally set on the textFormat parameter, regardless of whether this font actually supports the character(s) in the text - indeed, regardless of whether this font family name even exists on the system. So it doesn't help us find a fallback that can _actually_ render the character.

[1]http://msdn.microsoft.com/en-us/library/windows/desktop/dd371554%28v=vs.85%29.aspx
Does the release of http://www.unicode.org/charts/PDF/Unicode-6.1/ affect the patches here?
(In reply to Jonathan Kew (:jfkthame) from comment #59)
> Comment on attachment 593053 [details] [diff] [review]
> patch, part 1c v2 - add hardcoded font fallback
>
> > +static const char kFontHiraginoKakuGothic[] = "Hiragino Kaku Gothic ProN";
> > +static const char kFontLucidaGrande[] = "Lucida Grande";
> > +static const char kFontMenlo[] = "Menlo";
> > +static const char kFontPlantagenetCherokee[] = "Plantagenet Cherokee";
> > +static const char kFontSTHeiti[] = "STHeiti";
> 
> Rather than hard-coding a bunch of font names here and in
> gfxWindowsPlatform, I think we should be using a pref (or a group of prefs,
> keyed on the script tag for aRunScript) that can be set via about:config
> (with platform-appropriate defaults).

I actually don't think this is a good thing to do, the hard-coded
fallbacks aren't prefs, they are lists tuned to obviate the need to do a
system-wide search.

I could see adding a general catch-all pref for adding a set of fonts to
the commonly tested fallback list.  Or expanding the pref font list to
include a set of fonts for the Unicode punctuation/symbol ranges
(i.e.U+2000:2bff). But I don't think those changes need to happen for this bug.
(In reply to Mardeg from comment #61)
> Does the release of http://www.unicode.org/charts/PDF/Unicode-6.1/ affect
> the patches here?

New codepoint ranges in Unicode 6.1 will affect font fallback once they ship in the standard set of default fonts on the platforms we support.  So "not yet" would be the answer to this question.
Found another duplicate: https://bugzilla.mozilla.org/show_bug.cgi?id=593614

This should be a higher priority, if possible. It's a major kill on user experience (UX).
Comment on attachment 593015 [details] [diff] [review]
outdated patch, part 2a - cache font coverage data and use it to speed fallback

Review of attachment 593015 [details] [diff] [review]:
-----------------------------------------------------------------

r- since there are comments to be addressed
Attachment #593015 - Flags: review?(roc) → review-
Depends on: 732330
Blocks: 543200
Attachment #593015 - Attachment description: patch, part 2a - cache font coverage data and use it to speed fallback → outdated patch, part 2a - cache font coverage data and use it to speed fallback
Attachment #593016 - Attachment description: patch, part 2b - clean up font loader params → outdated patch, part 2b - clean up font loader params
Updated to latest trunk
Carrying r=roc forward
Attachment #593011 - Attachment is obsolete: true
Attachment #593011 - Flags: feedback?(jfkthame)
Attachment #603615 - Flags: review+
Updated to latest trunk
Carrying r=roc forward
Attachment #593021 - Attachment is obsolete: true
Attachment #603617 - Flags: review+
Updated to latest trunk
Carrying r=roc forward
Attachment #593053 - Attachment is obsolete: true
Attachment #603618 - Flags: review+
The patch for searching through fonts within a family assumed that the global font search process always iterated over all fonts.  This is no longer the case, so I've set up a separate method that shares the ranking code.
Attachment #603620 - Flags: review?(jfkthame)
Attachment #603620 - Attachment description: patch, fixup intra-family font searching → patch, part 1d - fixup intra-family font searching
This pulls out the changes for a pref that specifies whether cmap-based font fallback is used.  This serves as a way to get "old behavior" for global font searches.
Attachment #603621 - Flags: review?(roc)
Updated based on review comments.
Attachment #593017 - Attachment is obsolete: true
Attachment #593017 - Flags: review?(jfkthame)
Attachment #603622 - Flags: review?(jfkthame)
Similar to the patch for doing CoreText system fallback on OSX, this uses DirectWrite's fallback mapping in place of a global, cmap-based approach.  There isn't a straight call for this as there is with CoreText but by implementing a custom text renderer class and "drawing" to it, it's possible to pick out the font that DirectWrite is using in the fallback case.  Additionally, I tweaked some of the hard-coded fallbacks a bit so that they more closely match the fonts DirectWrite chooses when hard-coded fallbacks aren't used.  This should help us match IE9 font matching for these cases.
Attachment #603625 - Flags: review?(bas.schouten)
> +    gfxFontEntry *fontEntry = nsnull;
> +    CTFontRef fallback;
> +    CTFontRef defaultFont =
> +                       CTFontCreateWithName(CFSTR("Lucida Grande"), 12.f, NULL);

> Rather than hard-coding Lucida Grande here, it might be better to pass in the
> first font from the current fontGroup; it's not clear how CT goes about
> choosing "the best substitute font..." for what's passed in, but giving it the
> current primary font as a starting point seems like it offers the best chance.
> 
> (It's definitely the case that the font returned by CTFontCreateWithName for a
> given Unicode character can vary depending what is passed in as the
> currentFont parameter, although I haven't seen any documentation suggesting
> what heuristics it may use internally.)

I did some testing, iterating over all font families, using each as the initial
font in place of Lucide Grande.  The CoreText fallback code appears to try and
roughly match the san-serif/serif/cursive/etc like behavior of the initial font.
But it seems to also pick up some odd fonts (it picks up a crappy bitmap font I
have on 10.5).  For example, for basic Latin characters using some font
families, it will return Thonburi (?!?).  So I think we would end up creating as
many problems as we solve with an approach like this.

I think it might make sense to sniff the fonts in the font group for serif vs.
sans-serif and try to use appropriate defaults (e.g. Times instead of Lucida
Grande).  But that's best to leave to a follow-on bug.
Some data from a relatively generic Win 7 JA configuration:

Test page:
http://people.mozilla.org/~jdaggett/tests/fallback-allcodepoints.html
Tests font fallback for Unicode codepoints with associated character properties
(as opposed to all values in the 0 to 0x10ffff range).

Total codepoints:                    110,181
Font found via prefs:                        89,029
System fallback:                             21,152
Font found via hard-coded fallback:                 6,525
Font found via general fallback:                        0
No font found:                                     14,627

Note that these are the results for a machine without a lot of extra fonts but I
think that's a typical scenario for lots of users that only use their machines
for browsing.  Various software products ship with all sorts of fonts but
generally support only scripts that are already covered by the default font set.
In certain locales (e.g. Cambodia, Myanmar) users may have more fonts that
support scripts that are unsupported elsewhere in the world.
(In reply to Jonathan Kew (:jfkthame) from comment #52)
> As raised in preceding comments, I think we need to do something to handle the
> case where CT picks a fallback font that turns out to be unusable for us -
> presumably punt to the cmap-based search code.

One of the goals here is to avoid cmap-based fallback as much as possible. 
Enumerating all fonts is a costly process in both execution time and memory. 
For font fallback, it often doesn't buy us anything, as noted in the last
comment.  That's why I think if the OS system font fallback fails we should
*not* fallback to the cmap approach. Users can flip on the pref created in part
2a to activate the old behavior but I generally don't see where that would
really improve rendering.
As for the cached font coverage patches, I've reordered the patches so that the hard-coded and system-based fallback patches can land first.  The caching of font coverage data still depends on being able to update data in the startup cache which isn't currently possible (bug 716762).  I'm also going to set up the timeout in such a way that when a timeout occurs, an event/thread will run to complete the fallback process and do a reflow if fallback fonts are found.
Comment on attachment 603620 [details] [diff] [review]
patch, part 1d - fixup intra-family font searching

Review of attachment 603620 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with a couple of fixes, as below.

::: gfx/thebes/gfxFont.cpp
@@ +691,5 @@
>  }
>  
> +static void
> +CalcStyleMatch(gfxFontEntry *aFontEntry, const gfxFontStyle *aStyle,
> +               PRInt32& aRank)

No need for the aRank reference parameter here, that just makes the function more obscure. Simply return the computed match rank value as the function result.

@@ +2116,5 @@
>          }
>  
>          if (!breakHere) {
> +            PRUint32 testCh = PRUint32(ch);
> +            if (testCh >= 0x100) {

Please don't do this - I guess you're trying to eliminate a gcc warning, but it also makes it less likely that the compiler will optimize away the redundant test and statement from the 8-bit instantiation of the method.

@@ +2195,5 @@
>          // word was forcibly broken, so current char will begin next word
>          hash = HashMix(0, ch);
>          wordStart = i;
> +        PRUint32 testCh = PRUint32(ch);
> +        wordIs8Bit = (testCh < 0x100);

As above.
Attachment #603620 - Flags: review?(jfkthame) → review+
Comment on attachment 603622 [details] [diff] [review]
patch, part 2b v5 - use CoreText system fallback on OSX

Review of attachment 603622 [details] [diff] [review]:
-----------------------------------------------------------------

r+, with minor fixups as below.

::: gfx/thebes/gfxMacPlatformFontList.h
@@ +165,5 @@
>  private:
>      friend class gfxPlatformMac;
>  
>      gfxMacPlatformFontList();
> +    ~gfxMacPlatformFontList();

Please include "virtual" on the declaration, for clarity, even though it's redundant.

::: gfx/thebes/gfxMacPlatformFontList.mm
@@ +903,5 @@
> +    UniChar ch[2];
> +    CFIndex len = 1;
> +
> +    if (IS_IN_BMP(aCh)) {
> +        ch[0] = static_cast<UniChar> (aCh);

Are explicit casts really needed here? That would surprise me.

@@ +928,5 @@
> +        mDefaultFont = ::CTFontCreateWithName(CFSTR("Lucida Grande"), 12.f,
> +                                              NULL);
> +    }
> +
> +    fallback = ::CTFontCreateForString(mDefaultFont, str, CFRangeMake(0,len));

For consistency with the other system APIs being used here, that should be ::CFRangeMake. And a space after the comma, please.

@@ +944,5 @@
> +            CFIndex len = ::CFStringGetLength(familyName);
> +            buffer.SetLength(len);
> +            ::CFStringGetCharacters(familyName, CFRangeMake(0, len),
> +                                    buffer.Elements());
> +            family.Assign(buffer.Elements(), len);

This is copying the UTF16 characters of the name twice, first into 'buffer' and then into 'family' - that seems unnecessary. Just set the string length and then Get the characters directly into it? Or else create it as a dependent string that just wraps the characters stored in the array.

Also, I think this leaks the familyName CFString - you need to release that.
Attachment #603622 - Flags: review?(jfkthame) → review+
Comment on attachment 603625 [details] [diff] [review]
patch, part 2c - use DirectWrite system fallback on Windows

Review of attachment 603625 [details] [diff] [review]:
-----------------------------------------------------------------

Generally looks good to me! Could do with a little bit more comments though to explain what it's doing and why we're taking that route.

::: gfx/thebes/gfxDWriteFontList.h
@@ +317,5 @@
> +        return newCount;
> +    }
> +
> +    IFACEMETHOD(QueryInterface) (IID const& riid, void** ppvObject)
> +    {

Technically if (!ppvObject) { return E_POINTER; }

@@ +326,5 @@
> +        } else if (__uuidof(IUnknown) == riid) {
> +            *ppvObject = this;
> +        } else {
> +            *ppvObject = NULL;
> +            return E_FAIL;

Technically, E_NOINTERFACE
Attachment #603625 - Flags: review?(bas.schouten) → review+
Blocks: 734308
Logged bug 734308 to complete work on caching font coverage data and implementing font fallback timeout.
No longer depends on: 716762
Comment on attachment 603625 [details] [diff] [review]
patch, part 2c - use DirectWrite system fallback on Windows

Review of attachment 603625 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxDWriteFontList.cpp
@@ +1328,5 @@
> +        }
> +    }
> +
> +    // set up string with fallback character
> +    wchar_t str[16];

AFAICT, this only needs to be a 2-character array, not 16 (and doesn't need null-termination below).
I'm late to this bug, and didn't get all the details.  But did you consider using the TrueType coverate bitvector?
>4.50 +pref("font.name-list.serif.zh-TW", "PMingLiu, MingLiU, MingLiU-ExtB");
>4.51 +pref("font.name-list.sans-serif.zh-TW", "PMingLiU, MingLiU, MingLiU-ExtB");
>4.52 +pref("font.name-list.monospace.zh-TW", "MingLiU, MingLiU-ExtB");

>4.38 +pref("font.name-list.serif.zh-CN", "MS Song, SimSun, SimSun-ExtB");
>4.39 +pref("font.name-list.sans-serif.zh-CN", "MS Song, SimSun, SimSun-ExtB");
>4.40 +pref("font.name-list.monospace.zh-CN", "MS Song, SimSun, SimSun-ExtB");

Actually Windows Vista and Windows 7 include a sans-serif font explicitly for that locale

zh-CN is "Microsoft YaHei" while zh-TW is "Microsoft Jhenghei".  The fonts are also present on Windows XP if Office 2007 / 2010 (or even just the ppt viewer) is installed.

It would be much better if the appropriate sans-serif font were used instead of falling back to the serif font.
(In reply to henry.fai.hang.chan from comment #87)
> >4.50 +pref("font.name-list.serif.zh-TW", "PMingLiu, MingLiU, MingLiU-ExtB");
> >4.51 +pref("font.name-list.sans-serif.zh-TW", "PMingLiU, MingLiU, MingLiU-ExtB");
> >4.52 +pref("font.name-list.monospace.zh-TW", "MingLiU, MingLiU-ExtB");
> 
> >4.38 +pref("font.name-list.serif.zh-CN", "MS Song, SimSun, SimSun-ExtB");
> >4.39 +pref("font.name-list.sans-serif.zh-CN", "MS Song, SimSun, SimSun-ExtB");
> >4.40 +pref("font.name-list.monospace.zh-CN", "MS Song, SimSun, SimSun-ExtB");
> 
> Actually Windows Vista and Windows 7 include a sans-serif font explicitly
> for that locale
> 
> zh-CN is "Microsoft YaHei" while zh-TW is "Microsoft Jhenghei".  The fonts
> are also present on Windows XP if Office 2007 / 2010 (or even just the ppt
> viewer) is installed.
> 
> It would be much better if the appropriate sans-serif font were used instead
> of falling back to the serif font.

The patches that were landed don't affect the default fonts, those are
determined by the font.name.xxx settings.  The font.name-list.xxx prefs
are basically prioritized lists of alternate fonts if the font.name.xxx
one isn't available.  They are also used as part of font fallback, which
is why I updated them.  I updated them to include the xxx-ExtB fonts
that support supplementary plane codepoints (e.g. u+2xxxx codepoints). 
I didn't change the default font or the priority used.

If you think the default fonts for zh-TW and zh-CN should be different,
please file a separate bug. It sounds like you're saying the defaults
for font.name.sans-serif.zh-TW and font.name.sans-serif.zh-CN should be
different.

Thanks!
(In reply to Behdad Esfahbod from comment #86)
> I'm late to this bug, and didn't get all the details.  But did you consider
> using the TrueType coverate bitvector?

Behdad, yes I did look at this but opted for something more fine grained.  See bug 734308, comment 1.
Depends on: 739804
Depends on: 762090
Depends on: 776284
Forgive my late curiosity, but what happens on Linux?  Linux wasn't mentioned anywhere in this thread.  Is Linux just being left out of performance improvements like this?
On Linux, the font-matching/fallback code is quite different from that used on other platforms (much of this work is delegated to fontconfig, instead of being done explicitly within gecko code), so the work that was done in this bug isn't really relevant to it.
(In reply to Adam Porter from comment #90)
> Forgive my late curiosity, but what happens on Linux?

You may be interested in bug 600713 comment 66.
Depends on: 934261
Depends on: 989588
Depends on: 1119423
Blocks: 491283
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: