Closed Bug 416946 Opened 16 years ago Closed 16 years ago

load font cmap info via a background task

Categories

(Core :: Graphics, defect, P3)

All
macOS
defect

RESOLVED FIXED

People

(Reporter: jtd, Assigned: jtd)

Attachments

(3 files, 3 obsolete files)

Follow-on work from bug 409432:

Set up a background task of some flavor that loads in cmap information
little-by-little to avoid being forced to load all cmap information at startup
or on the first miss (i.e. when a character is not found either in the style fonts
or in the fallback pref fonts).
Priority: -- → P3
jd, do we want this for 1.9 or later?
Yes, we want this for 1.9 but I don't think we should block on this.
Note: the Windows code currently loads cmaps at startup, so any work on loading cmaps via a background task would help code-startup time on Windows.
Mm - Ts win?  makes me want it for 1.9...
Additional task: also read in other font family names along with the cmaps.  See bug 417444.

Flags: blocking1.9?
Flags: blocking1.9? → blocking1.9+
Priority: P3 → P2
If this looks okay, I'll work on setting up the needed code for this on Windows.
Looks to me like once the work functor starts running, it runs nonstop going back to the event loop every ten fonts but otherwise chewing all CPU. Might it be a good idea to wait 100ms between each font or something?

You're reading in CMAPs for all faces? I thought we decided to read CMAPs only for the normal face and assume all other faces were a subset of that CMAP, until proven otherwise.
Dump out diffs between cmaps within the same family.
Output from my system using the dump cmap patch.  I have (1) fonts installed by MS Office 2004 which includes most of the MS web fonts and (2) a separate folder containing fonts copied over from WinXP fonts folder.  This means there may be a strange mix of some of these font families.
(In reply to comment #7)
> Looks to me like once the work functor starts running, it runs nonstop going
> back to the event loop every ten fonts but otherwise chewing all CPU. Might it
> be a good idea to wait 100ms between each font or something?

Yeah, that makes sense.  You originally suggested using XPCOM events to do this, did you have another way in mind?  I put the timer in to keep it away from the startup process, but having a steady 100ms between cycles makes sense.

> You're reading in CMAPs for all faces? I thought we decided to read CMAPs only
> for the normal face and assume all other faces were a subset of that CMAP,
> until proven otherwise.

I think you must be remembering the IRC discussion you had with Stuart about how to handle cmaps on Windows.  The Windows code for handling cmaps grabs the cmap for just one face and assumes that it's the same across faces.  This is the gist of bug 382542.  The cmap handling code on the Mac is associated with each face since the underlying ATS API needs a specific ATSUI id to load the cmap table.

For most fonts the cmaps will match across faces.  For those that do differ, most will contain the codepoint in the "normal" face but not always.  Baskerville has a few codepoints only in the semi-bold face and Futura has a few only in the Condensed face.  

(cmapdiff) family:Baskerville U+00141 Baskerville-SemiBold Baskerville-SemiBoldItalic 
(cmapdiff) family:Baskerville U+00142 Baskerville-SemiBold Baskerville-SemiBoldItalic 
(cmapdiff) family:Baskerville U+02074 Baskerville 
(cmapdiff) family:Baskerville U+02212 Baskerville-SemiBold Baskerville-SemiBoldItalic 
(cmapdiff) family:Baskerville U+02215 Baskerville-SemiBold Baskerville-SemiBoldItalic 
(cmapdiff) family:Baskerville differences Baskerville Baskerville-Italic Baskerville-SemiBold Baskerville-SemiBoldItalic Baskerville-Bold Baskerville-BoldItalic 

(cmapdiff) family:Futura U+0037e Futura-CondensedMedium 
(cmapdiff) family:Futura U+00394 Futura-CondensedMedium Futura-CondensedExtraBold Futura-Medium 
(cmapdiff) family:Futura U+00415 Futura-CondensedMedium 
(cmapdiff) family:Futura differences Futura-CondensedMedium Futura-CondensedExtraBold Futura-Medium Futura-MediumItalic 

The Hoefler Text family has an "Ornaments" face that uses a completely different set of codepoints from other fonts in the family:

(cmapdiff) family:Hoefler Text U+00309 HoeflerText-Ornaments 
(cmapdiff) family:Hoefler Text U+0030f HoeflerText-Ornaments 
(cmapdiff) family:Hoefler Text U+00311 HoeflerText-Ornaments 
(cmapdiff) family:Hoefler Text U+00313 HoeflerText-Ornaments 
(cmapdiff) family:Hoefler Text U+00314 HoeflerText-Ornaments 
(cmapdiff) family:Hoefler Text U+0031b HoeflerText-Ornaments 

So I don't see a clear way of reading in just one cmap per family without running into a number of these cases.
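
For readers who want to reproduce this kind of comparison, here is a minimal sketch of how per-family cmap differences could be computed. It is not the code from the "dump cmap diffs" patch; the function name `cmap_diffs` and the "Demo" face names and codepoint sets are hypothetical, and each face's cmap is assumed to be available already as a set of codepoints.

```python
from typing import Dict, List, Set


def cmap_diffs(faces: Dict[str, Set[int]]) -> Dict[int, List[str]]:
    """Map each codepoint that is NOT shared by every face to the faces that have it."""
    if not faces:
        return {}
    all_cps = set().union(*faces.values())       # covered by at least one face
    common = set.intersection(*faces.values())   # covered by every face
    return {
        cp: [name for name, cps in faces.items() if cp in cps]
        for cp in sorted(all_cps - common)
    }


# Hypothetical toy data, not real font coverage.
family = {
    "Demo-Regular": {0x41, 0x42, 0x2074},
    "Demo-SemiBold": {0x41, 0x42, 0x141},
}

for cp, names in cmap_diffs(family).items():
    # Same line shape as the (cmapdiff) output quoted above.
    print(f"(cmapdiff) family:Demo U+{cp:05x} {' '.join(names)}")
```

A family whose faces all share one cmap produces no lines at all, which is why only the mismatched families show up in the dump.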
(In reply to comment #10)
> (In reply to comment #7)
> > Looks to me like once the work functor starts running, it runs nonstop going
> > back to the event loop every ten fonts but otherwise chewing all CPU. Might
> > it be a good idea to wait 100ms between each font or something?
> 
> Yeah, that makes sense.  You originally suggested using XPCOM events to do
> this, did you have another way in mind?

No, I was just wrong.

> > You're reading in CMAPs for all faces? I thought we decided to read CMAPs
> > only for the normal face and assume all other faces were a subset of that
> > CMAP, until proven otherwise.
> 
> I think you must be remembering the IRC discussion you had with Stuart about
> how to handle cmaps on Windows.  The Windows code for handling cmaps grabs the
> cmap for just one face and assumes that it's the same across faces.  This is
> the gist of bug 382542.  The cmap handling code on the Mac is associated with each
> face since the underlying ATS API needs a specific ATSUI id to load the cmap
> table.

But we could still do the same thing as on Windows, right?

> For most fonts the cmaps will match across faces.  For those that do differ,
> most will contain the codepoint in the "normal" face but not always. 
> Baskerville has a few codepoints only in the semi-bold face and Futura has a
> few only in the Condensed face.
> 
> The Hoefler Text family has an "Ornaments" face, it uses a completely
> different set of codepoints from other fonts in the family:
> 
> So I don't see a clear way of reading in just one cmap per family without
> running into a number of these cases.

I do --- we just pretend those glyphs aren't there :-). I think we'd gladly trade a Ts win for Hoefler Text Ornaments not working.
Or we could special-case Hoefler Text Ornaments (and any other fonts we find with this problem). I'm not proud.
(In reply to comment #11)

> But we could still do the same thing as on Windows, right?

We could do the same thing but honestly I don't see the advantage.  Right now cmap loading is done lazily, so making the assumption that all faces have the same cmap only helps you the first time system font fallback occurs (fewer cmaps to read in).  But if we implement a background task to read in the cmaps, the chances are reduced that system font fallback will result in all the cmaps being read in at once.  (Yes, if somebody is doing session restore with 20 tabs this will still occur.)

> I do --- we just pretend those glyphs aren't there :-). I think we'd gladly
> trade a Ts win for Hoefler Text Ornaments not working.

The only Ts win here is from moving loading of cmaps out of the startup process, which is what occurs on Windows now.  The loading of cmaps on the Mac is done completely lazily now, so Ts is not affected by loading cmaps per family vs. per face.
(In reply to comment #12)
> Or we could special-case Hoefler Text Ornaments (and any other fonts we find
> with this problem). I'm not proud.

I guess what I'm trying to say is that I don't see how *not* loading in cmaps for all faces helps us if we have to add a lot of code acrobatics to avoid the resulting problems (blacklists, gray lists, etc).  

If you look through the list in the "dump cmap diffs" attachment there are an awful lot of fonts to add to a blacklist.  It also won't solve the problem of cmap mismatches caused by version problems, such as more recent versions of Arial and Times New Roman including Arabic glyphs.  And differences in cmaps across faces seem to be much more common among open source fonts (e.g. DejaVu, STIX fonts, Thai Linux WG fonts).  Even a font-savvy place like Microsoft seems to be rather sloppy about maintaining the consistency of cmaps across faces.

On the Mac we also need to be more sensitive to this because we explicitly need to exclude fonts that lack AAT tables for rendering complex scripts like Arabic.  One current trend in font design seems to be to include glyphs for complex scripts like Arabic in commonly used fonts like Arial, Times New Roman and Courier New.  We explicitly need to test for these tables before using a particular face for complex scripts.  Marking specific fonts as ones to watch out for doesn't help us six months from now when Microsoft ships an update to MS Office that includes more fonts with cmap inconsistencies.
(In reply to comment #14)
> If you look through the list in the "dump cmap diffs" attachment there are an
> awful lot of fonts to add to a blacklist.

No, because only fonts with variant faces containing glyphs missing from the normal face need to be listed. In your tests that's Futura, Baskerville and Hoefler Text, and the problematic glyphs in Futura and Baskerville are relatively insignificant.

> On the Mac we also need to be more sensitive to this because we explicitly need
> to exclude fonts that lack AAT tables for rendering complex scripts like
> Arabic.  One current trend in font design seems to be to include glyphs for
> complex scripts like Arabic in commonly used fonts like Arial, Times New Roman
> and Courier New.  We explicitly need to test for these tables before using a
> particular face for complex scripts.  Marking specific fonts as ones to watch
> out for doesn't help us six months from now when Microsoft ships an update to
> MS Office that includes more fonts with cmap inconsistencies.

We can do the same thing for the AAT tables: check for AAT tables in the normal face. If they're not present we assume they're not present in the other faces and we don't use that font for complex scripts. If they are present we assume they'll be present in the other faces; if we find out we're wrong, we remember that and retry font selection for the text run.

This is definitely a losing strategy if people keep publishing fonts with lots of glyphs in variant faces that aren't covered by the normal face, or fonts whose variant faces have AAT tables but the normal face doesn't. But that would seem quite bizarre to me. The "Ornaments" variant makes a little bit of sense, although why not just include the ornaments in the normal face? We'd definitely have problems if people were publishing say an "Arial Arabic" variant ... but it sounds like they're moving away from that if anything.

As for mixing different font versions, that's only going to be a problem if someone installs a version with fewer glyphs and faces over a version with more glyphs and faces, i.e. basically downgrades the font. In which case, they probably don't want the extra faces and glyphs to be used anyway.

We don't have to change our get-CMAPs-for-all-faces behaviour in this bug. Maybe we don't have to change it at all. But I am worried about users who semi-regularly browse pages that hit the full-font-search path and who use Session Restore to bring those pages back on startup. Now that FF3 offers Session Restore on quit I expect it will be used a lot. I'm also worried about consistency with Windows; if searching all faces is the right thing to do on Mac, it's probably the right thing to do on Windows as well, and vice versa.
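
The "assume the normal face speaks for the family, and remember when that turns out to be wrong" idea from this comment could be sketched roughly as below. This is only an illustration of the strategy under discussion, not anything from the patch: the class `FamilyCoverage`, its methods, and the `load_cmap` callback are all hypothetical names.

```python
from typing import Callable, Dict, Set


class FamilyCoverage:
    """Guess coverage from the normal face until a face's own cmap has been read."""

    def __init__(self, normal_face: str, load_cmap: Callable[[str], Set[int]]):
        self._normal = normal_face
        self._load = load_cmap              # hypothetical loader: face name -> codepoints
        self._cmaps: Dict[str, Set[int]] = {}

    def _cmap(self, face: str) -> Set[int]:
        # Read each face's cmap at most once.
        if face not in self._cmaps:
            self._cmaps[face] = self._load(face)
        return self._cmaps[face]

    def probably_has(self, face: str, codepoint: int) -> bool:
        """Cheap guess: use the face's own cmap if known, else the normal face's."""
        known = face if face in self._cmaps else self._normal
        return codepoint in self._cmap(known)

    def confirm(self, face: str, codepoint: int) -> bool:
        """Read the face's real cmap and remember it for later guesses."""
        return codepoint in self._cmap(face)
```

A caller would pick a font using `probably_has`, call `confirm` when it actually needs the face, and retry font selection if `confirm` returns False. As noted in the comment, this sketch never finds glyphs that exist only in a variant face (the Hoefler Text Ornaments case), because the normal face's cmap says they are absent.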
Loads cmaps and other font info via a background task.  The task starts after a delay (10 seconds), after which individual slices run at intervals (150 ms).  If a font system event occurs during this process, the process aborts and starts over.  The code for running tasks on intervals is in gfxWorkRunner and is not Mac-specific; it should probably live somewhere else.  The Mac-specific portion is in the FontInfoLoad class, which handles the interaction with the Mac font system.
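
As a rough illustration of the scheme described in this comment (initial delay, then small slices at a fixed interval, with a restart when the font system changes), here is a minimal sketch. It is not the gfxWorkRunner or FontInfoLoad code: `SlicedLoader`, `run`, and the `slice_size` parameter are hypothetical, and the blocking `sleep` calls stand in for the event-loop timers a real implementation would use. Only the 10 second delay and 150 ms interval come from the comment.

```python
import time
from typing import Callable, Iterable, List


class SlicedLoader:
    """Load font info a few fonts at a time instead of all at once."""

    def __init__(self, fonts: Iterable[str], load_one: Callable[[str], None],
                 slice_size: int = 10):  # slice size is an assumption
        self._fonts: List[str] = list(fonts)
        self._load_one = load_one
        self._slice_size = slice_size
        self._next = 0

    @property
    def done(self) -> bool:
        return self._next >= len(self._fonts)

    def run_slice(self) -> bool:
        """Load up to slice_size fonts; return True once everything is loaded."""
        end = min(self._next + self._slice_size, len(self._fonts))
        for name in self._fonts[self._next:end]:
            self._load_one(name)
        self._next = end
        return self.done

    def restart(self, fonts: Iterable[str]) -> None:
        """Font system event: drop progress and start over with the new list."""
        self._fonts = list(fonts)
        self._next = 0


def run(loader: SlicedLoader, delay_s: float = 10.0, interval_s: float = 0.15,
        sleep: Callable[[float], None] = time.sleep) -> None:
    """Wait out the startup delay, then run one slice per interval."""
    sleep(delay_s)
    while not loader.run_slice():
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting; a font system notification handler would call `restart` with the refreshed font list.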

If this approach seems sane, then the next step is to consider what to do on Windows.  That may mean a separate patch, depending upon the work needed and whether other bugs should gate this work (i.e. bug 382542, the "different faces have different cmaps" blocker).

Stuart, could you review this and let me know what you think?  If you think the Windows work should be a separate bug, let me know.
Attachment #304992 - Attachment is obsolete: true
Attachment #305506 - Flags: review?(pavlov)
minor cleanup, foo *bar'ing, trim excess spaces, yadda, yadda
Attachment #305506 - Attachment is obsolete: true
Attachment #305755 - Flags: superreview?(pavlov)
Attachment #305755 - Flags: review?(pavlov)
Attachment #305506 - Flags: review?(pavlov)
Blocks: 419744
Windows work moved to bug 419744.
Stuart suggested using an idle timer instead of using timed delays:

http://lxr.mozilla.org/seamonkey/source/widget/public/nsIIdleService.idl

Flags: tracking1.9+ → blocking1.9+
Taking this off the blocker list since we are in reasonable shape re: start time.  Will get this in the next release.
Flags: tracking1.9+
Flags: blocking1.9-
Flags: blocking1.9+
Priority: P2 → P3
I looked over nsIdleService today, and from the looks of it the font loading task is not a simple fit for it; using the idle service would make the code for this more complicated.  In the long run changing this service may be the way to go, but right now the idle service object is used in a number of places and I don't like the prospect of having to confirm that any changes I make don't result in regressions.  That's time better spent fixing other blockers.  ;)
I simplified the code here, pulled out the non-Mac-specific code and moved it over to gfxFontUtils.  Once the lazy cmap loading patch that Stuart is working on for bug 424018 is checked in, setting up a version of this for Windows (bug 419744) will be easy.

I actually think keeping this code fairly simple is the way to go.  If we wait until idle time events fire, system font fallback may have already forced all the cmaps to be loaded in.  Dribbling these reads in means slight delays spread out over a few seconds vs. one big delay on system font fallback (a "big" delay here is still in the realm of 400ms on my machine but this number is relative to the number of fonts installed).

Rather than make this code more complicated, I think we should look at how better to avoid the system font fallback case because that will help overall performance, not just prevent slight one-time delays.
Attachment #305755 - Attachment is obsolete: true
Attachment #311752 - Flags: superreview?(pavlov)
Attachment #311752 - Flags: review?(pavlov)
Attachment #305755 - Flags: superreview?(pavlov)
Attachment #305755 - Flags: review?(pavlov)
Note: with this patch the fontInfoLog will dump out exactly when font info, including cmaps, is loaded:

  export NSPR_LOG_MODULES=fontInfoLog:5

The facename and size of the cmap table are shown:

  (fontinit-cmap) psname: HelveticaNeue-UltraLight, size: 1100
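
If you want to total up cmap sizes from such a log, something like the following sketch might do. It assumes every cmap line has the same shape as the single sample shown above, which is not confirmed anywhere in this bug; the names `CMAP_LINE` and `cmap_sizes` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Pattern derived from the one sample line above; other lines may differ.
CMAP_LINE = re.compile(r"\(fontinit-cmap\) psname: (\S+), size: (\d+)")


def cmap_sizes(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (face name, cmap table size) for each matching log line."""
    found = []
    for line in lines:
        match = CMAP_LINE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


sample = ["  (fontinit-cmap) psname: HelveticaNeue-UltraLight, size: 1100"]
entries = cmap_sizes(sample)
total = sum(size for _, size in entries)
```

Lines that do not match the pattern are skipped rather than treated as errors, since the log presumably contains other kinds of entries too.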

Clearing the tracking1.9+ flag as it's no longer relevant.
Flags: tracking1.9+
No longer blocks: 419744
Depends on: 419744
The patch for bug 419744 included both Mac and Windows changes.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Attachment #311752 - Flags: superreview?(pavlov)
Attachment #311752 - Flags: review?(pavlov)