[10.6] 1.9.0 crash [@MacOSFontEntry::GetFontID() ] usually at startup

RESOLVED FIXED

Status

()

Core
Graphics
P2
critical
RESOLVED FIXED
9 years ago
7 years ago

People

(Reporter: Smokey Ardisson (offline for a while; not following bugs - do not email), Assigned: jtd)

Tracking

({crash, fixed1.9.0.16})

1.9.0 Branch
x86
Mac OS X
crash, fixed1.9.0.16
Points:
---

Firefox Tracking Flags

(status1.9.1 ?)

Details

(Whiteboard: rdar://7429653, crash signature, URL)

Attachments

(3 attachments)

The current #5 crash for Camino 2.0b3 is a (usually-)startup crash in MacOSFontEntry::GetFontID().

This crash also appears in Firefox 3.0.13, though I suspect it's far from a topcrash there.  It does not appear to happen at all in Firefox 3.5.x.

I don't think we have much to go on until we can find a live person seeing this, unless something jumps out from the code; John?
(URL field contains the query results for last 2 weeks for current 1.9.0-based Camino nightlies + Camino 2.0b3 + Firefox 3.0.13.)

Comment 2

9 years ago
One of those crashes is mine:
http://crash-stats.mozilla.com/report/index/1470f626-bcd5-49e9-81f7-678bb2090829

Interesting that fonts are involved (I'm assuming that from the title) because I was dealing with fonts around the time of the crash: After installing 10.6 and rebooting I used Migration assistant to restore most everything, including the 1000+ fonts in ~/Library/Fonts, which were all re-enabled. So I had Font Book open for some time while I was disabling most of them and it also found a handful of invalid fonts (that had been fine under Leopard) which I removed. 

Here's Camino-related console: 

8/29/09 9:18:54 PM	[0x0-0x59059].org.mozilla.camino[2730]	Debugger() was called!
8/29/09 9:33:51 PM	com.apple.launchd.peruser.502[1850]	([0x0-0x59059].org.mozilla.camino[2730]) Exited with exit code: 1
8/29/09 9:47:58 PM	Camino[3379]	*** WARNING: Method selectRow:byExtendingSelection: in class ExtendedTableView is deprecated. It will be removed in a future release and should no longer be used.

Let me know if there's anything else that would be useful. I archived the bad fonts if you want them.

Comment 3

9 years ago
Sorry, I should have mentioned that I've had no further crashes.
(Assignee)

Comment 4

9 years ago
Lisa, do you see the crash again if you restore the invalid fonts?  This might be font cache corruption or something else.  The stack crawl from the crash reporter indicates that this occurred when trying to lookup information on some basic system fonts.

What does the config value 'font.preload-names-list' show for you?

1. Open a new tab and enter 'about:config' in the url bar
2. If a warning dialog appears, click through
3. In the filter box enter 'font.pre' and the entry for 'font.preload-names-list' should display

I'm guessing one way to reproduce this would be to 

1. Open FontBook
2. Disable a font

This will cause Camino to rebuild it's font list and should hit the same code where the error occurred.

Comment 5

9 years ago
font.preload-names-list setting:
Hiragino Kaku Gothic Pro,Hiragino Mincho Pro,STSong

I re-installed all the bad fonts and continued messing around in Font Book, enabling & disabling, all while Camino was running, no crash (yet?). In one case when I disabled a font (not one of the bad ones) I immediately got a popup saying Camino wanted to use the font (I clicked "Don't Allow"). That's expected given my prefs, but weird that Camino would want that one font, which I've never used in Camino, but not the others I disabled.

Comment 8

9 years ago
I got another MacOSFontEntry::GetFontID() crash last night:
http://crash-stats.mozilla.com/report/index/b7be8b9b-05f1-4e61-8979-971e02091002

Here's some additional information in case it's useful.

All the fonts reported as corrupt by Font Book had long since been removed from my system. Several days ago for routine maintenance I had deleted my font caches and deleted com.apple.fontbook.plist and com.apple.ATS.plist from my Prefs folder. I believe this is the first time I had done any of that since installing Snow Leopard. One or more of those steps causes all fonts to be enabled.

Camino, which had been running for at least several hours, crashed when I was in Font Book disabling fonts. None of the fonts I disabled generated the "Are you sure" message indicating that I was about to disable a "system font"; probably all of the fonts I disabled were in my user fonts folder, maybe 1 or 2 were in /Library/Fonts.

This is from Console around the time of the crash:

10/2/09 2:14:15 AM	fontd[188]	query/build result time = 0.292394 sec
10/2/09 2:14:55 AM	com.apple.launchd.peruser.502[174]	([0x0-0xd50d5].org.mozilla.camino[2500]) Exited with exit code: 1

I repeated the steps described above as best I could this morning, but Camino did not crash again.
Have you checked with FontBook for corrupt fonts?

Comment 11

9 years ago
(In reply to comment #10)
> Have you checked with FontBook for corrupt fonts?

Thanks for suggestion, there really was one bad font (Meta-Pro). I removed it and resolved some duplicates, and the problem gone. Anyway, I believe, Camino should be more tolerant to such a problems.
Yuri: if you put the bad font and the duplicates back, does the crashing return?

Also, do you see anything in Console.log from around the time of the crash(es), like what Lisa saw in comment 8?

We're still trying to figure out what's going on here so that we can figure out an appropriate robustness fix for the code, and while disabling duplicate and corrupt fonts seems to "fix" whatever's wrong for individuals, they then can't reproduce it again for investigation :-(
(Assignee)

Comment 13

9 years ago
(In reply to comment #12)
> Yuri: if you put the bad font and the duplicates back, does the crashing
> return?

When you test this it's probably a good idea to
Assignee: nobody → jdaggett
Priority: -- → P2

Comment 14

9 years ago
(In reply to comment #12)
> Yuri: if you put the bad font and the duplicates back, does the crashing
> return?
Weird, but neither putting back bad MetaPro fonts (4 files in total), nor enabling duplicates (all of which are Microsoft typography fonts (Andale Mono, Arial, Comic Sans MS, Georgia, Impact, Tahoma and so on), by the way) allowed me to crash Camino again. It launches as it did prior to 10.6.2 installation.

> 
> Also, do you see anything in Console.log from around the time of the crash(es),
> like what Lisa saw in comment 8?

No, Console kept silence at that time.

> 
> We're still trying to figure out what's going on here so that we can figure out
> an appropriate robustness fix for the code, and while disabling duplicate and
> corrupt fonts seems to "fix" whatever's wrong for individuals, they then can't
> reproduce it again for investigation :-(
(Assignee)

Comment 15

9 years ago
Yuri, when you replace the MetaPro fonts do they show up in FontBook?  I've seen cases where a reboot is required to pick up fonts that where removed from the fonts folder and then put back.
I was going to ask the same question as John in comment 15.

Additionally, after the steps in comment 11, did you then restart the machine ? Or clear font caches ?
(In reply to comment #17)
> This only seems to affect 1.9.0 branch (FF 3.0.x and Camino 2.0.x)

Yeah, and this crash is now 16% of all Camino 2.0rc crashes (and 20% of 2.0b4 crashes, where most of our Camino 2 users still are).

Since we can never manage to catch a live user while they're still seeing the crash, can we look at the code between 1.9.0 and 1.9.1 and see if we can get any ideas from plain-old code analysis?

In particular, if I'm reading Hg blame right, it looks like bug 437356 and bug 455243 both touched that function in 1.9.1.  Bug 437356 looks huge and unportable, but bug 455243 looks more palatable (and possibly even related).
(Assignee)

Comment 19

9 years ago
Well, if we're going to experiment with fixes, I would say the first place to start is to get rid of the private method in that function.  Not sure that will change anything, I have a feeling this is a font cache corruption bug (oh joy) but it might be something to try.

I'm going to experiment a bit with enabling/disabling today, see if that affects anything.
There may well be font cache corruption issues that are triggering this, but we still shouldn't crash....

Looking at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/gfx/thebes/src/gfxQuartzFontCache.mm&rev=1.42&mark=661, it's clear that we're headed for a crash if we've somehow managed to end up with a font family containing no faces.

We ought really to ensure that this can never happen, by eliminating any such families from the font list we build, but in the meantime I wonder if we could make things less fragile by adding a simple

    if (!fe) return;

after line 661 of gfxQuartzFontCache.mm.
Hmm, I think I really mean

    if (mAvailableFonts.Length() == 0) return;

although it wouldn't hurt to also sanity-check the "fe" pointer.
(Assignee)

Comment 22

9 years ago
(In reply to comment #21)
> Hmm, I think I really mean
> 
>     if (mAvailableFonts.Length() == 0) return;
> 
> although it wouldn't hurt to also sanity-check the "fe" pointer.

Yeah, both sound like the right thing here.  I'll put together a build and experiment to see if I can replicate the problem.  I played around with disabling faces but couldn't get a crash.
I wrote about this crash tonight to see if I could obtain some new live subjects for further questioning/test builds/etc.: http://www.ardisson.org/afkar/2009/11/17/mac-os-x-10-6-2-your-fonts-and-launching-camino-2/
(Assignee)

Comment 24

9 years ago
Not having any luck reproducing this.  But the only way a family could be empty is if availableMembersOfFontFamily returns an array with zero elements, once font entries are added to a family they're never removed.  As a test I nulled out the faceCount for one family and the program crashed at GetFontID:

0   com.apple.CoreFoundation      	0x93eda675 __CFStringCreateImmutableFunnel3 + 2149
1   com.apple.CoreFoundation      	0x93edb331 CFStringCreateWithCharacters + 97
2   com.apple.Foundation          	0x941e06f0 -[NSPlaceholderString initWithCharacters:length:] + 32
3   com.apple.Foundation          	0x941e0762 +[NSString stringWithCharacters:length:] + 82
4   org.mozilla.firefox           	0x00e4028a GetNSStringForString(nsAString_internal const&) + 68
5   org.mozilla.firefox           	0x00e40357 MacOSFontEntry::GetFontID() + 35 (gfxQuartzFontCache.mm:151)
6   org.mozilla.firefox           	0x00e41d3e MacOSFamilyEntry::ReadOtherFamilyNames(AddOtherFamilyNameFunctor&) + 178 (gfxQuartzFontCache.mm:679)
7   org.mozilla.firefox           	0x00e41892 gfxQuartzFontCache::RunLoader() + 174 (gfxQuartzFontCache.mm:1335)
8   org.mozilla.firefox           	0x00e43dcb gfxFontInfoLoader::LoaderTimerFire() + 83 (gfxFontUtils.h:438)
9   org.mozilla.firefox           	0x00e43df9 gfxFontInfoLoader::LoaderTimerCallback(nsITimer*, void*) + 23 (gfxFontUtils.h:429)
(Assignee)

Comment 25

9 years ago
Created attachment 412793 [details] [diff] [review]
patch, v.0.1, check for no faces and log

This should eliminate the problem at what I suspect is the source (bad values coming from Cocoa API) and guard against other situations where this may arise.  I also enabled logging so that setting NSPR_LOG_MODULE can be used with a shipping build to verify that this fixes the problem with a user that is experiencing this problem.
Attachment #412793 - Flags: review?(jfkthame)
(Assignee)

Updated

9 years ago
Flags: blocking1.9.0.16?
(Assignee)

Comment 26

9 years ago
By the way, the reason this bug only occurs 10.6 is that Apple changed the handling of corrupt fonts.  In 10.5, clearly corrupt fonts would never even appear in the font list but in 10.6 they appear and Cocoa API's pass them along to apps.  I guess the thought must have been that apps should do the error checking to handle funky fonts, although that's somewhat baffling.

The fonts Yuri had on his system don't show up in FontBook under 10.5 but do show in 10.6.  Validating the fonts gives an error.  Trunk builds don't load the cmaps and the metrics init call fails properly.  We should probably add code to remove the font entry when the cmap load fails.
Comment on attachment 412793 [details] [diff] [review]
patch, v.0.1, check for no faces and log

>@@ -760,25 +783,33 @@ gfxQuartzFontCache::InitFontList()
....
>         // create a font entry for each face
>         NSArray *fontfaces = [fontManager availableMembersOfFontFamily:availableFamily];  // returns an array of [psname, style name, weight, traits] elements, goofy api
>         int faceCount = [fontfaces count];
>         int faceIndex;
> 
>+        NS_ASSERTION(faceCount != 0, "no faces in family!");
>+        
>+        if (faceCount == 0) {
>+            LOG(("(fontinit) family with no faces (%s)\n", 
>+                 NS_ConvertUTF16toUTF8(availableFamilyName).get()));
>+            continue;
>+        }

IMO the NS_ASSERTION here should be an NS_WARNING instead (compare bug 520931, though it's true that assertions are less irritating by default on our OS X debug builds); given that we're going to explicitly handle the unexpected result from Cocoa's fontmanager, we don't need to raise an assertion at this point.

In principle we should also check after the for-loop that creates and adds the face entries, to ensure that at least one face was successfully created; so insert a check for an empty face array right before the code

        // add the family entry to the hash table
        ToLowerCase(availableFamilyName);
        mFontFamilies.Put(availableFamilyName, familyEntry);

though I think the only situation where this could occur is if "new MacOSFontEntry" or appending to the face array failed, which probably means we're in such bad shape (OOM?) that we're about to crash anyway.

Anyhow, marking this r+, with the suggestion that you consider tweaking it as above.
Attachment #412793 - Flags: review?(jfkthame) → review+
(Assignee)

Comment 28

9 years ago
Created attachment 412964 [details] [diff] [review]
patch, v.0.2, updated based on review comments

Switched the assertion to a warning and checked for the existence of faces in the family before adding it to the family list.
Comment on attachment 412964 [details] [diff] [review]
patch, v.0.2, updated based on review comments

Talked this through with John and Dan, then conferred with Al as well.

Approved for 1.9.0.16. a=ss for release-drivers
Attachment #412964 - Attachment is patch: true
Attachment #412964 - Attachment mime type: application/octet-stream → text/plain
Attachment #412964 - Flags: approval1.9.0.16+
(Assignee)

Comment 30

9 years ago
Landed patch v.0.2 on cvs trunk.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Keywords: fixed1.9.0.16
Resolution: --- → FIXED
Flags: blocking1.9.0.16?
I don't see any 1.9.1.x crashes on crash-stats, but the code you're patching here looks like it applies on that branch. Is there something else protecting us there? Do we need this patch on the 1.9.1 branch?
status1.9.1: --- → ?
(Assignee)

Comment 32

9 years ago
(In reply to comment #31)
> I don't see any 1.9.1.x crashes on crash-stats, but the code you're patching
> here looks like it applies on that branch. Is there something else protecting
> us there? Do we need this patch on the 1.9.1 branch?

Yes, I think we're just getting lucky on 1.9.1 or crashing in a slightly different place so the signature is different.  I'm going to set up more patches for other branches just to be safe.
Is there any way for QA to verify this fix for 1.9.0.16? Looking at the comments, I think that the answer is "no."
(Assignee)

Comment 34

9 years ago
No, we don't understand the underlying cause of the font corruption involved that causes the Apple API to return bogus data, only that some users have experienced it.  I've tried to reproduce the problem with bad fonts from Yuri (comment 9) with not luck.
Looking through the Camino crash reports for 2.0.1pre builds, there are none of these crashes with builds newer than 15 Nov; it's not a slam-dunk statistic for a million reasons, but it is one slight piece of evidence that could be saying they have.

For the morbidly curious (or those wanting to waste time), here is a 6 minute YouTube video someone made of the crash :P http://www.youtube.com/watch?v=0cc1Bxnypdo
So I upgraded to the snow cat right now.
Installing from disc and restarting, then updated to 10.6.2. Went to Font Book and checked my fonts - 1 'broken' one (i knew it didn't validate on 10.5.8, but had left it there for testing) - but didn't change anything yet. Launched the latest Camino nightly. Accessed a number of websites including random pages in Wikipedia (multi lingual).

No problems. This is probably as close as I can get to verify this bug.

The one thing I should mention. I cleaned up font-caches a week or so ago.
(Assignee)

Comment 37

9 years ago
The latest nightlies include the fix so they should not crash.  It would be a huge help if a user who experienced the problem could run through these steps:

1. Install the latest nightly build of Camino 2.0:

http://ftp.mozilla.org/pub/mozilla.org/camino/nightly/latest-2.0-M1.9/

2. Open a Terminal window and enter the following commands:

  cd /Applications/Camino.app/Contents/MacOS/
  export NSPR_LOG_MODULES=fontInfoLog:5
  export NSPR_LOG_FILE=~/Desktop/fontinfo.log
  ./Camino

3. Wait another 30 seconds or so, this allows the lazy font initialization to complete.

Camino shouldn't crash.  Upload the log file and attach it to the bug.  It's just a text file in which you should see a message "family with no names", that's the family of fonts that's causing the problem.  It would be extremely helpful if we could find out what's going on with those fonts so that we can report this to Apple.

Note: Apple released 10.6.2 on 11/9, I have a feeling that's when this problem started to occur.
(Assignee)

Comment 38

9 years ago
(In reply to comment #36)
> So I upgraded to the snow cat right now.
> Installing from disc and restarting, then updated to 10.6.2. Went to Font Book
> and checked my fonts - 1 'broken' one (i knew it didn't validate on 10.5.8, but
> had left it there for testing) - but didn't change anything yet. Launched the
> latest Camino nightly. Accessed a number of websites including random pages in
> Wikipedia (multi lingual).
> 
> No problems. This is probably as close as I can get to verify this bug.
> 
> The one thing I should mention. I cleaned up font-caches a week or so ago.

What about a Camino build from before 11/16?
(Assignee)

Comment 39

9 years ago
Created attachment 415089 [details]
font info log from user who experienced crashes on 10.6

At line 699 is the problem font:

-1601411840[130eb10]: (fontinit) family with no faces (䍯灹物杨琠䍯灲⸠ㄹ㤸⁆牡湴愠却潲浓灬敮摩摏牮慭敮瑹剥杵污牍慣牯浥摩愠䙯湴潧牡灨敲‴⸱⁓灬敮摩摏牮慭敮瑹印汥湤楤佲湡浥湴祍慣牯浥摩愠䙯湴潧牡灨敲‴⸱‰㜯㈰⼲〰ㅓ灬敮摩摏牮慭敮瑹Copyright 1998, ES Typography.  This font is free for all use and distribution, and may not be sold.  Inclusion in any collection for sale is strictly prohibited.

Splendid ESRegular1998, ES Typography, Splendid ESSplendid ES4.1Splendid ES)

I'm guessing this doesn't validate, not sure why Apple doesn't omit the font, this is normal behavior for fonts which invalid or disabled faces.

I suspect one way to reproduce this would be to:

1. Install 10.5
2. Add a font family with invalid faces
3. Add a font family and disable faces
4. Upgrade to 10.6 (*without* rebuilding font caches)

My guess the funky behavior will occur in this situation.
(Assignee)

Comment 40

9 years ago
ES Splendid 4.1
http://www.fontarium.com/splendid-es/
I downloaded this font and examined the name table; it clearly has several problems that make it invalid according to the OpenType spec, including a bunch of empty names, multiple Postscript names, and a Postscript name that includes a space. (It's a bit surprising that OS X has ever been willing to use it at all.) ftxvalidator also reports cmap problems; I didn't inspect this by hand to verify exactly what's wrong.

The problem scenario may run something like this:

- earlier OS versions had less careful validation
- user installed a broken font such as this one
- thus, this font family was listed in the OS X cache
- user upgrades to 10.6(.2?) which validates fonts more carefully
- when we ask Cocoa for all known font families, we get the cached list that includes this
- but when we ask for actual faces in the family, the improved validation rejects it
- result is a family with no faces

Clearly, we shouldn't crash in this situation, but IMO it's also an Apple bug: if they've tightened the criteria for what fonts the OS is willing to load, they should force the cache to be rebuilt so that a case like this, where no faces in the family are now considered valid, will be completely excluded.
Filed rdar://7429653 to see if we can draw Apple's attention to the issue.
John, Jonathan, thanks so much for all your work on this bug!
Whiteboard: rdar://7429653

Comment 44

9 years ago
For the record, since we shipped Camino 2.0.1 with this fix, we haven't seen a single crash report of this crash from Camino 2.0.1. Is that good enough to call this verified?
(In reply to comment #42)
> Filed rdar://7429653 to see if we can draw Apple's attention to the issue.

Response: "After further investigation it has been determined that this is a known issue, which is currently being investigated by engineering..."
Crash Signature: [@MacOSFontEntry::GetFontID() ]
You need to log in before you can comment on or make changes to this bug.