Closed Bug 975460 Opened 10 years ago Closed 10 years ago

Crashes in libCGXType.A.dylib@0x111d | MacFontInfo::LoadFontFamilyData(nsAString_internal const&) on startup on OS X 10.6

Categories

(Core :: Graphics: Text, defect)

All
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla31
Tracking Status
firefox29 + verified
firefox30 + verified
firefox31 + verified

People

(Reporter: smichaud, Assigned: jtd)

References

Details

(Keywords: topcrash-mac)

Crash Data

Attachments

(6 files, 2 obsolete files)

These have started happening in small numbers over the last 2-3 weeks:

https://crash-stats.mozilla.com/report/list?signature=objc_msgSend+|+libCGXType.A.dylib%400x111d&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&platform=mac&hang_type=any&date=2014-02-21+17%3A00%3A00&range_value=4#tab-reports

The earliest build for which I can find a report is firefox-2014-02-03-03-02-03-mozilla-central:
bp-135d5c49-fc10-482c-96cb-439602140210

MacFontInfo::LoadFontFamilyData(nsAString_internal const&) was added by the patch for bug 962440, which I suspect is the trigger here.

For now there are only a few of these, as people who use trunk builds tend to use later versions of OS X.  But we have more users on OS X 10.6.X than on any other single major version of OS X.  And the first 29-branch beta will come out soon.  So I expect the frequency of these crashes to increase soon, possibly dramatically.
Crash Signature: [@ objc_msgSend | libCGXType.A.dylib@0x111d ]
Assignee: nobody → jdaggett
Guessing that we need an explicit test for CTFontRef being null.  Argh, silliness...
From the crash stack it looks like a deref of a null pointer so I'm guessing the CTFontRef passed in is null.  Not sure why this is 10.6-specific but the best thing would be to put in a simple null-check to guard against this sort of thing.
Attachment #8380489 - Flags: review?(jfkthame)
Comment on attachment 8380489 [details] [diff] [review]
patch, null check CTFontRef after creation

Review of attachment 8380489 [details] [diff] [review]:
-----------------------------------------------------------------

It's not clear to me from the docs whether CTFontCreateWithFontDescriptor can ever (correctly) return null here, but whatever... perhaps it's simply a 10.6 bug, and has become more reliable on newer OS X releases. All we can do is add the check and then see if the crashes go away, I guess.
Attachment #8380489 - Flags: review?(jfkthame) → review+
Pushed to inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8c1eb349d342

Please leave the bug open as we need to determine whether this resolves the crashes we're seeing on 10.6.
Keywords: leave-open
There've been four of these crashes in the last four weeks, two before John's patch landed and two after.  So it seems the patch made no difference.

bp-049bcbb9-4d6c-40d4-a70f-c26952140215
bp-be7c013e-634b-4c9d-b99d-d12392140224
bp-dc53898c-1239-45e4-8fb2-c3b722140307
bp-5b6583b1-2567-4dc4-aa5f-1919a2140306
Crash Signature: [@ objc_msgSend | libCGXType.A.dylib@0x111d ] → [@ objc_msgSend | libCGXType.A.dylib@0x111d ] [@ read_uint8 ]
These are now the #3 topcrasher on the 29 branch.

How feasible would it be to turn off the async font loader (implemented by bug 962440) on OS X 10.6?
Flags: needinfo?(jdaggett)
(In reply to Steven Michaud from comment #7)
> These are now the #3 topcrasher on the 29 branch.
> 
> How feasible would it be to turn off the async font loader (implemented by
> bug 962440) on OS X 10.6?

It's possible but not really a great option. I think the next step here is to add crashreporter annotations to try and narrow down the exact font or configuration that seems to be causing this.
Flags: needinfo?(jdaggett)
Steven, I'm wondering if this is an exception we can simply catch? The code here is failing trying to call CTFontCopyTable.  If we wrap this in a Obj-C @try-@catch block along the lines of the code below, would that avoid the crash?

http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsClipboard.mm#57

It's interesting that all the crashes are in 10.6.8, with *no* other variation, even in minor version.
(In reply to comment #9)

> If we wrap this in a Obj-C @try-@catch block along the lines of the
> code below, would that avoid the crash?

It might.  But it might also leave FF in an unstable state, and cause
crashes (or other problems) later on.  I wouldn't want to do this
unless we find effective STR for this bug, and can use it to also test
the "fix".

> It's interesting that all the crashes are in 10.6.8, with *no* other
> variation, even in minor version.

This probably isn't significant -- 10.6.8 has been the current 10.6
minor version for some years now, and for all that time has been
easily (almost "automatically") available via Software Update.  Even
if it was, though, that would just mean that the true cause of these
crashes is probably an Apple bug ... and that's what I've thought all
along.

But Apple isn't going to fix this bug in 10.6 now (even if it turns
out to have security implications), so we'll need to work around it.

(In reply to comment #8)

> I think the next step here is to add crashreporter annotations to
> try and narrow down the exact font or configuration that seems to be
> causing this.

This sounds promising, but I wouldn't know what annotations to add
(what annotations might be useful).  Can you do this yourself?

If we want to pursue this, it needs to be done quickly and then
uplifted to alpha and beta (the 29 branch), so that we have a chance
to fix this on the 29 branch before it's released.

I'll try to find time later this week to look for STR on OS X 10.6.8.
But past experience tells me I'll either succeed right away or not at
all.
Top of one good crash stack (on secondary thread), from bp-02a4157e-5bb0-4d0a-80ea-e5bb92140326:

0 	libFontStreams.A.dylib 	read_uint8 	
1 	libFontStreams.A.dylib 	ttf_stream_read_uint32 	
2 	libCGXType.A.dylib 	get_font_instances 	
3 	libCGXType.A.dylib 	xt_variation_handler_get_instance_count 	
4 	CoreGraphics 	CGFontGetVariationInstance 	
5 	CoreGraphics 	CGFontNameTableCreate 	
6 	CoreGraphics 	get_name_table 	
7 	CoreGraphics 	CGFontCopyPostScriptName 	
8 	CoreGraphics 	CGFontCreateFontsWithURL 	
9 	CoreText 	CoreText@0x73e0 	
10 	CoreText 	CoreText@0x71b9 	
11 	CoreText 	CoreText@0x7000 	
12 	CoreText 	CoreText@0x53c44 	
13 	XUL 	MacFontInfo::LoadFontFamilyData(nsAString_internal const&) 	gfx/thebes/gfxMacPlatformFontList.mm
14 	XUL 	MacFontInfo::LoadFontFamilyData(nsAString_internal const&) 	gfx/thebes/gfxMacPlatformFontList.mm

Atos translations of lines 9-12:

TCGFont::TCGFont(__CFURL const*, bool) (in CoreText) + 60
TCGFontCache::CopyFont(__CFURL const*) const (in CoreText) + 103
TBaseFont::CopyNativeFont() const (in CoreText) + 44
TBaseFont::CopyTable(unsigned int, unsigned int) const (in CoreText) + 308
For what it's worth, CGFontCreateFontsWithURL() is an undocumented method with the following declaration:

CFArrayRef CGFontCreateFontsWithURL(CFURLRef url);

It returns an array of CGFont objects.  In all the tests I've run so far 'url' is a file:// url -- so this method being called doesn't (necessarily) mean that it's trying to create a font with data from a remote location.  Also I've never seen it return an array containing more than one CGFont.

Here's one example of a 'url':
file:///Library/Fonts/BlackoakStd.otf#postscript-name=BlackoakStd

I can give more if needed.
> I'll try to find time later this week to look for STR on OS X
> 10.6.8.  But past experience tells me I'll either succeed right away
> or not at all.

I tried this with today's m-c nightly and FF 29b2 -- no luck (no
crashes).

The bug may have to do with which fonts are being "created".  Or it
may be just a timing thing.  At this point we have no way to know.

I strongly suspect that the best solution here will turn out to be
turning off the async font loader on OS X 10.6.
(In reply to Steven Michaud from comment #12)
> For what it's worth, CGFontCreateFontsWithURL() is an undocumented method
> with the following declaration:
> 
> CFArrayRef CGFontCreateFontsWithURL(CFURLRef url);
> 
> It returns an array of CGFont objects.  In all the tests I've run so far
> 'url' is a file:// url -- so this method being called doesn't (necessarily)
> mean that it's trying to create a font with data from a remote location. 
> Also I've never seen it return an array containing more than one CGFont.

FWIW, it would presumably return an array with multiple CGFont objects if the URL referred to a .ttc file containing multiple faces, such as /Library/Fonts/Futura.ttc; although if they always add a fragment identifier that names a specific face, then you'd still only get one.
Some more debugging:

Here's a declaration of the undocumented and unexported CGFontGetVariationInstance() method:

void *CGFontGetVariationInstance(CGFontRef);

I've never seen it return anything but NULL.  So it may be more likely to cause trouble when it *shouldn't* return NULL.  Of what kind of fonts might this be true?
(In reply to Steven Michaud from comment #15)
> Some more debugging:
> 
> Here's a declaration of the undocumented and unexported
> CGFontGetVariationInstance() method:
> 
> void *CGFontGetVariationInstance(CGFontRef);
> 
> I've never seen it return anything but NULL.  So it may be more likely to
> cause trouble when it *shouldn't* return NULL.  Of what kind of fonts might
> this be true?

AAT variation fonts (Apple's alternative to Adobe's now-defunct Multiple Master technology), of which most likely the only one present is Skia.ttf.
> AAT variation fonts (Apple's alternative to Adobe's now-defunct
> Multiple Master technology), of which most likely the only one
> present is Skia.ttf.

Skia.ttf is indeed present on my OS X 10.6.8 partition, and
CGFontGetVariationInstance() does return a (different) non-zero value
for each of two different instances of a "Skia-Regular" CGFontRef
object.

But this doesn't lead to any crashes. 

So comment #15 is probably a red herring.
Thinking about this a tad more, I was wondering why we don't see this crash with existing code, which basically uses the same logic the new async font loader uses. The one distinction is the method used to pull the name table is different. The non-async cases uses the CGFont version while the async code uses the CTFont version and this is where all the crash stacks show the code is failing.

So I think one solution would be to use a static method to access cmap/name tables with the code on 10.6 going through the extra step of converting CTFont ==> CGFont and then using the same table lookup as existing code. Not pretty but the chances of Apple fixing a 10.6 bug are pretty close to zero I think.
Does current code also load fonts on secondary threads?  *This* is what I thought was the key difference in our behavior after the patch for bug 962440 landed.
(Following up comment #19)

I just confirmed this with a pre bug 962440 nightly.

Before that patch, *no* calls to CGFontCreateFontsWithURL() happen on secondary threads.  After it some do and some don't.
(In reply to Steven Michaud from comment #19)
> Does current code also load fonts on secondary threads?  *This* is what I
> thought was the key difference in our behavior after the patch for bug
> 962440 landed.

It does but my hunch is that's not the problem here. All the CGFont/CTFont code is supposed to be thread-safe.  I think we're just working around a nasty internal Apple bug in the CTFont code.

I write up a patch to do what I suggested. Then we can see if the call pattern looks different. I think it would be worth a try.  The fallback would be simply to disable the async font loader on 10.6.
https://developer.apple.com/library/mac/documentation/Carbon/Reference/CoreText_Framework_Ref/_index.html

"Multicore Considerations: All individual functions in Core Text are
thread safe. Font objects (CTFont, CTFontDescriptor, and associated
objects) can be used simultaneously by multiple operations, work
queues, or threads. However, the layout objects (CTTypesetter,
CTFramesetter, CTRun, CTLine, CTFrame, and associated objects) should
be used in a single operation, work queue, or thread."

At least, that's the theory... ;)
It's pretty likely that some part of Apple's font code isn't threadsafe on OS X 10.6.

But we have no idea *what* part, and no good way to find out (without STR for these crashes).
When I get the chance (next week?) I'll start adding arbitrary delays to the code (in libFontStreams.dylib) that actually reads the fonts -- only when it's run on secondary threads, only when it's run on the main thread, and/or some combination of the two).  It's possible that this will allow me to *make* these crashes happen -- which will serve as a poor man's STR.

But there's no way to tell whether or not I'll be successful, and how long it will take.

In the meantime I really think you should write a patch that disables the patch for bug 962440 on OS X 10.6 -- if only as a fallback.  That will save us from a mad rush just before FF 29 is released, and from the possibility that we'll have no choice (given the shortage of time) but to back out bug 962440 on *all* platforms.
(In reply to Steven Michaud from comment #24)
> In the meantime I really think you should write a patch that disables the
> patch for bug 962440 on OS X 10.6 -- if only as a fallback.  That will save
> us from a mad rush just before FF 29 is released, and from the possibility
> that we'll have no choice (given the shortage of time) but to back out bug
> 962440 on *all* platforms.

Ok, will do. But I *really* want to avoid that solution!!
> Ok, will do.

Thanks!!

> But I *really* want to avoid that solution!!

I'll do what I can to help you.

For now, what's most likely to be helpful is the "poor man's STR" I mentioned in comment #24.
Tested locally by jiggling around the version numbers, both the full 10.6 disable case and the use of CGFontCopyTable within the async loader.
Attachment #8397527 - Flags: review?(smichaud)
So my suggestion would be to land this patch and keep this bug open. Then see what happens on trunk after a week. If we're still seeing crashes, disabling the async font loader will be a one-liner.
Comment on attachment 8397527 [details] [diff] [review]
patch, avoid calling CTCopyFontTable on 10.6

It's just a guess that this will help with the crashes, so I think it's unlikely to do so.  But we have some time to play with, so I'd say go ahead.

We don't have many OS X 10.6.X users on the trunk, so it'll be a few days before we know whether or not it worked.  But we also don't have much time, so we shouldn't let our test run longer than a week.

If there's even one crash with this patch, we should morph it into a patch that turns off async font loading on OS X 10.6 and quickly land *that*.  Though I'm pretty confident that will clear up the crashes, we can't be certain without testing.  So we also need to leave enough time for the turn-off-async-font-loading-on-OSX-10.6 patch to be tested on trunk.

Given the small amount of time that remains before the FF 29 release, it'll probably only be afterwards that we can start looking for the "real" problem here (presuming that your current patch doesn't stop the crashes).

I notice this patch contains a driveby -- unwinding gfxPlatformMac::OSXVersion().  That's fine with me.

I haven't tested this patch.  Have you tested it, on OS X 10.6?

I'll test it myself and report back.
Attachment #8397527 - Flags: review?(smichaud) → review+
Lukas, do you agree with what I said about release timing in comment #29?  Specifically, do we have time to test John's current patch and then my turn-off-async-font-loading-on-OSX-10.6 patch, on trunk, for potentially one week each, before it's too late to get one of those patches into FF 29?

If I should be asking someone else, please pass the needinfo along.
Flags: needinfo?(lsblakk)
Comment on attachment 8397527 [details] [diff] [review]
patch, avoid calling CTCopyFontTable on 10.6

Review of attachment 8397527 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxMacPlatformFontList.mm
@@ +1079,5 @@
> +    }
> +
> +    if (nsCocoaFeatures::OnLionOrLater()) {
> +        fontTable = CTFontCopyTable(aFontRef, aTableTag,
> +                                    kCTFontTableOptionNoOptions);

I'm a bit concerned at the use of CTFontCopyTable with kCTFontTableOptionNoOptions (originally introduced in bug 962440, just being moved here).

This implies that we may get tables containing "synthetic" data that Core Text chooses to construct, rather than what's really in the font file. We've had problems in the past due to this, where (for example) Apple's code synthesizes a fake 'cmap' table because they know that Cocoa or Core Text will apply certain (de)compositions or fallbacks; but then when we use that data via harfbuzz, we fail.

I'd be much more comfortable using CTFontCopyTable if we were passing the kCTFontTableOptionExcludeSynthetic constant. Though the Core Text docs say this is "deprecated in OS X v10.8", which worries me ... does this mean Apple is intending to remove the ability to get "raw" font tables via CTFont altogether eventually? If so, perhaps we'd do better to stick with the lower-level CGFont APIs.
> I'll test it myself and report back.

I did some brief testing and saw no problems.

I tested both John's current patch and its turn-off-async-font-loading-on-OSX-10.6 case.  In the latter case (only, and as expected) I never saw CGFontCreateFontsWithURL() or CGFontGetVariationInstance() running on anything but the main thread.
(Following up comment #30)

Lukas tells me on IRC that, as we're currently in week 2 of beta for FF 29, we should have time for a couple of weeks of testing before we have to land something at the end of week 4.  But as Sylvestre Ledru is point person for the FF 29 release, she'd like me to get confirmation from him.

Sylvestre, what do you think?  (See comment #30 above.)
Flags: needinfo?(lsblakk) → needinfo?(sledru)
Sylvestre, when you answer my needinfo, please also tell us the deadline for changes to FF 29.

Rereading comment #33, it sounds like we're already uncomfortably close to this deadline, and that we don't really have time to test two different patches on the trunk (for one week each).
(Following up comment #17)

I've discovered that calls to read_uint8() only happen when we're dealing with a font that has variations (that has an 'fvar' table, as returned by FPFontGetTable()).  Also, the crashes that happen at libCGXType.A.dylib@0x111d happen in get_font_instances(), when a bad (and non-null) CFDataRef object returned by FPFontGetTable('fvar') is passed to CFDataGetLength(), before the calls to read_uint8().

So it really does seem to be true, after all, that this bug only happens with fonts that have variations.

(By the way, the objc_msgSend in "objc_msgSend | libCGXType.A.dylib@0x111d" is spurious.)

I haven't yet tried inserting arbitrary delays.  I'll try that next week.

Jonathan and John, do you know where I might be able to find fonts with variations that I can download and play with?
(Following up comment #35)

By the way, all these calls still happen, on a secondary thread, with or without John's patch from comment #27.  I strongly suspect this means the patch will make no difference to the crashes.
Another thing:

John's patch with the "turn off async fonts" variant doesn't load the Skia fonts at all (and so never calls read_uint8).  Don't know what's going on with that, but something may be wrong.
Based on current evidence, it's probably FPFontGetTable() (from libFontParser.dylib in the Resources directory of the ATS framework) that's not thread-safe on OS X 10.6 -- or at least which sometimes doesn't work properly when called on a secondary thread in OS X 10.6.  It sometimes returns CFDataRef objects which are garbled or incomplete -- sometimes so badly garbled that CFDataGetLength() chokes on them; and sometimes less badly, so that CFDataGetLength() "works", but then read_uint8() crashes reading the CFDataRef's contents.
(In reply to Jonathan Kew (:jfkthame) from comment #31)
> Comment on attachment 8397527 [details] [diff] [review]
> patch, avoid calling CTCopyFontTable on 10.6
> 
> Review of attachment 8397527 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/thebes/gfxMacPlatformFontList.mm
> @@ +1079,5 @@
> > +    }
> > +
> > +    if (nsCocoaFeatures::OnLionOrLater()) {
> > +        fontTable = CTFontCopyTable(aFontRef, aTableTag,
> > +                                    kCTFontTableOptionNoOptions);
> 
> I'm a bit concerned at the use of CTFontCopyTable with
> kCTFontTableOptionNoOptions (originally introduced in bug 962440, just being
> moved here).
> 
> This implies that we may get tables containing "synthetic" data that Core
> Text chooses to construct, rather than what's really in the font file. We've
> had problems in the past due to this, where (for example) Apple's code
> synthesizes a fake 'cmap' table because they know that Cocoa or Core Text
> will apply certain (de)compositions or fallbacks; but then when we use that
> data via harfbuzz, we fail.

I think this is old behavior, current implementations don't appear to
do the cmap swizzling you describe. I used 'cmapdata' logging to dump
out the cmaps of all fonts on the system. I tested with and without
this option on both 10.6 and 10.8 and in both cases the resulting
cmaps are identical for all fonts on the system.

To test this:

1. In about:config, enable 'gfx.font_rendering.fallback.always_use_cmaps'
2. Navigate to the executable directory and use the commands below to run Firefox:

  export NSPR_LOG_MODULES=cmapdata.out
  export NSPR_LOG_MODULES=fontinit:5,cmapdata:5
  ./firefox -P profile_name

This will dump out the cmaps for all fonts into the logfile.

This is an deprecated option, compiling with this option results in the warning below:

  warning: 'kCTFontTableOptionExcludeSynthetic' is deprecated

So I don't think the use of this option is really needed. My guess is
that Apple realized doing this auto-swizzling was an expensive waste
and problematic so it has switched to not doing this by default.
Tryserver builds:
With patch to use CGFontCopyTable on 10.6:
https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/jdaggett@mozilla.com-32c0eb3d9297

Patch to use CTFontCopyTable with kCTFontTableOptionExcludeSynthetic on all OSX versions:
https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/jdaggett@mozilla.com-58a58a23a8da
(In reply to Steven Michaud from comment #37)
> Another thing:
> 
> John's patch with the "turn off async fonts" variant doesn't load the Skia
> fonts at all (and so never calls read_uint8).  Don't know what's going on
> with that, but something may be wrong.

Are you using the patch here as written? That doesn't disable the async fontloader on 10.6. You'll need to tweak the version number to do that.

Using the tryserver builds and cmapdata logging described above, I verified that cmaps are successfully dumped for all system fonts, including 10 faces of Skia.

Could this possibly be an artifact of your debugging technique?
(In reply to Steven Michaud from comment #38)
> Based on current evidence, it's probably FPFontGetTable() (from
> libFontParser.dylib in the Resources directory of the ATS framework) that's
> not thread-safe on OS X 10.6 -- or at least which sometimes doesn't work
> properly when called on a secondary thread in OS X 10.6.  It sometimes
> returns CFDataRef objects which are garbled or incomplete -- sometimes so
> badly garbled that CFDataGetLength() chokes on them; and sometimes less
> badly, so that CFDataGetLength() "works", but then read_uint8() crashes
> reading the CFDataRef's contents.

So you have a reproducible set of steps for which you see data corruption when run on a thread under 10.6 but when run without the async font loader you don't see that data corruption?
I really, really hope we can avoid the need to apply this patch but here it is just in case.  The async fontloader is simply caching cmap/name data on an async thread.  If the async thread doesn't run the cache will be empty and the existing font data loading code is used.
Having it for beta 7 would be nice (April 10th) even if beta 8 would be acceptable too (April 14th). Will it work for you? (the first deadline is not exactly two weeks but close)
Flags: needinfo?(sledru)
(In reply to comment #41)

> Are you using the patch here as written? That doesn't disable the
> async fontloader on 10.6. You'll need to tweak the version number to
> do that.

I changed the version number from 0x1060 to 0x1070.  Otherwise I left
your patch as it is.

> Could this possibly be an artifact of your debugging technique?

*Very* unlikely.  The same debugging apparatus allows the Skia fonts
to be loaded (and sees them being loaded) in all other cases.

I'll try your straight off "disable async font loader" patch below,
and see if that makes a difference.

(By the way, I'll post the interpose library I've been using in a
later comment.)
(In reply to comment #42)

>> Based on current evidence, it's probably FPFontGetTable() (from
>> libFontParser.dylib in the Resources directory of the ATS
>> framework) that's not thread-safe on OS X 10.6 -- or at least which
>> sometimes doesn't work properly when called on a secondary thread
>> in OS X 10.6.  It sometimes returns CFDataRef objects which are
>> garbled or incomplete -- sometimes so badly garbled that
>> CFDataGetLength() chokes on them; and sometimes less badly, so that
>> CFDataGetLength() "works", but then read_uint8() crashes reading
>> the CFDataRef's contents.
>
> So you have a reproducible set of steps for which you see data
> corruption when run on a thread under 10.6 but when run without the
> async font loader you don't see that data corruption?

No.  I have an interpose library that hooks and logs the calls to the
relevant methods.  I'm also using a disassembler to look at the
machine code of the relevant frameworks and dylibs.  And of course I'm
looking at the crash stacks for this bug.

(The disassembler is Hopper Disassembler http://www.hopperapp.com/.
This is an excellent program, which I've found extremely useful.  It's
particularly good at tracing cross-references -- which other functions
call a given function, or reference a particular string in the DATA
segment, and so forth.  Unfortunately not open source, though.)
(In reply to comment #45)

Thanks, Sylvester!

We have more time than I realized.  Yes, it should be enough to test John's current patch for a week and then his "disable async fontloader on OSX 10.6" patch for a week.
Keywords: leave-open
Here's the interpose library I've been using for debug logging.  Instructions for use are at the top of interpose.mm.

As written, this will only work on OS X 10.6.8.  To use it on other versions of OS X, you'll need to change various instances of 'topBytes' in OnAddImage() as per the comments.

I'll be adding to this over time.  I'll post other versions later.
John, your patch from comment #44 doesn't compile.  This does.

But this patch also has the trouble I described in comment #37.

If I run FF (with this patch) with a clean profile on OS X 10.6.8, I never see the Skia fonts loading (or any calls to read_uint8()) -- even if I wait for a full minute.

With a pre-bug-962440 m-c nightly (e.g. firefox-2014-01-29-03-06-02-mozilla-central, the nightly just before it landed) I see the Skia fonts (and a bunch of others) load 5-10 seconds after the about:home page is displayed.  On the main thread of course.  With today's m-c nightly I see them load at the same time, on a secondary thread.

I don't know that this is a serious problem.  For example I don't know if this patch will prevent the Skia font (and others that are loaded after about:home is displayed) from being loaded "on demand" (when some page uses it).  All I'm saying is that your patch to "turn off async font loading on OS X 10.6" doesn't fully restore the pre-bug-962440 behavior.
Attachment #8398351 - Attachment is obsolete: true
Attachment #8398351 - Attachment is obsolete: false
(In reply to Steven Michaud from comment #50)
> Created attachment 8398697 [details] [diff] [review]
> Disable async font loader on OS X 10.6
> 
> John, your patch from comment #44 doesn't compile.  This does.

That patch was written on top of what is now trunk code. If we decide to omit the other changes here and just disable the async font loader on 10.6, then your patch will be fine for applying to the beta branch. But I'd really like to see what sort of crashstates we get back before rushing to make a judgement here.

> But this patch also has the trouble I described in comment #37.
> 
> If I run FF (with this patch) with a clean profile on OS X 10.6.8, I never
> see the Skia fonts loading (or any calls to read_uint8()) -- even if I wait
> for a full minute.
> 
> With a pre-bug-962440 m-c nightly (e.g.
> firefox-2014-01-29-03-06-02-mozilla-central, the nightly just before it
> landed) I see the Skia fonts (and a bunch of others) load 5-10 seconds after
> the about:home page is displayed.  On the main thread of course.  With
> today's m-c nightly I see them load at the same time, on a secondary thread.
> 
> I don't know that this is a serious problem.  For example I don't know if
> this patch will prevent the Skia font (and others that are loaded after
> about:home is displayed) from being loaded "on demand" (when some page uses
> it).  All I'm saying is that your patch to "turn off async font loading on
> OS X 10.6" doesn't fully restore the pre-bug-962440 behavior.

Hmmm. When you say you see the font what do you mean precisely? If you enable the logging described in comment 39, do you *not* see any Skia faces logged? I tested this logging on both 10.6 and 10.8 and in both cases I see the cmap data for Skia faces getting loaded.
Latest crash stats show crashes in the async font loader, even with previous patches applied:

https://crash-stats.mozilla.com/report/index/9875138a-17f7-4010-a7b2-d292b2140401

So, *sigh* I guess the only alternative here is to disable the async font loader on 10.6.

This patch disables the async font loader on 10.6 and fixes the problem that was causing other family names not to be loaded correctly when running on 10.6 (this is why Steven wasn't see the font table reads for Skia fonts).  It also avoids the loading of other family names for the synthetic Osaka-Mono family.

This patch is for landing on trunk. I'll write up a separate patch for the aurora/beta branches.
Attachment #8398351 - Attachment is obsolete: true
Attachment #8398697 - Attachment is obsolete: true
Attachment #8400417 - Flags: review?(smichaud)
Remove the function that special cases font table loads within the async font loader. Since the loader is disabled now under 10.6, we don't need any special case code.

Note: this patch is *not* needed on aurora/beta
Attachment #8400420 - Flags: review?(smichaud)
Same as other patch but adjusted to match aurora/beta code.
Attachment #8400429 - Flags: review?(smichaud)
Comment on attachment 8400417 [details] [diff] [review]
patch, disable async fontloader on OSX 10.6

This looks fine to me.

I tested it and the following patch on OS X 10.7.5 and 10.6.8, with my interpose library from comment #49.  I didn't see any problems -- the Skia fonts got loaded when I expected them, and on the threads I expected (a secondary thread for 10.7.5, the main thread for 10.6.8).

I'm not familiar with the code this patch changes, so I might have missed something.  You should probably also ask someone else for a review -- Jonathan?
Attachment #8400417 - Flags: review?(smichaud) → review+
Attachment #8400420 - Flags: review?(smichaud) → review+
Comment on attachment 8400429 [details] [diff] [review]
patch, disable async fontloader on OSX 10.6 (aurora/beta version)

As with your patch from comment #53, you should probably also get someone else who's more familiar with the code to review this.
Attachment #8400429 - Flags: review?(smichaud) → review+
As I said above, I'm reasonably confident these patches will work around the crashes.  But I'm not certain.  So let's leave the "keep-open" keyword in place, and check the crash stats in a few days.
(Following up comment #38)

It'll be a *lot* of work to carry my research beyond this point, which will potentially be very time consuming.  I'm also very busy with other bugs.

And, as far as I know, OS X 10.6.8 users won't suffer too badly for not having async font loading.

Please let me know if you think otherwise.  But it'll take a very serious problem (like crashes or hangs or badly broken UI) to keep me working on this bug :-)
Attachment #8400417 - Flags: review?(m_kato)
Attachment #8400420 - Flags: review?(m_kato)
Attachment #8400429 - Flags: review?(m_kato)
(In reply to Steven Michaud from comment #59)
> (Following up comment #38)
> 
> It'll be a *lot* of work to carry my research beyond this point, which will
> potentially be very time consuming.  I'm also very busy with other bugs.
> 
> And, as far as I know, OS X 10.6.8 users won't suffer too badly for not
> having async font loading.

Yes, I think we're good. It's too bad we can't enable the async font loader on 10.6 but it's not the end of the world by any means.

I agree that we should leave this open until we can confirm that crashes no longer occur on 10.6. Then upstream the aurora/beta patch.
Attachment #8400417 - Flags: review?(m_kato) → review+
Attachment #8400420 - Flags: review?(m_kato) → review+
Attachment #8400429 - Flags: review?(m_kato) → review+
Pushed to inbound
https://hg.mozilla.org/integration/mozilla-inbound/rev/db80411a6185
https://hg.mozilla.org/integration/mozilla-inbound/rev/c80bb0a566bf

Please leave the bug open until we can confirm that this fixes the 10.6 crashes.
Steven, is this something you can test, or that you can give QA some advice on how to test? Thanks!
Flags: needinfo?(smichaud)
Steven: sorry, I just saw comment 58. So it makes more sense to check crash-stats later next week.
Comment on attachment 8400429 [details] [diff] [review]
patch, disable async fontloader on OSX 10.6 (aurora/beta version)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): async font loader (bug 962440)
User impact if declined: crashes will occur for users on OSX 10.6
Testing completed (on m-c, etc.): patches landed on 4/3, no crashes on trunk since then
Risk to taking this patch (and alternatives if risky): low
String or IDL/UUID changes made by this patch: none

This patch disables the async font loader on OSX 10.6. We don't know precisely why this crashes but it's clearly something that happens due to calls to CoreText API's from off-main threads.  This is most likely an Apple problem so to avoid a large uptick in crashes for 10.6 users, it's best if we disable the async font loader.  Users on 10.6 will experience the same behavior they experienced before the async font loader was landed.
Attachment #8400429 - Flags: approval-mozilla-beta?
Attachment #8400429 - Flags: approval-mozilla-aurora?
Attachment #8400429 - Flags: approval-mozilla-beta?
Attachment #8400429 - Flags: approval-mozilla-beta+
Attachment #8400429 - Flags: approval-mozilla-aurora?
Attachment #8400429 - Flags: approval-mozilla-aurora+
Flags: needinfo?(smichaud)
In combined signatures this is way down, only 3 crashes reported in the last week, all with builds dated 2014-04-01 or older. I think it's probably safe to call this fixed. Steven?
Flags: needinfo?(smichaud)
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #67)
> In combined signatures this is way down, only 3 crashes reported in the last
> week, all with builds dated 2014-04-01 or older. I think it's probably safe
> to call this fixed. Steven?

Sorry, just to clarify, these numbers are for Firefox 31. I've not yet checked Aurora/Beta and suspect that will take a few days before we have the data to make a determination for those branches.
> I think it's probably safe to call this fixed. Steven?

Yes, I agree.

In a few days you can check the 30 branch crash stats.  For the 29 branch you'll have to wait until a beta is released containing the patch for this bug.  But I think the results on the 31 branch just about guarantee success on the other branches.
Flags: needinfo?(smichaud)
Status: NEW → RESOLVED
Closed: 10 years ago
Keywords: leave-open
Resolution: --- → FIXED
Verified fixed as per comment 68 and 69. Tracking to verify against Aurora and Beta in a few days.
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla31
No crashes on trunk after 4/1, no crashes on aurora after 4/7. Yay...
See Also: → 1083356
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: