Closed
Bug 975460
Opened 10 years ago
Closed 10 years ago
Crashes in libCGXType.A.dylib@0x111d | MacFontInfo::LoadFontFamilyData(nsAString_internal const&) on startup on OS X 10.6
Categories
(Core :: Graphics: Text, defect)
Tracking
()
VERIFIED
FIXED
mozilla31
People
(Reporter: smichaud, Assigned: jtd)
References
Details
(Keywords: topcrash-mac)
Crash Data
Attachments
(6 files, 2 obsolete files)
1.11 KB,
patch
|
jfkthame
:
review+
|
Details | Diff | Splinter Review |
6.53 KB,
patch
|
smichaud
:
review+
|
Details | Diff | Splinter Review |
7.72 KB,
application/x-tar
|
Details | |
3.28 KB,
patch
|
smichaud
:
review+
m_kato
:
review+
|
Details | Diff | Splinter Review |
3.60 KB,
patch
|
smichaud
:
review+
m_kato
:
review+
|
Details | Diff | Splinter Review |
3.69 KB,
patch
|
smichaud
:
review+
m_kato
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
These have started happening in small numbers over the last 2-3 weeks: https://crash-stats.mozilla.com/report/list?signature=objc_msgSend+|+libCGXType.A.dylib%400x111d&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&platform=mac&hang_type=any&date=2014-02-21+17%3A00%3A00&range_value=4#tab-reports The earliest build for which I can find a report is firefox-2014-02-03-03-02-03-mozilla-central: bp-135d5c49-fc10-482c-96cb-439602140210 MacFontInfo::LoadFontFamilyData(nsAString_internal const&) was added by the patch for bug 962440, which I suspect is the trigger here. For now there are only a few of these, as people who use trunk builds tend to use later versions of OS X. But we have more users on OS X 10.6.X than on any other single major version of OS X. And the first 29-branch beta will come out soon. So I expect the frequency of these crashes to increase soon, possibly dramatically.
Reporter | ||
Updated•10 years ago
|
Crash Signature: [@ objc_msgSend | libCGXType.A.dylib@0x111d ]
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → jdaggett
Assignee | ||
Comment 1•10 years ago
|
||
Guessing that we need an explicit test for CTFontRef being null. Argh, silliness...
Assignee | ||
Comment 2•10 years ago
|
||
From the crash stack it looks like a deref of a null pointer so I'm guessing the CTFontRef passed in is null. Not sure why this is 10.6-specific but the best thing would be to put in a simple null-check to guard against this sort of thing.
Attachment #8380489 -
Flags: review?(jfkthame)
Comment 3•10 years ago
|
||
Comment on attachment 8380489 [details] [diff] [review] patch, null check CTFontRef after creation Review of attachment 8380489 [details] [diff] [review]: ----------------------------------------------------------------- It's not clear to me from the docs whether CTFontCreateWithFontDescriptor can ever (correctly) return null here, but whatever... perhaps it's simply a 10.6 bug, and has become more reliable on newer OS X releases. All we can do is add the check and then see if the crashes go away, I guess.
Attachment #8380489 -
Flags: review?(jfkthame) → review+
Assignee | ||
Comment 4•10 years ago
|
||
Pushed to inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/8c1eb349d342 Please leave the bug open as we need to determine whether this resolves the crashes we're seeing on 10.6.
Keywords: leave-open
Comment 5•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/8c1eb349d342
Reporter | ||
Comment 6•10 years ago
|
||
There've been four of these crashes in the last four weeks, two before John's patch landed and two after. So it seems the patch made no difference. bp-049bcbb9-4d6c-40d4-a70f-c26952140215 bp-be7c013e-634b-4c9d-b99d-d12392140224 bp-dc53898c-1239-45e4-8fb2-c3b722140307 bp-5b6583b1-2567-4dc4-aa5f-1919a2140306
Reporter | ||
Updated•10 years ago
|
Crash Signature: [@ objc_msgSend | libCGXType.A.dylib@0x111d ] → [@ objc_msgSend | libCGXType.A.dylib@0x111d ]
[@ read_uint8 ]
Reporter | ||
Comment 7•10 years ago
|
||
These are now the #3 topcrasher on the 29 branch. How feasible would it be to turn off the async font loader (implemented by bug 962440) on OS X 10.6?
Reporter | ||
Updated•10 years ago
|
Keywords: leave-open → topcrash-mac
Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(jdaggett)
Assignee | ||
Comment 8•10 years ago
|
||
(In reply to Steven Michaud from comment #7) > These are now the #3 topcrasher on the 29 branch. > > How feasible would it be to turn off the async font loader (implemented by > bug 962440) on OS X 10.6? It's possible but not really a great option. I think the next step here is to add crashreporter annotations to try and narrow down the exact font or configuration that seems to be causing this.
Flags: needinfo?(jdaggett)
Assignee | ||
Comment 9•10 years ago
|
||
Steven, I'm wondering if this is an exception we can simply catch? The code here is failing trying to call CTFontCopyTable. If we wrap this in a Obj-C @try-@catch block along the lines of the code below, would that avoid the crash? http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsClipboard.mm#57 It's interesting that all the crashes are in 10.6.8, with *no* other variation, even in minor version.
Reporter | ||
Comment 10•10 years ago
|
||
(In reply to comment #9) > If we wrap this in a Obj-C @try-@catch block along the lines of the > code below, would that avoid the crash? It might. But it might also leave FF in an unstable state, and cause crashes (or other problems) later on. I wouldn't want to do this unless we find effective STR for this bug, and can use it to also test the "fix". > It's interesting that all the crashes are in 10.6.8, with *no* other > variation, even in minor version. This probably isn't significant -- 10.6.8 has been the current 10.6 minor version for some years now, and for all that time has been easily (almost "automatically") available via Software Update. Even if it was, though, that would just mean that the true cause of these crashes is probably an Apple bug ... and that's what I've thought all along. But Apple isn't going to fix this bug in 10.6 now (even if it turns out to have security implications), so we'll need to work around it. (In reply to comment #8) > I think the next step here is to add crashreporter annotations to > try and narrow down the exact font or configuration that seems to be > causing this. This sounds promising, but I wouldn't know what annotations to add (what annotations might be useful). Can you do this yourself? If we want to pursue this, it needs to be done quickly and then uplifted to alpha and beta (the 29 branch), so that we have a chance to fix this on the 29 branch before it's released. I'll try to find time later this week to look for STR on OS X 10.6.8. But past experience tells me I'll either succeed right away or not at all.
Reporter | ||
Updated•10 years ago
|
status-firefox29:
--- → affected
status-firefox30:
--- → affected
status-firefox31:
--- → affected
tracking-firefox29:
--- → ?
tracking-firefox30:
--- → ?
tracking-firefox31:
--- → ?
Reporter | ||
Comment 11•10 years ago
|
||
Top of one good crash stack (on secondary thread), from bp-02a4157e-5bb0-4d0a-80ea-e5bb92140326: 0 libFontStreams.A.dylib read_uint8 1 libFontStreams.A.dylib ttf_stream_read_uint32 2 libCGXType.A.dylib get_font_instances 3 libCGXType.A.dylib xt_variation_handler_get_instance_count 4 CoreGraphics CGFontGetVariationInstance 5 CoreGraphics CGFontNameTableCreate 6 CoreGraphics get_name_table 7 CoreGraphics CGFontCopyPostScriptName 8 CoreGraphics CGFontCreateFontsWithURL 9 CoreText CoreText@0x73e0 10 CoreText CoreText@0x71b9 11 CoreText CoreText@0x7000 12 CoreText CoreText@0x53c44 13 XUL MacFontInfo::LoadFontFamilyData(nsAString_internal const&) gfx/thebes/gfxMacPlatformFontList.mm 14 XUL MacFontInfo::LoadFontFamilyData(nsAString_internal const&) gfx/thebes/gfxMacPlatformFontList.mm Atos translations of lines 9-12: TCGFont::TCGFont(__CFURL const*, bool) (in CoreText) + 60 TCGFontCache::CopyFont(__CFURL const*) const (in CoreText) + 103 TBaseFont::CopyNativeFont() const (in CoreText) + 44 TBaseFont::CopyTable(unsigned int, unsigned int) const (in CoreText) + 308
Reporter | ||
Comment 12•10 years ago
|
||
For what it's worth, CGFontCreateFontsWithURL() is an undocumented method with the following declaration: CFArrayRef CGFontCreateFontsWithURL(CFURLRef url); It returns an array of CGFont objects. In all the tests I've run so far 'url' is a file:// url -- so this method being called doesn't (necessarily) mean that it's trying to create a font with data from a remote location. Also I've never seen it return an array containing more than one CGFont. Here's one example of a 'url': file:///Library/Fonts/BlackoakStd.otf#postscript-name=BlackoakStd I can give more if needed.
Reporter | ||
Comment 13•10 years ago
|
||
> I'll try to find time later this week to look for STR on OS X
> 10.6.8. But past experience tells me I'll either succeed right away
> or not at all.
I tried this with today's m-c nightly and FF 29b2 -- no luck (no
crashes).
The bug may have to do with which fonts are being "created". Or it
may be just a timing thing. At this point we have no way to know.
I strongly suspect that the best solution here will turn out to be
turning off the async font loader on OS X 10.6.
Comment 14•10 years ago
|
||
(In reply to Steven Michaud from comment #12) > For what it's worth, CGFontCreateFontsWithURL() is an undocumented method > with the following declaration: > > CFArrayRef CGFontCreateFontsWithURL(CFURLRef url); > > It returns an array of CGFont objects. In all the tests I've run so far > 'url' is a file:// url -- so this method being called doesn't (necessarily) > mean that it's trying to create a font with data from a remote location. > Also I've never seen it return an array containing more than one CGFont. FWIW, it would presumably return an array with multiple CGFont objects if the URL referred to a .ttc file containing multiple faces, such as /Library/Fonts/Futura.ttc; although if they always add a fragment identifier that names a specific face, then you'd still only get one.
Reporter | ||
Comment 15•10 years ago
|
||
Some more debugging: Here's a declaration of the undocumented and unexported CGFontGetVariationInstance() method: void *CGFontGetVariationInstance(CGFontRef); I've never seen it return anything but NULL. So it may be more likely to cause trouble when it *shouldn't* return NULL. Of what kind of fonts might this be true?
Comment 16•10 years ago
|
||
(In reply to Steven Michaud from comment #15) > Some more debugging: > > Here's a declaration of the undocumented and unexported > CGFontGetVariationInstance() method: > > void *CGFontGetVariationInstance(CGFontRef); > > I've never seen it return anything but NULL. So it may be more likely to > cause trouble when it *shouldn't* return NULL. Of what kind of fonts might > this be true? AAT variation fonts (Apple's alternative to Adobe's now-defunct Multiple Master technology), of which most likely the only one present is Skia.ttf.
Reporter | ||
Comment 17•10 years ago
|
||
> AAT variation fonts (Apple's alternative to Adobe's now-defunct > Multiple Master technology), of which most likely the only one > present is Skia.ttf. Skia.ttf is indeed present on my OS X 10.6.8 partition, and CGFontGetVariationInstance() does return a (different) non-zero value for each of two different instances of a "Skia-Regular" CGFontRef object. But this doesn't lead to any crashes. So comment #15 is probably a red herring.
Assignee | ||
Comment 18•10 years ago
|
||
Thinking about this a tad more, I was wondering why we don't see this crash with existing code, which basically uses the same logic the new async font loader uses. The one distinction is the method used to pull the name table is different. The non-async cases uses the CGFont version while the async code uses the CTFont version and this is where all the crash stacks show the code is failing. So I think one solution would be to use a static method to access cmap/name tables with the code on 10.6 going through the extra step of converting CTFont ==> CGFont and then using the same table lookup as existing code. Not pretty but the chances of Apple fixing a 10.6 bug are pretty close to zero I think.
Reporter | ||
Comment 19•10 years ago
|
||
Does current code also load fonts on secondary threads? *This* is what I thought was the key difference in our behavior after the patch for bug 962440 landed.
Reporter | ||
Comment 20•10 years ago
|
||
(Following up comment #19) I just confirmed this with a pre bug 962440 nightly. Before that patch, *no* calls to CGFontCreateFontsWithURL() happen on secondary threads. After it some do and some don't.
Assignee | ||
Comment 21•10 years ago
|
||
(In reply to Steven Michaud from comment #19) > Does current code also load fonts on secondary threads? *This* is what I > thought was the key difference in our behavior after the patch for bug > 962440 landed. It does but my hunch is that's not the problem here. All the CGFont/CTFont code is supposed to be thread-safe. I think we're just working around a nasty internal Apple bug in the CTFont code. I write up a patch to do what I suggested. Then we can see if the call pattern looks different. I think it would be worth a try. The fallback would be simply to disable the async font loader on 10.6.
Assignee | ||
Comment 22•10 years ago
|
||
https://developer.apple.com/library/mac/documentation/Carbon/Reference/CoreText_Framework_Ref/_index.html "Multicore Considerations: All individual functions in Core Text are thread safe. Font objects (CTFont, CTFontDescriptor, and associated objects) can be used simultaneously by multiple operations, work queues, or threads. However, the layout objects (CTTypesetter, CTFramesetter, CTRun, CTLine, CTFrame, and associated objects) should be used in a single operation, work queue, or thread." At least, that's the theory... ;)
Reporter | ||
Comment 23•10 years ago
|
||
It's pretty likely that some part of Apple's font code isn't threadsafe on OS X 10.6. But we have no idea *what* part, and no good way to find out (without STR for these crashes).
Reporter | ||
Comment 24•10 years ago
|
||
When I get the chance (next week?) I'll start adding arbitrary delays to the code (in libFontStreams.dylib) that actually reads the fonts -- only when it's run on secondary threads, only when it's run on the main thread, and/or some combination of the two). It's possible that this will allow me to *make* these crashes happen -- which will serve as a poor man's STR. But there's no way to tell whether or not I'll be successful, and how long it will take. In the meantime I really think you should write a patch that disables the patch for bug 962440 on OS X 10.6 -- if only as a fallback. That will save us from a mad rush just before FF 29 is released, and from the possibility that we'll have no choice (given the shortage of time) but to back out bug 962440 on *all* platforms.
Assignee | ||
Comment 25•10 years ago
|
||
(In reply to Steven Michaud from comment #24) > In the meantime I really think you should write a patch that disables the > patch for bug 962440 on OS X 10.6 -- if only as a fallback. That will save > us from a mad rush just before FF 29 is released, and from the possibility > that we'll have no choice (given the shortage of time) but to back out bug > 962440 on *all* platforms. Ok, will do. But I *really* want to avoid that solution!!
Reporter | ||
Comment 26•10 years ago
|
||
> Ok, will do. Thanks!! > But I *really* want to avoid that solution!! I'll do what I can to help you. For now, what's most likely to be helpful is the "poor man's STR" I mentioned in comment #24.
Assignee | ||
Comment 27•10 years ago
|
||
Tested locally by jiggling around the version numbers, both the full 10.6 disable case and the use of CGFontCopyTable within the async loader.
Attachment #8397527 -
Flags: review?(smichaud)
Assignee | ||
Comment 28•10 years ago
|
||
So my suggestion would be to land this patch and keep this bug open. Then see what happens on trunk after a week. If we're still seeing crashes, disabling the async font loader will be a one-liner.
Reporter | ||
Comment 29•10 years ago
|
||
Comment on attachment 8397527 [details] [diff] [review] patch, avoid calling CTCopyFontTable on 10.6 It's just a guess that this will help with the crashes, so I think it's unlikely to do so. But we have some time to play with, so I'd say go ahead. We don't have many OS X 10.6.X users on the trunk, so it'll be a few days before we know whether or not it worked. But we also don't have much time, so we shouldn't let our test run longer than a week. If there's even one crash with this patch, we should morph it into a patch that turns off async font loading on OS X 10.6 and quickly land *that*. Though I'm pretty confident that will clear up the crashes, we can't be certain without testing. So we also need to leave enough time for the turn-off-async-font-loading-on-OSX-10.6 patch to be tested on trunk. Given the small amount of time that remains before the FF 29 release, it'll probably only be afterwards that we can start looking for the "real" problem here (presuming that your current patch doesn't stop the crashes). I notice this patch contains a driveby -- unwinding gfxPlatformMac::OSXVersion(). That's fine with me. I haven't tested this patch. Have you tested it, on OS X 10.6? I'll test it myself and report back.
Attachment #8397527 -
Flags: review?(smichaud) → review+
Reporter | ||
Comment 30•10 years ago
|
||
Lukas, do you agree with what I said about release timing in comment #29? Specifically, do we have time to test John's current patch and then my turn-off-async-font-loading-on-OSX-10.6 patch, on trunk, for potentially one week each, before it's too late to get one of those patches into FF 29? If I should be asking someone else, please pass the needinfo along.
Flags: needinfo?(lsblakk)
Comment 31•10 years ago
|
||
Comment on attachment 8397527 [details] [diff] [review] patch, avoid calling CTCopyFontTable on 10.6 Review of attachment 8397527 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/thebes/gfxMacPlatformFontList.mm @@ +1079,5 @@ > + } > + > + if (nsCocoaFeatures::OnLionOrLater()) { > + fontTable = CTFontCopyTable(aFontRef, aTableTag, > + kCTFontTableOptionNoOptions); I'm a bit concerned at the use of CTFontCopyTable with kCTFontTableOptionNoOptions (originally introduced in bug 962440, just being moved here). This implies that we may get tables containing "synthetic" data that Core Text chooses to construct, rather than what's really in the font file. We've had problems in the past due to this, where (for example) Apple's code synthesizes a fake 'cmap' table because they know that Cocoa or Core Text will apply certain (de)compositions or fallbacks; but then when we use that data via harfbuzz, we fail. I'd be much more comfortable using CTFontCopyTable if we were passing the kCTFontTableOptionExcludeSynthetic constant. Though the Core Text docs say this is "deprecated in OS X v10.8", which worries me ... does this mean Apple is intending to remove the ability to get "raw" font tables via CTFont altogether eventually? If so, perhaps we'd do better to stick with the lower-level CGFont APIs.
Reporter | ||
Comment 32•10 years ago
|
||
> I'll test it myself and report back.
I did some brief testing and saw no problems.
I tested both John's current patch and its turn-off-async-font-loading-on-OSX-10.6 case. In the latter case (only, and as expected) I never saw CGFontCreateFontsWithURL() or CGFontGetVariationInstance() running on anything but the main thread.
Reporter | ||
Comment 33•10 years ago
|
||
(Following up comment #30) Lukas tells me on IRC that, as we're currently in week 2 of beta for FF 29, we should have time for a couple of weeks of testing before we have to land something at the end of week 4. But as Sylvestre Ledru is point person for the FF 29 release, she'd like me to get confirmation from him. Sylvestre, what do you think? (See comment #30 above.)
Flags: needinfo?(lsblakk) → needinfo?(sledru)
Reporter | ||
Comment 34•10 years ago
|
||
Sylvestre, when you answer my needinfo, please also tell us the deadline for changes to FF 29. Rereading comment #33, it sounds like we're already uncomfortably close to this deadline, and that we don't really have time to test two different patches on the trunk (for one week each).
Reporter | ||
Comment 35•10 years ago
|
||
(Following up comment #17) I've discovered that calls to read_uint8() only happen when we're dealing with a font that has variations (that has an 'fvar' table, as returned by FPFontGetTable()). Also, the crashes that happen at libCGXType.A.dylib@0x111d happen in get_font_instances(), when a bad (and non-null) CFDataRef object returned by FPFontGetTable('fvar') is passed to CFDataGetLength(), before the calls to read_uint8(). So it really does seem to be true, after all, that this bug only happens with fonts that have variations. (By the way, the objc_msgSend in "objc_msgSend | libCGXType.A.dylib@0x111d" is spurious.) I haven't yet tried inserting arbitrary delays. I'll try that next week. Jonathan and John, do you know where I might be able to find fonts with variations that I can download and play with?
Reporter | ||
Comment 36•10 years ago
|
||
(Following up comment #35) By the way, all these calls still happen, on a secondary thread, with or without John's patch from comment #27. I strongly suspect this means the patch will make no difference to the crashes.
Reporter | ||
Comment 37•10 years ago
|
||
Another thing: John's patch with the "turn off async fonts" variant doesn't load the Skia fonts at all (and so never calls read_uint8). Don't know what's going on with that, but something may be wrong.
Reporter | ||
Comment 38•10 years ago
|
||
Based on current evidence, it's probably FPFontGetTable() (from libFontParser.dylib in the Resources directory of the ATS framework) that's not thread-safe on OS X 10.6 -- or at least which sometimes doesn't work properly when called on a secondary thread in OS X 10.6. It sometimes returns CFDataRef objects which are garbled or incomplete -- sometimes so badly garbled that CFDataGetLength() chokes on them; and sometimes less badly, so that CFDataGetLength() "works", but then read_uint8() crashes reading the CFDataRef's contents.
Assignee | ||
Comment 39•10 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #31) > Comment on attachment 8397527 [details] [diff] [review] > patch, avoid calling CTCopyFontTable on 10.6 > > Review of attachment 8397527 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: gfx/thebes/gfxMacPlatformFontList.mm > @@ +1079,5 @@ > > + } > > + > > + if (nsCocoaFeatures::OnLionOrLater()) { > > + fontTable = CTFontCopyTable(aFontRef, aTableTag, > > + kCTFontTableOptionNoOptions); > > I'm a bit concerned at the use of CTFontCopyTable with > kCTFontTableOptionNoOptions (originally introduced in bug 962440, just being > moved here). > > This implies that we may get tables containing "synthetic" data that Core > Text chooses to construct, rather than what's really in the font file. We've > had problems in the past due to this, where (for example) Apple's code > synthesizes a fake 'cmap' table because they know that Cocoa or Core Text > will apply certain (de)compositions or fallbacks; but then when we use that > data via harfbuzz, we fail. I think this is old behavior, current implementations don't appear to do the cmap swizzling you describe. I used 'cmapdata' logging to dump out the cmaps of all fonts on the system. I tested with and without this option on both 10.6 and 10.8 and in both cases the resulting cmaps are identical for all fonts on the system. To test this: 1. In about:config, enable 'gfx.font_rendering.fallback.always_use_cmaps' 2. Navigate to the executable directory and use the commands below to run Firefox: export NSPR_LOG_MODULES=cmapdata.out export NSPR_LOG_MODULES=fontinit:5,cmapdata:5 ./firefox -P profile_name This will dump out the cmaps for all fonts into the logfile. This is an deprecated option, compiling with this option results in the warning below: warning: 'kCTFontTableOptionExcludeSynthetic' is deprecated So I don't think the use of this option is really needed. My guess is that Apple realized doing this auto-swizzling was an expensive waste and problematic so it has switched to not doing this by default.
Assignee | ||
Comment 40•10 years ago
|
||
Tryserver builds: With patch to use CGFontCopyTable on 10.6: https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/jdaggett@mozilla.com-32c0eb3d9297 Patch to use CTFontCopyTable with kCTFontTableOptionExcludeSynthetic on all OSX versions: https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/jdaggett@mozilla.com-58a58a23a8da
Assignee | ||
Comment 41•10 years ago
|
||
(In reply to Steven Michaud from comment #37) > Another thing: > > John's patch with the "turn off async fonts" variant doesn't load the Skia > fonts at all (and so never calls read_uint8). Don't know what's going on > with that, but something may be wrong. Are you using the patch here as written? That doesn't disable the async fontloader on 10.6. You'll need to tweak the version number to do that. Using the tryserver builds and cmapdata logging described above, I verified that cmaps are successfully dumped for all system fonts, including 10 faces of Skia. Could this possibly be an artifact of your debugging technique?
Assignee | ||
Comment 42•10 years ago
|
||
(In reply to Steven Michaud from comment #38) > Based on current evidence, it's probably FPFontGetTable() (from > libFontParser.dylib in the Resources directory of the ATS framework) that's > not thread-safe on OS X 10.6 -- or at least which sometimes doesn't work > properly when called on a secondary thread in OS X 10.6. It sometimes > returns CFDataRef objects which are garbled or incomplete -- sometimes so > badly garbled that CFDataGetLength() chokes on them; and sometimes less > badly, so that CFDataGetLength() "works", but then read_uint8() crashes > reading the CFDataRef's contents. So you have a reproducible set of steps for which you see data corruption when run on a thread under 10.6 but when run without the async font loader you don't see that data corruption?
Assignee | ||
Comment 43•10 years ago
|
||
Pushed to inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/58eadd6aa714
Assignee | ||
Comment 44•10 years ago
|
||
I really, really hope we can avoid the need to apply this patch but here it is just in case. The async fontloader is simply caching cmap/name data on an async thread. If the async thread doesn't run the cache will be empty and the existing font data loading code is used.
Comment 45•10 years ago
|
||
Having it for beta 7 would be nice (April 10th) even if beta 8 would be acceptable too (April 14th). Will it work for you? (the first deadline is not exactly two weeks but close)
Flags: needinfo?(sledru)
Updated•10 years ago
|
Reporter | ||
Comment 46•10 years ago
|
||
(In reply to comment #41) > Are you using the patch here as written? That doesn't disable the > async fontloader on 10.6. You'll need to tweak the version number to > do that. I changed the version number from 0x1060 to 0x1070. Otherwise I left your patch as it is. > Could this possibly be an artifact of your debugging technique? *Very* unlikely. The same debugging apparatus allows the Skia fonts to be loaded (and sees them being loaded) in all other cases. I'll try your straight off "disable async font loader" patch below, and see if that makes a difference. (By the way, I'll post the interpose library I've been using in a later comment.)
Reporter | ||
Comment 47•10 years ago
|
||
(In reply to comment #42) >> Based on current evidence, it's probably FPFontGetTable() (from >> libFontParser.dylib in the Resources directory of the ATS >> framework) that's not thread-safe on OS X 10.6 -- or at least which >> sometimes doesn't work properly when called on a secondary thread >> in OS X 10.6. It sometimes returns CFDataRef objects which are >> garbled or incomplete -- sometimes so badly garbled that >> CFDataGetLength() chokes on them; and sometimes less badly, so that >> CFDataGetLength() "works", but then read_uint8() crashes reading >> the CFDataRef's contents. > > So you have a reproducible set of steps for which you see data > corruption when run on a thread under 10.6 but when run without the > async font loader you don't see that data corruption? No. I have an interpose library that hooks and logs the calls to the relevant methods. I'm also using a disassembler to look at the machine code of the relevant frameworks and dylibs. And of course I'm looking at the crash stacks for this bug. (The disassembler is Hopper Disassembler http://www.hopperapp.com/. This is an excellent program, which I've found extremely useful. It's particularly good at tracing cross-references -- which other functions call a given function, or reference a particular string in the DATA segment, and so forth. Unfortunately not open source, though.)
Reporter | ||
Comment 48•10 years ago
|
||
(In reply to comment #45) Thanks, Sylvester! We have more time than I realized. Yes, it should be enough to test John's current patch for a week and then his "disable async fontloader on OSX 10.6" patch for a week.
Reporter | ||
Updated•10 years ago
|
Keywords: leave-open
Reporter | ||
Comment 49•10 years ago
|
||
Here's the interpose library I've been using for debug logging. Instructions for use are at the top of interpose.mm. As written, this will only work on OS X 10.6.8. To use it on other versions of OS X, you'll need to change various instances of 'topBytes' in OnAddImage() as per the comments. I'll be adding to this over time. I'll post other versions later.
Reporter | ||
Comment 50•10 years ago
|
||
John, your patch from comment #44 doesn't compile. This does. But this patch also has the trouble I described in comment #37. If I run FF (with this patch) with a clean profile on OS X 10.6.8, I never see the Skia fonts loading (or any calls to read_uint8()) -- even if I wait for a full minute. With a pre-bug-962440 m-c nightly (e.g. firefox-2014-01-29-03-06-02-mozilla-central, the nightly just before it landed) I see the Skia fonts (and a bunch of others) load 5-10 seconds after the about:home page is displayed. On the main thread of course. With today's m-c nightly I see them load at the same time, on a secondary thread. I don't know that this is a serious problem. For example I don't know if this patch will prevent the Skia font (and others that are loaded after about:home is displayed) from being loaded "on demand" (when some page uses it). All I'm saying is that your patch to "turn off async font loading on OS X 10.6" doesn't fully restore the pre-bug-962440 behavior.
Attachment #8398351 -
Attachment is obsolete: true
Assignee | ||
Updated•10 years ago
|
Attachment #8398351 -
Attachment is obsolete: false
Assignee | ||
Comment 52•10 years ago
|
||
(In reply to Steven Michaud from comment #50) > Created attachment 8398697 [details] [diff] [review] > Disable async font loader on OS X 10.6 > > John, your patch from comment #44 doesn't compile. This does. That patch was written on top of what is now trunk code. If we decide to omit the other changes here and just disable the async font loader on 10.6, then your patch will be fine for applying to the beta branch. But I'd really like to see what sort of crashstates we get back before rushing to make a judgement here. > But this patch also has the trouble I described in comment #37. > > If I run FF (with this patch) with a clean profile on OS X 10.6.8, I never > see the Skia fonts loading (or any calls to read_uint8()) -- even if I wait > for a full minute. > > With a pre-bug-962440 m-c nightly (e.g. > firefox-2014-01-29-03-06-02-mozilla-central, the nightly just before it > landed) I see the Skia fonts (and a bunch of others) load 5-10 seconds after > the about:home page is displayed. On the main thread of course. With > today's m-c nightly I see them load at the same time, on a secondary thread. > > I don't know that this is a serious problem. For example I don't know if > this patch will prevent the Skia font (and others that are loaded after > about:home is displayed) from being loaded "on demand" (when some page uses > it). All I'm saying is that your patch to "turn off async font loading on > OS X 10.6" doesn't fully restore the pre-bug-962440 behavior. Hmmm. When you say you see the font what do you mean precisely? If you enable the logging described in comment 39, do you *not* see any Skia faces logged? I tested this logging on both 10.6 and 10.8 and in both cases I see the cmap data for Skia faces getting loaded.
Assignee | ||
Comment 53•10 years ago
|
||
Latest crash stats show crashes in the async font loader, even with previous patches applied: https://crash-stats.mozilla.com/report/index/9875138a-17f7-4010-a7b2-d292b2140401 So, *sigh* I guess the only alternative here is to disable the async font loader on 10.6. This patch disables the async font loader on 10.6 and fixes the problem that was causing other family names not to be loaded correctly when running on 10.6 (this is why Steven wasn't see the font table reads for Skia fonts). It also avoids the loading of other family names for the synthetic Osaka-Mono family. This patch is for landing on trunk. I'll write up a separate patch for the aurora/beta branches.
Attachment #8398351 -
Attachment is obsolete: true
Attachment #8398697 -
Attachment is obsolete: true
Attachment #8400417 -
Flags: review?(smichaud)
Assignee | ||
Comment 54•10 years ago
|
||
Remove the function that special cases font table loads within the async font loader. Since the loader is disabled now under 10.6, we don't need any special case code. Note: this patch is *not* needed on aurora/beta
Attachment #8400420 -
Flags: review?(smichaud)
Assignee | ||
Comment 55•10 years ago
|
||
Same as other patch but adjusted to match aurora/beta code.
Attachment #8400429 -
Flags: review?(smichaud)
Reporter | ||
Comment 56•10 years ago
|
||
Comment on attachment 8400417 [details] [diff] [review] patch, disable async fontloader on OSX 10.6 This looks fine to me. I tested it and the following patch on OS X 10.7.5 and 10.6.8, with my interpose library from comment #49. I didn't see any problems -- the Skia fonts got loaded when I expected them, and on the threads I expected (a secondary thread for 10.7.5, the main thread for 10.6.8). I'm not familiar with the code this patch changes, so I might have missed something. You should probably also ask someone else for a review -- Jonathan?
Attachment #8400417 -
Flags: review?(smichaud) → review+
Reporter | ||
Updated•10 years ago
|
Attachment #8400420 -
Flags: review?(smichaud) → review+
Reporter | ||
Comment 57•10 years ago
|
||
Comment on attachment 8400429 [details] [diff] [review] patch, disable async fontloader on OSX 10.6 (aurora/beta version) As with your patch from comment #53, you should probably also get someone else who's more familiar with the code to review this.
Attachment #8400429 -
Flags: review?(smichaud) → review+
Reporter | ||
Comment 58•10 years ago
|
||
As I said above, I'm reasonably confident these patches will work around the crashes. But I'm not certain. So let's leave the "keep-open" keyword in place, and check the crash stats in a few days.
Reporter | ||
Comment 59•10 years ago
|
||
(Following up comment #38) It'll be a *lot* of work to carry my research beyond this point, which will potentially be very time consuming. I'm also very busy with other bugs. And, as far as I know, OS X 10.6.8 users won't suffer too badly for not having async font loading. Please let me know if you think otherwise. But it'll take a very serious problem (like crashes or hangs or badly broken UI) to keep me working on this bug :-)
Assignee | ||
Updated•10 years ago
|
Attachment #8400417 -
Flags: review?(m_kato)
Assignee | ||
Updated•10 years ago
|
Attachment #8400420 -
Flags: review?(m_kato)
Assignee | ||
Updated•10 years ago
|
Attachment #8400429 -
Flags: review?(m_kato)
Assignee | ||
Comment 60•10 years ago
|
||
(In reply to Steven Michaud from comment #59) > (Following up comment #38) > > It'll be a *lot* of work to carry my research beyond this point, which will > potentially be very time consuming. I'm also very busy with other bugs. > > And, as far as I know, OS X 10.6.8 users won't suffer too badly for not > having async font loading. Yes, I think we're good. It's too bad we can't enable the async font loader on 10.6 but it's not the end of the world by any means. I agree that we should leave this open until we can confirm that crashes no longer occur on 10.6. Then upstream the aurora/beta patch.
Updated•10 years ago
|
Attachment #8400417 -
Flags: review?(m_kato) → review+
Updated•10 years ago
|
Attachment #8400420 -
Flags: review?(m_kato) → review+
Updated•10 years ago
|
Attachment #8400429 -
Flags: review?(m_kato) → review+
Assignee | ||
Comment 61•10 years ago
|
||
Pushed to inbound https://hg.mozilla.org/integration/mozilla-inbound/rev/db80411a6185 https://hg.mozilla.org/integration/mozilla-inbound/rev/c80bb0a566bf Please leave the bug open until we can confirm that this fixes the 10.6 crashes.
Comment 62•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/db80411a6185 https://hg.mozilla.org/mozilla-central/rev/c80bb0a566bf
Comment 63•10 years ago
|
||
Steven, is this something you can test, or that you can give QA some advice on how to test? Thanks!
Flags: needinfo?(smichaud)
Comment 64•10 years ago
|
||
Steven: sorry, I just saw comment 58. So it makes more sense to check crash-stats later next week.
Assignee | ||
Comment 65•10 years ago
|
||
Comment on attachment 8400429 [details] [diff] [review] patch, disable async fontloader on OSX 10.6 (aurora/beta version) [Approval Request Comment] Bug caused by (feature/regressing bug #): async font loader (bug 962440) User impact if declined: crashes will occur for users on OSX 10.6 Testing completed (on m-c, etc.): patches landed on 4/3, no crashes on trunk since then Risk to taking this patch (and alternatives if risky): low String or IDL/UUID changes made by this patch: none This patch disables the async font loader on OSX 10.6. We don't know precisely why this crashes but it's clearly something that happens due to calls to CoreText API's from off-main threads. This is most likely an Apple problem so to avoid a large uptick in crashes for 10.6 users, it's best if we disable the async font loader. Users on 10.6 will experience the same behavior they experienced before the async font loader was landed.
Attachment #8400429 -
Flags: approval-mozilla-beta?
Attachment #8400429 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
Attachment #8400429 -
Flags: approval-mozilla-beta?
Attachment #8400429 -
Flags: approval-mozilla-beta+
Attachment #8400429 -
Flags: approval-mozilla-aurora?
Attachment #8400429 -
Flags: approval-mozilla-aurora+
Assignee | ||
Comment 66•10 years ago
|
||
Pushed to beta: https://hg.mozilla.org/releases/mozilla-beta/rev/79c61c6f632d Pushed to aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/abb3952bbef0
Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(smichaud)
Comment 67•10 years ago
|
||
In combined signatures this is way down, only 3 crashes reported in the last week, all with builds dated 2014-04-01 or older. I think it's probably safe to call this fixed. Steven?
Flags: needinfo?(smichaud)
Comment 68•10 years ago
|
||
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #67) > In combined signatures this is way down, only 3 crashes reported in the last > week, all with builds dated 2014-04-01 or older. I think it's probably safe > to call this fixed. Steven? Sorry, just to clarify, these numbers are for Firefox 31. I've not yet checked Aurora/Beta and suspect that will take a few days before we have the data to make a determination for those branches.
Reporter | ||
Comment 69•10 years ago
|
||
> I think it's probably safe to call this fixed. Steven?
Yes, I agree.
In a few days you can check the 30 branch crash stats. For the 29 branch you'll have to wait until a beta is released containing the patch for this bug. But I think the results on the 31 branch just about guarantee success on the other branches.
Flags: needinfo?(smichaud)
Comment 70•10 years ago
|
||
Verified fixed as per comment 68 and 69. Tracking to verify against Aurora and Beta in a few days.
Status: RESOLVED → VERIFIED
Updated•10 years ago
|
Target Milestone: --- → mozilla31
Assignee | ||
Comment 71•10 years ago
|
||
No crashes on trunk after 4/1, no crashes on aurora after 4/7. Yay...
Comment 72•10 years ago
|
||
Socorro shows no post-fix crashes in Aurora and Beta: https://crash-stats.mozilla.com/report/list?signature=objc_msgSend+%7C+libCGXType.A.dylib%400x111d&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&version=Firefox%3A31.0a1&version=Firefox%3A30.0a2&version=Firefox%3A29.0b&hang_type=any&date=2014-04-15+11%3A00%3A00&range_value=4#tab-reports https://crash-stats.mozilla.com/report/list?signature=read_uint8&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&version=Firefox%3A31.0a1&version=Firefox%3A30.0a2&version=Firefox%3A29.0b&hang_type=any&date=2014-04-15+11%3A00%3A00&range_value=4#tab-reports
You need to log in
before you can comment on or make changes to this bug.
Description
•