Closed Bug 663688 Opened 13 years ago Closed 13 years ago

[10.7] Use CGFont/CTFont APIs instead of ATS APIs on 10.7 or later (Fx 5) or on 10.6 or later (Fx 6 and above) or disable downloadable fonts on 10.7 or later (Fx 3.6) to prevent crashes on Mac OS X 10.7 Lion

Categories

(Core :: Graphics, defect)

x86_64
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
firefox5 + .1-fixed
firefox6 + fixed
blocking1.9.2 --- .19+
status1.9.2 --- .19-fixed

People

(Reporter: scoobidiver, Assigned: jfkthame)

References

(Blocks 1 open bug, )

Details

(Keywords: crash, relnote, verified1.9.2, Whiteboard: [sg:vector (apple)] rdar://9632502)

Crash Data

Attachments

(6 files, 5 obsolete files)

3.37 KB, patch
jtd
: review+
jfkthame
: review+
Details | Diff | Splinter Review
4.61 KB, text/plain
Details
48.62 KB, patch
jtd
: review+
Details | Diff | Splinter Review
3.22 KB, patch
jtd
: review+
Details | Diff | Splinter Review
48.90 KB, patch
jfkthame
: review+
Details | Diff | Splinter Review
2.92 KB, patch
jfkthame
: review+
jtd
: review+
christian
: approval1.9.2.19+
Details | Diff | Splinter Review
It is #53 top browser crasher in 4.0.1, #8 in 5.0b3, #69 in 6.0a2 on Mac OS X.

Some comments say:
"Firefox is randomly quitting / crashing while testing Mac OS X Lion Dev. Preview 4. It seems to transpire whenever I scroll a webpage."
"Reboot from this OS cause the crash."

Signature	libobjc.A.dylib@0x6050
UUID	bcc8a1ab-0981-438c-acf9-545c12110611
Date Processed	2011-06-11 23:03:25.514323
Uptime	3087
Last Crash	121960 seconds (1.4 days) before submission
Install Age	3087 seconds (51.5 minutes) since version was first installed.
Install Time	2011-06-12 05:10:47
Product	Firefox
Version	6.0a2
Build ID	20110611042006
Release Channel	aurora
Branch	2.2
OS	Mac OS X
OS Version	10.7.0 11A480b
CPU	amd64
CPU Info	family 6 model 23 stepping 10
Crash Reason	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash Address	0x7865547d
App Notes 	Renderers: 0x22600,0x20400GL Context? GL Context+
GL Layers? GL Layers+
WebGL? WebGL+

Frame 	Module 	Signature [Expand] 	Source
0 	libobjc.A.dylib 	libobjc.A.dylib@0x6050 	
1 	CoreFoundation 	CoreFoundation@0x8b4f 	
2 	CoreFoundation 	CoreFoundation@0xd2e1 	
3 	CoreFoundation 	CoreFoundation@0xd0cf 	
4 	CoreFoundation 	CoreFoundation@0x8c95 	
5 	CoreFoundation 	CoreFoundation@0xd2e1 	
6 	CoreFoundation 	CoreFoundation@0xd0cf 	
7 	CoreFoundation 	CoreFoundation@0x8c95 	
8 	CoreGraphics 	CoreGraphics@0xacfc1 	
9 	CoreGraphics 	CoreGraphics@0xad154 	
10 	CoreGraphics 	CoreGraphics@0xad017 	
11 	CoreFoundation 	CoreFoundation@0x8c95 	
12 	XUL 	_moz_cairo_font_face_destroy 	gfx/cairo/cairo/src/cairo-font-face.c:132
13 	XUL 	_cairo_scaled_font_fini_internal 	gfx/cairo/cairo/src/cairo-scaled-font.c:827
14 	XUL 	_moz_cairo_scaled_font_destroy 	gfx/cairo/cairo/src/cairo-scaled-font.c:1249
15 	XUL 	gfxMacFont::~gfxMacFont 	gfx/thebes/gfxMacFont.cpp:146
16 	XUL 	gfxFontCache::NotifyExpired 	gfx/thebes/gfxFont.cpp:997
17 	XUL 	nsExpirationTracker<gfxFont,3u>::TimerCallback 	r.h:210
18 	XUL 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:424
19 	XUL 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:520
20 	XUL 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:618
21 	XUL 	NS_ProcessPendingEvents_P 	obj-firefox/x86_64/xpcom/build/nsThreadUtils.cpp:195
22 	XUL 	nsBaseAppShell::NativeEventCallback 	widget/src/xpwidgets/nsBaseAppShell.cpp:130
23 	XUL 	nsAppShell::ProcessGeckoEvents 	widget/src/cocoa/nsAppShell.mm:422 

More reports at:
https://crash-stats.mozilla.com/report/list?signature=libobjc.A.dylib%400x6050
I'm getting a lot of this crashes everyday while using a development Lion VM with Nightly.
Summary: Crash [@ libobjc.A.dylib@0x6050 ] on Mac OS X 10.7 Lion → [10.7] Crash [@ libobjc.A.dylib@0x6050 ] on Mac OS X 10.7 Lion
Did this just start with the Dev Preview 4 or was it happening in other seeds?

One report had the adobe contribute cs5 toolbar, which I have seen cause crashes in the past. Not enough crash volume to get correlations.

(In reply to comment #1)
> I'm getting a lot of this crashes everyday while using a development Lion VM
> with Nightly.
In looking at crash stats it appears almost all the crashes are running 11A480b, which is the latest Dev Preview.
At least some of these crashes are at addresses that look like ASCII
strings.  That could mean there are buffer and/or heap overflows.  So
I'm marking this security-sensitive.

Here are some examples:

bp-5998d250-d6ef-4818-a259-3edde2110612
bp-eda096c0-0fd5-4bf5-9805-cfdbd2110609
bp-2b5c92a1-320d-4341-9ea3-98c102110613
Group: core-security
For what it's worth, here are a couple of the crash addresses
translated into ASCII/ANSI text fragments:

Hexadecimal     ASCII/ANSI

0x7865547d      xeT}
0x74787d        tx}
I suspect that any buffer/heap overflows here are Apple's.

Do we have an Apple security contact we can CC on this bug?
> For what it's worth, here are a couple of the crash addresses
> translated into ASCII/ANSI text fragments:
>
> Hexadecimal     ASCII/ANSI
>
> 0x7865547d      xeT}
> 0x74787d        tx}

If these addresses were read in little-endian format (quite likely
since all the machines that have these crashes are Intel boxes), they
would translate into strings as follows:

Hexadecimal     ASCII/ANSI

0x7865547d      {Tex
0x74787d        {xt

When I tried grepping on "{Tex" in the /Library directory (where the
fonts are) I found no matches.  Though I did find matches under
/System/Library, including in *.tcl, *.py and *.pm files.

But what I was really hoping for is that "{Tex" and "{xt" would be
possible components of font files.  Aren't some fonts (like Postscript
fonts) stored in "text" format?  Would either of these string
fragments be possible in a Postscript font (or some other hypothetical
text-format font)?
Might be worth running under valgrind to see if it flags any memory issues (which could be happening even if it doesn't necessarily crash every time).

Could someone who's running 10.7-preview (I'm not) file a bug with Apple?
(In reply to comment #8)
> Might be worth running under valgrind to see if it flags any memory issues

In particular, that might clarify whether the problem lies in Gecko code or in Apple's libraries. A crash deep in CoreFoundation could still be caused by a memory management error on our side somewhere earlier.
If anyone has a 10.7 build and can reproduce this bug, it would be very helpful to run with crashwrangler enabled.  This will give us better info about whether it's an exploitable crash or not.
> Could someone who's running 10.7-preview (I'm not) file a bug with
> Apple?

There really isn't (yet) enough information to put in a formal bug
report.  For now, I think the best we can do is ask someone from Apple
to follow our discussion here.  Welcome, bbrack@mac.com :-)

Running FF in valgrind sounds like a good idea.  I do have a copy of
Lion DP4, on which I hope and assume I'll be able to compile valgrind
and run it.

We don't yet have enough information to figure out how to reproduce
these crashes.  But we may get lucky -- for example one of us may
start seeing them.  If any of you do, please try to figure out how to
reproduce it, and (whether or not you can repro) please tell us as
much as you can about it here.
(Following up comment #7)

I grepped for "{Tex" in /usr/share/fonts/ on my Linux box (which I
know contains some Postscript fonts), and found no matches.  So (for
now at least) I think fonts are the wrong place to look for this
string fragment.

I also grepped /usr/lib/ and /System/Library (again), and found no
likely suspects (no libraries that appear likely to be involved with
these crashes).

But I did find one match in XUL -- in the (OS-generated) string that
specifies the fields of an nsChildView object that's a member of a
ChildView object.  Which makes it seem a bit more likely that this
might be a Mozilla bug (and not an Apple bug).
> Running FF in valgrind sounds like a good idea.  I do have a copy of
> Lion DP4, on which I hope and assume I'll be able to compile
> valgrind and run it.

Looks like valgrind won't run on OS X 10.7 :-(

When trying to compile either the current release (3.6.1) or current
trunk code (pulled via svn), I get the following error when running
'configure':

checking for the kernel version... unsupported (11.0.0)
configure: error: Valgrind works on Darwin 9.x and 10.x (Mac OS X 10.5 and 10.6)

Nick:  Is there any way around this, or am I just out of luck?
The top two levels of many of these stacks (if not all of them) are:

0   libobjc.A.dylib   libobjc.A.dylib@0x6050
1   CoreFoundation    CoreFoundation@0x8b4f

CoreFoundation@0x8b4f is somewhere inside CFRelease().  And
libobjc.A.dylib@0x6050 is in what I think is a vtable used by all
Objective-C objects (one quarter of the way between
objc_msgSend_vtable14 and objc_msgSend_vtable15).  (Gdb can be made to
break on objc_msgSend_vtable14+16, and when it does so it's always
from a call to CFRelease()).

So we appear to be dealing with an attempt to call CFRelease() on a
deleted Objective-C object.  And at least some of the time the pointer
to the (former) Objective-C object now points to a chunk of RAM, part
of which contains a Objective-C-compiler-generated string (one that
specifies the fields of a (non-Objective-C) nsChildView object from
the point of view of Objective-C code).

I'm not (yet) sure what all this means, but it's another piece of the
puzzle.
By the way, there's an interesting article at
http://cocoasamurai.blogspot.com/2010/01/understanding-objective-c-runtime.html,
part of which explains how the vtable at objc_msgSend_vtable0 through
objc_msgSend_vtable15 works.  Apparently it's a holder for pointers to
the most commonly used methods of any Objective-C object.  Its
contents can change over time (as new methods start getting called).
If my analysis from comment #14 is correct, we probably *don't* have a
buffer/heap overflow here.  But I'm going to keep this bug
security-sensitive until I (and others) have had a chance to do more
research.
The crash occurs where text runs are cleaned out so my guess is that this bug isn't specific to any one site but is rather a function of running through a set of pages.  I would suggest just setting up some form of automated page loading procedure that runs through a set of pages.  My guess is doing that with even a random sample of pages will probably trigger this crash.
Chofmann, could you find and post a list of URLs that this bug's
crashes have most commonly occurred on?  Someone could feed these to
John Daggett's iteratepages test from comment #18.

Thanks in advance!
I've seen several crashes-on-quit on OS X 10.7 (particularly on DP4),
which I almost never see on other versions of OS X.  These also come
from trying to access deleted objects.

I'm not sure why we're seeing more accessing-deleted-objects crashes
on 10.7 -- whether it's bugs we've had all along but can no longer get
away with so easily, or whether it's really problems with 10.7.
Its not blocking FF5 release at this point, but it would be good to keep tracking this down ahead of June 21.
(In reply to comment #13)
> 
> Looks like valgrind won't run on OS X 10.7 :-(
> 
> Nick:  Is there any way around this, or am I just out of luck?

I would start by modifying Valgrind's configure.in to allow 10.7 -- just search for DARWIN_10_6 and copy the checking code appropriately... I guess Mac OS 10.7 is Darwin 11.x?

Then just try running it to see what happens.  But it may well crash;  Apple tends to make reasonably large changes between major OS releases, IIRC Valgrind required some non-trivial effort to with 10.6.
Going to assume this is an apple bug rather than a security bug in our code at this point, but if you trace the problem and it's ours please clear the [sg:vector] from the whiteboard so the security team can look at it again.
Whiteboard: [sg:vector (apple)]
I've gone back and forth on this a couple of times (from thinking it's
an Apple bug to thinking it's our bug).  Now I'm pretty sure it's our
bug.  But I'm not going to say anything more until I'm *really* sure.

That said, I hope to have a patch for this tomorrow :-)

And no, I don't think this has serious security consequences --
whether it's Apple's bug or ours.  But once again I'm going to leave
things the way they are until I've pinned things down better.
I was overly optimistic when I said I might have a patch ready today.
But I've made some progress.

First, I now have a way to reproduce these crashes -- John Daggett's
iteratepages test from comment #18 (either in its original form or
with chofmann's URLs from comment #22 and comment #23 substituted).
Though there's a catch -- it also reproduces *many* other, seemingly
unrelated crashes.

Second, I've found a way to stop these crashes from happening on 10.7,
plus all the other seemingly unrelated crashes also reproducible using
interatepages, plus the crashes-on-quit I've found so easy to
reproduce on 10.7 (STR in a later comment).  But I'm not satisfied
with it, because it would presumably leak every font created in
gfxMacPlatformFontList::MakePlatformFont()):

 class MacOSUserFontData : public gfxUserFontData {
 public:
     MacOSUserFontData(ATSFontContainerRef aContainerRef)
         : mContainerRef(aContainerRef)
     { }

     virtual ~MacOSUserFontData()
     {
         // deactivate font
-        if (mContainerRef)
-            ::ATSFontDeactivate(mContainerRef, NULL, kATSOptionFlagsDefault);
+        //if (mContainerRef)
+        //    ::ATSFontDeactivate(mContainerRef, NULL, kATSOptionFlagsDefault);
     }

     ATSFontContainerRef     mContainerRef;
 };

However this may turn out to be the best we can do.  In other words,
we may be forced to choose between leakings fonts in
MacOSUserFontData::~MacOSUserFontData() (on 10.7) and putting up with
a very large number of new crashes when 10.7 comes out.  I can pretty
confidently predict that all this bug's associated crashes (including
the crashes-on-quit), taken together, will quickly become the #1 Mac
topcrasher shortly after OS X 10.7 is released -- unless we work
around the problem, or Apple fixes it.

Which brings me to the subject of whose fault these crashes are.  I've
changed my mind on this three or four times already, as new evidence
came in.  And I may change my mind again.  But now the evidence seems
pretty strong that this is Apple's bug.

This bug's crashes only happen on 10.7, and iteratepages only
reproduces crashes (of any kind) on 10.7.  And as best I can tell
we're using the ATS calls correctly.  And what minor changes I've been
able to come up with over the last couple of days have been no help
with the crashes (aside, of course, from never calling
ATSFontDeactivate()).

I'll keep playing with the ATS calls, of course.  And I'm sure John
and Jonathan will now start doing so too.  But I'm not sure what we'll
be able to come up with.
About the security implications of this bug:

It's disturbing that iteratepages reproduces such a large variety of
seemingly unrelated crashes on 10.7, and that all of them go away with
my patch from comment #27.

That call to ATSFontDeactivate() could be doing almost anything, up to
and including corrupting the heap.  But it's hard to understand how
anyone could exploit this.

What do you think, Dan?
Here's STR that (for me) crashes (many different kinds) on OS X 10.7
DP4 (build 11A480b) about 50% of the time.  No, it doesn't make any
sense ... but there's a lot about this bug that doesn't make sense :-)

1) Start FF.

2) Visit http://www.mozilla.org/ in the existing tab.

   a) Scroll the page up and down.
   b) Open a context menu (e.g. by right-clicking) and close it.

3) Open another tab and visit http://www.mozilla.com/ in it.

   a) Scroll the page up and down.
   b) Open a context menu (e.g. by right-clicking) and close it.

4) Open another tab and visit http://www.microsoft.com/ in it.

   a) Scroll the page up and down.
   b) Open a context menu (e.g. by right-clicking) and close it.

5) Open another tab and visit http://www.apple.com/ in it.

   a) Scroll the page up and down.
   b) Open a context menu (e.g. by right-clicking) and close it.

6) Cmd-quit.

   I crash about 50% of the time.
Keywords: testcase-wanted
John and Jonathan:

Do you think there could be something wrong with the data fed to
gfxMacPlatformFontList::MakePlatformFont() and
ATSFontActivateFromMemory()?  Or if its not *wrong*, could ATS somehow
be choking on it (even before the call to ATSFontDeactivate())?  And
is there any way to find out?
Steven: I repeated your steps in Comment 29 using the new Dev Preview that was pushed out yesterday, 11A494A. So far after a few tries I was not able to crash using the latest trunk nightly. Earlier though I did crash in Firefox 5 B7 when I had some of URLs loaded from Comment 22 and 23 - https://crash-stats.mozilla.com/report/index/bp-3339e0fd-2be0-4843-be0c-b89e32110616 is that report.
Marcia, I'm going to install the new DP on one of my machines tomorrow, and see what happens.
I've now finished installing the new Lion DP (build 11A494a), and have
tested on it a couple of times.  As best I can tell, it does *not* fix
this bug.

Here's a URL to the altered copy of John Daggett's iteratepages that
I've been testing with (it substitutes several of chofmann's URLs from
comment #22 and comment #23):

http://people.mozilla.com/~stmichaud/bmo/iteratepages-663688.html

You sometimes need to run this test for more than five minutes before
it crashes.  But it invariably does crash on OS X 10.7 (builds 11A480b
or 11A494a).  It never crashes on OS X 10.6.7.

Even on OS X 10.7, it stops crashing if you set
gfx.downloadable_fonts.enabled to false.  I suggest we do this by
default on 10.7 -- it's probably better than just leaking the
downloaded fonts :-)

I'll post a patch that does this in my next comment.  I suggest we
land it on the trunk as soon as possible, and see what effect it has
there.
By the way, I tried turning off font sanitizing, and even preventing
changes of any kind to the downloaded fonts -- neither helped.
I've just submitted a bug report to Apple.  Here are its contents:

ATSFontDeactivate causes crashes on OS X 10.7, build 11A480b and up

Starting (I think) with Lion build 11A480b, ATSFontDeactivate() causes
crashes every time it's used from the Mozilla tree (in Firefox 4.0 and
up).  Sometimes the crashes happen right away; sometimes they're
delayed by a few seconds.  These crashes have many different
signatures, and on the face of it appear unrelated.  But they stop
happening as soon as Mozilla stops using ATSFontDeactivate().

Judging by the disparity in crash types, I think ATSFontDeactivate()
must be doing something very seriously wrong.  The results bear all
the hallmarks of heap corruption.

Since the Mozilla tree only uses ATSFontDeactivate() for downloaded
fonts, one way to stop it being used is to enter "about:config" in the
location bar and then change the following setting from "true" (its
default) to "false":

gfx.downloadable_fonts.enabled

One way to reproduce these crashes is to run the following script,
which randomly visits several different URLs, some of which use
downloadable fonts.  It can take more than five minutes to reproduce a
crash, but it always does so.

http://people.mozilla.com/~stmichaud/bmo/iteratepages-663688.html

This issue is being followed at
https://bugzilla.mozilla.org/show_bug.cgi?id=663688.  Access to this
bug is currently restricted.  But bbrack@mac.com has access to it, and
I'd be happy to add others from Apple to the CC list.
Whiteboard: [sg:vector (apple)] → [sg:vector (apple)] rdar://9632502
I crashed pretty easily using http://people.mozilla.com/~stmichaud/bmo/iteratepages-663688.html and the latest nightly on my 11A494a test machine in the QA lab. So if this is pushed to the trunk we should be able to verify whether the crashes go away.
Assignee: nobody → smichaud
Status: NEW → ASSIGNED
Attachment #540164 - Flags: review?(jfkthame)
Attachment #540164 - Flags: review?(jdaggett)
If this isn't fixed by the 10.7 release (and I don't really expect that it will be), we'll need some kind of release note explaining why we've disabled downloadable fonts on Lion.
Keywords: relnote
(In reply to comment #36)
> I crashed pretty easily using
> http://people.mozilla.com/~stmichaud/bmo/iteratepages-663688.html and the
> latest nightly on my 11A494a test machine in the QA lab. So if this is
> pushed to the trunk we should be able to verify whether the crashes go away.

You don't need the workaround to test this, just disable downloadable fonts with the existing pref, 'gfx.downloadable_font.enabled'.  The patch adds an additional, Lion-only pref for this, so that downloadable fonts are enabled on 10.6 but not on 10.7 by default.
(In reply to comment #35)
> I've just submitted a bug report to Apple.  Here are its contents:

What's the Radar number on this?
Comment on attachment 540164 [details] [diff] [review]
Workaround, hopefully temporary

Patch seems fine but it's obviously not idea.  I know WebKit switched from using ATS to using a CGData interface in 10.6 and above:

http://trac.webkit.org/browser/trunk/Source/WebCore/platform/graphics/mac/FontCustomPlatformData.cpp#L122

Let me investigate today how simple/hard it would be to switch to using the CGData interface instead.  If it's not a big impact patch, I think it would probably better for us to use CGData, as ATS is a bit of an orphan child.  There's always going to be a problem of hitting these sorts of issues with ATS because it's not the main branch that is getting tested most.
> What's the Radar number on this?

rdar://9632502 (from the whiteboard)
(In reply to comment #39)

>> I crashed pretty easily using
>> http://people.mozilla.com/~stmichaud/bmo/iteratepages-663688.html
>> and the latest nightly on my 11A494a test machine in the QA
>> lab. So if this is pushed to the trunk we should be able to
>> verify whether the crashes go away.
>
> You don't need the workaround to test this ...

The reason both Marcia and I would like to see this patch on the
trunk quickly is to double-check that it really does fix this
bug's crash (by looking at the trunk crash stats).

I hope you do find a quick and easy way to use CGData instead of
ATS.  But whatever you find will require at least some time for
testing, and we don't have a lot of time before the 10.7 release.
As I said, I'd like to get my patch onto the trunk as soon as
possible, to test it.  I hope it doesn't need to be included in a
release, but if it has to it'll need a relnote.

It's almost certainly too late to get the patch into FF 5.  I'm aiming
for FF 6.

When 10.7 comes out and a lot of FF users start crashing on it, we can
tell them to set 'gfx.downloadable_font.enabled' to 'false'.  When FF
6 comes out we can tell 10.7 users to upgrade to that.

We've got *some* time (though not a whole lot) before the FF 6
release.  With luck that'll be enough time to switch from ATS to
CGData for downloadable fonts (and we can backout my patch from the
trunk).  If not we can land my patch on aurora (which will become FF
6).
Comment on attachment 540164 [details] [diff] [review]
Workaround, hopefully temporary

Ok, as long as we're agreed that this is just a temporary workaround until we can figure out how to do the right workaround and/or Apple fixes the underlying problem they introduced.
Attachment #540164 - Flags: review?(jdaggett) → review+
Comment on attachment 540164 [details] [diff] [review]
Workaround, hopefully temporary

Review of attachment 540164 [details] [diff] [review]:
-----------------------------------------------------------------

This sucks greatly, but I suppose we'd better do it for now. :(
Attachment #540164 - Flags: review?(jfkthame) → review+
On the latest Lion DP (Build 11A494a) these crashes appear to happen [@ libobjc.A.dylib@0xb390 ].
Crash Signature: [@ libobjc.A.dylib@0x6050 ] → [@ libobjc.A.dylib@0x6050 ] [@ libobjc.A.dylib@0xb390 ]
Summary: [10.7] Crash [@ libobjc.A.dylib@0x6050 ] on Mac OS X 10.7 Lion → [10.7] Crash [@ libobjc.A.dylib@0x6050 ] [@ libobjc.A.dylib@0xb390 ] on Mac OS X 10.7 Lion
Comment on attachment 540164 [details] [diff] [review]
Workaround, hopefully temporary

Landed on mozilla-central:
http://hg.mozilla.org/mozilla-central/rev/c4b84b05c46c

I'll hold off a few days before marking this fixed, to see what effect this patch has on our trunk crash stats.
(Following up comment #47)

> On the latest Lion DP (Build 11A494a) these crashes appear to happen
> [@ libobjc.A.dylib@0xb390 ].

A few of these crashes' crash addresses also look like ASCII strings.
Among them is our old friend "{Tex":

bp-ed0e24b3-75c7-46df-8751-0ecf32110620
For what it's worth, here's some information that may help Apple fix
this bug.

1) I've attached a stack made in gdb using libgmalloc with
   MALLOC_FILL_SPACE == 1.  Under these conditions,
   iteratepages-663688 always crashes under ATSFontDeactivate() in
   exactly the same place, the first time ATSFontDeactivate() is
   called.

   (This is what first suggested to me that this bug is about
   ATSFontDeactivate(), and prompted me to comment out the call to
   it.)

2) When	running	in gdb without libgmalloc, there are usually several
   calls to ATSFontDeactivate() before you get a crash.

   Here	are the	results	of using x/s$rdi a couple of times on what
   appears to be exactly the same call to strlen() on which we always
   crash when using libgmalloc (with MALLOC_FILL_SPACE):

   "/System/Library/Frameworks/ApplicationServices.framework/Frameworks/\001"
   "/System/Library/Frameworks/ApplicationServices.framework/Frameworks/\002"

   Looks like the string passed	to this	strlen() call hasn't been
   properly terminated.
(In reply to comment #50)
> 1) I've attached a stack made in gdb using libgmalloc with
>    MALLOC_FILL_SPACE == 1.  Under these conditions,
>    iteratepages-663688 always crashes under ATSFontDeactivate() in
>    exactly the same place, the first time ATSFontDeactivate() is
>    called.

Have you done some simple logging of activate/deactivate calls here to verify that we are indeed not double-free'ing, or ignoring errors on the activate?  It might also be worth writing a small sample program that just activates and deactivates fonts iteratively to see if we can reproduce the crash in a smaller piece of code.
> Have you done some simple logging of activate/deactivate calls here to
> verify that we are indeed not double-free'ing, or ignoring errors on
> the activate?

If we were double-freeing, libgmalloc should have caught it.

And judging from the code around our call to
ATSFontActivateFromMemory() in
gfxMacPlatformFontList::MakePlatformFont(), we don't appear to be
ignoring any errors on activate.

> It might also be worth writing a small sample program that just
> activates and deactivates fonts iteratively to see if we can
> reproduce the crash in a smaller piece of code.

Yes, it might :-)

But I think my evidence from comment #50 makes it almost undeniable
that this is Apple's bug, and only Apple's bug.
> Have you done some simple logging of activate/deactivate calls here
> to verify that we are indeed not double-free'ing

Oops, I misunderstood you.  You're asking if I've checked that we
aren't calling ATSFontDeactivate more than once on the same font.

The answer is yes I've checked, and no we aren't.  (I just
double-checked again to be sure.)
This is the beginnings of a patch to try and replace our usage of the ATS font APIs with CoreGraphics/CoreText APIs instead.

Currently, this seems to run fine during normal browsing operations, including the use of both local and @font-face fonts; however, I get a shutdown crash (in my debug build):

0   libobjc.A.dylib 0x00007fff81eb6120 objc_msgSend + 44
1   XUL             0x00000001028f14c1 _cairo_quartz_font_face_destroy + 33 (cairo-quartz-font.c:238)
2   XUL             0x000000010289d094 _moz_cairo_font_face_destroy + 149 (cairo-font-face.c:156)
3   XUL             0x00000001028ccf38 _cairo_scaled_font_fini_internal + 83 (cairo-scaled-font.c:838)
4   XUL             0x00000001028cd048 _cairo_scaled_font_fini + 21 (cairo-scaled-font.c:872)
5   XUL             0x00000001028cc1d1 _cairo_scaled_font_map_destroy + 199 (cairo-scaled-font.c:428)
6   XUL             0x000000010289c2c4 _moz_cairo_debug_reset_static_data + 9 (cairo-debug.c:66)
7   XUL             0x00000001027a3f21 gfxPlatform::~gfxPlatform() + 31 (gfxPlatform.cpp:387)
8   XUL             0x00000001027c281a gfxPlatformMac::~gfxPlatformMac() + 40 (gfxPlatformMac.cpp:115)
9   XUL             0x00000001027a3773 gfxPlatform::Shutdown() + 331 (gfxPlatform.cpp:365)
...

This appears to imply that font object lifetimes are not being handled correctly. I'll try to investigate that further.

Also, this is completely untested for non-OpenType/TrueType fonts (i.e. "classic" Type-1 or bitmap fonts).

Steven, if you're able to try this on 10.7, it would be interesting to know whether it's sufficient to resolve the non-shutdown crashes you see there.
(In reply to comment #13)
> Looks like valgrind won't run on OS X 10.7 :-(
> Is there any way around this, or am I just out of luck?

I spent Friday afternoon looking at this.  I got V part way through a
Fx startup before it asserted.  There's clearly stuff that needs to be
fixed in order to make it work on 10.7.  Switched back to doing other
stuff for the time being.  Yell if you want/need me to increase the
priority of fixing it.
(In reply to comment #54)

> This appears to imply that font object lifetimes are not being
> handled correctly. I'll try to investigate that further.

You might want to try debugging in gdb with libgmalloc and
MALLOC_FILL_SPACE.  I've recently had good luck with it :-)

> Steven, if you're able to try this on 10.7, it would be interesting
> to know whether it's sufficient to resolve the non-shutdown crashes
> you see there.

I'll try it and post my results here.
(In reply to comment #55)

Thanks for whatever work you can do.

> Yell if you want/need me to increase the priority of fixing it.

The demand for Valgrind on OS X 10.7 will go up after 10.7 is
released.  But (as best I can tell) we no longer need it for this bug.
Here's an update to the patch to eliminate use of ATS; this no longer crashes at shutdown (at least in my initial tests). Much testing still needed....
Attachment #540706 - Attachment is obsolete: true
Jonathan, I got the same crash as you on OS X 10.7 with your v1 patch
-- whether running iteratepages-663688 or on quitting.  This happened
even with downloadable fonts off.

I'll try your v2 patch on 10.7 as soon as I can finish building it.
> I'll try your v2 patch on 10.7 as soon as I can finish building it.

I don't see any crashes (on 10.7) running iteratepages-663688 or on
quitting.  This is with downloadable fonts enabled or disabled.
I pushed this to tryserver (5635f28be71b) .... looking good so far on OS X 64 (10.6), but there are a bunch of test failures on OS X (10.5).
Early 5.0 Crash data shows [@ libobjc.A.dylib@0xb390 ] at the top of the list for Mac crashes.
Crash Signature: [@ libobjc.A.dylib@0x6050 ] [@ libobjc.A.dylib@0xb390 ] → [@ libobjc.A.dylib@0x6050 ] [@ libobjc.A.dylib@0xb390 ]
Crash Signature: [@ libobjc.A.dylib@0x6050 ] [@ libobjc.A.dylib@0xb390 ] → [@ libobjc.A.dylib@0x6050 ] [@ libobjc.A.dylib@0xb390 ] [@ libobjc.A.dylib@0x9e90 ]
Here are nine crashes [@ libobjc.A.dylib@0xb390 ] in builds that
contain my workaround from comment #48.  They're all from the latest
10.7 DP -- build 11A494a.  (There haven't been any [@
libobjc.A.dylib@0x6050 ], from the previous DP, build 11A480b.)

bp-dc329330-39cb-4c8f-85de-cd6502110622
bp-932aa120-39aa-4af1-8466-4ccd22110622
bp-bd1813fe-bab4-4d0b-9598-1842c2110622
bp-ccc0bcd1-b0a7-46cf-bc5b-e0fcf2110622
bp-546b869c-d5da-4759-a5d5-013fe2110622
bp-bf269274-d627-44e7-a04c-30d3c2110621
bp-1e3aee05-fe06-45f2-9c93-2460c2110621
bp-7ffc66ee-ac93-49bd-9e90-780e42110621
bp-1e09ce2c-58f7-47fd-beda-0a5d22110621

This is bad, and I assume it means there are additional problems with
the 11A494a build, which will require additional workarounds.

I'll see what I can come up with.

(It's always possible that some users have turned
gfx.downloadable_fonts.enabled.lion from false to true, but I doubt
this could explain so many crashes.)
(Following up comment #63)

Forgive me, I take it all back.

These are all aurora/6.0 builds, which don't yet contain my patch
(which so far is only in trunk/7.0 builds).

I breathe an enormous sigh of relief!! :-)
It seems that we can't move completely off the ATS font APIs on 10.5 (see tryserver run e639a4a639cb, which _just_ replaced the ATS-based function CTFontCreateWithPlatformFont with its CGFont-based equivalent CTFontCreateWithGraphicsFont, and resulted in dozens of test failures, and reftest logs showing garbled fonts).

So this patch replaces the use of ATS with CGFont APIs on 10.6+, but retains the existing ATS-based code path for use on 10.5. To implement this, it makes MacOSFontEntry into an abstract class with two concrete implementations, ATSFontEntry and CGFontEntry, and key points in gfxMacPlatformFontList etc test which OS version is in use and choose the appropriate class.

This passes tryserver (cb8513002cc9) on both 10.5/32-bit and 10.6/64-bit; further testing invited!
Attachment #540806 - Attachment is obsolete: true
Attachment #541387 - Flags: review?(smichaud)
Comment on attachment 541387 [details] [diff] [review]
use CGFont/CTFont APIs instead of ATS on 10.6+

This looks fine to me (though it should also be reviewed by someone
more familiar than I am with our font code).

I'll test it on 10.7 this afternoon.
Attachment #541387 - Flags: review?(smichaud) → review+
Attachment #541387 - Flags: review?(jdaggett)
(In reply to comment #67)
> I'll test it on 10.7 this afternoon.

Thanks; I plan to do some more testing with non-standard fonts, too.

Also tagging John for review.
> I'll test it on 10.7 this afternoon.

I tested Jonathan's tryserver build on 10.7 (build 11A494a), using
both John's original iteratepages
(http://people.mozilla.org/~jdaggett/memtesting/iteratepages.html) and
my variant on it
(http://people.mozilla.com/~stmichaud/bmo/iteratepages-663688.html).

I saw no problems or crashes.
(Following up comment #69)

I forgot to mention that before my tests I set gfx_downloadable_fonts.enabled.lion to true.
Comment on attachment 541387 [details] [diff] [review]
use CGFont/CTFont APIs instead of ATS on 10.6+

Switching additional r? to roc, as jdaggett is apparently travelling.
Attachment #541387 - Flags: review?(jdaggett) → review?(roc)
Updated comments in gfxMacFont::InitMetricsFromPlatform(), removed TODO note after checking that the CTFontGet* APIs are returning reasonable values. No code change from previous version. (Carrying forward r=smichaud.)
Attachment #541387 - Attachment is obsolete: true
Attachment #541387 - Flags: review?(roc)
Attachment #541652 - Flags: review?(roc)
Comment on attachment 541652 [details] [diff] [review]
use CGFont/CTFont APIs instead of ATS on 10.6+ (refreshed)

Review of attachment 541652 [details] [diff] [review]:
-----------------------------------------------------------------

John should probably still review this, but if you need to land it, you can.

::: gfx/thebes/gfxMacPlatformFontList.mm
@@ +464,3 @@
>      }
>  
> +    CFDataRef tableData = ::CGFontCopyTableForTag(fontRef, aTableTag);

Copying the data just to see if there's a table seems unfortunate. Did you consider using CGFontCopyTableTags and storing them in a hashtable or sorted array with binary search?

I suppose it probably doesn't matter.

@@ +1009,5 @@
> +    sprintf(warnBuf, "downloaded font not loaded properly, removed face for (%s)",
> +            NS_ConvertUTF16toUTF8(aProxyEntry->mFamily->Name()).get());
> +    NS_WARNING(warnBuf);
> +#endif
> +    delete newFontEntry;

Make newFontEntry an nsAutoPtr and use .forget() when you return it above. Then you won't need this delete.
Attachment #541652 - Flags: review?(roc)
Attachment #541652 - Flags: review?(jdaggett)
Attachment #541652 - Flags: review+
(In reply to comment #73)
> > +    CFDataRef tableData = ::CGFontCopyTableForTag(fontRef, aTableTag);
> 
> Copying the data just to see if there's a table seems unfortunate.

"Copy" in the name just means that we get an owning reference that we need to CFRelease when we're finished with it, not that any actual copying takes place. So I doubt that this actually copies table data internally; it's returning a CFDataRef, which is a refcounted wrapper around the block of bytes. If CoreGraphics has already loaded the font in some way, then this just increments a refcount and returns an existing CFDataRef; if it hasn't loaded it, then it may have to mmap() the file and create the CFData wrapper, but that's still a cheap operation as it doesn't have to copy the actual table.

> Did you
> consider using CGFontCopyTableTags and storing them in a hashtable or sorted
> array with binary search?

I did, but I don't think it's worth doing, assuming CGFontCopyTableForTag works according to the expected CoreFoundation pattern. In either case, the font file has to be opened or mapped in order to look at the table directory; once that cost is paid, getting references to CFData table wrappers is cheap.
Using nsAutoPtr<CGFontEntry>, as per roc's comment. Carrying forward r=smichaud,roc, and intending to land this to begin getting nightly exposure. Flagging jdaggett for additional (post-landing) review.
Attachment #541652 - Attachment is obsolete: true
Attachment #541652 - Flags: review?(jdaggett)
Attachment #541684 - Flags: review?(jdaggett)
Jonathan, I assume it's alright with you if we leave my patch in for the time being.  It only effects behavior on 10.7.  And, though the crash-stats results look good so far, I'd prefer to have at least a week of crash-stats to look at.
(In reply to comment #76)
> Jonathan, I assume it's alright with you if we leave my patch in for the
> time being.  It only effects behavior on 10.7.  And, though the crash-stats
> results look good so far, I'd prefer to have at least a week of crash-stats
> to look at.

Yes, that's fine - what I'd like to do is land the CGFont patch to get nightly exposure for the new code on 10.6, and then after a few more days we can flip the lion-specific pref to reenable downloadable fonts by default (or back out your patch).
Pushed the ATS-to-CGFont patch to m-i:

http://hg.mozilla.org/integration/mozilla-inbound/rev/f1d138ffa159

After this goes out in nightly builds, and bakes for a few days, we can consider reverting the Lion-downloadable-font pref.
Whiteboard: [sg:vector (apple)] rdar://9632502 → [sg:vector (apple)] rdar://9632502 [
Whiteboard: [sg:vector (apple)] rdar://9632502 [ → [inbound] [sg:vector (apple)] rdar://9632502
Attachment #541684 - Flags: review?(jdaggett) → review+
We believe this should fix the crash, but we're not really done with this bug until we back out Steven's workaround. Or I suppose we could close this and file a followup about enabling downloadable fonts on Lion, if you prefer that.
Jonathan, it occurs to me your patch is big enough (and potentially
dangerous enough) that it might be desirable to be able to turn it off
with a pref.  The same pref could enable or disable my workaround
patch.

In other words, if your patch was turned on, my patch's
gfx.downloadable_fonts.enabled.lion would have no effect.  But if your
patch was turned off, that setting would have effect.
> In other words, if your patch was turned on, my patch's
> gfx.downloadable_fonts.enabled.lion would have no effect.  But if
> your patch was turned off, that setting would have effect.

Or maybe if your patch was turned on,
gfx.downloadable_fonts.enabled.lion could default to true, and if your
patch was turned off that setting would default to false.

That'd avoid having a setting that might (under some circumstances)
get ignored.
(In reply to comment #81)
> Jonathan, it occurs to me your patch is big enough (and potentially
> dangerous enough) that it might be desirable to be able to turn it off
> with a pref.  The same pref could enable or disable my workaround
> patch.

I'm not sure I understand the "dangerous" part of the patch.  It actually is fairly close to existing code, it simply reworks the API's we're using.  I don't see the high risk you seem to be inferring other than that of the underlying API's.  Since these API's have all been around since 10.5 I don't see any reason to infer risk from their use.  ATS has always had some flakey behavior, if anything we avoid some of that flakiness with this change.

I think the only "risk" here is simply the generic risk of new code, the changes by themselves are not dramatically high risk.
> I don't see the high risk you seem to be inferring other than that
> of the underlying API's.

This is the risk I'm talking about -- that of the underlying APIs.
I'm not saying that risk is "high".  More that it's hard to quantify
(since we presumably don't have much experience using these new	APIs).

> Since these API's have all been around since 10.5 I don't see any
> reason to infer risk from their use.

But (as far as I know) they're new *to us*, so they may have some
quirks we're not yet aware of.

> ATS has always had some flakey behavior, if anything we avoid some
> of that flakiness with this change.

I think this is likely to be true (especially since Apple has been
trying to get people off these APIs).  But that doesn't effect my
argument.
A pref would complicate things unnecessarily, in my opinion. The CGFont/CTFont APIs are the current, supported path; the ATS APIs are obsolete and deprecated, and no longer thoroughly tested and maintained by Apple. It's analogous to the move from ATSUI to Core Text. We would drop the old code path unconditionally, if it wasn't necessary to maintain it for now in order to support older OS versions (with ATSUI, we had to keep the code around as long as we supported 10.4; in this case, it's 10.5).

(In reply to comment #84)
> > Since these API's have all been around since 10.5

True, although in 10.5 they were apparently rather flaky (see comment #65). But by 10.6 they seem to be solid.

> But (as far as I know) they're new *to us*, so they may have some
> quirks we're not yet aware of.

Sure. That's what Nightly (and then Aurora, and then Beta) testing is for, and we'll file and fix bugs as necessary to deal with them. One reason the patch prefers the new code path on 10.6 (where both versions work correctly) is that this gets us much more testing coverage for the new code, rather than keeping most of it on the old code that we want to drop as soon as we cut 10.5 support.
Whiteboard: [inbound] [sg:vector (apple)] rdar://9632502 → [sg:vector (apple)] rdar://9632502
You guys have a lot more experience with the font APIs (both old and
new) than I do.

But (in these days of faster releases) it's not unusual for a moderate
to large patch to have a pref whereby it can be turned on or off.  If
that happens, please make sure to retain my workaround and have that
pref also turn it off or on.

>> But (as far as I know) they're new *to us*, so they may have some
>> quirks we're not yet aware of.
>
> Sure. That's what Nightly (and then Aurora, and then Beta) testing
> is for, and we'll file and fix bugs as necessary to deal with them.

But we need to have *some* fix for this bug in FF 6 -- whether
Jonathan's patch or my workaround.  If we keep to the regular
schedule, Jonathan's patch won't get into a release until FF 7.
(In reply to comment #86)
> But (in these days of faster releases) it's not unusual for a moderate
> to large patch to have a pref whereby it can be turned on or off.

In the new world, every feature needs an easy off-switch or backout, and this probably qualifies.

> But we need to have *some* fix for this bug in FF 6 -- whether
> Jonathan's patch or my workaround.  If we keep to the regular
> schedule, Jonathan's patch won't get into a release until FF 7.

And that's the real problem here. It's listed as a security patch but ends up being more or less a new feature, which would not be allowed in Aurora, even less so late in the cycle. If Lion ships before FF7 does, we end up in a situation we should not have and that's a hard nut to crack. Please bring up this issue really soon with release drivers.
(In reply to comment #87)
> (In reply to comment #86)
> > But (in these days of faster releases) it's not unusual for a moderate
> > to large patch to have a pref whereby it can be turned on or off.
> 
> In the new world, every feature needs an easy off-switch or backout, and
> this probably qualifies.

The code already has an easy off-switch:

>+    static PRBool UseATSFontEntry() {
>+        return gfxPlatformMac::GetPlatform()->OSXVersion() < MAC_OS_X_VERSION_10_6_HEX;
>+    }
>+

I don't think we have any evidence that suggests we need a run time pref for this.
(In reply to comment #87)
> And that's the real problem here. It's listed as a security patch but ends
> up being more or less a new feature

I don't see how this is a "new feature" in any sense. Replacing the use of a deprecated OS X API that appears to be buggy in the upcoming OS release with equivalent code based on newer, supported APIs isn't a new feature, it is the natural evolution of our code to follow development of the OS.
Comment on attachment 541684 [details] [diff] [review]
use CGFont/CTFont APIs instead of ATS on 10.6+ (updated per review comment)

Requesting approval for Aurora. Without this patch, the only way to avoid frequent crashes on OS X 10.7 is to disable downloadable fonts, because the (deprecated) APIs we're using appear to be buggy. This moves our font code to newer, supported APIs that don't exhibit the same crashiness.

Depending how soon Apple is likely to release 10.7 (and how much we care about crashiness on pre-release OS versions), we might want to take this on Beta as well.
Attachment #541684 - Flags: approval-mozilla-beta?
Attachment #541684 - Flags: approval-mozilla-aurora?
Just to remind people:

The bottom line is that either Jonathan's patch or my workaround needs
to be included in the FF 6 release.

When OS X 10.7 comes out, the number of 10.7 users will increase
dramatically, and so will the crash rate on OS X (it could easily
double).  (Unless Apple fixes the underlying problem before the 10.7
release.  But I think that's very unlikely so close to the release
date -- which Apple has said will be "in July".)

It's quite likely FF 6 won't yet have been released when OS X 10.7
comes out.  So until FF 6 comes out we'll have to tell people to go
into about:config and set gfx.downloadable_fonts.enabled to false.
*Not* something that naive users will be able to handle easily.  And
that's only if they don't just conclude that "Firefox doesn't work on
OS X 10.7" and stop using it.

We *don't* want this situation to continue for any length of time.  So
we *must* have something better in place by the time FF 6 comes out.
(In reply to comment #89)
> I don't see how this is a "new feature" in any sense. Replacing the use of a
> deprecated OS X API that appears to be buggy in the upcoming OS release with
> equivalent code based on newer, supported APIs isn't a new feature, it is
> the natural evolution of our code to follow development of the OS.
That's only true if these newer supported APIs works as well in OS X 10.7 as in OS X 10.5 and 10.6 (no regression).

(In reply to comment #91)
> We *don't* want this situation to continue for any length of time.  So
> we *must* have something better in place by the time FF 6 comes out.
I think the release of a version 5.0.1 (maybe restricted to Mac if there is no other security patches) at the same time Mac OS X 10.7 is released is required.
The patch, only valid for this 5.0.1 release (not in Aurora or Beta), should contains something that doesn't take into account the value of gfx.downloadable_fonts.enabled for Mac OS X 10.7 but considers it as false. A dedicated bug is required.
I've heard rumors about the possibility of a 5.0.1 or 5.1 release to
deal with this bug.  It might be a very good idea -- if the FF 6
release date is sufficiently far away when OS X 10.7 is released (and
if Apple hasn't yet fixed the underlying bug).

But we can't decide whether a 5.0.1/5.1 release is really needed until
after OS X 10.7 is released (or at least until we know the release
date).  So let's hold off (for now) deciding what kind of patch it
should contain.
Comment on attachment 541684 [details] [diff] [review]
use CGFont/CTFont APIs instead of ATS on 10.6+ (updated per review comment)

We've debated this for an hour and we think the three options are all pretty awful and horrible: 1) take this scary change, 2)do nothing and crash a lot when 10.7 ships, or 3) disable downloadable fonts. 

We really really do not want to disable such a popular new web feature for the OS that will probably have the largest group of web designers in our user base. That's just a really big hit to our web platform credibility. 

We're also very afraid of taking such a large change so late in the game, sacrificing the couple of months of additional testing we'd get if it came up through the normal process. (admittedly something we can't wait for given the impending release of the new Mac OS.) 

And at the same time, we can't let our users just crash this bad.

We also have Firefox 5 to worry about but the window between the new Mac OS release and the update to Fireofox 6 might be just small enough that we'd consider the stopgap solution there if it's necessary. 

By the end of our discussion, we no longer had our full set of triage experts and so we concluded that we want to take the half measure of landing this now on Aurora with the idea that we may very well overrule ourselves in a week and ask for a backout. But if a week passes and we don't have this on the branch, it'll be nearly impossible to say yes.

So, please land this on Aurora ASAP and we'll do what we can to make a call to action to get our Mac community rallied around testing the Beta. Again, we intend to revisit this and we very may well reverse our decision.
Attachment #541684 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
The other question to consider is whether we have the option of changing the "use the new code on 10.6 or higher" check to a "use the new code on 10.7 or higher" if we feel we need to.  Is this an option?  (If we want to do that, we're still probably better off getting wider testing on 10.6 right now.)
I went to check this patch in, but it doesn't apply cleanly to m-a tip.
(In reply to comment #96)
> I went to check this patch in, but it doesn't apply cleanly to m-a tip.

Ah yes - there was a one-line fix in bug 467669 part 4.1 that happened right in the midst of one of the chunks here. I've done the merge and am rebuilding locally to double-check things, then I'll push it.
Now that the ATS-to-CGFont patch has landed (and stuck), we should remove the 10.7-specific workaround from trunk.
Attachment #542753 - Flags: review?(jdaggett)
I renamed the bug summary to reflect the patch content.
If you want to use another patch to prevent these crashes (disable downloadable fonts for instance - http://hg.mozilla.org/mozilla-central/rev/c4b84b05c46c), please file a new bug.
Summary: [10.7] Crash [@ libobjc.A.dylib@0x6050 ] [@ libobjc.A.dylib@0xb390 ] on Mac OS X 10.7 Lion → [10.7] Use CGFont/CTFont APIs instead of ATS APIs on 10.6 or later to prevent crashes on Mac OS X 10.7 Lion
(In reply to comment #95)
> The other question to consider is whether we have the option of changing the
> "use the new code on 10.6 or higher" check to a "use the new code on 10.7 or
> higher" if we feel we need to.  Is this an option?

Yes, all that would require is the obvious change in

>+    static PRBool UseATSFontEntry() {
>+        return gfxPlatformMac::GetPlatform()->OSXVersion() < MAC_OS_X_VERSION_10_6_HEX;
>+    }

>  (If we want to do that,
> we're still probably better off getting wider testing on 10.6 right now.)

Indeed. We should only consider that if we see unexpected issues with the new code path on 10.6; otherwise, we're better off using the modern APIs there as well.
(In reply to comment #94)
> We also have Firefox 5 to worry about

Yes, this is a big concern. This is currently the #13 topcrash for Firefox 5.0 on OS X .... and that's just from users with preview versions of Lion. When 10.7 is actually released, we can expect it to leap dramatically up the crash-stats. :(

> but the window between the new Mac OS
> release and the update to Fireofox 6 might be just small enough that we'd
> consider the stopgap solution there if it's necessary. 

It sure would help if we knew _when_ in July they'll be releasing 10.7, but I doubt we'll get much notice, so anything we do will have to be decided and executed on short notice. Supposing the release happens around mid-July, that would leave a gap of a month before FF6, which I think is much too long to leave this problem unresolved, or to work around it by switching off downloadable fonts for those users. 

> We really really do not want to disable such a popular new web feature for the OS that
> will probably have the largest group of web designers in our user base. That's just a
> really big hit to our web platform credibility. 

Indeed; I don't think we can afford to alienate those users by switching off such a high-profile feature, and just hoping they'll come back and give us another chance after we ship 6.0.

Provided we have even a week or two of "bake time" on aurora without seeing major issues with this patch, I think we should very seriously consider pushing it to Mac users on 5.0 as a "chemspill-like" security/stability fix that can't afford to wait for the next scheduled release.
Attachment #542753 - Flags: review?(jdaggett) → review+
Removed the workaround from trunk:
http://hg.mozilla.org/mozilla-central/rev/2b591feb6a32

Marking this as FIXED, as we believe this is now resolved on trunk.

We still need to watch for any issues on Aurora, with the patch having just landed there, and consider what to do about Firefox 5.
Assignee: smichaud → jfkthame
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
A few days ago Apple made available a "gold master" DP of OS X 10.7 (build 11A511).  As expected, it doesn't fix this bug -- it's just as easy as previously to crash using iteratepages-663688.  Here are a few of my crash stacks:

bp-8a7562b6-f6f2-4190-8614-ba92a2110704
bp-a30a7d33-4ff7-45b7-9991-100172110704
bp-4f6311dc-d61f-4030-b016-558032110704
bp-490d9ef5-17d4-4d05-bc23-f7c192110704
bp-f1ea544a-d99d-48c5-8799-8aafb2110704

As before, there are many different kinds.  But this bug's particular crashes now seem to be happening at libobjc.A.dylib@0xb350.

I tested with FF 5.0.
Crash Signature: [@ libobjc.A.dylib@0x6050 ] [@ libobjc.A.dylib@0xb390 ] [@ libobjc.A.dylib@0x9e90 ] → [@ libobjc.A.dylib@0x6050 ] [@ libobjc.A.dylib@0xb390 ] [@ libobjc.A.dylib@0x9e90 ] [@ libobjc.A.dylib@0xb350 ]
Also as expected, iteratepages-663688 does *not* crash today's mozilla-central nightly on the 10.7 gold master (build 11A511).
Patch for the 5.0 release, using the new CGFont-based code path on 10.7 only, leaving 10.6 and earlier using the ATS APIs. This is the same patch as we currently have on Aurora (6.0a2), except for changing the OS X version in gfxMacPlatformFontList::UseATSFontEntry().

(Carrying forward r+ from the trunk/aurora patches, as this is the same code.)

If drivers are not comfortable taking the patch from Aurora in 5.0.x prior to the Lion release, this version provides the same fix for 10.7 but arguably reduces the risk involved for 10.6 users by not moving them to the newer APIs.
Attachment #543813 - Flags: review+
Summary: [10.7] Use CGFont/CTFont APIs instead of ATS APIs on 10.6 or later to prevent crashes on Mac OS X 10.7 Lion → [10.7] Use CGFont/CTFont APIs instead of ATS APIs on 10.7 or later (Fx 5 only) or on 10.6 or later (Fx 6 and above) to prevent crashes on Mac OS X 10.7 Lion
I just ran the testpage in this bug's URL with 3.6.18 on a 10.7 OS X gold master build and it crashed within two minutes.

bp-e15f0d62-5c6f-434a-95ee-7695a2110706
I ran the test on OS X 10.6.8 with Firefox 3.6.18 and I have not had a crash after about ten minutes so this looks to be 10.7 only.

Chemspill!
blocking1.9.2: --- → ?
Summary: [10.7] Use CGFont/CTFont APIs instead of ATS APIs on 10.7 or later (Fx 5 only) or on 10.6 or later (Fx 6 and above) to prevent crashes on Mac OS X 10.7 Lion → [10.7] Use CGFont/CTFont APIs instead of ATS APIs on 10.7 or later (Fx 3.6 or Fx 5) or on 10.6 or later (Fx 6 and above) to prevent crashes on Mac OS X 10.7 Lion
For what it's worth, Chrome uses ATS on 10.6 and also presumably on 10.7 (WebKit only has a build time choice between CGFont and ATS). Does the test case fail there?
(In reply to comment #110)

Chrome doesn't crash with iteratepages-663688 (this bug's testcase).  I'll grab Chromium's latest source and try to figure out why.  It's possible Google has already worked around this bug.

Does anyone know how closely the Chromium source code reflects what's actually in Google Chrome?
(In reply to comment #111)
> (In reply to comment #110)
> 
> Chrome doesn't crash with iteratepages-663688 (this bug's testcase).  I'll
> grab Chromium's latest source and try to figure out why.  It's possible
> Google has already worked around this bug.
> 
> Does anyone know how closely the Chromium source code reflects what's
> actually in Google Chrome?

It's usually pretty similar. You can also just build Chromium and test if it fails.
I haven't yet downloaded the latest Chromium code or built it, but I've already found out why the current version of Google Chrome (12.0.742.112) doesn't crash on OS X 10.7 -- it never calls ATSFontDeactivate().

I can think of a number of possible explanations for this, which I'll list in the order of decreasing likelihood:

1) Google Chrome no longer uses ATS on OS X 10.7.
2) Google Chrome has disabled downloadable fonts on OS X 10.7.
3) Google Chrome is using ATS but just leaking the fonts.

I tested on what's supposed to be the GM (build 11A511).  I tested by using gdb to attach to all three of Chrome's running processes ("Google Chrome" and two instances of "Google Chrome Helper") and breaking on ATSFontDeactivate.
I've now tested with Google Chrome on OS X 10.6.8 and 10.5.8 (in addition to 10.7), and found that it still uses ATS on all three platforms, but also leaks ATS fonts on all three platforms :-)

On none of these platforms does Chrome ever call ATSFontDeactivate() (testing with this bug's testcase -- iteratepages-663688).  But on all of them it does call ATSFontActivateFromMemory().

If at some point we have trouble with the "new" (new to us) CTFont APIs, we might want to consider doing the same thing ... though of course only as a last resort.  And someone (not me) would need to concoct a test to find out what the consequences would be.
> But on all of them it does call ATSFontActivateFromMemory().

It does this from the instance of "Google Chrome Helper" for rendering.

And by the way, Chrome doesn't even call ATSFontDeactivate() on quitting.
Attachment #544672 - Flags: review?(jfkthame)
Comment on attachment 544674 [details] [diff] [review]
attachment 540164 [details] [diff] [review] merged to 1.9.2 branch

oops, misunderstood which of us was uploading the patch. Use Christian's
Attachment #544674 - Attachment is obsolete: true
Comment on attachment 544672 [details] [diff] [review]
Temporary workaround for 1.9.2

adding r?jdagget to increase odds of a faster OK
Attachment #544672 - Flags: review?(jdaggett)
Attachment #544672 - Flags: approval1.9.2.19?
Attachment #544672 - Flags: approval1.9.2.19? → approval1.9.2.19+
1.9.2 fix was verified from a trybuild by Dan Veditz on my OS X 10.7 machine. The crashing testcase here did not crash after 10 minutes and a webfonts test page no longer downloaded fonts. Christian checked on his 10.6 box as well and fonts downloaded there (right, Christian?).
(In reply to comment #122)
> 1.9.2 fix was verified from a trybuild by Dan Veditz on my OS X 10.7
> machine. The crashing testcase here did not crash after 10 minutes and a
> webfonts test page no longer downloaded fonts. Christian checked on his 10.6
> box as well and fonts downloaded there (right, Christian?).

Yes
Attachment #544672 - Flags: review?(jdaggett) → review+
Attachment #544672 - Flags: review?(jfkthame) → review+
(Following up comment #113, comment #114 and comment #115)

I've now built and tested Chromium (using yesterday's source tarball), and found that it calls both ATSFontActivateFromMemory() and ATSFontDeactivate().  But it doesn't crash on the OS X 10.7 GM (build 11A511).

However both of Chromium's executables (Chromium and Chromium Helper) are 32-bit apps, and so far I've only been testing FF in 64-bit mode.
Verified fixed in 1.9.2.19 with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.7; en-US; rv:1.9.2.19) Gecko/20110707 Firefox/3.6.19.
Keywords: verified1.9.2
I'm running 5.0.1 build 1 on 10.7 (Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:5.0.1) Gecko/20100101 Firefox/5.0.1) and the testcase here isn't crashing anymore.

On https://www.google.com/webfonts, I'm seeing downloadable fonts properly as well.
Attachment #541684 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Summary: [10.7] Use CGFont/CTFont APIs instead of ATS APIs on 10.7 or later (Fx 3.6 or Fx 5) or on 10.6 or later (Fx 6 and above) to prevent crashes on Mac OS X 10.7 Lion → [10.7] Use CGFont/CTFont APIs instead of ATS APIs on 10.7 or later (Fx 5) or on 10.6 or later (Fx 6 and above) or disable downloadable fonts on 10.7 or later (Fx 3.6) to prevent crashes on Mac OS X 10.7 Lion
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: