Last Comment Bug 532533 - [10.6] Crash in [@ fnt_SH_Common(fnt_LocalGraphicStateType*, long*, long*, int*) ] when loading site
: [10.6] Crash in [@ fnt_SH_Common(fnt_LocalGraphicStateType*, long*, long*, in...
Status: RESOLVED FIXED
[crashkill][3.6.x] rdar://7447018
: crash
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: 1.9.2 Branch
: x86 Mac OS X
: -- critical (vote)
: ---
Assigned To: Jonathan Kew (:jfkthame)
:
Mentors:
http://www.gesteves.com/experiments/s...
Depends on:
Blocks: 429605 573449
  Show dependency treegraph
 
Reported: 2009-12-02 15:25 PST by Marcia Knous [:marcia - use ni]
Modified: 2015-10-16 11:39 PDT (History)
15 users (show)
dsicore: blocking1.9.2-
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
crash-causing font from the http://www.gesteves.com/experiments/starwars.html site (139.05 KB, application/octet-stream)
2009-12-04 16:00 PST, Jonathan Kew (:jfkthame)
no flags Details
simple testcase for crash using frabk.ttf font (240 bytes, text/html)
2009-12-04 16:17 PST, Jonathan Kew (:jfkthame)
no flags Details
avoid using flaky ATS metrics function if possible (27.10 KB, patch)
2009-12-04 16:49 PST, Jonathan Kew (:jfkthame)
no flags Details | Diff | Splinter Review
command-line program to demonstrate the ATS crash with Franklin Gothic font (85.52 KB, application/zip)
2009-12-05 08:18 PST, Jonathan Kew (:jfkthame)
no flags Details
patch v2 - clean-up and refactor code, check table sizes (31.77 KB, patch)
2009-12-11 15:41 PST, Jonathan Kew (:jfkthame)
no flags Details | Diff | Splinter Review
differences in maxAdvance value (8.40 KB, text/plain)
2009-12-15 00:07 PST, John Daggett (:jtd)
no flags Details
differences between ATS and table-based metrics calculations (97.68 KB, text/plain)
2009-12-16 00:36 PST, John Daggett (:jtd)
no flags Details
testcase page showing metrics related problems (1.28 KB, text/html)
2009-12-16 00:39 PST, John Daggett (:jtd)
no flags Details
screenshot, side-by-side comparison of testcase with and without patch (82.82 KB, image/png)
2009-12-16 00:44 PST, John Daggett (:jtd)
no flags Details
screenshot of testcase on 10.6 without (L) and with (R) patch (54.35 KB, image/png)
2009-12-16 06:01 PST, Jonathan Kew (:jfkthame)
no flags Details
Assembly code for fnt_SH_Common() (9.90 KB, text/plain)
2009-12-16 16:24 PST, Steven Michaud [:smichaud] (Retired)
no flags Details
patch v3 - read more metrics from font, improve xheight heuristics (36.61 KB, patch)
2009-12-17 05:05 PST, Jonathan Kew (:jfkthame)
no flags Details | Diff | Splinter Review
screenshot of the test file with patch v3 (37.74 KB, image/png)
2009-12-17 05:06 PST, Jonathan Kew (:jfkthame)
no flags Details
Patch that hooks Apple C/C++ functions (13.96 KB, patch)
2009-12-22 14:18 PST, Steven Michaud [:smichaud] (Retired)
no flags Details | Diff | Splinter Review
patch v4 - read metrics from font tables where possible - refreshed for current trunk (37.87 KB, patch)
2010-03-17 10:49 PDT, Jonathan Kew (:jfkthame)
no flags Details | Diff | Splinter Review
patch v4.1 - read metrics from font tables where possible - refreshed for current trunk (37.36 KB, patch)
2010-03-17 13:29 PDT, Jonathan Kew (:jfkthame)
no flags Details | Diff | Splinter Review
patch v4.2 - read metrics from font tables to avoid ATSGetFontHorizontalMetrics - refreshed again for current trunk (37.35 KB, patch)
2010-04-07 04:44 PDT, Jonathan Kew (:jfkthame)
no flags Details | Diff | Splinter Review
patch, v4.3 - read metrics from font tables to avoid ATSGetFontHorizontalMetrics - refreshed again for current trunk (37.24 KB, patch)
2010-05-24 14:03 PDT, Jonathan Kew (:jfkthame)
jd.bugzilla: review-
Details | Diff | Splinter Review
font metric differences, existing code vs. patch (1.29 MB, text/plain)
2010-05-31 00:03 PDT, John Daggett (:jtd)
no flags Details
screenshots showing differences in subscript positioning (178.89 KB, image/png)
2010-05-31 10:17 PDT, Jonathan Kew (:jfkthame)
no flags Details
patch, v5 - avoid calling ATSFontGetHorizontalMetrics for tt/ot fonts - revised for new trunk code (after harfbuzz landing) (35.87 KB, patch)
2010-06-23 04:57 PDT, Jonathan Kew (:jfkthame)
no flags Details | Diff | Splinter Review
testcase, use hacked fonts to get to ATS path and crash (508.90 KB, application/zip)
2010-06-23 19:06 PDT, John Daggett (:jtd)
no flags Details
patch, v5.1 - never fall back to ATS for sfnt fonts, only keep this for old bitmap/type1 fonts (37.42 KB, patch)
2010-06-24 03:02 PDT, Jonathan Kew (:jfkthame)
no flags Details | Diff | Splinter Review
patch, v5.2 - never fall back to ATS for sfnt fonts, only keep this for old bitmap/type1 fonts (un-bitrotted again) (37.27 KB, patch)
2010-06-28 16:23 PDT, Jonathan Kew (:jfkthame)
jd.bugzilla: review+
Details | Diff | Splinter Review

Description Marcia Knous [:marcia - use ni] 2009-12-02 15:25:39 PST
Seen while running Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2b5pre) Gecko/20091202 Namoroka/3.6b5pre

STR:
1. Load the site in the URL.
2. Crash 100%

This is #20 crash for Beta 4 on Mac and seems to only occur using 10.6. http://crash-stats.mozilla.com/report/index/9b625b91-9329-41c5-ae28-61dd02091202 is my report ID.
Comment 1 timeless 2009-12-02 16:22:03 PST
Signature	fnt_SH_Common(fnt_LocalGraphicStateType*, long*, long*, int*)
UUID	9b625b91-9329-41c5-ae28-61dd02091202
Time 	2009-12-02 15:17:47.582680
Uptime	4
Last Crash	19 seconds before submission
Product	Firefox
Version	3.6b5pre
Build ID	20091202033803
Branch	1.9.2
OS	Mac OS X
OS Version	10.6.2 10C540
CPU	x86
CPU Info	GenuineIntel family 6 model 7 stepping 6
Crash Reason	EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash Address	0x4

fnt_SH_Common	
fnt_SHP	
fnt_InnerExecute	
fnt_CALL	
fnt_InnerExecute	
fnt_Execute	
RunPreProgram	
CreateGlyphElement	
CreateScalerGlyphBlock	
AssureGlyphBlock	
TTGetStrikeSpecs	
HandleOFAScalerMessage	
SendStrikeMessage	
_eOFAGetStrikeSpecs	
_eGetGlyphVectorIndex	
OldGlyphsCacheStrike	
_eGCGetStrikeMetrics	
_eATSFontGetHorizontalMetrics	
ATSFontGetHorizontalMetrics	
gfxAtsuiFont::InitMetrics	gfx/thebes/src/gfxAtsuiFonts.cpp:277 

libTrueTypeScaler.dylib  	 	6C8916A28F8598E0AAD50020C39C0FC90
Comment 2 chris hofmann 2009-12-02 20:07:36 PST
not a regression, but since we have a reproducible test case we should knock it out.

checking --- 20091201-crashdata.csv fnt_SH_Common
release total-crashes
              fnt_SH_Common crashes
                         pct.
all     230708  128     0.000554814
3.0.15  48996           0
3.5.5   120703  78      0.000646214
3.6b4   17457   26      0.00148937
3.6b3   2149    3       0.001396
3.6b2   1187            0
3.6b1   2899    2       0.000689893

all releases
   1 3.0.4
   1 3.5.1
   1 3.5.3
   2 3.5.4
  78 3.5.5
   2 3.5.6pre
   2 3.6b1
   3 3.6b3
  26 3.6b4
   5 3.6b5pre
   7 3.7a1pre

os distribution
117     0.914062        Mac OS X10.6.2 10C540
8       0.0625          Mac OS X10.6.1 10B504
3       0.0234375       Mac OS X10.6.0 10A432

other reported crash urls are
   1 http://www.colourlovers.com/color/F26A6A/Keep_Running
   1 http://powertwitter.me/browser/safari/powertwitter.user.js
   1 http://postabon.com/pc

adn 
   5 http://news.ycombinator.com
   2 http://www.google.com
   2 http://ajaxian.com
   1 https://mail.google.com
   1 http://www.mozilla.org
   1 http://www.ideaxidea.com
   1 http://www.google.co.jp
   1 http://www.facebook.com
   1 http://www.colourlovers.com
   1 http://www.baconbabble.com
   1 http://powertwitter.me
   1 http://postabon.com
   1 http://popurls.com
   1 http://my.yahoo.com
   1 http://insider.espn.go.com
   1 http://email02.secureserver.net
   1 http://colloquy.it.rit.edu
Comment 3 Damon Sicore (:damons) 2009-12-03 10:56:18 PST
Looking for an owner for this for now.  I don't think this should block, so minusing.  Still, looks like this might be font-face related and we may have a patch here shortly.
Comment 4 Jonathan Kew (:jfkthame) 2009-12-03 11:24:24 PST
The crash occurs deep inside Apple font code when we try to use one (or more) of the downloadable fonts linked on that site.

Currently trying to determine exactly what is broken in these fonts, so that we can either patch them up on the fly if there's a simple fix, or (more likely) block them during our validation.
Comment 5 Jonathan Kew (:jfkthame) 2009-12-03 12:55:26 PST
Not looking good here. I tried simplifying one of the fonts used, to narrow down the problem area. Rebuilding the font with TTX (which sometimes helps to fix inconsistencies) made no difference. Removing kerning and device metrics also made no difference.

What did finally resolve the crash was to strip out all the hinting instructions. But there's no way we can realistically do full validation of TrueType instructions before using a font.

The other possible way forward may be to avoid the particular ATS function where the crash is happening, given that other applications (that use Cocoa Text) seem to be able to use the fonts successfully. But changing how we access basic font metrics will need to be handled with considerable care.
Comment 6 Damon Sicore (:damons) 2009-12-03 17:06:25 PST
Jonathan, in IRC you mentioned something about just not loading that font.  Is that something we can do?  Other things to mitigate the crash?
Comment 7 Jonathan Kew (:jfkthame) 2009-12-03 22:49:10 PST
I was hoping there'd be some clear flaw in the font that our pre-load validation could recognize; in that case we'd refuse to load it. But so far I don't see anything.

For trunk, using the Core Text back-end, I can avoid the crash by replacing the use of ATS function that crashes with alternate code using newer APIs. But for 3.6 (or earlier) where we use the ATSUI back-end, that's going to be harder; I'll be experimenting today to see if it's possible at all.

As a last resort, I guess we could black-list those particular fonts (by name and version, combined with the OS version we're running on). That sounds like a nightmare to maintain, though; given the variety of crash URLs mentioned in comment #2, I suspect there are a lot of "innocent" fonts out there that are hitting this platform bug.
Comment 8 timeless 2009-12-04 01:17:06 PST
fwiw, we're currently actively experimenting in the nightmare that is maintaining blacklists (extensions, plugins, dlls), so now's a good time to join the party :-)

but seriously, i'm sorry you have to worry about this, but i can't say i'm surprised.

has someone filed a radar: bug?
Comment 9 Steven Michaud [:smichaud] (Retired) 2009-12-04 08:01:29 PST
This does look like an OS bug.

Next week I'll look into the possibility of doing a bodacious hack to
work around it, along the lines of my debugging patch for bug 490929.

That is unless somebody comes up with a better solution first :-)

By the way, this isn't the only example of Apple degrading an "old"
technology (ATSUI) in its latest OS update (presumably
unintentionally).  Bug 513048 (bug 530633) is another example (there
the old technology is Carbon-based Cocoa menus).
Comment 10 Mike Beltzner [:beltzner, not reading bugmail] 2009-12-04 08:26:27 PST
re: comment 2, none of those other URLs crash for me, so I think they're misreports. Of the URLs linked in the comments, only the HTML5+CSS3 experiment at http://www.gesteves.com/experiments/starwars.html to crash.

Based on that, I don't know if this blocks. Do we know which font on that page is causing it?
Comment 11 Steven Michaud [:smichaud] (Retired) 2009-12-04 08:40:25 PST
I crash reliably with this bug's URL
(http://www.gesteves.com/experiments/starwars.html), even before
giving permission for the site to "store data on my computer for
offline use".
Comment 12 Steven Michaud [:smichaud] (Retired) 2009-12-04 08:40:59 PST
(Forgot to add)

Nothing interesting in the Console.
Comment 13 Daniel Veditz [:dveditz] 2009-12-04 12:51:02 PST
We should capture the relevant files locally (including the font) to reproduce this crash in case the site changes in the future.
Comment 14 Jonathan Kew (:jfkthame) 2009-12-04 16:00:14 PST
Created attachment 416185 [details]
crash-causing font from the http://www.gesteves.com/experiments/starwars.html site

Here's one of the fonts from this bug's site. It reliably causes the crash for me, whether used via @font-face or installed locally. It does not crash other (Cocoa) applications, as the API that's crashing is ATSFontGetHorizontalMetrics, which modern Core Text or Cocoa-based apps wouldn't be using.

I have not found a clear "smoking gun" in the font; it seems to be a well-formed TrueType file, so our up-front validation (or Font Book's) does not flag it as dangerous.
Comment 15 Steven Michaud [:smichaud] (Retired) 2009-12-04 16:07:00 PST
Jonathan:

If it's not too much trouble, could you also write a minimal testcase (presumably an html file) that loads the font and triggers crashes?

If need be, it could be part of a package (including the font file) that each of us could put on our own server to test with (e.g. the Apache server that's bundled with OS X).
Comment 16 Jonathan Kew (:jfkthame) 2009-12-04 16:17:00 PST
Created attachment 416193 [details]
simple testcase for crash using frabk.ttf font

The attached testcase crashes reliably for me; just put the frabk.ttf font file in the same folder and it'll be loaded by the @font-face rule.
Comment 17 Steven Michaud [:smichaud] (Retired) 2009-12-04 16:38:47 PST
I can't get it to crash, whether I serve it from a server or try to
open it locally.

Here's what I did:

1) Put both files (frabk.ttf and fratest.html) in the same directory,
   either on my web server or locally.

2) Load fratest.html either from the server
   (e.g. http://server.mynet.net/misc/fratest.html) or locally
   (e.g. file:///Volumes/Kilroy/Users/smichaud/misc/fratest.html).

What am I doing wrong?

I didn't edit fratest.html.
Comment 18 Jonathan Kew (:jfkthame) 2009-12-04 16:44:36 PST
(In reply to comment #17)
> I can't get it to crash, whether I serve it from a server or try to
> open it locally.

That's odd. Is it using the Franklin Gothic font, or are you seeing the fallback (serif)?

This is presumably with the same OS and browser version that crashes for you on the original URL?
Comment 19 Jonathan Kew (:jfkthame) 2009-12-04 16:49:05 PST
Created attachment 416202 [details] [diff] [review]
avoid using flaky ATS metrics function if possible

This patch attempts to avoid the crash with Franklin Gothic on OS X 10.6 by avoiding the use of the flaky ATS function where we're encountering the crash. Instead, it reads the font tables directly to initialize our metrics record. With this, I no longer get crashes with the local testcase or the original URL.

I have not yet done a thorough comparison of the resulting metrics between this patch and the original code; it's possible that we will see some minor variations in layout as a result, e.g. due to metrics scaling and hinting effects.

(This is for trunk; if we decide to use this, we'll need to backport to the affected branches as well. There's a regrettable bunch of almost-duplicated code between the gfxAtsuiFonts.cpp and gfxCoreTextFonts.cpp files; work in progress to refactor these as part of harfbuzz integration will eliminate that duplication.)
Comment 20 Steven Michaud [:smichaud] (Retired) 2009-12-04 16:57:19 PST
(In reply to comment #18)

Sorry, I was just being dumb -- I was testing on OS X 10.5.8 :-(

I see crashes on 10.6.2, loading fratest.html either from the server
or locally.
Comment 21 Jonathan Kew (:jfkthame) 2009-12-05 08:18:41 PST
Created attachment 416266 [details]
command-line program to demonstrate the ATS crash with Franklin Gothic font

Here is a simple command-line tool that can be used to reproduce the crash outside of Firefox (attached here for reference).

To demonstrate, just compile the code (g++ -o atsFontGetHorizontalMetricsTest atsFontGetHorizontalMetricsTest.cpp -framework ApplicationServices), install the frabk.ttf font, and then run "./atsFontGetHorizontalMetricsTest FranklinGothic-Book" in Terminal on 10.6.2.

Result: segfault, with the stack we're seeing here. Result with other fonts: works fine, prints the expected metrics.

Filed rdar://7447018 to report this to Apple.
Comment 22 Mike Beltzner [:beltzner, not reading bugmail] 2009-12-06 08:59:01 PST
For the blocking decision: do we know if it's likely to be a specific problem with that one font, or something more widespread?
Comment 23 Mike Beltzner [:beltzner, not reading bugmail] 2009-12-08 09:37:41 PST
Do we have any confirmed crashes for this on any site other than the one in the URL field?
Comment 24 Jonathan Kew (:jfkthame) 2009-12-08 09:41:20 PST
As far as I've seen, it seems to be occurring with one specific font (Monotype's Franklin Gothic, frabk.ttf on that site); however, that's not a particularly "exotic" font so there's a good chance of it showing up in more places than just the one web page.

There doesn't seem to be an obvious error in the font itself that would suggest a unique problem. I'd guess there are likely to be other fonts in existence that tickle the same underlying Apple bug, even if we haven't identified them yet.

The crash also occurs if a user has the problem font (or another that happens to trigger the same problem) installed locally, and the browser attempts to use it; this could explain some of the variety of reported crash URLs where we cannot necessarily reproduce the bug.
Comment 25 Joe Drew (not getting mail) 2009-12-08 09:44:16 PST
OK, then I think we should block on this and get it knocked out.
Comment 26 Jonathan Kew (:jfkthame) 2009-12-08 09:49:49 PST
Apparently (according to a Google search), frabk.ttf has been bundled with at least some version of MS Office, so it's probably quite widespread as a locally-installed font, though it may be relatively unusual for it to be used in a web page.
Comment 27 chris hofmann 2009-12-08 12:32:11 PST
as part of topsite testing tomcat might be able to include a script from bc that scans the content for patterns like

  src: url(frabk.ttf);

to figure out what sites might using this font or others.  also talked to him about compiling a list of fonts used on various top sites that we could use for future reference.
Comment 28 Steven Michaud [:smichaud] (Retired) 2009-12-08 16:02:19 PST
I've tested Jonathan's patch (from comment #19), using his testcase
from comment #16 (loaded locally), and saw no crashes.

Here's a tryserver build made with the patch, to make it easier for
others to test.  Please do so!

https://build.mozilla.org/tryserver-builds/smichaud@pobox.com-bugzilla532533-v1.1/bugzilla532533-v1.1-macosx.dmg

(I've called this v1.1 because I had to update the original patch
slightly to make it apply to current code.)
Comment 29 Steven Michaud [:smichaud] (Retired) 2009-12-08 16:09:07 PST
There was one tryserver test failure that might not be spurious:
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1260298168.1260308118.30787.gz
Comment 30 John Daggett (:jtd) 2009-12-09 13:58:09 PST
I think the approach here is fine but since metrics has a big impact on layout, I think we really need to do some thorough testing assure that the metrics we calculate by directly accessing font tables is close to those calculated from the metrics that ATS provides.

Specifically, I think we should run through all platform 10.5/10.6 fonts and spit out differences in the metrics reported by ATS versus metrics we calculate by directly accessing font tables.  Then we can decide whether specific differences are important or not (i.e. have an impact on layout).

Within InitMetrics, I don't like the use of needToReadMetricsFromATS, I think it would be better to split out the code to initialize the metrics from font tables from the code that initializes the metrics from ATS.  So InitMetrics would look something like:

if (InitMetricsFromTables() != noErr) {
  InitMetricsFromATS();
}

- common code -

> if (!needToReadMetricsFromATS) {
> 	if (::ATSFontGetTable(mATSFont, kCTFontTableOS2, 0, 0, 0, &dataLength) == noErr) {
> 		Byte *tableData = new Byte[dataLength];
> 		::ATSFontGetTable(mATSFont, kCTFontTableOS2, 0, dataLength,
> 						  tableData, &dataLength);
>       .
>       .
>       .
> 		delete[] tableData;
> 	} else
> 		needToReadMetricsFromATS = PR_TRUE;
> }

This pattern is repeated several times, so I think it would be better to have a static helper function that reads table data into an adjustable array that's passed in:

  nsAutoTArray<PRUint8, 2048> tableData;
  
  if (ReadTableData(mATSFont, kCTFontTableHead, tableData) == noErr) {
    // pull out metrics data from the table
  }
  
  if (ReadTableData(mATSFont, kCTFontTableOS2, tableData) == noErr) {
    // pull out metrics data from the table
  }
  
The table reading code needs to deal with the version of the OS/2 table.  For example, the first version of the OS/2 table (version=0) didn't have the sxHeight field:

http://developer.apple.com/textfonts/TTRefMan/RM06/Chap6OS2.html
Comment 31 Jonathan Kew (:jfkthame) 2009-12-11 15:41:53 PST
Created attachment 417188 [details] [diff] [review]
patch v2 - clean-up and refactor code, check table sizes

This *does* affect the metrics we get. The main effect is that the x-height is calculated directly from font table data as an unhinted value, whereas ATS was giving slightly different values that I assume were affected by hinting. A typical example would be:

OLD:
Font: (LucidaGrande) size: 12.000000
    emHeight: 12.000000 emAscent: 9.600000 emDescent: 2.400000
    maxAscent: 12.000000 maxDescent: 3.000000 maxAdvance: 19.675781
    internalLeading: 3.000000 externalLeading: 0.000000
    spaceWidth: 3.796875 aveCharWidth: 7.359375 xHeight: 6.432000
    uOff: -1.171875 uSize: 1.000000 stOff: 3.216000 stSize: 1.000000 suOff: 6.432000 suSize: 6.432000

NEW:
Font: (LucidaGrande) size: 12.000000
    emHeight: 12.000000 emAscent: 9.600000 emDescent: 2.400000
    maxAscent: 12.000000 maxDescent: 3.000000 maxAdvance: 19.675781
    internalLeading: 3.000000 externalLeading: 0.000000
    spaceWidth: 3.796875 aveCharWidth: 7.359375 xHeight: 6.363281
    uOff: -1.171875 uSize: 1.000000 stOff: 3.181641 stSize: 1.000000 suOff: 6.363281 suSize: 6.363281

(Note that the change in xHeight also has a knock-on effect on the strikethrough and super/subscript positioning values.)

We already have small discrepancies between platforms and between font back-ends (there are cases where ATSUI and Core Text give us slightly different metrics); I don't think this is significant to layout. In the longer term, we might even want to move all platforms to a common interpretation of metrics (as far as possible), which can best be done using the new approach of directly reading the font tables.
Comment 32 Damon Sicore (:damons) 2009-12-14 15:32:46 PST
Ok, we've only seen 3 crashes on this in the past week.  While we should fix, I can't say we should block.

-'ing.  Please re-nom if you object.
Comment 33 chris hofmann 2009-12-14 17:21:03 PST
the number of crashes for this signature are very eratic making it hard to identify/predict exactly where it is or will be on the top crash list.

we floated along at 0-4 crashes per day during Oct. and early Nov. then spiked at over 100 crashes on Dec. 1.

0  crashes for fnt_SH_Common on  20091122-crashdata
3  crashes for fnt_SH_Common on  20091123-crashdata
1  crashes for fnt_SH_Common on  20091124-crashdata
2  crashes for fnt_SH_Common on  20091125-crashdata
1  crashes for fnt_SH_Common on  20091126-crashdata
0  crashes for fnt_SH_Common on  20091127-crashdata
1  crashes for fnt_SH_Common on  20091128-crashdata
30  crashes for fnt_SH_Common on  20091129-crashdata
26  crashes for fnt_SH_Common on  20091130-crashdata
128  crashes for fnt_SH_Common on  20091201-crashdata
23  crashes for fnt_SH_Common on  20091202-crashdata
4  crashes for fnt_SH_Common on  20091203-crashdata
35  crashes for fnt_SH_Common on  20091204-crashdata
10  crashes for fnt_SH_Common on  20091205-crashdata
6  crashes for fnt_SH_Common on  20091206-crashdata
2  crashes for fnt_SH_Common on  20091207-crashdata
2  crashes for fnt_SH_Common on  20091208-crashdata
1  crashes for fnt_SH_Common on  20091209-crashdata
5  crashes for fnt_SH_Common on  20091210-crashdata
2  crashes for fnt_SH_Common on  20091211-crashdata
1  crashes for fnt_SH_Common on  20091212-crashdata

I marked it topcrash on 2009-12-02 20:00:55 PST the day after that spike.

It might have been a real traffic spike on the starwars page, that caused the single day spike and the jump into the top crash list since the spike was before this bug was filed and started a following of testers trying out the crash on the starwars page.

I suspect this might continue to be bursty until we get a fix on our side or from apple.
Comment 34 John Daggett (:jtd) 2009-12-14 19:56:04 PST
(In reply to comment #31)
> This *does* affect the metrics we get. The main effect is that the x-height is
> calculated directly from font table data as an unhinted value, whereas ATS was
> giving slightly different values that I assume were affected by hinting. A
> typical example would be:

Why would the values from ATS be affected by hinting since those metrics are calculated without any reference to the font size?  Isn't this your comment:

// prefer to get xHeight from ATS metrics (unhinted) rather than Core Text (hinted),
// see bug 429605.

I think you may be thinking of the CoreText API which does take a size parameter when fetching the x-height.

My point is not whether or not there are differences but that this change needs to be tested as extensively as possible so that we understand where differences occur and can justify that those differences are reasonable, either because the ATS-calculated value is incorrect or we simply don't care so much about differences in a given metric.
Comment 35 John Daggett (:jtd) 2009-12-15 00:07:52 PST
Created attachment 417646 [details]
differences in maxAdvance value

This is a list of fonts on my system (10.5) that show a difference in maxAdvance values larger than 10 for a font size of 128.  Not sure what's going on here, the table calculation looks correct, not sure why ATS is amping up the values:

Columns are: PS name, maxAdvance calculated from table values, and ATS's calc

HiraKakuPro-W3 163.200000 215.039993
HiraKakuPro-W6 174.848000 229.119995
HiraKakuProN-W3 163.200000 215.039993
HiraKakuProN-W6 174.848000 229.119995
HiraKakuStd-W8 187.136000 227.968002
HiraKakuStdN-W8 187.136000 227.968002
HiraMaruPro-W4 165.632000 214.143997
HiraMaruProN-W4 165.632000 214.143997
HiraMinPro-W3 155.008000 201.727997
HiraMinPro-W6 163.712000 215.167999
HiraMinProN-W3 155.008000 201.727997
HiraMinProN-W6 163.712000 215.167999

Most of these fonts seem to be OpenType Postscript fonts.
Comment 36 Jonathan Kew (:jfkthame) 2009-12-15 04:00:47 PST
I manually checked a few OT/CFF fonts; the hhea:advanceWidthMax value was correct (corresponding to the maximum hmtx:width value), and the resulting maxAdvance calculated from the tables is also correct.

I don't know where ATSFontGetHorizontalMetrics is coming up with its inflated figure; I suspect this may well be an ATS bug. Anyway, I don't think there's any reason we need to preserve that fictitious result.
Comment 37 Jonathan Kew (:jfkthame) 2009-12-15 04:12:44 PST
(In reply to comment #34)
> (In reply to comment #31)
> > This *does* affect the metrics we get. The main effect is that the x-height is
> > calculated directly from font table data as an unhinted value, whereas ATS was
> > giving slightly different values that I assume were affected by hinting. A
> > typical example would be:
> 
> Why would the values from ATS be affected by hinting since those metrics are
> calculated without any reference to the font size?

Errr.... yes, so that's not the explanation. Ok, I don't know why ATS gives different values. It's a black box, and it has flaky innards. We're better off reading the font tables ourselves. (Besides the fact that the ATS function is crashy.)

Note that xHeight is not always available (either from the font tables or via ATS), and in that case we fall back to attempting to measure a glyph. That gives us a different (size-dependent, hinted) basis for the metrics we end up using, but we don't really have any alternative.

My position here is that yes, there are differences; the new code gives us results that are in general better (more accurate, based on the actual font data) than ATS was giving; but in any case the differences (small differences in xHeight and aveCharWidth; sometimes larger differences in maxAdvance) are not critical to our behavior.
Comment 38 John Daggett (:jtd) 2009-12-16 00:36:29 PST
Created attachment 417882 [details]
differences between ATS and table-based metrics calculations

For all the fonts on my 10.5 system, this lists the differences between ATS-calculated metrics and the font table calculations.  Only differences larger than 5% are listed.  The three values are the ATS value, the table-based value and the difference in percentage.
Comment 39 John Daggett (:jtd) 2009-12-16 00:39:09 PST
Created attachment 417883 [details]
testcase page showing metrics related problems

Based on the differences in metrics values for various fonts, I put together a page that shows some of the problems with the new metrics calculations.
Comment 40 John Daggett (:jtd) 2009-12-16 00:44:47 PST
Created attachment 417884 [details]
screenshot, side-by-side comparison of testcase with and without patch

Testcase problems:

- Null values for some metrics properties for a number of fonts

Example:

Symbol strikeoutOffset 29.916666 0.000000 -100
Symbol subscriptOffset 59.833332 0.000000 -100
Symbol superscriptOffset 59.833332 0.000000 -100
Symbol xHeight 59.833332 0.000000 -100

This also breaks font-size-adjust for these fonts but not small-caps.  Not sure why Hebrew fonts seem to be so adversely affected.

- Large variations in some metrics for a few fonts

Zapfino strikeoutOffset 38.015999 163.840000 330.976
Zapfino subscriptOffset 76.031998 327.680000 330.976
Zapfino superscriptOffset 76.031998 327.680000 330.976

- Difference in underline (this is probably better)
Comment 41 John Daggett (:jtd) 2009-12-16 01:08:09 PST
(In reply to comment #37)
> My position here is that yes, there are differences; the new code gives us
> results that are in general better (more accurate, based on the actual font
> data) than ATS was giving; but in any case the differences (small differences
> in xHeight and aveCharWidth; sometimes larger differences in maxAdvance) are
> not critical to our behavior.

I completely understand your point of view but that point of view MUST BE VALIDATED BY TESTING!!!  The attached list of differences clearly highlights a number of problems areas.  The maxAdvance being different is probably not such a big deal but xHeight and other metrics being null is.  I don't know what the heck is going on with Zapfino.

While in theory calculating our own metrics definitely gives us more control, it's not a low-risk option, especially for consideration on either the 1.9.1 or 1.9.2 branches.  This is fundamentally an Apple bug, we're not even sure that reading the metrics ourselves is going to completely eliminate the problem; the stack trace shows that the TrueType code is "running the preprogram", wouldn't this also be executed when rendering?

Another approach might be to try and localize the font data that's causing the TrueType scaler to get upset, then try and figure out a way of screening fonts for that.  No fun, but at least that approach feels like it would be much lower impact overall.
Comment 42 Jonathan Kew (:jfkthame) 2009-12-16 06:01:03 PST
Created attachment 417917 [details]
screenshot of testcase on 10.6 without (L) and with (R) patch

I am not able to fully reproduce your screenshot here on 10.6. The attached image shows the results I get. I agree that the Symbol and Hebrew results are less than ideal, though they are more reasonable than your image shows.

One possible improvement may be to read the super/subscript offsets from the font, rather than basing these on the x-height value, as our old code does. I'll experiment with that and see what effect that has.

The zero x-height is troubling (though I cannot reproduce that here). Looking in the OS/2 table in Symbol.ttf, it does indeed claim to have an x-height of zero, but our code should then fall back on measuring a glyph. I suppose we should then check that *that* doesn't give a zero result.
Comment 43 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-12-16 13:02:38 PST
It seems to me that with some clever reverse-engineering and/or delta debugging we could figure out which instruction(s) in the 'prep' program are actually triggering the crash, which might give us some insight into what's causing the bug and how to work around it. That sounds worth doing, to me. Maybe we could ask Apple for help with that in their radar bug, but I know they don't care about ATSUI users anymore...

However it does sound quite likely that analyzing Truetype bytecode and blocking stuff that might trigger the bug would turn out to be harder and riskier (both in terms of not blocking crashable fonts, and blocking valid fonts) than just computing our own metrics.

One thing that occurs to me is that if rendering is OK, could it be that there's a problem with initialization and API ordering, so that ATSFontGetHorizontalMetrics isn't setting things up properly before the preprogram executes? It might be worth trying to render some glyphs before we call ATSFontGetHorizontalMetrics, to see if that works around the crash.
Comment 44 Steven Michaud [:smichaud] (Retired) 2009-12-16 16:24:23 PST
Created attachment 418041 [details]
Assembly code for fnt_SH_Common()

This may be a red herring.  But when I dissassembled fnt_SH_Common()
(where this bug's crash happens) in gdb, it looked very peculiar.  I'm
wondering if this can have something to do with this bug's crash, and
with some of ATSFontGetHorizontalMetrics()'s peculiar behavior.

First off, it appears fnt_SH_Common() only ever uses the first of its
(supposed) parameters (on the stack):  There's only a reference to
0x8(%ebp) (the first stack parameter), near the end of the method.  We
never see 0xC(%ebp), 0x10(%ebp) or 0x14(%ebp) (which would refer to
the second, third, and fourth stack parmeters mangled into the
function's name).  Nothing terribly unusual about that, I suppose.
And it does explain the fact that those other parameters do sometimes
appear to have random, nonsensical values.  But ...

Second (and here's what's really strange), the method appears to be
sensitive to the values in the %eax, %edx and %ecx registers before
the function was called -- even though these registers aren't supposed
to be used to pass parameters.  And the contents of one of these
registers (%eax) is responsible for triggering the crash (when it
happens):

The line "mov %eax,%ebx" sets %ebx to the same value as %eax.

The line "mov (%ebx),%esi" sets %esi to what's pointed at by %ebx.
(When the crash happens, %esi gets set to '0').

The line "mov 0x4(%esi),%eax" is supposed to copy what's pointed at by
%esi+4 into %eax.  But since %esi was set to '0' above, instead it
triggers the crash (at the address 0x00000004).
Comment 45 Steven Michaud [:smichaud] (Retired) 2009-12-16 16:35:11 PST
Needless to say, fnt_SH_Common() appears never to be called in Safari
-- even when you run it in 32-bit mode.
Comment 46 Steven Michaud [:smichaud] (Retired) 2009-12-16 16:50:17 PST
(Following up comment #44)

I suppose the most logical interpretation is that the stack is only
used to pass the first parameter, and that the second, third and
fourth parameters are passed in %eax, %edx and %ecx (possibly not in
that order).

Still very strange, though, because this implies that calls to
fnt_SH_Common() set up its stack frame "by hand", using custom
assembly code.
Comment 47 Steven Michaud [:smichaud] (Retired) 2009-12-16 17:04:16 PST
(Following up comment #44)

The key to this puzzle has finally dawned on me.  The actual
declaration of fnt_SH_Common() must be:

fnt_SH_Common(fnt_LocalGraphicStateType*, register long*,
              register long*, register int*)

I've just never looked at the assembly code for such a function
before.
Comment 48 Steven Michaud [:smichaud] (Retired) 2009-12-16 18:32:34 PST
Actually it's the first parameter (fnt_LocalGraphicStateType*) that's passed in %eax.
Comment 49 Steven Michaud [:smichaud] (Retired) 2009-12-16 19:39:58 PST
(Following up comment #47)

> fnt_SH_Common(fnt_LocalGraphicStateType*, register long*,
>               register long*, register int*)

Actually the following is more likely:

__attribute__((regparm(3))) fnt_SH_Common(fnt_LocalGraphicStateType*,
                                          long*, long*, int*)

This would put the first three parameters in %eax, %edx and %ecx (in
that order), and the fourth argument would be pushed onto the stack
(and be the first stack parameter).
Comment 50 Jonathan Kew (:jfkthame) 2009-12-17 05:05:10 PST
Created attachment 418155 [details] [diff] [review]
patch v3 - read more metrics from font, improve xheight heuristics

This update uses the font's values for super- and subscript offsets, instead of fixed proportions of the xheight. This has significant effects in examples like Zapfino (the superscript 2 in e=mc^2 looks very high, but consider if it were used with a letter having an ascender) and Symbol (in particular, the subscript offset is quite small).

In individual cases, it's debatable which metrics are "better"; the answer will partly be a matter of taste, and may also depend on the particular examples you pick to look at. But I think we should respect the font's values as a general rule; if they're poor (like the strikeout position in Symbol, IMO) that's a font design problem, not a Gecko software problem.

The results with Raanana were poor because it has no OS/2 table (and therefore no explicit x-height), and our code then tries to measure the 'x' glyph. But the font also has no Latin letters, and so we end up measuring the .notdef box, which is not an appropriate glyph. To improve this, I've provided a selection of possible characters to try, and the code picks the first one that's actually present. This heuristic could be extended and refined further, of course, as we find problem cases.
Comment 51 Jonathan Kew (:jfkthame) 2009-12-17 05:06:38 PST
Created attachment 418156 [details]
screenshot of the test file with patch v3
Comment 52 Steven Michaud [:smichaud] (Retired) 2009-12-22 14:18:35 PST
Created attachment 418909 [details] [diff] [review]
Patch that hooks Apple C/C++ functions

I've actually managed to find a way to work around these crashes by
hooking (and changing) Apple code.

But my workaround isn't particularly narrowly targeted, and I don't
know what the side effects might be.  So this is just a fallback, in
case other solutions are found wanting.

A tryserver build should follow in a few hours.

Hooking C/C++ code is a lot harder than hooking Objective-C code -- I
have to guess parameter names and return values, and I can't hook
every function (only those dynamically linked from other modules).
So, for example, I'm not able to hook fnt_SH_Common().  And I haven't
been able to find out what (specifically) it is about the
FranklinGothic-Book font's 'cvt ' table that Apple's code doesn't
like.

Others who know more about font technology may be able to go further.
If anyone's interested, I can post my full debugging patch.

(By the way, this bug's URL no longer crashes for me even in unpatched
builds.  I assume they're now using another font.  My tests have been
with Jonathan's testcase from comment #16.)
Comment 53 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-12-22 14:39:05 PST
The consequences of that hooking are completely unclear.
Comment 54 Steven Michaud [:smichaud] (Retired) 2009-12-22 15:26:49 PST
My patch does work around the crash.  And (I think) it's very unlikely
to cause any other crashes:  It just prevents the 'cvt ' table from
being "seen" while a call to ATSFontGetHorizontalMetrics() is on the
stack.

It's quite true I don't know exactly what effects the patch will have
on font appearance.  But that shouldn't be too hard for someone who
knows about fonts (i.e. Jonathan or John) to find out (by means of
various tests).

But I wouldn't say the consequences of my patch are "completely
unclear".
Comment 55 Jonathan Kew (:jfkthame) 2009-12-22 15:39:56 PST
(In reply to comment #52)
> (By the way, this bug's URL no longer crashes for me even in unpatched
> builds.  I assume they're now using another font.  My tests have been
> with Jonathan's testcase from comment #16.)

The original URL now has a video instead of the live page. If you click the "try it in your browser" link, it goes to the live demo - but they've changed their CSS to point to fonts (the same fonts) hosted on a different site. Because we apply a same-origin restriction by default, we don't load the fonts. And so we don't crash any more.
Comment 56 Jonathan Kew (:jfkthame) 2009-12-22 15:52:21 PST
(In reply to comment #54)
> My patch does work around the crash.  And (I think) it's very unlikely
> to cause any other crashes:  It just prevents the 'cvt ' table from
> being "seen" while a call to ATSFontGetHorizontalMetrics() is on the
> stack.

Preventing the cvt table from loading sounds pretty dangerous to me. There will be instructions in the font program and the individual glyph hints that want to refer to values from the cvt. Depending how carefully the ATS font code is written (hmm, so far it doesn't look all that robust!), I could imagine this being a possible cause of instability in itself. I wonder if they correctly check the existence and size of the cvt table everywhere they use it? For the most part, font rasterizers seem to assume fonts will be internally consistent, and you're breaking that assumption if you (effectively) remove the cvt table without removing all instructions that reference it.

> It's quite true I don't know exactly what effects the patch will have
> on font appearance.  But that shouldn't be too hard for someone who
> knows about fonts (i.e. Jonathan or John) to find out (by means of
> various tests).

Well, we could try rendering lots of fonts, and visually compare the results - sort of "reftesting" between patched and unpatched versions. Most likely any effects would be pretty subtle, though if some of the font programs start reading garbage values where they expected to find cvt entries, I can imagine the possibility of significant distortion.

I also note that this could potentially affect every TrueType font - both local and downloaded. I'd be pretty hesitant to do that.
Comment 57 Steven Michaud [:smichaud] (Retired) 2009-12-22 16:23:41 PST
I'm not saying my patch is the best fix for this bug ... or even a
particularly good fix.  But I think it's unlikely to have catastrophic
consequences.

libTrueTypeScaler appears to use libFontParser to do all its low-level
"reading" of fonts.  The former imports a bunch of methods from the
latter.  I hooked all of them in my debugging patch, and was able to
observe their behavior.  As best I can tell there are only two methods
that can access a font table by "type name" --
TFontSurrogate::GetTableIndex(unsigned int, unsigned int*), which
returns the index and size of a font table given its "type name", and
TFontSurrogate::GetTable(unsigned int, unsigned int, unsigned int*),
which returns the actual table data (plus its size).

My current patch hooks the former, and makes it behave as if the font
in question doesn't have a 'cvt ' table.  It's actually a call to
TFontSurrogate::GetTable() that triggers (or at least just preceeds)
this bug's crashes.  But (in my tests), making the 'cvt ' table
invisible to TFontSurrogate::GetTableIndex() is enough to stop the
call to TFontSurrogate::GetTable() that triggers the crashes.

Calls to TFontSurrogate::GetTable() with the "type name" set to 'cvt '
still do happen with ATSFontGetHorizontalMetrics() on the stack, and
still do succeed (they find and return the appropriate data).  But
none of them trigger this bug's crash.

So it appears that some code in libTrueTypeScaler (including the code
that crashes) calls TFontSurrogate::GetTableIndex() first, and only
goes on to call TFontSurrogate::GetTable() if the former returned a
positive result.  But other code (which doesn't crash) calls
TFontSurrogate::GetTable() directly (without first calling
TFontSurrogate::GetTableIndex()).

It's possible to also hook TFontSurrogate::GetTable(), and to prevent
it from seeing a 'cvt ' table.  But (for reasons I won't go into for
the moment) it's much trickier (and hackier).  And (for now at least)
it doesn't seem to be necessary.

> I also note that this could potentially affect every TrueType font -
> both local and downloaded.

I believe this is correct.
Comment 58 Steven Michaud [:smichaud] (Retired) 2009-12-22 18:22:40 PST
Here's a tryserver build made with my patch from comment #52:
https://build.mozilla.org/tryserver-builds/smichaud@pobox.com-bugzilla532533-hook/bugzilla532533-hook-macosx.dmg
Comment 59 Jonathan Kew (:jfkthame) 2010-03-17 10:49:32 PDT
Created attachment 433095 [details] [diff] [review]
patch v4 - read metrics from font tables where possible - refreshed for current trunk

I've refreshed this for current trunk, and also moved as much as possible of the generic metrics initialization code from the Mac subclass into the gfxFont class. This is so that we can follow up with patches for Windows and Linux to use the same metrics initialization (for TrueType/OpenType fonts), which will give us two benefits: (1) easier to maintain cross-platform uniformity; and (2) the ability to fix bug 429605 (erratic font-size-adjust results) on Windows (GDI) and Linux.
Comment 60 Jonathan Kew (:jfkthame) 2010-03-17 13:29:45 PDT
Created attachment 433154 [details] [diff] [review]
patch v4.1 - read metrics from font tables where possible - refreshed for current trunk

Previous version included a couple of errors - apologies for the bugspam.
Comment 61 Jonathan Kew (:jfkthame) 2010-04-07 04:44:33 PDT
Created attachment 437533 [details] [diff] [review]
patch v4.2 - read metrics from font tables to avoid ATSGetFontHorizontalMetrics - refreshed again for current trunk

Refreshed again for current trunk.

This disappeared from top-crashes when the flurry of excitement over that StarWars demo page (with the flaky Franklin Gothic font) died down, but there's a steady trickle of crashes still affecting 10.6.x users; almost all these reports show ATSFontGetHorizontalMetrics on the stack:

http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&query=fnt_SH_Common%28fnt_LocalGraphicStateType*%2C%20long*%2C%20long*%2C%20int*%29&date=&range_value=1&range_unit=weeks&process_type=all&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&signature=fnt_SH_Common%28fnt_LocalGraphicStateType*%2C%20long*%2C%20long*%2C%20int*%29
Comment 62 Jonathan Kew (:jfkthame) 2010-05-24 14:03:20 PDT
Created attachment 447174 [details] [diff] [review]
patch, v4.3 - read metrics from font tables to avoid ATSGetFontHorizontalMetrics - refreshed again for current trunk

Refreshed this again.... would be good to get this into the tree for wider testing. The ATS crash is still out there affecting users.
Comment 63 John Daggett (:jtd) 2010-05-27 16:38:08 PDT
At Apple this bug is marked as "Veirfy/Cannot Reproduce", which means it may have been fixed in 10.6.4 or in post-10.6.4 code, a number of fixes have been made in this code recently. 

Steven, if you have the latest 10.6.4 seed set up could you run a quick test to see if this is fixed or not?  Otherwise, I'll test out the latest seed next week.
Comment 64 Jonathan Kew (:jfkthame) 2010-05-27 17:13:25 PDT
(In reply to comment #63)
> At Apple this bug is marked as "Veirfy/Cannot Reproduce", which means it may
> have been fixed in 10.6.4 or in post-10.6.4 code, a number of fixes have been
> made in this code recently. 

Where do you find this? I just checked the Radar bug report mentioned in comment 21, and it's still marked as Open there.
Comment 65 John Daggett (:jtd) 2010-05-30 23:42:19 PDT
(In reply to comment #64)
> (In reply to comment #63)
> > At Apple this bug is marked as "Veirfy/Cannot Reproduce", which means it may
> > have been fixed in 10.6.4 or in post-10.6.4 code, a number of fixes have been
> > made in this code recently. 
> 
> Where do you find this? I just checked the Radar bug report mentioned in
> comment 21, and it's still marked as Open there.

I pinged an engineer at Apple about it.

Crash still occurs on 10.6.4 seed build 10F54.
Comment 66 John Daggett (:jtd) 2010-05-31 00:03:46 PDT
Created attachment 448339 [details]
font metric differences, existing code vs. patch
Comment 67 John Daggett (:jtd) 2010-05-31 00:04:22 PDT
Comment on attachment 447174 [details] [diff] [review]
patch, v4.3 - read metrics from font tables to avoid ATSGetFontHorizontalMetrics - refreshed again for current trunk

I'm not opposed to doing font metrics manually rather than relying on platform APIs but with this patch certain font metrics would change significantly.  I compared the font metrics for roughly 6500 fonts with and without the patch.  Of those, below is listed the cases in which the difference was greater than 30%:

  aveCharWidth: 1410
  maxAdvance: 73
  strikeoutOffset: 1995
  strikeoutSize: 4352
  subscriptOffset: 6150
  superscriptOffset: 4480
  underlineOffset: 152
  underlineSize: 177
  xHeight: 282

Specific differences for Adobe's Caslon Pro, Garamond Pro and Jenson Pro are listed below. (e) is the value for existing code, (p) for code with the patch and (diff) the difference as a percentage of the original:

(ACaslonPro-Bold) subscriptOffset: e 20.568000 p 3.600000 diff -82.4971
(ACaslonPro-BoldItalic) subscriptOffset: e 20.496000 p 3.600000 diff -82.4356
(ACaslonPro-Italic) subscriptOffset: e 19.968000 p 3.600000 diff -81.9712
(ACaslonPro-Regular) aveCharWidth: e 16.080000 p 24.864000 diff 54.6269
(ACaslonPro-Regular) subscriptOffset: e 20.376000 p 3.600000 diff -82.3322
(AGaramondPro-Bold) subscriptOffset: e 19.608000 p 3.600000 diff -81.6401
(AGaramondPro-BoldItalic) subscriptOffset: e 19.872000 p 3.600000 diff -81.8841
(AGaramondPro-Italic) aveCharWidth: e 17.280000 p 24.384000 diff 41.1111
(AGaramondPro-Italic) subscriptOffset: e 19.584000 p 3.600000 diff -81.6176
(AGaramondPro-Regular) aveCharWidth: e 17.280000 p 23.328000 diff 35
(AGaramondPro-Regular) subscriptOffset: e 19.318970 p 3.600000 diff -81.3655
(AJensonPro-Bold) aveCharWidth: e 16.944000 p 24.528000 diff 44.7592
(AJensonPro-Bold) subscriptOffset: e 19.200000 p 3.600000 diff -81.25
(AJensonPro-BoldIt) aveCharWidth: e 17.136000 p 25.152000 diff 46.7787
(AJensonPro-BoldIt) subscriptOffset: e 19.704000 p 3.600000 diff -81.7296
(AJensonPro-It) aveCharWidth: e 16.032000 p 24.000000 diff 49.7006
(AJensonPro-It) subscriptOffset: e 19.416000 p 3.600000 diff -81.4586
(AJensonPro-LtItSubh) aveCharWidth: e 14.112000 p 22.128000 diff 56.8027
(AJensonPro-LtItSubh) subscriptOffset: e 18.864000 p 3.600000 diff -80.916
(AJensonPro-LtSubh) aveCharWidth: e 13.872000 p 21.888000 diff 57.7855
(AJensonPro-LtSubh) subscriptOffset: e 18.312000 p 3.600000 diff -80.3408
(AJensonPro-Regular) aveCharWidth: e 15.840000 p 23.568000 diff 48.7879
(AJensonPro-Regular) subscriptOffset: e 18.960000 p 3.600000 diff -81.0127
 
I tried subscripts with Adobe Garamond Pro and they don't look right, they're very shallow compared to either existing code or Safari.  

To get this right I think will require a lot more extensive testing.  If we're going to make a change that affects rendering for a lot of users then I think we should try to only do that when we're convinced that the change is correct or at least better for a large number of users.

Unless it's felt that the specific crash can be exploited I think we should let Apple fix this rather than implementing a high-impact fix that's hard to get right.
Comment 68 Jonathan Kew (:jfkthame) 2010-05-31 10:01:10 PDT
(In reply to comment #67)
> (From update of attachment 447174 [details] [diff] [review])
> I'm not opposed to doing font metrics manually rather than relying on platform
> APIs but with this patch certain font metrics would change significantly.

This is true - it would give us metrics that actually reflect the data in the fonts. It seems that ATS returns values that are often significantly different from what the font designer specified.

> I tried subscripts with Adobe Garamond Pro and they don't look right, they're
> very shallow compared to either existing code or Safari.  

The "shallow" subscripts seem to be a characteristic of Adobe's fonts, in a number of cases that I've tried. Our existing code (Firefox 3.6/Mac) places the subscripts MUCH too deep; they look really bad. I don't know what metrics Safari is using; they're less extreme than FF3.6, but still greater than the actual metrics in the font. See the attached screenshot. (Fonts in the sample are: Arial, Verdana, Georgia, Times New Roman, Adobe Caslon Pro, Adobe Jenson Pro, Minion Pro, Adobe Garamond Pro, Garamond Premier Pro. Note how all the Adobe fonts feature very shallow subscript positioning.)

However, it's worth noting that Firefox 3.6 on Windows ALREADY uses the "shallow" metrics as found in these fonts. So we currently have a huge discrepancy between Firefox on Mac and on Windows. Abandoning the ATS metrics API and using values from the fonts would bring these two into harmony.

If we then want to override the font designer's values in certain cases - e.g., by clamping the subscript offset within what we think is a "reasonable" range, that's something we can consider - and if so, we should do it in a consistent way on all platforms. But in any case I think our starting point should be to respect the font's defined metrics.

> To get this right I think will require a lot more extensive testing.  If we're
> going to make a change that affects rendering for a lot of users then I think
> we should try to only do that when we're convinced that the change is correct
> or at least better for a large number of users.

I'll argue that this change is "correct" in the sense that it causes us to respect the font's design more accurately; it is also good because it results in a more consistent layout across platforms, and because in many cases our current rendering places the subscripts much too low.

And the way to find out whether the results are acceptable or objectionable to users is to land the change on trunk, and let it go out in nightly builds.
Comment 69 Jonathan Kew (:jfkthame) 2010-05-31 10:17:17 PDT
Created attachment 448394 [details]
screenshots showing differences in subscript positioning
Comment 70 John Daggett (:jtd) 2010-06-02 01:04:21 PDT
Tested with Mac OS X 10.6.4 (10F564), same same.
Comment 71 John Daggett (:jtd) 2010-06-22 17:49:47 PDT
Confirmed again on 10.6.4 (10F569), the release version.  Still crashes.
Comment 72 Jonathan Kew (:jfkthame) 2010-06-23 04:57:03 PDT
Created attachment 453349 [details] [diff] [review]
patch, v5 - avoid calling ATSFontGetHorizontalMetrics for tt/ot fonts - revised for new trunk code (after harfbuzz landing)

Revised again to handle changes in the Mac font classes due to the harfbuzz landing.

Given that the crash still happens with the latest OS update, and the same (deprecated) Apple function is also at the root of bug 573449, I think we should bite the bullet and take this patch.

This *will* affect text metrics on OS X; the main effect will be on super- and subscript positioning with some fonts, as illustrated in attachment 448394 [details]. In particular, it seems that many Adobe fonts have a very small subscript offset, and so the subscripts are moved down much less than with our existing Mac code.

Despite this, I think we should take the change, as discussed in comment #68. In brief:

(1) We should respect the font designer's intention, as expressed in the font's metrics, with exceptions being made only in cases where the font is clearly wrong. The "shallowness" of subscripts in the Adobe fonts is not wrong, it is a design choice.

(2) While typographic tastes may of course vary, I think our existing subscript positioning on the Mac, based on the metrics we're getting from ATS, is often much too low and does not look good. Note that Safari does NOT give the same result as our existing code.

(3) Although this changes the result from our existing code on the Mac, it will make our rendering match much more closely between Mac and Windows.

If we land this on trunk and get a lot of negative feedback about the subscript positioning, we could easily enforce a "minimum offset" in SanitizeMetrics(). However, I think we should consider that only in response to significant feedback suggesting that users dislike the change - which I think is unlikely.
Comment 73 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-06-23 17:00:08 PDT
Whatever we do, I think we have to stop using this fragile ATS code. So I think Jonathan's patch moves us in the right direction.
Comment 74 John Daggett (:jtd) 2010-06-23 19:06:05 PDT
Created attachment 453620 [details]
testcase, use hacked fonts to get to ATS path and crash

The attached testcase is a modified version of Christoph's testcase from bug 532533.

> +    if (!InitMetricsFromSfntTables(mMetrics) &&
> +        !InitMetricsFromATSMetrics()) {
>          return;
>      }

This still isn't a complete fix to the *security* aspect of this
problem, there are any number of failure conditions in
InitMetricsFromSfntTables that will cause it to fail and the
problematic ATSFontGetHorizontalMetrics will get called.  I just
experimented a little and was able to make a font that would hit the
right codepath to cause a crash.
Comment 75 John Daggett (:jtd) 2010-06-23 19:30:21 PDT
(In reply to comment #73)
> Whatever we do, I think we have to stop using this fragile ATS code. So I think
> Jonathan's patch moves us in the right direction.

Long term, I agree.  But this fix is being sold as a simple bug fix
and it's not, we're basically making our own implementation of the ATS
metrics routine in a way that will have memory implications (font
tables hanging around) and performance implications (are we
faster/slower than the platform routines?).  We still need a way of
dealing with things like Type1 fonts, so we have to fallback to *some*
form of platform routine, unless we want to dive into reading Type1
metrics which I don't think we want to do.

It looks like Webkit code uses only simple CoreGraphics metrics calls.
Not sure what they do about underline/subscript/superscript offsets,
maybe they just arbitrarily calculate those based on other metrics.
Maybe the fallback code could do something like that rather than use
ATS?

Basically, this is a big change with a *lot* of potential for
regressions, it's not a slamdunk-let's-do-it-now fix.
Comment 76 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2010-06-23 20:29:16 PDT
(In reply to comment #75)
> Maybe the fallback code could do something like that rather than use
> ATS?

That sounds like a good idea.

> Basically, this is a big change with a *lot* of potential for
> regressions, it's not a slamdunk-let's-do-it-now fix.

All the more reason to do it without unnecessary delay, so we have time to fix regressions. The worst possible outcome is that we're forced to do this in a hurry because someone figures out how to exploit ATS.
Comment 77 Jonathan Kew (:jfkthame) 2010-06-24 02:19:00 PDT
(In reply to comment #74)
> Created an attachment (id=453620) [details]
> testcase, use hacked fonts to get to ATS path and crash
> 
> The attached testcase is a modified version of Christoph's testcase from bug
> 532533.
> 
> > +    if (!InitMetricsFromSfntTables(mMetrics) &&
> > +        !InitMetricsFromATSMetrics()) {
> >          return;
> >      }
> 
> This still isn't a complete fix to the *security* aspect of this
> problem, there are any number of failure conditions in
> InitMetricsFromSfntTables that will cause it to fail and the
> problematic ATSFontGetHorizontalMetrics will get called.  I just
> experimented a little and was able to make a font that would hit the
> right codepath to cause a crash.

That's a good point; the fallback to platform APIs was really intended to support the non-sfnt cases (old bitmap and type 1 fonts), but by crafting a suitably-broken sfnt you can also hit it.

I think we can modify the behavior in a couple of ways to resolve this: (1) distinguish between sfnt fonts that fail for some reason in InitMetricsFromSfntTables() and fonts that are not sfnt format at all, and only fall back to platform APIs for the latter; and (2) never fall back to the platform API for downloadable fonts.

That should make it MUCH harder to find a path to the danger areas.
Comment 78 Jonathan Kew (:jfkthame) 2010-06-24 03:02:22 PDT
Created attachment 453691 [details] [diff] [review]
patch, v5.1 - never fall back to ATS for sfnt fonts, only keep this for old bitmap/type1 fonts

This modifies the fallback behavior as described above, so it should be impossible to reach the ATS code via downloadable fonts.

The hacked-fonts testcase no longer crashes with this patch; it renders the fpgm.ttf font successfully, and safely rejects the others.
Comment 79 Jonathan Kew (:jfkthame) 2010-06-28 16:23:40 PDT
Created attachment 454677 [details] [diff] [review]
patch, v5.2 - never fall back to ATS for sfnt fonts, only keep this for old bitmap/type1 fonts (un-bitrotted again)

Updated this patch following the flattening of gfx/thebes.
Comment 80 Jonathan Kew (:jfkthame) 2010-06-30 17:40:32 PDT
(In reply to comment #75)
> It looks like Webkit code uses only simple CoreGraphics metrics calls.
> Not sure what they do about underline/subscript/superscript offsets,
> maybe they just arbitrarily calculate those based on other metrics.
> Maybe the fallback code could do something like that rather than use
> ATS?

The callstacks in bug 574368 show that even CoreGraphics metrics calls are vulnerable to bad or malicious fonts; reading the raw tables is our best chance of survival in the face of such examples.
Comment 81 John Daggett (:jtd) 2010-07-02 00:12:28 PDT
Comment on attachment 454677 [details] [diff] [review]
patch, v5.2 - never fall back to ATS for sfnt fonts, only keep this for old bitmap/type1 fonts (un-bitrotted again)

Ok, looks good.  I'm still concerned about the memory and perf impact of this but we need to move away from relying on ATS.  I don't think it makes sense to backport this to 1.9.2 until it's had a good time to bake on trunk.
Comment 82 Jonathan Kew (:jfkthame) 2010-07-11 13:35:36 PDT
Landed on trunk:
http://hg.mozilla.org/mozilla-central/rev/396bf925984c
Comment 83 Jonathan Kew (:jfkthame) 2010-07-20 07:24:25 PDT
Although this was originally filed against the 1.9.2 branch, I'm not sure it really makes sense to try and backport the patch here. Even if we did backport this, so that our metrics initialization avoids the particular ATS function shown here, I doubt 1.9.2 will ever be significantly more robust against corrupt fonts as long as it uses the (deprecated) ATSUI APIs for text layout.
Comment 84 Ali Juma [:ajuma] 2011-11-25 07:26:20 PST
Closing based on Comment 82.

Note You need to log in before you can comment on or make changes to this bug.