Closed Bug 554820 Opened 15 years ago Closed 8 years ago

does not detect other fonts that can display a Unicode code point

Categories

(Core :: Layout: Text and Fonts, defect)

x86
macOS
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: neskiem, Assigned: jfkthame)

References

(Depends on 1 open bug)

Details

Attachments

(3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.8) Gecko/20100202 Firefox/3.5.8
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.8) Gecko/20100202 Firefox/3.5.8

Instead of looking for another font on the Mac OS X system that could display Unicode code point U+0313, Firefox decides to display a little box.

I do have the correct font, and when I open the page in Google Chrome or in Safari, the page loads correctly. This leads me to believe that there is something wrong with Mozilla Firefox's ability to fall back gracefully to another font.



Reproducible: Always

Steps to Reproduce:
1. Go to a page whose CSS file specifies a font that does not contain the required character.
2. See that Firefox doesn't gracefully choose another font to display the Unicode code point.
Actual Results:  
I opened a page that showed little boxes instead of the Combining Comma Above

Expected Results:  
Firefox should look for any font that can display the Combining Comma Above
The problem appears to be specific to Verdana (used by the stylesheet for the site in the URL field).

Another weird thing: viewing the page with Gecko 1.9.2 (FX 3.6) shows the missing glyph box, but with Minefield nightly builds there is no missing glyph box and the offending character is simply not displayed.
Component: General → Layout: Text
Product: Firefox → Core
QA Contact: general → layout.fonts-and-text
Version: unspecified → Trunk
Attached file test case
minimal test case extracted from URL
Can you please verify if this occurs on OS X 10.5.x, or if it is only happening on 10.6?
The problem is not limited to Verdana; other fonts such as Georgia and Impact show similar problems.

This testcase shows the "problem" word from the original site using several fonts. Of these, only Lucida Grande and American Typewriter actually support the character U+0313 COMBINING COMMA ABOVE, as can be verified with the Character Viewer palette or by dumping font files using ttx. With the other fonts, we'd expect fallback to occur.
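To make the expected behavior concrete, here is a minimal sketch of per-character font fallback. It is not Gecko's code: the function name `pick_font` and the coverage sets are hypothetical, and the code points listed for each font are made up for illustration (only the font names and U+0313 come from the testcase).

```python
# Hypothetical coverage data. The font names come from the testcase, but the
# code point sets are invented purely for illustration.
COVERAGE = {
    "Verdana": {0x0041, 0x0061, 0x002C},
    "Lucida Grande": {0x0041, 0x0061, 0x002C, 0x0313},
}


def pick_font(code_point, font_list, coverage=COVERAGE):
    """Return the first font in font_list whose character set has code_point."""
    for name in font_list:
        if code_point in coverage.get(name, set()):
            return name
    # Nothing matched: a real renderer would draw a missing-glyph box here.
    return None


if __name__ == "__main__":
    # U+0313 is absent from the first font, so the second one should be chosen.
    print(pick_font(0x0313, ["Verdana", "Lucida Grande"]))
```

The whole scheme depends on the coverage data being truthful: if a font claims a character it cannot render, the loop stops at that font and fallback never happens.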

However, on 10.6 our fallback path is NOT used. This is because when we check the 'cmap' tables of Verdana, Georgia, and Impact (and other similar fonts), we detect that U+0313 **is** present. So we proceed to use the same font - and fail, because the glyph is not really there.

What appears to be happening is that when we call ATSFontGetTable, the OS is returning NOT the real 'cmap' table from the font file but instead a synthetic table that it has built on the fly. This cmap includes a platform-0, format-4 subtable that was not present in the original file, and in that subtable, many additional characters are included. I believe this is because Cocoa Text tries to simulate these added characters using available glyphs in the font -- e.g., U+0313 can be simulated by using the normal COMMA and repositioning it as a diacritic. However, this simulation path does not take effect via the Core Text APIs that we're using to shape the text (it's probably a Cocoa-specific feature), and so those characters simply fail.

The results of this simulation of U+0313 can be seen by viewing the testcase in Safari: it displays a comma diacritic in all the fonts, although the positioning is quite variable.
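For anyone who wants to compare the table returned by the OS with the one in the font file, the following sketch lists the subtables in a raw 'cmap' table. It follows the header layout in the OpenType specification (a 16-bit version, a 16-bit table count, then 8-byte encoding records). The function name `cmap_subtables` is hypothetical, and the sample bytes are a hand-built stub, not data from any real font.

```python
import struct


def cmap_subtables(cmap):
    """List (platform_id, encoding_id, format) for each 'cmap' encoding record."""
    _version, num_tables = struct.unpack_from(">HH", cmap, 0)
    result = []
    for i in range(num_tables):
        # Each encoding record: platformID (u16), encodingID (u16), offset (u32).
        platform_id, encoding_id, offset = struct.unpack_from(">HHI", cmap, 4 + 8 * i)
        # Every subtable starts with a 16-bit format number.
        (fmt,) = struct.unpack_from(">H", cmap, offset)
        result.append((platform_id, encoding_id, fmt))
    return result


if __name__ == "__main__":
    # Hand-built stub: two encoding records, each pointing at a 2-byte
    # "subtable" that holds only the format number 4.
    stub = (
        struct.pack(">HH", 0, 2)
        + struct.pack(">HHI", 0, 3, 20)
        + struct.pack(">HHI", 3, 1, 22)
        + struct.pack(">H", 4)
        + struct.pack(">H", 4)
    )
    print(cmap_subtables(stub))
```

Running this over both the on-disk table and the table handed back by the OS would show whether an extra platform-0 record has been added.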
Assignee: nobody → jfkthame
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
This works around the problem for these fonts by preferring the later (MS-platform) cmap subtable when both platform 0 and platform 3 versions are present. With this in place, the comma diacritic triggers font fallback as expected when it is not directly supported in the font.
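The selection rule can be sketched roughly as follows. This is an illustration of the idea only, not the attached patch: the names `choose_subtable`, `PLATFORM_UNICODE`, and `PLATFORM_MICROSOFT` are hypothetical, and the sketch assumes only format-4 subtables on platforms 0 and 3 are considered usable.

```python
PLATFORM_UNICODE = 0    # Apple/Unicode platform ID in the 'cmap' table
PLATFORM_MICROSOFT = 3  # Microsoft platform ID in the 'cmap' table


def choose_subtable(records, preferred=PLATFORM_MICROSOFT):
    """Pick a subtable from records, an iterable of (platform_id, format).

    Only format-4 subtables on platform 0 or 3 are considered. If a usable
    subtable exists on the preferred platform it wins; otherwise the first
    usable one is returned, or None if there is none.
    """
    usable = [
        r for r in records
        if r[1] == 4 and r[0] in (PLATFORM_UNICODE, PLATFORM_MICROSOFT)
    ]
    for record in usable:
        if record[0] == preferred:
            return record
    return usable[0] if usable else None


if __name__ == "__main__":
    # Both platforms present: the MS-platform subtable is chosen.
    print(choose_subtable([(0, 4), (1, 0), (3, 4)]))
    # Only the Unicode platform present: it is still used.
    print(choose_subtable([(0, 4)]))
```

Fonts that carry only a platform-0 subtable are unaffected by the preference, which is why the change matters only when both are present.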
Attachment #434838 - Flags: review?(jdaggett)
Cleaned up the patch a bit (eliminate third arg to acceptableFormat4; fixed excessive line lengths).
Attachment #434838 - Attachment is obsolete: true
Attachment #434843 - Flags: review?(jdaggett)
Attachment #434838 - Flags: review?(jdaggett)
Yes, I am using Snow Leopard. I don't know whether it also occurs on Leopard.
If you'd like to test this patch, and confirm that it resolves the problem, there are tryserver builds available at

http://build.mozilla.org/tryserver-builds/jkew@mozilla.com-try-34537df1b222 (Firefox 3.6 branch)

http://build.mozilla.org/tryserver-builds/jkew@mozilla.com-try-622567c255b1 (trunk)
Looks good on the 3.6 branch; I haven't tried trunk yet.
So is ATS synthesizing the entire table or just adding selected codepoints related to diacritics?  My only concern here is that by using the MS platform cmap we may be out of sync with the actual table CoreText is using, i.e. we'd be mapping to the ms-platform table and CoreText would be using the synthetic table which might lead to strange regressions in other cases.
As far as I can tell from looking at the data that we get back from ATSFontGetTable, they're prepending a new platform-0 subtable, and leaving the existing tables (platforms 1 and 3) untouched. Obviously, we don't know the precise algorithm they're using, but the only thing that makes any sense is to add codepoints for characters they believe they can simulate.

The issue here is that Core Text is not fully implementing the added codepoints for us; they're appearing in the synthesized cmap, but they result in .notdef glyphs.

On looking again at what's happening in Apple's applications, I don't think they are really "simulating" the diacritics in the way I originally thought, by repositioning other glyphs; I think they're adding the combining marks to the cmap in the expectation that canonical composition will then be used to map to precomposed Unicode characters. Then, if that fails, they do font fallback on the diacritics that are left in the text stream. So the platform 0 table in the cmap is claiming "support" for accents that are only really "supported" via the NFC composition route; they're not present as separate characters in the font. And that's why we have to avoid this table as a reference for font matching.

The other way to solve this - which seems much harder to me - would be to trust the cmap from ATS, and let Core Text do its shaping and layout; then look for any characters that mapped to the .notdef glyph (i.e., Core Text failed to "magically" support them via canonical composition), perform font fallback at that stage, and re-process them.
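A toy version of that two-pass approach might look like the sketch below. Everything here is hypothetical: the "shaper" is a plain dictionary lookup standing in for Core Text, and the font names and glyph IDs are invented. The only fact relied on is that glyph ID 0 is .notdef in TrueType/OpenType fonts.

```python
NOTDEF = 0  # glyph ID 0 is .notdef in TrueType/OpenType fonts


def shape(text, glyph_map):
    """Toy shaper: map each character to a glyph ID, or .notdef if unmapped."""
    return [glyph_map.get(ch, NOTDEF) for ch in text]


def shape_with_fallback(text, fonts):
    """Shape with the primary font, then re-process any .notdef results.

    fonts is a list of (name, glyph_map) pairs, primary font first.
    Returns a list of (font_name, glyph_id), one entry per character.
    """
    primary_name, primary_map = fonts[0]
    out = [(primary_name, gid) for gid in shape(text, primary_map)]
    for i, (_, gid) in enumerate(out):
        if gid != NOTDEF:
            continue
        # Second pass: try each fallback font for the failed character.
        for name, glyph_map in fonts[1:]:
            fallback_gid = glyph_map.get(text[i], NOTDEF)
            if fallback_gid != NOTDEF:
                out[i] = (name, fallback_gid)
                break
    return out


if __name__ == "__main__":
    fonts = [("Primary", {"k": 5}), ("Fallback", {"\u0313": 9})]
    print(shape_with_fallback("k\u0313", fonts))
```

A real implementation would be harder than this suggests, since shaping is not one glyph per character and re-processing a run can change the layout of its neighbors.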

There is indeed a potential regression here, which is that we could end up doing font fallback for a diacritic (because it's not present in the font) in cases where Core Text would have successfully mapped the base+diacritic to a precomposed form that IS present. I think that's considerably less bad than "losing" diacritics altogether, especially as text on the Web will normally use NFC for characters that exist in precomposed form in Unicode. To do better, we need to get font matching to work on the grapheme cluster level, and when a diacritic is not supported in the main font, check whether it can be supported via canonical composition.
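The canonical-composition check could be sketched with Python's standard `unicodedata` module, as below. The function names and the `supported` set (standing in for the characters a font really maps) are hypothetical; this is an illustration of the idea, not a description of what Gecko or Core Text does.

```python
import unicodedata


def composes(base, mark):
    """True if base + mark collapses to one precomposed character under NFC."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1


def needs_fallback(base, mark, supported):
    """Decide whether the mark needs a fallback font.

    supported is the set of characters the font really maps. The mark is
    fine if the font has it directly, or if base + mark composes to a
    single character that the font has.
    """
    if mark in supported:
        return False
    composed = unicodedata.normalize("NFC", base + mark)
    return not (len(composed) == 1 and composed in supported)


if __name__ == "__main__":
    # Greek alpha + U+0313 has a precomposed form (U+1F00); "k" + U+0313 does not.
    print(composes("\u03b1", "\u0313"), composes("k", "\u0313"))
    print(needs_fallback("k", "\u0313", {"k"}))
```

With a check like this, a font that lacks U+0313 but contains the precomposed character would not trigger fallback, which avoids the regression described above.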
Patch seems fine but did you test this both on 10.5 and on 10.6 since this code affects both?  What fonts are affected by this, which ones have a significantly different cmap when the MS-format cmap is preferred over the Unicode-format cmap?
(In reply to comment #12)
> Patch seems fine but did you test this both on 10.5 and on 10.6 since this code
> affects both?  

Yes, appears to be harmless on 10.5. This is expected, because in practice, when both platform 0 and platform 3 subtables are present, they should have identical content. And they do.... until 10.6 comes along and starts synthesizing an "enhanced" platform 0 subtable, leading to the problem reported here.

> What fonts are affected by this, which ones have a significantly
> different cmap when the MS-format cmap is preferred over the Unicode-format
> cmap?

I ran a test on the 545 .ttf and .otf fonts that I happen to have on this machine, which include both Apple fonts and a variety of 3rd-party fonts from various sources. Results:

  67 fonts have only the Unicode subtable (platform 0)
  167 fonts have only the MS subtable (platform 3)
  310 fonts have both Unicode and MS subtables, with identical content

And lastly, exactly ONE font has Unicode and MS subtables that differ. This is an old Fontographer-generated .ttf (Yataghan) that I must have picked up from fontsquirrel. And the ONLY difference between the two subtables is that the MS one includes a mapping for U+20AC, the Euro character. This is characteristic of a very old TrueType font originally created for Windows 3.1, and subsequently patched when (in the Win95 timeframe, IIRC) the default Windows codepage was changed to allow the addition of Euro currency support. At that time, there were tools in circulation that would patch old fonts, but they typically touched only the MS subtable. There are no doubt other similar fonts around, but in those cases preferring the MS subtable is if anything an improvement.
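The comparison behind those counts can be expressed as a small classifier. This is a sketch of the idea rather than the script actually used for the survey: the function names are hypothetical, and each subtable is assumed to have been reduced to a set of code points already.

```python
def classify(unicode_map, ms_map):
    """Classify a font by its platform-0 and platform-3 cmap coverage.

    Each argument is a set of code points, or None if that subtable is absent.
    """
    if unicode_map is None and ms_map is None:
        return "neither"
    if ms_map is None:
        return "unicode-only"
    if unicode_map is None:
        return "ms-only"
    if unicode_map == ms_map:
        return "identical"
    return "differ"


def extra_in_ms(unicode_map, ms_map):
    """Code points mapped by the MS subtable but not by the Unicode one."""
    return sorted(ms_map - unicode_map)


if __name__ == "__main__":
    # Invented coverage sets mimicking a font whose MS subtable gained the Euro.
    uni = {0x0041, 0x0042}
    ms = {0x0041, 0x0042, 0x20AC}
    print(classify(uni, ms), [hex(cp) for cp in extra_in_ms(uni, ms)])
```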
Comment on attachment 434843 [details] [diff] [review]
prefer the MS-platform cmap subtable to avoid synthetic table on Snow Leopard

Sounds good then.
Attachment #434843 - Flags: review?(jdaggett) → review+
checked-in: http://hg.mozilla.org/mozilla-central/rev/c475fc8bac21
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Reopening, as this led to a serious regression with Hebrew on 10.5, bug 565766.

http://hg.mozilla.org/mozilla-central/rev/8f3b7dc9d368 (backout)
http://hg.mozilla.org/mozilla-central/rev/14b90e3736d5 (merge)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 565766
So it turns out that on OS X 10.5, ATS and Core Text are doing a different kind of cmap-related trickery and this patch caused a regression for Hebrew and Arabic, depending on the font families specified in CSS.

The problem is that some fonts such as Arial include Arabic and Hebrew characters, but OS X believes that it can't actually render those scripts properly because the fonts don't have the requisite AAT tables (they have OpenType layout tables, having been designed for Windows use).

So it seems that when we call ATSFontGetTable to get the cmap from such a font, 10.5 does not return the true cmap from the font but rather a synthetic version where the Unicode subtable (platform 0) has had any Arabic and Hebrew ranges omitted. Therefore, if the font-family list calls for Arial but the text is Hebrew (for example), we'd see that the characters aren't supported, and fall back to the next font in the list (or a default).

But when we switched to prefer the MS subtable, we suddenly see the "true" character repertoire of the font, and try to use it. That would be ok... except that when we call Core Text to do the text shaping, it uses the munged Unicode cmap, concludes the characters aren't supported, and does an internal fallback to some other font; therefore, it gives us glyph IDs that don't match the font we believed we were using.

So it seems that using the "real" MS cmap subtable rather than OS X's "fake" Unicode one is more dangerous than I thought, at least on 10.5 (although on 10.6 this issue doesn't arise, as in fact Core Text *CAN* render the Arabic and Hebrew from Arial, etc., and the fallback and glyph garbling doesn't happen).

The simplest solution might be to make the cmap subtable preference depend on the OS version, so that we'd favor the Apple Unicode subtable on 10.5 and the MS one on 10.6 (to avoid the original issue in this bug, where 10.6 is adding fake codepoints for diacritics to the cmap).
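As a rough sketch of that version-dependent preference (the function name and the tuple representation of the OS version are hypothetical, and nothing like this is claimed to have landed):

```python
PLATFORM_UNICODE = 0    # Apple/Unicode cmap platform ID
PLATFORM_MICROSOFT = 3  # Microsoft cmap platform ID


def preferred_cmap_platform(os_version):
    """Return the cmap platform ID to prefer for a given OS X version.

    os_version is a (major, minor) tuple such as (10, 5). On 10.5 the
    OS-provided Unicode subtable matches what Core Text will shape with;
    on 10.6 and later the MS subtable avoids the synthetic diacritics.
    """
    if os_version >= (10, 6):
        return PLATFORM_MICROSOFT
    return PLATFORM_UNICODE


if __name__ == "__main__":
    print(preferred_cmap_platform((10, 5)), preferred_cmap_platform((10, 6)))
```

How versions later than 10.6 should behave is an assumption in this sketch; they are simply treated like 10.6.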
I believe this may have been resolved as a side-effect of bug 663688. Can anyone still reproduce the problem in a current version, or should we close this bug?
The code that used to be involved here is no longer in the tree, and AFAIK there haven't been reports of a comparable problem with the current code.

Closing as WFM -- as we don't have a RESOLVED:NO_LONGER_RELEVANT status. :)
Status: REOPENED → RESOLVED
Closed: 14 years ago → 8 years ago
Resolution: --- → WORKSFORME