Closed Bug 399636 Opened 17 years ago Closed 16 years ago

Font selection broken when symbol fonts are involved

Categories

(Core :: Graphics, defect, P2)

x86
Windows XP
defect

Tracking

VERIFIED FIXED

People

(Reporter: ian, Assigned: pavlov)

Details

(Keywords: regression, testcase)

Attachments

(3 files, 1 obsolete file)

STEPS TO REPRODUCE
   Use a symbol or wingding font to display characters from that font, mixed 
   with other characters.

ACTUAL RESULTS
   Characters from the font are not rendered. Characters from outside the font 
   are misrendered using glyphs from that font.

TESTCASE:
   http://www.hixie.ch/tests/adhoc/css/fonts/family/006.html

This is a serious regression. It works fine in Firefox 2 and Opera 9.x.
Sounds like stuff Stuart and Karl were talking about
Component: Layout: Fonts and Text → GFX: Thebes
QA Contact: layout.fonts-and-text → thebes
Depends on: 399391
I think it is hoping a bit much to expect reasonable results from Monotype's Symbol or Wingdings fonts because of problems with the charmaps
(see bug 397288 comment 1).
Unless we expect to maintain correction tables for all fonts with corrupted charmaps...
We should support the two fonts that ship with Windows, at least. We used to.
As a workaround, we could also simply ignore fonts for which we don't have the character mappings. Symbol and Wingdings, as I understand it, both claim to not be Unicode fonts. We could just skip them (except for ye old <font face> hack for legacy pages) -- that would make the test case pass again as well.
This isn't a bug.  Things work as expected.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → DUPLICATE
It doesn't make sense to mark a bug as a duplicate of its opposite.
it doesn't make sense to have two bugs that conflict with each other.  a decision should be made in one bug
This is a bug, whether you accept it as such or not, pav. Things are not working as expected, they've regressed.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Keywords: testcase
things are working as expected and this is not a bug.
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → INVALID
You are in denial. This violates the specs, it encourages platform-specific authoring in a way that we had previously decided was undesirable, and it prevents pages that correctly use Unicode and happen to mention font names which on the user's platform map to symbol fonts from working.

Why can't you fix this?
Making statements like "happen to mention font names which on the user's platform map to symbol fonts from working" seems to imply a disconnect from reality.

There really aren't many symbol fonts out there.  On my Windows machine with over 3000 fonts installed I have these fonts that are symbol fonts:
Symbol, Webdings, Wingdings, Wingdings 2, Wingdings 3
Bookshelf Symbol 7, MS Outlook, MT Extra, Marlett, Mathematica5Mono, MS Reference Specialty, VisualUI, Sabaean

Now, really the only ones people are realistically going to use are Symbol, Webdings, Wingdings, Wingdings 2, and Wingdings 3.  The characters in Symbol are fully covered by Unicode, and many (but not all) code points in the other 4 are covered as well, which gives a perfectly legitimate reason to use them.

Now, does anyone want these to become widespread on the web?  Certainly not.  But there are reasons to use them.  Providing inconsistent support depending on if you're using CSS or if you're using deprecated HTML tags doesn't make sense.

Now.  You claim that this "violates the specs" but I can't find anything that specifies what to do with non-Unicode fonts.  The only relevant piece of data I can find says that &#xxx entities are UCS codepoints.  Given that you've asked for a non-Unicode font and that there is no way to override the document charset on a per-element basis, I believe that you have to interpret the code points being asked to render with a non-Unicode font as non-Unicode code points.  If you decide that they are no longer Unicode code points then you can no longer do font substitution for missing code points in the font because the characters aren't Unicode.  Rendering some characters as non-UCS characters and then some as UCS characters doesn't make sense.

If you are using non-Unicode fonts then you should know what you are getting yourself into.
If your real concern is around authors using Windows-only fonts you could, continuing with the current behavior (described above), map the code points that do have a Unicode equivalent to Unicode code points and render them with another font.  Therefore on Mac or Linux you could map 0x61 to GREEK SMALL LETTER ALPHA when asking for Symbol.  Similar mappings could be done for the other "common" fonts.
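The mapping idea in the comment above could be sketched roughly like this. It is only an illustration: the table name SYMBOL_TO_UNICODE, the helper remap_symbol_text, and the handful of entries are assumptions made for the example, not anything taken from Gecko. A real table would have to cover the whole Symbol encoding.

```python
import unicodedata

# Hypothetical, deliberately partial table: Symbol-font character codes
# mapped to the Unicode characters their glyphs actually depict.
SYMBOL_TO_UNICODE = {
    0x61: "\u03b1",  # 'a' slot -> GREEK SMALL LETTER ALPHA
    0x62: "\u03b2",  # 'b' slot -> GREEK SMALL LETTER BETA
    0x64: "\u03b4",  # 'd' slot -> GREEK SMALL LETTER DELTA
    0x67: "\u03b3",  # 'g' slot -> GREEK SMALL LETTER GAMMA
}


def remap_symbol_text(text: str) -> str:
    """Replace characters that have a known Symbol equivalent.

    Characters without an entry in the table are left unchanged, so they
    can still go through ordinary font fallback.
    """
    return "".join(SYMBOL_TO_UNICODE.get(ord(ch), ch) for ch in text)


remapped = remap_symbol_text("a")
print(remapped, unicodedata.name(remapped))
```

Text remapped this way can be drawn with any font that has Greek glyphs, so the result would not depend on Symbol being installed.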
(In reply to comment #11)
> Now.  You claim that this "violates the specs" but I can't find anything that
> specifies what to do with non-Unicode fonts.

Sorry, the whole "Unicode fonts" issue is an implementation detail.  The API for how we get glyphs for characters out of fonts, and how those fonts represent the data internally, has nothing to do with how characters are used on the Web or with Web standards.  It's the job of the browser to map characters to glyphs that represent that character, not some other random character that happens to have the same code point in a different encoding in a font that happens to be on the system.

If you can't map the glyphs a font provides to characters in Unicode, then that font isn't relevant to the Web.
> But there are reasons to use them. Providing inconsistent support depending on
> if you're using CSS or if you're using deprecated HTML tags doesn't make sense

We want to stop these fonts from being used on the Web. We can't stop it altogether because of legacy use of <font> elements. That's one reason it makes sense to "support" them (that is, to map Unicode characters to unrelated codepoints in a platform-specific fashion) in the <font>-limited fashion.

The other reason is that CSS is split from the text to which it corresponds, whereas here the binding of font to character is very intentional and thus belongs in the markup layer, if at all.

It's a violation of the specs because the specs say that when a document contains a Unicode character X, the glyph rendered must be X. There are in fact multiple violations here, for example the lack of fallback after you find a symbol font, and the lack of mapping of symbol font characters (or, alternatively, ignorance of those fonts altogether).

But most importantly, this is a regression of something that used to work.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → WONTFIX
So does Pav get the last word on this? Or is there someone who actually cares about product quality at the wheel at all?
With the current behavior, changing the style sheet can change the charset
used to interpret the (text in the element of the) document.  Such behavior would mean that CSS is no longer "separating the presentation style of
documents from the content of documents".

Maintaining the ability to separate style and content seems more important
than requiring any consistency between the font-family property and a font
modifier element quirk.

(And if web authors really need/want to use glyphs from Wingdings that do not
 have Unicode code points assigned then they should be using the Unicode PUA
 mappings from the microsoft/symbol charmap not the mappings from the mac/roman
 charmap.)
I'm sorry, but I think David's comment 13 is key.  The Web uses Unicode.  The fact that we internally have to deal with fonts that don't actually use Unicode as their character set is our problem, and we shouldn't just be punting on it.

The test page in question has a "latin lowercase a" character, and we end up producing a glyph for a quite different character.  As I understand it, we even know we're doing that, because this is a "non-Unicode" font (but correct me if I'm wrong).

We made a decision here years ago that when a page has a "latin lowercase a" we should render a glyph for that character.  If the font the webpage specifies has no such glyph, we should do fallback.

This turned out to cause a number of issues on existing sites, so we introduced a quirk: <font face=""> would do some weirdness with using the character number as an index into a non-unicode character table.  We now propagate enough information to gfx that it knows where font-family information is coming from.

This bug is about the fact that cairo gfx has expanded the quirk to all font-families, not just the <font face=""> ones.  As Ian says, this is a regression.  And as David says, this is bad for the Web.
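As a rough sketch of the policy described here (Unicode lookup first, the legacy quirk only for <font face>, fallback otherwise), something like the following captures the decision. The function name, its parameters, and the modelling of the quirk as a 0xF000 offset into the font's symbol cmap are all assumptions made for illustration. This is not Gecko's implementation.

```python
def choose_rendering(ch, requested_cmap, is_symbol_font, from_font_face_attr):
    """Decide how to draw one character (hypothetical policy sketch).

    requested_cmap: set of character codes the requested font maps to glyphs.
    Returns ("requested", code) to draw from the requested font, or
    ("fallback", code) to draw the Unicode character from another font.
    """
    cp = ord(ch)
    # Normal case: the font really has a glyph for this Unicode character.
    if cp in requested_cmap:
        return ("requested", cp)
    # Legacy quirk, limited to <font face="...">: treat the character number
    # as an index into the symbol font (assumed here to live at 0xF000 + code).
    if (is_symbol_font and from_font_face_attr and cp <= 0xFF
            and (0xF000 + cp) in requested_cmap):
        return ("requested", 0xF000 + cp)
    # Everything else, including CSS font-family, falls back to another font.
    return ("fallback", cp)


# Assumed stand-in for a Wingdings-like symbol cmap.
wingdings_like = set(range(0xF021, 0xF100))
print(choose_rendering("a", wingdings_like, True, False))
print(choose_rendering("a", wingdings_like, True, True))
```

Under this sketch the regression corresponds to taking the quirk branch even when from_font_face_attr is false.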

Maybe we need to revisit the earlier decision.  If so, we should do it on its merits in terms of what the right behavior for the Web is, not based on "we happened to implement it this way, and we don't want to pollute our clean implementation with real-life considerations".
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
fwiw, I tried to reopen the bug with the earlier decision so that we could have a discussion on the issue and was promptly shot down.  This bug shouldn't be open (as I've tried to close it several times) and we should be having this discussion in 33127.  I don't care what the resolution you want to use for this bug is, but this bug shouldn't exist.
The right forum for the discussion is probably newsgroups, not the old bug, for what it's worth.  At the very least, that bug has a lot of irrelevant noise, and is in the wrong component.

I don't see why you think this bug should not be open.  We have a decided-on behavior that was implemented.  We've never decided that the behavior should change.  The behavior changed.  That's what we would call a regression.  We need a bug to track this regression.  This bug should, in my opinion, block 1.9.  This is that bug.  I'm not sure why you're suggesting reopening a bug that asked us to change the behavior in question to track the fact that it has in fact changed (when it shouldn't have).

I appreciate the fact that we have a lot of stuff on our plates for 1.9, but trying to make it seem like we have less by resolving bugs when the issue is alive and "well" seems really bizarre to me.
Summary: Font selection has broken → Font selection for Symbol fonts broken
Summary: Font selection for Symbol fonts broken → Font selection broken when symbol fonts are involved
Flags: blocking1.9? → blocking1.9+
Priority: -- → P3
Priority: P3 → P2
Flags: tracking1.9+ → blocking1.9+
Assignee: nobody → pavlov
Status: REOPENED → NEW
Attachment #311059 - Flags: review?(vladimir)
Attachment #311059 - Flags: approval1.9b5?
Comment on attachment 311059
a change in behavior

a=beltzner
Attachment #311059 - Flags: approval1.9b5? → approval1.9b5+
Status: NEW → RESOLVED
Closed: 17 years ago → 16 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
A mechanism for correctly rendering non-Unicode Symbol-font glyphs is necessary.  With Wingdings,  FF 3ß is seeing no UC character sets (code-page support), so is therefore rendering with the default font.

How does one specify (pick) a glyph in a particular such non-UC symbol font to display? E.g., the code

<code>Chapter One<span style="font-family: Wingdings;">&#0240;</span></code>

explicitly requests the user agent render Wingdings’ (which is by default installed in almost all Windows systems) hollow, right-pointing arrowhead after the word One, by specifying the font-family:, then the character’s codepoint within (offset, dec. 240, hex 0xF0), but it’s still nevertheless rendered as eth (ð) in FF 3ß, regardless of which encoding is specified.

Of course, Wingdings, like Symbol, Webdings, and many other TT fonts installed in many O/S workspaces, isn’t a Unicode TTF font, but a plain, raw, old-style, pre-Unicode TTF, and such’ll be out there for the foreseeable future.  These fonts don't have corrupted font tables, just use the old, non-Unicode layout.  Wingdings' character 240 (0xF0 hex) is the hollow right-pointing arrowhead, and explicitly specifying the font face in the stylespec and the character’s offset within it should render that way, only rendering eth (ð) in the default font if the user agent can’t find the Wingdings (symbol) font installed in its O/S workspace to render with (e.g., on a Mac), right?

If a font coded for by a stylespec using font-family: has no Unicode support, and code (HTML|CSS2)specifies the &#glyphnum; syntax to explicitly pick a glyph from it, FF IMHO should revert to brute-force, inelegant rendering, just picking the raw character at that offset in the font to render (or display a hollow box if it’s not found), disregarding symbol-font encoding and code-paging entirely until that font is relinquished — which I’d think should be fairly simple code invoked when font-family: is detected in a style, like this pseudocode

IF(code-page support found in font-family font requested)
    {Unicode font rendering}
ELSE
    {raw font rendering}
ENDIF

Using deprecated <font face="Wingdings"> syntax is a poor workaround.  MANY authors are & will still be wanting to use these kinds of non-Unicode symbol-font glyphs (e.g., the ever-popular Webdings).  BTW, Opera 9.26 displays the same behavior, but the above code does work without error (surprisingly…) in MSIE 7 & 8 beta.

See instructions & usage kludge for FF 2 at http://www.bachware.com/books/computer%20utilization/Preface.htm#symbol_error
and for testing FF 3 at
http://www.bachware.com/books/computer%20utilization/toc.htm, the table cell to the right of the cell containing the "Chapters:" legend.
In current versions of Windows, symbol fonts like Wingdings are mapped to a portion of the Unicode Private Use Area or PUA (U+E000 - U+F8FF), where fonts can put whatever they please. The correct way to reference characters from these fonts is to add 0xF000 to the character code shown in Character Map, which produces a codepoint in the PUA range. For example, the hollow right-pointing arrowhead 0xF0 becomes 0xF0F0, which can be used in HTML as &#xf0f0; or &#61680;.

Some more details at
http://blogs.msdn.com/michkap/archive/2005/11/08/490495.aspx
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=HextoDecConversion#5d5a68dd
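The arithmetic in the comment above is easy to check. The sketch below assumes only what that comment states (add 0xF000 to the code shown in Character Map). The constant and function names are made up for the example, and the 0x00-0xFF range check is an assumption about what counts as a valid single-byte code.

```python
PUA_SYMBOL_BASE = 0xF000  # offset described in the comment above


def symbol_code_to_pua(code: int) -> int:
    """Map a single-byte symbol-font character code to its PUA code point."""
    if not 0x00 <= code <= 0xFF:
        raise ValueError(f"expected a single-byte character code, got {code:#x}")
    return PUA_SYMBOL_BASE + code


def numeric_references(code: int) -> tuple[str, str]:
    """Return the hex and decimal HTML numeric character references."""
    cp = symbol_code_to_pua(code)
    return f"&#x{cp:04x};", f"&#{cp};"


# Hollow right-pointing arrowhead: character code 0xF0 in Character Map.
print(numeric_references(0xF0))  # ('&#xf0f0;', '&#61680;')
```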
Helge: fwiw, i agree with you completely.
Justin, in testing the F0-prepend to the &#glyphchar; you mention in comment #24:  This works in FF 3beta, but breaks display of such glyphs in Firefox 2.0.0.12 (glyphs show as question marks), MSIE 7 & 8beta (where it worked as described above, but now shows up as a string of garbage), and also messes up display in Opera 9.26 (wrong characters).

Obviously, then, if what you and the sources in your links are saying about Windows symbol fonts' glyph set being re-mapped to the Unicode private use area in the F0 block is correct, FF 2.0.0.12, MSIE's, and Opera's handling of the &#xf0offset; is wrong, which is difficult to imagine.

If I remember correctly, both Wingdings and Symbol originated in Windows 3.x, when there was no Unicode spec or info in .TTFs, and the F0 addressing kludge is somehow accessing the right character offsets for Firefox 3beta only.  I still maintain that FFox's code should look at a TTF's prefix data to see if it has Unicode support, and if not, revert to a raw (non-Unicode, legacy) method of rendering its glyphs, if possible.  That permits code also to work with MSIE without obnoxious workarounds.
> which is difficult to imagine.

Not really difficult, no.
(In reply to comment #23)
> E.g., the code
> 
> <code>Chapter One<span style="font-family: Wingdings;">&#0240;</span></code>
> 
> explicitly requests the user agent render Wingdings’
> (which is by default installed in almost all Windows systems)

But not other systems.

> hollow, right-pointing arrowhead after
> the word One, by specifying the font-family:, then the character’s codepoint
> within (offset, dec. 240, hex 0xF0),

This example is explicitly requesting the user agent render Unicode
character U+00F0 LATIN SMALL LETTER ETH with Wingdings.

> but it’s still nevertheless rendered as
> eth (ð) in FF 3ß,

Wingdings does not support that character and so it is rendered with a
fallback font.

> regardless of which encoding is specify.

> ... and code (HTML|CSS2)specifies the &#glyphnum; syntax to explicitly pick
> a glyph from it,

There is no &#glyphnum; syntax in HTML.  This syntax is for Unicode character
code points.

Numeric character references are code positions in the
Universal Character Set regardless of the encoding.

http://www.w3.org/TR/html4/charset.html#h-5.3.1
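To illustrate the point that numeric character references name Unicode code points, here is a small check using Python's standard html and unicodedata modules. The helper name describe_reference is made up for the example. It shows what a conforming parser hands to the renderer, independent of any font.

```python
import html
import unicodedata


def describe_reference(ref: str) -> str:
    """Resolve one HTML character reference and describe the resulting character."""
    ch = html.unescape(ref)
    # Private Use Area characters have no Unicode name, so supply a default.
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name} (category {unicodedata.category(ch)})"


print(describe_reference("&#0240;"))   # U+00F0 LATIN SMALL LETTER ETH (category Ll)
print(describe_reference("&#xf0f0;"))  # U+F0F0 <unnamed> (category Co)
```

Category Co is the Unicode "private use" category, which is why the second reference has no standard meaning outside a particular font.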

If there were such a glyphnum syntax, the glyph number for the hollow,
right-pointing arrowhead in Wingdings would be 214 (decimal), but that may
depend on the version of the font.  240 (0xF0) is a character code in one of
the font's cmaps.  Any font-specific character code syntax would need to also
specify which cmap to use for interpreting the character code.  Wingdings has
two different cmaps.  Other fonts have more.

(In reply to comment #26)

> I still maintain that FFox's code should look at a TTF's prefix data to see
> if it has Unicode support ...

The microsoft/symbol cmap in these fonts is really a Unicode cmap, using code
points from the Private Use Area. 

I assume the 0xF0xx character codes work in FF3beta because Uniscribe is using
the microsoft/symbol cmap in the font.  Perhaps this doesn't work in other
software because the other cmap is used, or perhaps because other software or
platform libraries are performing some transformation of the code as hinted at
by "Under Windows, only the first 224 characters of non-standard fonts will be
accessible" here:

http://www.microsoft.com/typography/otspec/recom.htm#sym

The only way to get a consistent character across many platforms and
applications is to use a Unicode character.
Is U+21E8 RIGHTWARDS WHITE ARROW suitable for the purpose here?

If it is absolutely necessary to use a glyph from a font of the Windows 3.x
era, then assumptions need to be made about which cmap is read by applications
and how that cmap is interpreted (as well as which fonts are on the system).  These assumptions of course won't be valid across platforms and applications.
Karl:  We have all the cmap data and know exactly what glyphs are there for Wingdings, etc.  The patch I checked in for this bug explicitly breaks the behavior that both Helge and I (and pretty much all other web authors) want.
Agreed: The only way to get a consistent character for a Unicode font across platforms is by using Unicode mechanisms (that's what they're for!); but for the non-Unicode, platform-specific fonts being discussed here, Unicode code-page selection isn't an issue at all.

According to High Logic's Font Creator program, Wingdings.ttf has only Macintosh Roman and Microsoft Symbol internal mappings.  There's NO Unicode support in it at all. It's a completely old-style, vanilla .ttf.  Its glyph with the Postscript name "bright" (hollow right-pointing arrow) has the Macintosh Roman mapping $00F0, and the Microsoft Symbol mapping $F0F0.  Whatever Unicode processing (remapping) that's happening regarding F0 with it and legacy fonts like it, is synthetic and external (see below regarding eth and the Windows-key glyph).

And, according to XP's Character Map applet, Wingdings, Webdings, Symbol, Commercial PI, etc. are non-Unicode fonts: It greys out Unicode in its Character Set field and disables all Advanced view if that check box is turned on (again, I assume M$ has coded it correctly), so I infer from this that such fonts have no Unicode support, as corroborated by Font Creator, above.

Though many of these fonts had their beginnings in 3.x, and've been updated since Win9x (my XP-origin Wingdings.ttf is from 1995), Character Map and Font Creator both imply they still have no as-such (internal) Unicode support. I suspect that what’s actually happening is that, if some Unicode handler (e.g., the Uniscribe Karl mentions) receives data read from the font, it detects it in the 0xF0 region, as previous comments describe (e.g., Justin's, Comment #24).

Such pre-Unicode symbol fonts' cmaps are apparently being externally synthesized to be "as-if" Unicode maps in the F0 PUA. But then why does the explicit &#F0--; syntax fail in MSIE 7 & 8 while &#00--; works? There must be a bug in MSIE, which is of course not unheard of <grin>. If I use the former syntax, I get garbage output, as per Comment #26, but the latter works, perhaps because MSIE sees the symbol-font Panose flag or Symbolic/Pictorial family declarations in a non-Unicode font, and so forces raw reading from the F0 region at the font's self-declared start offset.  The link Karl cited at M$' web site does indeed seem to apply to this situation:  These Wingdings cdata start at 0xF021, and run to 0xF0FF.

If a font-family: stylespec specifies a non-Unicode (pre-Unicode) font face such as Wingdings that is present on a very prevalent platform, Firefox should suspend attempting to use Unicode character-encoding processing (mapping), and revert to a raw mode that allows Web authors' HTML to explicitly pick the desired character from a font by code offset into it, with Firefox detecting the font's start offset (into the F0 area on the Windows platform, forcing the offset if not specified by HTML), rendering within the .ttf's character-count range using its glyphs, and rendering any other points as hollow boxes.

My point is this: because of the prevalence of these fonts on Windows platforms on the Net, and their continued use, Firefox needs a raw mechanism that triggers when it detects no Unicode support in the font (detects pictorial/symbol type?), for selecting and rendering specific characters from such fonts by stylespec and &# pick, so that the &#----; syntax used to explicitly select & display a glyph from such a font by number (offset) works as it does in MSIE, perhaps by checking the font's start point in the F0 region and its character count, then picking and rendering accordingly.

Consider this: True, Wingdings has no as-such-named eth (ð) character, but Font Creator says Wingdings' hollow-arrowpoint character has the "segment mapping to delta values" of hex F0F0 — and (the second) F0 offset's the same as eth's in Unicode space (U+00F0); so, even if one puts in &eth; instead of &#F0; with the font-family: stylespec being Wingdings, the hollow white arrow is output to the renderer, apparently because that's the character at that offset in Wingdings' F0 region, and it works.

Another example (hope I'm not being tedious...): The Windows-key flag character in Wingdings that many want to use in an HTML script for that platform, is hex F0FF.  This maps to &yuml; (U+00FF, ÿ in Unicode space), so the user agent should render either &#xff; (with or without leading zeroes) or &yuml; as the Windows-key flag character from the Wingdings font, if style="font-family: Wingdings;" is in force, using brute-force offset addressing from hex F0.  If the platform the code’s running on hasn’t the Wingdings font, the user agent should just render it as the hollow-box "unknown" character.

Old-style (legacy) absolute offset addressing from F0 should be used here with such a symbol font, not Unicode-style (code-page) addressing to named glyphpoints.  If a font isn’t set up to use Unicode codepaging, why even agonize over glyphpoints at all? Revert to the pre-Unicode, legacy, offset-based addressing that simply works for these fonts, and be done with it.  They’re going to be around for the foreseeable future, and need to be supported by style="font-family: {face name};">&{specification}; sensibly.  Using a simple fall-back mechanism to do so is the way to go.  If properly coded, it won't break or even interfere with more modern, advanced code-page-based Unicode font handling at all.

Finally:  Since such symbol fonts can (as does Wingdings) also have Macintosh Roman mappings, Apple-platform Firefox should also be coded to display them correctly the same way, using analogous raw mapping there, if the font is available.  For Macintosh Roman mapping, Wingdings has no F0 page offset (i.e., the offset is zero).

I'm sorry this is so long, rambling, repetitive, & verbose...  It’s taken hours of experimentation, compiling & research.
Most other browsers don't properly support things in the private use areas in my testing.
Yet another situation where MANY Windows-based HTML authors will want to use a TrueType symbol font resident on such systems is the Marlett font that Windows itself uses to render many window controls. In my testing of HTML pages that employ it, displaying these controls breaks completely.  See the close-box example  in http://www.bachware.com/books/computer%20utilization/Ch.%201/case.htm.
This displays properly in MSIE 7 | 8, and breaks in Firefox.
Blocks: 425367
Blocks: 399391
No longer depends on: 399391
verified fixed using the testcases and Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008041217 Minefield/3.0pre ID:2008041217

--> Verified fixed
Status: RESOLVED → VERIFIED
I disagree.  This isn't really fixed; it is a complete failure of Gecko 1.9's handling of Windows-type symbol fonts in comparison to how it works in MSIE and Firefox 2.0 (with a hacked res/fonts/fontEncoding.properties symbol fonts section). Carsten in comment #33 may've been testing with my file linked to in comment #32 that I re-coded to kludge around the problem by using &times;, but see the specific test cases in http://www.bachware.com/books/computer%20/utilization/old_case.htm, which shows tests of using both an old font-family: Marlett;>{char} syntax and the new &#xF0--; syntax for fishing symbol-font glyphs out of a font file's PUA using non-backward-compatible brute force.

I guess I and other authors just have to live with the decision to have Gecko display symbol-type font glyphs as it does now, and kludge around incompatibilities between browsers. That, of course, is a frustration. For the record, I think the way that’s implemented in FF & Opera is a mistake, and that, for just this kind of font, all Windows browser versions’ handling of them should be harmonized to work as MSIE handles them: with non-Unicode, raw font addressing.
Helge meant http://www.bachware.com/books/computer%20utilization/Ch.%201/old_case.htm in comment 34.

Anyway, Helge, please file a new bug for the issue you are encountering and mention the bug number here. It's better to file a new bug for this.
Attached patch reftests (obsolete)
I plan to land these later today.  I filed bug 429017, bug 429019, and bug 429022 on the current failures.
Attached patch reftests
I'm still discussing these with jdaggett, so I may not have a chance to land them for a bit; here's a revised version with the typo in reftest.list fixed.
Attachment #315659 - Attachment is obsolete: true