Closed Bug 123095 Opened 23 years ago Closed 23 years ago

Windows fonts on MS encodings on Unix show "wrong" glyphs, or, funny apostrophe

Categories

(Core :: Layout, defect)

x86
Linux
defect
Not set
minor

RESOLVED WONTFIX

People

(Reporter: Stevan_White, Assigned: attinasi)

Details

(Keywords: fonts)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010914
BuildID: 2001011002

Certain Microsoft encodings, especially the apostrophe (#146), are rendered incorrectly if a Windows font is requested by the page and is available to the browser. It appears that Mozilla attempts to translate the MS character to some standard encoding so that it will look right using Unix fonts. But if Windows fonts are installed, this is the wrong thing to do.

Reproducible: Always

Steps to Reproduce:
1. A Windows font, such as "Arial", is installed on the system.
2. An MS-encoded character, such as the curly apostrophe ’, is in the page.
3. The page must request that the character be displayed using that font.

Actual Results: The character is rendered incorrectly; in the case of ’, it is rendered as a y with two dots over it.

Expected Results: This is tough. Should it determine whether the font being used has the Microsoft encodings, and in that case, not translate the value?

This happens whether an SGML entity is used (&#146;) or an explicit character is embedded in the HTML. But I can't seem to copy explicit characters into this text field, so I can't show you that. Many web pages generated by MS products have such explicit characters, though.
If the character is embedded directly in the page, then the page's character set should determine how the character is handled. If the character set is one of the Windows character sets (e.g. windows-1252), we do the "right thing", namely we pick the character you expect (I believe). If the character encoding is UTF-8, then we probably don't do the right thing, since the right thing is to not display anything (in Unicode, code point 146 is U+0092, which is the PRIVATE USE TWO control code).

If the character is embedded using an SGML entity reference, then we should do the same as if it were embedded directly using UTF-8. (According to the specs, the native character encoding for HTML is Unicode.)

In practice though, at least on my machine using a build from a day or so ago, I find that on the test page we render all the control characters using the glyphs one would expect to see in a non-compliant, backwards-compatible browser. To my knowledge I have no non-free fonts (i.e., no Microsoft fonts) installed. If you remove the Microsoft fonts from your system, does everything render OK?
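For illustration only, here is a minimal Python sketch (not Mozilla code) of the distinction drawn in the comment above: the same byte value 0x92 (decimal 146) means different things depending on the declared character set. The variable names are made up for this example. Note that a lone 0x92 byte is not valid UTF-8 at all; the code point U+0092 is encoded in UTF-8 as the two bytes 0xC2 0x92.

```python
import unicodedata

raw = b"\x92"  # byte value 146, as found in many pages produced by MS tools

# Declared as windows-1252: the byte is the right single quotation mark.
as_cp1252 = raw.decode("cp1252")
print(hex(ord(as_cp1252)))              # 0x2019

# Read as a plain code point (ISO 8859-1 maps bytes 1:1 to U+0000..U+00FF):
# the result is U+0092, a C1 control character with no glyph.
as_latin1 = raw.decode("latin-1")
print(hex(ord(as_latin1)))              # 0x92
print(unicodedata.category(as_latin1))  # Cc (control)

# In UTF-8 a bare 0x92 is an invalid sequence; U+0092 needs two bytes.
try:
    raw.decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False
print(lone_byte_is_valid_utf8)          # False
print("\x92".encode("utf-8"))           # b'\xc2\x92'
```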
Whiteboard: WORKSFORME?
Apparently this problem _only_ occurs with Microsoft fonts. Since I don't have any installed, I can't check this further myself. cc'ing the usual suspects.
Keywords: fonts
Summary: Windows fonts on MS encodings, or, funny apostrophe → Windows fonts on MS encodings on Unix show "wrong" glyphs, or, funny apostrophe
Whiteboard: WORKSFORME?
The document character set of an HTML file is always UCS, i.e. Unicode, no matter what encoding the file itself uses. So &#146; should always refer to U+0092. Using NCR values in the range 0x80 to 0x9F to denote the corresponding code points in the win125x encodings is very common in web page practice, though it is theoretically incorrect. To handle this situation, we interpret those values according to their win125x definitions. Since those code points are control codes in Unicode with no glyphs of their own, this hack does not create further problems.

For the above reason, &#146; is always translated to Unicode code point U+2019. Unfortunately, this character is not included in the iso8859-x encodings except iso8859-13, which is not usually available in X font collections. For those "special" characters, we don't traverse all system fonts. If we resorted to CJK fonts, the glyph might be too wide and look ugly in a Western document. The current implementation is to try all iso8859-x fonts first. If no match is found, transliteration is performed and U+2019 is displayed as the apostrophe ' (U+0027).

I would suggest resolving this bug as WONTFIX for now. When Windows fonts become popular on X and the fonts declare that they use win125x encodings, we might consider trying those fonts before transliteration.
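The behaviour described in the comment above can be sketched roughly as follows. This is a simplified Python illustration, not the actual Mozilla implementation: `resolve_ncr`, `pick_rendering` and the `TRANSLIT` table are hypothetical names, and Python codecs stand in for the encodings of the X fonts that are available.

```python
def resolve_ncr(code_point: int) -> str:
    """Resolve a numeric character reference such as &#146;.

    Values 0x80..0x9F are reinterpreted as windows-1252 bytes (the hack
    described above); everything else is taken as a Unicode code point.
    """
    if 0x80 <= code_point <= 0x9F:
        try:
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            pass  # byte is undefined in windows-1252; keep the control code
    return chr(code_point)


# Hypothetical transliteration table for characters no font encoding covers.
TRANSLIT = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}


def pick_rendering(ch: str, font_encodings: list[str]) -> tuple[str, str]:
    """Return (text to draw, encoding used), trying each encoding in order."""
    for enc in font_encodings:
        try:
            ch.encode(enc)
            return ch, enc
        except UnicodeEncodeError:
            continue
    # No listed encoding contains the character: fall back to transliteration.
    return TRANSLIT.get(ch, "?"), "ascii"


quote = resolve_ncr(146)
print(hex(ord(quote)))                                    # 0x2019
print(pick_rendering(quote, ["iso8859_1", "iso8859_2"]))  # ("'", 'ascii')
print(pick_rendering(quote, ["iso8859_1", "iso8859_13"])[1])  # iso8859_13
print(quote.encode("iso8859_13"))                         # b'\xff'
```

Incidentally, the ISO 8859-13 byte for U+2019 is 0xFF, which ISO 8859-1 maps to ÿ, the "y with two dots" the reporter describes. Whether that coincidence is the actual cause of the wrong glyph is not established in this thread.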
Marking WONTFIX per the last comments.
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX