233474 - Symbol fonts only work on ISO-8859-1 pages

Reporter

Description

22 years ago

User-Agent: Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113

Open the URL page http://www.mccme.ru/mmmf-lectures/books/books/books.php?book=20&page=2 in Mozilla. Its correct character coding is Cyrillic (Windows 1251). Select this character coding from the View|Character Coding|More|East European menu if it is not selected automatically.

Look at the last characters of the first paragraph. They look like A M B. In fact, M should be the well-known math symbol meaning "is a subset of". To see it correctly, change your Character Coding to Western (ISO-8859-1). Unfortunately, all the rest of the text turns into gibberish in this encoding.

I noticed that in the page source, the character is marked as <font face="symbol">Ì</font>. It seems that, when Mozilla switches to Win 1251, it starts displaying this symbol in its default font instead of "symbol" (the symbol font does contain a glyph that looks similar to 'M', but it is different from the one displayed by Mozilla). Internet Explorer shows this page just fine in Win 1251.

Reproducible: Always

Steps to Reproduce:
1. Browse http://www.mccme.ru/mmmf-lectures/books/books/books.php?book=20&page=2
2. Switch encoding to Cyrillic (Windows 1251)
3. Observe A M B at the end of the first paragraph of normal text (in Russian).

Actual Results: I can see A M B

Expected Results: I should see A "is a subset of" B, where "is a subset of" is a glyph in the 'symbol' font face.

Wolfgang Uhr

Comment 1

21 years ago

Attached image A short test using IE — Details

Hello, I have visited your site using IE and cannot agree with your statement. The HTML sequence A М B only works on your computer and on those with font sets similar to yours. Best regards, Wolfgang

Simon Montagu :smontagu

Comment 2

21 years ago

The quirk only works on pages in ISO-8859-1 (or compatible) encoding. In Windows-1251, 0xCC is decoded to the Unicode code point U+041C, CYRILLIC CAPITAL LETTER EM, which isn't represented in the symbol font. I believe that IE ignores the encoding of the page for <font face="symbol">, but I doubt that we want to extend the quirk to do that.
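For anyone who wants to check the decoding described here, a minimal Python sketch follows. It uses only the standard codecs and is an illustration of the two encodings, not Mozilla code; the variable names are made up for this example.

```python
import unicodedata

# The single octet from the page source.
octet = bytes([0xCC])

# Decode it under each of the two page encodings discussed in this bug.
as_latin1 = octet.decode("iso-8859-1")
as_cp1251 = octet.decode("windows-1251")

for label, ch in (("ISO-8859-1", as_latin1), ("Windows-1251", as_cp1251)):
    # Print the resulting code point and its Unicode character name.
    print(f"{label}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Under ISO-8859-1 the code point keeps the byte value (U+00CC); under Windows-1251 it becomes U+041C, which is why the outcome depends on the page encoding.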

Summary: When selecting proper encoding, math. symbols are not right → Symbol fonts only work on ISO-8859-1 pages

Pavel

Reporter

Comment 3

21 years ago

From the Standard:

"face = cdata [CI] Deprecated. This attribute defines a comma-separated list of font names the user agent should search for in order of preference."

My reading of it is that the agent must look through the list of fonts. If the font is not found, or the character's numeric value is not defined in this font face, the standard does not say what to do; however, displaying "CYRILLIC CAPITAL LETTER EM" seems at least questionable and of course arbitrary. Displaying a question mark in this case (say, in the default font of the page) would work much better.

However, I do not believe this is the case here. The font is defined, and the numeric value of the character is legal in the encoding (0xCC, and the value of the HTTP header "Encoding" is "windows-1251"), so the character's glyph should be taken from the said font. The standard never implies that the 'face' attribute is a quirk, or that it must only work if the encoding is "ISO-8859-1 (or compatible)", or that that strange procedure (taking the character name in Unicode, followed by a search for the character name in the specified font) should be applied. By the way, the latter procedure would probably not work even if one wanted to change the current font from "Courier" to "Helvetica" for Latin "A", because the descriptions of this character in the font, even where available, would probably not match the Unicode code point name.

My approach to following the standard in this case would be like this:

1. Use the current encoding rules (the combination of a character set with a reversible method of serializing its characters to a sequence of bits) to get a numeric representation of a character. (Please note that the standard does not even require all document characters to be representable by the encoding: it allows using character entity references to represent other characters; so the document's encoding is more an optimization than something that must affect the behavior of the agent.)
2. Apply the current font to the obtained number to get a glyph.
3. Display the glyph.
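To make the difference between the two lookups concrete, here is a simplified Python model. It is only a sketch: SYMBOL_GLYPHS is a hypothetical one-entry font table (assuming the Symbol font holds the "subset of" glyph at position 0xCC, as the report implies), and both function names are invented for illustration. It does not reflect how any browser is actually implemented.

```python
# Hypothetical one-entry excerpt of a Symbol-style font table.
# Assumption: position 0xCC holds the "subset of" glyph.
SYMBOL_GLYPHS = {0xCC: "\u2282"}


def glyph_by_byte(octet, font_table):
    """Proposal from this comment: index the font by the raw byte value."""
    # A question mark stands in for a glyph the font does not define.
    return font_table.get(octet, "?")


def glyph_by_code_point(octet, encoding, font_table):
    """Decode the byte first, then look up the resulting code point."""
    ch = bytes([octet]).decode(encoding)
    # If the font has no entry, fall back to the decoded character itself,
    # modelling display in the page's default font.
    return font_table.get(ord(ch), ch)


print(glyph_by_byte(0xCC, SYMBOL_GLYPHS))
print(glyph_by_code_point(0xCC, "iso-8859-1", SYMBOL_GLYPHS))
print(glyph_by_code_point(0xCC, "windows-1251", SYMBOL_GLYPHS))
```

In this model the byte-indexed lookup gives the same glyph regardless of encoding, while the code-point lookup only finds it when the decoded code point happens to equal the byte value, as it does for ISO-8859-1.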

Simon Montagu :smontagu

Comment 4

21 years ago

http://www.w3.org/TR/1999/REC-html401-19991224/charset.html#h-5.2.1 :

"Conforming user agents must correctly map to ISO 10646 all characters in any character encodings that they recognize (or they must behave as if they did)."

In other words, if the document encoding is Windows-1251, the octet 0xCC represents CYRILLIC CAPITAL LETTER EM. This is neither questionable nor arbitrary, it's compulsory.

(In reply to comment #3)
> Please note that the standard does not even
> require all document characters to be representable by the encoding: it allows
> using character entity references to represent other characters

Quite so: this is exactly the way to represent the SUBSET OF character: a numeric reference |&#8834;| or |&#x2282;| or an entity reference |&sub;|

Using the Symbol font is unnecessary, non-standard, and non-portable.
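As a quick check that character references are independent of the page encoding, the following Python sketch resolves the decimal, hexadecimal, and named references for SUBSET OF with the standard library's html.unescape. The list name and the loop are just for illustration.

```python
import html
import unicodedata

# Decimal reference, hexadecimal reference, and named entity for U+2282.
references = ["&#8834;", "&#x2282;", "&sub;"]

# Resolve each reference to the character it denotes.
decoded = [html.unescape(ref) for ref in references]

for ref, ch in zip(references, decoded):
    print(f"{ref} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

All three resolve to the same code point, so a page in Windows-1251 can use any of them without relying on a particular font.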

Pavel

Reporter

Comment 5

21 years ago

"5.4 Undisplayable characters ... we recommend the following behavior for user agents: 1. Adopt a clearly visible, but unobtrusive mechanism to alert the user of missing resources."

'M' is hardly such an alert. Until I decided to browse the page in IE, I was looking for a definition of the operator 'M' and was feeling sick...

Simon Montagu :smontagu

Comment 6

21 years ago

I think you are missing the point here. A better argument for fixing this would be simple consistency: if we have decided to implement a quirk for <font face="symbol">, there isn't any good reason (other than ease of implementation) for it to be dependent on the encoding of the document.

Pavel

Reporter

Comment 7

21 years ago

Well, if you put it this way I will certainly not argue :-) -- as soon as the behavior is going to change -- this way or that.

Gervase Markham [:gerv]

Comment 8

20 years ago

This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that bug reports that have not been confirmed by a second user after three months are highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we are asking for your help in focussing our efforts. If you can still reproduce this problem in the latest version of the product (see below for how to obtain a copy) or, for feature requests, if it's not present in the latest version and you still believe we should implement it, please visit the URL of this bug (given at the top of this mail) and add a comment to that effect, giving more reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not changed in any way in the next two weeks, it will be automatically resolved. Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox: http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey: http://www.mozilla.org/projects/seamonkey/

Pavel

Reporter

Comment 9

20 years ago

Somebody has to confirm the reported behavior. It has not changed: M is still displayed instead of the "is a subset of" symbol in Mozilla 1.7.10.

Hardware: PC → Other

Simon Montagu :smontagu

Updated

20 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Anne (:annevk)

Comment 10

8 years ago

I agree that we should not add the hack outlined in comment 2. I doubt Edge implements it.

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → WONTFIX

Simon Montagu :smontagu

Comment 11

8 years ago

Agreed WONTFIX, but for the record, do we support <font face="symbol"> under any circumstances these days?

Status: RESOLVED → VERIFIED

Anne (:annevk)

Comment 12

8 years ago

I don't know if there are any special code paths, but if there's a font named "symbol", that should work just like "verdana" works. (So if it decides that certain code points map to glyphs that don't really represent those code points, it'd work.)

Categories

(Core :: Layout: Text and Fonts, defect)

People

(Reporter: paultolk, Unassigned)
