Closed Bug 587112 Opened 15 years ago Closed 15 years ago

C1 control codes should not be interpreted as Microsoft characters with the HTML4 parser

Product/Component: Core :: DOM: HTML Parser
Type: defect
Priority: Not set
Severity: normal
Status: RESOLVED WONTFIX
Reporter: vincent-moz
Assignee: Unassigned
Attachments: 1 file

User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-GB; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-GB; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8

According to the W3C[*], C1 control codes such as U+0080 should not be interpreted as Microsoft characters. However, for text/html files, Firefox interprets U+0080 as the Euro symbol, which is incorrect.

[*] http://www.w3.org/International/questions/qa-controls which says:

> Whereas the ISO 8859 family reserves the C1 range for controls, Microsoft
> character sets (e.g. 1250-1258) place characters in this range. Sometimes
> content authors mistakenly use the Microsoft character code points in
> creating NCRs instead of using the Unicode values. Because of the
> prevalence of this mistake, many browsers display the Microsoft characters
> in this range. This is incorrect behavior and further misleads the
> developer by incorrectly confirming the mistaken value. The problem may
> eventually be discovered when the data is treated by some application, or
> when a standards-conforming browser fails to display the intended character.

Reproducible: Always

Steps to Reproduce:
1. Open an HTML file, either local or served as text/html, with a numeric character reference for U+0080 (e.g. &#128;) in it (I'll attach a testcase).

Actual Results:
The reference is rendered as the Euro symbol €.

Expected Results:
The character could be ignored. The XML parser keeps the character, which is rendered as a square box showing 0080: at least one knows there's something wrong in the source.
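To make the mismatch described in the report concrete, here is a minimal Python sketch (an illustration only, not part of the original report or of Gecko). It uses the standard codecs to show that byte 0x80 is the Euro sign in Windows-1252, while ISO-8859-1 maps the same byte to the C1 control U+0080.

```python
import unicodedata

raw = bytes([0x80])

# Windows-1252 places the Euro sign at byte 0x80.
as_cp1252 = raw.decode("cp1252")

# ISO-8859-1 maps byte 0x80 straight to the C1 control U+0080.
as_latin1 = raw.decode("iso-8859-1")

print(f"cp1252:     U+{ord(as_cp1252):04X} {unicodedata.name(as_cp1252)}")
print(f"iso-8859-1: U+{ord(as_latin1):04X} category {unicodedata.category(as_latin1)}")
```

An author who writes the Windows-1252 byte value as a numeric character reference is therefore asking for U+0080, not for the Euro sign at U+20AC.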
The behavior exhibited by the HTML parser is required for compatibility with Web content and is specified in HTML5: http://www.w3.org/TR/2010/WD-html5-20100624/tokenization.html#tokenizing-character-references
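As a rough model of the HTML5 rule referenced above, the following Python sketch may help. It is an aside, not Gecko's implementation: it relies on the standard library's `html.unescape`, which is documented as following the HTML5 rules for both valid and invalid character references.

```python
import html

# Decimal and hexadecimal references to U+0080.
decoded_decimal = html.unescape("&#128;")
decoded_hex = html.unescape("&#x80;")

# Under the HTML5 rules, both are remapped to U+20AC (Euro sign)
# rather than kept as the C1 control U+0080.
print(f"U+{ord(decoded_decimal):04X}", f"U+{ord(decoded_hex):04X}")
```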
Status: UNCONFIRMED → RESOLVED
Closed: 15 years ago
Resolution: --- → INVALID
OK, if Firefox follows HTML5 parsing rules, it must do this consistently. I've reported bug 589953.
In bug 589953, it is said that Firefox 3.6 doesn't use the HTML5 parser. So, it should follow the HTML4 parsing rules.
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Summary: C1 control codes should not be interpreted as Microsoft characters → C1 control codes should not be interpreted as Microsoft characters with the HTML4 parser
(In reply to comment #4)
> In bug 589953, it is said that Firefox 3.6 doesn't use the HTML5 parser. So, it
> should follow the HTML4 parsing rules.

1) Firefox 3.6 is a released product, and changes like that aren't done in point releases.
2) HTML4 has no parsing rules and isn't a suitable guide for implementing a Web-compatible browser.
3) The same compatibility concerns apply to Firefox 3.6, so changing this in Firefox 3.6 would make Firefox 3.6.x less successful at rendering real-world Web content.
Status: UNCONFIRMED → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
(In reply to comment #5)
> 1) Firefox 3.6 is a released product, and changes like that aren't done in
> point releases.

OK.

> 2) HTML4 has no parsing rules and isn't a suitable guide for implementing a
> Web-compatible browser.

Of course it does have parsing rules, as long as the web page is valid, which is the case here. The point is that HTML5 (served as text/html) has changed the rules (while HTML5 served as application/xhtml+xml has preserved them).

> 3) The same compatibility concerns apply to Firefox 3.6, so changing this in
> Firefox 3.6 would make Firefox 3.6.x less successful at rendering real-world
> Web content.

At the same time it leads web page authors to think that their web pages are rendered correctly while they aren't, with the consequence that such web pages won't be rendered as the authors expect with some web browsers (such as w3m and some lynx versions).
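The text/html versus XML difference mentioned in this comment can be sketched in Python as follows. This is only an illustration under the assumption that the standard library's XML parser and `html.unescape` are fair stand-ins for an XML parser and the HTML5 rules respectively; it says nothing about how any browser is implemented.

```python
import html
import xml.etree.ElementTree as ET

# XML: the reference resolves to the code point it names, U+0080.
xml_text = ET.fromstring("<p>&#128;</p>").text

# HTML5-style decoding: the same reference becomes the Euro sign, U+20AC.
html_text = html.unescape("&#128;")

print(f"XML:  U+{ord(xml_text):04X}")
print(f"HTML: U+{ord(html_text):04X}")
```

The same markup thus yields a different character depending on which set of rules processes it, which is the discrepancy the reporter objects to.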