User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113

With Windows XP Notepad you can save a file (e.g. an HTML page) in the UTF-8 charset. A 3-byte UTF-8 header (the byte order mark, bytes EF BB BF) is added to the beginning of the file. If the file claims to use a different charset (for example via the meta tag <meta http-equiv="Content-Type" content="text/html; charset=ISO8859-1">, or because the web server adds a different encoding to the response), the UTF-8 header is displayed in the page.

Reproducible: Always

Steps to Reproduce:
1. Create an HTML page with Windows Notepad and save it as UTF-8.
2. Make the page claim that it is e.g. ISO8859-1.
3. View the page in Mozilla (see attached file "WrongCharsetDeclared.html").

Other approach:
1. Download and install a default Apache web server.
2. If the server uses the default configuration, the httpd.conf file should contain the following line:
   AddDefaultCharset ISO8859-1
   (This creates a response header specifying the charset ISO8859-1 for the returned HTML file, no matter how the file is actually encoded.) If not, add it.
3. Open the page with Mozilla and see 3 interesting characters.

Expected Results:
I think Mozilla should check for the presence of these UTF-8 header bytes instead of trying to render them.

I know that the Apache server is somewhat misconfigured, but this was the default install, and I am not the only one who runs into this problem. Go to http://www.aopen.nl/products/vga/ (a hardware manufacturer) and you will find the same problem (this page also claims to be Windows-1252). Suse 9.0 Konqueror has the same problem; IE does not. It is not a problem of the Mozilla version (validated with 1.5, too) or of the OS (Windows, Linux).
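To see where the 3 interesting characters come from, here is a small Python sketch (not part of the original report). It builds a page that starts with the UTF-8 BOM and decodes it once with the declared ISO-8859-1 charset and once as UTF-8. The names `render_as`, `page`, `wrong`, `kept` and `right` are illustrative only.

```python
# U+FEFF (the byte order mark) encoded as UTF-8 is the 3-byte "header" Notepad writes.
BOM = "\ufeff".encode("utf-8")  # b'\xef\xbb\xbf'


def render_as(body: bytes, declared_charset: str) -> str:
    """Decode the raw bytes with whatever charset the page or server declares."""
    return body.decode(declared_charset)


page = BOM + "<html><body>Hello</body></html>".encode("utf-8")

# Trusting the (wrong) ISO-8859-1 declaration turns the BOM into three visible
# characters: wrong[:3] == "\u00ef\u00bb\u00bf", i.e. "ï»¿".
wrong = render_as(page, "iso-8859-1")

# Decoding as plain UTF-8 keeps U+FEFF as an invisible first character;
# Python's "utf-8-sig" codec strips it entirely.
kept = render_as(page, "utf-8")
right = render_as(page, "utf-8-sig")

print(len(wrong) - len(right))  # 3 extra characters when the declaration is wrong
```

Every byte is a valid ISO-8859-1 character, so nothing fails loudly: the BOM bytes simply turn into visible text at the top of the page.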
Created attachment 144767
This page declares a wrong charset, so the UTF-8 header is displayed
Hmm, this is interesting. Should Mozilla use the BOM or the meta charset when both are present? Note that if the web server sends a charset, Mozilla will not look at any other source of charset information; this is intentional.
Assignee: general → smontagu
Component: Browser-General → Internationalization
QA Contact: general → amyy
Summary: UTF-8 Header bytes are rendered if charset is malconfigured → UTF-8 BOM are rendered if charset is malconfigured
IMO there is no bug here. If the meta charset is inaccurate, the document is displayed incorrectly. I don't see any reason why the BOM should override the meta charset.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → INVALID
(In reply to comment #4)
> hmm, this is interesting. should mozilla use the BOM or the meta charset when
> both are present?
>
> Note that if the webserver sends a charset, mozilla will not look at any
> other source of charset information; this is intentional.

The intention is well meant and "politically correct". However, it is wrong.

* IE and WebKit give the BOM higher priority than the Content-Type header.
* The IE/WebKit behaviour is in tune with XML 1.0.
* The Firefox/Opera behaviour triggers Quirks Mode in HTML and triggers the Yellow Screen of Death in XML; those errors are not seen in WebKit or IE.

Summary: for the encoding, the BOM should have higher priority than the HTTP header.

Test cases: http://malform.no/testing/html5/bom/
HTML5 bug: http://www.w3.org/Bugs/Public/show_bug.cgi?id=12897
XML spec: http://www.w3.org/TR/xml/#sec-guessing-with-ext-info

It is illogical to even allow the user to override the UTF-8 encoding, because doing so will *either* make the page render in Quirks Mode *or* make the page suffer the Yellow Screen of Death.
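A minimal Python sketch of the priority order argued for here: a BOM wins over the HTTP `charset` parameter, which in turn wins over a `<meta>` declaration. This is an illustration, not code from Mozilla, IE or WebKit; the function names, the `windows-1252` fallback and the restriction to UTF-8/UTF-16 BOMs are assumptions made for the example.

```python
from typing import Optional, Tuple

# BOMs recognized by this sketch: UTF-8 and both UTF-16 byte orders (UTF-32 is ignored).
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def sniff_bom(body: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding, BOM length) if the body starts with a known BOM."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name, len(bom)
    return None


def pick_encoding(body: bytes, http_charset: Optional[str],
                  meta_charset: Optional[str],
                  fallback: str = "windows-1252") -> Tuple[str, str]:
    """Return (encoding, source) with priority BOM > HTTP charset > <meta> > fallback."""
    sniffed = sniff_bom(body)
    if sniffed is not None:
        return sniffed[0], "bom"
    if http_charset:
        return http_charset, "http"
    if meta_charset:
        return meta_charset, "meta"
    return fallback, "default"


def decode_body(body: bytes, http_charset: Optional[str] = None,
                meta_charset: Optional[str] = None) -> str:
    """Decode a response body; a BOM, if present, is consumed rather than rendered."""
    sniffed = sniff_bom(body)
    if sniffed is not None:
        encoding, bom_length = sniffed
        return body[bom_length:].decode(encoding)
    encoding, _ = pick_encoding(body, http_charset, meta_charset)
    return body.decode(encoding)


# Usage: a UTF-8 page with a BOM, served with "charset=ISO-8859-1" by the server.
page = b"\xef\xbb\xbf<p>caf\xc3\xa9</p>"
print(pick_encoding(page, "ISO-8859-1", None))  # ('utf-8', 'bom')
print(decode_body(page, http_charset="ISO-8859-1") == "<p>caf\u00e9</p>")  # True
```

Under the behaviour described in comment #4, the `http_charset` check would come first instead, and the three BOM bytes would be decoded as ISO-8859-1 text.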
(In reply to comment #7)
> * The IE/WEbkit behaviour is in tune with XML 1.0.

That's not true.

> XML spec: http://www.w3.org/TR/xml/#sec-guessing-with-ext-info

... which clearly defers to RFC 3023, which says that the charset parameter is authoritative.
(In reply to comment #8)
> (In reply to comment #7)
> > * The IE/WEbkit behaviour is in tune with XML 1.0.
>
> That's not true.

I beg to differ, or at least to question it. See below.

> > XML spec: http://www.w3.org/TR/xml/#sec-guessing-with-ext-info
>
> ... which clearly defers to RFC 3023, which says that the charset parameter
> is authoritative.

Quoting XML 1.0:

]] F.2 Priorities in the Presence of External Encoding Information [[

Thus, Appendix F.2 talks about the presence of external encoding information. The preceding F.1 speaks about internal encoding information. A bit later, F.2 says:

]] their relative priority and the preferred method of handling conflict should be specified [[

Thus, F.2 explains how derived specifications (like the XHTML specs) should behave. Note as well that it refers to RFC 3023 as "useful guidance", and nothing more.

The most important part of F.2 is clearly the last two sentences, which I'll quote. And remember once more that F.2 speaks about "Presence of External Encoding Information". Hence, the last two sentences should also apply to a situation where there is external encoding information:

]] In the interests of interoperability, however, the following rule is recommended. If an XML entity is in a file, the Byte-Order Mark and encoding declaration are used (if present) to determine the character encoding. [[

I don't know whether it is contested that an XML entity served via HTTP "is in a file". And even if it is contested, I would like to know, in great detail, what WebKit and IE are breaking with respect to the XML spec.
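The rule recommended at the end of Appendix F.2 for an entity "in a file" (the BOM and the encoding declaration decide) can be sketched in Python as below. This is a simplified illustration under stated assumptions: only the UTF-8 and UTF-16 BOMs are checked, the BOM simply wins if it conflicts with the declaration (XML 1.0 treats such a conflict as an error), and the BOM-less UTF-16/UCS-4 detection of Appendix F.1 is omitted. The name `xml_file_encoding` is hypothetical. Whether the rule also covers entities served via HTTP is exactly what is disputed in this thread.

```python
import re

# An encoding declaration inside the XML declaration at the very start of the entity,
# e.g. <?xml version="1.0" encoding="ISO-8859-1"?>. The name pattern follows EncName.
XML_DECL = re.compile(
    rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def xml_file_encoding(data: bytes) -> str:
    """Guess a file's encoding from its BOM, then its encoding declaration, else UTF-8."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    match = XML_DECL.match(data)  # re.match only matches at the start of the data
    if match:
        return match.group(1).decode("ascii")
    # No BOM and no declaration: XML 1.0 requires such an entity to be UTF-8.
    return "utf-8"


print(xml_file_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))  # ISO-8859-1
print(xml_file_encoding(b"\xef\xbb\xbf<a/>"))  # utf-8
```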
"in a file" means in a file on the local filesystem, not something retrieved via HTTP (note the distinction between files and network protocols that it makes earlier in F.2: "as in some file systems and some network protocols".)
(In reply to comment #10)
I think that if it had wanted to remove all ambiguity, it should have said "in a file in a file system". I would think that a far more important contrast is "in a database record" versus "in a file, including a file served via HTTP".