Closed Bug 238694 Opened 21 years ago Closed 21 years ago

UTF-8 BOM are rendered if charset is malconfigured

Status:

RESOLVED INVALID

People

(Reporter: wolfgang.knauf, Assigned: smontagu)

Attachments

(3 files)

This page displays the UTF-8 header with a malconfigured apache (21 years ago, Wolfgang Knauf, 216 bytes, text/html)
Empty UTF-8 file, just contains the header (21 years ago, Wolfgang Knauf, 3 bytes, text/html)
This page declares a wrong charset, so UTF-8 header is displayed (21 years ago, Wolfgang Knauf, 220 bytes, text/html)

Wolfgang Knauf

Reporter

Description

•

21 years ago

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113

With Windows XP Notepad you can save a file (e.g. an HTML page) in the UTF-8 charset. Notepad then adds a 3-byte UTF-8 header to the file. If the file claims to use a different charset (for example through the meta tag <meta http-equiv="Content-Type" content="text/html; charset=ISO8859-1">, or because the web server adds a different encoding to the response), the UTF-8 header is displayed in the page.

Reproducible: Always

Steps to Reproduce:
1. Create an HTML page with Windows Notepad and save it as UTF-8.
2. Make the page claim that it is e.g. ISO8859-1.
3. View the page in Mozilla (see attached file "WrongCharsetDeclared.html").

Other approach:
1. Download and install a default Apache web server.
2. If the server uses the default configuration, the httpd.conf file should contain the following line: AddDefaultCharset ISO8859-1 (this creates a response header specifying the charset ISO8859-1 for the returned HTML file, no matter how the file is actually encoded). If not, add it.
3. Open the page with Mozilla and see 3 interesting chars.

Expected Results:
I think Mozilla should check for the presence of these UTF-8 header bytes instead of trying to render them.

I know that the Apache is somehow misconfigured, but this was the default install, and I am not the only one who runs into this problem. Go to http://www.aopen.nl/products/vga/ (hardware manufacturer) and you will find the same problem (this page also claims a different charset, Windows-1252). SuSE 9.0 Konqueror has the same problem, IE does not. It is not a problem of the Mozilla version (I verified it with 1.5, too) or of the OS (Windows, Linux).
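The "3 interesting chars" from the report can be reproduced outside a browser. The following is a minimal Python sketch, not Mozilla code: the function names `decode_as_declared` and `decode_bom_aware` are hypothetical, and the second one only illustrates the behaviour the reporter asks for.

```python
import codecs

# The 3-byte UTF-8 "header" Notepad writes: EF BB BF
UTF8_BOM = codecs.BOM_UTF8


def decode_as_declared(raw: bytes, declared: str) -> str:
    """Decode strictly with the declared charset; the BOM bytes are kept as text."""
    return raw.decode(declared)


def decode_bom_aware(raw: bytes, declared: str) -> str:
    """Hypothetical alternative: a leading UTF-8 BOM overrides the declared charset."""
    if raw.startswith(UTF8_BOM):
        return raw[len(UTF8_BOM):].decode("utf-8")
    return raw.decode(declared)


# A page saved as UTF-8 with BOM, but served or declared as ISO-8859-1.
page = UTF8_BOM + "<html><body>Grüße</body></html>".encode("utf-8")

shown = decode_as_declared(page, "iso-8859-1")
print(repr(shown[:3]))  # 'ï»¿' (EF, BB, BF read as ISO-8859-1 characters)
print(decode_bom_aware(page, "iso-8859-1"))
```

Windows-1252 maps the same three bytes to the same three characters, so the symptom looks identical under either declared charset.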

Wolfgang Knauf

Reporter

Comment 1

•

21 years ago

Attached file: This page displays the UTF-8 header with a malconfigured apache

Wolfgang Knauf

Reporter

Comment 2

•

21 years ago

Attached file: Empty UTF-8 file, just contains the header

Wolfgang Knauf

Reporter

Comment 3

•

21 years ago

Attached file: This page declares a wrong charset, so UTF-8 header is displayed

Christian :Biesinger (don't email me, ping me on IRC)

Comment 4

•

21 years ago

hmm, this is interesting. should mozilla use the BOM or the meta charset when both are present? Note that if the webserver sends a charset, mozilla will not look at any other source of charset information; this is intentional.

Assignee: general → smontagu

Component: Browser-General → Internationalization

QA Contact: general → amyy

Summary: UTF-8 Header bytes are rendered if charset is malconfigured → UTF-8 BOM are rendered if charset is malconfigured

Simon Montagu :smontagu

Assignee

Comment 5

•

21 years ago

IMO there is no bug here. If the meta charset is inaccurate, the document is displayed incorrectly. I don't see any reason why the BOM should override the meta charset.

Status: UNCONFIRMED → RESOLVED

Closed: 21 years ago

Resolution: --- → INVALID

Leif Halvard Silli

Comment 7

•

14 years ago

(In reply to comment #4)
> hmm, this is interesting. should mozilla use the BOM or the meta charset when
> both are present?
>
> Note that if the webserver sends a charset, mozilla will not look at any
> other source of charset information; this is intentional.

The intention is good and "politically correct". However, it is wrong.

* IE and WebKit give the BOM higher priority than the Content-Type header.
* The IE/WebKit behaviour is in tune with XML 1.0.
* The Firefox/Opera behaviour triggers Quirks-Mode in HTML and triggers the Yellow Screen of Death in XML. Those errors are not seen in WebKit or IE.

Summary: for the encoding, the BOM should have higher priority than the HTTP header.

Test cases: http://malform.no/testing/html5/bom/
HTML5 bug: http://www.w3.org/Bugs/Public/show_bug.cgi?id=12897
XML spec: http://www.w3.org/TR/xml/#sec-guessing-with-ext-info

It is illogical even to allow the user to override the UTF-8 encoding, because doing so will *either* make the page render in Quirks-Mode *or* make the page suffer the Yellow Screen of Death.
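The two precedence orders under discussion can be put side by side. This is a rough Python sketch, not code from any browser: `header_first`, `bom_first`, and the `windows-1252` fallback are assumptions made for illustration, and where the BOM ranks in the header-first policy when neither header nor meta is present is also assumed.

```python
import codecs
from typing import Optional


def sniff_utf8_bom(raw: bytes) -> Optional[str]:
    """Return 'utf-8' if the body starts with the UTF-8 BOM, else None."""
    return "utf-8" if raw.startswith(codecs.BOM_UTF8) else None


def header_first(raw: bytes, http_charset: Optional[str],
                 meta_charset: Optional[str],
                 fallback: str = "windows-1252") -> str:
    """Policy described in comments 4 and 5: a BOM never overrides
    an HTTP or meta charset (its rank below them is an assumption)."""
    return http_charset or meta_charset or sniff_utf8_bom(raw) or fallback


def bom_first(raw: bytes, http_charset: Optional[str],
              meta_charset: Optional[str],
              fallback: str = "windows-1252") -> str:
    """Policy argued for in comment 7: the BOM outranks every declaration."""
    return sniff_utf8_bom(raw) or http_charset or meta_charset or fallback


body = codecs.BOM_UTF8 + b"<p>hello</p>"
print(header_first(body, "iso-8859-1", None))  # iso-8859-1
print(bom_first(body, "iso-8859-1", None))     # utf-8
```

The two policies only disagree when a BOM is present and some declaration names a different charset, which is exactly the case in the attached test files.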

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 8

•

14 years ago

(In reply to comment #7)
> * The IE/WebKit behaviour is in tune with XML 1.0.

That's not true.

> XML spec: http://www.w3.org/TR/xml/#sec-guessing-with-ext-info

... which clearly defers to RFC 3023, which says that the charset parameter is authoritative.

Leif Halvard Silli

Comment 9

•

14 years ago

(In reply to comment #8)
> (In reply to comment #7)
> > * The IE/WebKit behaviour is in tune with XML 1.0.
>
> That's not true.

Beg to differ - or at least question it. See below.

> > XML spec: http://www.w3.org/TR/xml/#sec-guessing-with-ext-info
>
> ... which clearly defers to RFC 3023, which says that the charset parameter
> is authoritative.

Quoting XML 1.0:

]] F.2 Priorities in the Presence of External Encoding Information [[

Thus, Appendix F.2 talks about the presence of external encoding info. The preceding F.1 speaks about internal encoding info. A bit later, F.2 says:

]] their relative priority and the preferred method of handling conflict should be specified [[

Thus, F.2 explains how derived specifications (like the XHTML specs) should behave. Note as well that it refers to RFC 3023 as "useful guidance", and nothing more.

The most important part of F.2 is clearly the last two sentences, which I'll quote. Remember once more that F.2 speaks about the "Presence of External Encoding Information". Hence, the last two sentences should also be applied to a situation where there is external encoding info:

]] In the interests of interoperability, however, the following rule is recommended. If an XML entity is in a file, the Byte-Order Mark and encoding declaration are used (if present) to determine the character encoding. [[

I don't know whether it is contested that an XML entity served via HTTP "is in a file". And even if it is contested, I would like to know, in detail, what WebKit and IE are breaking w.r.t. the XML spec.
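The recommended rule quoted above (use the Byte-Order Mark and the encoding declaration, if present) can be sketched roughly as follows. This is a simplified illustration in Python, not the full detection procedure of XML 1.0 Appendix F: the function name `encoding_for_file_entity` is hypothetical, UTF-32 and EBCDIC are left out, and the declaration is only searched for in ASCII-compatible input.

```python
import codecs
import re

# BOMs checked in this sketch (UTF-32 is deliberately omitted).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches the encoding pseudo-attribute of an XML declaration at the very start.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def encoding_for_file_entity(raw: bytes) -> str:
    """BOM first, then the encoding declaration, then the UTF-8 default."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    match = _DECL.match(raw)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'
print(encoding_for_file_entity(doc))                    # iso-8859-1
print(encoding_for_file_entity(codecs.BOM_UTF8 + doc))  # utf-8
```

Whether this rule also applies to an entity retrieved over HTTP with a charset parameter is the point in dispute in this thread; the sketch takes no position on it.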

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 10

•

14 years ago

"in a file" means in a file on the local filesystem, not something retrieved via HTTP (note the distinction between files and network protocols that it makes earlier in F.2: "as in some file systems and some network protocols".)

Leif Halvard Silli

Comment 11

•

14 years ago

(In reply to comment #10)

I think that if it had wanted to remove all ambiguity, it should have said "in a file in a file system". I would think that a far more important contrast is "in a database record" versus "in a file, including a file served via HTTP".

Categories

(Core :: Internationalization, defect)
