Closed
Bug 236325
Opened 20 years ago
Closed 13 years ago
UTF-8 documents containing Byte Order Mark (BOM), misdelivered as ISO-8859-1, fail to display
Categories
(Tech Evangelism Graveyard :: Other, defect)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: yhlien2004, Unassigned)
References
()
Details
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7b) Gecko/20040302 Camino/0.7+
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7b) Gecko/20040302 Camino/0.7+

When the given URL was loaded, the page content area showed only the strange symbols "ï»¿" in the upper left corner. I checked the source of the URL and found that the charset label of the page was UTF-8. However, Camino still selected the Western (Latin ISO 1) encoding and ignored the charset label in that page. The page displayed properly after I selected UTF-8 manually.

Reproducible: Always

Steps to Reproduce:
1. Load the given URL.

Actual Results: The strange symbols "ï»¿" showed up.

Expected Results: The page should display its Traditional Chinese text properly.
Also happens using Mozilla. The document is sent with "Content-Type: text/html;charset=ISO-8859-1", explicitly specifying ISO-8859-1. Those characters are a Unicode Byte Order Mark (BOM). Should the http-equiv META data override the Content-Type? Should some special content sniffing detect a BOM in non-UTF-8 files and compensate? Reassigning to Browser/Parser.
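To illustrate the mismatch described above, here is a minimal Python sketch. The sample markup is made up for illustration; only the three BOM bytes and the two encodings come from this report.

```python
import codecs

# A UTF-8 document that starts with a byte order mark (bytes EF BB BF).
# The markup itself is a hypothetical stand-in for the reported page.
body = codecs.BOM_UTF8 + "<html><title>\u4e2d\u6587</title></html>".encode("utf-8")

# Decoded the way the Content-Type header asks (ISO-8859-1), each BOM byte
# becomes one visible Latin-1 character at the very start of the text.
as_latin1 = body.decode("iso-8859-1")
print(repr(as_latin1[:3]))

# Decoded as UTF-8 with Python's BOM-aware codec, the mark is consumed
# and the text starts directly with the markup.
as_utf8 = body.decode("utf-8-sig")
print(repr(as_utf8[:6]))
```

The first three characters of `as_latin1` are U+00EF, U+00BB and U+00BF, which matches the symbols the reporter saw.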
Assignee: pinkerton → parser
Severity: minor → normal
Status: UNCONFIRMED → NEW
Component: Page Layout → HTML: Parser
Ever confirmed: true
Product: Camino → Browser
Summary: the page content did not showed up until the proper encoding was selected → UTF-8 documents containing Byte Order Mark (BOM), misdelivered as ISO-8859-1, fail to display
Version: unspecified → Trunk
Comment 2•20 years ago
(In reply to comment #1)
> Should the http-equiv META data override the Content-Type?

See <http://www.w3.org/TR/html4/charset.html#h-5.2.2>: the Content-Type should have preference.

Reporter, a workaround is to specify the UTF-8 charset with the View->Character Coding menu, which overrides everything. Note: the auto-detector didn't work either.
Comment 3•20 years ago
invalid, http charset headers override everything else.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID
Comment 4•20 years ago
actually... maybe not... shouldn't we show the frameset? jshin?
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Comment 5•20 years ago
<body> tags in HTML are optional. Once we hit text, we automatically open a <body>. Once a <body> is open, <frameset> is no longer allowed (and the parser drops it). Evang.
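The rule described in this comment can be modelled with a toy sketch in Python. This is not Gecko's parser or a conforming HTML tree builder; `toy_tree_builder` and the sample markup are hypothetical, and the model only tracks whether a body has been opened and drops frameset-related tags afterwards.

```python
import re

def toy_tree_builder(markup):
    """Return the tokens a very simplified parser would keep."""
    body_open = False
    kept = []
    # Split the input into tags and runs of text.
    for token in re.findall(r"<[^>]+>|[^<]+", markup):
        if token.startswith("<"):
            name = token.strip("</>").split()[0].lower()
            if name in ("frameset", "frame") and body_open:
                continue  # frameset content is dropped once a body is open
            if name == "body":
                body_open = True
            kept.append(token)
        elif token.strip():
            # Non-whitespace text implicitly opens a <body>.
            body_open = True
            kept.append(token)
    return kept

frames = "<html><frameset><frame src='a.html'></frameset></html>"
stray = b"\xef\xbb\xbf".decode("iso-8859-1")  # the BOM read as ISO-8859-1

print(toy_tree_builder(frames))          # frameset survives
print(toy_tree_builder(stray + frames))  # stray text opens body first
```

In this model the three stray characters count as text, so the body opens before the `<frameset>` is seen, which is consistent with the page showing only those characters.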
Assignee: parser → other
Status: REOPENED → NEW
Component: HTML: Parser → Other
Product: Browser → Tech Evangelism
QA Contact: other
Version: Trunk → unspecified
Comment 6•18 years ago
Though I may be completely off, shouldn't one also be able to put byte-order-mark characters before the prolog of an XML document and have them treated as byte-order marks rather than shown as characters?
Comment 7•18 years ago
Comment 6 is correct, but I don't see what application it has to this bug report. If you are asking whether the encoding determined by the byte-order mark should take precedence over the encoding specified in HTTP headers, the answer is no. However the BOM may be used to determine the encoding when none is specified or to identify the endianness of UTF-16, and we do this.
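The precedence described in this comment can be sketched roughly as follows. This is an illustrative Python model, not Mozilla's code: the names `sniff_bom` and `resolve_encoding` are hypothetical, the ISO-8859-1 fallback is an assumption, and UTF-32 BOMs are ignored for brevity.

```python
import codecs

# BOM byte sequences mapped to the encoding they signal.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

def sniff_bom(data):
    """Return the encoding signalled by a leading BOM, or None."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    return None

def resolve_encoding(data, http_charset=None, fallback="iso-8859-1"):
    """Header first; the BOM only fills in what the header leaves open."""
    bom_encoding = sniff_bom(data)
    if http_charset is None:
        # No declared encoding: the BOM (if any) decides.
        return bom_encoding or fallback
    declared = http_charset.lower()
    if declared == "utf-16" and bom_encoding in ("utf-16-le", "utf-16-be"):
        # Declared UTF-16 without endianness: the BOM picks the byte order.
        return bom_encoding
    # Otherwise the HTTP header wins, even if a BOM disagrees.
    return declared
```

For the page in this bug, `resolve_encoding(codecs.BOM_UTF8 + b"...", "ISO-8859-1")` returns `"iso-8859-1"`, so under this policy the BOM bytes end up rendered as text.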
Comment 8•13 years ago
INCOMPLETE due to lack of activity since the end of 2009. If someone is willing to investigate the issues raised in this bug to determine whether they still exist, *and* work with the site in question to fix any existing issues, please feel free to re-open and assign to yourself. Sorry for the bugspam; filter on "NO MORE PRE-2010 TE BUGS" to remove.
Status: NEW → RESOLVED
Closed: 20 years ago → 13 years ago
Resolution: --- → INCOMPLETE
Comment 9•13 years ago
(In reply to comment #2)
> (In reply to comment #1)
> > Should the http-equiv META data override the Content-Type?
>
> See <http://www.w3.org/TR/html4/charset.html#h-5.2.2>: the Content-Type
> should have preference.

This conclusion is wrong, because:

* HTML4 did not discuss the UTF-8 BOM.
* HTML5 unfortunately still says the same; however, bugs have been filed.
* IE and WebKit give the BOM higher priority than the Content-Type header.
* The IE/WebKit behaviour is in tune with XML 1.0.
* The Firefox/Opera behaviour triggers Quirks Mode in HTML and the Yellow Screen of Death in XML; those errors are not seen in WebKit or IE.

Summary: for the encoding, the BOM should have higher priority than the HTTP header.

Test case: http://malform.no/testing/html5/bom/
HTML5 bug: http://www.w3.org/Bugs/Public/show_bug.cgi?id=12897
XML spec: http://www.w3.org/TR/xml/#sec-guessing-with-ext-info

> Reporter, a workaround is to specify the UTF-8 charset with the
> View->Character Coding menu, which overrides everything. Note: the
> auto-detector didn't work either.

This is yet another issue: WebKit and IE do not allow you to override the encoding whenever the encoding is UTF-8 *and* there is a UTF-8 Byte Order Mark.
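The two policies contrasted in this comment can be compared side by side with a small Python sketch. The function `effective_encoding`, its `bom_first` flag, and the sample page are hypothetical; no browser's actual code is being quoted.

```python
import codecs

def effective_encoding(data, http_charset, bom_first):
    """Pick a codec name under one of the two policies discussed here."""
    if bom_first and data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" is Python's UTF-8 codec that drops a leading BOM.
        return "utf-8-sig"
    return http_charset.lower()

# A made-up page: UTF-8 BOM followed by two Chinese characters,
# served with the header value from this bug.
page = codecs.BOM_UTF8 + "\u4e2d\u6587".encode("utf-8")
header = "ISO-8859-1"

# Header-first policy: the BOM bytes decode to visible Latin-1 characters.
header_first_text = page.decode(effective_encoding(page, header, bom_first=False))

# BOM-first policy: the document decodes as UTF-8 and the mark disappears.
bom_first_text = page.decode(effective_encoding(page, header, bom_first=True))

print(repr(header_first_text[:3]))
print(repr(bom_first_text))
```

Under the header-first policy the text starts with the three stray characters from the original report; under the BOM-first policy it is just the two intended characters.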
Updated•9 years ago
Product: Tech Evangelism → Tech Evangelism Graveyard