Closed Bug 214952 Opened 22 years ago Closed 19 years ago

content-type disregarded causing text corruption on page

Categories

(Core :: Internationalization, enhancement)

RESOLVED INVALID

People

(Reporter: u32858, Assigned: smontagu)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312

Even though I have <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> in my HTML headers, Mozilla disregards this and uses the Apache default, which is a terrible mistake in my opinion.

Reproducible: Always

Steps to Reproduce:
1. Go to any website where the webmaster specifies the charset in the HTML.
2. My home page http://jguk.org/ had all HTML sent as iso-8859-1. I will try to get the sysadmin to change it to UTF-8 (the same as my HTML); however, it should not be necessary for every other page to be
3.

Actual Results: Japanese, Russian and other text is corrupted because of incorrect charset usage by Mozilla.

Expected Results: Correct text display.

I checked, and it is still present in Mozilla 1.4 too. Perhaps this has already been submitted. I hope there would be at least a pref override. Perhaps Mozilla is just following the HTTP spec.
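The corruption described in the report is what typically happens when UTF-8 bytes are decoded with a single-byte charset. A minimal Python sketch, assuming a made-up sample string and a hypothetical helper named decode_as (neither comes from the report):

```python
def decode_as(raw: bytes, charset: str) -> str:
    """Decode page bytes using the given charset label."""
    return raw.decode(charset)


# Hypothetical sample text, saved by the page author as UTF-8.
original = "日本語 Русский"
raw = original.encode("utf-8")

# Charset announced in the HTTP header in the reported setup.
as_sent_by_server = decode_as(raw, "iso-8859-1")
# Charset declared in the page's META tag.
as_declared_in_meta = decode_as(raw, "utf-8")

print(as_declared_in_meta == original)  # META charset recovers the text
print(as_sent_by_server == original)    # header charset does not
```

ISO-8859-1 maps every byte to some character, so the wrong decoding never raises an error. It silently yields one garbled character per byte.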
The URL appears to be fixed now. It's serving as UTF-8, the same as the META tag. Resolving WFM. This would not be a bug in any case. The charset specified by the server takes precedence over the META tag. Check out http://www.webstandards.org/learn/askw3c/dec2002.html for a good tutorial reference.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WORKSFORME
The server was changed, but many servers exhibit this problem. There should be a way for Mozilla to disregard the HTTP header charset in favour of the charset specified in the HTML head. That URL may be broken again when the sysadmin changes the httpd config. There are other example sites in the meantime. I have changed this to ENHANCEMENT to indicate that I believe it should be added at some point.

JG
Severity: normal → enhancement
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Re-resolving WFM. You can already do that. Go to View - Character Coding and you can force the charset to whatever you want.
Status: REOPENED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → WORKSFORME
> ------- Additional Comments From bugzilla@accessibleinter.net 2003-08-03 16:38 -------
> Re-resolving WFM.
>
> You can already do that. Go to View - Character Coding and you can force the
> charset to whatever you want.

You have misunderstood the issue here. Many HTTP servers are sending an incorrect charset in the header. In this case, it is necessary for web browsers to have an option to rely on the HTML charset. (Currently this is disregarded if an HTTP charset is present.)

JG
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
The HTML spec is very clear that an explicit charset sent over HTTP overrides any specified using the META element: http://www.w3.org/TR/html4/charset.html#h-5.2.2 Resolving as invalid.
Status: REOPENED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → INVALID
Verified as invalid.
Status: RESOLVED → VERIFIED
*** Bug 255738 has been marked as a duplicate of this bug. ***
(In reply to comment #5)
> The HTML spec is very clear that an explicit charset sent over HTTP overrides
> any specified using the META element:

I disagree. The spec says (not in the 'to sum up' list which is less detailed): "The META declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters (at least until the META element is parsed). META declarations should appear as early as possible in the HEAD element."

This means (as I interpret it anyway) that META should be used when the browser "sees" it as such (e.g. not in UCS-16). Specifically, META may be used when no Content-Type is specified by the server (so ISO-8859-1 is assumed) or when ISO-8859-1 is specified by the server.

Please consider re-opening this bug.
(In reply to comment #8)
> (In reply to comment #5)
> > The HTML spec is very clear that an explicit charset sent over HTTP overrides
> > any specified using the META element:
>
> I disagree. The spec says (not in the 'to sum up' list which is less detailed):

What are you disagreeing with? What you wrote is exactly what Mozilla does, what Simon meant, and what the spec says.
Jungshik (is this your first name? I hope so...), this bug was marked as invalid because it was claimed that the HTML spec says that if the server specifies a charset, it always overrides the one specified in the META element. However, reading the spec I believe that is not true, i.e. the server-specified charset does not always override the one specified in the META element and thus the bug is not invalid (since Mozilla _does_ always override the META element when the server specifies a charset). A common scenario is that Apache servers installed by Linux distributions are configured with the AddDefaultCharset on option, due to which they send Content-Type: text/html; charset=ISO-8859-1 even for webpages which are definitely not ISO-8859-1 (e.g. ISO-8859-8-I), and are marked with the appropriate META tag. Now you could say: "well, let people fix their Apache configurations", but not everyone has administrator access on the box on which the web server is running.
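For reference, the header-level charset discussed in this comment is just a parameter on the Content-Type value. A rough Python sketch of reading it, using the standard library's email.message.Message; the helper name header_charset is hypothetical and this is not how Mozilla parses headers:

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()


# Header as sent by a server configured with a default charset.
print(header_charset("text/html; charset=ISO-8859-1"))
# Header without a charset parameter, leaving the decision to the document.
print(header_charset("text/html"))
```

Only the first case triggers the precedence question argued about in this bug. In the second case there is no header charset to conflict with the META tag.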
(In reply to comment #10)
> because it was claimed that the HTML spec says that if the server specifies a
> charset, it always overrides the one specified in the META element. However,

It does, period. There's absolutely no doubt about it. You don't need to be a server admin to specify the charset emitted by your Apache server. See http://www.w3.org/International/questions/qa-htaccess-charset

> Specifically, META may be used when no Content-Type is specified by the server (so ISO-8859-1 is assumed)

In the absence of a charset in the HTTP header, the value specified in META should be respected.

> or when ISO-8859-1 is specified by the server.

No, this is not true. When ISO-8859-1 is explicitly specified in the HTTP header, it doesn't matter what's specified in META.
A server might not always read .htaccess, or you may not have direct access to the folder and are only sending your HTML files via some web interface. And the mere (partial) availability of a workaround for a problem does not make it non-existent.

"In absence of charset in HTTP header, the value specified in META should be respected." - this is true.
"In presence of charset in HTTP header, the value specified in META should not be respected." - this is not true.

I again refer you to the spec. It says the META tag is intended "To address server or configuration limitations" - like when you're limited in configuring what the server sends as the default content type. And when the spec says "The META declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters" we note that it is not referring only to what the server sent as the content-type, but to what the sum of the client and server configurations achieves. That is to say, if Mozilla has been led to believe for any reason that the charset is, say, windows-1255 (in which the 128 ASCII bytes stand for ASCII characters IIRC), and it starts parsing the HTML and stumbles upon a meta tag which says charset=something else, it should switch to something else.
(In reply to comment #12)
> type. And when the spec says "The META declaration must only be used when the
> character encoding is organized such that ASCII-valued bytes stand for ASCII
> characters"

Nobody except for you would interpret the above as you do. The above just means that the charset should be read from META only if an HTML document is in one of the ASCII-compatible encodings (there are encodings that are not, e.g. UTF-16 and UTF-32). It says NOTHING about the __relative__ precedence of the HTTP header and META.

> That is to say, if Mozilla has been led to believe for any reason that the
> charset is, say, windows-1255 (in which the 128 ASCII bytes stand for ASCII
> characters IIRC), and it starts parsing the HTML and stumbles upon a meta tag
> which says charset=something else, it should switch to something else.

What on earth led you to believe that 'it should switch to something else'? Your interpretation has just one single value, which is that it's unique. Unfortunately, being unique doesn't imply that it's correct.

This is the last comment I'm going to make about this issue. I can't keep spamming others on Cc. Nothing you have written or will write here will change anything. If you're not yet convinced, why don't you ask the spec authors, members of the W3C I18N WG, or even members of the W3C TAG (which has expressed its strong opinion that the charset specified in the HTTP header MUST be given a higher priority than what's specified in META)? Not everyone likes that (some people have tried to persuade the W3C TAG to change its position), but their position has been firm, which is why the spec remains that way.
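The "ASCII-compatible" distinction drawn in the comment above can be checked roughly in Python. This is only a sketch under a narrow assumption: an encoding counts as compatible here if it encodes the 128 ASCII characters to the same single bytes. The helper name is_ascii_compatible is hypothetical.

```python
# All 128 ASCII characters, U+0000 through U+007F.
ASCII_TEXT = "".join(chr(i) for i in range(128))


def is_ascii_compatible(charset: str) -> bool:
    """Rough check: does the charset encode ASCII characters as the same bytes?"""
    try:
        return ASCII_TEXT.encode(charset) == ASCII_TEXT.encode("ascii")
    except (LookupError, UnicodeEncodeError):
        # Unknown charset label, or a charset that cannot encode some ASCII character.
        return False


for label in ("utf-8", "iso-8859-1", "windows-1255", "utf-16", "utf-32"):
    print(label, is_ascii_compatible(label))
```

In an encoding such as UTF-16, the bytes of a META tag do not look like ASCII at all, so a parser that has not yet determined the encoding cannot locate the tag by scanning for ASCII bytes.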
*** Bug 319959 has been marked as a duplicate of this bug. ***
(In reply to comment #13)
> Nobody except for you would interpret the above as you do.

Well, the fact this bug gets duped means more people interpret it the way I do.

Anyway, I was searching the W3C TAG archives today, and I stumbled upon the following, from "Authoritative Metadata - Draft TAG Finding 05 December 2005":

"Specifications MUST NOT work against the Web architecture by requiring or suggesting that a recipient override authoritative metadata without user consent."

"Servers which generate representations MUST NOT generate the charset parameter unless there is certainty that the headers are correct. When correct, this information can be used by non-XML processors to determine authoritatively the character encoding of the XML MIME entity."

"Now, when mozilla has good reason to believe that the meta charset is correct rather than the MIME header charset, it again seems to me that what should be done is respect the meta rather than the MIME."

Now, although the document says: "As described above, inconsistency between representation data and metadata is an error. However, the tendency for some agents to attempt silent recovery from such errors is also an error."

It goes on to say: "Web agents SHOULD have a configuration option that enables the display or logging of detected errors." and "Users benefit from clients that allow different configurations for handling hints, including:
* Query the server, and when there is an inconsistency, choose the authoritative metadata, or
* Query the server, and when there is an inconsistency, prompt the user for instructions on how to proceed."

Now the second option would be the minimum I could live with. I would say: override the MIME charset but have a bar indicating the error (like the one you get when you try to install an extension from a non-whitelisted website in Firefox).

Anyway, if someone wants to close this, it's a WONTFIX, not an INVALID.

Finally, I think the document in general is completely ridiculous in claiming that intentional misinterpretation of obvious charset specification errors is one of the "Web architecture principles that promote shared understanding and security".
Status: VERIFIED → REOPENED
Resolution: INVALID → ---
> the fact this bug gets duped means more people interpret it the way I do.

Sorry to disappoint you, but that was just me pasting the wrong bug number into the duplicate field. I intended to dup against bug 27403, not this one.

Regarding this bug: What has a draft on "Authoritative Metadata" to do with this problem? What is relevant here is the HTML specification (see comment 5), which clearly states:

"To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):
1. An HTTP "charset" parameter in a "Content-Type" field.
2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
3. The charset attribute set on an element that designates an external resource."

Now, which part of that do you not understand?

> Anyway, if someone wants to close this, it's a WONTFIX, not an INVALID.

That is a decision the Mozilla developers make, not you. You may not like this, but Bugzilla is not a democracy. This is INVALID, as in "not a bug".
Status: REOPENED → RESOLVED
Closed: 22 years ago → 19 years ago
Resolution: --- → INVALID
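The priority order quoted from the HTML 4 specification in the resolving comment can be written down as a small Python sketch. The function name choose_encoding and its parameters are hypothetical, and this is not Mozilla's code; it only illustrates why a header charset wins whenever one is present:

```python
from typing import Optional


def choose_encoding(http_charset: Optional[str],
                    meta_charset: Optional[str],
                    link_charset: Optional[str] = None) -> Optional[str]:
    """Pick a charset following the quoted priority list, highest first.

    1. HTTP "charset" parameter in the Content-Type field
    2. META declaration with http-equiv="Content-Type"
    3. charset attribute on an element designating an external resource
    Returns None when nothing is declared, leaving any fallback to the caller.
    """
    for candidate in (http_charset, meta_charset, link_charset):
        if candidate:
            return candidate
    return None


# The situation from this bug: the header says ISO-8859-1, META says UTF-8.
print(choose_encoding("ISO-8859-1", "UTF-8"))
# No charset in the header, so the META declaration is used.
print(choose_encoding(None, "UTF-8"))
```

Under this ordering the META declaration only matters when the server sends no charset, which is the behaviour the resolution describes as correct.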
(In reply to comment #16)
> What has a draft on "Authoritative Metadata" to do with this problem?

Jungshik Shin referred to the W3C TAG's expressed opinion.

> Now, which part of that do you not understand?

The part where I'm getting mis-decoded gibberish on my screen.
So as not to be too acrimonious, here's a constructive suggestion: bug 320024.
Hmm bug 238488 is a dupe of this