Closed
Bug 214952
Opened 22 years ago
Closed 19 years ago
content-type disregarded causing text corruption on page
Categories
(Core :: Internationalization, enhancement)
RESOLVED
INVALID
People
(Reporter: u32858, Assigned: smontagu)
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312
Even though I have <meta http-equiv="Content-Type" content="text/html;
charset=UTF-8"> in my HTML head, Mozilla disregards this and uses the Apache
default, which is a terrible mistake in my opinion.
Reproducible: Always
Steps to Reproduce:
1. Go to any website where the webmaster specifies the charset in the html.
2. My home page http://jguk.org/ had all html sent as iso-8859-1, I will try and
get sysadmin to change to UTF-8 (the same as my HTML) however, it should not be
necessary for every other page to be
3.
Actual Results:
Japanese, Russian, and other text is corrupted because Mozilla uses the
incorrect charset
Expected Results:
Correct text display.
I checked, it is still present in moz 1.4 too.
Perhaps this has already been submitted. I hope there will be at least a pref
override. Perhaps Mozilla is just following the HTTP spec.
Comment 1•22 years ago
The URL appears to be fixed now. It's serving as UTF-8, the same as the META
tag. Resolving WFM.
This would not be a bug in any case. The charset specified by the server takes
precedence over the META tag.
Check out http://www.webstandards.org/learn/askw3c/dec2002.html for a good
tutorial reference.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WORKSFORME
The server was changed, but many servers exhibit this problem. There should be a
way for Mozilla to disregard the HTTP header charset in favour of the charset
specified in the HTML head.
That URL may be broken again when the sysadmin changes httpd config. There are
other example sites in the meantime.
I have changed to ENHANCEMENT to indicate I believe it should be added at some
point.
JG
Severity: normal → enhancement
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 3•22 years ago
Re-resolving WFM.
You can already do that. Go to View - Character Coding and you can force the
charset to whatever you want.
Status: REOPENED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → WORKSFORME
> ------- Additional Comments From bugzilla@accessibleinter.net 2003-08-03
16:38 -------
> Re-resolving WFM.
>
> You can already do that. Go to View - Character Coding and you can force the
> charset to whatever you want.
You have misunderstood the issue here. Many HTTP servers send an incorrect
charset in the header. In this case, web browsers need an
option to rely on the HTML charset. (Currently this is disregarded if an HTTP
charset is present.)
JG
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
The HTML spec is very clear that an explicit charset sent over HTTP overrides
any specified using the META element:
http://www.w3.org/TR/html4/charset.html#h-5.2.2
Resolving as invalid.
Status: REOPENED → RESOLVED
Closed: 22 years ago → 22 years ago
Resolution: --- → INVALID
Comment 7•20 years ago
*** Bug 255738 has been marked as a duplicate of this bug. ***
Comment 8•20 years ago
(In reply to comment #5)
> The HTML spec is very clear that an explicit charset sent over HTTP overrides
> any specified using the META element:
I disagree. The spec says (not in the 'to sum up' list which is less detailed):
"The META declaration must only be used when the character encoding is organized
such that ASCII-valued bytes stand for ASCII characters (at least until the META
element is parsed). META declarations should appear as early as possible in the
HEAD element."
This means (as I interpret it anyway) that META should be used when the browser
"sees" it as such (e.g. not in UTF-16). Specifically, META may be used when no
Content-Type is specified by the server (so ISO-8859-1 is assumed) or when
ISO-8859-1 is specified by the server. Please consider re-opening this bug.
Comment 9•20 years ago
(In reply to comment #8)
> (In reply to comment #5)
> > The HTML spec is very clear that an explicit charset sent over HTTP overrides
> > any specified using the META element:
>
> I disagree. The spec says (not in the 'to sum up' list which is less detailed):
What are you disagreeing with? What you wrote is exactly what Mozilla does, what
Simon meant, and what the spec says.
Comment 10•20 years ago
Jungshik (is this your first name? I hope so...), this bug was marked as invalid
because it was claimed that the HTML spec says that if the server specifies a
charset, it always overrides the one specified in the META element. However,
reading the spec I believe that is not true, i.e. the server-specified charset
does not always override the one specified in the META element and thus the bug
is not invalid (since Mozilla _does_ always override the META element when the
server specifies a charset).
A common scenario is that Apache servers installed by Linux distributions are
configured with the
AddDefaultCharset on
option, due to which they send
Content-Type: text/html; charset=ISO-8859-1
even for webpages which are definitely not ISO-8859-1 (e.g. ISO-8859-8-I), and
are marked with the appropriate META tag. Now you could say: "well, let people
fix their Apache configurations", but not everyone has administrator access on
the box on which the web server is running.
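A minimal sketch of the corruption this scenario produces, assuming a page authored in Hebrew and saved as ISO-8859-8 bytes. The sample word is an assumption chosen for illustration, and Python's "iso-8859-8" codec stands in for ISO-8859-8-I, which as far as I know uses the same byte mapping and differs only in text ordering.

```python
# Hypothetical page content, stored by the author as ISO-8859-8 bytes.
original = "שלום"
raw_bytes = original.encode("iso-8859-8")

# What is displayed when the "charset=ISO-8859-1" header is trusted.
misdecoded = raw_bytes.decode("iso-8859-1")

# What is displayed when the bytes are decoded as the author intended.
correct = raw_bytes.decode("iso-8859-8")

print(repr(misdecoded))  # accented Latin letters instead of Hebrew
print(repr(correct))
```

The bytes on the wire are identical in both cases; only the label used to decode them differs.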
Comment 11•20 years ago
(In reply to comment #10)
> because it was claimed that the HTML spec says that if the server specifies a
> charset, it always overrides the one specified in the META element. However,
It does, period. There's absolutely no doubt about it. You don't need to be a
server admin to specify the charset emitted by your Apache server. See
http://www.w3.org/International/questions/qa-htaccess-charset
> Specifically, META may be used when no Content-Type is specified by the server
> (so ISO-8859-1 is assumed)
In absence of charset in HTTP header, the value specified in META should be
respected.
> or when ISO-8859-1 is specified by the server.
No, this is not true. When ISO-8859-1 is explicitly specified in HTTP header,
it doesn't matter what's specified in META.
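The two cases above hinge on whether the Content-Type header carries a charset parameter at all. A rough sketch of that check, using Python's standard email.message module as a stand-in header parser (the helper name http_charset is hypothetical, and this is not how Mozilla implements it):

```python
from email.message import Message
from typing import Optional


def http_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None when
    # the header has no charset parameter.
    return msg.get_content_charset()


print(http_charset("text/html; charset=ISO-8859-1"))  # explicit: header wins
print(http_charset("text/html"))                      # absent: META may apply
```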
Comment 12•20 years ago
A server might not always read .htaccess ; or you may not have direct access to
the folder and are only sending your HTML files via some web interface. And the
mere (partial) availability of a workaround for a problem does not make it
non-existent.
"In absence of charset in HTTP header, the value specified in META should be
respected." - this is true.
"In presence of charset in HTTP header, the value specified in META should not
be respected." - this is not true. I again refer you to the spec. It says the
META tag is intended "To address server or configuration limitations" - like
when you're limited in configuring what the server sends as the default content
type. And when the spec says "The META declaration must only be used when the
character encoding is organized such that ASCII-valued bytes stand for ASCII
characters" we note that it is not referring only to what the server sent as the
content-type, but what the sum of the client and server configurations achieve.
That is to say, if Mozilla has been led to believe for any reason that the
charset is, say, windows-1255 (in which the 128 ASCII bytes stand for ASCII
characters IIRC), and it starts parsing the HTML and stumbles upon a meta tag
which says charset=something else, it should switch to something else.
Comment 13•20 years ago
(In reply to comment #12)
> type. And when the spec says "The META declaration must only be used when the
> character encoding is organized such that ASCII-valued bytes stand for ASCII
> characters"
Nobody except for you would interpret the above as you do. The above just means
that the charset should be read from META only if an HTML document is in one of the
ASCII-compatible encodings (there are encodings that are not, e.g. UTF-16 and
UTF-32). It says NOTHING about the __relative__ precedence of the HTTP header
and META.
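To illustrate what "ASCII-compatible" means here, the following sketch checks whether the ASCII bytes of a META tag survive unchanged in a given encoding, which is what a parser scanning raw bytes for the tag would rely on. The helper name and the list of encodings are assumptions for illustration only.

```python
MARKUP = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'


def meta_visible_as_ascii(encoding: str) -> bool:
    """True if the tag's ASCII bytes appear unchanged in this encoding."""
    return MARKUP.encode("ascii") in MARKUP.encode(encoding)


for enc in ("iso-8859-1", "windows-1255", "utf-8", "utf-16-le", "utf-32-le"):
    print(enc, meta_visible_as_ascii(enc))
```

In UTF-16 and UTF-32 every character occupies more than one byte, so the ASCII byte sequence of the tag never appears contiguously.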
> That is to say, if Mozilla has been led to believe for any reason that the
> charset is, say, windows-1255 (in which the 128 ASCII bytes stand for ASCII
> characters IIRC), and it starts parsing the HTML and stumbles upon a meta tag
> which says charset=something else, it should switch to something else.
What on earth led you to believe that 'it should switch to something else'?
Your interpretation has just one single value, which is that it's unique.
Unfortunately, being unique doesn't imply that it's correct. This is the last
comment I'm gonna make about this issue. I can't keep spamming others on Cc.
Nothing you have written or will write here will change anything. If you're
not yet convinced, why don't you ask the spec authors, members of the W3C I18N WG, or
even members of the W3C TAG (which has expressed its strong opinion that a charset
specified in the HTTP header MUST be given a higher priority than what's specified
in META)? Not everyone likes that (some people have tried to persuade the W3C TAG to
change its position), but their position has been firm, which is why the spec
remains that way.
Comment 14•19 years ago
*** Bug 319959 has been marked as a duplicate of this bug. ***
Comment 15•19 years ago
(In reply to comment #13)
> Nobody except for you would interpret the above as you do.
Well, the fact this bug gets duped means more people interpret it the way I do.
Anyway, I was searching the W3C TAG archives today, and I stumbled upon the following, from "Authoritative Metadata - Draft TAG Finding 05 December 2005":
"Specifications MUST NOT work against the Web architecture by requiring or suggesting that a recipient override authoritative metadata without user consent."
"Servers which generate representations MUST NOT generate the charset parameter unless there is certainty that the headers are correct. When correct, this information can be used by non-XML processors to determine authoritatively the character encoding of the XML MIME entity."
"Now, when mozilla has good reason to believe that the meta charset is correct rather than the MIME header charset, it again seems to me that what should be done is respect the meta rather than the MIME."
Now, although the document says:
"As described above, inconsistency between representation data and metadata is an error. However, the tendency for some agents to attempt silent recovery from such errors is also an error."
It goes on to say:
"Web agents SHOULD have a configuration option that enables the display or logging of detected errors."
and "Users benefit from clients that allow different configurations for handling hints, including:
* Query the server, and when there is an inconsistency, choose the authoritative metadata, or
* Query the server, and when there is an inconsistency, prompt the user for instructions on how to proceed."
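Either option in the list above first requires detecting the inconsistency. A rough sketch of such a check, assuming the document text has already been decoded well enough to read its ASCII markup; the class and function names are hypothetical and this is not Mozilla's code.

```python
from email.message import Message
from html.parser import HTMLParser
from typing import Optional


class MetaCharsetFinder(HTMLParser):
    """Records the charset of the first <meta http-equiv="Content-Type">."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() == "content-type":
            msg = Message()
            msg["Content-Type"] = attr.get("content", "")
            self.charset = msg.get_content_charset()


def charset_conflict(header_charset: Optional[str], html_text: str) -> Optional[str]:
    """Return a message if the header and META charsets disagree, else None."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    meta = finder.charset
    if header_charset and meta and header_charset.lower() != meta.lower():
        return f"HTTP header says {header_charset}, META says {meta}"
    return None


page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=UTF-8"></head></html>')
print(charset_conflict("ISO-8859-1", page))
```

A user agent could log the returned message or use it to prompt the user, while still decoding with the authoritative header value.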
Now the second option would be the minimum I could live with. I would say: override the MIME charset but have a bar indicating the error (like the one you get when you try to install an extension from a non-whitelisted website in firefox).
Anyway, if someone wants to close this, it's a WONTFIX, not an INVALID.
Finally, I think the document in general is completely ridiculous in claiming that intentional misinterpretation of obvious charset specification errors is one of the "Web architecture principles that promote shared understanding and security".
Status: VERIFIED → REOPENED
Resolution: INVALID → ---
Comment 16•19 years ago
> the fact this bug gets duped means more people interpret it the way I do.
Sorry to disappoint you, but that was just me pasting the wrong bug number into the duplicate field. I intended to dup against bug 27403, not this one.
Regarding this bug:
What has a draft on "Authoritative Metadata" to do with this problem? What is relevant here is the HTML specification (see comment 5), which clearly states:
"To sum up, conforming user agents must observe the following priorities
when determining a document's character encoding (from highest priority to
lowest):
1. An HTTP "charset" parameter in a "Content-Type" field.
2. A META declaration with "http-equiv" set to "Content-Type" and a value
set for "charset".
3. The charset attribute set on an element that designates an external
resource."
Now, which part of that do you not understand?
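The priority list quoted above can be sketched as a simple first-match lookup. The function name is hypothetical, and the ISO-8859-1 fallback is an assumption taken from earlier comments in this bug rather than from the quoted list.

```python
from typing import Optional


def pick_encoding(http_charset: Optional[str],
                  meta_charset: Optional[str],
                  link_charset: Optional[str],
                  fallback: str = "iso-8859-1") -> str:
    """Pick the first charset that is set, from highest to lowest priority."""
    # 1. HTTP "charset" parameter, 2. META declaration,
    # 3. charset attribute on the element that linked to the resource.
    for candidate in (http_charset, meta_charset, link_charset):
        if candidate:
            return candidate
    return fallback  # assumed default when nothing is declared


# Header present: META is ignored, which is the behaviour disputed here.
print(pick_encoding("iso-8859-1", "utf-8", None))
# Header absent: META is used.
print(pick_encoding(None, "utf-8", None))
```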
> Anyway, if someone wants to close this, it's a WONTFIX, not an INVALID.
That is a decision the Mozilla developers make, not you. You may not like this, but Bugzilla is not a democracy. This is INVALID, as in "not a bug".
Status: REOPENED → RESOLVED
Closed: 22 years ago → 19 years ago
Resolution: --- → INVALID
Comment 17•19 years ago
(In reply to comment #16)
>What has a draft on "Authoritative Metadata" to do with this problem?
Jungshik Shin referred to the W3C TAG's expressed opinion.
> Now, which part of that do you not understand?
The part where I'm getting mis-decoded gibberish on my screen.
Comment 18•19 years ago
So as not to be too acrimonious, here's a constructive suggestion: bug 320024.
Comment 19•17 years ago
Hmm bug 238488 is a dupe of this