Closed
Bug 50654
Opened 25 years ago
Closed 25 years ago
charset in Content-Type ignored
Categories
(Core :: Internationalization, defect, P3)
VERIFIED
FIXED
People
(Reporter: claus, Assigned: ftang)
Details
(Whiteboard: [nsbeta3+]fix in hand)
Mozilla M17 ignores the charset declaration within HTTP content-type headers.
Instead, it displays the page with the default character set.
Steps to reproduce:
On the page listed above, click on "Archiv". The (C) character in the bottom line will be displayed incorrectly unless you manually switch to UTF-8.
This is a clear violation of HTTP standards and will cause major damage to WWW
i18n if Mozilla is released with this bug.
Nothing to do with ActiveX wrapper control. Marking as INVALID.
Status: UNCONFIRMED → RESOLVED
Closed: 25 years ago
Resolution: --- → INVALID
| Reporter | ||
Comment 2•25 years ago
Hm, I'm quite sure I DID select Browser-General in the first place...
Status: RESOLVED → UNCONFIRMED
Component: ActiveX Wrapper → Browser-General
Resolution: INVALID → ---
Reassigning to module owner
Assignee: locka → asa
QA Contact: cpratt → doronr
Comment 4•25 years ago
Changing component and setting default owner.
Assignee: asa → gagan
Component: Browser-General → Networking
QA Contact: doronr → tever
hmm shouldn't this be "Internationalization" ?
Resembles bug 50893.
Comment 6•25 years ago
Right. This should be first looked at in i18n.
Confirming the bug and re-assigning
to ftang. Changing other fields also.
The other bug is about not dealing correctly with
document-based HTTP Meta equivalent charset info.
This one is about server-generated HTTP content-type
charset info, which is UTF-8.
The other bug is a regression from a few days ago,
but this seems to have been there for some time.
It seems, however, it may not be a wholesale failure of
HTTP charset handling. I have seen some pages displayed
correctly with HTTP charset info sent from a server.
It may be handling of specific characters -- in this
case the copyright symbol.
Assignee: gagan → ftang
QA Contact: tever → teruko
Updated•25 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
Updated•25 years ago
Component: Networking → Internationalization
Comment 7•25 years ago
With the 8/30/2000 Win32 build, the menu checkmark
seems to be wrong. Even when it is displaying a UTF-8 page
it stays at the default charset, e.g. ISO-8859-1.
The server for the URL provided by the original poster
is sending UTF-8 as the charset info.
| Assignee | ||
Comment 8•25 years ago
change the url to http://www.fachschaft.jura.uni-muenchen.de/archiv/
| Assignee | ||
Comment 9•25 years ago
This is an interesting bug.
The HTTP header returns
Content-Type: text/html; charset="utf-8"
(I use telnet www.fachschaft.jura.uni-muenchen.de 80 to connect and use
GET /archiv/ HTTP/1.0
to get the page)
Notice that it says
Content-Type: text/html; charset="utf-8"
but not
Content-Type: text/html; charset=utf-8
I first thought this was wrong, but after double-checking the HTTP 1.1 spec
http://www.cis.ohio-state.edu/htbin/rfc/rfc2068.html , I found that it says
on [Page 16]:
quoted-string = ( <"> *(qdtext) <"> )
qdtext = <any TEXT except <">>
and on page 24:
3.7 Media Types
HTTP uses Internet Media Types in the Content-Type (section 14.18)
and Accept (section 14.1) header fields in order to provide open and
extensible data typing and type negotiation.
media-type = type "/" subtype *( ";" parameter )
type = token
subtype = token
Parameters may follow the type/subtype in the form of attribute/value
pairs.
parameter = attribute "=" value
attribute = token
value = token | quoted-string
So this means it is OK to use charset="utf-8".
We need to change nsHTMLDocument.cpp to fix it.
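To make the grammar above concrete, here is a minimal standalone sketch (not Mozilla code) that checks whether a parameter value is a token or a quoted-string as RFC 2068 defines them. The function names IsToken, IsSimpleQuotedString, and IsValidParameterValue are hypothetical, and the quoted-string check is deliberately simplified: it only rejects a bare <"> between the delimiters and does not handle quoted-pair or validate TEXT.

```cpp
#include <string>

// RFC 2068 section 2.2: token = 1*<any CHAR except CTLs or tspecials>.
// Hypothetical helper, written for illustration only.
bool IsToken(const std::string& s) {
  static const std::string kTspecials = "()<>@,;:\\\"/[]?={} \t";
  if (s.empty()) return false;
  for (unsigned char c : s) {
    if (c <= 31 || c >= 127) return false;  // CTLs (0-31, 127) and non-ASCII
    if (kTspecials.find(static_cast<char>(c)) != std::string::npos) return false;
  }
  return true;
}

// Simplified quoted-string check: opening and closing <"> with no bare <">
// in between. Assumption: quoted-pair ("\" CHAR) is not handled here.
bool IsSimpleQuotedString(const std::string& s) {
  if (s.size() < 2 || s.front() != '"' || s.back() != '"') return false;
  return s.find('"', 1) == s.size() - 1;
}

// value = token | quoted-string
bool IsValidParameterValue(const std::string& s) {
  return IsToken(s) || IsSimpleQuotedString(s);
}
```

Under this sketch both utf-8 and "utf-8" are accepted as parameter values, which matches the reading of the spec above.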
Status: NEW → ASSIGNED
Keywords: nsbeta3
Comment 10•25 years ago
Actually we support the server-sent HTTP charset names like
"UTF-8" in Communicator 4.75 and the above page works.
Comment 11•25 years ago
I have NS-internal test cases for UTF-8 and "UTF-8" if you like.
| Assignee | ||
Comment 12•25 years ago
It should be easy to fix; we need to change nsHTMLDocument.cpp and
nsXMLDocument.cpp. I estimate a total of 2 hours of debugging, coding, and engineer
testing (not QA testing) to fix it.
| Assignee | ||
Comment 13•25 years ago
I have a fix in hand. It took me a total of 20 minutes to write the code.
Whiteboard: fix in hand
| Assignee | ||
Comment 14•25 years ago
Here is the patch:
Index: nsHTMLDocument.cpp
===================================================================
RCS file: /cvsroot/mozilla/layout/html/document/src/nsHTMLDocument.cpp,v
retrieving revision 3.272
diff -c -2 -r3.272 nsHTMLDocument.cpp
*** nsHTMLDocument.cpp 2000/09/02 07:21:57 3.272
--- nsHTMLDocument.cpp 2000/09/06 19:26:58
***************
*** 550,556 ****
  {
    start += 8; // 8 = "charset=".length
!   PRInt32 end = contentType.FindCharInSet(";\n\r ", start );
!   if(kNotFound == end )
!     end = contentType.Length();
    nsAutoString theCharset;
    contentType.Mid(theCharset, start, end - start);
--- 550,564 ----
  {
    start += 8; // 8 = "charset=".length
!   PRInt32 end = 0;
!   if(PRUnichar('"') == contentType.CharAt(start)) {
!     start++;
!     end = contentType.FindCharInSet("\"", start );
!     if(kNotFound == end )
!       end = contentType.Length();
!   } else {
!     end = contentType.FindCharInSet(";\n\r ", start );
!     if(kNotFound == end )
!       end = contentType.Length();
!   }
    nsAutoString theCharset;
    contentType.Mid(theCharset, start, end - start);
and
Index: nsXMLDocument.cpp
===================================================================
RCS file: /cvsroot/mozilla/layout/xml/document/src/nsXMLDocument.cpp,v
retrieving revision 1.84
diff -c -2 -r1.84 nsXMLDocument.cpp
*** nsXMLDocument.cpp 2000/09/02 15:33:40 1.84
--- nsXMLDocument.cpp 2000/09/06 19:25:03
***************
*** 327,334 ****
  if(kNotFound != start)
  {
!   start += 8; // 8 = "charset=".length
!   PRInt32 end = contentType.FindCharInSet(";\n\r ", start );
!   if(kNotFound == end )
!     end = contentType.Length();
    nsAutoString theCharset;
    contentType.Mid(theCharset, start, end - start);
--- 327,342 ----
  if(kNotFound != start)
  {
!   start += 8; // 8 = "charset=".length
!   PRInt32 end = 0;
!   if(PRUnichar('"') == contentType.CharAt(start)) {
!     start++;
!     end = contentType.FindCharInSet("\"", start );
!     if(kNotFound == end )
!       end = contentType.Length();
!   } else {
!     end = contentType.FindCharInSet(";\n\r ", start );
!     if(kNotFound == end )
!       end = contentType.Length();
!   }
    nsAutoString theCharset;
    contentType.Mid(theCharset, start, end - start);
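For anyone who wants to try the logic of the patch outside the Mozilla tree, here is a rough standalone equivalent using std::string instead of nsString. It is only a sketch: ExtractCharset is a hypothetical name, and the search for "charset=" is case-sensitive here, which is an assumption of this example rather than something the patch shows.

```cpp
#include <string>

// Returns the charset parameter value from a Content-Type header value,
// or an empty string when no "charset=" is found.
// Hypothetical helper mirroring the quoted/unquoted branches of the patch.
std::string ExtractCharset(const std::string& contentType) {
  const std::string kKey = "charset=";
  std::string::size_type start = contentType.find(kKey);
  if (start == std::string::npos) return "";
  start += kKey.size();

  std::string::size_type end;
  if (start < contentType.size() && contentType[start] == '"') {
    // Quoted value: skip the opening quote and stop at the next quote.
    ++start;
    end = contentType.find('"', start);
  } else {
    // Unquoted value: stop at a parameter separator or whitespace.
    end = contentType.find_first_of(";\n\r ", start);
  }
  // No terminator found: take everything up to the end of the header value.
  if (end == std::string::npos) end = contentType.size();
  return contentType.substr(start, end - start);
}
```

With this sketch, both `text/html; charset="utf-8"` and `text/html; charset=utf-8` yield utf-8, and an unterminated quote falls back to the end of the string, as in the patch.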
| Reporter | ||
Comment 15•25 years ago
Hm, for complete HTTP/1.1 compliance, you would also have to handle headers such as:
Content-Type: text/html; charset="u\t\f-8"
as quoted-string allows quoted-pair, i.e. "\" CHAR
So besides the missing decoding of "\x", a FindCharInSet("\"", ...) is actually not enough, as the <"> might actually be part of a <\"> sequence...
I doubt that any user agent gets this right, though.
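To illustrate the quoted-pair point, here is a small standalone sketch (again not Mozilla code) that decodes a quoted-string while treating "\" CHAR as an escape, so that an escaped <"> does not end the value. DecodeQuotedString is a hypothetical name, and treating a trailing lone backslash as a literal character is an assumption of this example.

```cpp
#include <string>

// Decodes the quoted-string that starts at input[pos] (which must be '"').
// Returns the unescaped contents. Decoding stops at the first unescaped '"'
// or at the end of the input. Returns "" if input[pos] is not a quote.
std::string DecodeQuotedString(const std::string& input,
                               std::string::size_type pos) {
  std::string out;
  if (pos >= input.size() || input[pos] != '"') return out;
  for (std::string::size_type i = pos + 1; i < input.size(); ++i) {
    char c = input[i];
    if (c == '\\' && i + 1 < input.size()) {
      out.push_back(input[++i]);  // quoted-pair: keep the escaped character
    } else if (c == '"') {
      break;                      // unescaped quote closes the string
    } else {
      out.push_back(c);
    }
  }
  return out;
}
```

Applied to the value in the header above, "u\t\f-8" decodes to utf-8, and a value such as "a\"b" decodes to a"b instead of being cut off at the escaped quote.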
| Assignee | ||
Comment 16•25 years ago
[nsbeta3+] P3 per i18n bug meeting.
Patch checked in; marking it fixed.
Whiteboard: fix in hand → [nsbeta3+]fix in hand
| Assignee | ||
Comment 17•25 years ago
mark it fixed
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED
Comment 18•25 years ago
This is still reproducible in the 2000-09-18-05 Win32 and 9-18-08 Mac and Linux builds.
The (C) character in the bottom line is displayed incorrectly unless you
manually switch to UTF-8.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 19•25 years ago
UTF-8 in Character coding should be marked. After I went to the above URL,
UTF-8 was not even added to the cached character coding menu.
| Assignee | ||
Comment 20•25 years ago
We tried it again. It is fixed. We (teruko and I) cannot reproduce this using
today's build.
Status: REOPENED → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED
Comment 21•25 years ago
I verified this in the 2000-09-19-05 Win32, 2000-09-19-10 Mac, and 2000-09-19-08
Linux builds.
Status: RESOLVED → VERIFIED