Closed
Bug 56626
Opened 25 years ago
Closed 25 years ago
garbage on the screen with UTF-16 charset, view page source likewise
Categories
(Core :: Internationalization, defect, P3)
Tracking
VERIFIED
FIXED
mozilla0.9
People
(Reporter: sebmol, Assigned: nhottanscp)
Details
(Keywords: compat, intl, Whiteboard: WONTFIX ? -- non standards compliant)
Attachments
(2 files)
952 bytes, patch
1.51 KB, patch
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20001010
BuildID: 2000091312
Mozilla just prints garbage in the browser window (lots of question marks, angle
signs, and other mathematical symbols). Opening View|Page Source shows the same.
Reproducible: Always
Steps to Reproduce:
Type http://www.surakware.com/ into the address bar and watch.
Actual Results: Mozilla tries to access the index file and display it. Garbage
is printed.
Expected Results: Show the page. Interestingly enough, Netscape 4.75 displays
it. (Not as perfect as IE 5.5 but hey :) )
Comment 1•25 years ago
The cause is this tag in the HTML head:
<META http-equiv="Content-Type" content="text/html; charset=UTF-16">
Comment 2•25 years ago (Reporter)
Does that mean it's Mozilla's fault or the meta tag's?
sm
Comment 3•25 years ago
Conor Lenon: yes, what did you mean by that?
Comment 4•25 years ago
There seems to be a problem with the implementation of UTF-16 in Mozilla.
See bug 56630.
Comment 5•25 years ago
If you go to View->Character Coding and select Unicode (UTF-8) the URL displays
correctly.
Comment 6•25 years ago
On my Win32 build, in UTF-16 the page is displayed as Asian symbols (Chinese or Japanese, I don't know for sure, but it looks correct). In Western ISO-8859-1, the page displays just fine, in good English.
Fabian.
Comment 7•25 years ago
I do not think the Japanese or Chinese characters being drawn (actually a mix of anything, since Mozilla is trying to render the bytes as Unicode) are correct in this case. If Fabian is saying they "look correct" in the sense that the characters themselves are rendered correctly, that may be true. However, the characters appearing in the window have nothing to do with the information in the source file. They are just a stream of garbage misproduced by Mozilla. The problem is twofold. First, Mozilla somehow sets the encoding to UTF-16. Second, under UTF-16 nothing is rendered correctly. Why offer UTF-16 if we cannot support it?
Comment 8•25 years ago
Hirata,
OK, you're right, I guess, not that I can tell for sure. Not many people from Belgium know Japanese or Chinese, I think. :-P
UTF-16 does seem buggy: changing the character coding of any page to UTF-16 just displays random Asian symbols.
Thanks Hirata,
Fabian.
Comment 9•25 years ago
updating component and owner
Assignee: asa → nhotta
Status: UNCONFIRMED → NEW
Component: Browser-General → Internationalization
Ever confirmed: true
QA Contact: doronr → teruko
Comment 10•25 years ago (Assignee)
<META http-equiv="Content-Type" content="text/html; charset=UTF-16">
In charset menu, "UTF-16BE" is selected.
But the page doesn't seem to contain UTF-16 data; the characters are 7-bit ASCII.
Reassign to ftang, cc to cata, shanjian.
Assignee: nhotta → ftang
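Comment 10's diagnosis accounts for the "Asian symbols" reported in comments 6 to 8. The page bytes are plain ASCII, but the label says UTF-16. The following minimal Python sketch shows the effect. It uses a hypothetical byte string in place of the real start of the page. Decoding ASCII as big-endian UTF-16 combines each pair of bytes into one 16-bit code unit, and those code units fall in the CJK ranges. Decoding the same bytes as Latin-1 recovers the markup, which matches comments 5 and 6.

```python
# Hypothetical sample: the first bytes of a plain-ASCII HTML page.
ascii_bytes = b"<html><head>"

# Treating the bytes as UTF-16BE pairs them up: b"<h" (0x3C, 0x68) becomes
# the single code unit U+3C68, b"tm" becomes U+746D, and so on.
garbled = ascii_bytes.decode("utf-16-be")

# Decoding the same bytes as ISO-8859-1 (Latin-1) gives back the markup.
recovered = ascii_bytes.decode("iso-8859-1")

# Print code points rather than glyphs so any console can show them.
print(" ".join(f"U+{ord(ch):04X}" for ch in garbled))
print(recovered)
```

The sample has an even length on purpose. With an odd number of bytes, the strict UTF-16 decode would raise `UnicodeDecodeError` rather than produce garbage.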
Comment 11•25 years ago
Send mail to the webmaster. Marking this bug invalid.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → INVALID
Comment 13•25 years ago
[I think this is the correct bug.]
Please reopen. The MSXML 3.0 XML parser (which installs pretty easily into NT4/IIS4) is broken and only outputs with a UTF-16 meta tag. Suppose an ASP file on an IIS 4 server with MSXML 3.0 installed loads a W3C-valid XML file and a W3C-valid XSLT file, both with the correct encoding declarations. Because MSXML 3.0 inserts a UTF-16 encoding meta tag into the resulting HTML (which is generated dynamically), the text gets interpreted as UTF-16, and the user must select the correct encoding from the View > Encoding menu.
When it comes to everyday users, well, this ain't gonna happen. They'll just leave the page. And when they see the same thing again, they'll give up on Moz. And I wouldn't count on webmasters just rejecting MSXML 3.0, either (they're calling this a feature). Nor would I count on MS putting out a fix any time soon (I'm pretty sure this bug is new to the "release" version of MSXML 3.0). So: do you want to reject all IIS4/ASP/XML/XSL pages because the bug gives them the wrong encoding, and thus have ignorant users reject the browser because they don't understand that it's a Microsoft bug? Or do you want to try to build a customer base?
See http://msdn.microsoft.com/xml/general/xmlparser.asp and also read the user comments (including mention of the Netscape 6 problem). I can provide examples if needed.
Comment 14•25 years ago (Assignee)
So the problem is that MSXML parser always generates UTF-16 META charset tag
without applying a charset conversion from original ASP file's charset to
UTF-16, correct?
I am not sure how we can ignore the META tag in this particular case.
Comment 15•25 years ago
I have an internal test case:
http://kaze:8000/tests/utf16ascii.html
The display is extremely bad in Mozilla but unproblematic in Communicator or IE4/5. The latter two look at the actual data, see that it is not UTF-16 (it lacks a BOM), and assume Latin-1 (ASCII).
The best solution, of course, is to get web page designers to generate the charset tag correctly, but I think we should consider defaulting to Latin-1 in this case.
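Comment 15 says Communicator and IE check the actual data instead of trusting the label. Below is a Python sketch of one way such a check could work. The heuristic is an assumption, not a description of what Netscape or Microsoft actually ship. A BOM proves the data is UTF-16. Without a BOM, mostly-ASCII text that really is UTF-16 has a 0x00 byte in roughly every other position. A buffer with no zero bytes is therefore very unlikely to be UTF-16. The names `plausibly_utf16` and `effective_charset` and the 25% threshold are hypothetical.

```python
UTF16_BOMS = (b"\xfe\xff", b"\xff\xfe")

def plausibly_utf16(data: bytes) -> bool:
    """Hypothetical check: does a buffer labelled UTF-16 look like UTF-16?"""
    if data[:2] in UTF16_BOMS:
        return True  # a byte order mark settles the question
    if not data:
        return False
    # Mostly-ASCII text encoded as UTF-16 has a zero high byte in about
    # half of its bytes; genuine 7-bit ASCII has no zero bytes at all.
    zero_ratio = data.count(0) / len(data)
    return zero_ratio > 0.25  # arbitrary threshold (assumption)

def effective_charset(data: bytes) -> str:
    """Fall back to Latin-1, as comment 15 suggests, when the label looks wrong."""
    return "UTF-16" if plausibly_utf16(data) else "ISO-8859-1"

print(effective_charset(b"<html><body>Hello</body></html>"))  # ISO-8859-1
print(effective_charset("<html>".encode("utf-16-be")))        # UTF-16
```

The zero-byte test works only for Latin-script content. BOM-less UTF-16 text in Japanese, for example, contains few zero bytes and would be wrongly rejected. This is one reason a real fix would want to be more conservative.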
Comment 16•25 years ago (Assignee)
Updated•25 years ago (Assignee)
Status: VERIFIED → REOPENED
Resolution: INVALID → ---
Comment 17•25 years ago (Assignee)
Reopening. This is a server-side problem, but Mozilla could handle this case better.
RFC 2781 - ftp://ftp.isi.edu/in-notes/rfc2781.txt
4.3 Interpreting text labelled as UTF-16
I cannot find anywhere in the document that says UTF-16 without a BOM is invalid. But section 4.3 is written expecting a BOM at the beginning of the file.
Comment 18•25 years ago (Assignee)
*** Bug 63907 has been marked as a duplicate of this bug. ***
Comment 19•25 years ago
Added self to CC and "UTF-16 charset" to the Summary.
Summary: mozilla returns garbage on the screen, view page source likewise → garbage on the screen with UTF-16 charset, view page source likewise
Comment 20•25 years ago
Is there any way a valid UTF-16 page could have a META tag claiming to be UTF-16
but not have the BOM? If yes, we really should WONTFIX (or INVALID) this bug.
Section 4.3 of RFC2781 referenced above and quoted below seems to indicate that
a document that does not start with a BOM but claims to be UTF-16 should be
treated as big endian UTF-16 and not UTF-8. If this is simply a bug in MSXML3
then I strongly, strongly propose we WONTFIX this and encourage Microsoft to
stop messing up the web with incorrect output.
# 4.3 Interpreting text labelled as UTF-16
#
# Text labelled with the "UTF-16" charset might be serialized in
# either big-endian or little-endian order. If the first two octets
# of the text is 0xFE followed by 0xFF, then the text can be
# interpreted as being big-endian. If the first two octets of the
# text is 0xFF followed by 0xFE, then the text can be interpreted
# as being little-endian. If the first two octets of the text is
# not 0xFE followed by 0xFF, and is not 0xFF followed by 0xFE, then
# the text SHOULD be interpreted as being big-endian.
#
# All applications that process text with the "UTF-16" charset
# label MUST be able to read at least the first two octets of the
# text and be able to process those octets in order to determine
# the serialization order of the text. Applications that process
# text with the "UTF-16" charset label MUST NOT assume the
# serialization without first checking the first two octets to see
# if they are a big-endian BOM, a little-endian BOM, or not a BOM.
# All applications that process text with the "UTF-16" charset
# label MUST be able to interpret both big-endian and
# little-endian text.
Keywords: compat
Whiteboard: WONTFIX ? -- non standards compliant
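The BOM rule in the quoted section 4.3 is short enough to write out. The following minimal Python sketch uses hypothetical function names. It checks the first two octets and strips a BOM if one is present. Otherwise it defaults to big-endian, as the RFC says. That default is consistent with the charset menu showing "UTF-16BE" for the ASCII page in comment 10.

```python
from typing import Tuple

def utf16_serialization(data: bytes) -> Tuple[str, int]:
    """Return (codec name, BOM length) following RFC 2781 section 4.3."""
    if data[:2] == b"\xfe\xff":
        return "utf-16-be", 2  # big-endian BOM
    if data[:2] == b"\xff\xfe":
        return "utf-16-le", 2  # little-endian BOM
    # Not a BOM: the text SHOULD be interpreted as big-endian.
    return "utf-16-be", 0

def decode_labelled_utf16(data: bytes) -> str:
    """Decode text labelled "UTF-16", dropping the BOM if there is one."""
    codec, bom_len = utf16_serialization(data)
    return data[bom_len:].decode(codec)

print(decode_labelled_utf16(b"\xfe\xff\x00H\x00i"))  # BE with BOM
print(decode_labelled_utf16(b"\xff\xfeH\x00i\x00"))  # LE with BOM
print(decode_labelled_utf16(b"\x00H\x00i"))          # no BOM, read as BE
```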
Comment 21•25 years ago
Until it is clearly stated that this is invalid, we should eagerly try to fix this, as the ASP, XML & XSL platform is widely used among web developers.
Comment 22•25 years ago
It _is_ clearly stated. Please read the paragraphs quoted above.
Comment 23•25 years ago
Note that we could base this on quirks mode, since MSXML3 generates markup that triggers our quirks mode (namely, it has no DTD). That is: in quirks mode, use the attached patch (ignore the META charset in the case of UTF-16 with no BOM), and in standards mode, do exactly what the page says (follow the specs).
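Comment 23's proposal amounts to a small decision rule. The Python sketch below is not the checked-in Mozilla patch, which is not reproduced here. The function name is hypothetical. The fallback charset is a parameter that defaults to ISO-8859-1, following comment 15. Only the bare "UTF-16" label is considered, because RFC 2781 says text labelled "UTF-16BE" or "UTF-16LE" must not carry a BOM.

```python
UTF16_BOMS = (b"\xfe\xff", b"\xff\xfe")

def charset_to_use(meta_charset: str, data: bytes, quirks_mode: bool,
                   fallback: str = "ISO-8859-1") -> str:
    """Hypothetical rule: pick the charset to decode with, per comment 23."""
    labelled_utf16 = meta_charset.strip().upper() == "UTF-16"
    has_bom = data[:2] in UTF16_BOMS
    if quirks_mode and labelled_utf16 and not has_bom:
        # Quirks mode: distrust a UTF-16 META tag on data without a BOM.
        return fallback
    # Standards mode, or label and data agree: do exactly what the page says.
    return meta_charset

page = b'<META http-equiv="Content-Type" content="text/html; charset=UTF-16">'
print(charset_to_use("UTF-16", page, quirks_mode=True))   # ISO-8859-1
print(charset_to_use("UTF-16", page, quirks_mode=False))  # UTF-16
```

Keying the exception on quirks mode limits the leniency to the legacy, DTD-less pages that MSXML3 produces. Standards-mode documents still get the RFC 2781 behaviour.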
Comment 24•25 years ago
This is an invalid bug. Even if MSXML 3.0 always generates UTF-16 in the meta tag, it can still actually generate the DATA in UTF-16. The current problem is that the data do not agree with the meta charset.
Marking this WONTFIX.
Status: REOPENED → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → WONTFIX
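Comment 24's point is that a generator can keep emitting the UTF-16 META tag as long as it also emits UTF-16 bytes. The following minimal Python sketch shows label and data agreeing; the page content is made up. Python's "utf-16" codec writes a BOM in the machine's native byte order, so any reader that follows RFC 2781 section 4.3 can decode the result.

```python
html = ('<html><head><META http-equiv="Content-Type" '
        'content="text/html; charset=UTF-16"></head>'
        '<body>Hello</body></html>')

# The "utf-16" codec prepends a BOM (native byte order) before the data.
data = html.encode("utf-16")

print(data[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True
# Decoding with "utf-16" reads the BOM to pick the byte order.
print(data.decode("utf-16") == html)           # True
```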
Comment 26•25 years ago
The problem is that the Microsoft development platform is widely used. Is it that difficult to make it work?
If we don't, we will leave out all these potential developers.
Comment 27•25 years ago
No, it shouldn't be. But the people objecting to the proposed fix are arguing about what is correct.
I happen to think that we need to be realistic sometimes. This one will make Mozilla look bad, and often there is no easy way to tell people that they are inserting invalid bytes, partly because they don't even know how these invalid bytes got in there.
I actually disagree with the disposition of this bug. Let's see if there are others who agree with me on this.
Comment 28•25 years ago
I've changed my mind. Reopen it; nhotta, check in the patch. sr=ftang
Status: VERIFIED → REOPENED
Resolution: WONTFIX → ---
Updated•25 years ago (Assignee)
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9
Comment 30•25 years ago (Assignee)
Comment 31•25 years ago (Assignee)
r=ftang for the new patch
Comment 32•25 years ago (Assignee)
checked in
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED