Closed Bug 56626 Opened 25 years ago Closed 25 years ago

garbage on the screen with UTF-16 charset, view page source likewise

Categories

(Core :: Internationalization, defect, P3)

x86
Windows 98
defect

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: sebmol, Assigned: nhottanscp)

References

()

Details

(Keywords: compat, intl, Whiteboard: WONTFIX ? -- non standards compliant)

Attachments

(2 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20001010
BuildID: 2000091312
Mozilla just prints garbage in the browser window (lots of question marks, angle signs, and other mathematical symbols). Opening View|Page Source shows the same.
Reproducible: Always
Steps to Reproduce: Type http://www.surakware.com/ into the address bar and watch.
Actual Results: Mozilla tries to access the index file and display it. Garbage is printed.
Expected Results: Show the page.
Interestingly enough, Netscape 4.75 displays it. (Not as perfect as IE 5.5 but hey :) )
The cause is the <META http-equiv="Content-Type" content="text/html; charset=UTF-16"> tag in the HTML head.
Does that mean it's Mozilla's fault or the meta tag's? sm
conor lenon - yes, what did you mean by that?
There seems to be a problem with the implementation of UTF-16 in Mozilla. See bug 56630.
If you go to View->Character Coding and select Unicode (UTF-8) the URL displays correctly.
On my Win32, in UTF-16, it is displayed in asian symbols (chinese or japanese, i don't know for sure, but it looks correct), in Western ISO-8859-1, the page displays just fine, in good english. Fabian.
I do not think the Japanese or Chinese (actually a mix of anything, since Mozilla is trying to render them as Unicode) being drawn is anything but correct in this case. If Fabian is saying they "look correct" in the sense that the rendering of the characters is correct, that may be true. However, the characters appearing in the window have nothing to do with the information specified in the source file. They are just a stream of garbage misproduced by Mozilla. The problem is two-fold. One is that Mozilla somehow set the coding to UTF-16. The second is that under UTF-16 nothing is rendered correctly. Why is there UTF-16 if we cannot support it?
Hirata, ok you're right I guess, not that I can tell for sure, not many people from Belgium know Japanese or Chinese, I think. :-P UTF-16 seems bugged indeed, changing the character coding of any page to utf-16 just displays random asian symbols. Thanks Hirata, Fabian.
updating component and owner
Assignee: asa → nhotta
Status: UNCONFIRMED → NEW
Component: Browser-General → Internationalization
Ever confirmed: true
QA Contact: doronr → teruko
<META http-equiv="Content-Type" content="text/html; charset=UTF-16"> In charset menu, "UTF-16BE" is selected. But the page doesn't seem to contain UTF-16 data, characters are 7 bit ASCII. Reassign to ftang, cc to cata, shanjian.
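The observation above (7-bit ASCII data labelled as UTF-16) can be reproduced with a small Python sketch. This is only an illustration, not Mozilla's code: the sample bytes and the helper names `is_seven_bit_ascii` and `decode_as_utf16be` are assumptions made for the example. It shows why pairing ASCII bytes into 16-bit code units yields CJK-looking characters.

```python
# Hypothetical sample: the META tag from the report, stored as plain ASCII bytes.
raw = b'<META http-equiv="Content-Type" content="text/html; charset=UTF-16">'


def is_seven_bit_ascii(data: bytes) -> bool:
    """Return True when every byte is below 0x80, i.e. plain ASCII."""
    return all(b < 0x80 for b in data)


def decode_as_utf16be(data: bytes) -> str:
    """Read the bytes as big-endian 16-bit code units, dropping a trailing odd byte."""
    even_len = len(data) - (len(data) % 2)
    return data[:even_len].decode("utf-16-be", errors="replace")


if __name__ == "__main__":
    print(is_seven_bit_ascii(raw))      # the data contains no byte >= 0x80
    misread = decode_as_utf16be(raw)
    # Each pair of ASCII bytes becomes one code unit, e.g. '<' (0x3C) + 'M' (0x4D) -> U+3C4D.
    print([hex(ord(ch)) for ch in misread[:4]])
```

Because every ASCII byte is below 0x80, each resulting code unit has a high byte in the 0x00-0x7F range, and for printable text most of those land in the CJK blocks, which matches the "random asian symbols" described earlier in this bug.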
Assignee: nhotta → ftang
send mail to the webmaster. Invalid this bug.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → INVALID
Verified as Invalid.
Status: RESOLVED → VERIFIED
[I think this is the correct bug] Please reopen. MSXML 3.0 XML parser (which installs pretty easily into NT4 IIS4) is broken and only outputs with UTF-16 meta tag. This means that if one reads an ASP file on an IIS 4 server with MSXML 3.0 installed, which loads a W3C-valid XML file and a W3C-valid XSLT file with the correct indications in the XML and XSLT file for the encoding, BECAUSE MSXML 3.0 interpolates a UTF-16 encoding meta tag in the resulting HTML (which is generated dynamically), one gets UTF-16 interpreted text and must select the correct encoding from the View > Encoding menu. When it comes to everyday users, well, this ain't gonna happen. They'll just leave the page. And then when they see the same thing again, they'll give up on Moz. And I wouldn't count on webmasters just rejecting MSXML 3.0, either (they're calling this a feature). Nor would I count on MS putting out a fix any time soon (this bug is, I'm pretty sure, new to the "release" version of MSXML 3.0). So: do you want to just reject all IIS4/ASP/XML/XSL pages because the bug gives them the wrong encoding, and thus have ignorant users reject the browser because they don't understand that it's a Microsoft bug, or do you want to try to build customer base? See http://msdn.microsoft.com/xml/general/xmlparser.asp and also read the user comments (including mention of the Netscape 6 problem). I can provide examples if needed.
So the problem is that the MSXML parser always generates a UTF-16 META charset tag without applying a charset conversion from the original ASP file's charset to UTF-16, correct? I am not sure how we can ignore META in this particular case.
I have an internal test case: http://kaze:8000/tests/utf16ascii.html The display is extremely bad for Mozilla but non-problematic for Communicator or IE4/5. The latter two look at the real data, see that it lacks a BOM and is not in UTF-16, and assume Latin 1 (ASCII). The best solution of course is to get web page designers to generate the charset tag correctly, but I think we should consider defaulting to Latin 1 in this case.
Status: VERIFIED → REOPENED
Resolution: INVALID → ---
Reopen, this is a server side problem but Mozilla could handle this case better. RFC 2781 - ftp://ftp.isi.edu/in-notes/rfc2781.txt 4.3 Interpreting text labelled as UTF-16 I cannot find where in the document it says UTF-16 without a BOM is invalid. But section 4.3 is written expecting a BOM at the beginning of the file.
Keywords: intl
*** Bug 63907 has been marked as a duplicate of this bug. ***
Added 'self to cc and "UTF-16 charset" to the Summary
Summary: mozilla returns garbage on the screen, view page source likewise → garbage on the screen with UTF-16 charset, view page source likewise
Is there any way a valid UTF-16 page could have a META tag claiming to be UTF-16 but not have the BOM? If yes, we really should WONTFIX (or INVALID) this bug. Section 4.3 of RFC2781 referenced above and quoted below seems to indicate that a document that does not start with a BOM but claims to be UTF-16 should be treated as big endian UTF-16 and not UTF-8. If this is simply a bug in MSXML3 then I strongly, strongly propose we WONTFIX this and encourage Microsoft to stop messing up the web with incorrect output.
# 4.3 Interpreting text labelled as UTF-16
#
# Text labelled with the "UTF-16" charset might be serialized in
# either big-endian or little-endian order. If the first two octets
# of the text is 0xFE followed by 0xFF, then the text can be
# interpreted as being big-endian. If the first two octets of the
# text is 0xFF followed by 0xFE, then the text can be interpreted
# as being little-endian. If the first two octets of the text is
# not 0xFE followed by 0xFF, and is not 0xFF followed by 0xFE, then
# the text SHOULD be interpreted as being big-endian.
#
# All applications that process text with the "UTF-16" charset
# label MUST be able to read at least the first two octets of the
# text and be able to process those octets in order to determine
# the serialization order of the text. Applications that process
# text with the "UTF-16" charset label MUST NOT assume the
# serialization without first checking the first two octets to see
# if they are a big-endian BOM, a little-endian BOM, or not a BOM.
# All applications that process text with the "UTF-16" charset
# label MUST be able to interpret both big-endian and
# little-endian text.
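The byte-order rule in the quoted section 4.3 can be sketched in a few lines of Python. This is a minimal illustration of the RFC text only, not the code Mozilla uses; the function names `utf16_byte_order` and `decode_labelled_utf16` are made up for the example.

```python
import codecs
from typing import Tuple


def utf16_byte_order(data: bytes) -> Tuple[str, int]:
    """Return (codec name, BOM length) for text labelled "UTF-16", per RFC 2781 section 4.3."""
    if data[:2] == codecs.BOM_UTF16_BE:   # 0xFE 0xFF: big-endian BOM
        return "utf-16-be", 2
    if data[:2] == codecs.BOM_UTF16_LE:   # 0xFF 0xFE: little-endian BOM
        return "utf-16-le", 2
    # Neither BOM is present: the text SHOULD be interpreted as big-endian.
    return "utf-16-be", 0


def decode_labelled_utf16(data: bytes) -> str:
    """Decode text labelled "UTF-16", skipping the BOM when one is present."""
    codec, bom_len = utf16_byte_order(data)
    return data[bom_len:].decode(codec, errors="replace")


if __name__ == "__main__":
    print(decode_labelled_utf16(b"\xfe\xff\x00A"))   # big-endian with BOM
    print(decode_labelled_utf16(b"\xff\xfeA\x00"))   # little-endian with BOM
    print(utf16_byte_order(b"<META"))                # no BOM: falls back to big-endian
```

Under this rule, the ASCII page from the report has no BOM and is therefore read as big-endian UTF-16, which is the behaviour the bug describes.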
Keywords: compat
Whiteboard: WONTFIX ? -- non standards compliant
Unless it is clearly stated that this is invalid, we should eagerly try to fix this, as the ASP, XML & XSL platform is widely used among web developers.
It _is_ clearly stated. Please read the paragraphs quoted above.
Note that we could base this on the quirks mode, since MSXML3 is generating markup that triggers our quirks mode (namely, it has no DTD). i.e., in quirks mode, use the patch attached (ignore META charset in case of UTF-16 and no BOM), and in standard mode, do exactly what the page says (follow the specs).
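The quirks-mode proposal above can be sketched as a small decision function. The attached patch is not reproduced in this thread, so the Python below is only an illustration of the idea: the name `effective_charset`, its parameters, and the Latin 1 fallback (suggested in an earlier comment) are all assumptions.

```python
import codecs

# The two byte sequences that count as a UTF-16 BOM.
UTF16_BOMS = (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)


def effective_charset(meta_charset: str, data: bytes, quirks_mode: bool,
                      fallback: str = "iso-8859-1") -> str:
    """Choose the charset to decode with (hypothetical helper, not Mozilla's API)."""
    labelled_utf16 = meta_charset.strip().lower() == "utf-16"
    has_bom = data[:2] in UTF16_BOMS
    if quirks_mode and labelled_utf16 and not has_bom:
        # Quirks mode: ignore a META UTF-16 label that the data does not back up.
        return fallback
    # Standard mode, or a consistent label: do exactly what the page says.
    return meta_charset


if __name__ == "__main__":
    page = b'<html><head><META http-equiv="Content-Type" ...'
    print(effective_charset("UTF-16", page, quirks_mode=True))
    print(effective_charset("UTF-16", page, quirks_mode=False))
```

Keeping the override behind the quirks-mode check means a page with a DTD is still handled as the specification describes, while DTD-less output such as that attributed to MSXML3 stays readable.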
This is an invalid bug. If MSXML 3.0 always generates UTF-16 in the meta tag, it can still actually generate the DATA in UTF-16. The current problem is that the data does not agree with the meta charset. Mark this as wontfix.
Status: REOPENED → RESOLVED
Closed: 25 years ago
Resolution: --- → WONTFIX
Yeah, I agree.
Status: RESOLVED → VERIFIED
The problem is that the Microsoft development platform is widely used. Is it that difficult to make it work? If we don't, we will leave out all these potential developers.
No, it shouldn't be. But the people who are objecting to the proposed fix are arguing about what is correct. I happen to think that we need to be realistic sometimes. This one will make Mozilla look bad, and often there is no easy way to tell people that they are inserting invalid bytes, partly because they don't even know how these invalid bytes got in there. I actually disagree with the disposition of this bug. Let's see if there are others who agree with me on this.
I change my mind. reopen it, nhotta- check in the patch. sr=ftang
Status: VERIFIED → REOPENED
Resolution: WONTFIX → ---
nhotta- thanks.
Assignee: ftang → nhotta
Status: REOPENED → NEW
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9
r=ftang for the new patch
Keywords: review
checked in
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Verified as fixed in 3-2 build.
Status: RESOLVED → VERIFIED