Bugzilla

Assignee

Comment 10

•

25 years ago

<META http-equiv="Content-Type" content="text/html; charset=UTF-16"> In the charset menu, "UTF-16BE" is selected. But the page doesn't seem to contain UTF-16 data; the characters are 7-bit ASCII. Reassigning to ftang, cc to cata, shanjian.

Assignee: nhotta → ftang

Comment 11

•

25 years ago

Send mail to the webmaster. Marking this bug as invalid.

Status: NEW → RESOLVED

Closed: 25 years ago

Resolution: --- → INVALID

Teruko Kobayashi

Comment 12

•

25 years ago

Verified as Invalid.

Status: RESOLVED → VERIFIED

PTRourke

Comment 13

•

25 years ago

[I think this is the correct bug] Please reopen. The MSXML 3.0 XML parser (which installs pretty easily into NT4 IIS4) is broken and only outputs with a UTF-16 meta tag. This means that if one reads an ASP file on an IIS 4 server with MSXML 3.0 installed, which loads a W3C-valid XML file and a W3C-valid XSLT file with the correct indications in the XML and XSLT files for the encoding, then BECAUSE MSXML 3.0 interpolates a UTF-16 encoding meta tag in the resulting HTML (which is generated dynamically), one gets UTF-16 interpreted text and must select the correct encoding from the View > Encoding menu. When it comes to everyday users, well, this ain't gonna happen. They'll just leave the page. And then when they see the same thing again, they'll give up on Moz. And I wouldn't count on webmasters just rejecting MSXML 3.0, either (they're calling this a feature). Nor would I count on MS putting out a fix any time soon (this bug is, I'm pretty sure, new to the "release" version of MSXML 3.0). So: do you want to just reject all IIS4/ASP/XML/XSL pages because the bug gives them the wrong encoding, and thus have ignorant users reject the browser because they don't understand that it's a Microsoft bug, or do you want to try to build customer base? See http://msdn.microsoft.com/xml/general/xmlparser.asp and also read the user comments (including mention of the Netscape 6 problem). I can provide examples if needed.

Assignee

Comment 14

•

25 years ago

So the problem is that the MSXML parser always generates a UTF-16 META charset tag without applying a charset conversion from the original ASP file's charset to UTF-16, correct? I am not sure how we can ignore META in this particular case.
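The symptom described in this comment can be reproduced in a few lines. This is only a minimal sketch: the byte string is a hypothetical stand-in for the ASCII page content, and it assumes the bytes end up being decoded as big-endian UTF-16 (the charset menu reportedly showed "UTF-16BE").

```python
# Hypothetical page content: 7-bit ASCII bytes served behind a UTF-16 META tag.
raw = b"Hello, world"

# Decoding as big-endian UTF-16 pairs up adjacent ASCII bytes into single
# 16-bit code units, so the text turns into unrelated characters.
garbled = raw.decode("utf-16-be")

# Decoding the same bytes as Latin-1 recovers the intended text.
intended = raw.decode("latin-1")

print(repr(garbled))
print(repr(intended))
```

Twelve ASCII bytes collapse into six unrelated code units, which matches the "garbage on the screen" in the bug summary.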

Katsuhiko Momoi

Comment 15

•

25 years ago

I have an internal test case: http://kaze:8000/tests/utf16ascii.html The display is extremely bad for Mozilla but non-problematic for Communicator or IE4/5. The latter two look at the real data, see that it is not UTF-16 because it lacks a BOM, and assume Latin 1 (ASCII). The best solution of course is to get web page designers to generate the charset tag correctly, but I think we should consider defaulting to Latin 1 in this case.
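The fallback suggested in this comment could be sketched roughly as follows. The function names are hypothetical and this is not Mozilla's, Communicator's, or IE's actual code; it only assumes that "UTF-16 declared, no BOM present" is the trigger for falling back to Latin 1.

```python
import codecs


def has_utf16_bom(data: bytes) -> bool:
    """True if the data starts with a big- or little-endian UTF-16 BOM."""
    return data[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)


def effective_charset(declared: str, data: bytes, fallback: str = "latin-1") -> str:
    """Pick the charset to decode with (hypothetical helper).

    If the page claims UTF-16 but carries no BOM, distrust the label and
    use the fallback; otherwise honour the declared charset.
    """
    if declared.strip().lower() == "utf-16" and not has_utf16_bom(data):
        return fallback
    return declared


# An ASCII page with a wrong UTF-16 label falls back to Latin-1.
print(effective_charset("UTF-16", b"<html>plain ASCII</html>"))
```

A page that really is UTF-16 and starts with a BOM keeps its declared charset, so correctly generated pages are unaffected by the fallback.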

Assignee

Comment 16

•

25 years ago

Attached patch: patch to ignore META charset in case of UTF-16 and no BOM

Assignee

Updated

•

25 years ago

Status: VERIFIED → REOPENED

Resolution: INVALID → ---

Assignee

Comment 17

•

25 years ago

Reopening. This is a server-side problem, but Mozilla could handle this case better. See RFC 2781 (ftp://ftp.isi.edu/in-notes/rfc2781.txt), section 4.3, "Interpreting text labelled as UTF-16". I cannot find where the document says that UTF-16 without a BOM is invalid, but section 4.3 is written expecting a BOM at the beginning of the file.

Teruko Kobayashi

Updated

•

25 years ago

Keywords: intl

Assignee

Comment 18

•

25 years ago

*** Bug 63907 has been marked as a duplicate of this bug. ***

Diego

Comment 19

•

25 years ago

Added 'self to cc and "UTF-16 charset" to the Summary

Summary: mozilla returns garbage on the screen, view page source likewise → garbage on the screen with UTF-16 charset, view page source likewise

Comment 20

•

25 years ago

Is there any way a valid UTF-16 page could have a META tag claiming to be UTF-16 but not have the BOM? If yes, we really should WONTFIX (or INVALID) this bug. Section 4.3 of RFC2781 referenced above and quoted below seems to indicate that a document that does not start with a BOM but claims to be UTF-16 should be treated as big endian UTF-16 and not UTF-8. If this is simply a bug in MSXML3 then I strongly, strongly propose we WONTFIX this and encourage Microsoft to stop messing up the web with incorrect output.

# 4.3 Interpreting text labelled as UTF-16
#
# Text labelled with the "UTF-16" charset might be serialized in
# either big-endian or little-endian order. If the first two octets
# of the text is 0xFE followed by 0xFF, then the text can be
# interpreted as being big-endian. If the first two octets of the
# text is 0xFF followed by 0xFE, then the text can be interpreted
# as being little-endian. If the first two octets of the text is
# not 0xFE followed by 0xFF, and is not 0xFF followed by 0xFE, then
# the text SHOULD be interpreted as being big-endian.
#
# All applications that process text with the "UTF-16" charset
# label MUST be able to read at least the first two octets of the
# text and be able to process those octets in order to determine
# the serialization order of the text. Applications that process
# text with the "UTF-16" charset label MUST NOT assume the
# serialization without first checking the first two octets to see
# if they are a big-endian BOM, a little-endian BOM, or not a BOM.
# All applications that process text with the "UTF-16" charset
# label MUST be able to interpret both big-endian and
# little-endian text.
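The byte-order rules in section 4.3 of RFC 2781 can be sketched as below. The function names are hypothetical, and stripping the BOM before decoding is a choice made for this illustration, not something taken from the patch discussed in this bug.

```python
from typing import Tuple


def utf16_byte_order(data: bytes) -> Tuple[str, bool]:
    """Return (byte_order, bom_present) for text labelled "UTF-16"."""
    first_two = data[:2]
    if first_two == b"\xfe\xff":
        return "big", True
    if first_two == b"\xff\xfe":
        return "little", True
    # Not a BOM: section 4.3 says the text SHOULD be read as big-endian.
    return "big", False


def decode_labelled_utf16(data: bytes) -> str:
    """Decode text labelled "UTF-16" following the section 4.3 rules."""
    order, bom_present = utf16_byte_order(data)
    payload = data[2:] if bom_present else data  # drop the BOM if there is one
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    return payload.decode(codec, errors="replace")


print(utf16_byte_order(b"\xff\xfeH\x00i\x00"))
print(utf16_byte_order(b"<html>"))
```

Under these rules an ASCII page mislabelled as UTF-16 has no BOM and is read as big-endian UTF-16, which is the behaviour this bug reports.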

Keywords: compat

Whiteboard: WONTFIX ? -- non standards compliant

Markus Hübner

Comment 21

•

25 years ago

Unless it's clearly stated that this is invalid, we should eagerly try to fix this, as the ASP, XML & XSL platform is widely used among web developers.

Comment 22

•

25 years ago

It _is_ clearly stated. Please read the paragraphs quoted above.

Comment 23

•

25 years ago

Note that we could base this on the quirks mode, since MSXML3 is generating markup that triggers our quirks mode (namely, it has no DTD). i.e., in quirks mode, use the patch attached (ignore META charset in case of UTF-16 and no BOM), and in standard mode, do exactly what the page says (follow the specs).
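The quirks-mode split proposed in this comment might look like the sketch below. Everything here is illustrative: the function name and the `quirks_mode` flag are hypothetical, and the Latin-1 fallback is an assumption borrowed from the earlier suggestion in this thread, since the comment only says the META charset would be ignored.

```python
def charset_for_page(declared: str, data: bytes, quirks_mode: bool) -> str:
    """Choose a decoding charset, gated on the layout mode (hypothetical)."""
    claims_utf16 = declared.strip().lower() == "utf-16"
    has_bom = data[:2] in (b"\xfe\xff", b"\xff\xfe")

    if claims_utf16 and not has_bom:
        if quirks_mode:
            # Quirks mode: ignore the META charset (assumed Latin-1 fallback).
            return "latin-1"
        # Standards mode: follow RFC 2781 section 4.3, big-endian without a BOM.
        return "utf-16-be"
    return declared


page = b"<html><body>no DTD, plain ASCII</body></html>"
print(charset_for_page("UTF-16", page, quirks_mode=True))
print(charset_for_page("UTF-16", page, quirks_mode=False))
```

With this split, pages without a DTD get the lenient treatment, while pages that opt into standards mode are decoded exactly as labelled.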

Comment 24

•

25 years ago

This is an invalid bug. Even if MSXML 3.0 always generates UTF-16 in the meta tag, it could still actually generate the data in UTF-16. The current problem is that the data does not agree with the meta charset. Marking this as WONTFIX.

Status: REOPENED → RESOLVED

Closed: 25 years ago → 25 years ago

Resolution: --- → WONTFIX

Comment 25

•

25 years ago

Yeah, I agree.

Status: RESOLVED → VERIFIED

Markus Hübner

Comment 26

•

25 years ago

The problem is that the Microsoft development platform is widely used. Is it that difficult to make it work? If we don't, we will leave out all these potential developers.

Katsuhiko Momoi

Comment 27

•

25 years ago

No, it shouldn't be. But people who are objecting to the proposed fix are arguing about what is correct. I happen to think that we need to be realistic sometimes. This one will make Mozilla look bad, and often there is no easy way to tell people that they are inserting invalid bytes, partly because they don't even know how these invalid bytes got in there. I actually disagree with the disposition of this bug. Let's see if there are others who agree with me on this.

Comment 28

•

25 years ago

I changed my mind. Reopen it. nhotta, check in the patch. sr=ftang

Status: VERIFIED → REOPENED

Resolution: WONTFIX → ---

Comment 29

•

25 years ago

nhotta- thanks.

Assignee: ftang → nhotta

Status: REOPENED → NEW

Assignee

Updated

•

25 years ago

Status: NEW → ASSIGNED

Target Milestone: --- → mozilla0.9

Assignee

Comment 30

•

25 years ago

Attached patch: new patch, ignore charset which should be detected by parser

Assignee

Comment 31

•

25 years ago

r=ftang for the new patch

Assignee

Updated

•

25 years ago

Keywords: review