Closed Bug 285261 Opened 20 years ago Closed 20 years ago

Wrong encoding of external JavaScript generated code displays "right"

Categories

(Core :: Internationalization, defect)

Platform: x86, Windows Server 2003
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED INVALID

People

(Reporter: lion, Assigned: smontagu)

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8a2) Gecko/20040714
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1

I have a very strange problem: on a Latin-2 website, an external <script src=blabla>
outputs a lot of UTF-8 encoded content, and it appears correctly on the
page, even though the page itself (the surrounding content and the declared
encoding) is ISO-8859-2.

Switching to View->UTF-8 keeps the JS-generated content OK(!), and breaks the
original Latin-2 content.

In IE I see the JS-generated content as "buggy UTF-8" (two garbage letters for
each non-ASCII character), which I think is the correct behavior.
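The "double letters" described here are what you get when UTF-8 bytes are decoded with a single-byte encoding. A minimal Python sketch (the sample character is just an illustration):

```python
# A non-ASCII character encoded as UTF-8 takes two (or more) bytes.
utf8_bytes = "é".encode("utf-8")        # b'\xc3\xa9'

# A browser that decodes those bytes as Latin-2 (ISO-8859-2) maps each
# byte to its own character, producing two garbage letters per character.
mojibake = utf8_bytes.decode("iso8859_2")

print(len("é"), len(mojibake))  # 1 2
```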

Important: is the FIREFOX behavior correct, or IE's??? (This question came to
mind as I finished the bug report: since the browser knows that the external
source is encoded differently, why should it display broken characters?)




Reproducible: Always

Steps to Reproduce:
1. Create a website with Latin-2 encoding.
2. Include an external JavaScript file: <script src="utf8_variables.js">.
3. Let the JavaScript generate some document content from the external file's
variables.

Actual Results:  
The page displays correctly: both the Latin-2 and the UTF-8 input files are
displayed normally in the browser window using the Latin-2 encoding.

Expected Results:  
The content coming from UTF-8 should be "broken": A%+?bbb%!%? and similar
"double-byte" trash.


I am using the www.milonic.com menu as the external JavaScript, generated from
C#/.NET (UTF-8), while the normal page is generated by ASP/VBScript (Latin-2).
If necessary, I can make a simpler test with a much smaller JavaScript file, or
make screenshots of how it is displayed.
A simple test would be appreciated; without it, it's not clear what the problem is.
What exact content type does the server send for the <script>?
http://www.xaraya.hu/moztest/

There you can see a 110% complete test of the whole issue (I reimplemented the
necessary parts in small PHP scripts). Please advise.

Thanks,
Ferenc Veres

(Please note that I won't keep the test online "forever", so I am attaching a
zip of it here as well.)

Are you saying that Mozilla works better than you expect it to? 

utf8js.php explicitly declares that it's in UTF-8, so Mozilla's JS interpreter
correctly interprets it as UTF-8 and stores the values of the two variables
internally in UTF-16. When it has to print them out, it converts them to the
encoding of the page that includes them (ISO-8859-1 or ISO-8859-2).

Don't you want Mozilla to do the right thing?

Btw,  http://www.xaraya.hu/moztest/ seems to be in ISO-8859-1 (not in ISO-8859-2)

Another Btw, you have to use 'text/javascript; charset=UTF-8' instead of
'text/html; charset=UTF-8' when generating an external javascript file using PHP.
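For the PHP script that emits the JavaScript, that means sending a response header along these lines (a sketch of the header itself; how you emit it depends on your setup):

```
Content-Type: text/javascript; charset=UTF-8
```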

I don't know which is the "right" behavior. I just read the encoding sections of
HTML 4.01, there is nothing about this. 

More test results:

Mozilla/Firefox: readable
Opera 7: readable
Internet Explorer 6: unreadable (utf-8 "linenoise")
Konqueror (kde3.3): unreadable  (utf-8 "linenoise")

The index.php seems to be ISO-8859-2 to me, judging by the "View->Character
Encoding" menu's status in Mozilla/Firefox/Konqueror. You can also see that the
PHP code sends the header for Latin-2.

The _strangest_ problem is that this "feature" allows me to print e.g.
latin2-only characters on a latin1 page (latin1.png). 

Probably this is up to the browser how it wants to handle this strange
combination, so feel free to close the "bug" if you wish.

(This cost us some hours of testing the weird (different) behaviour of IE and
Firefox; since the page was originally Latin-2, I thought the UTF-8 line noise
was the "correct display".)

Both behaviours are reasonable:

IE: "your page is Latin-2, this cannot display UTF-8 directly."

Gecko: "you specified encoding for the "subcontent", so why not interpret it."

(I am surprised to be the first one pinpointing this problem.)

Thanks for your attention!
Gecko's behavior is correct.  There is no such thing as a "Latin-1 page".  All
pages are Unicode.  There are "Latin-1 encoded bytes on the wire".  But Gecko
converts all the data (both the HTML and the UTF-8-encoded JS) to Unicode
characters before doing anything else with it (using the relevant encodings
specified in the HTTP headers).
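This model can be illustrated in Python: the same character, sent as differently encoded bytes in two resources, decodes to the same Unicode character once each resource is decoded with its own declared encoding (a sketch; the character is just an example):

```python
# The same character, sent as bytes in two different encodings:
html_bytes = "ő".encode("iso8859_2")   # b'\xf5' — Latin-2 bytes on the wire
js_bytes = "ő".encode("utf-8")         # b'\xc5\x91' — UTF-8 bytes on the wire

# Each resource is decoded with the encoding its own HTTP headers declare,
# so both byte sequences yield the same Unicode character.
print(html_bytes.decode("iso8859_2") == js_bytes.decode("utf-8"))  # True
```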
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID
