Closed Bug 285261 Opened 21 years ago Closed 21 years ago

Wrong encoding of external JavaScript generated code displays "right"

Categories

(Core :: Internationalization, defect)

x86
Windows Server 2003
defect
Not set
normal

Tracking


RESOLVED INVALID

People

(Reporter: lion, Assigned: smontagu)

Details

Attachments

(1 file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8a2) Gecko/20040714
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1

I have a very strange problem: on a Latin-2 website, a <script src="..."> tag pulls in a script that writes a lot of UTF-8-encoded content, and that content appears correctly on the page, while the page itself (the surrounding content and the declared encoding) is ISO-8859-2. Switching to View -> Character Encoding -> UTF-8 keeps the JS-generated content readable(!) and breaks the original Latin-2 content. In IE I see the JS-generated content as "buggy UTF-8" (double letters for non-ASCII characters), which I think is the correct behavior.

Important: is the Firefox behavior correct, or IE's? (When I finished the bug report, this question came up in my mind: since the browser knows that the external source has a different declared encoding, why should it display broken characters?)

Reproducible: Always

Steps to Reproduce:
1. Create a website with Latin-2 encoding.
2. Include an external UTF-8 JavaScript file with <script src="utf8_variables.js">.
3. Let the JavaScript generate some document content from the external file's variables.

Actual Results:
The page displays correctly; both the Latin-2 and the UTF-8 input files are rendered normally in the browser window using the Latin-2 encoding.

Expected Results:
The content coming from the UTF-8 file should be "broken": A%+?bbb%!%? and similar "double-byte" garbage.

I am using the www.milonic.com menu as the external JavaScript, generated from C#/.NET (UTF-8), while the normal page is generated by ASP/VBScript (Latin-2). If necessary, I can make a simpler test with a much smaller JavaScript file, or take screenshots of how it is displayed.
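The "double letters for non-ASCII" effect the reporter expected can be sketched in a few lines of Python (illustrative only; the sample character is hypothetical, not taken from the reporter's site). A non-ASCII character encoded as UTF-8 occupies two bytes, and decoding those bytes as ISO-8859-2 yields two unrelated characters:

```python
# Why UTF-8 bytes look like "double letters" when misread as Latin-2.
text = "é"                                  # U+00E9, one character
utf8_bytes = text.encode("utf-8")           # b'\xc3\xa9' -- two bytes on the wire
mojibake = utf8_bytes.decode("iso-8859-2")  # misinterpret them as Latin-2

print(mojibake)        # 'ĂŠ' -- two characters instead of one
print(len(mojibake))   # 2
```

This is exactly the "A%+?bbb%!%?" style of garbage described under Expected Results: each multi-byte UTF-8 sequence splits into several Latin-2 characters.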
A simple test case would be appreciated. Without one, it's not clear what the problem is.
What exact Content-Type does the server send for the <script>?
http://www.xaraya.hu/moztest/

There you can see a complete test of the whole issue (I reimplemented the necessary parts in small PHP scripts). Please advise. Thanks, Ferenc Veres.

(Please note that I won't keep the test online forever, so I am attaching a zip of it here as well.)
Are you saying that Mozilla works better than you expect it to? utf8js.php explicitly declares that it is in UTF-8, so Mozilla's JS interpreter correctly interprets it as UTF-8 and stores the values of the two variables internally in UTF-16. When it has to print them out, it converts them to the encoding of the page that includes them (ISO-8859-1 or ISO-8859-2). Isn't that exactly what you want Mozilla to do?

Btw, http://www.xaraya.hu/moztest/ seems to be in ISO-8859-1 (not ISO-8859-2).

Another btw: you have to use 'text/javascript; charset=UTF-8' instead of 'text/html; charset=UTF-8' when generating an external JavaScript file with PHP.
I don't know which is the "right" behavior. I just read the encoding sections of HTML 4.01, and there is nothing about this case.

More test results:
Mozilla/Firefox: readable
Opera 7: readable
Internet Explorer 6: unreadable (UTF-8 "line noise")
Konqueror (KDE 3.3): unreadable (UTF-8 "line noise")

The index.php looks like ISO-8859-2 to me, judging by the "View -> Character Encoding" menu's status in Mozilla/Firefox/Konqueror. You can also see that the PHP code sends the header for Latin-2.

The strangest part is that this "feature" lets me print e.g. Latin-2-only characters on a Latin-1 page (latin1.png). Probably it is up to the browser how it wants to handle this strange combination, so feel free to close the "bug" if you wish. (This cost us some hours of testing the weird, differing behavior of IE and Firefox; since the page was originally Latin-2, I assumed the UTF-8 line noise was the "correct" display.)

Both behaviors are defensible. IE: "your page is Latin-2; it cannot display UTF-8 directly." Gecko: "you specified an encoding for the subcontent, so why not honor it?" (I am surprised to be the first one pinpointing this problem.) Thanks for your attention!
Gecko's behavior is correct. There is no such thing as a "Latin-1 page"; all pages are Unicode. There are only "Latin-1-encoded bytes on the wire". Gecko converts all the data (both the HTML and the UTF-8-encoded JS) to Unicode characters before doing anything else with it, using the relevant encodings specified in the HTTP headers.
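The model described in this comment can be sketched in Python (the Hungarian sample word is a hypothetical stand-in for the reporter's content): each resource is decoded with its own declared charset, and after decoding, the wire encoding no longer matters because both are the same sequence of Unicode characters.

```python
# Hypothetical sample containing characters that Latin-2 and
# UTF-8 represent with different bytes on the wire.
page_text = "árvíztűrő"

latin2_wire = page_text.encode("iso-8859-2")  # bytes of the HTML page
utf8_wire = page_text.encode("utf-8")         # bytes of the external JS

# Gecko's model: decode each resource with its *own* declared charset...
from_page = latin2_wire.decode("iso-8859-2")
from_script = utf8_wire.decode("utf-8")

# ...after which both are identical Unicode text, so the JS-generated
# content renders correctly inside the Latin-2 page.
assert from_page == from_script == page_text
```

This is why the reporter's page "works better than expected": mixing encodings across resources is fine as long as each resource's encoding is correctly declared.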
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → INVALID