Closed
Bug 285261
Opened 20 years ago
Closed 20 years ago
Wrong encoding of external JavaScript generated code displays "right"
Categories
(Core :: Internationalization, defect)
RESOLVED
INVALID
People
(Reporter: lion, Assigned: smontagu)
Attachments
(1 file)
149.18 KB, application/zip
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8a2) Gecko/20040714
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1

I have a very strange problem. On a Latin-2 website, a <script src=blabla> element loads a script that writes a lot of content encoded in UTF-8. That content appears correctly on the page, even though the page itself (the surrounding content and the declared encoding) is ISO-8859-2. Switching to View->UTF-8 keeps the JS-generated content OK(!) and breaks the original Latin-2 content.

In IE the JS-generated content shows up as "buggy UTF-8" (two letters for each non-ASCII character), which I think is the right behavior.

Important: is the Firefox behavior correct, or IE's? (This question came to mind as I finished the bug report: since the browser knows that the external source is encoded differently, why should it display buggy characters?)

Reproducible: Always

Steps to Reproduce:
1. Create a website with Latin-2 encoding.
2. Include an external JavaScript file: <script src="utf8_variables.js">
3. Let the JavaScript generate some document content from the external file's variables.

Actual Results:
The page displays correctly: content from both the Latin-2 and the UTF-8 input files is displayed normally in the browser window using the Latin-2 encoding.

Expected Results:
The content coming from UTF-8 should be "broken": A%+?bbb%!%? and similar "double-byte" trash.

I am using the www.milonic.com menu as the external JavaScript, generated from C#/.NET (UTF-8), while the normal page is generated by ASP/VBScript (Latin-2). If necessary, I can make a simple test with a much smaller JavaScript file, or make screenshots of how it is displayed.
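For reference, the "double letters" the reporter expected can be reproduced outside a browser by decoding UTF-8 bytes as if they were ISO-8859-2. The following is only a sketch in Python, not anything taken from the attached test case; the sample word and the function name are made up for illustration.

```python
# Sketch: what happens when UTF-8 bytes are (wrongly) read as ISO-8859-2.
# The sample text and function name are illustrative assumptions.

def misdecode_utf8_as_latin2(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-2."""
    return text.encode("utf-8").decode("iso-8859-2")

sample = "árvíztűrő"          # 9 characters, 4 of them non-ASCII
garbled = misdecode_utf8_as_latin2(sample)

# Each non-ASCII character occupies two bytes in UTF-8, so it turns into
# two unrelated ISO-8859-2 characters after the wrong decode.
print(len(sample), len(garbled))
print(repr(garbled))
```

Because Python's ISO-8859-2 codec maps every byte value to some character, the damage is reversible here: re-encoding the garbled string as ISO-8859-2 and decoding it as UTF-8 gives back the original text.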
Comment 1•20 years ago
A simple test would be appreciated. Without it, it's not clear what the problem is.
Comment 2•20 years ago
What exact Content-Type does the server send for the <script>?
| Reporter | ||
Comment 3•20 years ago
http://www.xaraya.hu/moztest/

There you can see a 110% complete test of the whole issue (I reimplemented the necessary parts in small PHP scripts). Please advise.

Thanks,
Ferenc Veres

(Please note that I won't keep the test online "forever", so I am attaching a zip of it here as well.)
| Reporter | ||
Comment 4•20 years ago
Comment 5•20 years ago
Are you saying that Mozilla works better than you expect it to? utf8js.php explicitly declares that it is in UTF-8, so Mozilla's JS interpreter correctly interprets it as UTF-8 and stores the values of the two variables internally in UTF-16. When it has to print them out, it converts them to the encoding of the page that includes the script (ISO-8859-1 or ISO-8859-2). Don't you want Mozilla to do the right thing?

Btw, http://www.xaraya.hu/moztest/ seems to be in ISO-8859-1 (not in ISO-8859-2).

Another btw: you have to use 'text/javascript; charset=UTF-8' instead of 'text/html; charset=UTF-8' when generating an external JavaScript file using PHP.
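To make the header point concrete, here is a rough Python sketch of how a Content-Type value splits into a media type and a charset, and how the charset then drives decoding of the script bytes. This is not Gecko's code; `parse_content_type`, `fallback_charset`, and the sample script line are hypothetical names and data chosen for illustration, and the fallback merely stands in for whatever encoding would apply when the header names none.

```python
from email.message import Message

def parse_content_type(header_value: str, fallback_charset: str = "iso-8859-1"):
    """Return (media_type, charset) for a Content-Type header value.

    fallback_charset is an assumption of this sketch: it stands in for the
    encoding that would be used if the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset(fallback_charset)

# The header the test case sends versus the one suggested above.
sent = parse_content_type("text/html; charset=UTF-8")
suggested = parse_content_type("text/javascript; charset=UTF-8")

# Both carry the same charset, so the script bytes decode the same way;
# only the declared media type differs.
script_bytes = 'var label = "Árvíztűrő";'.encode("utf-8")
script_text = script_bytes.decode(suggested[1])
print(sent, suggested, script_text)
```

The sketch suggests why the test page still renders correctly despite the wrong media type: the charset parameter, which is what decides how the bytes are decoded, is the same in both headers.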
| Reporter | ||
Comment 6•20 years ago
I don't know which is the "right" behavior. I just read the encoding sections of HTML 4.01, and there is nothing about this.

More test results:
Mozilla/Firefox: readable
Opera 7: readable
Internet Explorer 6: unreadable (UTF-8 "line noise")
Konqueror (KDE 3.3): unreadable (UTF-8 "line noise")

The index.php seems to be ISO-8859-2 to me, judging by the status of the "View->Character encoding" menu in Mozilla/Firefox/Konqueror. You can also see that the PHP code sends the header for Latin-2.

The _strangest_ problem is that this "feature" allows me to print, e.g., Latin-2-only characters on a Latin-1 page (latin1.png).

It is probably up to the browser how it wants to handle this strange combination, so feel free to close the "bug" if you wish. (This cost us some hours of testing the weird, differing behaviour of IE and Firefox: since the page was originally Latin-2, I thought the UTF-8 line noise was the "correct display".)

Both behaviours are reasonable:
IE: "Your page is Latin-2, so it cannot display UTF-8 directly."
Gecko: "You specified an encoding for the 'subcontent', so why not interpret it?"

(I am surprised to be the first one to pinpoint this problem.)

Thanks for your attention!
Comment 7•20 years ago
Gecko's behavior is correct. There is no such thing as a "Latin-1 page". All pages are Unicode. There are "Latin-1-encoded bytes on the wire", but Gecko converts all the data (both the HTML and the UTF-8-encoded JS) to Unicode characters before doing anything else with it, using the relevant encodings specified in the HTTP headers.
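The model described in this comment can be sketched in a few lines of Python: each resource is decoded with its own declared encoding, and only the resulting Unicode text is combined. This is an illustration under assumed sample data, not a description of Gecko internals; the byte strings and variable names are invented for the example.

```python
# Sketch: decode each resource with its own encoding, then work in Unicode.
# The sample bytes below are illustrative assumptions.

html_bytes = "<p>café</p>".encode("iso-8859-1")   # page sent as Latin-1
js_bytes = "ő".encode("utf-8")                      # script value sent as UTF-8

html_text = html_bytes.decode("iso-8859-1")
js_text = js_bytes.decode("utf-8")

# Simulate the script writing its value into the page: both sides are
# already Unicode strings, so no encoding is involved at this point.
document_text = html_text.replace("</p>", " " + js_text + "</p>")

# "ő" has no ISO-8859-1 byte, yet it sits in the document without trouble.
try:
    "ő".encode("iso-8859-1")
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False

print(document_text, fits_in_latin1)
```

This also accounts for the earlier observation that Latin-2-only characters can appear on a page served as Latin-1: once everything is decoded, the wire encoding of the HTML no longer limits which characters the document can hold.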
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID