Closed Bug 285261 Opened 20 years ago Closed 20 years ago

Wrong encoding of external JavaScript generated code displays "right"

Categories

(Core :: Internationalization, defect)

Platform: x86, Windows Server 2003
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED INVALID

People

(Reporter: lion, Assigned: smontagu)

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8a2) Gecko/20040714
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1

I have a very strange problem: on a Latin-2 website, an external <script src=blabla>
outputs a lot of UTF-8 encoded content, and it appears correctly on the
page, even though the page itself (the surrounding content and the declared
encoding) is ISO-8859-2.

Switching to View->UTF-8 keeps the JS-generated content OK(!), and breaks the
original Latin-2 content.

In IE I see the JS-generated content as "buggy UTF-8" (two garbage letters for
each non-ASCII character), which I think is the correct behavior.
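The "double letters" described here are what you get when UTF-8 bytes are decoded with a single-byte encoding. A minimal Python sketch (the sample character is just an illustration):

```python
# A non-ASCII character encoded as UTF-8 takes two (or more) bytes.
utf8_bytes = "é".encode("utf-8")        # b'\xc3\xa9'

# A browser that decodes those bytes as Latin-2 (ISO-8859-2) maps each
# byte to its own character, producing two garbage letters per character.
mojibake = utf8_bytes.decode("iso8859_2")

print(len("é"), len(mojibake))  # 1 2
```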

Important: is the FIREFOX behavior correct, or IE's??? (This question came to
mind as I finished the bug report: since the browser knows that the external
source is encoded differently, why should it display broken characters?)




Reproducible: Always

Steps to Reproduce:
1. Create a website with Latin-2 encoding.
2. Include an external JavaScript file: <script src="utf8_variables.js">.
3. Let the JavaScript generate some document content from the external file's
variables.

Actual Results:  
The page displays correctly: both the Latin-2 and the UTF-8 input files are
displayed normally in the browser window using the Latin-2 encoding.

Expected Results:  
The content coming from UTF-8 should be "broken": A%+?bbb%!%? and similar
"double-byte" trash.


I am using the www.milonic.com menu as the external JavaScript, generated from
C#/.NET (UTF-8), while the normal page is generated by ASP/VBScript (Latin-2).
If necessary, I can make a simpler test with a much smaller JavaScript file, or
make screenshots of how it is displayed.
A simple test would be appreciated; without it, it's not clear what the problem is.
What exact content type does the server send for the <script>?
http://www.xaraya.hu/moztest/

There you can see a 110% complete test of the whole issue (I reimplemented the
necessary parts in small PHP scripts). Please advise.

Thanks,
Ferenc Veres

(Please note that I won't keep the test online "forever", so I am attaching a
zip of it here as well.)

Are you saying that Mozilla works better than you expect it to? 

utf8js.php explicitly declares that it's in UTF-8, so Mozilla's JS interpreter
correctly interprets it as UTF-8 and stores the values of the two variables
internally in UTF-16. When it has to print them out, it converts them to the
encoding of the page that includes them (ISO-8859-1 or ISO-8859-2).

Don't you want Mozilla to do the right thing?

Btw,  http://www.xaraya.hu/moztest/ seems to be in ISO-8859-1 (not in ISO-8859-2)

Another Btw, you have to use 'text/javascript; charset=UTF-8' instead of
'text/html; charset=UTF-8' when generating an external javascript file using PHP.
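For the PHP script that emits the JavaScript, that means sending a response header along these lines (a sketch of the header itself; how you emit it depends on your setup):

```
Content-Type: text/javascript; charset=UTF-8
```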

I don't know which is the "right" behavior. I just read the encoding sections of
HTML 4.01, there is nothing about this. 

More test results:

Mozilla/Firefox: readable
Opera 7: readable
Internet Explorer 6: unreadable (utf-8 "linenoise")
Konqueror (kde3.3): unreadable  (utf-8 "linenoise")

The index.php seems to be ISO-8859-2 to me, judging by the "View->Character
Encoding" menu's status in Mozilla/Firefox/Konqueror. You can also see that the
PHP code sends the header for Latin-2.

The _strangest_ problem is that this "feature" allows me to print e.g.
latin2-only characters on a latin1 page (latin1.png). 

Probably this is up to the browser how it wants to handle this strange
combination, so feel free to close the "bug" if you wish.

(This cost us some hours of testing the weird (different) behaviour of IE and
Firefox; since the page was originally Latin-2, I thought the UTF-8 line noise
was the "correct display".)

Both behaviours are reasonable:

IE: "your page is Latin-2, this cannot display UTF-8 directly."

Gecko: "you specified encoding for the "subcontent", so why not interpret it."

(I am surprised to be the first one pinpointing this problem.)

Thanks for your attention!
Gecko's behavior is correct.  There is no such thing as a "Latin-1 page".  All
pages are Unicode.  There are "Latin-1 encoded bytes on the wire".  But Gecko
converts all the data (both the HTML and the UTF-8-encoded JS) to Unicode
characters before doing anything else with it (using the relevant encodings
specified in the HTTP headers).
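This model can be illustrated in Python: the same character, sent as differently encoded bytes in two resources, decodes to the same Unicode character once each resource is decoded with its own declared encoding (a sketch; the character is just an example):

```python
# The same character, sent as bytes in two different encodings:
html_bytes = "ő".encode("iso8859_2")   # b'\xf5' — Latin-2 bytes on the wire
js_bytes = "ő".encode("utf-8")         # b'\xc5\x91' — UTF-8 bytes on the wire

# Each resource is decoded with the encoding its own HTTP headers declare,
# so both byte sequences yield the same Unicode character.
print(html_bytes.decode("iso8859_2") == js_bytes.decode("utf-8"))  # True
```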
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID
