Closed Bug 278540 Opened 20 years ago Closed 18 years ago

emitting DOM content (created with Javascript), lossy UTF-16 to ASCII conversion is used

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

RESOLVED DUPLICATE of bug 335298

People

(Reporter: gumbas, Assigned: jst)

Details

(Keywords: intl)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

The page http://y2k-design.net/_cms_data/charset_iframetest.html contains
a simple JavaScript that creates an iframe and sets its contents.
The problem is that although the iframe content has a "meta http-equiv" charset
definition (iso-8859-2 in this case), it is completely ignored.

The script below works in IE but doesn't in Firefox.

Code:
--------------------------
<html>
<head><meta http-equiv="content-type" content="text/html;
charset=iso-8859-2"><title>TEST</title></head>

<body>

Polish chars test: ISO:¶¶¶¶¶¶¶¶¶¶¶¶¶¶ Win:śśśśśśśśś
	
<br>
<br>

<script language="JavaScript1.2" type="text/javascript">
	function getContents()
	 {
		return("<html>\
				<head>\
				<meta http-equiv=\"content-type\" content=\"text/plain; charset=iso-8859-2\">\
				<title>TEST</title></head>\
				<body bgcolor=\"#ffffef\" margin=0 border=0>\
				Polish chars test: ISO:¶¶¶¶¶¶¶¶¶¶¶¶¶¶ Win:śśśśśśśśś\
				</body></html>");
	 };
	
	translatorPadIFrame = document.createElement("iframe");
	translatorPadIFrame.id = "translatorPadIFrame";
	translatorPadIFrame.style.width="200px";
	translatorPadIFrame.style.height="200px";
	translatorPadIFrame.src="javascript:parent.getContents();";
	document.body.appendChild(translatorPadIFrame);
</script>

</body>
</html>


Reproducible: Always

Steps to Reproduce:
1. Visit http://y2k-design.net/_cms_data/charset_iframetest.html and see it for
yourself

Actual Results:  
See the output screen (Firefox 1.0):
http://y2k-design.net/_cms_data/charset_iframetest.png

Expected Results:  
The iframe contents should display chars properly, according to the charset
definition.
The test page at the given URL is NOT in ISO-8859-2 but mixes two character
encodings, ISO-8859-2 and Windows-1250, in a single HTML file.
'ś' (this is in UTF-8. Please use UTF-8 whenever you add a comment with
non-ASCII content by setting View | Character encoding to UTF-8 BEFORE writing
your comment) is 0xB6 in ISO-8859-2 while it's 0x9C in Windows-1250. Note that
0x9C is invalid (or represents a C1 control character) in ISO-8859-2, so it's
rendered as '?'.
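
To illustrate the byte mix-up described above, here is a minimal sketch. The table is a hand-written two-byte excerpt of the two code pages, and `TABLES` and `decodeByte` are hypothetical names used only for this illustration:

```javascript
// Hand-written excerpt of the two code pages (illustrative, not complete).
const TABLES = {
  // In ISO-8859-2, 0xB6 is 'ś' and 0x9C falls in the C1 control range.
  "iso-8859-2": { 0xB6: "\u015B", 0x9C: "\u009C" },
  // In Windows-1250, 0x9C is 'ś' and 0xB6 is the pilcrow sign.
  "windows-1250": { 0xB6: "\u00B6", 0x9C: "\u015B" },
};

// Hypothetical helper: look up one byte in the excerpt above.
function decodeByte(charset, byte) {
  const ch = TABLES[charset][byte];
  if (ch === undefined) {
    throw new RangeError("byte not covered by this excerpt");
  }
  return ch;
}
```

The same letter therefore needs a different byte depending on which of the two encodings the file is really saved in.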

As for 'iframe'... this is an interesting case.. I'm not sure which component
this belongs to...
Keywords: intl
OS: Windows 2000 → All
Hardware: PC → All
I meant to change the product to Core 
Component: General → DOM
Product: Firefox → Core
Version: unspecified → Trunk
What's going on here is:

'ś' (0xB6) in ISO-8859-2 is translated into UTF-16 (0x01 0x5B) and then emitted
as 0x5B by truncating the high byte (UTF-16 -> ASCII conversion).
The same happens to 0x9C, which is translated into 0xFF 0xFD (U+FFFD, the
replacement character) because 0x9C in ISO-8859-2 is invalid. When emitted, it
turns into 0xFD (0xFF is truncated by the lossy UTF-16 -> ASCII conversion),
which is 'ý' in ISO-8859-2 (as well as in ISO-8859-1).
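
The truncation can be reproduced outside the browser with a rough sketch like the one below. It is not the Mozilla code; `lossyNarrow` is a hypothetical helper that simply keeps the low byte of each UTF-16 code unit, which is the behavior described above:

```javascript
// Hypothetical helper: mimic a lossy UTF-16 -> single-byte conversion
// by discarding the high byte of every UTF-16 code unit.
function lossyNarrow(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    bytes.push(str.charCodeAt(i) & 0xff);
  }
  return bytes;
}

// U+015B ('ś') loses its 0x01 high byte and becomes 0x5B.
const fromSAcute = lossyNarrow("\u015B");
// U+FFFD loses its 0xFF high byte and becomes 0xFD.
const fromReplacement = lossyNarrow("\uFFFD");
```

Here `fromSAcute` holds the single byte 0x5B and `fromReplacement` holds 0xFD, matching the two cases in the comment.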

I don't think MS IE honors the charset in the 'meta' tag (of a dynamically
created iframe); it just emits the 'DOM content' in the charset of the parent
document.


Assignee: firefox → jst
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: iframe document charset ignored when dynamically created using Javascript → emitting DOM content (created with Javascript), lossy UTF-16 to ASCII conversion is used
See the XXX comment at
http://lxr.mozilla.org/seamonkey/source/dom/src/jsurl/nsJSProtocolHandler.cpp#264
 -- that's what's causing this bug.

The idea of having a real UTF-data stream has come up before.  See what
nsDOMParser does, eg, in ConvertWStringToStream.  The JS url code should
probably do something similar, or we should factor the code out into a
NS_NewUTF8InputStream method or something.
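
As a sketch of the idea only (the real fix would live in the C++ JS URL handler, and `NS_NewUTF8InputStream` is just the name floated above), the following shows a lossless alternative to truncation using the standard `TextEncoder` and `TextDecoder` APIs. The helper names `toUtf8Bytes` and `fromUtf8Bytes` are hypothetical:

```javascript
// Hypothetical helper: encode a UTF-16 string as UTF-8 bytes
// instead of truncating each code unit to its low byte.
function toUtf8Bytes(str) {
  return new TextEncoder().encode(str);
}

// Hypothetical helper: a consumer that knows the stream is UTF-8
// can recover the original string without loss.
function fromUtf8Bytes(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// 'ś' (U+015B) becomes the two bytes 0xC5 0x9B and round-trips intact.
const encoded = toUtf8Bytes("\u015B");
const roundTripped = fromUtf8Bytes(encoded);
```

The point of the design is that the byte stream carries a known encoding, so the consumer never has to guess a charset for script-generated content.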

Whiteboard: DUPEME
Depends on: 335298
Fixed by bug 335298

*** This bug has been marked as a duplicate of 335298 ***
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → DUPLICATE
Whiteboard: DUPEME
Component: DOM → DOM: Core & HTML