Closed Bug 10680 Opened 21 years ago Closed 20 years ago
Can't decode UTF-8 Japanese text in JS function
Please reassign to the proper owner.
To reproduce:
1. Launch the apprunner.
2. Click on the "QA|Strres Test" menu item to go to a test case I hacked up based on openLocation.xul.
3. There are 3 titledbuttons labeled "ok", "test", and "cancel: ...". Note that the 3rd button's label properly displays Japanese text in UTF-8. The label string is set in XUL. (see attached figure 1)
4. Now, click on the "test" button, which will:
4.1. set the 'cancel' button label to "\u88e6\ue3bb\u8b82", which is the escaped Unicode for a Japanese text.
4.2. set the 'ok' button label to "onetwothree", which is plain English text retrieved by nsIStringBundle from an external property file, resource:/res/strres.properties.
4.3. set the 'test' button label to "æ »ã ", which is UTF-8 encoded Japanese text retrieved by nsIStringBundle from resource:/res/strres.properties.
5. As you can see in the attached figure 2:
5.1. Japanese text in escaped Unicode displays properly.
5.2. English text retrieved from the property file displays correctly.
5.3. UTF-8 encoded Japanese text does not display correctly. Even if I store it in a JS var and pass it to the setAttribute() function, it does not draw correctly.
Feel free to let me know if you need assistance in reproducing this bug. Thanks
Assignee: brendan → tao
First, this is almost certainly not a JS engine bug. The JS engine uses UCS-2 character arrays for strings, without combining sequences and other hard stuff, per ECMA-262 (ISO 16262) and very much like Java. It does not and should not know anything about utf-8 or other encodings. If you use utf-8 in a document, inside of a script tag or not, you must specify the doc's charset as utf-8 to get it transcoded by code in htmlparser to a UCS-2 string. The substrings of that string that lie within script tags are then parsed by the JS engine. But the external scripts loaded by script src= are not necessarily so converted.
Looking at http://lxr.mozilla.org/mozilla/source/intl/strres/tests/strres-test.xul (I think this bug's URL field should point to strres-test.xul, not strres-test.js), I see an encoding="UTF-8" attribute on the mandatory-first ?xml tag. But it's not clear to me that the encoding attribute on ?xml must govern the script loaded via html:script src= -- in fact I think it's unlikely, and I can't find a clear statement in the HTML 4.0 spec that requires the doc's charset to be used if no charset attribute is given in the script tag. My memory of the HTML 4.0 days, corresponding with Dave Raggett of w3.org, is muddled.
Tao, can you please try adding charset="UTF-8" to the html:script src= tag? Cc'ing nisheeth for XML advice. /be
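To illustrate the transcoding point above, here is a minimal sketch in modern JavaScript. It assumes a runtime that provides the standard `TextDecoder` API, which did not exist when this bug was filed, and it is not the htmlparser code path. The sample character (U+5FE0) and the helper names `widenBytes` and `decodeUtf8` are chosen only for illustration.

```javascript
// UTF-8 byte sequence for U+5FE0, used here only as sample input.
const utf8Bytes = new Uint8Array([0xe5, 0xbf, 0xa0]);

// No decoder: each byte is widened to its own 16-bit code unit.
// This is the kind of result you get when UTF-8 data skips transcoding.
function widenBytes(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// With a decoder: the three bytes collapse into one 16-bit code unit.
function decodeUtf8(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

const widened = widenBytes(utf8Bytes);
const decoded = decodeUtf8(utf8Bytes);

console.log(widened.length); // 3 code units, rendered as garbage
console.log(decoded.length); // 1 code unit
console.log(decoded.charCodeAt(0).toString(16)); // 5fe0
```

The JS string type is the same in both cases; only the step that produces the string differs, which is why the fix belongs in whatever loads the data rather than in the engine.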
I tried that before logging this bug. It didn't make any difference. I believe that in Seamonkey an XML doc is treated as UTF-8 if the charset is not set.
I'll look into the C++ code to see if the data got corrupted in between modules.
Duh, I mean "I hope someone tried charset= on html:script src= and if that failed, then I hope someone will reassign this bug to nisheeth" -- or whoever in layout owns hooking up the converter for script src=. Sorry for suggesting that a new bug be filed. Either this bug is against layout, or it should be fixed by tao using charset= on html:script around line 20 of the testcase URL above (I set that to the .xul file). /be
As I said earlier in this bug report, I did set charset="UTF-8" in the <html:script > tag before logging this bug. It did not work. If I tweak the property file parser to decode the file as UTF-8, then the UTF-8 encoded double-byte characters are displayed correctly. The escaped Unicode sequences are packed as a 2-byte unichar array. For example, "\u9cdf" takes up 6 unichars as opposed to 1. Is there a converted to identify and convert the array to UCS-2?
Oops, the last sentence shall read as "Is there a converter to identify and convert the array to UCS-2?"
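For readers following the "\u9cdf takes up 6 unichars" point, here is a rough JavaScript sketch of the kind of conversion being asked for. It is not the converter later tracked in bug 11725; the function name `unescapeUnicode` is hypothetical, and the sketch assumes every escape is exactly `\u` followed by four hex digits.

```javascript
// Replace each literal \uXXXX sequence with the single code unit it names.
// Assumption: escapes are exactly four hex digits; an escaped backslash
// (a doubled backslash before the u) is not treated specially here.
function unescapeUnicode(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// As read from the property file: backslash, 'u', '9', 'c', 'd', 'f'.
const raw = "\\u9cdf";
const converted = unescapeUnicode(raw);

console.log(raw.length);       // 6
console.log(converted.length); // 1
console.log(converted.charCodeAt(0).toString(16)); // 9cdf
```

Text without escapes passes through unchanged, so a plain English value such as "onetwothree" would be unaffected by this step.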
changing qa contact to teruko
Severity: blocker → critical
Priority: P3 → P1
Whiteboard: Need to determine where to convert the escape-unicode
Target Milestone: M9 → M10
I have a patch to fix the truncated ASCII text problem. The non-English text display problem has been isolated to the property file loader code. Moving to M10 since it isn't an M9 blocker. Raising priority to P1 since it blocks non-English text display.
Assignee: tao → ftang
Status: ASSIGNED → NEW
Whiteboard: Need to determine where to convert the escape-unicode → Need a converter for escape-unicode
Frank, I am reassigning this bug to you per our earlier discussion. Please mark it fixed when the escaped unicode converter is done. Also update the status summary.
Created a separate bug about \u converters (11725) and marked this bug as depending on it. Reassigning this bug back to tao; tao should fix it after I fix 11725.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Tao, is there any test case that Teruko can follow to verify this bug? If not, can you verify it yourself? Thanks!
Hi, Teruko: I believe that you can verify this via "QA|String Bundle Test" or simply type in "resources:/res/strres-test.xul" in the location bar to load a XUL page that has 3 cmd buttons at the bottom of the page. Clicking on the middle one will trigger a JS function which retrieves a CJK character, "loyalty", and sets the middle button's label to it. Let me know if this does not work.
I verified this in 9-28 build.