Closed Bug 68210 Opened 24 years ago Closed 24 years ago

String.split() shows non-ASCII characters as Unicode

Status:

VERIFIED INVALID

People

(Reporter: teruko, Assigned: rogerl)

Details

(Whiteboard: [HTML testcase files are corrupted; save the text version!!][js1.2])

Attachments

(3 files)

HTML testcase 24 years ago Phil Schwartau 8.83 KB, text/html		Details
HTML testcase (second try) 24 years ago Phil Schwartau 8.79 KB, text/html		Details
Source of HTML testcase (as text file !!!) 24 years ago Phil Schwartau 8.79 KB, text/plain		Details

Teruko Kobayashi

Reporter

Description

•

24 years ago

<HTML>
Test case

<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Shift_JIS">
<SCRIPT LANGUAGE="JavaScript1.2">
myVar = "  これは 日本語 表示です。  ";
splits = myVar.split(" ", 3);
document.write(splits);
</script>
</HTML>

After the Japanese string is split, the result is displayed as Unicode escape sequences, as follows:

["\u3053\u308C\u306F", "\u65E5\u672C\u8A9E",
"\u8868\u793A\u3067\u3059\u3002"]

The Japanese characters should be displayed.

I tested this with Netscape 6 RTM, the 2001020604 Mtrunk build, and
4.x.

Roger said:
The result you're seeing is because when the array result from split is
converted to a string the individual sub-strings are escape-mapped before being
joined. This is specific to version 1.2 and only happens during the
array.toString part of the call. If you access the array elements individually,
you won't get this behaviour.
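
Roger's point can be illustrated with a minimal sketch that avoids converting the whole array to a string. It reads the elements one by one, or joins them explicitly. The names `lines` and `joined` are illustrative and not part of the original testcase. The assumption that an explicit join() avoids the escaping follows from Roger's remark that it happens only in the array.toString step; it was not verified here against a JS1.2 engine.

```javascript
// Same input string as in the report above.
const myVar = "  これは 日本語 表示です。  ";

// In current engines, split(" ") keeps the empty strings produced by
// leading, trailing, and repeated spaces, so drop them before taking three.
const splits = myVar.split(" ").filter((s) => s !== "").slice(0, 3);

// Read each element individually instead of converting the whole array.
const lines = [];
for (let i = 0; i < splits.length; i++) {
  lines.push("splits[" + i + "] = " + splits[i]);
}

// Or join explicitly; join() concatenates the element strings as they are.
const joined = splits.join(", ");

console.log(lines.join("\n"));
console.log(joined);
```

In a page, the strings in `lines` or `joined` would be passed to document.write() in place of the array itself.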

Phil Schwartau

Comment 1

•

24 years ago

Attached file HTML testcase — Details

Phil Schwartau

Comment 2

•

24 years ago

Attached file HTML testcase (second try) — Details

Phil Schwartau

Comment 3

•

24 years ago

OK, the testcases do not run properly when loaded from the server!
To use them, you'll have to save them and run them locally.


Here is the output of the testcase:


Different versions of JavaScript: apply myString.split(" ") to the string:

                                                                                                                 
                          これは 日本語 侮ｦですB


JS version 1.1:

myArray.toSource() =   ["", "", "\u201A\xB1\u201A\xEA\u201A\xCD", 
"\u201C\xFA\u2013{\u0152\xEA", "\u2022\u017D\xA6\u201A\xC5\u201A\xB7\uFFFDB", 
""]

myArray.toString() =     ,,これは,日本語,侮ｦですB,


myArray[0] =
myArray[1] =
myArray[2] = これは
myArray[3] = 日本語
myArray[4] = 侮ｦですB
myArray[5] =



JS version 1.2:

myArray.toSource() =   ["\u201A\xB1\u201A\xEA\u201A\xCD", 
"\u201C\xFA\u2013{\u0152\xEA", "\u2022\u017D\xA6\u201A\xC5\u201A\xB7\uFFFDB"]

myArray.toString() =     ["\u201A\xB1\u201A\xEA\u201A\xCD", 
"\u201C\xFA\u2013{\u0152\xEA", "\u2022\u017D\xA6\u201A\xC5\u201A\xB7\uFFFDB"]


myArray[0] = これは
myArray[1] = 日本語
myArray[2] = 侮ｦですB



JS version 1.3:

myArray.toSource() =   ["", "", "\u201A\xB1\u201A\xEA\u201A\xCD", 
"\u201C\xFA\u2013{\u0152\xEA", "\u2022\u017D\xA6\u201A\xC5\u201A\xB7\uFFFDB", 
"", ""]

myArray.toString() =     ,,これは,日本語,侮ｦですB,,


myArray[0] =
myArray[1] =
myArray[2] = これは
myArray[3] = 日本語
myArray[4] = 侮ｦですB
myArray[5] =
myArray[6] =



JS version 1.4:

myArray.toSource() =   ["", "", "\u201A\xB1\u201A\xEA\u201A\xCD", 
"\u201C\xFA\u2013{\u0152\xEA", "\u2022\u017D\xA6\u201A\xC5\u201A\xB7\uFFFDB", 
"", ""]

myArray.toString() =     ,,これは,日本語,侮ｦですB,,


myArray[0] =
myArray[1] =
myArray[2] = これは
myArray[3] = 日本語
myArray[4] = 侮ｦですB
myArray[5] =
myArray[6] =



JS version 1.5:

myArray.toSource() =   ["", "", "\u201A\xB1\u201A\xEA\u201A\xCD", 
"\u201C\xFA\u2013{\u0152\xEA", "\u2022\u017D\xA6\u201A\xC5\u201A\xB7\uFFFDB", 
"", ""]

myArray.toString() =     ,,これは,日本語,侮ｦですB,,


myArray[0] =
myArray[1] =
myArray[2] = これは
myArray[3] = 日本語
myArray[4] = 侮ｦですB
myArray[5] =
myArray[6] =

Phil Schwartau

Comment 4

•

24 years ago

The testcase illustrates what Roger said above: JS1.2 differs from 
all other JS versions in its treatment of Array.toString().


In JS1.2, Array.toString() is the same as Array.toSource(). In all 
other versions of JS, Array.toString() and Array.toSource() are different.


When you call String.split(), it returns an Array object; writing that array 
with document.write() then applies Array.toString() to it. In JS1.2, that gives 
you the Array.toSource() form, unlike other versions of JS. That's why the 
String.split() result looks funny in JS1.2.


Note, however, that even in JS1.2 the individual elements myArray[i]
appear without any Unicode escape-mapping.


Note: all comments apply to the Netscape implementation of JavaScript.
They do not apply to Microsoft's implementation. Note also that 
Microsoft JavaScript does not have a toSource() method...
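
The difference between the two string forms can be approximated in a current engine, which has neither JS1.2 version semantics nor a built-in toSource(). The sketch below is an emulation, not the engine's actual code: `escapeNonAscii` and `toSourceLike` are hypothetical helpers that mimic the escaped output shown in this bug (\xNN for code units up to 0xFF, \uNNNN above that), and they do not escape quotes or backslashes.

```javascript
// Hypothetical helper: escape every non-ASCII UTF-16 code unit, in the
// style of the escaped output quoted in this bug.
function escapeNonAscii(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code < 0x80) {
      out += str[i];
    } else if (code <= 0xff) {
      out += "\\x" + code.toString(16).toUpperCase().padStart(2, "0");
    } else {
      out += "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
    }
  }
  return out;
}

// Hypothetical helper: build a source-like literal for an array of strings.
function toSourceLike(arr) {
  return "[" + arr.map((s) => '"' + escapeNonAscii(s) + '"').join(", ") + "]";
}

const myArray = ["これは", "日本語", "表示です。"];

// Approximates what JS1.2 printed for the whole array.
const sourceLike = toSourceLike(myArray);

// What other versions print: elements joined by commas, no escaping.
const plain = myArray.toString();

console.log(sourceLike);
console.log(plain);
```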

Summary: Javascript-Split shows Non-ASCII characters as unicode → String.split() shows non-ASCII characters as Unicode

Whiteboard: [Testcases DON'T WORK OFF SERVER; save and run locally!]

Phil Schwartau

Comment 5

•

24 years ago

Attached file Source of HTML testcase (as text file !!!) — Details

Phil Schwartau

Comment 6

•

24 years ago

If you want to run the HTML testcase, you'll have to save the TEXT of it
that I've attached above (attachment id = 24963), and run it locally.


I don't know why, but the HTML files got corrupted when saved to the
Bugzilla server. 


The string you test, and the split() command you apply to the string,
can be adjusted in these variables in the file: 


          var myString = "  これは 日本語 表示です?B  ";

          var mySplitCommand =  'myString.split(" ")';
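
The attached testcase itself is not reproduced in this report, so the following is only a rough sketch of the kind of listing it prints, written for a current engine. `describeSplit` is a hypothetical name, it takes the separator directly instead of a command string, and the input uses the uncorrupted string from the original description.

```javascript
// Hypothetical helper: split a string and list the array's string form
// followed by each element, similar to the testcase output in Comment 3.
function describeSplit(str, separator) {
  const myArray = str.split(separator);
  const report = ["myArray.toString() = " + myArray.toString()];
  myArray.forEach((element, i) => {
    report.push("myArray[" + i + "] = " + element);
  });
  return report;
}

const myString = "  これは 日本語 表示です。  ";
const report = describeSplit(myString, " ");

console.log(report.join("\n"));
```

In a current engine the two leading and two trailing spaces each produce empty strings, so the listing has seven elements, as in the JS 1.3 to 1.5 sections of the output above.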

Phil Schwartau

Comment 7

•

24 years ago

Based on Roger's explanation, which the testcase confirmed, I have to 
mark this bug as invalid. In order to get the Japanese characters to display,
you should use 


                <SCRIPT LANGUAGE="JavaScript"> 
NOT

                <SCRIPT LANGUAGE="JavaScript1.2">



Even in JavaScript1.2, if you access the array elements individually
(myArray[i]), you will get the Japanese characters and not Unicode escaping.
Status: NEW → RESOLVED

Closed: 24 years ago

Resolution: --- → INVALID

Phil Schwartau

Comment 8

•

24 years ago

Marking Verified.

Status: RESOLVED → VERIFIED

Phil Schwartau

Updated

•

24 years ago

Whiteboard: [Testcases DON'T WORK OFF SERVER; save and run locally!] → [HTML testcase files are corrupted; save the text version!!]

Bob Clary [:bc] (inactive)

Updated

•

20 years ago

Whiteboard: [HTML testcase files are corrupted; save the text version!!] → [HTML testcase files are corrupted; save the text version!!][js1.2]

Categories

(Core :: JavaScript Engine, defect)
