Closed Bug 96716 Opened 23 years ago Closed 23 years ago

DOM escape() truncates characters inappropriately

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

VERIFIED DUPLICATE of bug 44272

People

(Reporter: rich.foyle, Assigned: smontagu)

Attachments

(3 files)

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
BuildID:    0.9.2

After calling document.getSelection() and verifying that the selection was intact, I 
escaped the selection. The result was truncated at the special character '— ': 
that character and all text after it were dropped. The URL points to a document 
that contains the character.

Reproducible: Always
Steps to Reproduce:
1. Create a bookmark with this as its target URL:
   javascript:void(alert(escape(document.getSelection())));
2. Select text that contains the '— ' character.
3. Select the bookmark and observe the output.

Actual Results:  The selection was truncated when it hit the character

Expected Results:  Should not have truncated the selection
The special character was garbled in the form submission for this bug. It is 
the dash. See the text at the specified URL: "KANSAS CITY, Mo.  — Investigators 
were" 
Attached file Reduced HTML testcase
Trying the (corrected) HTML testcase in IE4.x, NN4.x, and Mozilla:


IE4.7
----------------------------------------------------------------
CHARACTERS IN THE STRING "A—B"
char = A     charCode = 65
char = —     charCode = 8212
char = B     charCode = 66

CHARACTERS IN escape("A—B") = "A%u2014B"
char = A     charCode = 65
char = %     charCode = 37
char = u     charCode = 117
char = 2     charCode = 50
char = 0     charCode = 48
char = 1     charCode = 49
char = 4     charCode = 52
char = B     charCode = 66
----------------------------------------------------------------



NN4.7
----------------------------------------------------------------
CHARACTERS IN THE STRING "A—B"

char = A     charCode = 65
char = —     charCode = 8212
char = B     charCode = 66

CHARACTERS IN escape("A—B") = "A%97B"
char = A     charCode = 65
char = %     charCode = 37
char = 9     charCode = 57
char = 7     charCode = 55
char = B     charCode = 66
----------------------------------------------------------------



Mozilla 2001-08-22
----------------------------------------------------------------
CHARACTERS IN THE STRING "A—B"
char = A     charCode = 65
char = —     charCode = 8212
char = B     charCode = 66


CHARACTERS IN escape("A—B") = "A"
char = A     charCode = 65
----------------------------------------------------------------



You can see the bug here: escape("A—B") gets truncated.
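
For readers without the attachment, here is a minimal sketch of how a per-character listing like the ones above could be produced. The helper name dumpChars and the label text are hypothetical; the attached testcase's actual code is not reproduced in this report.

```javascript
// Hypothetical helper: lists each character of a string with its charCode.
// This is NOT the attached testcase, only an illustration of the same idea.
function dumpChars(label, str) {
  const lines = ['CHARACTERS IN ' + label + ' = "' + str + '"'];
  for (let i = 0; i < str.length; i++) {
    lines.push('char = ' + str.charAt(i) + '     charCode = ' + str.charCodeAt(i));
  }
  return lines;
}

// Written with a Unicode escape so the source file's encoding does not matter.
const cnTEST = 'A\u2014B';

const report = dumpChars('THE STRING', cnTEST)
  .concat(dumpChars('escape(...)', escape(cnTEST)));

console.log(report.join('\n'));
```

In an engine whose escape() follows the language specification, the second listing shows the eight characters of "A%u2014B". In the affected Mozilla build it would show only "A".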
Status: UNCONFIRMED → NEW
Ever confirmed: true
Running the JS shell version of the testcase, I get the following output.
The character "—" (Unicode 2014 or JS charCode 8012) is interpreted by
my Cygwin shell as "ù" 


Testing the string  "AùB"

CHARACTERS IN THE STRING  "AùB"
char = A     charCode = 65
char = ù     charCode = 151
char = B     charCode = 66

CHARACTERS IN escape("AùB") =  "A%97B"
char = A     charCode = 65
char = %     charCode = 37
char = 9     charCode = 57
char = 7     charCode = 55
char = B     charCode = 66
Sorry, that's JS charCode 8212 (the decimal representation of hex 2014)
I mistyped '8012'... 
Let me ask rogerl if this bug is a JS Engine bug or not. Is it a dupe
of bug 72964, "Pattern matching failing on non-Latin1 characters"?

There, we discovered something wrong with JS Engine string processing 
for high Unicode characters (> 00FF). 


This comment was made there:

 ----- Additional Comments From nhotta@netscape.com 2001-03-23 16:19 -----

 Looks like non Latin1 characters without unicode escape are regarded as spaces
 when a document charset is multibyte (e.g. UTF-8, EUC-JP, GB2312).


Of course, the document charset is a browser-based issue, but we found
problems directly within the JS Engine. This bug may be another consequence.

                                 (?)
NOTE: if I alter the JS shell testcase by defining var cnTEST = 'A\u2014B' 
      instead of var cnTEST = 'A—B', I get this output:


Testing the string  "A¶B"

CHARACTERS IN THE STRING  "A¶B"
char = A     charCode = 65
char = ¶     charCode = 8212
char = B     charCode = 66

CHARACTERS IN escape("A¶B") = "A%u2014B"
char = A     charCode = 65
char = %     charCode = 37
char = u     charCode = 117
char = 2     charCode = 50
char = 0     charCode = 48
char = 1     charCode = 49
char = 4     charCode = 52
char = B     charCode = 66
Looks like the JS Engine is not to blame here. The JS shell is performing 
as expected on the testcase above...

One must remember that the DOM has its own escape() function. This supersedes
the JS Engine escape() in the browser. Using a debug build of Mozilla, in fact,
we found that the JS Engine escape() is not even called by the HTML testcase.
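
As a reference point for what the JS shell output above reflects, here is a sketch of the escape() algorithm as described in ECMA-262 Annex B. The names specEscape, hex and UNESCAPED are illustrative only; this is not the Mozilla JS Engine source and not the DOM implementation.

```javascript
// Sketch of escape() per ECMA-262 Annex B. Illustrative names, not Mozilla code.
const UNESCAPED =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./';

// Uppercase hex, left-padded with zeros to the given width.
function hex(n, width) {
  return n.toString(16).toUpperCase().padStart(width, '0');
}

function specEscape(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const ch = str.charAt(i);
    const code = str.charCodeAt(i);
    if (UNESCAPED.indexOf(ch) !== -1) {
      out += ch;                   // passed through unchanged
    } else if (code < 256) {
      out += '%' + hex(code, 2);   // code units up to 0xFF become %XX
    } else {
      out += '%u' + hex(code, 4);  // code units above 0xFF become %uXXXX
    }
  }
  return out;
}

console.log(specEscape('A\u2014B'));  // A%u2014B
console.log(specEscape('A\u0097B'));  // A%97B
```

The two calls correspond to the two JS shell runs: charCode 8212 gives "A%u2014B", while charCode 151 (what the Cygwin shell passed in) gives "A%97B". Neither path drops the rest of the string.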

Reassigning to DOM Level 0 -
Assignee: rogerl → jst
Component: Javascript Engine → DOM Level 0
OS: Windows 2000 → All
QA Contact: pschwartau → desale
Hardware: PC → All
Summary: escaped text is truncated → DOM escape() truncates characters inappropriately
Here is a very reduced testcase:


             javascript: alert(escape('A\u2014B').length);

RESULTS:
                         IE4.7    :  8
                         NN4.7    :  5
                         Moz      :  1
                         JS shell :  8



This summarizes the results from the testcases above, and shows 
the truncation occurring in the Mozilla DOM escape().
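
To tie the lengths back to the earlier listings, here is a small sketch that takes the escape() output strings reported in the comments above and prints their lengths. It assumes the reduced testcase yields the same strings as the earlier testcases; the object name reported is made up for illustration.

```javascript
// Output strings as reported earlier in this bug (assumed unchanged
// for the reduced testcase).
const reported = {
  'IE4.7': 'A%u2014B',
  'NN4.7': 'A%97B',
  'Moz': 'A',
  'JS shell': 'A%u2014B'
};

const lengths = {};
for (const [impl, out] of Object.entries(reported)) {
  lengths[impl] = out.length;
  console.log(impl.padEnd(9) + ':  ' + out.length);
}
```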

I'm also curious as to the NN4.7 escape() differing from IE4.7's,
but that's another story...
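
One possible explanation for the NN4.7 result, offered only as a guess: 0x97 is the byte for this dash in the Windows-1252 code page, so "%97" is what you would get if the string were converted to that charset before escaping. The sketch below illustrates the idea. The function name escapeViaCp1252 and the partial mapping table are hypothetical, and nothing in this bug confirms that NN4.7 actually works this way.

```javascript
// ASSUMPTION: illustrates a charset-conversion-then-escape hypothesis only.
// Partial Unicode -> Windows-1252 table (a few punctuation entries).
const CP1252_FROM_UNICODE = {
  0x2018: 0x91, // left single quotation mark
  0x2019: 0x92, // right single quotation mark
  0x201C: 0x93, // left double quotation mark
  0x201D: 0x94, // right double quotation mark
  0x2013: 0x96, // en dash
  0x2014: 0x97  // em dash
};

function escapeViaCp1252(str) {
  let bytes = '';
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    const mapped = CP1252_FROM_UNICODE[code];
    // Characters missing from the partial table are left as they are.
    bytes += String.fromCharCode(mapped !== undefined ? mapped : code);
  }
  return escape(bytes);
}

console.log(escapeViaCp1252('A\u2014B'));  // A%97B
```

The same byte value would also account for the Cygwin shell run above, where the character arrived with charCode 151 (hex 97).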
Over to internationalization.
Assignee: jst → yokoyama
Component: DOM Level 0 → Internationalization
QA Contact: desale → teruko
-> smontagu
Assignee: yokoyama → smontagu
Dupe of bug 44272. Refer also to discussion in bug 42221.

*** This bug has been marked as a duplicate of 44272 ***
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
Verified as dup.
Status: RESOLVED → VERIFIED