DOM escape() truncates characters inappropriately

VERIFIED DUPLICATE of bug 44272

People

(Reporter: rich.foyle, Assigned: smontagu)

Tracking

Trunk

Attachments

(3 attachments)

(Reporter)

Description

17 years ago
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
BuildID:    0.9.2

After calling document.getSelection() and verifying that the selection was 
intact, I escaped the selection. The result was truncated at the special 
character '— ': that character and all text after it were dropped. The URL 
points to a document that contains the character.

Reproducible: Always
Steps to Reproduce:
1. Create a bookmark with this target URL: 
   javascript:void(alert(escape(document.getSelection())));
2. Select text that contains the '— ' character.
3. Select the bookmark and observe the output.

Actual Results:  The selection was truncated when it hit the character

Expected Results:  Should not have truncated the selection
(Reporter)

Comment 1

17 years ago
The special character in the form submission for the bug was jacked up...it is 
the dash. See the text on the specified URL: "KANSAS CITY, Mo.  — Investigators 
were" 

Comment 2

17 years ago
Created attachment 46974 [details]
Reduced HTML testcase

Comment 3

17 years ago
Created attachment 46978 [details]
Reduced JS shell testcase

Comment 4

17 years ago
Created attachment 46980 [details]
Reduced HTML testcase (corrected): runs in IE, NN, & Moz

Comment 5

17 years ago
Trying the (corrected) HTML testcase in IE4.x, NN4.x, and Mozilla:


IE4.7
----------------------------------------------------------------
CHARACTERS IN THE STRING "A—B"
char = A     charCode = 65
char = —     charCode = 8212
char = B     charCode = 66

CHARACTERS IN escape("A—B") = "A%u2014B"
char = A     charCode = 65
char = %     charCode = 37
char = u     charCode = 117
char = 2     charCode = 50
char = 0     charCode = 48
char = 1     charCode = 49
char = 4     charCode = 52
char = B     charCode = 66
----------------------------------------------------------------



NN4.7
----------------------------------------------------------------
CHARACTERS IN THE STRING "A—B"

char = A     charCode = 65
char = —     charCode = 8212
char = B     charCode = 66

CHARACTERS IN escape("A—B") = "A%97B"
char = A     charCode = 65
char = %     charCode = 37
char = 9     charCode = 57
char = 7     charCode = 55
char = B     charCode = 66
----------------------------------------------------------------



Mozilla 2001-08-22
----------------------------------------------------------------
CHARACTERS IN THE STRING "A—B"
char = A     charCode = 65
char = —     charCode = 8212
char = B     charCode = 66


CHARACTERS IN escape("A—B") = "A"
char = A     charCode = 65
----------------------------------------------------------------



You can see the bug here: escape("A—B") gets truncated.
Status: UNCONFIRMED → NEW
Ever confirmed: true
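
The attached testcases are not reproduced in this report. As a rough sketch
only, a character dump that prints output in the format shown above could look
like the following. The helper name dumpChars is hypothetical and is not taken
from the attachments; the variable name cnTEST is the one mentioned in the
JS shell testcase.

```javascript
// Hypothetical helper, not the attached testcase: lists each character
// of a string together with its UTF-16 code unit value.
function dumpChars(str) {
  const lines = [];
  for (let i = 0; i < str.length; i++) {
    lines.push('char = ' + str.charAt(i) + '     charCode = ' + str.charCodeAt(i));
  }
  return lines;
}

const cnTEST = 'A\u2014B'; // em dash written as a Unicode escape
const plainDump = dumpChars(cnTEST);
const escapedDump = dumpChars(escape(cnTEST));

console.log(plainDump.join('\n'));
console.log(escapedDump.join('\n'));
```

In an engine whose escape() behaves like the IE4.7 and JS shell results, the
second dump has eight lines; the Mozilla build described here would produce
only one.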

Comment 6

17 years ago
Running the JS shell version of the testcase, I get the following output.
The character "—" (Unicode 2014 or JS charCode 8012) is interpreted by
my Cygwin shell as "ù" 


Testing the string  "AùB"

CHARACTERS IN THE STRING  "AùB"
char = A     charCode = 65
char = ù     charCode = 151
char = B     charCode = 66

CHARACTERS IN escape("AùB") =  "A%97B"
char = A     charCode = 65
char = %     charCode = 37
char = 9     charCode = 57
char = 7     charCode = 55
char = B     charCode = 66

Comment 7

17 years ago
Sorry, that's JS charCode 8212 (the decimal representation of hex 2014)
I mistyped '8012'... 
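
The decimal and hex values quoted in these comments can be cross-checked with
a few lines of JavaScript. This is only a sanity check of the arithmetic; the
variable names are made up for illustration.

```javascript
// Hex 2014 (the em dash) expressed in decimal, as charCodeAt() reports it.
const emDashDecimal = parseInt('2014', 16);

// The charCode 151 seen in the Cygwin shell, expressed in hex.
const shellCodeHex = (151).toString(16);

// The code unit of the em dash when written as a Unicode escape.
const emDashCode = '\u2014'.charCodeAt(0);

console.log(emDashDecimal, shellCodeHex, emDashCode);
```

Decimal 151 is hex 97, which matches the %97 in the shell's escape() output.
Hex 97 is also the em dash position in the Windows-1252 code page, so the
shell was probably reading the source file one byte at a time; that is an
inference, not something stated in this report.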

Comment 8

17 years ago
Let me ask rogerl if this bug is a JS Engine bug or not. Is it a dupe
of bug 72964, "Pattern matching failing on non-Latin1 characters"?

There, we discovered something wrong with JS Engine string processing 
for high Unicode characters (> 00FF). 


This comment was made there:

 ----- Additional Comments From nhotta@netscape.com 2001-03-23 16:19 -----

 Looks like non Latin1 characters without unicode escape are regarded as spaces
 when a document charset is multibyte (e.g. UTF-8, EUC-JP, GB2312).


Of course, the document charset is a browser-based issue, but we found
problems directly within the JS Engine. This bug may be another consequence.

                                 (?)

Comment 9

17 years ago
NOTE: if I alter the JS shell testcase by defining var cnTEST = 'A\u2014B' 
      instead of var cnTEST = 'A—B', I get this output:


Testing the string  "A¶B"

CHARACTERS IN THE STRING  "A¶B"
char = A     charCode = 65
char = ¶     charCode = 8212
char = B     charCode = 66

CHARACTERS IN escape("A¶B") = "A%u2014B"
char = A     charCode = 65
char = %     charCode = 37
char = u     charCode = 117
char = 2     charCode = 50
char = 0     charCode = 48
char = 1     charCode = 49
char = 4     charCode = 52
char = B     charCode = 66
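
The %u2014 form above is what the escape() algorithm in ECMAScript Annex B
produces: characters in a small unescaped set pass through, code units below
256 become %XX, and everything else becomes %uXXXX. The sketch below
illustrates that algorithm under the assumption that the JS shell follows it.
The name escapeSketch is hypothetical, and this is not the SpiderMonkey or
DOM source.

```javascript
// Illustrative re-implementation of escape() as described in
// ECMAScript Annex B. Not the Mozilla source.
const UNESCAPED =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./';

function escapeSketch(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const ch = str.charAt(i);
    const code = str.charCodeAt(i);
    if (UNESCAPED.indexOf(ch) !== -1) {
      out += ch; // passed through unchanged
    } else if (code < 256) {
      // two uppercase hex digits
      out += '%' + code.toString(16).toUpperCase().padStart(2, '0');
    } else {
      // four uppercase hex digits after %u
      out += '%u' + code.toString(16).toUpperCase().padStart(4, '0');
    }
  }
  return out;
}

const viaUnicodeEscape = escapeSketch('A\u2014B'); // charCode 8212 case
const viaSingleByte = escapeSketch('A\u0097B');    // charCode 151 case
console.log(viaUnicodeEscape, viaSingleByte);
```

The two inputs correspond to the two JS shell runs: the source read with
charCode 8212 and the source read with charCode 151.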

Comment 10

17 years ago
Looks like the JS Engine is not to blame here. The JS shell is performing 
as expected on the testcase above...

One must remember that the DOM has its own escape() function. This supersedes
the JS Engine escape() in the browser. Using a debug build of Mozilla, in fact,
we found that the JS Engine escape() is not even called by the HTML testcase.

Reassigning to DOM Level 0 -
Assignee: rogerl → jst
Component: Javascript Engine → DOM Level 0
OS: Windows 2000 → All
QA Contact: pschwartau → desale
Hardware: PC → All
Summary: escaped text is truncated → DOM escape() truncates characters inappropriately

Comment 11

17 years ago
Here is a very reduced testcase:


             javascript: alert(escape('A\u2014B').length);

RESULTS:
                         IE4.7    :  8
                         NN4.7    :  5
                         Moz      :  1
                         JS shell :  8



This summarizes the results from the testcases above, and shows 
the truncation occurring in the Mozilla DOM escape().

I'm also curious as to the NN4.7 escape() differing from IE4.7's,
but that's another story...
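
The lengths in the table follow directly from the escape() outputs reported in
the earlier comments. A small sketch that recomputes them (the object and
variable names are illustrative only):

```javascript
// escape('A\u2014B') outputs as reported in the comments above.
const reported = {
  'IE4.7': 'A%u2014B',
  'NN4.7': 'A%97B',
  'Moz': 'A',
  'JS shell': 'A%u2014B'
};

const lengths = {};
for (const name of Object.keys(reported)) {
  lengths[name] = reported[name].length;
}
console.log(lengths);

// Length from the engine running this snippet, for comparison.
const currentLength = escape('A\u2014B').length;
console.log(currentLength);
```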

Comment 12

17 years ago
Over to internationalization.
Assignee: jst → yokoyama
Component: DOM Level 0 → Internationalization
QA Contact: desale → teruko

Comment 13

17 years ago
-> smontagu
Assignee: yokoyama → smontagu
(Assignee)

Comment 14

17 years ago
Dupe of bug 44272. Refer also to discussion in bug 42221.

*** This bug has been marked as a duplicate of 44272 ***
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → DUPLICATE

Comment 15

17 years ago
Verified as dup.
Status: RESOLVED → VERIFIED