Closed Bug 331773 Opened 19 years ago Closed 12 years ago

encodeURI fails on decodeURI("%ED%A0%80")

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED

People

(Reporter: danswer, Unassigned)

References

(Blocks 1 open bug)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1

It is possible to give decodeURI a string whose result encodeURI cannot process.  The same string could just as well come from HTML markup.

Reproducible: Always

Steps to Reproduce:
var js = decodeURI ("%ED%A0%80");
alert (js.length + "\n" + js.charCodeAt(0));    // 1, 0xD800
alert (escape (js));                            // %uD800
alert (encodeURI (js));                         // fails here

Example 2:
<span id=myspan>&#55296;&#65538;</span>
<script type='text/javascript'>
var txt = document.getElementById('myspan').innerHTML;
alert (txt.length + "\n" + txt);        // length is 3
alert (escape (txt));                   // %uD800%uD800%uDC02
alert (encodeURI(txt));                 // fails here
</script>
Actual Results:  
In both examples, encodeURI fails on lone surrogates (\uD800-\uDFFF) even though decodeURI and escape handle them without error.

Expected Results:  
I expect encodeURI to return "%ED%A0%80", the UTF-8 style equivalent of the %uD800 that escape returns.

I have encountered this while trying to safely pass strings between the client and server.  Going from the server to the browser is not so bad, because one can use any of the following (see the sketch after this list):
1. decodeURI(utf-8 encoded string);
2. reading the characters out of an HTML element (such as a span) into which they have been encoded as &#unicodePointInDecimal;;
3. "\xHH", "\uHHHH", or "\uHHHH\uHHHH" escapes, the last being a UTF-16 surrogate pair for Unicode characters above U+FFFF (those needing 17-21 bits).  E.g. &#65538; (decimal) -> U+10002 -> decodeURI("%F0%90%80%82") (UTF-8) -> "\uD800\uDC02" (UTF-16)

To go from a JavaScript string to an ASCII representation, one would expect to use encodeURI, but this fails on the result of decodeURI("%ED%A0%80").  The reason it fails, I presume, is that the character in question is not valid (it is an unpaired surrogate).  But in that case escape should not work either.  I would rather have consistent behaviour: if we don't die on creating the string, then I would rather not die upon manipulating it, especially for user-entered string data.
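A workaround sketch (the helper stripLoneSurrogates is hypothetical, not a proposed API): replace unpaired surrogates with U+FFFD before encoding, so encodeURI never throws on user-entered data:

function stripLoneSurrogates (s) {
    var out = "";
    for (var i = 0; i < s.length; i++) {
        var c = s.charCodeAt (i);
        if (c >= 0xD800 && c <= 0xDBFF) {             // high surrogate
            var d = s.charCodeAt (i + 1);             // NaN at end of string
            if (d >= 0xDC00 && d <= 0xDFFF) {         // properly paired
                out += s.charAt (i) + s.charAt (i + 1);
                i++;
            } else {
                out += "\uFFFD";                      // unpaired high surrogate
            }
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            out += "\uFFFD";                          // unpaired low surrogate
        } else {
            out += s.charAt (i);
        }
    }
    return out;
}
alert (encodeURI (stripLoneSurrogates ("\uD800")));   // "%EF%BF%BD" instead of an error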

Csaba Gabor from Vienna

For unicode charts see: http://www.macchiato.com/unicode/chart/
References: http://en.wikipedia.org/wiki/UTF-8 and 
            http://en.wikipedia.org/wiki/UTF-16/UCS-2
Example 1 in comment 0 is pretty much a dupe of bug 316338. Example 2 is fixed in trunk by bug 316394.
So basically, decodeURI can produce bogus UTF-16?  Sounds like we should fix that in the JS engine.  Same for decodeURIComponent.
Assignee: smontagu → general
Blocks: 316338
Status: UNCONFIRMED → NEW
Component: Internationalization → JavaScript Engine
Ever confirmed: true
OS: Windows XP → All
QA Contact: amyy → general
Hardware: PC → All
Blocks: test262
No longer blocks: test262
Now decodeURI("%ED%A0%80") throws a URIError (see bug 660612)
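A minimal check of the new behaviour (my snippet, not taken from the bug):

try {
    decodeURI ("%ED%A0%80");              // would have produced a lone \uD800
} catch (e) {
    alert (e instanceof URIError);        // true: malformed URI sequence
}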
Throwing is okay.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED