Closed Bug 580381 Opened 15 years ago Closed 15 years ago

decodeURIComponent throws exception: malformed URI sequence

Categories

Product: Core
Component: General
Type: defect
Platform: x86_64, Windows 7
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED INVALID

People

Reporter: aquilax
Assignee: Unassigned

Details

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 ( .NET CLR 3.5.30729; .NET4.0E)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 ( .NET CLR 3.5.30729; .NET4.0E)

Some German letters get encoded as follows: ö = %F6, ü = %FC and ä = %E4. The JavaScript function decodeURIComponent can't decode them.

Reproducible: Always

Steps to Reproduce:
1. Use the following bookmarklet: javascript:alert(decodeURIComponent("%F6%20%FC%20%E4"));

Actual Results:
A "malformed URI sequence" exception is thrown.

Expected Results:
An alert message with "ö ü ä".
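A minimal sketch of the report in plain JavaScript, assuming a modern browser console or Node.js rather than a Firefox 3.6 bookmarklet. `tryDecode` is a hypothetical helper. The point is that the ISO-8859-1 escapes from the report raise a `URIError`, while the escapes that `encodeURIComponent` produces for the same letters (UTF-8, two octets per letter) decode cleanly.

```javascript
// Hypothetical helper: return the decoded string, or the error name on failure.
function tryDecode(str) {
  try {
    return decodeURIComponent(str);
  } catch (e) {
    return e.name; // "URIError" for malformed sequences
  }
}

// Escapes from the report: one ISO-8859-1 byte per letter.
const latin1Escaped = "%F6%20%FC%20%E4";
// encodeURIComponent always produces UTF-8 escapes.
const utf8Escaped = encodeURIComponent("ö ü ä"); // "%C3%B6%20%C3%BC%20%C3%A4"

console.log(tryDecode(latin1Escaped)); // "URIError"
console.log(tryDecode(utf8Escaped));   // "ö ü ä"
```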
> Some german letters get encoded as follow ö = %F6, ü = %FC and ä = %E4,

Not if you expect to use decodeURIComponent. The spec for decodeURIComponent is very clear: the result of URI-unescaping must be treated as a sequence of bytes representing the UTF-8 encoding of a Unicode string. You seem to be using ISO-8859-1 encoding instead.

See http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf section 15.1.3.2 and the definition of Decode in section 15.1.3, in particular step 4.d.vii.8, which says:

"Let V be the value obtained by applying the UTF-8 transformation to Octets, that is, from an array of octets into a 32-bit value. If Octets does not contain a valid UTF-8 encoding of a Unicode code point throw a URIError exception."
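The two stages the spec describes can be sketched outside decodeURIComponent: URI-unescaping into octets, then a UTF-8 transformation that rejects invalid octets. This is an illustration, not the spec algorithm verbatim. `percentEscapesToOctets` and `decodeOctetsAsUtf8` are hypothetical names. The strict UTF-8 step is delegated to `TextDecoder` with `fatal: true` (WHATWG Encoding API, available in current browsers and Node.js), which throws a `TypeError` where decodeURIComponent throws a `URIError`.

```javascript
// Stage 1 (hypothetical helper): turn a string of %XX escapes into octets.
// Assumes the input consists only of %XX escapes, e.g. "%F6%20%FC".
function percentEscapesToOctets(escaped) {
  const matches = escaped.match(/%[0-9A-Fa-f]{2}/g) || [];
  return Uint8Array.from(matches, (m) => parseInt(m.slice(1), 16));
}

// Stage 2 (hypothetical helper): strict UTF-8 decoding of the octets.
// fatal: true makes decode() throw a TypeError on malformed input.
function decodeOctetsAsUtf8(octets) {
  return new TextDecoder("utf-8", { fatal: true }).decode(octets);
}

const reported = percentEscapesToOctets("%F6%20%FC%20%E4"); // 0xF6 0x20 0xFC 0x20 0xE4
const utf8 = percentEscapesToOctets("%C3%B6%20%C3%BC%20%C3%A4");

try {
  decodeOctetsAsUtf8(reported);
} catch (e) {
  console.log(e instanceof TypeError); // true: not valid UTF-8
}
console.log(decodeOctetsAsUtf8(utf8)); // "ö ü ä"
```

Under the ES5 Decode steps, 0xF6 (binary 11110110) has four leading 1 bits, so it is read as the start of a four-octet sequence. The next escape (%20, octet 0x20) is not a continuation octet of the form 10xxxxxx, so a URIError is thrown. Per RFC 3629, the octets 0xF5 through 0xFF never appear in UTF-8 at all.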
Status: UNCONFIRMED → RESOLVED
Closed: 15 years ago
Resolution: --- → INVALID
I don't create the URL; I receive it from Google. Try it yourself: http://www.google.com/search?q=ö ü ä gets "redirected" to http://www.google.com/search?q=%F6%20%FC%20%E4
Yes, but all that means is that you can't use decodeURIComponent to decode that URL, because it uses the wrong character encoding (ISO-8859-1 rather than UTF-8). You may be able to get away with window.unescape, but in general there's pretty much no support built into the language for working with arbitrary encodings, which is what you need here. You'd have to use a library that does that.
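A minimal sketch of the unescape-style workaround, assuming the escapes really are ISO-8859-1 bytes. `decodeLatin1Component` is a hypothetical helper, not a built-in. ISO-8859-1 maps each byte 0xXX to the code point U+00XX, so each `%XX` escape can become a single character via `String.fromCharCode`. The deprecated global `unescape` (kept in ECMAScript Annex B) gives the same result for these inputs.

```javascript
// Hypothetical helper: decode %XX escapes as ISO-8859-1 (Latin-1) bytes.
// Each byte maps to the code point with the same value (0xF6 -> U+00F6 "ö").
function decodeLatin1Component(str) {
  return str.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const q = "%F6%20%FC%20%E4";
console.log(decodeLatin1Component(q)); // "ö ü ä"

// Legacy unescape() also maps %XX to the code unit 0xXX, so it agrees here.
console.log(unescape(q)); // "ö ü ä"
```

This only covers ISO-8859-1. Other legacy encodings still need a decoding library, or `TextDecoder` with the matching encoding label where that API is available.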