Closed Bug 660612 Opened 13 years ago Closed 13 years ago

Utf8ToOneUcs4Char passes invalid UTF-8 octets '%ED%A0%80', so decodeURIComponent('%ED%A0%80') doesn't throw

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla8

People

(Reporter: utatane.tea, Assigned: masa141421356)

Details

Attachments

(1 file, 6 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.0; ja; rv:1.9) Gecko/2008052906 Firefox/3.0
Build Identifier: 

according to the RFC3629, http://www.ietf.org/rfc/rfc3629.txt
following octets
%xED %A0 %80 (U+D800. this code point is in surrogate area)
is invalid UTF-8 octets.

so, according to the ES5.1, section 15.1.3 Decode Algorithm, step 4-d-vii-8,
"Let V be the value obtained by applying the UTF-8 transformation to Octets, that is, from an array of octets into a 32-bit value. If Octets does not contain a valid UTF-8 encoding of a Unicode code point throw a URIError exception."

but, decodeURIComponent('%ED%A0%80') doesn't throw URIError.

Reproducible: Always

Steps to Reproduce:
1. evaluates `decodeURIComponent('%ED%A0%80')`

Actual Results:  
1. returns '\uDC00'. This code point is in surrogate area.

Expected Results:  
1. throw URIError

I found the same bug in V8, and report the issue. http://code.google.com/p/v8/issues/detail?id=1415
JSC is checking in wtf/unicode/UTC8.cpp, decodeUTF8Sequence function, Handle 3-byte sequences section.

code:
  // UTF-16 surrogates should never appear in UTF-8 data
  if (c >= 0xD800 && c <= 0xDFFFF)
    return -1;
Status: UNCONFIRMED → NEW
Ever confirmed: true
What do Chakra and Carakan do?  This might need a spec change, depending....
> What do Chakra and Carakan do?
Carakan doesn't throw URIError when this '%ED%A0%80' representation comes. (in Opera version 11.11)

But, Chakra (in IE9 and IE10 platform preview) and IE6, 7, 8 rejects this.
Probably safe to do this, then....  hopefully.
Rhino seems to have same problem.
I filed new bug for Rhino (bug 660799).
FYI: Safari5.0.5(Windows) rejects this.
Attached patch patch rev.1 (obsolete) — Splinter Review
Assignee: general → masa141421356
Status: NEW → ASSIGNED
Attachment #537339 - Flags: review?(jwalden+bmo)
Attached file Test (obsolete) —
Comment on attachment 537339 [details] [diff] [review]
patch rev.1

Review of attachment 537339 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +6062,5 @@
>              ucs4Char = ucs4Char<<6 | (*utf8Buffer++ & 0x3F);
>          }
>          if (JS_UNLIKELY(ucs4Char < minucs4Char)) {
> +            ucs4Char = INVALID_UTF8;
> +        } else if (JS_UNLIKELY(ucs4Char >= 0xD800 && ucs4Char <= 0xDFFF)) {

I'd combine the two ifs into a single if, but it's not that important, really.
Attachment #537339 - Flags: review?(jwalden+bmo) → review+
Actually: add a test for this to js/src/tests/ecma_5/Global, please, with the patch.  Nothing fancy, just check decodeURIComponent("%ED%A0%80") throws a URIError.
Attached patch Patch rev.1.1 (obsolete) — Splinter Review
Attachment #537339 - Attachment is obsolete: true
Attached patch Test rev.1.1 (obsolete) — Splinter Review
Attachment #537340 - Attachment is obsolete: true
Attachment #538687 - Flags: review?(jwalden+bmo)
Attached patch Test rev.1.2 (obsolete) — Splinter Review
Attachment #538688 - Attachment is obsolete: true
Attachment #538689 - Flags: review?(jwalden+bmo)
Comment on attachment 538687 [details] [diff] [review]
Patch rev.1.1

>-        if (JS_UNLIKELY(ucs4Char < minucs4Char)) {
>-            ucs4Char = OVERLONG_UTF8;
>+        if (JS_UNLIKELY((ucs4Char < minucs4Char) || (ucs4Char >= 0xD800 && ucs4Char <= 0xDFFF))) {
>+            ucs4Char = INVALID_UTF8;

Nitpicky, yeah, but this is what I meant by combining them:

if (JS_UNLIKELY(ucs4Char < minucs4Char || (ucs4Char >= 0xD800 && ucs4Char <= 0xDFFF)))

Looks fine with that.
Attachment #538687 - Flags: review?(jwalden+bmo) → review+
Attachment #538689 - Flags: review?(jwalden+bmo) → review+
Attached patch Unified patch (obsolete) — Splinter Review
Attachment #538687 - Attachment is obsolete: true
Attachment #538689 - Attachment is obsolete: true
Attachment #541051 - Flags: review?(jwalden+bmo)
Comment on attachment 541051 [details] [diff] [review]
Unified patch

Review of attachment 541051 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +5973,5 @@
>          while (--utf8Length) {
>              JS_ASSERT((*utf8Buffer & 0xC0) == 0x80);
>              ucs4Char = ucs4Char<<6 | (*utf8Buffer++ & 0x3F);
>          }
> +        if (JS_UNLIKELY(ucs4Char < minucs4Char || ucs4Char >= 0xD800 && ucs4Char <= 0xDFFF)) {

You should over-parenthesize the second condition here, as I did in comment 13 -- first because I suspect people generally don't fully internalize the precedence of || versus &&, and second because some compilers will warn precisely for that reason.  r+ with that change, and I see no reason to take a last look after you've made it.  :-)
Attachment #541051 - Flags: review?(jwalden+bmo) → review+
added ().
Attachment #541051 - Attachment is obsolete: true
Attachment #541685 - Flags: review?(jwalden+bmo)
Attachment #541685 - Flags: review?(jwalden+bmo) → review+
Masahiro, do you need this landed?  I can't remember if I've seen you commit patches or not.
Summary: Utf8ToOneUcs4Char passes invalid UTF-8 octets '%ED%A0%80' → Utf8ToOneUcs4Char passes invalid UTF-8 octets '%ED%A0%80', so decodeURIComponent('%ED%A0%80') doesn't throw
Attachment #541685 - Flags: checkin?(jwalden+bmo)
(In reply to comment #17)
> Masahiro, do you need this landed?  I can't remember if I've seen you commit
> patches or not.

Jeff, I want to land.(But I'm not commiter).
I've got it queued up locally -- next time I have things to push, it'll get pushed to TM.
http://hg.mozilla.org/integration/mozilla-inbound/rev/0d5626387b8a
Whiteboard: [inbound]
Target Milestone: --- → mozilla8
Attachment #541685 - Flags: checkin?(jwalden+bmo) → checkin+
Merged:
http://hg.mozilla.org/mozilla-central/rev/0d5626387b8a
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [inbound]
Version: unspecified → Trunk
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: