Closed
Bug 660612
Opened 13 years ago
Closed 13 years ago
Utf8ToOneUcs4Char passes invalid UTF-8 octets '%ED%A0%80', so decodeURIComponent('%ED%A0%80') doesn't throw
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla8
People
(Reporter: utatane.tea, Assigned: masa141421356)
Details
Attachments
(1 file, 6 obsolete files)
2.44 KB,
patch
|
Waldo
:
review+
Waldo
:
checkin+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; ja; rv:1.9) Gecko/2008052906 Firefox/3.0 Build Identifier: according to the RFC3629, http://www.ietf.org/rfc/rfc3629.txt following octets %xED %A0 %80 (U+D800. this code point is in surrogate area) is invalid UTF-8 octets. so, according to the ES5.1, section 15.1.3 Decode Algorithm, step 4-d-vii-8, "Let V be the value obtained by applying the UTF-8 transformation to Octets, that is, from an array of octets into a 32-bit value. If Octets does not contain a valid UTF-8 encoding of a Unicode code point throw a URIError exception." but, decodeURIComponent('%ED%A0%80') doesn't throw URIError. Reproducible: Always Steps to Reproduce: 1. evaluates `decodeURIComponent('%ED%A0%80')` Actual Results: 1. returns '\uDC00'. This code point is in surrogate area. Expected Results: 1. throw URIError I found the same bug in V8, and report the issue. http://code.google.com/p/v8/issues/detail?id=1415 JSC is checking in wtf/unicode/UTC8.cpp, decodeUTF8Sequence function, Handle 3-byte sequences section. code: // UTF-16 surrogates should never appear in UTF-8 data if (c >= 0xD800 && c <= 0xDFFFF) return -1;
Assignee | ||
Updated•13 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 1•13 years ago
|
||
What do Chakra and Carakan do? This might need a spec change, depending....
Reporter | ||
Comment 2•13 years ago
|
||
> What do Chakra and Carakan do?
Carakan doesn't throw URIError when this '%ED%A0%80' representation comes. (in Opera version 11.11)
But, Chakra (in IE9 and IE10 platform preview) and IE6, 7, 8 rejects this.
Comment 3•13 years ago
|
||
Probably safe to do this, then.... hopefully.
Assignee | ||
Comment 4•13 years ago
|
||
Rhino seems to have same problem. I filed new bug for Rhino (bug 660799).
Assignee | ||
Comment 5•13 years ago
|
||
FYI: Safari5.0.5(Windows) rejects this.
Assignee | ||
Comment 6•13 years ago
|
||
Assignee: general → masa141421356
Status: NEW → ASSIGNED
Attachment #537339 -
Flags: review?(jwalden+bmo)
Assignee | ||
Comment 7•13 years ago
|
||
Comment 8•13 years ago
|
||
Comment on attachment 537339 [details] [diff] [review] patch rev.1 Review of attachment 537339 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsstr.cpp @@ +6062,5 @@ > ucs4Char = ucs4Char<<6 | (*utf8Buffer++ & 0x3F); > } > if (JS_UNLIKELY(ucs4Char < minucs4Char)) { > + ucs4Char = INVALID_UTF8; > + } else if (JS_UNLIKELY(ucs4Char >= 0xD800 && ucs4Char <= 0xDFFF)) { I'd combine the two ifs into a single if, but it's not that important, really.
Attachment #537339 -
Flags: review?(jwalden+bmo) → review+
Comment 9•13 years ago
|
||
Actually: add a test for this to js/src/tests/ecma_5/Global, please, with the patch. Nothing fancy, just check decodeURIComponent("%ED%A0%80") throws a URIError.
Assignee | ||
Comment 10•13 years ago
|
||
Attachment #537339 -
Attachment is obsolete: true
Assignee | ||
Comment 11•13 years ago
|
||
Attachment #537340 -
Attachment is obsolete: true
Assignee | ||
Updated•13 years ago
|
Attachment #538687 -
Flags: review?(jwalden+bmo)
Assignee | ||
Comment 12•13 years ago
|
||
Attachment #538688 -
Attachment is obsolete: true
Attachment #538689 -
Flags: review?(jwalden+bmo)
Comment 13•13 years ago
|
||
Comment on attachment 538687 [details] [diff] [review] Patch rev.1.1 >- if (JS_UNLIKELY(ucs4Char < minucs4Char)) { >- ucs4Char = OVERLONG_UTF8; >+ if (JS_UNLIKELY((ucs4Char < minucs4Char) || (ucs4Char >= 0xD800 && ucs4Char <= 0xDFFF))) { >+ ucs4Char = INVALID_UTF8; Nitpicky, yeah, but this is what I meant by combining them: if (JS_UNLIKELY(ucs4Char < minucs4Char || (ucs4Char >= 0xD800 && ucs4Char <= 0xDFFF))) Looks fine with that.
Attachment #538687 -
Flags: review?(jwalden+bmo) → review+
Updated•13 years ago
|
Attachment #538689 -
Flags: review?(jwalden+bmo) → review+
Assignee | ||
Comment 14•13 years ago
|
||
Attachment #538687 -
Attachment is obsolete: true
Attachment #538689 -
Attachment is obsolete: true
Attachment #541051 -
Flags: review?(jwalden+bmo)
Comment 15•13 years ago
|
||
Comment on attachment 541051 [details] [diff] [review] Unified patch Review of attachment 541051 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsstr.cpp @@ +5973,5 @@ > while (--utf8Length) { > JS_ASSERT((*utf8Buffer & 0xC0) == 0x80); > ucs4Char = ucs4Char<<6 | (*utf8Buffer++ & 0x3F); > } > + if (JS_UNLIKELY(ucs4Char < minucs4Char || ucs4Char >= 0xD800 && ucs4Char <= 0xDFFF)) { You should over-parenthesize the second condition here, as I did in comment 13 -- first because I suspect people generally don't fully internalize the precedence of || versus &&, and second because some compilers will warn precisely for that reason. r+ with that change, and I see no reason to take a last look after you've made it. :-)
Attachment #541051 -
Flags: review?(jwalden+bmo) → review+
Assignee | ||
Comment 16•13 years ago
|
||
added ().
Attachment #541051 -
Attachment is obsolete: true
Attachment #541685 -
Flags: review?(jwalden+bmo)
Updated•13 years ago
|
Attachment #541685 -
Flags: review?(jwalden+bmo) → review+
Comment 17•13 years ago
|
||
Masahiro, do you need this landed? I can't remember if I've seen you commit patches or not.
Updated•13 years ago
|
Summary: Utf8ToOneUcs4Char passes invalid UTF-8 octets '%ED%A0%80' → Utf8ToOneUcs4Char passes invalid UTF-8 octets '%ED%A0%80', so decodeURIComponent('%ED%A0%80') doesn't throw
Assignee | ||
Updated•13 years ago
|
Attachment #541685 -
Flags: checkin?(jwalden+bmo)
Assignee | ||
Comment 18•13 years ago
|
||
(In reply to comment #17) > Masahiro, do you need this landed? I can't remember if I've seen you commit > patches or not. Jeff, I want to land.(But I'm not commiter).
Comment 19•13 years ago
|
||
I've got it queued up locally -- next time I have things to push, it'll get pushed to TM.
Comment 20•13 years ago
|
||
http://hg.mozilla.org/integration/mozilla-inbound/rev/0d5626387b8a
Whiteboard: [inbound]
Target Milestone: --- → mozilla8
Updated•13 years ago
|
Attachment #541685 -
Flags: checkin?(jwalden+bmo) → checkin+
Comment 21•13 years ago
|
||
Merged: http://hg.mozilla.org/mozilla-central/rev/0d5626387b8a
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [inbound]
Version: unspecified → Trunk
You need to log in
before you can comment on or make changes to this bug.
Description
•