Last Comment Bug 511859 - Utf8ToOneUcs4Char in jsstr.cpp ,overlong UTF-8 seqence detection problem.
: Utf8ToOneUcs4Char in jsstr.cpp ,overlong UTF-8 seqence detection problem.
Status: VERIFIED FIXED
[sg:want?] fixed-in-tracemonkey
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: P1 normal (vote)
: ---
Assigned To: Masahiro YAMADA
:
:
Mentors:
http://mxr.mozilla.org/mozilla-centra...
: 514760 518347 522634 (view as bug list)
Depends on:
Blocks: 172699 505991 514760
  Show dependency treegraph
 
Reported: 2009-08-21 05:36 PDT by Masahiro YAMADA
Modified: 2011-04-14 18:31 PDT (History)
7 users (show)
sayrer: blocking1.9.2+
bob: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
beta3-fixed
wontfix


Attachments
testcase (1.49 KB, text/html)
2009-08-21 05:36 PDT, Masahiro YAMADA
no flags Details
fixed testcase (1.52 KB, text/html)
2009-08-21 05:45 PDT, Masahiro YAMADA
no flags Details
PAtch rev.1.0 (772 bytes, patch)
2009-08-21 13:53 PDT, Masahiro YAMADA
no flags Details | Diff | Splinter Review
Patch rev.1.1 (5.31 KB, patch)
2009-09-26 15:38 PDT, Masahiro YAMADA
jwalden+bmo: review+
Details | Diff | Splinter Review
Test (5.05 KB, text/plain)
2009-10-02 08:28 PDT, Masahiro YAMADA
no flags Details
The test-fix I was going to check in, but the previous changes are better (888 bytes, patch)
2009-10-02 13:43 PDT, Jeff Walden [:Waldo] (remove +bmo to email)
no flags Details | Diff | Splinter Review

Description Masahiro YAMADA 2009-08-21 05:36:19 PDT
Created attachment 395816 [details]
testcase

see http://mxr.mozilla.org/mozilla-central/source/js/src/jsstr.cpp#5132

>    /* from Unicode 3.1, non-shortest form is illegal */
>    static const uint32 minucs4Table[] = {
>        0x00000080, 0x00000800, 0x0001000, 0x0020000, 0x0400000
>    };

It should be as following (three zeros are missing)

    /* from Unicode 3.1, non-shortest form is illegal */
    static const uint32 minucs4Table[] = {
        0x00000080, 0x00000800, 0x00010000, 0x00200000, 0x04000000
    };
Comment 1 Masahiro YAMADA 2009-08-21 05:45:22 PDT
Created attachment 395817 [details]
fixed testcase
Comment 2 Jeff Walden [:Waldo] (remove +bmo to email) 2009-08-21 06:06:29 PDT
For the record, ECMA-262 ed. 3 required that overlong sequences be accepted.  ES5 changes to the now-standard position of forbidding overlong sequences.
Comment 3 Masahiro YAMADA 2009-08-21 13:53:15 PDT
Created attachment 395934 [details] [diff] [review]
PAtch rev.1.0
Comment 4 Masahiro YAMADA 2009-08-25 02:54:01 PDT
"Rhino" does not have same problem (see also : bug 173180).
Comment 5 Jeff Walden [:Waldo] (remove +bmo to email) 2009-08-31 18:21:03 PDT
Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp
>--- a/js/src/jsstr.cpp
>+++ b/js/src/jsstr.cpp
>@@ -5124,17 +5124,17 @@ js_OneUcs4ToUtf8Char(uint8 *utf8Buffer, 
>  */
> static uint32
> Utf8ToOneUcs4Char(const uint8 *utf8Buffer, int utf8Length)
> {
>     uint32 ucs4Char;
>     uint32 minucs4Char;
>     /* from Unicode 3.1, non-shortest form is illegal */
>     static const uint32 minucs4Table[] = {
>-        0x00000080, 0x00000800, 0x0001000, 0x0020000, 0x0400000
>+        0x00000080, 0x00000800, 0x00010000, 0x00200000, 0x04000000
>     };
> 
>     JS_ASSERT(utf8Length >= 1 && utf8Length <= 6);
>     if (utf8Length == 1) {
>         ucs4Char = *utf8Buffer;
>         JS_ASSERT(!(ucs4Char & 0x80));
>     } else {
>         JS_ASSERT((*utf8Buffer & (0x100 - (1 << (7-utf8Length)))) ==

This patch is sufficient to fix the problem according to ECMAScript-262 ed. 3, to be sure.  However, I'm not sure, at this point, that that's sufficient for an acceptable fix, given that ES5 is very nearly out and requires rejection of overlong sequences, and that the necessary changes to address that are not large in scope.  Would you mind creating a patch which addresses those problems?  Minimally, I see the length of the array here, the 6 and 7 in the assertions in context, and the 6 in Decode -- I'd appreciate it if you could look and see if there are any others that need addressing.

Also, if I could trouble you to do so, adding a test or two to TestUTF8 in js/src/shell/js.cpp that will catch overlong sequences would be appreciated.

Thanks!
Comment 6 Jeff Walden [:Waldo] (remove +bmo to email) 2009-09-01 13:48:33 PDT
Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

Removing review request in case the previous comment went unnoticed...
Comment 7 Brendan Eich [:brendan] 2009-09-01 13:53:59 PDT
We often make progress (esp. with contributed patches -- thanks to the reporter indeed for this one) by taking a bird in the hand and doing a followup bug to get the second in the bush.

/be
Comment 8 Jeff Walden [:Waldo] (remove +bmo to email) 2009-09-24 01:18:40 PDT
*** Bug 518347 has been marked as a duplicate of this bug. ***
Comment 9 Jeff Walden [:Waldo] (remove +bmo to email) 2009-09-24 01:20:14 PDT
*** Bug 514760 has been marked as a duplicate of this bug. ***
Comment 10 Masahiro YAMADA 2009-09-25 05:57:41 PDT
Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

I wrote unified patch for this bug and bug 514760 and bug 518347.
And now testing.
Comment 11 Masahiro YAMADA 2009-09-26 15:38:51 PDT
Created attachment 403067 [details] [diff] [review]
Patch rev.1.1

This patch contains fix of bug 518347 and bug 514760.

Memo: This patch causes to test failure of js1_5/Regress/regress-172699.js but it is by design.
Comment 12 Bob Clary [:bc:] 2009-09-27 08:59:32 PDT
Masahiro, please fix or disable the test as part of the checkin.
Comment 13 Jeff Walden [:Waldo] (remove +bmo to email) 2009-10-01 16:59:52 PDT
Comment on attachment 403067 [details] [diff] [review]
Patch rev.1.1

>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp

>+#define OVERLONG_UTF8 ((uint32)0xFFFFFFFF)

For future reference, we're using C++ const values rather than defines these days -- better typing and all that.


>-        if (ucs4Char < minucs4Char ||
>-            ucs4Char == 0xFFFE || ucs4Char == 0xFFFF) {
>+        if (ucs4Char < minucs4Char) {
>+            ucs4Char = OVERLONG_UTF8;
>+        } else if (ucs4Char == 0xFFFE || ucs4Char == 0xFFFF) {
>             ucs4Char = 0xFFFD;
>         }

Seems like a good idea to add a JS_UNLIKELY() around the overlong comparison as a helper for compilation without PGO.


I'll push this with nits fixed to TM later tonight -- definite fodder for 192 as well, I think.  Thanks!
Comment 14 Jeff Walden [:Waldo] (remove +bmo to email) 2009-10-01 17:41:40 PDT
Hum.  So, looking and thinking about the existing behavior not changed by this patch, I don't see the U+FFF(EF)-to-replacement behavior in the spec.  Bug 172699 added it when doing the prior-to-this-patch decode-overlong-to-replacement behavior.  Should we implement this extra conversion not specified by ES5?  It's not clear, certainly not without any motivating help from bug comments.  Filed bug 520095 to address this, even if addressing is simply adding a comment next to those lines saying why the deviation from spec exists.
Comment 15 Masahiro YAMADA 2009-10-02 08:28:55 PDT
Created attachment 404251 [details]
Test

This test fails on unpatched browser, but not fails on patched edition.

# This test does not contain test of changes on Ucs4 to UTF-8 convertion.
# it is reason why I don't set "review?" flag to this.
Comment 16 Jeff Walden [:Waldo] (remove +bmo to email) 2009-10-02 13:43:46 PDT
Created attachment 404323 [details] [diff] [review]
The test-fix I was going to check in, but the previous changes are better

I was going to make these minimal changes to get the test to work again, but yours are much better, touching more of the different ways the patch here would change behavior.
Comment 17 Jeff Walden [:Waldo] (remove +bmo to email) 2009-10-02 13:58:51 PDT
...actually, on second look, it probably makes sense to add both your test and make the changes I'd made.  Did that, with a few formatting tweaks to your test (we surround binary operators with a space on each side, so |expect = foo| rather than |expect=foo|), and pushed to TM (also with appropriate changes to the relevant jstests.list file):

http://hg.mozilla.org/tracemonkey/rev/8bb236bece1f

Thanks again!
Comment 18 Jeff Walden [:Waldo] (remove +bmo to email) 2009-10-02 14:05:59 PDT
Accepting overlong UTF-8 sequences is a well-known bug with potential security implications.  In the context of JavaScript those implications are likely less serious than in, say, C code, but it still seems important to fix this.

If this is to be fixed for 1.9.2 (I think it should given the security considerations), I think we can all agree it must be fixed for the beta.
Comment 19 Bob Clary [:bc:] 2009-10-02 15:09:17 PDT
thanks waldo!
Comment 21 Bob Clary [:bc:] 2009-10-15 10:44:30 PDT
v 1.9.3
Comment 22 Masahiro YAMADA 2009-10-18 20:22:09 PDT
*** Bug 522634 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.