Closed Bug 511859 Opened 12 years ago Closed 12 years ago

Utf8ToOneUcs4Char in jsstr.cpp ,overlong UTF-8 seqence detection problem.


(Core :: JavaScript Engine, defect, P1)




Tracking Status
status1.9.2 --- beta3-fixed
status1.9.1 --- wontfix


(Reporter: masa141421356, Assigned: masa141421356)




(Whiteboard: [sg:want?] fixed-in-tracemonkey)


(4 files, 2 obsolete files)

Attached file testcase (obsolete) —

>    /* from Unicode 3.1, non-shortest form is illegal */
>    static const uint32 minucs4Table[] = {
>        0x00000080, 0x00000800, 0x0001000, 0x0020000, 0x0400000
>    };

It should be as following (three zeros are missing)

    /* from Unicode 3.1, non-shortest form is illegal */
    static const uint32 minucs4Table[] = {
        0x00000080, 0x00000800, 0x00010000, 0x00200000, 0x04000000
Attached file fixed testcase
Attachment #395816 - Attachment is obsolete: true
Blocks: 505991
For the record, ECMA-262 ed. 3 required that overlong sequences be accepted.  ES5 changes to the now-standard position of forbidding overlong sequences.
Whiteboard: [good first bug]
Attached patch PAtch rev.1.0 (obsolete) — Splinter Review
Assignee: general → masa141421356
Attachment #395934 - Flags: review?(jwalden+bmo)
OS: Windows XP → All
Hardware: x86 → All
"Rhino" does not have same problem (see also : bug 173180).
Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp
>--- a/js/src/jsstr.cpp
>+++ b/js/src/jsstr.cpp
>@@ -5124,17 +5124,17 @@ js_OneUcs4ToUtf8Char(uint8 *utf8Buffer, 
>  */
> static uint32
> Utf8ToOneUcs4Char(const uint8 *utf8Buffer, int utf8Length)
> {
>     uint32 ucs4Char;
>     uint32 minucs4Char;
>     /* from Unicode 3.1, non-shortest form is illegal */
>     static const uint32 minucs4Table[] = {
>-        0x00000080, 0x00000800, 0x0001000, 0x0020000, 0x0400000
>+        0x00000080, 0x00000800, 0x00010000, 0x00200000, 0x04000000
>     };
>     JS_ASSERT(utf8Length >= 1 && utf8Length <= 6);
>     if (utf8Length == 1) {
>         ucs4Char = *utf8Buffer;
>         JS_ASSERT(!(ucs4Char & 0x80));
>     } else {
>         JS_ASSERT((*utf8Buffer & (0x100 - (1 << (7-utf8Length)))) ==

This patch is sufficient to fix the problem according to ECMAScript-262 ed. 3, to be sure.  However, I'm not sure, at this point, that that's sufficient for an acceptable fix, given that ES5 is very nearly out and requires rejection of overlong sequences, and that the necessary changes to address that are not large in scope.  Would you mind creating a patch which addresses those problems?  Minimally, I see the length of the array here, the 6 and 7 in the assertions in context, and the 6 in Decode -- I'd appreciate it if you could look and see if there are any others that need addressing.

Also, if I could trouble you to do so, adding a test or two to TestUTF8 in js/src/shell/js.cpp that will catch overlong sequences would be appreciated.

Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

Removing review request in case the previous comment went unnoticed...
Attachment #395934 - Flags: review?(jwalden+bmo)
We often make progress (esp. with contributed patches -- thanks to the reporter indeed for this one) by taking a bird in the hand and doing a followup bug to get the second in the bush.

Blocks: 514760
Duplicate of this bug: 518347
Duplicate of this bug: 514760
Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

I wrote unified patch for this bug and bug 514760 and bug 518347.
And now testing.
Attachment #395934 - Attachment is obsolete: true
Attached patch Patch rev.1.1Splinter Review
This patch contains fix of bug 518347 and bug 514760.

Memo: This patch causes to test failure of js1_5/Regress/regress-172699.js but it is by design.
Attachment #403067 - Flags: review?(jwalden+bmo)
Masahiro, please fix or disable the test as part of the checkin.
Attachment #403067 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 403067 [details] [diff] [review]
Patch rev.1.1

>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp

>+#define OVERLONG_UTF8 ((uint32)0xFFFFFFFF)

For future reference, we're using C++ const values rather than defines these days -- better typing and all that.

>-        if (ucs4Char < minucs4Char ||
>-            ucs4Char == 0xFFFE || ucs4Char == 0xFFFF) {
>+        if (ucs4Char < minucs4Char) {
>+            ucs4Char = OVERLONG_UTF8;
>+        } else if (ucs4Char == 0xFFFE || ucs4Char == 0xFFFF) {
>             ucs4Char = 0xFFFD;
>         }

Seems like a good idea to add a JS_UNLIKELY() around the overlong comparison as a helper for compilation without PGO.

I'll push this with nits fixed to TM later tonight -- definite fodder for 192 as well, I think.  Thanks!
Hum.  So, looking and thinking about the existing behavior not changed by this patch, I don't see the U+FFF(EF)-to-replacement behavior in the spec.  Bug 172699 added it when doing the prior-to-this-patch decode-overlong-to-replacement behavior.  Should we implement this extra conversion not specified by ES5?  It's not clear, certainly not without any motivating help from bug comments.  Filed bug 520095 to address this, even if addressing is simply adding a comment next to those lines saying why the deviation from spec exists.
Attached file Test
This test fails on unpatched browser, but not fails on patched edition.

# This test does not contain test of changes on Ucs4 to UTF-8 convertion.
# it is reason why I don't set "review?" flag to this.
I was going to make these minimal changes to get the test to work again, but yours are much better, touching more of the different ways the patch here would change behavior.
...actually, on second look, it probably makes sense to add both your test and make the changes I'd made.  Did that, with a few formatting tweaks to your test (we surround binary operators with a space on each side, so |expect = foo| rather than |expect=foo|), and pushed to TM (also with appropriate changes to the relevant jstests.list file):

Thanks again!
Whiteboard: [good first bug] → fixed-in-tracemonkey
Accepting overlong UTF-8 sequences is a well-known bug with potential security implications.  In the context of JavaScript those implications are likely less serious than in, say, C code, but it still seems important to fix this.

If this is to be fixed for 1.9.2 (I think it should given the security considerations), I think we can all agree it must be fixed for the beta.
Flags: blocking1.9.2?
Whiteboard: fixed-in-tracemonkey → [sg:want?] fixed-in-tracemonkey
thanks waldo!
Flags: in-testsuite+
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P2
Priority: P2 → P1
Closed: 12 years ago
Resolution: --- → FIXED
v 1.9.3
Duplicate of this bug: 522634
You need to log in before you can comment on or make changes to this bug.