Closed Bug 511859 Opened 15 years ago Closed 15 years ago

Utf8ToOneUcs4Char in jsstr.cpp ,overlong UTF-8 seqence detection problem.

Tracking

()

Status:

VERIFIED FIXED

Tracking Flags:

Tracking

Status

status1.9.2

---

beta3-fixed

status1.9.1

---

wontfix

People

(Reporter: masa141421356, Assigned: masa141421356)

References

(
URL
)

Details

(Whiteboard: [sg:want?] fixed-in-tracemonkey)

Attachments

(4 files, 2 obsolete files)

testcase 15 years ago Masahiro YAMADA 1.49 KB, text/html		Details
fixed testcase 15 years ago Masahiro YAMADA 1.52 KB, text/html		Details
PAtch rev.1.0 15 years ago Masahiro YAMADA 772 bytes, patch		Details \| Diff \| Splinter Review
Patch rev.1.1 15 years ago Masahiro YAMADA 5.31 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
Test 15 years ago Masahiro YAMADA 5.05 KB, text/plain		Details
The test-fix I was going to check in, but the previous changes are better 15 years ago Jeff Walden [:Waldo] 888 bytes, patch		Details \| Diff \| Splinter Review

Masahiro YAMADA

Assignee

Description

•

15 years ago

Attached file testcase (obsolete) — Details

see http://mxr.mozilla.org/mozilla-central/source/js/src/jsstr.cpp#5132

>    /* from Unicode 3.1, non-shortest form is illegal */
>    static const uint32 minucs4Table[] = {
>        0x00000080, 0x00000800, 0x0001000, 0x0020000, 0x0400000
>    };

It should be as following (three zeros are missing)

    /* from Unicode 3.1, non-shortest form is illegal */
    static const uint32 minucs4Table[] = {
        0x00000080, 0x00000800, 0x00010000, 0x00200000, 0x04000000
    };

Masahiro YAMADA

Assignee

Comment 1

•

15 years ago

Attached file fixed testcase — Details

Attachment #395816 - Attachment is obsolete: true

Masahiro YAMADA

Assignee

Updated

•

15 years ago

Blocks: 505991

Jeff Walden [:Waldo]

Comment 2

•

15 years ago

For the record, ECMA-262 ed. 3 required that overlong sequences be accepted.  ES5 changes to the now-standard position of forbidding overlong sequences.

Whiteboard: [good first bug]

Masahiro YAMADA

Assignee

Comment 3

•

15 years ago

Attached patch PAtch rev.1.0 (obsolete) — Details — Splinter Review

Assignee: general → masa141421356

Status: NEW → ASSIGNED

Masahiro YAMADA

Assignee

Updated

•

15 years ago

Attachment #395934 - Flags: review?(jwalden+bmo)

Masahiro YAMADA

Assignee

Updated

•

15 years ago

OS: Windows XP → All

Hardware: x86 → All

Masahiro YAMADA

Assignee

Comment 4

•

15 years ago

"Rhino" does not have same problem (see also : bug 173180).

Jeff Walden [:Waldo]

Comment 5

•

15 years ago

Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp
>--- a/js/src/jsstr.cpp
>+++ b/js/src/jsstr.cpp
>@@ -5124,17 +5124,17 @@ js_OneUcs4ToUtf8Char(uint8 *utf8Buffer, 
>  */
> static uint32
> Utf8ToOneUcs4Char(const uint8 *utf8Buffer, int utf8Length)
> {
>     uint32 ucs4Char;
>     uint32 minucs4Char;
>     /* from Unicode 3.1, non-shortest form is illegal */
>     static const uint32 minucs4Table[] = {
>-        0x00000080, 0x00000800, 0x0001000, 0x0020000, 0x0400000
>+        0x00000080, 0x00000800, 0x00010000, 0x00200000, 0x04000000
>     };
> 
>     JS_ASSERT(utf8Length >= 1 && utf8Length <= 6);
>     if (utf8Length == 1) {
>         ucs4Char = *utf8Buffer;
>         JS_ASSERT(!(ucs4Char & 0x80));
>     } else {
>         JS_ASSERT((*utf8Buffer & (0x100 - (1 << (7-utf8Length)))) ==

This patch is sufficient to fix the problem according to ECMAScript-262 ed. 3, to be sure.  However, I'm not sure, at this point, that that's sufficient for an acceptable fix, given that ES5 is very nearly out and requires rejection of overlong sequences, and that the necessary changes to address that are not large in scope.  Would you mind creating a patch which addresses those problems?  Minimally, I see the length of the array here, the 6 and 7 in the assertions in context, and the 6 in Decode -- I'd appreciate it if you could look and see if there are any others that need addressing.

Also, if I could trouble you to do so, adding a test or two to TestUTF8 in js/src/shell/js.cpp that will catch overlong sequences would be appreciated.

Thanks!

Jeff Walden [:Waldo]

Comment 6

•

15 years ago

Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

Removing review request in case the previous comment went unnoticed...

Attachment #395934 - Flags: review?(jwalden+bmo)

Brendan Eich [:brendan]

Comment 7

•

15 years ago

We often make progress (esp. with contributed patches -- thanks to the reporter indeed for this one) by taking a bird in the hand and doing a followup bug to get the second in the bush.

/be

Masahiro YAMADA

Assignee

Updated

•

15 years ago

Blocks: 514760

Masahiro YAMADA

Assignee

Comment 10

•

15 years ago

Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

I wrote unified patch for this bug and bug 514760 and bug 518347.
And now testing.

Attachment #395934 - Attachment is obsolete: true

Masahiro YAMADA

Assignee

Comment 11

•

15 years ago

Attached patch Patch rev.1.1 — Details — Splinter Review

This patch contains fix of bug 518347 and bug 514760.

Memo: This patch causes to test failure of js1_5/Regress/regress-172699.js but it is by design.

Attachment #403067 - Flags: review?(jwalden+bmo)

Bob Clary [:bc] (inactive)

Comment 12

•

15 years ago

Masahiro, please fix or disable the test as part of the checkin.

Jeff Walden [:Waldo]

Updated

•

15 years ago

Attachment #403067 - Flags: review?(jwalden+bmo) → review+

Jeff Walden [:Waldo]

Comment 13

•

15 years ago

Comment on attachment 403067 [details] [diff] [review]
Patch rev.1.1

>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp

>+#define OVERLONG_UTF8 ((uint32)0xFFFFFFFF)

For future reference, we're using C++ const values rather than defines these days -- better typing and all that.


>-        if (ucs4Char < minucs4Char ||
>-            ucs4Char == 0xFFFE || ucs4Char == 0xFFFF) {
>+        if (ucs4Char < minucs4Char) {
>+            ucs4Char = OVERLONG_UTF8;
>+        } else if (ucs4Char == 0xFFFE || ucs4Char == 0xFFFF) {
>             ucs4Char = 0xFFFD;
>         }

Seems like a good idea to add a JS_UNLIKELY() around the overlong comparison as a helper for compilation without PGO.


I'll push this with nits fixed to TM later tonight -- definite fodder for 192 as well, I think.  Thanks!

Jeff Walden [:Waldo]

Comment 14

•

15 years ago

Hum.  So, looking and thinking about the existing behavior not changed by this patch, I don't see the U+FFF(EF)-to-replacement behavior in the spec.  Bug 172699 added it when doing the prior-to-this-patch decode-overlong-to-replacement behavior.  Should we implement this extra conversion not specified by ES5?  It's not clear, certainly not without any motivating help from bug comments.  Filed bug 520095 to address this, even if addressing is simply adding a comment next to those lines saying why the deviation from spec exists.

Masahiro YAMADA

Assignee

Comment 15

•

15 years ago

Attached file Test — Details

This test fails on unpatched browser, but not fails on patched edition.

# This test does not contain test of changes on Ucs4 to UTF-8 convertion.
# it is reason why I don't set "review?" flag to this.

Jeff Walden [:Waldo]

Comment 16

•

15 years ago

Attached patch The test-fix I was going to check in, but the previous changes are better — Details — Splinter Review

I was going to make these minimal changes to get the test to work again, but yours are much better, touching more of the different ways the patch here would change behavior.

Jeff Walden [:Waldo]

Comment 17

•

15 years ago

...actually, on second look, it probably makes sense to add both your test and make the changes I'd made.  Did that, with a few formatting tweaks to your test (we surround binary operators with a space on each side, so |expect = foo| rather than |expect=foo|), and pushed to TM (also with appropriate changes to the relevant jstests.list file):

http://hg.mozilla.org/tracemonkey/rev/8bb236bece1f

Thanks again!

Whiteboard: [good first bug] → fixed-in-tracemonkey

Jeff Walden [:Waldo]

Comment 18

•

15 years ago

Accepting overlong UTF-8 sequences is a well-known bug with potential security implications.  In the context of JavaScript those implications are likely less serious than in, say, C code, but it still seems important to fix this.

If this is to be fixed for 1.9.2 (I think it should given the security considerations), I think we can all agree it must be fixed for the beta.

Flags: blocking1.9.2?

Whiteboard: fixed-in-tracemonkey → [sg:want?] fixed-in-tracemonkey

Bob Clary [:bc] (inactive)

Comment 19

•

15 years ago

thanks waldo!

Flags: in-testsuite+

Robert Sayre

Updated

•

15 years ago

Flags: blocking1.9.2? → blocking1.9.2+

Robert Sayre

Updated

•

15 years ago

Priority: -- → P2

Robert Sayre

Updated

•

15 years ago

Priority: P2 → P1

Robert Sayre

Comment 20

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/8bb236bece1f

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Bob Clary [:bc] (inactive)

Comment 21

•

15 years ago

v 1.9.3

Status: RESOLVED → VERIFIED

Robert Sayre

Comment 23

•

15 years ago

http://hg.mozilla.org/releases/mozilla-1.9.2/rev/e42c563313a0

status1.9.2: --- → final-fixed

Daniel Veditz [:dveditz]

Updated

•

13 years ago

status1.9.1: --- → wontfix

You need to log in before you can comment on or make changes to this bug.