Utf8ToOneUcs4Char in jsstr.cpp ,overlong UTF-8 seqence detection problem.




JavaScript Engine
8 years ago
6 years ago


(Reporter: Masahiro YAMADA, Assigned: Masahiro YAMADA)


Dependency tree / graph
Bug Flags:
blocking1.9.2 +
in-testsuite +

Firefox Tracking Flags

(status1.9.2 beta3-fixed, status1.9.1 wontfix)


(Whiteboard: [sg:want?] fixed-in-tracemonkey, URL)


(4 attachments, 2 obsolete attachments)



8 years ago
Created attachment 395816 [details]

see http://mxr.mozilla.org/mozilla-central/source/js/src/jsstr.cpp#5132

>    /* from Unicode 3.1, non-shortest form is illegal */
>    static const uint32 minucs4Table[] = {
>        0x00000080, 0x00000800, 0x0001000, 0x0020000, 0x0400000
>    };

It should be as following (three zeros are missing)

    /* from Unicode 3.1, non-shortest form is illegal */
    static const uint32 minucs4Table[] = {
        0x00000080, 0x00000800, 0x00010000, 0x00200000, 0x04000000

Comment 1

8 years ago
Created attachment 395817 [details]
fixed testcase
Attachment #395816 - Attachment is obsolete: true


8 years ago
Blocks: 505991
For the record, ECMA-262 ed. 3 required that overlong sequences be accepted.  ES5 changes to the now-standard position of forbidding overlong sequences.
Whiteboard: [good first bug]

Comment 3

8 years ago
Created attachment 395934 [details] [diff] [review]
PAtch rev.1.0
Assignee: general → masa141421356


8 years ago
Attachment #395934 - Flags: review?(jwalden+bmo)


8 years ago
OS: Windows XP → All
Hardware: x86 → All

Comment 4

8 years ago
"Rhino" does not have same problem (see also : bug 173180).
Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp
>--- a/js/src/jsstr.cpp
>+++ b/js/src/jsstr.cpp
>@@ -5124,17 +5124,17 @@ js_OneUcs4ToUtf8Char(uint8 *utf8Buffer, 
>  */
> static uint32
> Utf8ToOneUcs4Char(const uint8 *utf8Buffer, int utf8Length)
> {
>     uint32 ucs4Char;
>     uint32 minucs4Char;
>     /* from Unicode 3.1, non-shortest form is illegal */
>     static const uint32 minucs4Table[] = {
>-        0x00000080, 0x00000800, 0x0001000, 0x0020000, 0x0400000
>+        0x00000080, 0x00000800, 0x00010000, 0x00200000, 0x04000000
>     };
>     JS_ASSERT(utf8Length >= 1 && utf8Length <= 6);
>     if (utf8Length == 1) {
>         ucs4Char = *utf8Buffer;
>         JS_ASSERT(!(ucs4Char & 0x80));
>     } else {
>         JS_ASSERT((*utf8Buffer & (0x100 - (1 << (7-utf8Length)))) ==

This patch is sufficient to fix the problem according to ECMAScript-262 ed. 3, to be sure.  However, I'm not sure, at this point, that that's sufficient for an acceptable fix, given that ES5 is very nearly out and requires rejection of overlong sequences, and that the necessary changes to address that are not large in scope.  Would you mind creating a patch which addresses those problems?  Minimally, I see the length of the array here, the 6 and 7 in the assertions in context, and the 6 in Decode -- I'd appreciate it if you could look and see if there are any others that need addressing.

Also, if I could trouble you to do so, adding a test or two to TestUTF8 in js/src/shell/js.cpp that will catch overlong sequences would be appreciated.

Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

Removing review request in case the previous comment went unnoticed...
Attachment #395934 - Flags: review?(jwalden+bmo)
We often make progress (esp. with contributed patches -- thanks to the reporter indeed for this one) by taking a bird in the hand and doing a followup bug to get the second in the bush.



8 years ago
Blocks: 514760
Duplicate of this bug: 518347
Duplicate of this bug: 514760

Comment 10

8 years ago
Comment on attachment 395934 [details] [diff] [review]
PAtch rev.1.0

I wrote unified patch for this bug and bug 514760 and bug 518347.
And now testing.
Attachment #395934 - Attachment is obsolete: true

Comment 11

8 years ago
Created attachment 403067 [details] [diff] [review]
Patch rev.1.1

This patch contains fix of bug 518347 and bug 514760.

Memo: This patch causes to test failure of js1_5/Regress/regress-172699.js but it is by design.
Attachment #403067 - Flags: review?(jwalden+bmo)

Comment 12

8 years ago
Masahiro, please fix or disable the test as part of the checkin.
Attachment #403067 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 403067 [details] [diff] [review]
Patch rev.1.1

>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp

>+#define OVERLONG_UTF8 ((uint32)0xFFFFFFFF)

For future reference, we're using C++ const values rather than defines these days -- better typing and all that.

>-        if (ucs4Char < minucs4Char ||
>-            ucs4Char == 0xFFFE || ucs4Char == 0xFFFF) {
>+        if (ucs4Char < minucs4Char) {
>+            ucs4Char = OVERLONG_UTF8;
>+        } else if (ucs4Char == 0xFFFE || ucs4Char == 0xFFFF) {
>             ucs4Char = 0xFFFD;
>         }

Seems like a good idea to add a JS_UNLIKELY() around the overlong comparison as a helper for compilation without PGO.

I'll push this with nits fixed to TM later tonight -- definite fodder for 192 as well, I think.  Thanks!
Hum.  So, looking and thinking about the existing behavior not changed by this patch, I don't see the U+FFF(EF)-to-replacement behavior in the spec.  Bug 172699 added it when doing the prior-to-this-patch decode-overlong-to-replacement behavior.  Should we implement this extra conversion not specified by ES5?  It's not clear, certainly not without any motivating help from bug comments.  Filed bug 520095 to address this, even if addressing is simply adding a comment next to those lines saying why the deviation from spec exists.

Comment 15

8 years ago
Created attachment 404251 [details]

This test fails on unpatched browser, but not fails on patched edition.

# This test does not contain test of changes on Ucs4 to UTF-8 convertion.
# it is reason why I don't set "review?" flag to this.
Created attachment 404323 [details] [diff] [review]
The test-fix I was going to check in, but the previous changes are better

I was going to make these minimal changes to get the test to work again, but yours are much better, touching more of the different ways the patch here would change behavior.
...actually, on second look, it probably makes sense to add both your test and make the changes I'd made.  Did that, with a few formatting tweaks to your test (we surround binary operators with a space on each side, so |expect = foo| rather than |expect=foo|), and pushed to TM (also with appropriate changes to the relevant jstests.list file):


Thanks again!
Whiteboard: [good first bug] → fixed-in-tracemonkey
Accepting overlong UTF-8 sequences is a well-known bug with potential security implications.  In the context of JavaScript those implications are likely less serious than in, say, C code, but it still seems important to fix this.

If this is to be fixed for 1.9.2 (I think it should given the security considerations), I think we can all agree it must be fixed for the beta.
Flags: blocking1.9.2?
Whiteboard: fixed-in-tracemonkey → [sg:want?] fixed-in-tracemonkey

Comment 19

8 years ago
thanks waldo!
Flags: in-testsuite+


8 years ago
Flags: blocking1.9.2? → blocking1.9.2+


8 years ago
Priority: -- → P2


8 years ago
Priority: P2 → P1

Comment 20

8 years ago
Last Resolved: 8 years ago
Resolution: --- → FIXED

Comment 21

8 years ago
v 1.9.3


8 years ago
Duplicate of this bug: 522634

Comment 23

8 years ago
status1.9.2: --- → final-fixed
status1.9.1: --- → wontfix
You need to log in before you can comment on or make changes to this bug.