encodeURIComponent and decodeURIComponent give wrong output when input contains surrogate pairs.

RESOLVED FIXED

Status

Tamarin
Library
RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: Frank Stokes-Guinan, Unassigned)

Tracking

Details

Attachments

(3 attachments, 2 obsolete attachments)

(Reporter)

Description

7 years ago
User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10
Build Identifier: 

encodeURIComponent simply does not transcode UTF16 to UTF8 properly when the UTF16 code points contain a surrogate pair.
decodeURIComponent contains a logic flaw in a switch statement that mistakenly treats valid code points as invalid.

Reproducible: Always

Steps to Reproduce:
1. Create a string called src from char code and pass in a surrogate pair. 
    var src:String = String.fromCharCode(0xD842, 0xDF9F);
2. Encode the src by encodeURIComponent and trace it.
   var uri:String = encodeURIComponent(src);
	trace(uri);
3. Check the output.
4. Decode the uri by decodeURIComponent and check the result
	decodedStr = decodeURIComponent( uri );
	trace(decodedStr);

Actual Results:  
For step 3: %FB%A8%80%A0%80%00
For step 4: An error is raised.


Expected Results:  
For step3: %F0%A0%AE%9F%00
For step4: No error should be raised and the string can be decoded to original one.

I have a patch ready.  I'll be submitting it shortly.

Updated

7 years ago
Component: Virtual Machine → Library
QA Contact: vm → library

Comment 1

7 years ago
Created attachment 489523 [details] [diff] [review]
patch #1
Attachment #489523 - Flags: review?(stejohns)

Comment 2

7 years ago
Comment on attachment 489523 [details] [diff] [review]
patch #1

Looks right, and I can push this for you if you like, but we need a testcase for it. Perhaps you can sweet-talk someone on the VM QE team into writing one for you (perhaps based on whatever testcase originally found the bug)?
Attachment #489523 - Flags: review?(stejohns) → review+

Comment 3

7 years ago
Created attachment 489541 [details]
test case

you will only see the fix if you publish as SWF 12 to enable the bug compatibility code branches
Attachment #489541 - Flags: review?(stejohns)

Comment 4

7 years ago
cool, I'll clean it up and add it.

Comment 5

7 years ago
cpeyer, any chance you have the bandwidth to make this a proper testcase?

Updated

7 years ago
Depends on: 611114

Comment 6

7 years ago
Created attachment 489897 [details] [diff] [review]
Testcase converted to run in tamarin
Attachment #489541 - Attachment is obsolete: true
Attachment #489897 - Flags: review?(stejohns)
Attachment #489541 - Flags: review?(stejohns)

Comment 7

7 years ago
Created attachment 489900 [details] [diff] [review]
path #2 (apply after patch #1)

regression testing located a previously unknown issue - masked by bug - that would cause an %00 to be appended to an encoding string. this patch resolves the issue. (we needed to keep our traveller pointer in sync with the len var to determine when we reached the end of a string).

Comment 8

7 years ago
Comment on attachment 489523 [details] [diff] [review]
patch #1

>Change 746183 by cthilgen@CTHILGEN-AIR-WIN-CLEAN-SB on 2010/11/10 09:04:19
>
>	Summary: Fix for bad unicode conversion
>	Severity: important
>	Detailed Description: Conversion from UTF16 to Unicode Scalar Value was incorrect for characters outside the BMP. Conversion was doing a multiplication where it needs to do an addition. Decoding URI components was also faulty due to a logic error in a switch statement that mistakenly treats valid characters as invalid.
>	Dev/QA impact:
>	QA testing notes: [Any notes you put here should also be copied to the bug notes]
>	API change:
>	Bugs fixed: #2739179
>	Doc impact: 
>	Smokes passed: yep, full build and smoke on //depot/users/FlashRuntime/AirRed/FlashRuntime/thilgen/Main/code/...
>	Reviewer: frank, steven johnson, code collab #313613
>	Spec link: Bugzilla Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=609416, patch coming directly after this check-in
>	Chicken: open
>	Third-party code: n/a
>	Cryptography/encryption code: n/a
>
>Affected files ...
>
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.cpp#64 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.h#44 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/RegExpObject.cpp#13 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/StringObject.cpp#36 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/Toplevel.cpp#29 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.cpp#9 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.h#6 integrate
>
>Differences ...
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.cpp#64 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.cpp
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.cpp.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.cpp	Wed Nov 10 09:12:34 2010
>@@ -4940,6 +4940,7 @@
>             bugzilla526662 = 1;     // XMLParser stops at NUL char
>             bugzilla558863 = 1;     // in operator on bytearray throws exception for non-natural number
>             bugzilla585791 = 1;     // String.localeCompare with a null String object returns 0
>+            bugzilla609416 = 1;     // encodeURIComponent and decodeURIComponent give wrong output when input contains surrogate pairs
>         }
>     }
> 
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.h#44 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.h
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.h.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.h	Wed Nov 10 09:12:34 2010
>@@ -335,6 +335,7 @@
>         unsigned bugzilla526662:1;      // XMLParser stops at NUL char
>         unsigned bugzilla558863:1;      // in operator on bytearray throws exception for non-natural number
>         unsigned bugzilla585791:1;      // String.localeCompare with a null String object returns 0
>+        unsigned bugzilla609416:1;      // encodeURIComponent and decodeURIComponent give wrong output when input contains surrogate pairs		
> 
>     protected:
>         friend class AvmCore;
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/RegExpObject.cpp#13 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/RegExpObject.cpp
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/RegExpObject.cpp.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/RegExpObject.cpp	Wed Nov 10 09:12:34 2010
>@@ -726,7 +726,7 @@
>             if (newLastIndex < subjectLength)
>             {
>                 uint32_t ch;
>-                int n = UnicodeUtils::Utf8ToUcs4((const uint8_t*)src+newLastIndex, subjectLength-newLastIndex, &ch);
>+                int n = UnicodeUtils::Utf8ToUcs4((const uint8_t*)src+newLastIndex, subjectLength-newLastIndex, &ch, core()->currentBugCompatibility()->bugzilla609416);
>                 if (n <= 0)
>                 {
>                     // Invalid UTF8 sequence, advance one uint8_t
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/StringObject.cpp#36 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/StringObject.cpp
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/StringObject.cpp.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/StringObject.cpp	Wed Nov 10 09:12:34 2010
>@@ -2611,6 +2611,7 @@
>                 uint32_t uch;
>                 int32_t bytesRead;
>                 Pointers dst(s); // NB, we assume that Utf8ToUcs4 doesn't allocate memory!
>+                bool bugzilla609416 = core->currentBugCompatibility()->bugzilla609416;
>                 while (len > 0)
>                 {
>                     if (*((int8_t*) buffer) > 0)
>@@ -2621,7 +2622,7 @@
>                     }
>                     else
>                     {
>-                        bytesRead = UnicodeUtils::Utf8ToUcs4(buffer, len, &uch);
>+                        bytesRead = UnicodeUtils::Utf8ToUcs4(buffer, len, &uch, bugzilla609416);
>                         if (bytesRead == 0)
>                         {
>                             // invalid sequence (only if strict was false)
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/Toplevel.cpp#29 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/Toplevel.cpp
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/Toplevel.cpp.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/Toplevel.cpp	Wed Nov 10 09:12:34 2010
>@@ -1002,6 +1002,8 @@
>         StUTF16String in16(in);
>         const wchar *src = in16.c_str();
>         int len = in->length();
>+		
>+        bool bugzilla609416 = core->currentBugCompatibility()->bugzilla609416;
> 
>         while (len--) {
>             wchar ch = *src;
>@@ -1021,7 +1023,10 @@
>                     if (src[1] < 0xDC00 || src[1] > 0xDFFF) {
>                         return NULL;
>                     }
>-                    V = (ch - 0xD800) * 0x400 + (src[1] - 0xDC00) * 0x10000;
>+                    if (bugzilla609416)
>+                        V = (ch - 0xD800) * 0x400 + (src[1] - 0xDC00) + 0x10000;
>+                    else
>+                        V = (ch - 0xD800) * 0x400 + (src[1] - 0xDC00) * 0x10000;
>                     src += 2;
>                 } else {
>                     V = ch;
>@@ -1047,6 +1052,8 @@
>         int length = in->length();
>         wchar *out = (wchar*) core->GetGC()->Alloc(length*2+1); // decoded result is at most length wchar chars long
>         int outLen = 0;
>+		
>+        bool bugzilla609416 = core->currentBugCompatibility()->bugzilla609416;
> 
>         for (int k = 0; k < length; k++) {
>             wchar C = chars[k];
>@@ -1113,7 +1120,7 @@
> 
>                     // 29. Let V be the value obtained by applying the UTF-8 transformation
>                     // to Octets, that is, from an array of octets into a 32-bit value.
>-                    if (!UnicodeUtils::Utf8ToUcs4(Octets, n, &V)) {
>+                    if (!UnicodeUtils::Utf8ToUcs4(Octets, n, &V, bugzilla609416)) {
>                         return NULL;
>                     }
>                 }
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.cpp#9 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.cpp
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.cpp.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.cpp	Wed Nov 10 09:12:34 2010
>@@ -303,7 +303,8 @@
> 
>     int32_t UnicodeUtils::Utf8ToUcs4(const uint8_t *chars,
>                                      int32_t len,
>-                                     uint32_t *out)
>+                                     uint32_t *out,
>+                                     bool bugzilla609416)
>     {
>         // U-00000000 - U-0000007F:     0xxxxxxx
>         // U-00000080 - U-000007FF:     110xxxxx 10xxxxxx
>@@ -356,8 +357,16 @@
>                 n = 6;
>                 b = chars[0]&0x01;
>                 break;
>+            default:  // invalid character, should not get here
>+                if ( bugzilla609416 )
>+                {
>+                    return 0;
>+                }
>+            }
>+            if ( bugzilla609416 )
>+            {
>+                break;
>             }
>-            // fall through intentional
> 
>         default: // invalid character, should not get here
>             return 0;
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.h#6 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.h
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.h.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.h	Wed Nov 10 09:12:34 2010
>@@ -131,7 +131,8 @@
>          */
>         static int32_t Utf8ToUcs4(const uint8_t *chars,
>                                   int32_t len,
>-                                  uint32_t *out);
>+                                  uint32_t *out,
>+                                  bool bugzilla609416);
> 
>         /**
>          * Ucs4ToUtf8 takes a single 32-bit UCS-4 character as
>End of Patch.
Attachment #489523 - Attachment description: patch → patch #1

Comment 9

7 years ago
Created attachment 489906 [details] [diff] [review]
updated tamarin test - now includes .avm_args file to test with USES_SWFVERSION
Attachment #489897 - Attachment is obsolete: true
Attachment #489906 - Flags: review?(stejohns)
Attachment #489897 - Flags: review?(stejohns)

Comment 10

7 years ago
Comment on attachment 489900 [details] [diff] [review]
path #2 (apply after patch #1)

tabs vs spaces are scrambled, otherwise fine
Attachment #489900 - Flags: review+

Updated

7 years ago
Attachment #489906 - Flags: review?(stejohns) → review+

Comment 11

7 years ago
pushed to TR as 5541:5247e1d0d32e
Status: UNCONFIRMED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.