Closed Bug 609416 Opened 14 years ago Closed 14 years ago

encodeURIComponent and decodeURIComponent give wrong output when input contains surrogate pairs.

Categories

(Tamarin Graveyard :: Library, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: fguinan, Unassigned)

References

Details

Attachments

(3 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10
Build Identifier: 

encodeURIComponent simply does not transcode UTF16 to UTF8 properly when the UTF16 code points contain a surrogate pair.
decodeURIComponent contains a logic flaw in a switch statement that mistakenly treats valid code points as invalid.

Reproducible: Always

Steps to Reproduce:
1. Create a string called src from char code and pass in a surrogate pair. 
    var src:String = String.fromCharCode(0xD842, 0xDF9F);
2. Encode the src by encodeURIComponent and trace it.
   var uri:String = encodeURIComponent(src);
	trace(uri);
3. Check the output.
4. Decode the uri by decodeURIComponent and check the result
	decodedStr = decodeURIComponent( uri );
	trace(decodedStr);

Actual Results:  
For step 3: %FB%A8%80%A0%80%00
For step 4: An error is raised.


Expected Results:  
For step3: %F0%A0%AE%9F%00
For step4: No error should be raised and the string can be decoded to original one.

I have a patch ready.  I'll be submitting it shortly.
Component: Virtual Machine → Library
QA Contact: vm → library
Attached patch patch #1Splinter Review
Attachment #489523 - Flags: review?(stejohns)
Comment on attachment 489523 [details] [diff] [review]
patch #1

Looks right, and I can push this for you if you like, but we need a testcase for it. Perhaps you can sweet-talk someone on the VM QE team into writing one for you (perhaps based on whatever testcase originally found the bug)?
Attachment #489523 - Flags: review?(stejohns) → review+
Attached file test case (obsolete) —
you will only see the fix if you publish as SWF 12 to enable the bug compatibility code branches
Attachment #489541 - Flags: review?(stejohns)
cool, I'll clean it up and add it.
cpeyer, any chance you have the bandwidth to make this a proper testcase?
Depends on: 611114
Attachment #489541 - Attachment is obsolete: true
Attachment #489897 - Flags: review?(stejohns)
Attachment #489541 - Flags: review?(stejohns)
regression testing located a previously unknown issue - masked by bug - that would cause an %00 to be appended to an encoding string. this patch resolves the issue. (we needed to keep our traveller pointer in sync with the len var to determine when we reached the end of a string).
Comment on attachment 489523 [details] [diff] [review]
patch #1

>Change 746183 by cthilgen@CTHILGEN-AIR-WIN-CLEAN-SB on 2010/11/10 09:04:19
>
>	Summary: Fix for bad unicode conversion
>	Severity: important
>	Detailed Description: Conversion from UTF16 to Unicode Scalar Value was incorrect for characters outside the BMP. Conversion was doing a multiplication where it needs to do an addition. Decoding URI components was also faulty due to a logic error in a switch statement that mistakenly treats valid characters as invalid.
>	Dev/QA impact:
>	QA testing notes: [Any notes you put here should also be copied to the bug notes]
>	API change:
>	Bugs fixed: #2739179
>	Doc impact: 
>	Smokes passed: yep, full build and smoke on //depot/users/FlashRuntime/AirRed/FlashRuntime/thilgen/Main/code/...
>	Reviewer: frank, steven johnson, code collab #313613
>	Spec link: Bugzilla Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=609416, patch coming directly after this check-in
>	Chicken: open
>	Third-party code: n/a
>	Cryptography/encryption code: n/a
>
>Affected files ...
>
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.cpp#64 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.h#44 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/RegExpObject.cpp#13 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/StringObject.cpp#36 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/Toplevel.cpp#29 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.cpp#9 integrate
>... //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.h#6 integrate
>
>Differences ...
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.cpp#64 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.cpp
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.cpp.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.cpp	Wed Nov 10 09:12:34 2010
>@@ -4940,6 +4940,7 @@
>             bugzilla526662 = 1;     // XMLParser stops at NUL char
>             bugzilla558863 = 1;     // in operator on bytearray throws exception for non-natural number
>             bugzilla585791 = 1;     // String.localeCompare with a null String object returns 0
>+            bugzilla609416 = 1;     // encodeURIComponent and decodeURIComponent give wrong output when input contains surrogate pairs
>         }
>     }
> 
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.h#44 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.h
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.h.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/AvmCore.h	Wed Nov 10 09:12:34 2010
>@@ -335,6 +335,7 @@
>         unsigned bugzilla526662:1;      // XMLParser stops at NUL char
>         unsigned bugzilla558863:1;      // in operator on bytearray throws exception for non-natural number
>         unsigned bugzilla585791:1;      // String.localeCompare with a null String object returns 0
>+        unsigned bugzilla609416:1;      // encodeURIComponent and decodeURIComponent give wrong output when input contains surrogate pairs		
> 
>     protected:
>         friend class AvmCore;
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/RegExpObject.cpp#13 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/RegExpObject.cpp
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/RegExpObject.cpp.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/RegExpObject.cpp	Wed Nov 10 09:12:34 2010
>@@ -726,7 +726,7 @@
>             if (newLastIndex < subjectLength)
>             {
>                 uint32_t ch;
>-                int n = UnicodeUtils::Utf8ToUcs4((const uint8_t*)src+newLastIndex, subjectLength-newLastIndex, &ch);
>+                int n = UnicodeUtils::Utf8ToUcs4((const uint8_t*)src+newLastIndex, subjectLength-newLastIndex, &ch, core()->currentBugCompatibility()->bugzilla609416);
>                 if (n <= 0)
>                 {
>                     // Invalid UTF8 sequence, advance one uint8_t
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/StringObject.cpp#36 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/StringObject.cpp
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/StringObject.cpp.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/StringObject.cpp	Wed Nov 10 09:12:34 2010
>@@ -2611,6 +2611,7 @@
>                 uint32_t uch;
>                 int32_t bytesRead;
>                 Pointers dst(s); // NB, we assume that Utf8ToUcs4 doesn't allocate memory!
>+                bool bugzilla609416 = core->currentBugCompatibility()->bugzilla609416;
>                 while (len > 0)
>                 {
>                     if (*((int8_t*) buffer) > 0)
>@@ -2621,7 +2622,7 @@
>                     }
>                     else
>                     {
>-                        bytesRead = UnicodeUtils::Utf8ToUcs4(buffer, len, &uch);
>+                        bytesRead = UnicodeUtils::Utf8ToUcs4(buffer, len, &uch, bugzilla609416);
>                         if (bytesRead == 0)
>                         {
>                             // invalid sequence (only if strict was false)
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/Toplevel.cpp#29 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/Toplevel.cpp
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/Toplevel.cpp.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/Toplevel.cpp	Wed Nov 10 09:12:34 2010
>@@ -1002,6 +1002,8 @@
>         StUTF16String in16(in);
>         const wchar *src = in16.c_str();
>         int len = in->length();
>+		
>+        bool bugzilla609416 = core->currentBugCompatibility()->bugzilla609416;
> 
>         while (len--) {
>             wchar ch = *src;
>@@ -1021,7 +1023,10 @@
>                     if (src[1] < 0xDC00 || src[1] > 0xDFFF) {
>                         return NULL;
>                     }
>-                    V = (ch - 0xD800) * 0x400 + (src[1] - 0xDC00) * 0x10000;
>+                    if (bugzilla609416)
>+                        V = (ch - 0xD800) * 0x400 + (src[1] - 0xDC00) + 0x10000;
>+                    else
>+                        V = (ch - 0xD800) * 0x400 + (src[1] - 0xDC00) * 0x10000;
>                     src += 2;
>                 } else {
>                     V = ch;
>@@ -1047,6 +1052,8 @@
>         int length = in->length();
>         wchar *out = (wchar*) core->GetGC()->Alloc(length*2+1); // decoded result is at most length wchar chars long
>         int outLen = 0;
>+		
>+        bool bugzilla609416 = core->currentBugCompatibility()->bugzilla609416;
> 
>         for (int k = 0; k < length; k++) {
>             wchar C = chars[k];
>@@ -1113,7 +1120,7 @@
> 
>                     // 29. Let V be the value obtained by applying the UTF-8 transformation
>                     // to Octets, that is, from an array of octets into a 32-bit value.
>-                    if (!UnicodeUtils::Utf8ToUcs4(Octets, n, &V)) {
>+                    if (!UnicodeUtils::Utf8ToUcs4(Octets, n, &V, bugzilla609416)) {
>                         return NULL;
>                     }
>                 }
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.cpp#9 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.cpp
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.cpp.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.cpp	Wed Nov 10 09:12:34 2010
>@@ -303,7 +303,8 @@
> 
>     int32_t UnicodeUtils::Utf8ToUcs4(const uint8_t *chars,
>                                      int32_t len,
>-                                     uint32_t *out)
>+                                     uint32_t *out,
>+                                     bool bugzilla609416)
>     {
>         // U-00000000 - U-0000007F:     0xxxxxxx
>         // U-00000080 - U-000007FF:     110xxxxx 10xxxxxx
>@@ -356,8 +357,16 @@
>                 n = 6;
>                 b = chars[0]&0x01;
>                 break;
>+            default:  // invalid character, should not get here
>+                if ( bugzilla609416 )
>+                {
>+                    return 0;
>+                }
>+            }
>+            if ( bugzilla609416 )
>+            {
>+                break;
>             }
>-            // fall through intentional
> 
>         default: // invalid character, should not get here
>             return 0;
>
>==== //depot/main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.h#6 (text) ====
>Index: main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.h
>--- main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.h.~1~	Wed Nov 10 09:12:34 2010
>+++ main/FlashRuntime/Main/code/third_party/avmplus/core/UnicodeUtils.h	Wed Nov 10 09:12:34 2010
>@@ -131,7 +131,8 @@
>          */
>         static int32_t Utf8ToUcs4(const uint8_t *chars,
>                                   int32_t len,
>-                                  uint32_t *out);
>+                                  uint32_t *out,
>+                                  bool bugzilla609416);
> 
>         /**
>          * Ucs4ToUtf8 takes a single 32-bit UCS-4 character as
>End of Patch.
Attachment #489523 - Attachment description: patch → patch #1
Attachment #489897 - Attachment is obsolete: true
Attachment #489906 - Flags: review?(stejohns)
Attachment #489897 - Flags: review?(stejohns)
Comment on attachment 489900 [details] [diff] [review]
path #2 (apply after patch #1)

tabs vs spaces are scrambled, otherwise fine
Attachment #489900 - Flags: review+
Attachment #489906 - Flags: review?(stejohns) → review+
pushed to TR as 5541:5247e1d0d32e
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: