Open Bug 396436 Opened 13 years ago Updated 13 years ago

HTML parser fails to recognize a UTF-8 character split at the edge of the given conversion buffer

Categories

(Core :: DOM: HTML Parser, defect)

Version: 1.8 Branch
Platform: x86
OS: Linux
Type: defect
Priority: Not set
Severity: major

Tracking

()

UNCONFIRMED

People

(Reporter: buniofh, Unassigned)

Details

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20061201 Firefox/2.0.0.6 (Ubuntu-feisty)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20061201 Firefox/2.0.0.6 (Custom build)

While converting a UTF-8 encoded HTML page with embedded JavaScript to UTF-16 (see attachment), an error is returned when nsScanner encounters a multibyte character split between the end of the given conversion buffer and the start of the next one (this can be seen in nsNativeUConvService). A few characters may be misinterpreted at that point, which causes errors in script parsing.
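
To make the failure mode concrete, here is a minimal standalone sketch (not Mozilla code) of how a chunk that ends in the middle of a UTF-8 sequence can be detected. The helper names `incompleteTailLength` and `decodablePrefix` are hypothetical and exist only for illustration; the sketch looks at lead bytes only and does not validate overlong forms or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns how many bytes at the end of `chunk` belong to
// a UTF-8 sequence whose remaining bytes have not arrived yet.
// Returns 0 if the chunk ends on a complete character (or is empty).
std::size_t incompleteTailLength(const std::string& chunk) {
    const std::size_t n = chunk.size();
    std::size_t back = 0;
    // A UTF-8 sequence is at most 4 bytes long, so look at the last 4 bytes at most.
    while (back < n && back < 4) {
        const unsigned char b = static_cast<unsigned char>(chunk[n - 1 - back]);
        ++back;
        if ((b & 0xC0) != 0x80) {
            // Not a continuation byte: ASCII or a lead byte.
            std::size_t need = 1;
            if ((b & 0xE0) == 0xC0) need = 2;       // 110xxxxx
            else if ((b & 0xF0) == 0xE0) need = 3;  // 1110xxxx
            else if ((b & 0xF8) == 0xF0) need = 4;  // 11110xxx
            return back < need ? back : 0;
        }
    }
    return 0;  // only continuation bytes seen; treat as nothing to hold back
}

// Hypothetical helper: the part of `chunk` that can be decoded right away.
std::string decodablePrefix(const std::string& chunk) {
    const std::size_t tail = incompleteTailLength(chunk);
    assert(tail <= chunk.size());
    return chunk.substr(0, chunk.size() - tail);
}
```

For example, "é" is encoded as the two bytes 0xC3 0xA9. If one buffer ends with 0xC3 and the next starts with 0xA9, neither buffer is valid UTF-8 on its own, which is the situation described in this report.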

The bug was seen on PC/Linux, although code inspection suggests that it may affect all platforms and operating systems. The build config is attached.

Reproducible: Always

Steps to Reproduce:
1. Run the attached test on a Mozilla build compiled with the attached flags.
Actual Results:  
Script parsing crashed. Hitting the 'zonk' key produces a small report that states exactly which script commands were ignored or parsed incorrectly.

Expected Results:  
The failure is visible in the browser window as broken text.
A good result shows a button and a frame filled with a single character repeated 1000 times.
A bad result shows parts of the code inside the frame and breaks script execution.
two files:
test.html - the test
test2.html - the contents of the frame.
Please file bugs like this against the appropriate Core component. People on irc.mozilla.org can help, and so can bonsai.mozilla.org/cvsblame.cgi?file=... (just look through the bugs filed against the relevant file to see where they live).

Also, please file bugs based on trunk code, not on a branch.
Component: General → HTML: Parser
Product: Firefox → Core
QA Contact: general → parser
Version: unspecified → 1.8 Branch
Point taken. Although the mozconfig was in fact used to build Minimo, the code has been checked against Mozilla trunk and the bug still shows there (the --enable-necko-small-buffers flag is the likely suspect). That's why I reported it as General.

Fix that takes care of the problem (credit to E. Mironov):

--- mozilla/intl/uconv/native/nsNativeUConvService.cpp  14 Mar 2006 08:35:00 -0000      1.1.1.1
+++ mozilla/intl/uconv/native/nsNativeUConvService.cpp  18 Sep 2007 14:02:17 -0000      1.2
@@ -307,7 +307,13 @@
                 res = 0;
                 break;
             }
-            
+           
+           if(errno == EINVAL)
+            {
+                res = 0;
+                break;
+            }
+
             if (errno == EILSEQ) {
 
                 if (mReplaceOnError) {
--- mozilla/parser/htmlparser/src/nsScanner.cpp 14 Mar 2006 08:38:29 -0000      1.1.1.1
+++ mozilla/parser/htmlparser/src/nsScanner.cpp 18 Sep 2007 14:03:08 -0000      1.2
@@ -345,6 +345,14 @@
   nsresult res=NS_OK;
   PRUnichar *unichars, *start;
   if(mUnicodeDecoder) {
+    int spareBufferLen = spareBuffer.Length();
+    if(spareBufferLen > 0)
+    {
+        spareBuffer.Append(aBuffer, aLen);
+        aLen += spareBufferLen;
+        aBuffer = spareBuffer.get();
+    }
+
     PRInt32 unicharBufLen = 0;
     mUnicodeDecoder->GetMaxLength(aBuffer, aLen, &unicharBufLen);
     nsScannerString::Buffer* buffer = nsScannerString::AllocBuffer(unicharBufLen + 1);
@@ -358,6 +366,20 @@
       res = mUnicodeDecoder->Convert(aBuffer, &srcLength, unichars, &unicharLength);
 
       totalChars += unicharLength;
+      
+      if((NS_OK == res) && (srcLength < aLen))
+      {
+        nsCString tmp;
+        tmp.Assign(aBuffer + srcLength, aLen - srcLength);
+        spareBuffer.Assign(tmp);
+        break;
+      }
+      else 
+      if((srcLength == aLen) && (spareBuffer.Length() > 0))
+      {
+        spareBuffer.Cut(0, spareBuffer.Length());
+      }
+      
       // Continuation of failure case
       if(NS_FAILED(res)) {
         // if we failed, we consume one byte, replace it with U+FFFD
@@ -370,7 +392,7 @@
           NS_ERROR("Unexpected end of destination buffer");
           break;
         }
-
+
         unichars[unicharLength++] = (PRUnichar)0xFFFD;
         unichars = unichars + unicharLength;
         unicharLength = unicharBufLen - (++totalChars);
--- mozilla/parser/htmlparser/src/nsScanner.h   14 Mar 2006 08:38:29 -0000      1.1.1.1
+++ mozilla/parser/htmlparser/src/nsScanner.h   18 Sep 2007 14:03:08 -0000      1.2
@@ -403,6 +403,8 @@
       nsCString       mCharset;
       nsIUnicodeDecoder *mUnicodeDecoder;
       nsParser        *mParser;
+  private:
+      nsCString spareBuffer;
 };
 
 #endif
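
The carry-over idea in the nsScanner part of the patch (keep the unconsumed tail in a spare buffer and prepend it to the next chunk) can be sketched outside the Mozilla tree as follows. This is a simplified illustration under stated assumptions, not the patched code: `convertCompletePrefix` is a hypothetical stand-in for the decoder's `Convert()` call, `ChunkedDecoder` and `spare_` are hypothetical names, and the decoder skips checks for overlong forms and surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the decoder: converts complete UTF-8 sequences in
// `src` to UTF-16, stops at a sequence that is cut off by the end of `src`,
// and returns the number of bytes consumed. Malformed bytes become U+FFFD.
std::size_t convertCompletePrefix(const std::string& src, std::u16string& out) {
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char lead = static_cast<unsigned char>(src[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80) { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        if (i + len > src.size()) break;  // cut off: leave it for the next chunk

        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(src[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }  // consume one byte

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return i;
}

class ChunkedDecoder {
public:
    // Feed one chunk; decoded text is appended to `out`.
    void feed(const std::string& chunk, std::u16string& out) {
        // Prepend bytes held back from the previous chunk.
        const std::string input = spare_ + chunk;
        const std::size_t consumed = convertCompletePrefix(input, out);
        assert(consumed <= input.size());
        // Keep the undecoded tail; this clears spare_ if everything was consumed.
        spare_.assign(input, consumed, std::string::npos);
    }
    bool hasPendingBytes() const { return !spare_.empty(); }

private:
    std::string spare_;  // plays the role of spareBuffer in the patch
};
```

Feeding the bytes `'a', 0xC3` and then `0xA9, 'b'` as two separate chunks yields the UTF-16 text "aéb", with 0xC3 held in `spare_` between the two calls. A caller would still need to decide what to do if bytes remain pending at the end of the stream.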