Closed Bug 236941 Opened 17 years ago Closed 17 years ago

UTF-8 converter looses full lines of text if they contain invalid characters

Categories

(MailNews Core :: Internationalization, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmdesp, Assigned: jmdesp)

References

(Blocks 1 open bug)

Details

(Keywords: fixed1.7, intl, regression, Whiteboard: fixed-aviary1.0)

Attachments

(2 files, 8 obsolete files)

Tested on 2004022608

Reproduction :
The attached RFC822 message demonstrates the problem.

After "quelque temps un message d'une", several lines of quoted text are not
displayed, and then back again.
That the lines that are not displayed all contain some invalid characters
(encoded in ISO-8859 when the message is declared in UTF-8).

The problem occurs in mail, and also when displaying a RFC822 message in
browser. It does not happen in normal web pages.

This is a regression over bug 35004.
I didn't see a duplicate, but I just seem to never be able to find them.
If you can confirm there is none, mark as new.
Attached file rfc 822 test case
If you visualize inside browser, you just have to do a "view source" to see the
missing lines.
I can't even save it from Mozilla, no "Save Link Target As", no Shift-Click or
"Save Page As" when displayed in Browser as page or as source.
Does Mozilla try to interpret the file even when saving from a link?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Uhm, sorry. I just installed the current nightly before reading this bug and ran
straight into bug 236903 ...
I try to set the encoding to some other values where the caracters would be
illegal to see if they are missing lines too, but without much success except
for the encoding HZ.

But I think that the decoders are usually quite tolerant, and try to make sense
of many different things, so that it seldom happens that they give out a illegal
error even if they something really strange.
So I still think this is probably a general problem, not specific to UTF68, and
that with the string reorganization the decoding is now by line, and does not
handle errors on a sub-line level.
Keywords: intl
I'm seeing this under Linux as well, 20040311 build.
I'm not yet properly set up with a developing and debugging environment, but I
suspect this may be a regression caused by bug 228543.

Jungshik, are the following two assumptions correct?
 (1) The fix for bug 228543 made mailnews use CopyUTF8toUTF16() for UTF-8 mails,
which previously would have been decoded by nsUTF8ToUnicode.
 (2) When CopyUTF8toUTF16() encounters invalid sequences, it drops the rest of
the input instead of inserting a replacement character and continuing
Oups, it seems I should have added it in the comment earlier, I tested the
regression appeard on the day bug 228543 was checked in, and Jungshik confirmed
me by mail he thinks the call to CopyUTF8toUTF16 might be responsible for the
disparition of the string as you suspect.

I still need some time to confirm on a build environnement this is it.

I suggested him a solution by simply getting rid of lines 152 to 155 here :
http://lxr.mozilla.org/mozilla/source/mailnews/base/util/nsMsgI18N.cpp#152
but he doesnt't like too much the idea of getting rid of this optimization all
together. 

I myself do think it's better to use the generic conversion routines for all
data that comes directy from the external world, and if that call is too slow,
then that the solution would better be to deCOMtaminate it and then add the
optimisations in the deCOMtaminated version.
*** Bug 239236 has been marked as a duplicate of this bug. ***
Attached patch catching failed conversions (obsolete) — Splinter Review
The critical point of failure is MimeInlineTextPlain_parse_line in
mimetpla.cpp. When making the conversion, we assume the line to be in UTF-8 -
and fail. The result is an empty line.

This patch adds a simple fallback: if the conversion fails, assume the charset
to be ISO-8859-1.
Assignee: smontagu → mnyromyr
Status: NEW → ASSIGNED
Attachment #145186 - Flags: superreview?(bienvenu)
Attachment #145186 - Flags: review?(jshin)
OS: Windows 2000 → All
Hardware: PC → All
This is not the clean way to do it Karsten. 
You can not just assume the input encoding is in truth ISO-8859-1 on a wide
scale, and just silently ignore invalid charset declaration is really bad practice.
But I'm of course happy someone has more time than to test how this bug can be
solved.

The code you modified calls the nsMsgI18NConvertToUnicode function and that
where we should make sure the return is coherent with what browser does, that is
*not* to drop the full line if the input encoding is invalid, but to show the
bad part as diamond characters. 

In the discussion with Jshin that made the change that very probably caused the
bug, he said he thinks it's the fact to CopyUTF8toUTF16 inside
nsMsgI18NConvertToUnicode that drops the lines, and we should test to either
remove CopyUTF8toUTF16 (only remove three lines of code) or to test if the
string is truly UTF-8 by making a IsUTF8 call first before calling
CopyUTF8toUTF16. This assume the default encoder will do the same job as browser
for bad UTF-8.

Can you test if either of the two works ? That would make it a very minor change
that would have a better chance of being accepted fast (I'd love it to be in 1.7
but I'm not holding my breath).
Comment on attachment 145186 [details] [diff] [review]
catching failed conversions

> This is not the clean way to do it Karsten. You can not just assume
> the input encoding is in truth ISO-8859-1 on a wide scale

Yes, you're right. 
I completely forgot the code for the charset "guessing" in case of failure.

> silently ignore invalid charset declaration is really bad practice. 

Well, that depends. Throwing charset alerts when *displaying* would be rather
nagging. But *saving* and similar operations shouldn't change the charset
anyway.

> The code you modified calls the nsMsgI18NConvertToUnicode function
> and that where we should make sure the return is coherent with what
> browser does, that is *not* to drop the full line if the input
> encoding is invalid, but to show the bad part as diamond characters.

That is an option, but is it "nice"?
We could give the charset sniffer a try and /display/ at least what might have
been meant if the encoding had been correct.

> In the discussion with Jshin that made the change that very probably
> caused the bug, he said he thinks it's the fact to CopyUTF8toUTF16
> inside nsMsgI18NConvertToUnicode that drops the lines,

It is called CopyUTF8toUTF16, that's right; the actual failure assertion 
is thrown in ConvertUTF8toUTF16::write. Like AppendUTF8toUTF16/CopyUTF8toUTF16,
that's outside of MailNews and shouldn't be tampered with here.

> and we should test to either remove CopyUTF8toUTF16 (only remove
> three lines of code)

That won't suffice, because the CopyUTF8toUTF16 calls I eliminated in
mimetpla.cpp would still drop lines.

> or to test if the string is truly UTF-8 by making a IsUTF8 call first
> before calling CopyUTF8toUTF16.

That would be extremely "expensive". We won't need that check most of the time
anyway. I think it's better to make as sure as possible that the charset is
right and only take other measures if that fails.

> Can you test if either of the two works ? That would make it a very
> minor change that would have a better chance of being accepted fast

I'll have a look.

> (I'd love it to be in 1.7 but I'm not holding my breath).

I recommend not holding it, then. ;-)
Attachment #145186 - Attachment is obsolete: true
Attachment #145186 - Flags: superreview?(bienvenu)
Attachment #145186 - Flags: review?(jshin)
Attached patch autodetect charset if possible (obsolete) — Splinter Review
The removal of the UTF-8 check in nsMsgI18N.cpp:152-154 does not solve the bug,
because the following routine does not produce any output. It does not cause
any assertions either, so it may be useful to remove the call to avoid
cluttering of the debug console; I haven't done that in this patch, though.

This updated patch now does this if we're allowed to use the "real" charset
(ie. no overruling or 'global' autodetection is set):
- try to use the charset given in the content-type header
- if that fails, try to autodetect the charset of this line
- if that fails, take the raw bytes as is (that is, as ASCII)

Let's see what Jshin thinks...
Attachment #145223 - Flags: review?(jshin)
Comment on attachment 145223 [details] [diff] [review]
autodetect charset if possible

I'd not do this way. I don't think we  want to recover from the mislabelled
UTF-8 lines. Just signalling to users that there's something wrong would be
more than enough. 

 A lot simpler way is to  call IsUTF8 (that is not as expensive as you think
after ASCII lines are filtered out) in nsMsgI18N.cpp for lines labelled as
'UTF-8'. If it fails the test, it can fall through. I've already made a patch
for that on top of my patch for bug 229032. I planned to upload the patch once
bug 229032 is fixed. Unfortunately, the patch for bug 229032 will not go in
until 1.8alpha. 

Anyway, if this bug is regarded as serious enough (IMHO, it doesn't often very
often) to be fixed in 1.7, I can make a patch against the current trunk.
Attachment #145223 - Flags: review?(jshin)
Karsten "Mnyromyr" Düsterloh
> This updated patch now does this if we're allowed to use the "real" charset
> (ie. no overruling or 'global' autodetection is set):
> - try to use the charset given in the content-type header
> - if that fails, try to autodetect the charset of this line

There's already a general framework in mail/news to decide what encoding must be
applied to message. If change are needed to handle some case (like the messages
labeled with wrong charset described in Bug 239236), it should be done at this
level.

Your point is to automatically run a charset detection if the message is invalid
in the selected charset. This removes control to the user, he can no more force
the charset to use, which is in some case useful.

I think too Bug 239236 is not truly a duplicate of this bug, because the
question of what should be done with messages that have an unknown charset is
quite unrelated to how we should best handle invalid content in messages that we
have already determined we will display in UTF-8.

Jshin 
> I don't think we  want to recover from the mislabelled
> UTF-8 lines.

- Open in browser a page in ISO-8859-1.
- Force encoding to UTF-8
- Invalid characters are replaced by question marks inside a black diamond.
- *this* is the old behaviour of mail/news for invalid UTF-8 and this is what we
must return to (display the user as much as can be properly interpreted). 
And I expect there is an easy way to do it, I just don't know exactly how :-)
(In reply to comment #15)

> Jshin 
> > I don't think we  want to recover from the mislabelled
> > UTF-8 lines.

> - Invalid characters are replaced by question marks inside a black diamond.
> - *this* is the old behaviour of mail/news for invalid UTF-8 and this is what 

That's exactly what I'm gonna do. Karsten's patch tries to go a lot further than
that, which I don't think is a good idea as you wrote and I wrote earlier.

> A lot simpler way is to  call IsUTF8 (that is not as expensive as you
> think after ASCII lines are filtered out) in nsMsgI18N.cpp for lines
> labelled as 'UTF-8'. If it fails the test, it can fall through.

Jshin, could you please test your fix with the situation and the testcase of bug
239236?

The problem isn't just illegal characters in lines labelled as UTF-8, but also
lines with an *unknown* charset. These lines run into the CopyUTF8toUTF16
because they're *wrongly* *assumed* to be UTF-8!
Assignee: mnyromyr → smontagu
Status: ASSIGNED → NEW
Sorry, I had missed some bugmail.
I'm stepping down, feel free to not use anything from my patches here and over
in 239941. :)
(I still do think that *displaying* garbage is not a good thing[tm], since we
don't *send* it - but at least we shouldn't *drop* lines.) 
I'm trying to write a working patch that will replace invalid characters with
0xFFFD as is done in the browser.

Line 343 of mimetpla.cpp tests this 
(obj->options->format_out == nsMimeOutput::nsMimeMessageSaveAs
and decide if true, the line is UTF-8 and CopyUTF8toUTF16 is adequate to copy it.

Unfortunately when displaying the sample attachment, this test holds true.
Currently I'm in favor of suppressing the test alltogether, mimetpla.cpp is not
the right place to recover encoding problems.
Jshin:
This line 338 of mimetpla.cpp 
   nsDependentCString inputStr(line, length);
is raising an assertion everytime it's executed that 'line' should be zero
terminated. It's not blocking, and I don't know how I should change it, but it
certainly needs to be corrected.
The change to mimetpla is very minor, I commented out the line I removed. 
Removing the line and reindenting made it look more complex.

The change to nsMsgI18N may look more important, but in fact there is no new
code in that. I'm juste replacing the method jshin used to convert by the one
in nsXMLHttpRequest::ConvertBodyToText.
The code is nothing but a line by line copy (a few variable renamed) from 
http://lxr.mozilla.org/mozilla/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#523


jshin, can you review this ?
Assignee: smontagu → jmdesp
Attachment #145223 - Attachment is obsolete: true
Status: NEW → ASSIGNED
I uploaded the wrong file :-( 
That was the one with the line removed and reindentation from mimetpla.cpp, and
without the change to nsMsgI18N.
Attachment #145824 - Attachment is obsolete: true
> This line 338 of mimetpla.cpp 
>    nsDependentCString inputStr(line, length);
> is raising an assertion everytime it's executed that 'line' should be zero
> terminated.

It should be fairly safe to terminate it, because of the +1 in the definition of
desired_size in mime_LineBuffer in mimebuf.cpp:

line[length] = 0; // see desired_size in mime_LineBuffer in mimebuf.cpp
nsDependentCString inputStr(line, length);
I tested patch 145825 and it did indeed solve the problem with the 'rfc 822
testcase' here (all non-UTF-8 characters are shown as ? in my font).
But it completely breaks the display of the testcase in bug 239236: the entire
body is missing!
If this patch gets checked in, bug 239236 needs to be reopened and will be even
worse...
Can you investigate the regression more carefully, Karsten ?
I don't have time for that today. 
Of course, I'll make any needed change so that we get acceptable result for bug
239236.

I'm quite surprised it can regress something still. 
All the patch does is to try harder when the converter fails.
The problem with the test case of bug 239236 is this:

mimetpla.cpp:466:
      { // convert back to mailCharset before writing.       
        rv = nsMsgI18NConvertFromUnicode(nsDependentCString(mailCharset), 
                                         lineResultUnichar, outString);
        NS_ENSURE_SUCCESS(rv, -1);
      }

Since the mailCharset is illegal/unknown, nsMsgI18NConvertFromUnicode will
always return an error and hence force NS_ENSURE_SUCCESS to quit the function
without outputting any data...
OK, the regression came from blindly removing the
nsMimeOutput::nsMimeMessageSaveAs test. In bug 239236, we are not able to
convert the input *at all*, so blindly applying CopyUTF8toUTF16 was better than
trying to get an adequate decoder, because at least it gave a line of result
when the input line was fully US-ASCII.

I've been reading some more of the source.
When we arrive in mimetpla.cpp:466, the input string ought to have *already*
been transcoded to utf-8, this must be the reason why when nsMimeMessageSaveAs
is true we use CopyUTF8toUTF16 blindly.

MimeInlineText_convert_and_parse_line does the conversion around here
http://lxr.mozilla.org/mozilla/source/mailnews/mime/src/mimetext.cpp#381 and
then calls obj->clazz->parse_line that leads to mimetpla.cpp:466

In that code 'charset_conversion_fn' should do the conversion. If it returns a
void pointer and a valid return code, the input string is used directly instead
of a converted string. 
This is that line :
http://lxr.mozilla.org/mozilla/source/mailnews/mime/src/mimetext.cpp#406

In practice,  the implementation use the mime_convert_charset wrapper around
ConvertUsingEncoderAndDecoder, that makes sure that in case of failure, we
*always* returns a positive error code and a NULL return value.
"If we fail, we just use unconverted input" looks really ugly to me and opening
an avenue for an attaquer to be able to insert arbitrary content.

In bug 236941, charset_conversion_fn fails because the input is not valid WRT
utf-8 encoding rules, in bug 239236, it fails because the charset does not
exists. In both case, we end up with the raw input where we are expecting
already converted data.

I think the adequate treatement for bug 239236 should be to use US-ASCII when
the input charset is unknown and make it fall in a loop that like my patch
inserts 0xFFFD for every undecodable character (that would be every character
above 0x80).
The question is about *where* it's best to do it.
This new patch is quite cleaner, I'm no more taking the risk of doing anything
dangerous.
I test that the string is not utf-8 before ignoring nsMimeMessageSaveAs, and I
use a fall-back charset if there's no encoding for input. 
That fall-back is UTF-8, because 'us-ascii' is mapped to 'cp1252', and I want a
warning that somethings wrong with the message. We know at this stage the
message is not UTF-8, so that's for the best, the user will be visually warned
something has gone wrong.
(If currently unknown charsets are to be displayed as ISO-8859-1, the correct
mechanisme to use is to add them to the alias list for ISO-8859-1, not to
silently map anything unknown)

This patch works, and it makes MimeInlineTextPlain_parse_line rock solid WRT
problem in content that ought to be UTF-8. 
It probably would be cleaner to always feed MimeInlineTextPlain_parse_line rock
with the valid UTF-8 it should get, but I'd prefer someone with a better
knowledge of the intrisics of mailnews to do that kind of change.
I just wanted a patch as simple as possible to have a chance of getting it in
the Mozilla 1.7 branch.
Attachment #145825 - Attachment is obsolete: true
Comment on attachment 146038 [details] [diff] [review]
Patch that replace invalid char with 0xFFFD, corrected

Sorry it took me so long. 

>Index: mimetpla.cpp

>+    line[length] = 0; // see desired_size in mime_LineBuffer in mimebuf.cpp

Can you elaborate on this? 

>     nsDependentCString inputStr(line, length);


BTW, mimetpfl.cpp needs to be fixed the same way as mimetpla.cpp


>Index: nsMsgI18N.cpp

>+      PRUnichar * outBuffer = 
>+              NS_STATIC_CAST(PRUnichar*, nsMemory::Alloc((outBufferLength + 1) *
>+                                               sizeof(PRUnichar)));

You can use nsAutoBuffer<PRUnichar, 512> here to avoid the memory allocation in
most cases. 

With that change, this patch can go into 1.7branch. For 1.8alpha, the patch
needs to be changed a bit more.
Hum, I have a different patch since yesterday. 
It's almost the same code (slightly slimpler), but in mimetext.cpp and
mimemoz2.cpp. 
This is where mozilla originally tries to convert the string, and keeps the
same string if the conversion fails (which is a non-sense, but if we go there
we'd end up rewriting all of mailnews).

I conceptually feel it a lot better to correct there than in mimetpla.cpp.
This also means I don't need to guess something went wrong by adding lot of
"isutf8" testing, I now correct the problem "at the source".

>>Index: mimetpla.cpp
>>+    line[length] = 0; // see desired_size in mime_LineBuffer in mimebuf.cpp
>
> Can you elaborate on this? 

See comment #23 about that.

With the new patch, when the input line arrives there, it is already correctly
zero terminated, there is no assertion anymore, and no reason to change
anything in mimetpla.cpp.

> BTW, mimetpfl.cpp needs to be fixed the same way as mimetpla.cpp

I did a quick check and it seems that with the new patch the change applies to
both normal and flowed text without needing to change two things.

> You can use nsAutoBuffer<PRUnichar, 512> here to avoid the memory allocation 

> in most cases. 

The new patch is a bit shorter, I didn't need to change anything to the
allocation already done, so the problem doesn't occur.

Sorry about the late change, I wasn't really satisfied with the solution in the
first patch, but it took a while to get the time to rewrite it.
Attachment #146038 - Attachment is obsolete: true
Comment on attachment 146932 [details] [diff] [review]
Patch that does the change in mimemoz2.cpp instead

The patch does not need to be changed to apply to the 1.7 branch

I checked it works well under linux, for flowed too.
Attachment #146932 - Flags: superreview?(bienvenu)
Attachment #146932 - Flags: review?(jshin)
Attachment #146932 - Flags: approval1.7?
Comment on attachment 146932 [details] [diff] [review]
Patch that does the change in mimemoz2.cpp instead

The patch does not need to be changed to apply to the 1.7 branch

Please attach unified diffs (cvs diff -pu8), and please request approval after
you have review and super-review (and preferably after landing on the trunk).
Attachment #146932 - Flags: approval1.7?
Comment on attachment 146932 [details] [diff] [review]
Patch that does the change in mimemoz2.cpp instead

The patch does not need to be changed to apply to the 1.7 branch

Index: mimetext.cpp
===================================================================
RCS file: /cvsroot/mozilla/mailnews/mime/src/mimetext.cpp,v
retrieving revision 1.46
diff -p -u -8 -r1.46 mimetext.cpp
--- mimetext.cpp	17 Apr 2004 18:33:13 -0000	1.46
+++ mimetext.cpp	29 Apr 2004 18:40:53 -0000
@@ -385,16 +385,20 @@ MimeInlineText_convert_and_parse_line(ch
	   SetMailCharacterSetToMsgWindow(obj, text->charset);
       }
     }
   }

   //initiate decoder if not yet
   if (text->inputDecoder == nsnull)
     MIME_get_unicode_decoder(text->charset,
getter_AddRefs(text->inputDecoder));
+  // If no decoder found, use ""UTF-8"", that will map most non-US-ASCII chars
as 
+  // Invalid. A pure-ASCII only decoder would be better, but there is none
+  if (text->inputDecoder == nsnull)
+    MIME_get_unicode_decoder("UTF-8", getter_AddRefs(text->inputDecoder));
   if (text->utf8Encoder == nsnull)
     MIME_get_unicode_encoder("UTF-8", getter_AddRefs(text->utf8Encoder));

   PRBool useInputCharsetConverter =
obj->options->m_inputCharsetToUnicodeDecoder &&
!nsCRT::strcasecmp(text->charset,
obj->options->charsetForCachedInputDecoder.get());

   if (useInputCharsetConverter)
     status = obj->options->charset_conversion_fn(line, length,
			  text->charset,
Index: mimemoz2.cpp
===================================================================
RCS file: /cvsroot/mozilla/mailnews/mime/src/mimemoz2.cpp,v
retrieving revision 1.207
diff -p -u -8 -r1.207 mimemoz2.cpp
--- mimemoz2.cpp	17 Apr 2004 18:33:13 -0000	1.207
+++ mimemoz2.cpp	29 Apr 2004 18:40:55 -0000
@@ -760,31 +760,64 @@ int ConvertUsingEncoderAndDecoder(const 
   else {
     unichars = localbuf;
     unicharLength = klocalbufsize+1;
   }
   if (unichars == nsnull) {
     rv = NS_ERROR_OUT_OF_MEMORY;
   }
   else {
-    // convert to unicode
-    rv = decoder->Convert(stringToUse, &srcLen, unichars, &unicharLength);
+    // convert to unicode, replacing failed chars with 0xFFFD as in
+    // the methode used in nsXMLHttpRequest::ConvertBodyToText and
nsScanner::Append
+    const char * inBuffer = stringToUse;
+    PRInt32 totalChars = 0,
+	     outBufferIndex = 0,
+	     outLen = unicharLength;
+    PRInt32 dataLen = srcLen;		
+
+    do {
+      PRInt32 inBufferLength = dataLen;
+      rv = decoder->Convert(inBuffer, 
+			    &inBufferLength, 
+			    &unichars[outBufferIndex], 
+			    &outLen);
+      totalChars += outLen;
+      if (NS_FAILED(rv)) {
+	 // We consume one byte, replace it with U+FFFD
+	 // and try the conversion again.
+	 unichars[outBufferIndex + outLen++] = (PRUnichar)0xFFFD;
+	 outBufferIndex += outLen;
+	 outLen = unicharLength - (++totalChars);
+
+	 decoder->Reset();
+
+	 if((inBufferLength + 1) > dataLen) {
+	   inBufferLength = dataLen;
+	 } else {
+	   inBufferLength++;
+	 }
+
+	 inBuffer = &inBuffer[inBufferLength];
+	 dataLen -= inBufferLength;
+      }
+    } while ( NS_FAILED(rv) && (dataLen > 0) );
+    
     if (NS_SUCCEEDED(rv)) {
-      rv = encoder->GetMaxLength(unichars, unicharLength, &dstLength);
+      rv = encoder->GetMaxLength(unichars, totalChars, &dstLength);
       // allocale an output buffer
       dstPtr = (char *) PR_Malloc(dstLength + 1);
       if (dstPtr == nsnull) {
	 rv = NS_ERROR_OUT_OF_MEMORY;
       }
       else {
	 PRInt32 buffLength = dstLength
Attachment #146932 - Attachment description: Patch that does the change in mimemoz2.cpp instead → Patch that does the change in mimemoz2.cpp instead The patch does not need to be changed to apply to the 1.7 branch
Next time I'll remember what "edit attachment as comment" really means :-(
I hope jshin has some time to review the new version now. This patch can be
applied to trunk and 1.7 branch as is.
Attachment #146932 - Attachment is obsolete: true
Attachment #147329 - Flags: superreview?(bienvenu)
Attachment #147329 - Flags: review?(jshin)
Attachment #146932 - Flags: superreview?(bienvenu)
Attachment #146932 - Flags: review?(jshin)
Comment on attachment 147329 [details] [diff] [review]
Patch that does the change in mimemoz2.cpp instead

r=jshin 

>+    const char * inBuffer = stringToUse;
>+    PRInt32 totalChars = 0,
>+            outBufferIndex = 0,
>+            outLen = unicharLength;
>+    PRInt32 dataLen = srcLen;          
>+
>+    do {
>+      PRInt32 inBufferLength = dataLen;
>+      rv = decoder->Convert(inBuffer, 
>+                           &inBufferLength, 
>+                           &unichars[outBufferIndex], 
>+                           &outLen);
>+      totalChars += outLen;
>+      if (NS_FAILED(rv)) {
>+        // We consume one byte, replace it with U+FFFD
>+        // and try the conversion again.
>+        unichars[outBufferIndex + outLen++] = (PRUnichar)0xFFFD;

	   |PRUnichar(0xFFFD)| seems better.

>+        outBufferIndex += outLen;
>+        outLen = unicharLength - (++totalChars);

It might be nice to add some comments about totalChars, dataLen, etc.
If I'm reading this correctly,

        if((inBufferLength + 1) > dataLen) {
          inBufferLength = dataLen;
        } else {
          inBufferLength++;

could be:
        if ((inBufferLength + 1) > dataLen)
          break;
        inBufferLength++;

is that correct? Other than that, it looks fine to me.	

 
Attached patch New version of the patch (obsolete) — Splinter Review
The comments tempted me into rewriting the loop, especially bienvenu's comment
that it's OK to use 'break'.

New version is a bit simpler, has more comment, does not call decoder->Convert
for a zero length string if the last byte is invalid (as the code does a
decoder->Reset() before, it really was not useful). The code was dependant on
the fact that call would return NS_SUCCEEDED.

I tested border cases (last character of the line invalid, all characters in
the line invalid, format=flowed)

A "if (NS_SUCCEEDED(rv)" test goes out, and this changes the indentation of the
encoder-> part, but the lines did not change at all truly.
Attachment #147329 - Attachment is obsolete: true
Preceeding version was not the right patch file :-(

comment #37 still applies about simplifying the loop, adding comments.
Attachment #147732 - Attachment is obsolete: true
Attachment #147329 - Flags: superreview?(bienvenu)
Attachment #147329 - Flags: review?(jshin)
Attachment #147733 - Flags: superreview?(bienvenu)
Attachment #147733 - Flags: review?(jshin)
Comment on attachment 147733 [details] [diff] [review]
New version of the patch

r=jshin
sorry for the delay
Attachment #147733 - Flags: review?(jshin) → review+
Blocks: 243747
Bienvenu, can you SR this patch or forward to someone who can ?
Too bad 1.8a1 is already out (I should have realized this was coming !), but at
least let's not loose more time ...

Scott, in my opinion this patch should go in aviary 1.0 branch too, but I don't
know what is the proper way to request that. 
Maybe you can do the SR if bienvenu can not.
Attachment #147733 - Flags: superreview?(bienvenu) → superreview+
thanks a lot for the heads up. I checked this into the aviary 1.0 branch
Whiteboard: fixed-aviary1.0
OK, I could verify the fix in the 2004-06-01-02-0.7 TB nightly.
Who can check in trunk ?
No longer blocks: 243747
*** Bug 243747 has been marked as a duplicate of this bug. ***
Ok. I can check that into the trunk. How about 1.7branch? Has it been taken care
of? Apparently not because even a1.7 was not asked for, yet. 
Comment on attachment 147733 [details] [diff] [review]
New version of the patch

Requesting approval for 1.7
Attachment #147733 - Flags: approval1.7?
Comment on attachment 147733 [details] [diff] [review]
New version of the patch

a=chofmann for 1.7
Attachment #147733 - Flags: approval1.7? → approval1.7+
No longer blocks: 244829
fix checked into 1.7 branch
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Whiteboard: fixed-aviary1.0 → fixed-aviary1.0,fixed1.7
Keywords: fixed1.7
Whiteboard: fixed-aviary1.0,fixed1.7 → fixed-aviary1.0
Blocks: 251634
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.