236941 - UTF-8 converter looses full lines of text if they contain invalid characters

Assignee

Description

•

20 years ago

Tested on 2004022608

Reproduction :
The attached RFC822 message demonstrates the problem.

After "quelque temps un message d'une", several lines of quoted text are not
displayed, and then back again.
That the lines that are not displayed all contain some invalid characters
(encoded in ISO-8859 when the message is declared in UTF-8).

The problem occurs in mail, and also when displaying a RFC822 message in
browser. It does not happen in normal web pages.

This is a regression over bug 35004.
I didn't see a duplicate, but I just seem to never be able to find them.
If you can confirm there is none, mark as new.

Jean-Marc Desperrier

Assignee

Comment 1

•

20 years ago

Attached file rfc 822 test case — Details

Jean-Marc Desperrier

Assignee

Comment 2

•

20 years ago

If you visualize inside browser, you just have to do a "view source" to see the
missing lines.

Christian Eyrich

Comment 3

•

20 years ago

I can't even save it from Mozilla, no "Save Link Target As", no Shift-Click or
"Save Page As" when displayed in Browser as page or as source.
Does Mozilla try to interpret the file even when saving from a link?

Status: UNCONFIRMED → NEW

Ever confirmed: true

Christian Eyrich

Comment 4

•

20 years ago

Uhm, sorry. I just installed the current nightly before reading this bug and ran
straight into bug 236903 ...

Jean-Marc Desperrier

Assignee

Comment 5

•

20 years ago

I try to set the encoding to some other values where the caracters would be
illegal to see if they are missing lines too, but without much success except
for the encoding HZ.

But I think that the decoders are usually quite tolerant, and try to make sense
of many different things, so that it seldom happens that they give out a illegal
error even if they something really strange.
So I still think this is probably a general problem, not specific to UTF68, and
that with the string reorganization the decoding is now by line, and does not
handle errors on a sub-line level.

Keywords: intl

kmike

Comment 6

•

20 years ago

I'm seeing this under Linux as well, 20040311 build.

Simon Montagu :smontagu

Comment 7

•

20 years ago

I'm not yet properly set up with a developing and debugging environment, but I
suspect this may be a regression caused by bug 228543.

Jungshik, are the following two assumptions correct?
 (1) The fix for bug 228543 made mailnews use CopyUTF8toUTF16() for UTF-8 mails,
which previously would have been decoded by nsUTF8ToUnicode.
 (2) When CopyUTF8toUTF16() encounters invalid sequences, it drops the rest of
the input instead of inserting a replacement character and continuing

Jean-Marc Desperrier

Assignee

Comment 8

•

20 years ago

Oups, it seems I should have added it in the comment earlier, I tested the
regression appeard on the day bug 228543 was checked in, and Jungshik confirmed
me by mail he thinks the call to CopyUTF8toUTF16 might be responsible for the
disparition of the string as you suspect.

I still need some time to confirm on a build environnement this is it.

I suggested him a solution by simply getting rid of lines 152 to 155 here :
http://lxr.mozilla.org/mozilla/source/mailnews/base/util/nsMsgI18N.cpp#152
but he doesnt't like too much the idea of getting rid of this optimization all
together. 

I myself do think it's better to use the generic conversion routines for all
data that comes directy from the external world, and if that call is too slow,
then that the solution would better be to deCOMtaminate it and then add the
optimisations in the deCOMtaminated version.

Karsten Düsterloh

Comment 9

•

20 years ago

*** Bug 239236 has been marked as a duplicate of this bug. ***

Karsten Düsterloh

Comment 10

•

20 years ago

Attached patch catching failed conversions (obsolete) — Details — Splinter Review

The critical point of failure is MimeInlineTextPlain_parse_line in
mimetpla.cpp. When making the conversion, we assume the line to be in UTF-8 -
and fail. The result is an empty line.

This patch adds a simple fallback: if the conversion fails, assume the charset
to be ISO-8859-1.

Assignee: smontagu → mnyromyr

Status: NEW → ASSIGNED

Karsten Düsterloh

Updated

•

20 years ago

Attachment #145186 - Flags: superreview?(bienvenu)

Attachment #145186 - Flags: review?(jshin)

Karsten Düsterloh

Updated

•

20 years ago

OS: Windows 2000 → All

Hardware: PC → All

Jean-Marc Desperrier

Assignee

Comment 11

•

20 years ago

This is not the clean way to do it Karsten. 
You can not just assume the input encoding is in truth ISO-8859-1 on a wide
scale, and just silently ignore invalid charset declaration is really bad practice.
But I'm of course happy someone has more time than to test how this bug can be
solved.

The code you modified calls the nsMsgI18NConvertToUnicode function and that
where we should make sure the return is coherent with what browser does, that is
*not* to drop the full line if the input encoding is invalid, but to show the
bad part as diamond characters. 

In the discussion with Jshin that made the change that very probably caused the
bug, he said he thinks it's the fact to CopyUTF8toUTF16 inside
nsMsgI18NConvertToUnicode that drops the lines, and we should test to either
remove CopyUTF8toUTF16 (only remove three lines of code) or to test if the
string is truly UTF-8 by making a IsUTF8 call first before calling
CopyUTF8toUTF16. This assume the default encoder will do the same job as browser
for bad UTF-8.

Can you test if either of the two works ? That would make it a very minor change
that would have a better chance of being accepted fast (I'd love it to be in 1.7
but I'm not holding my breath).

Karsten Düsterloh

Comment 12

•

20 years ago

Comment on attachment 145186 [details] [diff] [review]
catching failed conversions

> This is not the clean way to do it Karsten. You can not just assume
> the input encoding is in truth ISO-8859-1 on a wide scale

Yes, you're right. 
I completely forgot the code for the charset "guessing" in case of failure.

> silently ignore invalid charset declaration is really bad practice. 

Well, that depends. Throwing charset alerts when *displaying* would be rather
nagging. But *saving* and similar operations shouldn't change the charset
anyway.

> The code you modified calls the nsMsgI18NConvertToUnicode function
> and that where we should make sure the return is coherent with what
> browser does, that is *not* to drop the full line if the input
> encoding is invalid, but to show the bad part as diamond characters.

That is an option, but is it "nice"?
We could give the charset sniffer a try and /display/ at least what might have
been meant if the encoding had been correct.

> In the discussion with Jshin that made the change that very probably
> caused the bug, he said he thinks it's the fact to CopyUTF8toUTF16
> inside nsMsgI18NConvertToUnicode that drops the lines,

It is called CopyUTF8toUTF16, that's right; the actual failure assertion 
is thrown in ConvertUTF8toUTF16::write. Like AppendUTF8toUTF16/CopyUTF8toUTF16,
that's outside of MailNews and shouldn't be tampered with here.

> and we should test to either remove CopyUTF8toUTF16 (only remove
> three lines of code)

That won't suffice, because the CopyUTF8toUTF16 calls I eliminated in
mimetpla.cpp would still drop lines.

> or to test if the string is truly UTF-8 by making a IsUTF8 call first
> before calling CopyUTF8toUTF16.

That would be extremely "expensive". We won't need that check most of the time
anyway. I think it's better to make as sure as possible that the charset is
right and only take other measures if that fails.

> Can you test if either of the two works ? That would make it a very
> minor change that would have a better chance of being accepted fast

I'll have a look.

> (I'd love it to be in 1.7 but I'm not holding my breath).

I recommend not holding it, then. ;-)

Attachment #145186 - Attachment is obsolete: true

Attachment #145186 - Flags: superreview?(bienvenu)

Attachment #145186 - Flags: review?(jshin)

Karsten Düsterloh

Comment 13

•

20 years ago

Attached patch autodetect charset if possible (obsolete) — Details — Splinter Review

The removal of the UTF-8 check in nsMsgI18N.cpp:152-154 does not solve the bug,
because the following routine does not produce any output. It does not cause
any assertions either, so it may be useful to remove the call to avoid
cluttering of the debug console; I haven't done that in this patch, though.

This updated patch now does this if we're allowed to use the "real" charset
(ie. no overruling or 'global' autodetection is set):
- try to use the charset given in the content-type header
- if that fails, try to autodetect the charset of this line
- if that fails, take the raw bytes as is (that is, as ASCII)

Let's see what Jshin thinks...

Karsten Düsterloh

Updated

•

20 years ago

Attachment #145223 - Flags: review?(jshin)

Jungshik Shin

Comment 14

•

20 years ago

Comment on attachment 145223 [details] [diff] [review]
autodetect charset if possible

I'd not do this way. I don't think we  want to recover from the mislabelled
UTF-8 lines. Just signalling to users that there's something wrong would be
more than enough. 

 A lot simpler way is to  call IsUTF8 (that is not as expensive as you think
after ASCII lines are filtered out) in nsMsgI18N.cpp for lines labelled as
'UTF-8'. If it fails the test, it can fall through. I've already made a patch
for that on top of my patch for bug 229032. I planned to upload the patch once
bug 229032 is fixed. Unfortunately, the patch for bug 229032 will not go in
until 1.8alpha. 

Anyway, if this bug is regarded as serious enough (IMHO, it doesn't often very
often) to be fixed in 1.7, I can make a patch against the current trunk.

Attachment #145223 - Flags: review?(jshin)

Jean-Marc Desperrier

Assignee

Comment 15

•

20 years ago

Karsten "Mnyromyr" Düsterloh
> This updated patch now does this if we're allowed to use the "real" charset
> (ie. no overruling or 'global' autodetection is set):
> - try to use the charset given in the content-type header
> - if that fails, try to autodetect the charset of this line

There's already a general framework in mail/news to decide what encoding must be
applied to message. If change are needed to handle some case (like the messages
labeled with wrong charset described in Bug 239236), it should be done at this
level.

Your point is to automatically run a charset detection if the message is invalid
in the selected charset. This removes control to the user, he can no more force
the charset to use, which is in some case useful.

I think too Bug 239236 is not truly a duplicate of this bug, because the
question of what should be done with messages that have an unknown charset is
quite unrelated to how we should best handle invalid content in messages that we
have already determined we will display in UTF-8.

Jshin 
> I don't think we  want to recover from the mislabelled
> UTF-8 lines.

- Open in browser a page in ISO-8859-1.
- Force encoding to UTF-8
- Invalid characters are replaced by question marks inside a black diamond.
- *this* is the old behaviour of mail/news for invalid UTF-8 and this is what we
must return to (display the user as much as can be properly interpreted). 
And I expect there is an easy way to do it, I just don't know exactly how :-)

Jungshik Shin

Comment 16

•

20 years ago

(In reply to comment #15)

> Jshin 
> > I don't think we  want to recover from the mislabelled
> > UTF-8 lines.

> - Invalid characters are replaced by question marks inside a black diamond.
> - *this* is the old behaviour of mail/news for invalid UTF-8 and this is what 

That's exactly what I'm gonna do. Karsten's patch tries to go a lot further than
that, which I don't think is a good idea as you wrote and I wrote earlier.

Karsten Düsterloh

Comment 17

•

20 years ago

> A lot simpler way is to  call IsUTF8 (that is not as expensive as you
> think after ASCII lines are filtered out) in nsMsgI18N.cpp for lines
> labelled as 'UTF-8'. If it fails the test, it can fall through.

Jshin, could you please test your fix with the situation and the testcase of bug
239236?

The problem isn't just illegal characters in lines labelled as UTF-8, but also
lines with an *unknown* charset. These lines run into the CopyUTF8toUTF16
because they're *wrongly* *assumed* to be UTF-8!

Karsten Düsterloh

Updated

•

20 years ago

Assignee: mnyromyr → smontagu

Status: ASSIGNED → NEW

Karsten Düsterloh

Comment 18

•

20 years ago

Sorry, I had missed some bugmail.
I'm stepping down, feel free to not use anything from my patches here and over
in 239941. :)
(I still do think that *displaying* garbage is not a good thing[tm], since we
don't *send* it - but at least we shouldn't *drop* lines.)

Jean-Marc Desperrier

Assignee

Comment 19

•

20 years ago

I'm trying to write a working patch that will replace invalid characters with
0xFFFD as is done in the browser.

Line 343 of mimetpla.cpp tests this 
(obj->options->format_out == nsMimeOutput::nsMimeMessageSaveAs
and decide if true, the line is UTF-8 and CopyUTF8toUTF16 is adequate to copy it.

Unfortunately when displaying the sample attachment, this test holds true.
Currently I'm in favor of suppressing the test alltogether, mimetpla.cpp is not
the right place to recover encoding problems.

Jean-Marc Desperrier

Assignee

Comment 20

•

20 years ago

Jshin:
This line 338 of mimetpla.cpp 
   nsDependentCString inputStr(line, length);
is raising an assertion everytime it's executed that 'line' should be zero
terminated. It's not blocking, and I don't know how I should change it, but it
certainly needs to be corrected.

Jean-Marc Desperrier

Assignee

Comment 21

•

20 years ago

Attached patch Patch that displays invalid charcters as 0xFFFD (obsolete) — Details — Splinter Review

The change to mimetpla is very minor, I commented out the line I removed. 
Removing the line and reindenting made it look more complex.

The change to nsMsgI18N may look more important, but in fact there is no new
code in that. I'm juste replacing the method jshin used to convert by the one
in nsXMLHttpRequest::ConvertBodyToText.
The code is nothing but a line by line copy (a few variable renamed) from 
http://lxr.mozilla.org/mozilla/source/extensions/xmlextras/base/src/nsXMLHttpRequest.cpp#523


jshin, can you review this ?

Assignee: smontagu → jmdesp

Attachment #145223 - Attachment is obsolete: true

Status: NEW → ASSIGNED

Jean-Marc Desperrier

Assignee

Comment 22

•

20 years ago

Attached patch Patch that displays invalid charcters as 0xFFFD (obsolete) — Details — Splinter Review

I uploaded the wrong file :-( 
That was the one with the line removed and reindentation from mimetpla.cpp, and
without the change to nsMsgI18N.

Attachment #145824 - Attachment is obsolete: true

Karsten Düsterloh

Comment 23

•

20 years ago

> This line 338 of mimetpla.cpp 
>    nsDependentCString inputStr(line, length);
> is raising an assertion everytime it's executed that 'line' should be zero
> terminated.

It should be fairly safe to terminate it, because of the +1 in the definition of
desired_size in mime_LineBuffer in mimebuf.cpp:

line[length] = 0; // see desired_size in mime_LineBuffer in mimebuf.cpp
nsDependentCString inputStr(line, length);

Karsten Düsterloh

Comment 24

•

20 years ago

I tested patch 145825 and it did indeed solve the problem with the 'rfc 822
testcase' here (all non-UTF-8 characters are shown as ? in my font).
But it completely breaks the display of the testcase in bug 239236: the entire
body is missing!
If this patch gets checked in, bug 239236 needs to be reopened and will be even
worse...

Jean-Marc Desperrier

Assignee

Comment 25

•

20 years ago

Can you investigate the regression more carefully, Karsten ?
I don't have time for that today. 
Of course, I'll make any needed change so that we get acceptable result for bug
239236.

I'm quite surprised it can regress something still. 
All the patch does is to try harder when the converter fails.

Karsten Düsterloh

Comment 26

•

20 years ago

The problem with the test case of bug 239236 is this:

mimetpla.cpp:466:
      { // convert back to mailCharset before writing.       
        rv = nsMsgI18NConvertFromUnicode(nsDependentCString(mailCharset), 
                                         lineResultUnichar, outString);
        NS_ENSURE_SUCCESS(rv, -1);
      }

Since the mailCharset is illegal/unknown, nsMsgI18NConvertFromUnicode will
always return an error and hence force NS_ENSURE_SUCCESS to quit the function
without outputting any data...

Jean-Marc Desperrier

Assignee

Comment 27

•

20 years ago

OK, the regression came from blindly removing the
nsMimeOutput::nsMimeMessageSaveAs test. In bug 239236, we are not able to
convert the input *at all*, so blindly applying CopyUTF8toUTF16 was better than
trying to get an adequate decoder, because at least it gave a line of result
when the input line was fully US-ASCII.

I've been reading some more of the source.
When we arrive in mimetpla.cpp:466, the input string ought to have *already*
been transcoded to utf-8, this must be the reason why when nsMimeMessageSaveAs
is true we use CopyUTF8toUTF16 blindly.

MimeInlineText_convert_and_parse_line does the conversion around here
http://lxr.mozilla.org/mozilla/source/mailnews/mime/src/mimetext.cpp#381 and
then calls obj->clazz->parse_line that leads to mimetpla.cpp:466

In that code 'charset_conversion_fn' should do the conversion. If it returns a
void pointer and a valid return code, the input string is used directly instead
of a converted string. 
This is that line :
http://lxr.mozilla.org/mozilla/source/mailnews/mime/src/mimetext.cpp#406

In practice,  the implementation use the mime_convert_charset wrapper around
ConvertUsingEncoderAndDecoder, that makes sure that in case of failure, we
*always* returns a positive error code and a NULL return value.
"If we fail, we just use unconverted input" looks really ugly to me and opening
an avenue for an attaquer to be able to insert arbitrary content.

In bug 236941, charset_conversion_fn fails because the input is not valid WRT
utf-8 encoding rules, in bug 239236, it fails because the charset does not
exists. In both case, we end up with the raw input where we are expecting
already converted data.

I think the adequate treatement for bug 239236 should be to use US-ASCII when
the input charset is unknown and make it fall in a loop that like my patch
inserts 0xFFFD for every undecodable character (that would be every character
above 0x80).
The question is about *where* it's best to do it.

Jean-Marc Desperrier

Assignee

Comment 28

•

20 years ago

Attached patch Patch that replace invalid char with 0xFFFD, corrected (obsolete) — Details — Splinter Review

This new patch is quite cleaner, I'm no more taking the risk of doing anything
dangerous.
I test that the string is not utf-8 before ignoring nsMimeMessageSaveAs, and I
use a fall-back charset if there's no encoding for input. 
That fall-back is UTF-8, because 'us-ascii' is mapped to 'cp1252', and I want a
warning that somethings wrong with the message. We know at this stage the
message is not UTF-8, so that's for the best, the user will be visually warned
something has gone wrong.
(If currently unknown charsets are to be displayed as ISO-8859-1, the correct
mechanisme to use is to add them to the alias list for ISO-8859-1, not to
silently map anything unknown)

This patch works, and it makes MimeInlineTextPlain_parse_line rock solid WRT
problem in content that ought to be UTF-8. 
It probably would be cleaner to always feed MimeInlineTextPlain_parse_line rock
with the valid UTF-8 it should get, but I'd prefer someone with a better
knowledge of the intrisics of mailnews to do that kind of change.
I just wanted a patch as simple as possible to have a chance of getting it in
the Mozilla 1.7 branch.

Jean-Marc Desperrier

Assignee

Updated

•

20 years ago

Attachment #145825 - Attachment is obsolete: true

Jungshik Shin

Comment 29

•

20 years ago

Comment on attachment 146038 [details] [diff] [review]
Patch that replace invalid char with 0xFFFD, corrected

Sorry it took me so long. 

>Index: mimetpla.cpp

>+    line[length] = 0; // see desired_size in mime_LineBuffer in mimebuf.cpp

Can you elaborate on this? 

>     nsDependentCString inputStr(line, length);


BTW, mimetpfl.cpp needs to be fixed the same way as mimetpla.cpp


>Index: nsMsgI18N.cpp

>+      PRUnichar * outBuffer = 
>+              NS_STATIC_CAST(PRUnichar*, nsMemory::Alloc((outBufferLength + 1) *
>+                                               sizeof(PRUnichar)));

You can use nsAutoBuffer<PRUnichar, 512> here to avoid the memory allocation in
most cases. 

With that change, this patch can go into 1.7branch. For 1.8alpha, the patch
needs to be changed a bit more.

Jean-Marc Desperrier

Assignee

Comment 30

•

20 years ago

Attached patch Patch that does the change in mimemoz2.cpp instead The patch does not need to be changed to apply to the 1.7 branch (obsolete) — Details — Splinter Review

Hum, I have a different patch since yesterday. 
It's almost the same code (slightly slimpler), but in mimetext.cpp and
mimemoz2.cpp. 
This is where mozilla originally tries to convert the string, and keeps the
same string if the conversion fails (which is a non-sense, but if we go there
we'd end up rewriting all of mailnews).

I conceptually feel it a lot better to correct there than in mimetpla.cpp.
This also means I don't need to guess something went wrong by adding lot of
"isutf8" testing, I now correct the problem "at the source".

>>Index: mimetpla.cpp
>>+    line[length] = 0; // see desired_size in mime_LineBuffer in mimebuf.cpp
>
> Can you elaborate on this? 

See comment #23 about that.

With the new patch, when the input line arrives there, it is already correctly
zero terminated, there is no assertion anymore, and no reason to change
anything in mimetpla.cpp.

> BTW, mimetpfl.cpp needs to be fixed the same way as mimetpla.cpp

I did a quick check and it seems that with the new patch the change applies to
both normal and flowed text without needing to change two things.

> You can use nsAutoBuffer<PRUnichar, 512> here to avoid the memory allocation 

> in most cases. 

The new patch is a bit shorter, I didn't need to change anything to the
allocation already done, so the problem doesn't occur.

Sorry about the late change, I wasn't really satisfied with the solution in the
first patch, but it took a while to get the time to rewrite it.

Attachment #146038 - Attachment is obsolete: true

Jean-Marc Desperrier

Assignee

Comment 31

•

20 years ago

Comment on attachment 146932 [details] [diff] [review]
Patch that does the change in mimemoz2.cpp instead

The patch does not need to be changed to apply to the 1.7 branch

I checked it works well under linux, for flowed too.

Attachment #146932 - Flags: superreview?(bienvenu)

Attachment #146932 - Flags: review?(jshin)

Attachment #146932 - Flags: approval1.7?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 32

•

20 years ago

Comment on attachment 146932 [details] [diff] [review]
Patch that does the change in mimemoz2.cpp instead

The patch does not need to be changed to apply to the 1.7 branch

Please attach unified diffs (cvs diff -pu8), and please request approval after
you have review and super-review (and preferably after landing on the trunk).

Attachment #146932 - Flags: approval1.7?

Jean-Marc Desperrier

Assignee

Comment 33

•

20 years ago

Comment on attachment 146932 [details] [diff] [review]
Patch that does the change in mimemoz2.cpp instead

The patch does not need to be changed to apply to the 1.7 branch

Index: mimetext.cpp
===================================================================
RCS file: /cvsroot/mozilla/mailnews/mime/src/mimetext.cpp,v
retrieving revision 1.46
diff -p -u -8 -r1.46 mimetext.cpp
--- mimetext.cpp	17 Apr 2004 18:33:13 -0000	1.46
+++ mimetext.cpp	29 Apr 2004 18:40:53 -0000
@@ -385,16 +385,20 @@ MimeInlineText_convert_and_parse_line(ch
	   SetMailCharacterSetToMsgWindow(obj, text->charset);
       }
     }
   }

   //initiate decoder if not yet
   if (text->inputDecoder == nsnull)
     MIME_get_unicode_decoder(text->charset,
getter_AddRefs(text->inputDecoder));
+  // If no decoder found, use ""UTF-8"", that will map most non-US-ASCII chars
as 
+  // Invalid. A pure-ASCII only decoder would be better, but there is none
+  if (text->inputDecoder == nsnull)
+    MIME_get_unicode_decoder("UTF-8", getter_AddRefs(text->inputDecoder));
   if (text->utf8Encoder == nsnull)
     MIME_get_unicode_encoder("UTF-8", getter_AddRefs(text->utf8Encoder));

   PRBool useInputCharsetConverter =
obj->options->m_inputCharsetToUnicodeDecoder &&
!nsCRT::strcasecmp(text->charset,
obj->options->charsetForCachedInputDecoder.get());

   if (useInputCharsetConverter)
     status = obj->options->charset_conversion_fn(line, length,
			  text->charset,
Index: mimemoz2.cpp
===================================================================
RCS file: /cvsroot/mozilla/mailnews/mime/src/mimemoz2.cpp,v
retrieving revision 1.207
diff -p -u -8 -r1.207 mimemoz2.cpp
--- mimemoz2.cpp	17 Apr 2004 18:33:13 -0000	1.207
+++ mimemoz2.cpp	29 Apr 2004 18:40:55 -0000
@@ -760,31 +760,64 @@ int ConvertUsingEncoderAndDecoder(const 
   else {
     unichars = localbuf;
     unicharLength = klocalbufsize+1;
   }
   if (unichars == nsnull) {
     rv = NS_ERROR_OUT_OF_MEMORY;
   }
   else {
-    // convert to unicode
-    rv = decoder->Convert(stringToUse, &srcLen, unichars, &unicharLength);
+    // convert to unicode, replacing failed chars with 0xFFFD as in
+    // the methode used in nsXMLHttpRequest::ConvertBodyToText and
nsScanner::Append
+    const char * inBuffer = stringToUse;
+    PRInt32 totalChars = 0,
+	     outBufferIndex = 0,
+	     outLen = unicharLength;
+    PRInt32 dataLen = srcLen;		
+
+    do {
+      PRInt32 inBufferLength = dataLen;
+      rv = decoder->Convert(inBuffer, 
+			    &inBufferLength, 
+			    &unichars[outBufferIndex], 
+			    &outLen);
+      totalChars += outLen;
+      if (NS_FAILED(rv)) {
+	 // We consume one byte, replace it with U+FFFD
+	 // and try the conversion again.
+	 unichars[outBufferIndex + outLen++] = (PRUnichar)0xFFFD;
+	 outBufferIndex += outLen;
+	 outLen = unicharLength - (++totalChars);
+
+	 decoder->Reset();
+
+	 if((inBufferLength + 1) > dataLen) {
+	   inBufferLength = dataLen;
+	 } else {
+	   inBufferLength++;
+	 }
+
+	 inBuffer = &inBuffer[inBufferLength];
+	 dataLen -= inBufferLength;
+      }
+    } while ( NS_FAILED(rv) && (dataLen > 0) );
+    
     if (NS_SUCCEEDED(rv)) {
-      rv = encoder->GetMaxLength(unichars, unicharLength, &dstLength);
+      rv = encoder->GetMaxLength(unichars, totalChars, &dstLength);
       // allocale an output buffer
       dstPtr = (char *) PR_Malloc(dstLength + 1);
       if (dstPtr == nsnull) {
	 rv = NS_ERROR_OUT_OF_MEMORY;
       }
       else {
	 PRInt32 buffLength = dstLength

Attachment #146932 - Attachment description: Patch that does the change in mimemoz2.cpp instead → Patch that does the change in mimemoz2.cpp instead The patch does not need to be changed to apply to the 1.7 branch

Jean-Marc Desperrier

Assignee

Comment 34

•

20 years ago

Attached patch Patch that does the change in mimemoz2.cpp instead (obsolete) — Details — Splinter Review

Next time I'll remember what "edit attachment as comment" really means :-(
I hope jshin has some time to review the new version now. This patch can be
applied to trunk and 1.7 branch as is.

Attachment #146932 - Attachment is obsolete: true

Jean-Marc Desperrier

Assignee

Updated

•

20 years ago

Attachment #147329 - Flags: superreview?(bienvenu)

Attachment #147329 - Flags: review?(jshin)

Jean-Marc Desperrier

Assignee

Updated

•

20 years ago

Attachment #146932 - Flags: superreview?(bienvenu)

Attachment #146932 - Flags: review?(jshin)

Jungshik Shin

Comment 35

•

20 years ago

Comment on attachment 147329 [details] [diff] [review]
Patch that does the change in mimemoz2.cpp instead

r=jshin 

>+    const char * inBuffer = stringToUse;
>+    PRInt32 totalChars = 0,
>+            outBufferIndex = 0,
>+            outLen = unicharLength;
>+    PRInt32 dataLen = srcLen;          
>+
>+    do {
>+      PRInt32 inBufferLength = dataLen;
>+      rv = decoder->Convert(inBuffer, 
>+                           &inBufferLength, 
>+                           &unichars[outBufferIndex], 
>+                           &outLen);
>+      totalChars += outLen;
>+      if (NS_FAILED(rv)) {
>+        // We consume one byte, replace it with U+FFFD
>+        // and try the conversion again.
>+        unichars[outBufferIndex + outLen++] = (PRUnichar)0xFFFD;

	   |PRUnichar(0xFFFD)| seems better.

>+        outBufferIndex += outLen;
>+        outLen = unicharLength - (++totalChars);

It might be nice to add some comments about totalChars, dataLen, etc.

David :Bienvenu

Comment 36

•

20 years ago

If I'm reading this correctly,

        if((inBufferLength + 1) > dataLen) {
          inBufferLength = dataLen;
        } else {
          inBufferLength++;

could be:
        if ((inBufferLength + 1) > dataLen)
          break;
        inBufferLength++;

is that correct? Other than that, it looks fine to me.

Jean-Marc Desperrier

Assignee

Comment 37

•

20 years ago

Attached patch New version of the patch (obsolete) — Details — Splinter Review

The comments tempted me into rewriting the loop, especially bienvenu's comment
that it's OK to use 'break'.

New version is a bit simpler, has more comment, does not call decoder->Convert
for a zero length string if the last byte is invalid (as the code does a
decoder->Reset() before, it really was not useful). The code was dependant on
the fact that call would return NS_SUCCEEDED.

I tested border cases (last character of the line invalid, all characters in
the line invalid, format=flowed)

A "if (NS_SUCCEEDED(rv)" test goes out, and this changes the indentation of the
encoder-> part, but the lines did not change at all truly.

Attachment #147329 - Attachment is obsolete: true

Jean-Marc Desperrier

Assignee

Comment 38

•

20 years ago

Attached patch New version of the patch — Details — Splinter Review

Preceeding version was not the right patch file :-(

comment #37 still applies about simplifying the loop, adding comments.

Jean-Marc Desperrier

Assignee

Updated

•

20 years ago

Attachment #147732 - Attachment is obsolete: true

Jean-Marc Desperrier

Assignee

Updated

•

20 years ago

Attachment #147329 - Flags: superreview?(bienvenu)

Attachment #147329 - Flags: review?(jshin)

Jean-Marc Desperrier

Assignee

Updated

•

20 years ago

Attachment #147733 - Flags: superreview?(bienvenu)

Attachment #147733 - Flags: review?(jshin)

Jungshik Shin

Comment 39

•

20 years ago

Comment on attachment 147733 [details] [diff] [review]
New version of the patch

r=jshin
sorry for the delay

Attachment #147733 - Flags: review?(jshin) → review+

Simon Montagu :smontagu

Updated

•

20 years ago

Blocks: 243747

Jean-Marc Desperrier

Assignee

Comment 40

•

20 years ago

Bienvenu, can you SR this patch or forward to someone who can ?
Too bad 1.8a1 is already out (I should have realized this was coming !), but at
least let's not loose more time ...

Scott, in my opinion this patch should go in aviary 1.0 branch too, but I don't
know what is the proper way to request that. 
Maybe you can do the SR if bienvenu can not.

David :Bienvenu

Updated

•

20 years ago

Attachment #147733 - Flags: superreview?(bienvenu) → superreview+

Scott MacGregor

Comment 41

•

20 years ago

thanks a lot for the heads up. I checked this into the aviary 1.0 branch

Whiteboard: fixed-aviary1.0

Jean-Marc Desperrier

Assignee

Comment 42

•

20 years ago

OK, I could verify the fix in the 2004-06-01-02-0.7 TB nightly.
Who can check in trunk ?

Jean-Marc Desperrier

Assignee

Updated

•

20 years ago

No longer blocks: 243747

Jean-Marc Desperrier

Assignee

Comment 43

•

20 years ago

*** Bug 243747 has been marked as a duplicate of this bug. ***

Jungshik Shin

Comment 44

•

20 years ago

Ok. I can check that into the trunk. How about 1.7branch? Has it been taken care
of? Apparently not because even a1.7 was not asked for, yet.

Jean-Marc Desperrier

Assignee

Comment 45

•

20 years ago

Comment on attachment 147733 [details] [diff] [review]
New version of the patch

Requesting approval for 1.7

Attachment #147733 - Flags: approval1.7?

chris hofmann

Comment 46

•

20 years ago

Comment on attachment 147733 [details] [diff] [review]
New version of the patch

a=chofmann for 1.7

Attachment #147733 - Flags: approval1.7? → approval1.7+

(not reading, please use seth@sspitzer.org instead)

Updated

•

20 years ago

Blocks: 244829

Jungshik Shin

Updated

•

20 years ago

No longer blocks: 244829

Jungshik Shin

Comment 47

•

20 years ago

fix checked into 1.7 branch

Status: ASSIGNED → RESOLVED

Closed: 20 years ago

Resolution: --- → FIXED

Whiteboard: fixed-aviary1.0 → fixed-aviary1.0,fixed1.7

Bernard Alleysson

Updated

•

20 years ago

Keywords: fixed1.7

Whiteboard: fixed-aviary1.0,fixed1.7 → fixed-aviary1.0

Jean-Marc Desperrier

Assignee

Updated

•

20 years ago

Blocks: 251634

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: MailNews → Core

Nobody; OK to take it and work on it

Updated

•

16 years ago

Product: Core → MailNews Core

rfc 822 test case 20 years ago Jean-Marc Desperrier 2.15 KB, message/rfc822		Details
catching failed conversions 20 years ago Karsten Düsterloh 2.90 KB, patch		Details \| Diff \| Splinter Review
autodetect charset if possible 20 years ago Karsten Düsterloh 4.53 KB, patch		Details \| Diff \| Splinter Review
Patch that displays invalid charcters as 0xFFFD 20 years ago Jean-Marc Desperrier 1.96 KB, patch		Details \| Diff \| Splinter Review
Patch that displays invalid charcters as 0xFFFD 20 years ago Jean-Marc Desperrier 5.97 KB, patch		Details \| Diff \| Splinter Review
Patch that replace invalid char with 0xFFFD, corrected 20 years ago Jean-Marc Desperrier 6.28 KB, patch		Details \| Diff \| Splinter Review
Patch that does the change in mimemoz2.cpp instead The patch does not need to be changed to apply to the 1.7 branch 20 years ago Jean-Marc Desperrier 2.31 KB, patch		Details \| Diff \| Splinter Review
Patch that does the change in mimemoz2.cpp instead 20 years ago Jean-Marc Desperrier 4.09 KB, patch		Details \| Diff \| Splinter Review
New version of the patch 20 years ago Jean-Marc Desperrier 2.31 KB, patch		Details \| Diff \| Splinter Review
New version of the patch 20 years ago Jean-Marc Desperrier 5.32 KB, patch	jshin1987 : review+ Bienvenu : superreview+ chofmann : approval1.7+	Details \| Diff \| Splinter Review