User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20051012 Netscape/8.0.4
Build Identifier: Mozilla Thunderbird version 1.5 (20051025)
I receive a message with an attached quoted-printable encoded PDF document. Thunderberd decoded the '=00' encoded Null bytes in this attachment as '@' which resulted in an unuseable PDF document.
This occurred in both 1.5 beta 1 and 1.5 rc1
Steps to Reproduce:
1.open mail with quoted-printable encoded attachment containing '=00' encoded null bytes.
Saved attachment has '=00' decoded as '@'
'=00' decoded as NULL
Maybe related to Core bug 243199?
It looks like it could be related to the fix for bug 243199, but the patch "handle null the way bienvenu says mozmail handles null/patch/2004-05-13 00:33 PST" that's attached to that report replaces '=00' with ' ', not '@', and it may only be for decoding text, not attachments. I don't know if that code is used for non-text parts or not.
Outlook Express sends PDF in Quoted Printable if a ".PDF" file doesn't contains data greater than 0x7F, even though he sends ".PDF" in Base 64 if a ".PDF" contains data geater than 0x7F.
And latest Thunderbird trunk nightly saved 0x20 for the '=00' of ".PDF" in mail by O.E. when "Save As" for the attachment.
This means I can confirm problem if the problem is "=00 is saved as 0x20", instead of "=00 is decoded as @".
> Thunderberd decoded the '=00' encoded Null bytes in this attachment as '@'
Who displayed "@" and where "@" is displayed for '=00' of Quoted Printable (0x00 in Hexa)?
Created attachment 203685 [details]
Mail folder file(contains a mail created by Outlook Express)
The attached ".PDF' file contains data from 0x00 to 0x7F and some ASCII texts.
Please note that this is not a valid PDF data.
(1) Save file in your mail directry(say file name of 'TEST').
(2) Restart Thunderbird, then folder of "TEST" appears.
(3) "Save as" the PDF attachment.
Note: Outlook Express doesn't convert 0x09(Tab) to '=09' when Quoted Printable,
then 'Tab' is 'Tab' in test mail.
"FC /B" on original PDF and saved PDF.
> C:\TEST\X00-XFF>fc /b X00-X7F.pdf X00-X7F.pdf.By-Tb.pdf
> Comparing files X00-X7F.pdf and X00-X7F.PDF.BY-TB.PDF
> 00000019: 00 20
(In reply to comment #3)
> This means I can confirm problem if the problem is "=00 is saved as 0x20",
> instead of "=00 is decoded as @".
Yes. The statement above "=00 is saved as 0x20" correctly describes the problem.
> > Thunderberd decoded the '=00' encoded Null bytes in this attachment as '@'
> Who displayed "@" and where "@" is displayed for '=00' of Quoted Printable
> (0x00 in Hexa)?
I opened the email with PDF attached with a different MUA and saved the attachment. The second attachment was the same length and opened properly in Acrobat Reader so I did 'cmp -l first_file second_file' and determined that there were 5 bytes that were different and in the good file these 5 bytes were 000 and in the bad file they were 040 = 0x20 = us-ascii '@'. I also determined that the original quoted-printable encoded part had these bytes encoded as '=00'.
I chose to call them '@' in my original report. I now realize it would have been more clear to call them 0x20. Sorry for any confusion.
> 040 = 0x20 = us-ascii '@'
Your last '=' is invalid, if 'C' notation.
32 = 040 = 0x20 = '=20' in Q.P. = us-ascii Space
64 = 080 = 0x40 = '=40' in Q.P. = us-ascii '@'
Changing summary(@ -> 0x20).
(In reply to comment #6)
> > 040 = 0x20 = us-ascii '@'
> Your last '=' is invalid, if 'C' notation.
> 32 = 040 = 0x20 = '=20' in Q.P. = us-ascii Space
> 64 = 080 = 0x40 = '=40' in Q.P. = us-ascii '@'
You are correct, and I don't know what I was thinking - 6 bit BCD ??
(In reply to comment #7)
> Changing summary(@ -> 0x20).
Thanks. It now seems likely that this bug _is_ caused by the fix for bug 243199.
If that turns out to be the case, I would argue that it is more important to correctly decode =00 when saving a quoted-printable non-text part than it is to render as displayable a quoted-printable text part that probably erroneously contains =00. If bug 243199 can't be 'fixed' in some way that allows quoted-printable =00 to be correctly saved as 0x00, then I think that correctly saving quoted-printable non-text parts should take priority. RFC2045 is very clear that quoted-printable =00 translates to 0x00.
(Still trying to wipe the egg off my face)
Problem has been re-created with seamonkey/nightly/latest-trunk 2005111809 build (Win-2K).
Changing to Product=Core.
(In reply to comment #9)
> If that turns out to be the case, I would argue that it is more important to
> correctly decode =00 when saving a quoted-printable non-text part than it is
> to render as displayable a quoted-printable text part that probably
> erroneously contains =00.
Since the PDF is correctly identified in the message as "application/pdf" -- which will never be displayed inline -- there may be a way to do both. It
will mean more complexity in the decoder, since a flag would need to be
passed in to indicate whether the decoding is for display or for saving.
Or maybe the renderer is where the nul->space substitution needs to take place.
But I agree, timeless's patch in that other bug is, at best, too simplistic.
Breaking valid attachments is bad.
*** Bug 320274 has been marked as a duplicate of this bug. ***
Changing "Assigned To:" to timeless, the mother of this regression.
um. i'm very unlikely to fix this bug. note that a pdf can be handled by a plugin.
someone figure out a decent algorithm to distinguish between the two cases describe it here, and then poke me online.
(In reply to comment #14)
> um. i'm very unlikely to fix this bug. note that a pdf can be handled by a
This problem does not apply only to pdf's, and in any case, I'm not sure how a plugin would handle it unless you're saying that a plugin would be handed the quoted-printable encoded part and have to decode it itself.
> someone figure out a decent algorithm to distinguish between the two cases
> describe it here, and then poke me online.
What we apparently have given the current fix to bug 234199 is ANY quoted-printable encoded part is incorrectly decoded with =00 decoded as <space> instead of <null>. This breaks the decoding of quoted-printable parts.
Presumably, bug 234199 was caused by some stringification of a decoded quoted-printable text part being truncated because the embedded <null> was seen as a string terminator. Also, maybe this happened only when the text was being displayed, not when it was being written to a non-display file.
In any case, bug 234199 would better be fixed in whatever renders the text for display, not by breaking quoted-printable decoding. Furthermore, in rendering for display, it would be better to simply drop the <null> rather than convert it to <space>. If the decoding has to be munged in some way for text/* parts, then at least do it only for text/* content-types and not for any non-text content-types.
What happens for a base64 encoded text/plain part that contains a null byte?
Maybe it would be best to just say that text/* parts containing null bytes are inherently broken and not try to fix them. This would be better IMO than breaking other quoted-printable encoded parts.
Note: Bug 234199 in Comment #15 is not proper bug number.
This regression was generated by patch for Bug 243199.
David Bienvenu, who was the reviewer/super reviewer of the patch for Bug 243199, what do you think about "should be"?
I think we should try to fix this. I'm not sure if we can do that w/o regressing the previous fix, but I'll look into it when I get a chance.
Created attachment 206155 [details]
Unencoded text/plain email containing null byte
Created attachment 206156 [details]
base64 encoded text/plain email containing a null byte
(In reply to comment #16)
> Note: Bug 234199 in Comment #15 is not proper bug number.
> This regression was generated by patch for Bug 243199.
I can't seem to get anything right in this report. Thanks for the correction.
Re: my question in comment 15
>What happens for a base64 encoded text/plain part that contains a null byte?
The answer is the line containing the null byte is truncated at the null byte when the message is displayed. Now PLEASE let's not suggest converting <null> to <space> when decoding base64 encoding.
I have attached two emails to this report. Both have the same short text/plain body which contains a <null>. The base64.eml displays in Thunderbird with the line containg <null> truncated at the <null>. The Unencoded.eml displays the entire line with the null byte simply dropped, so it seems there is *something* that can display lines containing <null> without truncating them.
> I'm not sure how a
> plugin would handle it unless you're saying that a plugin would be handed the
> quoted-printable encoded part and have to decode it itself.
gecko would hand the plugin the decoded content...
(In reply to comment #17)
> I'm not sure if we can do that w/o regressing the previous fix
Should we propose one line patch which removes the added one line by patch of bug 243199?
Should we open new bug for regression of problem of bug 243199, which will certainly be created by above patch for this Bug 317009?
I hope developers to backout the patch who generated this regression as soon as possible, then re-consider on new solution of problem of bug 243199,
because I believe "Tb 1.5 without problem of 243199 but with problem of 317009"
is far worse than "Tb 1.5 without problem of 317009 but with problem of 243199".
(Please note that latter is same as Tb 1.0.x on these two bugs.)
we should back out 243199 for 1.5.0.x if we don't have a better fix. I'm getting reports of people not being able to open PDF attachments because of 243199 in addition to this bug.
*** Bug 308121 has been marked as a duplicate of this bug. ***
Created attachment 209629 [details] [diff] [review]
I made QP decoding know about the obj it was decoding, so it can handle embedded nulls differently. If we're decoding for display, convert 0x0 to space, otherwise leave it as null. I tried this with the example attachment, and it decodes it correctly now; however Acrobat won't open it - is it a real .pdf file?
Comment on attachment 209629 [details] [diff] [review]
libmime always makes me happy.
fix checked into trunk. It should be in tomorrow's build.
(In reply to comment #25)
> I tried this with the example attachment,
> and it decodes it correctly now; however Acrobat won't open it - is it a real
> .pdf file?
No, it is not a real PDF - see comment #4.
Out of curiosity: what was the point of the changes to mimecryp and mimedrft?
The code you added appears to have the same net result as the code you're now bypassing (by not setting 'fn').
Thanks to Scott for rattling the cages.
I just tested the nightly build of Thunderbird on Mac OS X. The new fix did the trick. Yesterday's nightly build corrupted a perfectly good PDF, but today's build handled it with flying colors.
Cheers to the Team.
Comment on attachment 209629 [details] [diff] [review]
we want this for 18.104.22.168 as well...
Mike, I had to change the signature of the qp decoder init function, to take an extra arg, so it no longer matches the type signature of the other decoders; thus, MimeQPDecoder init can't be assigned to fn, or used. I could have changed all the other decoders to take an unused extra arg but this seemed to be the lesser of two evils. Marking fixed.
Comment on attachment 209629 [details] [diff] [review]
based on Jason's testing feedback from the trunk builds, approving for Thunderbird 2.0.
I also confirm that the 01/26 Windows build fixes the PDF issue for me. Opening a PDF that was corrupted using 1.5 release now works like a charm. Thanx for all the hard work!
*** Bug 310546 has been marked as a duplicate of this bug. ***
Verified with Thunderbird 22.214.171.124 candidate from 2006-03-08 on Windows
*** Bug 316356 has been marked as a duplicate of this bug. ***
verified fixed 126.96.36.199 using Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:188.8.131.52) Gecko/20070326 Thunderbird/184.108.40.206 Mnenhy/0.7.5.0 ID:2007032620