Last Comment Bug 317009 - Thunderbird incorrectly decodes =00 in quoted-printable attachments as 0x20, not NULL (regression by bug 243199)
: Thunderbird incorrectly decodes =00 in quoted-printable attachments as 0x20, ...
Status: VERIFIED FIXED
[qa:verified-tb-1802]
: fixed1.8.0.2, fixed1.8.1, regression, verified1.8.1.3
Product: MailNews Core
Classification: Components
Component: Attachments (show other bugs)
: Trunk
: x86 Windows XP
: -- major with 6 votes (vote)
: ---
Assigned To: David :Bienvenu
:
:
Mentors:
: 308121 316356 320274 (view as bug list)
Depends on: 243199
Blocks:
  Show dependency treegraph
 
Reported: 2005-11-18 10:04 PST by Mark Sapiro
Modified: 2008-07-31 01:21 PDT (History)
14 users (show)
mscott: blocking‑thunderbird2+
mscott: blocking1.8.0.2+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Mail folder file(contains a mail created by Outlook Express) (1.18 KB, text/plain)
2005-11-19 18:40 PST, WADA
no flags Details
Unencoded text/plain email containing null byte (644 bytes, message/rfc822)
2005-12-16 17:56 PST, Mark Sapiro
no flags Details
base64 encoded text/plain email containing a null byte (670 bytes, message/rfc822)
2005-12-16 17:58 PST, Mark Sapiro
no flags Details
proposed fix (9.93 KB, patch)
2006-01-25 14:01 PST, David :Bienvenu
mscott: superreview+
mscott: approval1.8.0.2+
mscott: approval1.8.1+
Details | Diff | Splinter Review

Description Mark Sapiro 2005-11-18 10:04:36 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20051012 Netscape/8.0.4
Build Identifier: Mozilla Thunderbird version 1.5 (20051025)

I receive a message with an attached quoted-printable encoded PDF document. Thunderberd decoded the '=00' encoded Null bytes in this attachment as '@' which resulted in an unuseable PDF document.

This occurred in both 1.5 beta 1 and 1.5 rc1

Reproducible: Always

Steps to Reproduce:
1.open mail with quoted-printable encoded attachment containing '=00' encoded null bytes.
2.Save attachment.
3.

Actual Results:  
Saved attachment has '=00' decoded as '@'

Expected Results:  
'=00' decoded as NULL
Comment 1 zug_treno 2005-11-19 05:28:26 PST
Maybe related to Core bug 243199?
Comment 2 Mark Sapiro 2005-11-19 07:58:26 PST
It looks like it could be related to the fix for bug 243199, but the patch "handle null the way bienvenu says mozmail handles null/patch/2004-05-13 00:33 PST" that's attached to that report replaces '=00' with ' ', not '@', and it may only be for decoding text, not attachments. I don't know if that code is used for non-text parts or not.
Comment 3 WADA 2005-11-19 18:20:11 PST
Outlook Express sends PDF in Quoted Printable if a ".PDF" file doesn't contains data greater than 0x7F, even though he sends ".PDF" in Base 64 if a ".PDF" contains data geater than 0x7F.
And latest Thunderbird trunk nightly saved 0x20 for the '=00' of ".PDF" in mail by O.E. when "Save As" for the attachment.   
This means I can confirm problem if the problem is "=00 is saved as 0x20", instead of "=00 is decoded as @".

> Thunderberd decoded the '=00' encoded Null bytes in this attachment as '@'

Who displayed "@" and where "@" is displayed for '=00' of Quoted Printable (0x00 in Hexa)?
Comment 4 WADA 2005-11-19 18:40:41 PST
Created attachment 203685 [details]
Mail folder file(contains a mail created by Outlook Express)

The attached ".PDF' file contains data from 0x00 to 0x7F and some ASCII texts.
Please note that this is not a valid PDF data.

(1) Save file in your mail directry(say file name of 'TEST').
(2) Restart Thunderbird, then folder of "TEST" appears.
(3) "Save as" the PDF attachment.
Note: Outlook Express doesn't convert 0x09(Tab) to '=09' when Quoted Printable,
      then 'Tab' is 'Tab' in test mail.

"FC /B" on original PDF and saved PDF.
> C:\TEST\X00-XFF>fc /b X00-X7F.pdf X00-X7F.pdf.By-Tb.pdf
> Comparing files X00-X7F.pdf and X00-X7F.PDF.BY-TB.PDF
> 00000019: 00 20
Comment 5 Mark Sapiro 2005-11-19 18:41:10 PST
(In reply to comment #3)
> This means I can confirm problem if the problem is "=00 is saved as 0x20",
> instead of "=00 is decoded as @".

Yes. The statement above "=00 is saved as 0x20" correctly describes the problem.
 
> > Thunderberd decoded the '=00' encoded Null bytes in this attachment as '@'
> 
> Who displayed "@" and where "@" is displayed for '=00' of Quoted Printable
> (0x00 in Hexa)?

I opened the email with PDF attached with a different MUA and saved the attachment. The second attachment was the same length and opened properly in Acrobat Reader so I did 'cmp -l first_file second_file' and determined that there were 5 bytes that were different and in the good file these 5 bytes were 000 and in the bad file they were 040 = 0x20 = us-ascii '@'. I also determined that the original quoted-printable encoded part had these bytes encoded as '=00'.

I chose to call them '@' in my original report. I now realize it would have been more clear to call them 0x20. Sorry for any confusion.
Comment 6 WADA 2005-11-19 19:20:02 PST
> 040 = 0x20 = us-ascii '@'
Your last '=' is invalid, if 'C' notation. 
 32 = 040 = 0x20 = '=20' in Q.P. = us-ascii Space
 64 = 080 = 0x40 = '=40' in Q.P. = us-ascii '@'
Comment 7 WADA 2005-11-19 19:28:44 PST
Changing summary(@ -> 0x20).
Comment 8 Mark Sapiro 2005-11-19 20:00:55 PST
(In reply to comment #6)
> > 040 = 0x20 = us-ascii '@'
> Your last '=' is invalid, if 'C' notation. 
>  32 = 040 = 0x20 = '=20' in Q.P. = us-ascii Space
>  64 = 080 = 0x40 = '=40' in Q.P. = us-ascii '@'

You are correct, and I don't know what I was thinking - 6 bit BCD ??

Comment 9 Mark Sapiro 2005-11-19 20:16:09 PST
(In reply to comment #7)
> Changing summary(@ -> 0x20).
> 

Thanks. It now seems likely that this bug _is_ caused by the fix for bug 243199.

If that turns out to be the case, I would argue that it is more important to correctly decode =00 when saving a quoted-printable non-text part than it is to render as displayable a quoted-printable text part that probably erroneously contains =00. If bug 243199 can't be 'fixed' in some way that allows quoted-printable =00 to be correctly saved as 0x00, then I think that correctly saving quoted-printable non-text parts should take priority. RFC2045 is very clear that quoted-printable =00 translates to 0x00.

(Still trying to wipe the egg off my face)
Comment 10 WADA 2005-11-20 12:48:42 PST
Problem has been re-created with seamonkey/nightly/latest-trunk 2005111809 build (Win-2K).
Changing to Product=Core.
Comment 11 Mike Cowperthwaite 2005-12-05 08:31:04 PST
(In reply to comment #9)
> If that turns out to be the case, I would argue that it is more important to
> correctly decode =00 when saving a quoted-printable non-text part than it is
> to render as displayable a quoted-printable text part that probably
> erroneously contains =00. 

Since the PDF is correctly identified in the message as "application/pdf" -- which will never be displayed inline -- there may be a way to do both.  It 
will mean more complexity in the decoder, since a flag would need to be 
passed in to indicate whether the decoding is for display or for saving.

Or maybe the renderer is where the nul->space substitution needs to take place.

But I agree, timeless's patch in that other bug is, at best, too simplistic.
Breaking valid attachments is bad.
Comment 12 Herbert Jägle 2005-12-14 22:32:33 PST
*** Bug 320274 has been marked as a duplicate of this bug. ***
Comment 13 WADA 2005-12-14 23:41:44 PST
Changing "Assigned To:" to timeless, the mother of this regression.
Comment 14 timeless 2005-12-15 00:49:41 PST
um. i'm very unlikely to fix this bug. note that a pdf can be handled by a plugin.

someone figure out a decent algorithm to distinguish between the two cases describe it here, and then poke me online.
Comment 15 Mark Sapiro 2005-12-15 08:31:07 PST
(In reply to comment #14)
> um. i'm very unlikely to fix this bug. note that a pdf can be handled by a
> plugin.


This problem does not apply only to pdf's, and in any case, I'm not sure how a plugin would handle it unless you're saying that a plugin would be handed the quoted-printable encoded part and have to decode it itself.


> someone figure out a decent algorithm to distinguish between the two cases
> describe it here, and then poke me online.
> 

What we apparently have given the current fix to bug 234199 is ANY quoted-printable encoded part is incorrectly decoded with =00 decoded as <space> instead of <null>. This breaks the decoding of quoted-printable parts.

Presumably, bug 234199 was caused by some stringification of a decoded quoted-printable text part being truncated because the embedded <null> was seen as a string terminator. Also, maybe this happened only when the text was being displayed, not when it was being written to a non-display file.

In any case, bug 234199 would better be fixed in whatever renders the text for display, not by breaking quoted-printable decoding. Furthermore, in rendering for display, it would be better to simply drop the <null> rather than convert it to <space>. If the decoding has to be munged in some way for text/* parts, then at least do it only for text/* content-types and not for any non-text content-types.

What happens for a base64 encoded text/plain part that contains a null byte?

Maybe it would be best to just say that text/* parts containing null bytes are inherently broken and not try to fix them. This would be better IMO than breaking other quoted-printable encoded parts.
Comment 16 WADA 2005-12-15 20:19:45 PST
Note: Bug 234199 in Comment #15 is not proper bug number.
      This regression was generated by patch for Bug 243199.

David Bienvenu, who was the reviewer/super reviewer of the patch for Bug 243199,  what do you think about "should be"?  
Comment 17 David :Bienvenu 2005-12-16 00:15:37 PST
I think we should try to fix this. I'm not sure if we can do that w/o regressing the previous fix, but I'll look into it when I get a chance.
Comment 18 Mark Sapiro 2005-12-16 17:56:31 PST
Created attachment 206155 [details]
Unencoded text/plain email containing null byte
Comment 19 Mark Sapiro 2005-12-16 17:58:18 PST
Created attachment 206156 [details]
base64 encoded text/plain email containing a null byte
Comment 20 Mark Sapiro 2005-12-16 18:11:33 PST
(In reply to comment #16)
> Note: Bug 234199 in Comment #15 is not proper bug number.
>       This regression was generated by patch for Bug 243199.

I can't seem to get anything right in this report. Thanks for the correction.

Re: my question in comment 15
>What happens for a base64 encoded text/plain part that contains a null byte?

The answer is the line containing the null byte is truncated at the null byte when the message is displayed. Now PLEASE let's not suggest converting <null> to <space> when decoding base64 encoding.

I have attached two emails to this report. Both have the same short text/plain body which contains a <null>. The base64.eml displays in Thunderbird with the line containg <null> truncated at the <null>. The Unencoded.eml displays the entire line with the null byte simply dropped, so it seems there is *something* that can display lines containing <null> without truncating them.
Comment 21 Christian :Biesinger (don't email me, ping me on IRC) 2005-12-17 13:06:31 PST
> I'm not sure how a
> plugin would handle it unless you're saying that a plugin would be handed the
> quoted-printable encoded part and have to decode it itself.

gecko would hand the plugin the decoded content...
Comment 22 WADA 2005-12-17 19:08:51 PST
(In reply to comment #17)
> I'm not sure if we can do that w/o regressing the previous fix

Should we propose one line patch which removes the added one line by patch of bug 243199?
Should we open new bug for regression of problem of bug 243199, which will certainly be created by above patch for this Bug 317009?

I hope developers to backout the patch who generated this regression as soon as possible, then re-consider on new solution of problem of bug 243199,
because I believe "Tb 1.5 without problem of 243199 but with problem of 317009"
is far worse than "Tb 1.5 without problem of 317009 but with problem of 243199".
(Please note that latter is same as Tb 1.0.x on these two bugs.)
Comment 23 Scott MacGregor 2006-01-25 08:52:52 PST
we should back out 243199 for 1.5.0.x if we don't have a better fix. I'm getting reports of people not being able to open PDF attachments because of 243199 in addition to this bug.
Comment 24 Scott MacGregor 2006-01-25 08:53:27 PST
*** Bug 308121 has been marked as a duplicate of this bug. ***
Comment 25 David :Bienvenu 2006-01-25 14:01:32 PST
Created attachment 209629 [details] [diff] [review]
proposed fix

I made QP decoding know about the obj it was decoding, so it can handle embedded nulls differently. If we're decoding for display, convert 0x0 to space, otherwise leave it as null. I tried this with the example attachment, and it decodes it correctly now; however Acrobat won't open it - is it a real .pdf file?
Comment 26 Scott MacGregor 2006-01-25 14:32:06 PST
Comment on attachment 209629 [details] [diff] [review]
proposed fix

libmime always makes me happy.
Comment 27 David :Bienvenu 2006-01-25 14:49:29 PST
fix checked into trunk. It should be in tomorrow's build.
Comment 28 Mark Sapiro 2006-01-25 15:49:19 PST
(In reply to comment #25)

> I tried this with the example attachment,
> and it decodes it correctly now; however Acrobat won't open it - is it a real
> .pdf file?


No, it is not a real PDF - see comment #4.

Comment 29 Mike Cowperthwaite 2006-01-26 06:23:49 PST
Out of curiosity: what was the point of the changes to mimecryp and mimedrft?
The code you added appears to have the same net result as the code you're now bypassing (by not setting 'fn').
Comment 30 Jason Wohlford 2006-01-26 07:41:15 PST
Thanks to Scott for rattling the cages.

I just tested the nightly build of Thunderbird on Mac OS X. The new fix did the trick. Yesterday's nightly build corrupted a perfectly good PDF, but today's build handled it with flying colors.

Cheers to the Team.
Comment 31 David :Bienvenu 2006-01-26 07:49:12 PST
Comment on attachment 209629 [details] [diff] [review]
proposed fix

we want this for 1.5.0.2 as well...
Comment 32 David :Bienvenu 2006-01-26 07:56:07 PST
Mike, I had to change the signature of the qp decoder init function, to take an extra arg, so it no longer matches the type signature of the other decoders; thus, MimeQPDecoder init can't be assigned to fn, or used. I could have changed all the other decoders to take an unused extra arg but this seemed to be the lesser of two evils. Marking fixed.
Comment 33 Scott MacGregor 2006-01-26 13:18:48 PST
Comment on attachment 209629 [details] [diff] [review]
proposed fix

based on Jason's testing feedback from the trunk builds, approving for Thunderbird 2.0.
Comment 34 Alexis Dorais-Joncas 2006-01-26 13:54:00 PST
I also confirm that the 01/26 Windows build fixes the PDF issue for me. Opening a PDF that was corrupted using 1.5 release now works like a charm. Thanx for all the hard work!
Comment 35 Christian J. Callsen 2006-02-01 00:53:52 PST
*** Bug 310546 has been marked as a duplicate of this bug. ***
Comment 36 Tracy Walker [:tracy] 2006-03-21 12:02:08 PST
Verified with Thunderbird 1.5.0.2 candidate from 2006-03-08 on Windows
Comment 37 Tomaz 2006-04-22 14:13:46 PDT
*** Bug 316356 has been marked as a duplicate of this bug. ***
Comment 38 Carsten Book [:Tomcat] 2007-04-05 07:38:59 PDT
verified fixed 1.8.1.3 using Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.3) Gecko/20070326 Thunderbird/2.0.0.0 Mnenhy/0.7.5.0 ID:2007032620

Note You need to log in before you can comment on or make changes to this bug.