Closed Bug 1350080 Opened 4 years ago Closed 4 years ago

base64 message/rfc822 attachment gets corrupted if saved or forwarded. Body is base64 decoded but CTE is still base64

Categories

(MailNews Core :: MIME, defect)

defect
Not set
normal

Tracking

(thunderbird_esr5253+ fixed, thunderbird53 fixed, thunderbird54 fixed, thunderbird55 fixed)

RESOLVED FIXED
Thunderbird 55.0
Tracking Status
thunderbird_esr52 53+ fixed
thunderbird53 --- fixed
thunderbird54 --- fixed
thunderbird55 --- fixed

People

(Reporter: carlos.velasco, Assigned: infofrommozilla)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 3 obsolete files)

Attached file message.eml
User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Build ID: 20170316213829

Steps to reproduce:

Received mail message with RFC822 attachment base64 encoded.
If attach is saved or forwarded it gets corrupted by TB.
Tested with nightly too (Win 64).

Attached same message to reproduce issue.


Actual results:

Attach saved base64 decoded.


Expected results:

Attach saved base64 encoded (without alteration).
There are a few things funny here. The attachment header is somewhat missing information:
Content-Type: message/rfc822
Content-Disposition: attachment;
	creation-date="Thu, 23 Mar 2017 16:03:52 GMT";
	modification-date="Thu, 23 Mar 2017 16:03:52 GMT"

Thunderbird usually creates:
Content-Type: message/rfc822;
 name="Attached Message"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="Attached Message"

In the attachment area I see "foo.eml", but when saving it to disk, I get a file whose name is the name of the folder I had imported the message into and from which I dragged it to the desktop.

The content of the message is decoded, but the header still says |Content-Transfer-Encoding: base64|, so that's where the corruption comes from.

Forwarding the entire message (from attachment 8850706 [details]) works since the attachment is left untouched, tested TB 45 and Daily TB 55. Did you really forward the message or did you do some other operation?

Another bug in the same area is bug 1343488.

In summary: Handling of message/rfc822 attachments has various problems :-(
Status: UNCONFIRMED → NEW
Ever confirmed: true
See Also: → 1343488
Sample is taken from a real life email with most information (headers) removed for submitting here and privacy reasons.
I can send you a real life email (or some) like this if it is handled privately, although I think the sample provided show well the issue.

Note that these emails (with RFC822 attachment) are sent from outlook clients, not TB.

IMHO TB should not change attachment at all, just save it using base64 as in the original.

Forwarding: the issue happens when you open the RFC822 attachment inside email in TB and then try to forward it as attachment. Same behaviour as saving it, body base64 decoded but transfer-encoding still base64.

Bug 1343488 seems similar in that relates to RFC822 attachment handling, but no base64 involved there.
I think the problem could be similar. I can not understand why TB changes the encoding of the original attachment at all. It should just copy it unchanged.
(In reply to Carlos Velasco from comment #2)
> Sample is taken from a real life email with most information (headers)
> removed for submitting here and privacy reasons.
Sure.

> I can send you a real life email (or some) like this if it is handled
> privately, although I think the sample provided show well the issue.
Not necessary, thanks for the offer.
 
> Note that these emails (with RFC822 attachment) are sent from outlook
> clients, not TB.
Thanks for letting us know. So we should take some action to stay compatible.

> IMHO TB should not change attachment at all, just save it using base64 as in
> the original.
Agreed.

> Forwarding: the issue happens when you open the RFC822 attachment inside
> email in TB and then try to forward it as attachment. Same behaviour as
> saving it, body base64 decoded but transfer-encoding still base64.
OK, STR: Open "Foo.eml" by double-clicking it in the attachment area. Then click "Forward", right?
That gives a nicely garbled result.

> Bug 1343488 seems similar in that relates to RFC822 attachment handling, but
> no base64 involved there.
The problem in bug 1343488 is that when a message that originally had CTE quoted-printable is attached, it is decoded before attaching while maintaining the header |Content-Transfer-Encoding: quoted-printable|. So then, when reading that attachment, every "=" will cause trouble since it's interpreted as part of the QP encoding.

> I think the problem could be similar.
I think it's the same problem. The attachment is decoded while maintaining the same CTE. And that doesn't fly :-(

I'll look into it but can't promise a quick solution. Frankly, I also want to see bug 1343488 fixed.
Summary: base64 attachment gets corrupted if saved or forwarded. Body is base64 decoded but encoding is still base64 → base64 message/rfc822 attachment gets corrupted if saved or forwarded. Body is base64 decoded but CTE is still base64
'MimeLeaf_parse_begin(MimeObject *obj)' from mimeleaf.cpp is called, to decode the Body for displaying. But it is also called when reloading the Attachment for saving or forwarding.
Adding an exception for text messages, could be an solution.
This works for me and could also fix Bug 1343488
Just when I got ready to roll up my sleeves to look at it ;-)
Can you please attach a patch for me to review. If I write the patch, I need to find a reviewer, and they are all so busy with long review queues :-(
Attachment #8851403 - Flags: review?(jorgk)
Comment on attachment 8851403 [details] [diff] [review]
Prevent Content-Type decoding when saving or forwarding Text-Attachments.

Review of attachment 8851403 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch. It's after midnight here now, so I'll try it out tomorrow. I'll also send it to our try server to make sure it doesn't break any tests.

In the meantime, can you answer the question below.

::: mailnews/mime/src/mimeleaf.cpp
@@ +91,5 @@
>  
>    /* Initialize a decoder if necessary.
>     */
> +  if (!obj->encoding ||
> +	  (obj->options->format_out == nsMimeOutput::nsMimeMessageRaw &&

Can you explain the this part. I'm sure you've looked into it in detail. I've worked in MIME code before and I'm also a reviewer here (since all the others disappeared), but I can't say that I'm 100% familiar with it.

@@ +95,5 @@
> +	  (obj->options->format_out == nsMimeOutput::nsMimeMessageRaw &&
> +	   (!PL_strcasecmp(obj->content_type, TEXT_PLAIN) ||
> +	    !PL_strcasecmp(obj->content_type, TEXT_HTML) ||
> +		!PL_strcasecmp(obj->content_type, MESSAGE_NEWS) ||
> +		!PL_strcasecmp(obj->content_type, MESSAGE_RFC822))))

Nit. Indentation isn't right here. But I can fix that for you. Looks two tabs slipped in here. We only use spaces.
(In reply to Jorg K (GMT+1) from comment #6)
> Comment on attachment 8851403 [details] [diff] [review]
> Prevent Content-Type decoding when saving or forwarding Text-Attachments.

> > +  if (!obj->encoding ||
> > +	  (obj->options->format_out == nsMimeOutput::nsMimeMessageRaw &&
> 
> Can you explain the this part. I'm sure you've looked into it in detail.

nsMimeOutput::nsMimeMessageRaw stands for 'the raw RFC822 data (view source) and attachments'
It is used when the raw data is needed, as at saving. (Note: But binary attachments have to be decoded)

When displaying, ...->format_out has the value nsMimeOutput::nsMimeMessageBodyDisplay.

> I've worked in MIME code before and I'm also a reviewer here (since all the
> others disappeared), but I can't say that I'm 100% familiar with it.

I am certainly also not an expert. If in doubt, we should ask someone else.

> @@ +95,5 @@
> > +	  (obj->options->format_out == nsMimeOutput::nsMimeMessageRaw &&
> > +	   (!PL_strcasecmp(obj->content_type, TEXT_PLAIN) ||
> > +	    !PL_strcasecmp(obj->content_type, TEXT_HTML) ||
> > +		!PL_strcasecmp(obj->content_type, MESSAGE_NEWS) ||
> > +		!PL_strcasecmp(obj->content_type, MESSAGE_RFC822))))
> 
> Nit. Indentation isn't right here. But I can fix that for you. Looks two
> tabs slipped in here. We only use spaces.

Yes, actually there are tabs in all five lines.
I took the liberty to fix the white-space issue here and add a comment derived from what was said in comment #7. I hope you agree. Some reviewers get the contributor to do all the changes, but I think my way is faster and it becomes a more collaborative effort, of course always maintaining the credit for the original author. I've also tweaked the commit message.

I've tested this in the cases described in comment #0 and clarified in comment #3. It fixes bug 1343488, too.

One question: Why did you add an exception for text, html, news and message/rfc822 when we're testing message/rfc822 here? Can you think of a test case for the other types?

I've done some debugging with attachments of type "text/plain" and "text/html"

  if (obj->options->format_out == nsMimeOutput::nsMimeMessageRaw)
    printf("=== obj->content_type %s %d\n", obj->content_type, obj->encoding?1:0);

and saw no effect of having those types included, the attachment is decoded as it should be.
So how do you feel about reducing the patch to:

+  if (!obj->encoding ||
+      // If we need the object as "raw" for saving or forwarding,
+      // don't decode certain types. Other output formats,
+      // like "display" (nsMimeMessageBodyDisplay), need decoding.
+      (obj->options->format_out == nsMimeOutput::nsMimeMessageRaw &&
+       (!PL_strcasecmp(obj->content_type, MESSAGE_NEWS) ||
+        !PL_strcasecmp(obj->content_type, MESSAGE_RFC822))))
+    /* no-op */ ;

For now, I've left some comments in the code. I'm requesting your feedback on the changes I made.

Try run here:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=5271c9a2fcbf848e823e69c3255117bbae38ff4e

(In reply to Alfred Peters from comment #7)
> If in doubt, we should ask someone else.
There is no one left to ask, that's why I became MIME peer:
https://wiki.mozilla.org/Modules/MailNews_Core#MIME_Parser
Assignee: nobody → jorgk
Attachment #8851403 - Attachment is obsolete: true
Attachment #8851403 - Flags: review?(jorgk)
Attachment #8851499 - Flags: feedback?(infofrommozilla)
With the try push coming out green, I'd be happy to accept my version of the patch (of course removing the lines commented out).

text/html must not be included in that condition since otherwise you can't save encoded text and html attachments, as I've just tried.

In fact, previously I typo-ed my manual tests adding test/plain and test/html instead of text/plain and text/html. With the correct tests and the additional
(!PL_strcasecmp(obj->content_type, TEXT_PLAIN) ||
 !PL_strcasecmp(obj->content_type, TEXT_HTML) ||
we'd destroy the ability to save such attachments in a decoded form.
Final patch. Please provide feedback.
Attachment #8851499 - Attachment is obsolete: true
Attachment #8851499 - Flags: feedback?(infofrommozilla)
Attachment #8851534 - Flags: review+
Attachment #8851534 - Flags: feedback?(infofrommozilla)
Comment on attachment 8851534 [details] [diff] [review]
Bug1350080-message-base64.patch (v2).

Sorry for the late answer, but it was a long day at work.

No, that doesn't work. 'obj->content_type' is the header of the attached message not from the Mime-Part. Just as it is the Content-Transfer-Encoding header from the message, which is BASE64.

See attachment 8850706 [details]:
Content-Type: text/html; charset="utf-8"
Attachment #8851534 - Flags: feedback?(infofrommozilla) → feedback-
(In reply to Jorg K (GMT+1) from comment #8)
> I've tested this in the cases described in comment #0 and clarified in

Not here. With your patch The Mail ('foo.eml') is saved with decoded Body. So the Mail is broken.
 
> I've done some debugging with attachments of type "text/plain" and
> "text/html"

Could you provide an example?
OK, you originally had for conditions:
(!PL_strcasecmp(obj->content_type, TEXT_PLAIN) ||
 !PL_strcasecmp(obj->content_type, TEXT_HTML) ||
 !PL_strcasecmp(obj->content_type, MESSAGE_NEWS) ||
 !PL_strcasecmp(obj->content_type, MESSAGE_RFC822))))

I changed it to the last two only, and I must have messed up this morning, that doesn't work. Neither the forward of the attached message works, nor saving it since it is decoded.

However, if I reduce it to the first two, it still works, but saving an encoded text doesn't, I'll attach a sample in a minute.
Attached file Foo - text.eml
If I haven't messed up again, I believe that this message is handled correctly in current TB, but with your patch, the attachment can't be saved. Well, it is saved "raw", which is undesired.

As I said, the two conditions that make your patch work
(!PL_strcasecmp(obj->content_type, TEXT_PLAIN) ||
 !PL_strcasecmp(obj->content_type, TEXT_HTML) ||
get in the way of decoding any encoded text attachment.

I think we need to test the type of the mime part, hence my failed attempt of testing for MESSAGE_RFC822 (and news).
How about this, this works for me on messages and on the encoded text message attached here as well.
Attachment #8851534 - Attachment is obsolete: true
Attachment #8851789 - Flags: feedback?(infofrommozilla)
Comment on attachment 8851789 [details] [diff] [review]
Bug1350080-message-base64.patch (v3).

That's okay for me.
I had expected that we would get problems with binary attachments. But I was wrong.
Attachment #8851789 - Flags: feedback?(infofrommozilla) → feedback-
(In reply to Alfred Peters from comment #16)
> That's okay for me.
So why the f- then, shouldn't it be f+? :-(

> I had expected that we would get problems with binary attachments. But I was
> wrong.
Yes, I was looking for a case where binary attachments now wouldn't work, I even tried nested attachments, so a message containing a message with an image. Still worked. I've also considered doing:

  if (!obj->encoding ||
      // If we need the object as "raw" for saving or forwarding,
      // don't decode message types. Other output formats,
      // like "display" (nsMimeMessageBodyDisplay), need decoding.
      (obj->options->format_out == nsMimeOutput::nsMimeMessageRaw &&
       obj->parent &&
       (!PL_strcasecmp(obj->parent->content_type, MESSAGE_NEWS) ||
        !PL_strcasecmp(obj->parent->content_type, MESSAGE_RFC822))
       (!PL_strcasecmp(obj->content_type, TEXT_PLAIN) ||
        !PL_strcasecmp(obj->content_type, TEXT_HTML))))
    /* no-op */ ;

That should be even safer.
Oops, && missing:
       obj->parent &&
       (!PL_strcasecmp(obj->parent->content_type, MESSAGE_NEWS) ||
        !PL_strcasecmp(obj->parent->content_type, MESSAGE_RFC822)) &&
       (!PL_strcasecmp(obj->content_type, TEXT_PLAIN) ||
        !PL_strcasecmp(obj->content_type, TEXT_HTML))))

Now I'm wondering that if we do that the additional checks, whether we'd have to list other text types, like:
text/enriched, text/richtext, text/rft and more importantly text/calendar.

Maybe better without any additional checks. Trying that here:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=54748b4f1759ad370aeed45ea4ff12d1b0163879
Comment on attachment 8851789 [details] [diff] [review]
Bug1350080-message-base64.patch (v3).

(In reply to Jorg K (GMT+1) from comment #17)
> (In reply to Alfred Peters from comment #16)
> > That's okay for me.
> So why the f- then, shouldn't it be f+? :-(

Yes
Attachment #8851789 - Flags: feedback- → feedback+
Comment on attachment 8851789 [details] [diff] [review]
Bug1350080-message-base64.patch (v3).

OK, let's go with this then. Thanks for the feedback and thanks for getting the ball rolling and finding the spot that needed tweaking. I'll get it landed.
Attachment #8851789 - Flags: review+
(In reply to Jorg K (GMT+1) from comment #17)
> (In reply to Alfred Peters from comment #16)

> > I had expected that we would get problems with binary attachments. But I was
> > wrong.
> Yes, I was looking for a case where binary attachments now wouldn't work, I
> even tried nested attachments, so a message containing a message with an
> image. Still worked.

Me too.

>   I've also considered doing:
>        (!PL_strcasecmp(obj->parent->content_type, MESSAGE_NEWS) ||
>         !PL_strcasecmp(obj->parent->content_type, MESSAGE_RFC822))
>        (!PL_strcasecmp(obj->content_type, TEXT_PLAIN) ||
>         !PL_strcasecmp(obj->content_type, TEXT_HTML))))

> That should be even safer.

I do not see where that would be necessary. If we assume that CT='message/*' is always a mimepart, the CT of this message is either 'text/*' or is itself a 'multipart/*'. In the latter case, there is no BASE64/QP
https://hg.mozilla.org/comm-central/rev/2967d03cf5bc66717e3ff62ece9fc94faeae45e6
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 55.0
Component: Untriaged → MIME
Product: Thunderbird → MailNews Core
No idea how this got assigned to me, must have been accidental.
Assignee: jorgk → infofrommozilla
Comment on attachment 8851789 [details] [diff] [review]
Bug1350080-message-base64.patch (v3).

Let's uplift this since this is an annoying bug that makes TB looks bad when it comes to processing message attachments sent using the MS Outlook client.
Attachment #8851789 - Flags: approval-comm-esr52?
Attachment #8851789 - Flags: approval-comm-beta+
Attachment #8851789 - Flags: approval-comm-aurora+
Attachment #8850706 - Attachment mime type: message/rfc822 → text/plain
Blocks: 1352740
Attachment #8851789 - Flags: approval-comm-esr52? → approval-comm-esr52+
Depends on: 1361422
You need to log in before you can comment on or make changes to this bug.