Closed Bug 1473893 Opened 6 years ago Closed 6 years ago

Detaching attachments corrupts the main body of the e-mail text

Categories

(MailNews Core :: MIME, defect)

defect
Not set
critical

Tracking

(thunderbird_esr52 fixed, thunderbird_esr60 fixed, thunderbird60 fixed, thunderbird61 wontfix, thunderbird62 fixed, thunderbird63 fixed)

RESOLVED FIXED
Thunderbird 63.0
Tracking Status
thunderbird_esr52 --- fixed
thunderbird_esr60 --- fixed
thunderbird60 --- fixed
thunderbird61 --- wontfix
thunderbird62 --- fixed
thunderbird63 --- fixed

People

(Reporter: kristen.panfilio, Assigned: jorgk-bmo)

References

()

Details

(Keywords: dataloss, regression)

Attachments

(3 files, 4 obsolete files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8

Steps to reproduce:

This problem newly arose on 5th July 2018, possibly after the latest Thunderbird upgrade (now on 52.9.0, 64-bit)?  When detaching or deleting attachments from messages in either the Inbox or Sent folders, the main body of the e-mail message becomes corrupted in a message-specific way.  Variously the entire body of the message loses correct text encoding and becomes a string of nonsense symbols, or formatting is altered such that hidden formatting is rendered in visible characters (e.g., "<span>" or  in place of double spaces or carriage returns), or the message suddenly acquires a random non-white background color.

This is true for the specific message even when I close and reopen it, move it out of the Inbox into an offline (not IMAP) folder, and restart Thunderbird and my entire laptop.  If I change the setting to View > Message Body As > Plain text, then I can readily see the contents in a clean way, but plain text isn't very suitable for many incoming e-mails these days, meaning I'm perpetually toggling back and forth.  Also, the plain text view is only helpful in Thunderbird -- if I view the same IMAP message through my university's webmail interface, then regardless of the Thunderbird viewing settings, through that interface I can only see the corrupted, illegible version. 

This problem persists regardless of the text encoding options chosen (Preferences > Display > Formatting > Advanced..., now changed from the default Western ISO-8859-1 to Unicode UTF-8 for incoming mail).  Also, the altered text is carried over when included in a reply message from my account.  The "repair folder" function makes no difference (perhaps not surprising, given that message alteration persists even when transferred between folders).

Simply double-clicking to view attachments does not have this effect, but detaching and deleting files is important both for file organization and to keep the Thunderbird mail folder size down.
Joe, does this reproduce for you?
Flags: needinfo?(jsabash)
(In reply to Wayne Mery (:wsmwk) from comment #1)
> Joe, does this reproduce for you?

I am not able to repro this using win10
Flags: needinfo?(jsabash)
I am on Mac OS Sierra (10.12.6) and have been using Thunderbird on this system/ machine for three years.
I can (sadly) confirm this issue using a message from Outlook/Exchange (as stated at https://support.mozilla.org/en-US/questions/1224698). We have an example in attachment 8986821 [details] from bug 1470210. Most likely yet another regression from the security fixes we shipped in TB 52.9 :-(

STR: Import message, delete the two attachments (I didn't try detach).

In the test message I mentioned, changing the text encoding to Unicode almost displays the message correctly.

Alice, can you confirm the regression range on trunk only. Should be the same as bug 1470210 comment #14.
Status: UNCONFIRMED → NEW
Component: Untriaged → MIME
Ever confirmed: true
Flags: needinfo?(alice0775)
Product: Thunderbird → MailNews Core
Version: 52 Branch → 52
Attached file Test.eml
Reduced test case: Import message, delete attachment, content garbled.
The test message has the body encoded in windows-1252. After the attachment is removed, the body is re-encoded in UTF-8 but the message header still says:

Content-Type: text/html; charset=windows-1252
Content-Transfer-Encoding: 8bit
I repeated the steps using TB 52.8. There the message body is left untouched.
Thanks Alice!!
Severity: normal → critical
Keywords: dataloss
Somehow the problem appears to be bigger than some charset/encoding issue. Using the message from attachment 8986821 [details] from bug 1470210, we see that after deleting both attachments, a third attachment appears, which is the embedded image from the HTML part. The HTML part somehow lost the reference to it, so it now appears as attachment.

Due to the intricacies of MIME, this won't be easy to fix at all :-(
This makes the delete for the test.eml message work again. Of course it's totally wrong to hard-code "windows-1252", but it works for the message in windows-1252. The debug above prints UTF-8 twice, so even is this approach were heading in the right direction, it would be hard to get the correct charset at that point.

I'm not sure this is the correct approach.

Interestingly, I just produced a test message in ISO-2022-JP and remove and attachment. Lo and behold, *no* problem, the body remained ISO-2022-JP and wasn't re-encoded into UTF-8.
Korean in EUC-KR is obliterated, as is Russian in KOI8-U, or Japanese in Shift_JIS. ISO-2022-JP always had special treatment, since it's the only stateful encoding supported by Mozilla. It's certainly a different code path; no I idea why those messages withstand attachment removal when all other encodings, apart from UTF-8, are destroyed.
Reverting this change
https://hg.mozilla.org/comm-central/rev/96fab4a2b811#l3.54
-          clazz = (MimeObjectClass *)&mimeInlineTextHTMLClass;
+          clazz = (MimeObjectClass *)&mimeInlineTextHTMLParsedClass;
so reverting back to the original HTML inline class, fixes the issue. Of course it also reverts the EFAIL fix :-(

So there's something in that new class that's wrong.
Looking into this a little further with attachment 8986821 [details].

Previously, when removing attachments, the HTML was just copied over to the new message that didn't have the attachment. Now it's run through the HTML parser and serialised again. The effect is that the HTML is changed in the process.

Here a comparison:
Before       Now
&nbsp;       NBSP character encoded in UTF-8
we&#8217;ve  we’ve encoded in UTF-8

There's also a case of
wasn&#821=
7;t
where the code is broken into two lines and that gets messed up completely and we get: wasn̵7;t, yes some funny character after the n.

I think it's basically very wrong to subject a message to parsing and serialising of its HTML, just to copy that part to a new message where the attachments will be replaced.

So the fix would be to find the code-path that does that and switch that back to the previous HTML class which is still available in the system.
This fixes the issue. Looking at
https://dxr.mozilla.org/comm-central/rev/3bcadb80b56798587d199b6cbd4d80807c0ad619/mailnews/mime/public/nsIMimeStreamConverter.idl#19
I wonder whether other output types should be exempt from using the parsed class. When importing the test message I saw nsMimeMessageFilterSniffer = 12, I saw 2 for display, 4 for reply and 6 for forward.

Looking at
https://dxr.mozilla.org/comm-central/rev/3bcadb80b56798587d199b6cbd4d80807c0ad619/mailnews/mime/src/mimei.cpp#420
convinced myself to add the filter sniffer and others to the list.
Assignee: nobody → jorgk
Attachment #8990519 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8990526 - Flags: review?(mkmelin+mozilla)
Attachment #8990526 - Flags: review?(ben.bucksch)
Attachment #8990526 - Flags: review?(acelists)
Got a tip from FRG to add parenthesis. Maybe more noise here will wake up the reviewers during cucumber time (https://en.wikipedia.org/wiki/Silly_season).
Attachment #8990526 - Attachment is obsolete: true
Attachment #8990526 - Flags: review?(mkmelin+mozilla)
Attachment #8990526 - Flags: review?(ben.bucksch)
Attachment #8990526 - Flags: review?(acelists)
Attachment #8990545 - Flags: review?(mkmelin+mozilla)
Attachment #8990545 - Flags: review?(ben.bucksch)
Attachment #8990545 - Flags: review?(acelists)
OK, one test failed, I had to revert the first part of
https://hg.mozilla.org/comm-central/rev/a2472fbf9a2c07e95419cfefa1eb5dd3cd396d88
Now it passes.

Note that this bug keeping the entire project on tenterhooks, we had to scrap TB 60 beta 10 and purge TB 52.9 from our websites so it's no longer handed out to users.
Attachment #8990545 - Attachment is obsolete: true
Attachment #8990545 - Flags: review?(mkmelin+mozilla)
Attachment #8990545 - Flags: review?(ben.bucksch)
Attachment #8990545 - Flags: review?(acelists)
Attachment #8990577 - Flags: review?(mkmelin+mozilla)
Attachment #8990577 - Flags: review?(ben.bucksch)
Attachment #8990577 - Flags: review?(acelists)
Discussed with Aceman via IRC which other types might want to use the previous HTML class.

https://dxr.mozilla.org/comm-central/rev/a75878fe1f574a55f9e03ac27027f11e5fa336e3/mailnews/mime/src/nsStreamConverter.cpp#520
gives a clue. nsMimeOutput::nsMimeMessageSource might apply to HTML processing, but nsMimeOutput::nsMimeMessageRaw might need to be added. BTW, nsMimeOutput::nsMimeUnknown is unused.
Posted notice to tb-planning
I did some more testing: Without the patch, when saving a message as HTML, not the original HTML is saved, but the parsed/serialised version, so that's undesired.

nsMimeMessageRaw(5) happens with nsMimeMessageBodyDisplay(2) on a multipart/related message with HTML+image. So I don't think we want the original behaviour for "raw".

nsMimeMessageDecrypt is used in nsMsgAttachmentHandler::SnarfMsgAttachment, so I think that should have the original HTML returned, same as the filter sniffer.

So I think the patch is right.
Comment on attachment 8990577 [details] [diff] [review]
1473893-no-parsing-for-save.patch (v1b) + fixed test

Review of attachment 8990577 [details] [diff] [review]:
-----------------------------------------------------------------

So I can confirm this at least fixes the attachment detaching.
The other modes also sound reasonable to use the original message source, so let's try to add them.
Attachment #8990577 - Flags: review?(acelists) → review+
Same code, clarified comment a bit.
Attachment #8990577 - Attachment is obsolete: true
Attachment #8990577 - Flags: review?(mkmelin+mozilla)
Attachment #8990577 - Flags: review?(ben.bucksch)
Attachment #8990621 - Flags: review+
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/16adae064e80
Don't use parsed HTML MIME class if saving, processing attachments and in other cases. r=aceman
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 63.0
Attachment #8990621 - Flags: approval-comm-esr60+
Attachment #8990621 - Flags: approval-comm-beta+
question - I'm wondering how this squeaked past unit tests and I haven't followed closely enough - is this a case where delete/detach testcase succeeded in both PRE-bug-1419417 unit tests and bug-1419417 unit tests?  I.E. both testcases were flawed?
Flags: needinfo?(jorgk)
There appears to be a unit test
https://dxr.mozilla.org/comm-central/source/mailnews/base/test/unit/test_detachToFile.js
that tests detaching attachments. Briefly glancing at it, I see that it checks for "AttachmentDetached" in the resulting message. It certainly doesn't check the body of the resulting e-mail. So the test is somewhat deficient.

In general, I'm surprised on a regular basis how much of TB's functionality is covered by tests, but there is also a lot of functionality *not* covered by tests. All the security fixes were rushed through review and release and none of them is covered by any test.
Flags: needinfo?(jorgk)
See Also: → 1474288
Attached file test base64.eml
Another test case motivated by bug 1474330.
You need to log in before you can comment on or make changes to this bug.