Detaching attachments corrupts the main body of the e-mail text

RESOLVED FIXED in Thunderbird 63.0

Status

defect
--
critical
RESOLVED FIXED
10 months ago
9 months ago

People

(Reporter: kristen.panfilio, Assigned: jorgk)

Tracking

({dataloss, regression})

Thunderbird 63.0
Dependency tree / graph

Thunderbird Tracking Flags

(thunderbird_esr52 fixed, thunderbird_esr60 fixed, thunderbird60 fixed, thunderbird61 wontfix, thunderbird62 fixed, thunderbird63 fixed)

Details

(URL)

Attachments

(3 attachments, 4 obsolete attachments)

(Reporter)

Description

10 months ago
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8

Steps to reproduce:

This problem newly arose on 5th July 2018, possibly after the latest Thunderbird upgrade (now on 52.9.0, 64-bit)?  When detaching or deleting attachments from messages in either the Inbox or Sent folders, the main body of the e-mail message becomes corrupted in a message-specific way.  Variously the entire body of the message loses correct text encoding and becomes a string of nonsense symbols, or formatting is altered such that hidden formatting is rendered in visible characters (e.g., "<span>" or  in place of double spaces or carriage returns), or the message suddenly acquires a random non-white background color.

This is true for the specific message even when I close and reopen it, move it out of the Inbox into an offline (not IMAP) folder, and restart Thunderbird and my entire laptop.  If I change the setting to View > Message Body As > Plain text, then I can readily see the contents in a clean way, but plain text isn't very suitable for many incoming e-mails these days, meaning I'm perpetually toggling back and forth.  Also, the plain text view is only helpful in Thunderbird -- if I view the same IMAP message through my university's webmail interface, then regardless of the Thunderbird viewing settings, through that interface I can only see the corrupted, illegible version. 

This problem persists regardless of the text encoding options chosen (Preferences > Display > Formatting > Advanced..., now changed from the default Western ISO-8859-1 to Unicode UTF-8 for incoming mail).  Also, the altered text is carried over when included in a reply message from my account.  The "repair folder" function makes no difference (perhaps not surprising, given that message alteration persists even when transferred between folders).

Simply double-clicking to view attachments does not have this effect, but detaching and deleting files is important both for file organization and to keep the Thunderbird mail folder size down.

Comment 1

10 months ago
Joe, does this reproduce for you?
Flags: needinfo?(jsabash)

Comment 2

10 months ago
(In reply to Wayne Mery (:wsmwk) from comment #1)
> Joe, does this reproduce for you?

I am not able to repro this using win10
Flags: needinfo?(jsabash)
(Reporter)

Comment 3

10 months ago
I am on Mac OS Sierra (10.12.6) and have been using Thunderbird on this system/ machine for three years.
(Assignee)

Comment 4

10 months ago
I can (sadly) confirm this issue using a message from Outlook/Exchange (as stated at https://support.mozilla.org/en-US/questions/1224698). We have an example in attachment 8986821 [details] from bug 1470210. Most likely yet another regression from the security fixes we shipped in TB 52.9 :-(

STR: Import message, delete the two attachments (I didn't try detach).

In the test message I mentioned, changing the text encoding to Unicode almost displays the message correctly.

Alice, can you confirm the regression range on trunk only. Should be the same as bug 1470210 comment #14.
Status: UNCONFIRMED → NEW
Component: Untriaged → MIME
Ever confirmed: true
Flags: needinfo?(alice0775)
Product: Thunderbird → MailNews Core
Version: 52 Branch → 52
(Assignee)

Comment 5

10 months ago
Posted file Test.eml
Reduced test case: Import message, delete attachment, content garbled.
(Assignee)

Comment 6

10 months ago
The test message has the body encoded in windows-1252. After the attachment is removed, the body is re-encoded in UTF-8 but the message header still says:

Content-Type: text/html; charset=windows-1252
Content-Transfer-Encoding: 8bit
(Assignee)

Comment 7

10 months ago
I repeated the steps using TB 52.8. There the message body is left untouched.
(Assignee)

Comment 9

10 months ago
Thanks Alice!!
Severity: normal → critical
Keywords: dataloss
(Assignee)

Comment 10

10 months ago
Somehow the problem appears to be bigger than some charset/encoding issue. Using the message from attachment 8986821 [details] from bug 1470210, we see that after deleting both attachments, a third attachment appears, which is the embedded image from the HTML part. The HTML part somehow lost the reference to it, so it now appears as attachment.

Due to the intricacies of MIME, this won't be easy to fix at all :-(
(Assignee)

Comment 11

10 months ago
This makes the delete for the test.eml message work again. Of course it's totally wrong to hard-code "windows-1252", but it works for the message in windows-1252. The debug above prints UTF-8 twice, so even is this approach were heading in the right direction, it would be hard to get the correct charset at that point.

I'm not sure this is the correct approach.

Interestingly, I just produced a test message in ISO-2022-JP and remove and attachment. Lo and behold, *no* problem, the body remained ISO-2022-JP and wasn't re-encoded into UTF-8.
(Assignee)

Comment 12

10 months ago
Korean in EUC-KR is obliterated, as is Russian in KOI8-U, or Japanese in Shift_JIS. ISO-2022-JP always had special treatment, since it's the only stateful encoding supported by Mozilla. It's certainly a different code path; no I idea why those messages withstand attachment removal when all other encodings, apart from UTF-8, are destroyed.
(Assignee)

Comment 13

10 months ago
Reverting this change
https://hg.mozilla.org/comm-central/rev/96fab4a2b811#l3.54
-          clazz = (MimeObjectClass *)&mimeInlineTextHTMLClass;
+          clazz = (MimeObjectClass *)&mimeInlineTextHTMLParsedClass;
so reverting back to the original HTML inline class, fixes the issue. Of course it also reverts the EFAIL fix :-(

So there's something in that new class that's wrong.
(Assignee)

Comment 14

10 months ago
Looking into this a little further with attachment 8986821 [details].

Previously, when removing attachments, the HTML was just copied over to the new message that didn't have the attachment. Now it's run through the HTML parser and serialised again. The effect is that the HTML is changed in the process.

Here a comparison:
Before       Now
&nbsp;       NBSP character encoded in UTF-8
we&#8217;ve  we’ve encoded in UTF-8

There's also a case of
wasn&#821=
7;t
where the code is broken into two lines and that gets messed up completely and we get: wasn̵7;t, yes some funny character after the n.

I think it's basically very wrong to subject a message to parsing and serialising of its HTML, just to copy that part to a new message where the attachments will be replaced.

So the fix would be to find the code-path that does that and switch that back to the previous HTML class which is still available in the system.
(Assignee)

Comment 15

10 months ago
This fixes the issue. Looking at
https://dxr.mozilla.org/comm-central/rev/3bcadb80b56798587d199b6cbd4d80807c0ad619/mailnews/mime/public/nsIMimeStreamConverter.idl#19
I wonder whether other output types should be exempt from using the parsed class. When importing the test message I saw nsMimeMessageFilterSniffer = 12, I saw 2 for display, 4 for reply and 6 for forward.

Looking at
https://dxr.mozilla.org/comm-central/rev/3bcadb80b56798587d199b6cbd4d80807c0ad619/mailnews/mime/src/mimei.cpp#420
convinced myself to add the filter sniffer and others to the list.
Assignee: nobody → jorgk
Attachment #8990519 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8990526 - Flags: review?(mkmelin+mozilla)
Attachment #8990526 - Flags: review?(ben.bucksch)
Attachment #8990526 - Flags: review?(acelists)
(Assignee)

Comment 17

10 months ago
Got a tip from FRG to add parenthesis. Maybe more noise here will wake up the reviewers during cucumber time (https://en.wikipedia.org/wiki/Silly_season).
Attachment #8990526 - Attachment is obsolete: true
Attachment #8990526 - Flags: review?(mkmelin+mozilla)
Attachment #8990526 - Flags: review?(ben.bucksch)
Attachment #8990526 - Flags: review?(acelists)
Attachment #8990545 - Flags: review?(mkmelin+mozilla)
Attachment #8990545 - Flags: review?(ben.bucksch)
Attachment #8990545 - Flags: review?(acelists)
(Assignee)

Comment 18

10 months ago
Didn't even compile without the parenthesis, so here again:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=eed585bdaeb665dee51e1832dcba75de136fb418
Blocks: SM2.49.4
(Assignee)

Comment 19

10 months ago
OK, one test failed, I had to revert the first part of
https://hg.mozilla.org/comm-central/rev/a2472fbf9a2c07e95419cfefa1eb5dd3cd396d88
Now it passes.

Note that this bug keeping the entire project on tenterhooks, we had to scrap TB 60 beta 10 and purge TB 52.9 from our websites so it's no longer handed out to users.
Attachment #8990545 - Attachment is obsolete: true
Attachment #8990545 - Flags: review?(mkmelin+mozilla)
Attachment #8990545 - Flags: review?(ben.bucksch)
Attachment #8990545 - Flags: review?(acelists)
Attachment #8990577 - Flags: review?(mkmelin+mozilla)
Attachment #8990577 - Flags: review?(ben.bucksch)
Attachment #8990577 - Flags: review?(acelists)
(Assignee)

Comment 20

10 months ago
Discussed with Aceman via IRC which other types might want to use the previous HTML class.

https://dxr.mozilla.org/comm-central/rev/a75878fe1f574a55f9e03ac27027f11e5fa336e3/mailnews/mime/src/nsStreamConverter.cpp#520
gives a clue. nsMimeOutput::nsMimeMessageSource might apply to HTML processing, but nsMimeOutput::nsMimeMessageRaw might need to be added. BTW, nsMimeOutput::nsMimeUnknown is unused.

Comment 21

10 months ago
Posted notice to tb-planning
Blocks: 1419417
(Assignee)

Comment 22

10 months ago
I did some more testing: Without the patch, when saving a message as HTML, not the original HTML is saved, but the parsed/serialised version, so that's undesired.

nsMimeMessageRaw(5) happens with nsMimeMessageBodyDisplay(2) on a multipart/related message with HTML+image. So I don't think we want the original behaviour for "raw".

nsMimeMessageDecrypt is used in nsMsgAttachmentHandler::SnarfMsgAttachment, so I think that should have the original HTML returned, same as the filter sniffer.

So I think the patch is right.

Comment 23

10 months ago
Comment on attachment 8990577 [details] [diff] [review]
1473893-no-parsing-for-save.patch (v1b) + fixed test

Review of attachment 8990577 [details] [diff] [review]:
-----------------------------------------------------------------

So I can confirm this at least fixes the attachment detaching.
The other modes also sound reasonable to use the original message source, so let's try to add them.
Attachment #8990577 - Flags: review?(acelists) → review+
(Assignee)

Comment 24

10 months ago
Same code, clarified comment a bit.
Attachment #8990577 - Attachment is obsolete: true
Attachment #8990577 - Flags: review?(mkmelin+mozilla)
Attachment #8990577 - Flags: review?(ben.bucksch)
Attachment #8990621 - Flags: review+

Comment 25

10 months ago
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/16adae064e80
Don't use parsed HTML MIME class if saving, processing attachments and in other cases. r=aceman
Status: ASSIGNED → RESOLVED
Last Resolved: 10 months ago
Resolution: --- → FIXED
(Assignee)

Updated

10 months ago
Target Milestone: --- → Thunderbird 63.0
(Assignee)

Updated

10 months ago
Attachment #8990621 - Flags: approval-comm-esr60+
Attachment #8990621 - Flags: approval-comm-beta+

Comment 28

10 months ago
question - I'm wondering how this squeaked past unit tests and I haven't followed closely enough - is this a case where delete/detach testcase succeeded in both PRE-bug-1419417 unit tests and bug-1419417 unit tests?  I.E. both testcases were flawed?
Flags: needinfo?(jorgk)
(Assignee)

Comment 29

10 months ago
There appears to be a unit test
https://dxr.mozilla.org/comm-central/source/mailnews/base/test/unit/test_detachToFile.js
that tests detaching attachments. Briefly glancing at it, I see that it checks for "AttachmentDetached" in the resulting message. It certainly doesn't check the body of the resulting e-mail. So the test is somewhat deficient.

In general, I'm surprised on a regular basis how much of TB's functionality is covered by tests, but there is also a lot of functionality *not* covered by tests. All the security fixes were rushed through review and release and none of them is covered by any test.
Flags: needinfo?(jorgk)

Updated

10 months ago
Duplicate of this bug: 1474330

Updated

10 months ago
See Also: → 1474288
(Assignee)

Comment 31

9 months ago
Posted file test base64.eml
Another test case motivated by bug 1474330.
(Assignee)

Updated

9 months ago
Duplicate of this bug: 1475271
You need to log in before you can comment on or make changes to this bug.