Closed Bug 1239658 Opened 8 years ago Closed 8 years ago

Charset Override set erroneously for "Open Draft"/"Forward"/Edit As New" contributing to bug 597369

Categories

(MailNews Core :: Composition, defect)

defect
Not set
major

Tracking

(thunderbird45 fixed, thunderbird46 fixed)

RESOLVED FIXED
Thunderbird 46.0
Tracking Status
thunderbird45 --- fixed
thunderbird46 --- fixed

People

(Reporter: jorgk-bmo, Assigned: jorgk-bmo)

References

Details

(Keywords: dataloss)

Attachments

(8 files, 4 obsolete files)

The charset override should only be set when the user explicitly chooses to display a message using a different text encoding.

This is handled here:
https://dxr.mozilla.org/comm-central/source/mail/base/content/mailWindow.js#627

Since bug 494912 the charset override is also programmatically set here:
https://dxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgComposeService.cpp#522

This is 100% wrong. This was implemented as a kludge to fix problems with forwarding UTF-8 messages.

The consequence of setting the charset override is the following: A draft that is opened for editing or a message that is forwarded or "edited as new" receives the charset of the previously viewed message even if that was never overridden (!!!). This only happens if the message that is used for "edit-draft"/"forward"/"edit-as-new" is not viewed before selecting the action, for example due to the message pane being switched off or tricky user right-click.

This is described in bug 597369 and there is also a nice description of how to reproduce it in bug 644780 (which is a duplicate of bug 597369).

Fixing this will fix 95% of the problem described in bug 597369. The part that isn't fixed will be handled in that bug. The part that isn't fixed is that if the user really overrides the charset of the previously viewed message, it will still be used for the next message.
This causes dataloss, since if the wrong encoding is not detected visually, the message is sent with garbled content.
Assignee: nobody → mozilla
Blocks: 597369
Status: NEW → ASSIGNED
Keywords: dataloss
Simple as that. Just take the erroneous line out with a big comment.

UTF-8 forwarding still works, so I have no idea what the original author had in mind. If you read through bug 494912, you will see that no one really understood the change.
Attachment #8707848 - Flags: review?(mkmelin+mozilla)
Comment on attachment 8707848 [details] [diff] [review]
Backing out the change from bug 494912.

Review of attachment 8707848 [details] [diff] [review]:
-----------------------------------------------------------------

It's certainly possible the override is the wrong thing to do there, but try copying https://bugzilla.mozilla.org/attachment.cgi?id=438477 from 494912 into a folder. With your patch I get garbled text when forwarding, with trunk I don't - so some thing's wrong here.
Attachment #8707848 - Flags: review?(mkmelin+mozilla)
Of course I tested with this message. For me, it gets perfectly forwarded as UTF-8 as it should be since it's UTF-8 encoded. The faulty override code would have sent it UTF-8 even harder ;-)

Even with the wrong line removed it's send UTF-8.

I tested with
Tools > Options, Formatting, Advanced, Text Encoding: Outgoing Mail
both set to UTF-8 and ISO-8859-1.

In both cases the forwarded message is sent OK as UTF-8, but the UTF-8 only shows in the windows title if the default is set to something else.

I really don't see a garbled bit. Can you please attach a screenshot.
Flags: needinfo?(mkmelin+mozilla)
Outgoing is UTF-8, Incoming ISO-8859-1
Flags: needinfo?(mkmelin+mozilla)
Magnus, do you have trouble with this one, too?

Also, when viewing the French message, what does View > Text Encoding say?
Attachment #8708079 - Attachment mime type: message/rfc822 → text/plain
Flags: needinfo?(mkmelin+mozilla)
As hard as I tried, I don't see the garbled text you're seeing. I looked for "charset" in config editor/about:config and reset everything to default. I set mailnews.force_charset_override (which seems to be 100% deprecated:
https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/nsStreamConverter.cpp#192)
yet still no garbled text.

How do I fix it if I can't see it :-((
(In reply to Jorg K (GMT+1) from comment #7)
> which seems to be 100% deprecated:
> https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/
> nsStreamConverter.cpp#192
Wrong. I sets a member of the mime stream/draft data options:
https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/nsStreamConverter.cpp#116
https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/nsStreamConverter.cpp#127
Anyway, I don't see the forwarded message garbled, perhaps I have to build on Linux??
Magnus, I'm looking very hard at your screenshot.
Can you please explain to me why the window title says ... - Western??
What you're seeing is exactly what happens when you forward the UTF-8 as Western/ISO-8859-1. That's what the patch was going to fix.

I can reproduce it now *without* the patch applied!! I have to view a Western encoded e-mail before and then click forward on this one!!!
Can you please describe the exact steps that lead to this.

Do the message look OK when viewed? Which text encoding is used?
Does it look OK in the preview pane? Do you have the preview pane switched on when forwarding?
How do you forward? Right click in the message list? Message Forward? <ctrl>L?
Or from a button when viewing the message in the message pane or opening it in a new window/tab?

I've tried all the above and I never see the message forwarded in Western encoding.

Please also try the attached message.
Dunno why it says Western, likely that's a cause for the bug. (It should be UTF-8, but for some reason chooses ISO-8859-1).

I've done nothing else than (have the file in a folder already), then start up, possibly with no initial mails selected, select the mail so it's visible in the message pane, then hit the Forward button in the message header. 

Sorry, but I won't be able to test further until late tonight or tomorrow.
Flags: needinfo?(mkmelin+mozilla)
First of all we need to notice that the test case from attachment 438477 [details] is INVALID!
So basically the people in bug 494912 implemented a horrible nonsense which has caused much confusion (see duplicates of original bug 597369) to force the proper display of a broken message.

The broken message hides inside a base64 encoded block which I will attach decoded. I used "base64 -d". Please view the HTML in Firefox and you will see that it is garbled. The problem is that the page is UTF-8 encoded but doesn't mention the charset in the heading.

Here is the start of the message:
<html><body>
<p><font size="2" color="#800080">----- Réacheminé par Alain ISSARNI/DGI/FINANCES/GOUV/FR le 21/03/2010 22:44 -----</font><br>
<br>

So what happens without the patch is that the system decides to view the message in UTF-8, most likely based on the e-mail header. Then the UTF-8 overrides any charset detection that comes later.

With my patch which removes the faulty override, some part of the system defaults the charset to "Western" (windows-1252, can't be ISO-8859-1 since they're using this quote in line 27: "l’invitation", \u2019) given that the HTML carries no charset information and uses that as encoding.

I still don't understand why this is happening to you and not to me and I will investigate that by looking where further charset detection happens.

In any case, please forward some *valid* UTF-8 messages (like the attached plaintext message) or any you can created with Thunderbird. I will also attach a more complex base64 encoded message which is valid.

You can also view the broken French message as plaintext and try forwarding that.
Try forwarding that.
Please view this. Firefox should display the same garbled text that you're seeing in your forwarded message.
Weird stuff going on here. If I open the attachment, it gets shown in UTF-8. If I save it and open it from the desktop, I get Western and it's garbled. Maybe a new tab inherits the charset from the page that opened the link, so UTF-8 since BMO uses UTF-8.
Attached patch Try with this patch. (obsolete) — Splinter Review
This is the original patch plus one debug line.

For the French message I only see
===== SetMailCharacterSetToMsgWindow: |UTF-8|
So somehow it copes with the invalid message.

I will now boot up my Linux machine and try on there.
OK, hours later I have a Linux build.
What can I say, I can reproduce what you're reporting. Forwarding the French message which is stored in a folder looks garbled. It even looks garbled when viewing it as plain text and then forwarding.

However, if you manually set the charset to Unicode (despite already being shown as Unicode) then the forward works.

The debug also shows ===== SetMailCharacterSetToMsgWindow: |UTF-8| like on Windows, but the the window title shows "Western (Windows-1252)".

Now what do we do? Hold up the backing out of an obviously wrong fix for a broken message because this broken message shows a (mysterious) problem on one of our platforms?

I don't think so. I kindly ask for approval again. I can investigate what's going on in a different bug.

You can convince yourself that forwarding generally works for *valid* UTF-8 messages, like the two enclosed ones.
Attachment #8708480 - Attachment is obsolete: true
Attachment #8707848 - Flags: review?(mkmelin+mozilla)
(In reply to Jorg K (GMT+1) from comment #12)
> First of all we need to notice that the test case from attachment 438477 [details]
> [details] is INVALID!
> So basically the people in bug 494912 implemented a horrible nonsense which
> has caused much confusion (see duplicates of original bug 597369) to force
> the proper display of a broken message.
> 
> The broken message hides inside a base64 encoded block which I will attach
> decoded. I used "base64 -d". Please view the HTML in Firefox and you will
> see that it is garbled. The problem is that the page is UTF-8 encoded but
> doesn't mention the charset in the heading.

Strictly speaking it's not invalid. The charset for the part is specified in the header
"Content-type: text/html; charset=UTF-8"
... which would correspond to the HTTP headers if sent as a doc over HTTP.

This is also why it's displayed correctly as utf-8 in the folder when viewing. 

So our code will say charset is overriden if it gets the charset data from the header?
Might be worth checking the occurrences of "http-equiv" in mailnews/ to see if one of those sets the wrong default when missing.
(In reply to Magnus Melin from comment #18)
> This is also why it's displayed correctly as utf-8 in the folder when
> viewing.
Maybe. The display and composition processing are different.

> So our code will say charset is overriden if it gets the charset data from
> the header?
No. The code should only say it's overridden when the user chooses another one charset.

With the faulty line I'd like to remove, the charset as *always* overridden for "forward"/"edit draft"/"edit as new". So it picks up whatever viewing charset was used.

Typically, if the message is viewed before forwarding, then it's a bonus that the viewing charset overrides whatever is determined or defaulted elsewhere later.

If some other message is viewed before, then that faulty override is fatal as it forces the charset of the other message. This is 95% of the problem described in bug 597369.

This is why the line *must* go. Fixing this invalid French message is another issue.

I'd like to repeat that it only happens on Linux.

My theory is that the HTML content is presented to the Gecko renderer without charset information. And for Linux it decides to fallback to windows-1252 and for Windows it decides to fallback to UTF-8.

I'm currently looking for whether you can get Linux to force UTF-8 instead of windows-1252 if unknown.

I don't think anyone but Gecko sets anything "when missing". All the MIME code always says UTF-8 wherever I look, but since with my patch the override is removed, the information is not passed on as it seems.

Hold on, I said even the plaintext part of the multipart message gets forwarded badly. I'll have a play with that since there the charset must come from the message header. The simple message included here gets forwarded OK on Linux, so something is working. I'll take a closer look at the plaintext part of the French message. BTW, you are aware that there are heaps of problems with multipart messages, so I'm wondering whether the French message falls into that category.
OK, took the French message, kept the plaintext part, deleted all the other parts and separators.

Displays OK and forwards OK, so we're definitely running into on of the multipart bugs here with this message, see bug 505172.

Can you please approve the patch. We can lead the discussion on how to fix the issue of the bad forwarding of (somewhat) invalid messages if they happen to be multipart elsewhere.
OK, here comes the HTML part of the original HTML message.

Still no meta-information about the charset, but forward works (I'm still on Linux).

So the French message that is holding up this bug is affected by some multipart bug. So be it. That can't lead to holding on to the faulty fix which leads to absolutely horrible side-effects of encodings being derived from different messages. I must sound like a broken record.
OK, I've stripped the attached/embedded images and made this a simple multipart alternative message (removed the related parts). Result: Forwards fine (on Linux).

So with the French message we're talking about this *other* bug:
On Linux a multipart/related+multipart/alternative message (plaintext+HTML+embedded images) does not forward correctly. Plaintext doesn't forward correctly and HTML part doesn't forward correctly if no charset is given in the HTML header.

Please don't let this bug block this one here!

I suspect that the MIME processing forgets the charset, since the images parsed later don't have charset information.
One last comment: I've edited the French message by hand, replaced the base64 with decoded 8bit and added <meta charset="utf-8"> to the header. No success. So basically it's the multipart related that causes the problem. As I said in comment #23: Multipart alternative works OK.

I'll do a last test: Create a message like this with TB: plain+HTML+embedded image.
This forwards fine. And it shows that the French message is really INVALD, this time for real.

Look, TB produces:

Alternative first, and inside the second alternative come the related bits (I deleted stuff to make it shorter):

Content-Type: multipart/alternative;
 boundary="------------8BCAF13E08863460C9F39CAD"
This is a multi-part message in MIME format.
--------------8BCAF13E08863460C9F39CAD
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Test
--------------8BCAF13E08863460C9F39CAD
Content-Type: multipart/related;
 boundary="------------411901BA20E3B2A5B3638198"
--------------411901BA20E3B2A5B3638198
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

<html>
...
</html>
--------------411901BA20E3B2A5B3638198
Content-Type: image/png;
 name="bcponbnjdmbfjkgb.png"
Content-Transfer-Encoding: base64
Content-ID: <part1.ADF7E8BA.4353A6B0@jorgk.com>
Content-Disposition: inline;
 filename="bcponbnjdmbfjkgb.png"
...
--------------411901BA20E3B2A5B3638198--
--------------8BCAF13E08863460C9F39CAD--

The French message has it all confused the other way around. That causes the (mysterious Linux-only) loss of the charset.

Have I convinced you finally ;-)
This one works as discussed in comment #24. I've even edited the charset out, and it still works as long as the parts are in the right order.
Comment on attachment 8707848 [details] [diff] [review]
Backing out the change from bug 494912.

Review of attachment 8707848 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, so yes this is indeed correct - and the problem from the other bug is that the charset is forgotten somewhere along the way due to the image. 
I just wanted to understand what's going on, so we don't change it wrongly like bug 494912 did.

Instead of keeping a comment, I'd prefer we just remove it and instead add a test for it, with some international characters. Add to test-eml-actions.js maybe?
Well, "blame" shows you new lines, but you can't see removed lines. I also prefer to just remove code, but removing this one line will leave no trace.

I can see that backing out an erroneous one-line addition from six years ago is proving to become an expensive operation. A test would be quite complicated:
- switch off the message pane, so no viewing takes place.
- import a message in charset 1
- import a message in charset 2
- view the message in charset 1
- now forward the message in charset 2 and make sure charset 2 is used and not charset 1.
Well, I only meant to add a test that basically forwarding and UTF-8 message don't get garbled. And/Or that edit as new works correctly for UTF-8. 

Had such a test been available for bug 494912 (and people remembered) they would have required investigation on why that message needed the override, and the patch would not had got in as it's a special case.
Hmm, I'm not really convinced. As far as I can see, forwarding a UTF-8 message always worked apart from this one "malicious" case.

Having a test that tests UTF-8 forwarding in general wouldn't have prevented the introduction of the kludge to fix one broken case. As you said, some intelligent being would have had to say: Hey, why are we doing this, we know that UTF-8 forwarding works in general. We even have a test for it!

No offence intended: The people reviewing the patch of a "newcomer" (and there is even a "super-review" by David Bienvenu!!!) must have been absent-minded (to say the very least) not to notice what a charset override means and that you can't just set it to patch over some other problem. Or maybe I'm misjudging that and UTF-8 forward really didn't work at all back then??

It's pretty poor too that when the regressions started coming in starting with bug 597369, no one took the time to relate this back to this fatal change.

Look, I made a proposal: Take out the line and add a comment. Thinking about it, I should change it to:
+    // Do NOT set override programmatically.
+    // It should only get set when the user changes the text encoding via the menu.
+    // aMsgWindow->SetCharsetOverride(true);
+    // Note: The line above was added in bug 494912 but it's not correct causing
+    // bug 597369 and its duplicates.
+    // The line has been left here so you can see it in "blame".

The only test that gives any value-add is the one I described in comment #27. Perhaps we can land this now with the slightly modified comment given that the beta branch date is looming and postpone the test until later for a rainy day ;-)
Attachment #8707848 - Attachment is obsolete: true
Attachment #8707848 - Flags: review?(mkmelin+mozilla)
Attachment #8708761 - Flags: review?(mkmelin+mozilla)
Comment on attachment 8708761 [details] [diff] [review]
Backing out the change from bug 494912. (v2: modified comment)

Review of attachment 8708761 [details] [diff] [review]:
-----------------------------------------------------------------

Ah well, let's land this, but do remove the comment. It's just not something we do in general as that adds mental burden for anyone reading the file later. 
Besides, wholesale setting override is not a logical thing to do, so any investigation of the file history would point this bug out.
Attachment #8708761 - Flags: review?(mkmelin+mozilla) → review+
Attachment #8708761 - Attachment is obsolete: true
Attachment #8708814 - Flags: review+
Keywords: checkin-needed
Removed erroneous second "# Parent" line in patch header.
Attachment #8708814 - Attachment is obsolete: true
Attachment #8709346 - Flags: review+
Comment on attachment 8709346 [details] [diff] [review]
Backing out the change from bug 494912. (v3a: No comment).

[Approval Request Comment]
Regression caused by (bug #): Bug 494912.
User impact if declined: Data loss.
Testing completed (on c-c, etc.): Manual testing.
Risk to taking this patch (and alternatives if risky):
No risky. Backing out one line introduced in bug 494912.
Attachment #8709346 - Flags: approval-comm-aurora+
https://hg.mozilla.org/comm-central/rev/8de961aed2b4458271312ca689a447996ddb8966
Bug 1239658 - Do not set charset override for forward/edit draft/edit as new. r=mkmelin
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 46.0
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: