Closed Bug 1568392 Opened 6 years ago Closed 6 years ago

strange characters being added to outgoing mail

Categories

(Thunderbird :: Untriaged, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1435903

People

(Reporter: ToddAndMargo, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0

Steps to reproduce:

Thunderbird 60.8.0 +

I am a consultant. I have Thunderbird installations spread out across two counties. I am starting to get calls from customers saying that recipients of their email are complaining that the messages arrive loaded with unprintable characters.

Upon analyzing the binary contents of the files saved from their sent boxes, I found that Thunderbird is indeed adding stray extended ASCII characters to their outgoing eMail.

For instance, after a period (2E), check:

2E C2 A0 C2 A0 20 48 6D ..... Hm

The stray extended ASCII characters are C2 A0 C2 A0

And after a # sign and a return:

20 20 20 52 69 76 65 74 EF BF BD EF BF BD EF BF Rivet........
BD 20 24 38 2E 31 32 23 EF BF BD EF BF BD EF BF . $8.12#........
BD 20 53 54 4B 0D 0A 0D 0A 20 20 20 20 4E 41 53 . STK.... NAS

The stray added characters are EF, BF, and BD

So to get the customers back up and running, I had to go into about:config and flip

  mail.strictly_mime to true

This is NOT the default.

I do not look forward to having to flip this on several hundred installations of Thunderbird.
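For rolling the workaround out without clicking through about:config on every machine, one option is a user.js file in the profile folder, which Thunderbird reads at startup. A minimal sketch in Python, assuming the profile directory is already known; the helper name enable_strictly_mime is hypothetical and not part of any Thunderbird tooling:

```python
from pathlib import Path

PREF_LINE = 'user_pref("mail.strictly_mime", true);'

def enable_strictly_mime(profile_dir: str) -> Path:
    """Append the mail.strictly_mime override to user.js in a profile folder.

    profile_dir is an assumption: the caller has to locate the profile.
    """
    user_js = Path(profile_dir) / "user.js"
    existing = user_js.read_text(encoding="utf-8") if user_js.exists() else ""
    if PREF_LINE not in existing:
        # Keep whatever is already in user.js and add the line at the end.
        with user_js.open("a", encoding="utf-8") as fh:
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(PREF_LINE + "\n")
    return user_js
```

Calling the function twice on the same profile leaves a single copy of the line, so it should be safe to run from a login script.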

Please fix.

Many thanks,
-T

Interesting,

I confirm that the mail.strictly_mime is set to false, but I don't see the extra characters in my mail messages (yet).
Besides 60.8, are the machines running anything in common? Add-ons or encryption software? Are they a mix of POP and IMAP email accounts, or all one or the other?

Flags: needinfo?(ToddAndMargo)

A few of them have the Lookout extension installed. They are all Windows 10 machines so far.

To test for the extra characters, save the file from your sent bin and then inspect it with a hex editor.
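As an alternative to reading the hex editor by eye, the saved file can be scanned for the two byte sequences quoted in this report. A rough sketch in Python; the function name and the dictionary keys are made up for illustration:

```python
NBSP_UTF8 = b"\xc2\xa0"             # U+00A0 NO-BREAK SPACE encoded as UTF-8
REPLACEMENT_UTF8 = b"\xef\xbf\xbd"  # U+FFFD REPLACEMENT CHARACTER encoded as UTF-8

def count_suspect_bytes(raw: bytes) -> dict:
    """Count occurrences of the two byte sequences in a raw message."""
    return {
        "nbsp_utf8": raw.count(NBSP_UTF8),
        "replacement_utf8": raw.count(REPLACEMENT_UTF8),
    }

# The first hex dump from this report: ". <C2 A0> <C2 A0> Hm"
sample = bytes.fromhex("2E C2 A0 C2 A0 20 48 6D")
print(count_suspect_bytes(sample))
```

For a saved message, pass the bytes of the file opened in binary mode, for example `count_suspect_bytes(open("message.eml", "rb").read())`, where message.eml is a placeholder file name.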

I am thinking the mail.strictly_mime should actually be defaulting to true and somehow got reversed somewhere.

Flags: needinfo?(ToddAndMargo)

Ongoing issue with Yahoo and a bunch of other incapable friends who are corrupting mail since at least Feb. 2018.

FYI: C2 A0 is the NBSP in UTF-8. EF BF BD is the so-called replacement character.
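Both identifications can be checked by decoding the bytes as UTF-8 and looking up the Unicode names. A small Python sketch; describe_utf8 is only an illustrative helper:

```python
import unicodedata

def describe_utf8(hex_bytes: str) -> list[str]:
    """Decode a hex string as UTF-8 and name each resulting code point."""
    text = bytes.fromhex(hex_bytes).decode("utf-8")
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

print(describe_utf8("C2 A0"))     # the sequence after the period
print(describe_utf8("EF BF BD"))  # the sequence after "Rivet" and "#"
```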

Further reading, bug 1435903 comment #44.

Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE

Hi Jorg,

I do not believe this is a duplicate. This is why:

  1. the smtp servers in question are att and gmail, not Yahoo,

  2. the stray characters were found in BOTH the sent box and the recipient's inbox, and

  3. if the SMTP server were adding the characters, the state of mail.strictly_mime would not matter.

I could be wrong though

I doubt that Gmail shows the issue, AT&T belongs to the incapable Yahoo club, see bug 1435903.

All the reports we have say that the copy of the sent e-mail is fine; the corruption happens when the e-mail is processed by the server. Feel free to provide proof to the contrary by exporting a sent e-mail and attaching it using "Attach File" above.

mail.strictly_mime causes the message to be 7-bit QP encoded (quoted printable). The defective servers of Yahoo and friends can fortunately deliver 7bit messages.
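To illustrate why quoted-printable survives such servers, here is what QP does to the NBSP bytes, using Python's standard quopri module. This is only a sketch of the encoding itself, not of Thunderbird's encoder:

```python
import quopri

# ". NBSP NBSP space Hm" as UTF-8, i.e. 2E C2 A0 C2 A0 20 48 6D
body = ".\u00a0\u00a0 Hm".encode("utf-8")

encoded = quopri.encodestring(body)
print(encoded)  # b'.=C2=A0=C2=A0 Hm'

# Every byte on the wire is now plain 7-bit ASCII ...
print(all(b < 0x80 for b in encoded))
# ... and the receiving client can restore the original bytes.
print(quopri.decodestring(encoded) == body)
```

Each high-bit byte becomes three ASCII characters, which is also why QP messages are bigger.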

Currently sending äöü from Yahoo as UTF-8 or windows-1252 results in ?????? or ???. Well done, Yahoo. Mileage can vary based on which of the Yahoo friends you use.

(In reply to Jorg K (GMT+2) from comment #5)

> I doubt that Gmail shows the issue, AT&T belongs to the incapable Yahoo club, see bug 1435903.

> All the reports we have say that the copy of the sent e-mail is fine; the corruption happens when the e-mail is processed by the server. Feel free to provide proof to the contrary by exporting a sent e-mail and attaching it using "Attach File" above.

> mail.strictly_mime causes the message to be 7-bit QP encoded (quoted printable). The defective servers of Yahoo and friends can fortunately deliver 7bit messages.

Since all previous reports said the sent box was fine, and I located the issue in the sent boxes of two customers (one with one computer, one with five), this is not a duplicate of 1435903.

I have already copied and pasted the hexedit evidence (you either trust me or you don't), so I do not know that it is necessary, but I have two offending eMails, saved from the sent box, that I can upload. They contain private information, though. Do you have a mechanism for me to upload these messages without making them public to the world?

Or I could go through and remove all the private information, but I don't know what that would do to your troubleshooting.

As for bug 1435903, does toggling mail.strictly_mime to true correct the issue (another way to test for a duplicate)?

If setting mail.strictly_mime fixes the issue, then I don't believe you ;-)

You can e-mail me the message as attachment, I won't publish it.

Maybe we're misunderstanding each other. If your user types "dot space space Hi there" into a compose window, the editor will replace the first space with an NBSP. Install https://addons.thunderbird.net/en-GB/thunderbird/addon/thunderhtmledit/ to see it for yourself.

If the message is saved or sent in UTF-8, then the sent message will contain the C2 A0 since that's the NBSP in UTF-8. You don't need to prove that. Or if the message is sent as windows-1252, the NBSP is encoded as one byte A0 (https://en.wikipedia.org/wiki/Windows-1252).

Currently it seems that the Yahoo service, or whatever other badly configured service, destroys any highbit byte, so all the bytes mentioned in the previous paragraph are replaced with either a ? or the Unicode replacement character EF BF BD.
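The effect described here can be modelled in a few lines of Python. How the affected servers actually process the body is not known from this bug, so clobber_high_bit below is purely a hypothetical model that replaces every byte with the high bit set:

```python
def clobber_high_bit(data: bytes, replacement: bytes = b"?") -> bytes:
    """Model of a broken relay: swap every byte >= 0x80 for a replacement."""
    return b"".join(bytes([b]) if b < 0x80 else replacement for b in data)

# The same three letters take six bytes in UTF-8 and three in windows-1252.
for charset in ("utf-8", "cp1252"):
    sent = "äöü".encode(charset)
    print(charset, sent.hex(" ").upper(), "->", clobber_high_bit(sent))

# Same model, but substituting the UTF-8 replacement character EF BF BD.
mangled = clobber_high_bit(".\u00a0\u00a0 Hm".encode("utf-8"), "\ufffd".encode("utf-8"))
print(mangled.hex(" ").upper())
```

Under this model the UTF-8 message comes out as six question marks and the windows-1252 one as three, because the substitution happens per byte, not per character.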

So I'm only interested if the sent mail in the Sent box doesn't display correctly in TB or contains EF BF BD. I know that it can contain C2 A0 or just A0 if encoded in windows-1252.

Hi Jorg,

I eMailed you a zip file to jorgk@jorgk.com with two offending eMails saved from the customer's sent boxes. Please keep ALL the information in these eMails absolutely private.

Were you able to test if toggling mail.strictly_mime to true corrects the issue with 1435903?

-T

Yes, in the ZIP file were two messages.

The first e-mail is UTF-8 encoded. As detailed, there are NBSP in the message and that's C2 A0 in hex. The message is totally valid and will be displayed properly in TB. No problem here.

The second e-mail has a hotchpotch of encodings. The main part is Windows-1252 encoded and has some 8bit characters in it, like 0xA0 for NBSP or 0x92 for some "smart" quotes. There is a reply part:

On 6/13/2018 9:18 PM, Donald (name changed) wrote:
Mickey (name changed),
ï¿½

ï¿½ is the UTF-8 replacement character EF BF BD but interpreted as windows-1252. That's exactly what is described in bug 1435903 comment #44.
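The double misinterpretation can be reproduced in Python by encoding the replacement character as UTF-8 and then decoding those three bytes as windows-1252 (codec name cp1252 in Python). A short sketch:

```python
import unicodedata

replacement_bytes = "\ufffd".encode("utf-8")
print(replacement_bytes.hex(" ").upper())  # EF BF BD

# Reading the same three bytes with the wrong charset yields three characters.
misread = replacement_bytes.decode("cp1252")
for ch in misread:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The three resulting characters are what a windows-1252 message part shows wherever an already-replaced character was quoted.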

So I can't see any fault here. Somehow Donald's message got corrupted so the reply is also corrupted. As we know, some servers corrupt messages.

To be clear: The e-mail in the sent boxes is bad since it contains a reply to a badly received message which got corrupted before being received.

BTW, the provider you mentioned, I won't disclose the name here, is AT&T and a member of the club of the incapable :-(

The reply was picked from their customer's sent box, not from my inbox or anyone else's.

And toggling mail.strictly_mime to true corrected the issue.

The worst offender was the one using gmail hosting.

And it wasn't just one of their customers receiving eMail from them that complained, it was all of them. The gmail customer sent the same message, or any message, to me on my zoho account and the problem reproduced. After toggling, the issue disappeared on new eMail.

I have mentioned this before, but how about making true the default for mail.strictly_mime? That would solve the problem.

> Were you able to test if toggling mail.strictly_mime to true corrects the issue with 1435903?

No, we won't toggle that pref to true. I don't need to test anything, I know what's going on. I've been on the project for a while and I'm actually the guy who fixed bazillions of encoding errors in TB. To my knowledge, there is not a single encoding error left and your case is clearly the case of a mail server not handling 8bit characters correctly.

So far, we identified Yahoo, AT&T, Verizon and a few more as culprits.

I've just tried sending via the Gmail SMTP server both in UTF-8 and windows-1252 and I received the äöü OK in both cases.

Sorry, I don't believe what you write and the case is closed now. The summary is that Yahoo and friends corrupt 8bit characters and once that has happened, it's downhill and you can construct many cases of strange behaviour from there.

Hi Jorg,

Maybe I misunderstand how the Sent box works. I thought it was a straight copy from Thunderbird. In which case, this is a Thunderbird issue. If it is a loopback from the SMTP server, then I can see the mail server being the culprit.

Is there some reason not to make true the default for mail.strictly_mime?

And maybe I misunderstand mail servers too. I thought they just relayed what you wrote, not processed and altered your stuff.

-T

> Maybe I misunderstand how the Sent box works. I thought it was a straight copy from Thunderbird. In which case, this is a Thunderbird issue. If it is a loopback from the SMTP server, then I can see the mail server being the culprit.

You still don't understand. Yes, the Sent box is a straight copy of what TB sent out. Therefore, in the two messages you sent, the first one, which purely contained content generated in TB, was fine. The second one, which contained content in a reply that had already been mangled by the server, of course contained the mangled content in the reply. Do I really need to go through it again?

TB user 1 types multiple spaces and thus TB sends out C2 A0 or just A0. The sent message is totally fine in the sent box. The mail server now corrupts the message which is then received by TB user 2. TB user 2 sees the message corrupted and replies with the corrupted part. Now the corrupted part is in the sent box of user 2.

> Is there some reason not to make true the default for mail.strictly_mime?

Yes, messages are bigger and all modern mail servers accept 8bit encodings, so the 7bit QP encoding is not necessary. Besides, the mail server is queried for its 8bit capability and, last I checked, the Yahoo servers confirm that capability only to trample over 8bit characters later on. You should really not waste your (and my) energy here; get in contact with your provider.
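The capability query mentioned here is the ESMTP EHLO exchange, in which a server that accepts 8bit bodies lists the 8BITMIME keyword. A minimal Python sketch that checks an EHLO reply for it; the function name and the sample reply (host name, SIZE value) are invented for illustration:

```python
def advertises_8bitmime(ehlo_response: str) -> bool:
    """Return True if a multi-line EHLO reply lists the 8BITMIME extension."""
    for line in ehlo_response.splitlines():
        # Reply lines look like "250-KEYWORD params"; the last one uses "250 ".
        if len(line) >= 4 and line[:3] == "250":
            keyword = line[4:].split(" ", 1)[0]
            if keyword.upper() == "8BITMIME":
                return True
    return False

# Invented reply, shaped like a typical ESMTP server greeting.
sample = (
    "250-mail.example.com\r\n"
    "250-SIZE 41943040\r\n"
    "250-8BITMIME\r\n"
    "250 PIPELINING"
)
print(advertises_8bitmime(sample))
```

Against a live server, `smtplib.SMTP.has_extn("8bitmime")` after `ehlo()` gives the same answer. Either way the check only tells you what the server claims, not whether it later leaves 8bit bytes intact.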

> And maybe I misunderstand mail servers too. I thought they just relayed what you wrote, not processed and altered your stuff.

Totally right. Mail servers should 100% relay what they were given, apart from adding headers and perhaps stripping parts which are considered dangerous. Text parts of the message MUST be left unaltered. But hey, the club of the incapable infringes that rule and mangles e-mail bodies.

More information here:
https://wiki.mozilla.org/User:Jorgk/8-bit_bytes_and_e-mail_corruption_at_Verizon,_Yahoo,_etc. <--- EDIT: Not working, see next comment.
This will be published as a SUMO article soon.

Damn, the URL has a dot at the end and it's not linkified. So try this:
https://wiki.mozilla.org/User:Jorgk/8-bit_bytes_and_e-mail_corruption_at_Verizon,_Yahoo,_etc.
