Open Bug 1435903 Opened 2 years ago Updated 7 days ago

E-mail corrupted by Yahoo, Verizon, BellSouth, etc. servers. Was: Use `CTE: 8bit` only when the server is advertising the 8BITMIME support

Categories

(MailNews Core :: Networking: SMTP, enhancement)

People

(Reporter: jorgk, Unassigned)

User Story

Initially this bug was filed since Thunderbird was sending 8bit encoded mail to Yahoo, Verizon, BellSouth, etc. servers.

Whilst that was incorrect and could still be fixed, the situation has changed. Those servers now advertise 8BITMIME capability, see comment #28, yet can't handle anything but UTF-8.

Second complaint at Yahoo in Oct. 2018:
https://forums.yahoo.net/t5/Sending/Yahoo-SMTP-server-corrupting-e-mail-in-windwos-1252-despite/td-p/556011

Workaround for affected users:
1) Always use UTF-8 for outgoing e-mail
   Tools > Options, Display, Formatting tab, Advanced button, Fonts & Encodings, Text Encoding, Outgoing Mail.
   Also reply in UTF-8:
   Tools > Options, Display, Formatting tab, Advanced button, Fonts & Encodings, Text Encoding, click: When possible, use the default text encoding in replies.
2) Set pref mail.strictly_mime to true
   Tools > Options, Advanced, General tab, Config Editor, paste "mail.strictly_mime".

You need to use option 2 if UTF-8 sending through Yahoo and friends doesn't work either, which has been the case at times.
+++ This bug was initially created as a clone of Bug #1427636 +++

After a long discussion in bug 1427636 we noticed that Yahoo SMTP servers answer EHLO with
250-8 BITMIME
(amongst other things).

The correct reply is 8BITMIME.

We also noticed that those servers can process UTF-8 when sent with CTE: 8bit but not windows-1252. In fact, all non-ASCII characters get replaced with the Unicode replacement character leading to a garbled display of �.

When the server doesn't advertise 8bit support correctly, we shouldn't send 8bit.
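
To illustrate the distinction, here is a minimal Python sketch of a strict EHLO keyword check. It is not Thunderbird's code: the function names and the host name `smtp.example.invalid` are made up for illustration, and only the `250-8 BITMIME` line mirrors what the Yahoo servers were observed to send.

```python
def ehlo_keywords(reply):
    """Return the set of upper-cased EHLO keywords in a multi-line 250 reply."""
    keywords = set()
    # The first line carries the server greeting, not an extension keyword.
    for line in reply.strip().splitlines()[1:]:
        if not line.startswith("250") or len(line) < 5:
            continue
        # Skip the "250-" or "250 " prefix; the keyword is the first token.
        tokens = line[4:].split()
        if tokens:
            keywords.add(tokens[0].upper())
    return keywords


def advertises_8bitmime(reply):
    """True only if the exact keyword 8BITMIME was advertised."""
    return "8BITMIME" in ehlo_keywords(reply)


# Hypothetical replies; only the "8 BITMIME" line mirrors what Yahoo sent.
BROKEN_REPLY = "250-smtp.example.invalid Hello\n250-8 BITMIME\n250 STARTTLS"
COMPLIANT_REPLY = "250-smtp.example.invalid Hello\n250-8BITMIME\n250 STARTTLS"

print(advertises_8bitmime(BROKEN_REPLY))     # False
print(advertises_8bitmime(COMPLIANT_REPLY))  # True
```

With a strict parser like this, "8 BITMIME" is read as the keyword "8" with a parameter, so the capability counts as not advertised.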
Duplicate of this bug: 1435536
Duplicate of this bug: 1427636
See Also: → 1032302
Inspired by bug 1427636 comment #43 I sent two messages via the Yahoo SMTP server, both with ä or similar in windows-1252, once with CTE: base64 and the other with CTE: QP. Both were received correctly.
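
A plausible reason both variants survive is that base64 and QP turn every 8-bit byte into ASCII before the message leaves the client, so the server never sees a byte it could misread as UTF-8. The sketch below shows this with Python's standard `email` package; it is only an illustration and not how Thunderbird builds messages. The helper name `build_message` is made up.

```python
from email.message import EmailMessage


def build_message(text, cte):
    """Build a windows-1252 text message with the given transfer encoding."""
    msg = EmailMessage()
    msg.set_content(text, charset="windows-1252", cte=cte)
    return msg


qp_msg = build_message("ä", "quoted-printable")
b64_msg = build_message("ä", "base64")

for msg in (qp_msg, b64_msg):
    wire = msg.as_bytes()
    # Both serialisations contain ASCII bytes only.
    print(msg["Content-Transfer-Encoding"], wire.isascii())
```

In the QP variant the windows-1252 byte E4 travels as the three ASCII characters `=E4`.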

So reading bug 1032302 comment #1 I'm confused now. I'm also confused by the discussion in bug 1379096 where it was requested to set mail.strictly_mime to true by default, which would also fix this bug here.
No longer blocks: 1427636
See Also: → 1379096
I think the following is fair to say:
- Yahoo's servers send a non-compliant reply "8 BITMIME".
- It appears that they try to advertise "8BITMIME" which is confirmed by
  handling UTF-8 with CTE: 8bit correctly.
- We ship out "8BITMIME", and the servers don't issue an error, like the one in bug 1379096.

So I don't think TB's behaviour is unreasonable, but the behaviour of the Yahoo servers is unreasonable.

100% strictly speaking, we should ignore the non-compliant response and not send 8bit, but that's what the preference is for.

So what's the way forward here?
https://www.limilabs.com/blog/yahoo-smtp-8-bitmime-bug reckons it's a Yahoo bug to advertise the (faulty) 8bit capability incorrectly.
(In reply to Jorg K (GMT+1) from comment #4)
> So what's the way forward here?

Not a clear one, I'm afraid...

(1) We could remove whitespace chars from the advertisement tokens, thus "8 BITMIME" == "8BITMIME";
    if Yahoo works correctly for Win1252 with that, we are fine, reasonable effort. However, that's
    adding an uncertainty by interpreting non-standard responses as standard-compliant ones.

(2) We could downgrade 8bit messages to quoted-printable if "8BITMIME" isn't advertised, regardless
    of the strictly_mime pref:

    * sounds reasonable at first glance, but
    * how do we figure out what to send where?
       - only send qp without 8BITMIME if no 8BITMIME was advertised, 8bit to those which do?
       - or, send qp to /all/ if /any/ of the involved MSAs doesn't advertise 8BITMIME?
       - what will happen if the originating MSA accepts 8bit but an MTA down the road doesn't?
    * either way, problem is that - while strictly_mime is known in advance - it's only known at the
      time of sending the message (i.e., not when forming the message body) whether or not we can send
      a message in 8bit encoding. In the worst case, we need to either prepare two versions of the
      same message (one qp, the other 8bit), or have two passes over the recipients to figure out the
      "flavor" to be sent.
    * thus, implementation sounds expensive to satisfy a fringe case.

(3) Keep as is (=WONTFIX) and accept that some people have to toggle the qp pref manually.

Definitely, I wouldn't like setting strictly_mime to true by default; that's introducing a substantial overhead for encodings primarily or mostly based on 8bit characters (i.e., everything not using latin alphabets, and languages using latin characters with diacritics) for those 99.9% of the cases where 8bit /does/ work.
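
To put a rough number on that overhead, the following Python sketch compares the body size of the same text as raw 8bit bytes, as quoted-printable, and as base64. It uses the standard `quopri` and `base64` modules, ignores line wrapping and headers, and the sample strings and the function name `encoded_sizes` are chosen only for illustration.

```python
import base64
import quopri


def encoded_sizes(text, charset="utf-8"):
    """Return the body size in bytes for each transfer encoding."""
    raw = text.encode(charset)
    return {
        "8bit": len(raw),
        "quoted-printable": len(quopri.encodestring(raw)),
        "base64": len(base64.b64encode(raw)),
    }


# Pure ASCII: QP adds nothing.
print(encoded_sizes("Hello"))
# Cyrillic in UTF-8: every byte is non-ASCII, so QP triples the size.
print(encoded_sizes("Привет"))
```

For text that is mostly ASCII with a few accented letters the QP growth is small; for text where nearly every byte is non-ASCII it approaches a factor of three, while base64 stays near a factor of 4/3.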
(In reply to rsx11m from comment #6)
>        - what will happen if the originating MSA accepts 8bit but an MTA down the road doesn't?

See bug 1379096 comment #10 for this case, the proposed (1) or (2) solutions wouldn't help here.
Re. (1): So now since we don't detect "8 BITMIME", we also don't send "8BITMIME"? So let me try that.
OK, detecting "8 BITMIME" and sending "8BITMIME" or "8 BITMIME" doesn't make a difference.

BTW, I forgot to mention (from 1427636 comment #37) "... the server changes the mails? This is an absolute no-go" in my comment #4. :-(
I finally got around to reading up on the matter. So bug 1032302 (https://hg.mozilla.org/comm-central/rev/eda5c42f3b4cac18b2334b7f4fc86d194f797571) really only ships out "8BITMIME" if the server advertised it and mail.strictly_mime isn't set. It doesn't change anything in message processing.

So in the case of Yahoo, we don't detect the capability and thus don't send out "8BITMIME", but we ship 8bit anyway.

Setting mail.strictly_mime has the effect that the message will be prepared in QP:
https://dxr.mozilla.org/comm-central/rev/63f09d10244cd7100cb5955a17993160fa180937/mailnews/compose/src/nsMsgSend.cpp#3080
  mime_use_quoted_printable_p = strictly_mime;
So having mail.strictly_mime at true by default doesn't appear attractive.

Since we prepare the message *before* we contact the receiving SMTP server, it would be a lot of change to rebuild the message at that point, see comment #2 "implementation sounds expensive", also see 1032302 comment #2 "possibly downgrading seems an overkill".

My apologies to those who worked in this area before and already knew all that ;-)

Just for clarification of comment #2:
> sent qp to /all/ if /any/ of the involved MSAs doesn't advertise 8BITMIME?
If I understand it correctly, TB only talks to one server, even if sending to multiple recipients. So as far as I can tell, there is no "multiple" problem. So we could extensively recode the message if it turns out that the server might not accept it.

> what will happen if the originating MSA accepts 8bit but an MTA down the road doesn't
That's not our problem to fix. We talk to the outgoing SMTP server and only have to satisfy it.

Factually, this bug will be a WONTFIX, since we don't have resources to redo the compose/send pipeline, even if it were decided that it was desirable.

It would be much more useful to get global mail provider Yahoo to 1) advertise its server properties correctly 2) fulfil its own advertising 3) not mess with content of messages.

I know that Masatoshi-san disagrees.
(In reply to rsx11m from comment #7)
> (In reply to rsx11m from comment #6)
> >        - what will happen if the originating MSA accepts 8bit but an MTA down the road doesn't?
> 
> See bug 1379096 comment #10 for this case, the proposed (1) or (2) solutions
> wouldn't help here.

That should not be a problem. The server could then change the transport encoding to qp or base64.
But it should of course not change the charset.
(In reply to rsx11m from comment #6)

> (2) We could downgrade 8bit messages to quoted-printable if "8BITMIME" isn't
> advertised, regardless
>     of the strictly_mime pref:

>     * either way, problem is that - while strictly_mime is known in advance
> - it's only known at the
>       time of sending the message (i.e., not when forming the message body)
> whether or not we can send

We could query the server for "8BITMIME" when setting up the account and store the result per server in prefs.js.
Each time we send a new e-mail, we update the flag.
But there are also servers that support 8-bit without any problem, without announcing "8BITMIME".

German "GMX" for example.
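
A rough Python sketch of that idea, purely as an illustration and not Thunderbird code: the class and method names are invented, an in-memory dict stands in for prefs.js, and falling back to quoted-printable for servers not seen yet is an assumption of this sketch.

```python
class ServerCapabilityCache:
    """Remember per outgoing server whether 8BITMIME was advertised."""

    def __init__(self):
        self._supports_8bit = {}

    def update(self, server, advertised_keywords):
        """Store the result of the most recent EHLO exchange."""
        keywords = {k.upper() for k in advertised_keywords}
        self._supports_8bit[server] = "8BITMIME" in keywords

    def encoding_for_next_message(self, server):
        """Decide the transfer encoding before the message body is built."""
        # Assumption: servers we have no record of are treated as 7-bit only.
        if self._supports_8bit.get(server, False):
            return "8bit"
        return "quoted-printable"


cache = ServerCapabilityCache()
cache.update("smtp.mail.yahoo.com", ["PIPELINING", "8 BITMIME", "STARTTLS"])
print(cache.encoding_for_next_message("smtp.mail.yahoo.com"))
```

Because the flag is known before composing, the message only has to be built once. A server like GMX that handles 8bit without announcing it would simply get quoted-printable under this sketch.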
(In reply to Alfred Peters from comment #12)
> We could query the server for "8BITMIME" when setting up the account and
> store the result server-related in the prefs.js.
> Each time we send a new e-mail, we update the flag.
Nice idea.

Meanwhile I posted this:
https://forums.yahoo.net/t5/Sending/Yahoo-SMTP-server-advertising-8bit-capability-incorrectly-and/m-p/449275
Good morning.  I am a user that posted this bug.  Today the behavior changed.  I am now getting strings of "?" instead of the other funky characters.  I checked on the post over at yahoo forums noted above and they show it as solved although there is a post after that disputes that.
It is not surprising at all. They can change their behavior about 8-bit bytes as they like at any time because they say they don't support 8-bit bytes. Do not send 8-bit bytes unless the server explicitly says so, period.
(In reply to Masatoshi Kimura [:emk] from comment #16)
> It is not surprising at all. They can change their behavior about 8-bit
> bytes as they like at any time because they say they don't support 8-bit
> bytes. Do not send 8-bit bytes unless the server explicitly says so, period.

To be clear, this comment is not against the reporter, but against Thunderbird developers. You (users) can freely compose your messages using 8-bit bytes. Thunderbird should encode 8-bit bytes as appropriate, but it does not at the moment. This is a bug of Thunderbird.
(In reply to kricks from comment #15)
> Good morning.  I am a user that posted this bug.  Today the behavior
> changed.  I am now getting strings of "?" instead of the other funky
> characters.  I checked on the post over at yahoo forums noted above and they
> show it as solved although there is a post after that disputes that.
I tested this yesterday and I got the "funky" characters. I'm the one who reported it on the Yahoo forum, and yes, it's obviously not solved since now you get ???.

Masatoshi-san, as was explained a few times, by the time we get the SMTP response, it's too late to recode the message.

The work-around is to set the preference mail.strictly_mime to true.
Duplicate of this bug: 1439648
Duplicate of this bug: 1440090
Duplicate of this bug: 1440264
Duplicate of this bug: 1440245
Duplicate of this bug: 1440526
Duplicate of this bug: 1440017
Duplicate of this bug: 1443737
Duplicate of this bug: 1452433
As of today the problem became much worse, and the replied-to messages are on the verge of complete illegibility. I hope that this is a temporary state and that somebody has finally had a chance to attend to this problem.
Workaround: Set pref mail.strictly_mime to true. Otherwise complain to Yahoo:
https://forums.yahoo.net/t5/Sending/Yahoo-SMTP-server-advertising-8bit-capability-incorrectly-and/

There's also an interesting solution idea in comment #12 here.

I've just tried Yahoo's servers again and they actually fixed the server response now:
telnet smtp.mail.yahoo.com 25
EHLO Yahoo
gives:

250-smtp408.mail.ir2.yahoo.com Hello Yahoo [85.181.255.143])
250-PIPELINING
250-ENHANCEDSTATUSCODES
250-8BITMIME
250-SIZE 41697280
250 STARTTLS

Before it was "8 BITMIME". So Thunderbird will now ship 8bit to the server and if that doesn't work, it's 100% Yahoo's fault.
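
The same check can be scripted instead of typed into telnet. This is a small sketch using Python's standard `smtplib`; the function names are made up, and the port and timeout defaults are assumptions that just mirror the telnet session above.

```python
import smtplib


def advertises_8bitmime(smtp):
    """Send EHLO on an open connection and look for the 8BITMIME keyword."""
    smtp.ehlo()
    return smtp.has_extn("8bitmime")


def check_server(host, port=25, timeout=10):
    """Connect to host and report whether it advertises 8BITMIME."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        return advertises_8bitmime(smtp)
```

Calling `check_server("smtp.mail.yahoo.com")` needs network access to that host. Note that this only tells you what the server advertises, not whether it passes 8bit content through unaltered.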
Sorry, wrong link:
https://forums.yahoo.net/t5/Sending/Yahoo-SMTP-server-advertising-8bit-capability-incorrectly-and/m-p/509636#M50602

Tested again at Yahoo and still not working after three months :-(
Duplicate of this bug: 1461059
Duplicate of this bug: 1514523
User Story: (updated)
Summary: Use `CTE: 8bit` only when the server is advertising the 8BITMIME support → E-mail corrupted by Yahoo, Verizon, BellSouth, etc. servers. Was: Use `CTE: 8bit` only when the server is advertising the 8BITMIME support
Duplicate of this bug: 1544293

After running the recommended patch of Internet Properties - Advanced - International (in Windows 7 Professional) to set the UTF-8 query strings to the ON state, all problems disappeared for about one year. At the end of last week a new problem appeared which is very similar to the old one. This time, each incorrectly represented character is replaced with a pair of question marks, which renders the e-mailed document totally incomprehensible and turns all statements into questions.

As in the past, it applies to the majority of the Latin characters (A-Z, a-z) with diacritical signs and to the formatting sequences, e.g. in the "Paragraph" mode when it detects the two spaces delimiting the sentences in one paragraph, or two line feeds delimiting the spaces between paragraphs.

The problem became very annoying because now it has an impact on English messages as well, when the symbols like degree, apostrophe, etc., are detected and replaced with a pair of question marks.

It appears that somebody started working on this outstanding problem, and the users are delegated the task of testing. Does this situation require some other change of Internet Properties?

Flags: needinfo?(jorgk)

I don't understand "After running the recommended patch of Internet Properties - Advanced - International (in Windows 7 Professional) to set the UTF-8 query strings to the ON state, ...".

Yahoo messed up in two ways. They have messed up windows-1252 encoded messages all the time. And then there were times when even UTF-8 encoded messages were messed up.

You can work around the former problem by always sending in UTF-8: In Thunderbird:
Tools > Options, Display, General tab, Advances button, Fonts & Encodings, Text Encoding, Outgoing Mail.
EDIT: Correction:
Tools > Options, Display, Formatting tab, Advanced button, Fonts & Encodings, Text Encoding, Outgoing Mail.

If UTF-8 is also messed up, you can force TB to send QP-encoded mail by setting pref mail.strictly_mime to true:
Tools > Options, Advanced, General tab, Config Editor, paste "mail.strictly_mime".

That's mostly written into the "user story" of the bug.

User Story: (updated)
Flags: needinfo?(jorgk)

Bug 1435903
In T Bird, go to TOOLS:
TOOLS, OPTIONS, ADVANCED, GENERAL, CONFIG EDITOR
in Search type Mail.Strictly
Look for Mail.Strictly_MIME and toggle from False to TRUE. This fixes it.

mail.strictly_mime; Toggle from FALSE to true.

This works fine... did it on four MS PC's and even on my MAC. All worked fine.
BH

Thank you both, Jorg and Bob.

The problem is fixed and tested on e-mails overseas and back, it was caused by "mail.strictly_mime", which became FALSE apparently at the end of the last week. Until that time, my e-mails were working fine, and I did not make any changes until today to restore the TRUE value. Maybe some recent Thunderbird upgrades changed this status which got me into this frustrating situation.

I wanted to check the status of the "Text Encoding of Outgoing mail" as well, but I had trouble finding it. I think that there is a mistake in the User Story (or it was written for some older version of Thunderbird); it should be:
Tools > Options, Display, Formatting tab, Advance button, Fonts & Encoding, Text Encoding, Outgoing Mail.
There is no General tab on the specified level. But this one was set correctly to Unicode(UTF-8).

Many thanks for your fast support. It is really appreciated.

User Story: (updated)
Duplicate of this bug: 1549553

If the workaround (per user story) is to "Always use UTF-8 for outgoing e-mail", I'm confused on exactly which users are affected. That's on by default for en-US, and even for Japanese now since 2017.

Is it only for replies? We might want to also set mailnews.reply_in_default_charset true

There were (and still are) times where Yahoo even corrupt{ed/s} non-ASCII 8bit UTF-8 characters, see for example bug 1549553 (I don't have time to scan all the others now, but it happened before). In that bug one NBSP which we transmit as UTF-8, C2 A0 (2 bytes) is replaced by ??. So then only mail.strictly_mime works. I wonder how CJK users are affected. They can't send at all in UTF-8?

That said, there are many locales using ISO-8859-NN:
https://dxr.mozilla.org/l10n-central/search?q=mailnews.send_default_charset&redirect=false

User Story: (updated)

Yahoo status on 18th May 2019: UTF-8 working, Western (windows-1252) corrupted.

I haven't seen ?? in my UTF-8 emails that I've sent for a couple of weeks now — about the time I had added an apologetic note to the "signature" portion of my email "template." Is it possible that Verizon has updated the AOL and Yahoo servers they were saddled with to more acceptable standards? (I didn't even get a chance to file a complaint.)

My provider is Verizon (AOL mail server), so the problem has been especially severe for me. But it's disappearing, or is gone... not sure yet, but will f/u here in a month.

Duplicate of this bug: 1568412

This message was sent to a Yahoo responsible on 6th June 2019:

===

Description of Problem:

To encode non-ASCII text, for example German umlauts, äöü, French or Spanish accents áéó, Greek, Hebrew, Korean, Japanese, Chinese, Thai, Arabic, etc., a variety of encodings can be used.

The universal encoding UTF-8 can encode absolutely all characters in all languages, but traditionally, other encodings are used regionally, for example windows-1252 (very similar to ISO-8859-1), also called "Western", is used for "Latin"-derived languages, like German, French or Spanish.

The problem is that the Yahoo service only supports UTF-8 and treats all other encodings as if they were UTF-8 leading to corrupt messages.

For example the letter ä is encoded in UTF-8 as two bytes (hexadecimal): C3 A4. In windows-1252 the letter ä is encoded as one byte (hexadecimal) E4. The byte E4 is not a valid character if interpreted as UTF-8.

The Yahoo service does this: Modifies message bodies and replaces all characters that are not valid UTF-8 with the UTF-8 "illegal character" character (replacement character, hexadecimal EF BF BD), which consists of three bytes. It ignores the fact that messages carry the information of which character set/encoding is used in a message header. Furthermore, no mail service should ever modify message content, it should pass the message data through unaltered. It can read and interpret the headers, but it should never modify anything other than adding its own headers where required.

In an example this happens:

A Thunderbird user sends the word "Hägar" in windows-1252 with a message header indicating the use of that encoding. The Yahoo service modifies the word and replaces the ä (E4) with three bytes which represent the "illegal character" in UTF-8, although the message is not encoded in UTF-8.

The recipient sees something like "H�gar" since the recipient's e-mail client does interpret the encoding header correctly and interprets the three bytes of the UTF-8 replacement character as "�".

There are other aspects of the problem: Even English speakers using strictly ASCII text can be affected if they send multiple so-called no-break spaces (NBSP, in windows-1252 (hexadecimal) A0). Those A0 bytes which are valid in windows-1252 are invalid if interpreted as UTF-8; Yahoo replaces them and the recipient sees "�" where the sender had intended spaces.
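
The byte-level effect described above can be reproduced with a few lines of Python. This is only a model of the observed behaviour, not Yahoo's actual code, and the function name `rewrite_as_if_utf8` is made up for illustration.

```python
def rewrite_as_if_utf8(body):
    """Replace every byte sequence that is not valid UTF-8 with U+FFFD."""
    return body.decode("utf-8", errors="replace").encode("utf-8")


sent = "Hägar".encode("windows-1252")   # b'H\xe4gar'
received = rewrite_as_if_utf8(sent)     # b'H\xef\xbf\xbdgar'
print(sent.hex(" "), "->", received.hex(" "))

# Two no-break spaces (A0 A0 in windows-1252) are each replaced as well.
nbsp_sent = "two\u00a0\u00a0spaces".encode("windows-1252")
nbsp_received = rewrite_as_if_utf8(nbsp_sent)
print(nbsp_sent.hex(" "), "->", nbsp_received.hex(" "))

# Text that is already valid UTF-8 passes through unchanged.
utf8_sent = "Hägar".encode("utf-8")
print(rewrite_as_if_utf8(utf8_sent) == utf8_sent)
```

A recipient that reads the three bytes EF BF BD as UTF-8 shows the single replacement character; one that honours the windows-1252 label shows them as the three characters "ï¿½". Either way the original letter cannot be recovered.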

In short: The Yahoo service actively corrupts e-mail and so far we've had 15 bug reports in Thunderbird. The issue was also reported at the Yahoo support forums twice since February 2018, but Yahoo never fixed it, and now the forums appear to have been shut down.

Duplicate of this bug: 1568392

More information here:
https://wiki.mozilla.org/User:Jorgk/8-bit_bytes_and_e-mail_corruption_at_Verizon,_Yahoo,_etc. <--- EDIT: Not working, see next comment.
This will be published as a SUMO article soon.

Damn, the URL has a dot at the end and it's not linkified. So try this:
https://wiki.mozilla.org/User:Jorgk/8-bit_bytes_and_e-mail_corruption_at_Verizon,_Yahoo,_etc.

I understand the reluctance to make quoted-printable (mail.strictly_mime=true) the default because it introduces overhead, but the fact of the matter is that the overhead is minimal and pretty much invisible to users, whereas corrupted emails are very much visible to users. I would therefore ask that we reconsider the idea of making mail.strictly_mime true by default, at least as a short-term solution until we have a better long-term solution.

One level up from that would be to implement the idea proposed above of querying the server for 8BITMIME and storing that for future emails, with one twist on the above idea: rather than assuming 8BITMIME is safe until the server says otherwise, assume 8BITMIME isn't safe until the server says it is. I.e., the default should be cautious and conservative.

If we're not going to do either above, then I'd argue for exposing the setting in the user preferences with documentation there rather than requiring it to be set in the configuration editor, so at least this is easier for users to fix when they encounter the issue.

The final idea I'd throw out for consideration is supporting a blacklist of domains which are known not to handle 8-bit MIME properly and use quoted-printable whenever sending to a domain on that blacklist. We could have a built-in blacklist of domains we know are problematic and allow the user to add additional domains. This would be in addition to the other solutions proposed above, not instead of them.
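
As a rough sketch of how such a list could drive the encoding choice (in Python, for illustration only): the names `BUILT_IN_BLOCKLIST`, `on_blocklist` and `choose_cte` are invented, `yahoo.com` is taken from the server name quoted earlier in this bug, and `example.net` is a placeholder for a user-added entry.

```python
# Assumed built-in entry; a real list would need to be curated.
BUILT_IN_BLOCKLIST = frozenset({"yahoo.com"})


def on_blocklist(domain, user_entries=()):
    """True if domain equals a listed domain or is a subdomain of one."""
    domain = domain.lower().rstrip(".")
    entries = set(BUILT_IN_BLOCKLIST) | {e.lower() for e in user_entries}
    return any(domain == e or domain.endswith("." + e) for e in entries)


def choose_cte(domain, user_entries=()):
    """Use quoted-printable for listed domains, 8bit otherwise."""
    if on_blocklist(domain, user_entries):
        return "quoted-printable"
    return "8bit"


print(choose_cte("smtp.mail.yahoo.com"))
print(choose_cte("mail.example.net", user_entries=["example.net"]))
print(choose_cte("example.org"))
```

Matching on the suffix with a leading dot keeps unrelated domains that merely end in the same letters off the list.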

Duplicate of this bug: 1570424
Duplicate of this bug: 1570910

@Jonathan Kamens:
(Comment 48)

Very well written and case very well presented. Thanks for pointing out the matter so clearly.
I second all you propose to the fullest extent as I'm suffering from the TB/yahoo-dilemma, too.

Greetings
Rosika

Duplicate of this bug: 1574007