Errant question marks in place of multiple spaces
Categories
(MailNews Core :: Composition, defect)
Tracking
(Not tracked)
People
(Reporter: raysatiro, Unassigned)
Details
Attachments
(2 files)
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0
Steps to reproduce:
I sent an e-mail in Thunderbird 68.3.0 that had this text:
{
curl_version_info_data *info = curl_version_info(CURLVERSION_NOW);
printf("HTTP2 is %s\n", ((info->features & CURL_VERSION_HTTP2) ? "supported" : "NOT supported"));
printf("%s\n", curl_version());
}
Actual results:
Thunderbird instead sent the e-mail with this text:
?? {
?????? curl_version_info_data *info = curl_version_info(CURLVERSION_NOW);
?????? printf("HTTP2 is %s\n", ((info->features & CURL_VERSION_HTTP2) ?
"supported" : "NOT supported"));
?????? printf("%s\n", curl_version());
?? }
Expected results:
The space characters should not have been converted to question marks. This has happened once or twice to me at random. I couldn't reproduce it sending the exact same text a second time. I hesitate to say this is bug 1435903 since it's arbitrary and I'm not sending foreign characters, just ascii.
Config Editor > mail.strictly_mime: False.
Composition > Configure text format behavior > Send messages as plain text if possible: On.
Composition > Configure text format behavior > When sending as HTML if recipient cannot receive HTML: Send as both "plain text and HTML" is selected.
I am using Yahoo SMTP servers.
Reporter
Comment 1•4 years ago
Looks like I had logging enabled so I've attached an excerpt of the log that includes the SMTP transfer and the IMAP save to outbox. My mozilla log level is set to IMAP:5,SMTP:5,POP3:5,timestamp,sync. The SMTP body is not present in the log (why?) but the IMAP body is. The log shows spaces were appended or replaced by UTF-8 encoded non-breaking spaces.
Example:
{
curl_version_info_data *info = curl_version_info(CURLVERSION_NOW);
There are 2 spaces before the brace, or in other words 2020, but it was saved as 20C2A020. There are 4 spaces before curl, or in other words 20202020, but it was saved as 20C2A0C2A0C2A020.
Also, the e-mail was sent to a mailing list and is archived at https://curl.haxx.se/mail/lib-2019-12/0051.html
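The hex sequences quoted in this comment can be reproduced with a short Python sketch. It only illustrates the byte values (0x20 for a space, 0xC2 0xA0 for a UTF-8 encoded non-breaking space); the helper name `hex_bytes` is made up for this example and is not part of Thunderbird.

```python
NBSP = "\u00a0"  # non-breaking space, U+00A0


def hex_bytes(text: str) -> str:
    """Return the UTF-8 encoding of text as an uppercase hex string."""
    return text.encode("utf-8").hex().upper()


# Two plain spaces, as typed before the brace.
print(hex_bytes("  "))                   # 2020
# What the log shows instead: space, NBSP, space.
print(hex_bytes(" " + NBSP + " "))       # 20C2A020
# Four typed spaces were saved as space, three NBSPs, space.
print(hex_bytes(" " + NBSP * 3 + " "))   # 20C2A0C2A0C2A020
```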
Comment 2•4 years ago
https://wiki.mozilla.org/User:Jorgk/8-bit_bytes_and_e-mail_corruption_at_Verizon,_Yahoo,_etc.
and
Magnus, everyone on the project should know this bug by now.
Reporter
Comment 3•4 years ago
Thanks. Why does Thunderbird encode multiple spaces as non-breaking spaces for plaintext? Shouldn't that be for HTML only?
Comment 4•4 years ago
That's a jolly good question. It shouldn't and it doesn't as far as I can tell from a very simple test sending something to my local outbox. Are you sure the message didn't get sent as HTML? Or plaintext+HTML as per the settings described at the end of comment #0?
A very simple check you can do is to view the message source and then switch between Unicode and Western encodings. The UTF-8 bytes C2 A0 will show as "Â" followed by an NBSP when displayed as Western (windows-1252), so you can check whether they are there without a trace or hex editor.
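The same check can be done outside Thunderbird. The following Python sketch decodes one set of bytes twice, once as UTF-8 and once as windows-1252 (`cp1252` in Python); the sample bytes are an assumption based on the log excerpt in comment 1.

```python
# Bytes as they appear in the log: space, UTF-8 NBSP (C2 A0), space, brace.
raw = bytes.fromhex("20C2A0207B")

as_utf8 = raw.decode("utf-8")      # the NBSP is a single invisible character
as_western = raw.decode("cp1252")  # C2 becomes a visible "Â", A0 stays an NBSP

# Viewed as Western, the stray "Â" gives the NBSP away.
print("Â" in as_utf8)     # False
print("Â" in as_western)  # True
```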
I think you need to research this a bit yourself, maybe using a different outgoing server. All we know is that Yahoo messes up big time as per the article I quoted. And turning valid UTF-8 into ?? isn't so cool either.
Reporter
Comment 5•4 years ago
In the attached repro, yes, the e-mail was a reply sent to a mailing list and was sent as plaintext+HTML. To eliminate possible contamination I tried just now sending an e-mail to myself with the contents " test. foobar." typed out so only plaintext would be used. In the debug log I can see that the IMAP copy of the sent message that Thunderbird saves to the outbox has the double-space as "20C2A020". It was sent as plaintext only, and the plaintext headers are the same as in the e-mail from the original repro:
Content-Transfer-Encoding: 8bit
...
Content-Type: text/plain; charset=utf-8; format=flowed
Since Thunderbird does not record SMTP details such as the body (shouldn't it do that? my debug level is smtp:5) I cannot say for sure what is sent over SMTP. However, the content received in my inbox is also "20C2A020", and when I open it it looks normal.
I then did a similar test but appended some HTML so it would be sent as plaintext+HTML. The same thing happened and it looks normal.
Based on what I've experienced we can assume the bad conversion resulting in question marks is arbitrary. Without being able to record what raw SMTP data is sent I cannot be certain the ???? is a Yahoo bug but I think it's pretty likely.
Also it appears at least from my results that Thunderbird may be changing multiple spaces to nbsp in strictly plaintext UTF-8 messages. Again without being able to see the SMTP raw data I can't say for sure.
Reporter
Comment 6•4 years ago
hm bugzilla kind of messed that up, it's "<SPACE><SPACE>test. foobar." not "<SPACE>test. foobar." which is how it's shown above. Do you know if that is a bug in bugzilla that the HTML was able to collapse the space, or is that expected?
Comment 7•4 years ago
I see the dilemma of not being able to inspect the content that's going over the SMTP wire (unless you use Wireshark or some such). Why don't you send the message to the local outbox and inspect it there: "File > Send Later" or Ctrl+Shift+Enter. I'm 99.99995% sure that whatever is in the outbox will be shipped out 1:1.
As described in my Wiki article, Yahoo's behaviour can change on a daily basis, so if you get ?? one day, you might not get them the next. Typically we've seen ?? for messages that were delivered further by Yahoo's SMTP server, so typically recipients see the ??, not the Yahoo users.
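As a rough illustration of how the ?? could arise, the Python sketch below models a relay that replaces every 8-bit byte with a question mark. This is purely an assumption for illustration: the function `replace_8bit_bytes` is hypothetical, and nothing in this bug confirms that Yahoo's servers work this way. It does show why one NBSP (two bytes, C2 A0) would turn into exactly two question marks.

```python
def replace_8bit_bytes(raw: bytes) -> str:
    """Hypothetical relay behaviour: keep 7-bit ASCII, turn every other byte into '?'."""
    return "".join(chr(b) if b < 0x80 else "?" for b in raw)


# Space, NBSP, space, brace, as saved by Thunderbird.
print(replace_8bit_bytes(" \u00a0 {".encode("utf-8")))               # " ?? {"
# Space, three NBSPs, space, then the code line.
print(replace_8bit_bytes(" \u00a0\u00a0\u00a0 curl".encode("utf-8")))  # " ?????? curl"
```

The results line up with the text in comment 0, where the brace is preceded by two question marks and the indented lines by six.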
Another thing: You said that you sent two spaces but received three? 20 C2A0 20? Something is really fishy here.
Reporter
Comment 8•4 years ago
Thanks, I saved the e-mail to the outbox as send later and that shows that the spaces are converted by Thunderbird to nbsp. Regarding the number of nbsp I always see space count - 1 so for example
<SPACE><SPACE>foo ===> <SPACE><NBSP><SPACE>foo
<SPACE><SPACE><SPACE>foo ===> <SPACE><NBSP><NBSP><SPACE>foo
<SPACE><SPACE><SPACE><SPACE>foo ===> <SPACE><NBSP><NBSP><NBSP><SPACE>foo
and so on.
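The pattern listed above can be written down as a small Python model. It only mirrors what was observed in the outbox (a run of n spaces, n >= 2, becomes a space, n - 1 NBSPs, and a space); it is not taken from Thunderbird's source, and the name `observed_conversion` is made up for this sketch.

```python
import re

NBSP = "\u00a0"  # non-breaking space, U+00A0


def observed_conversion(text: str) -> str:
    """Model of the observed behaviour: n spaces -> space + (n - 1) NBSPs + space."""
    def expand(match: re.Match) -> str:
        n = len(match.group(0))
        return " " + NBSP * (n - 1) + " "

    # Only runs of two or more spaces are affected; single spaces stay as-is.
    return re.sub(r" {2,}", expand, text)


for typed in ("  foo", "   foo", "    foo"):
    converted = observed_conversion(typed)
    print(len(typed), "->", len(converted), converted.encode("utf-8").hex().upper())
```

Note that under this model the output is always one character longer than the input, which matches the "sent two, received three" oddity raised in comment 7.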