Closed Bug 1718119 Opened 1 year ago Closed 1 year ago

Daily: First message displayed after start looks garbled (non-ascii messages)

Categories

(MailNews Core :: Internationalization, defect, P1)

Thunderbird 91

Tracking

(thunderbird_esr78 unaffected, thunderbird91+ fixed)

RESOLVED FIXED
92 Branch
Tracking Status
thunderbird_esr78 --- unaffected
thunderbird91 + fixed

People

(Reporter: jose, Assigned: darktrojan)

References

(Regression)

Details

(Keywords: regression)

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36

Steps to reproduce:

I've come here from bug 1717523 comment #8. There I observed that when importing a message with an incorrectly declared charset, it's displayed "double-garbled" instead of "simple-garbled", that means, a message with a UTF-8 å (0xC3 0xA5) should be displayed as Ã¥ when interpreted as windows-1252, but it instead shows Ã¥, that's the four UTF-8 bytes for Ã¥ displayed as ISO-8859-1.

It turns out that any message displayed after TB started is displayed garbled, for example the attached one.

I think this is a fallout from bug 1717523 and bug 1713627.

Keywords: regression
Regressed by: 1717523, 1713627
Attached file utf-8.eml

Simple UTF-8 message.

Alice, could you check this out, please:
STR:
Import/drag the attached message to a local folder.
Click on on it. It displays garbled.
F8 to hide the preview, F8 to show it again. It doesn't display garbled any more.
Restart TB, click on the message. It displays garbled.
F8 to hide the preview, F8 to show it again. It doesn't display garbled any more.

Do you see this effect? Can you confirm the regression around bug 1717523 and bug 1713627?

Of course you can also try with a Japanese message.

Flags: needinfo?(alice0775)
No longer regressed by: 1717523
Regressed by: 1717523
Flags: needinfo?(hsivonen)

I forgot to thank you, Alice. Many thanks!! I was looking at bug 1717523 and one can get easily confused with too many problems at the same time. So it's good to have the confirmation and a Japanese example as well.

(In reply to José M. Muñoz from comment #0)

"double-garbled" instead of "simple-garbled", that means, a message with a UTF-8 å (0xC3 0xA5) should be displayed as Ã¥ when interpreted as windows-1252, but it instead shows Ã¥, that's the four UTF-8 bytes for Ã¥ displayed as ISO-8859-1.

That reminds me of Bug 1666146. I am not sure if there is any connection at all. It is, after all, a completely different code path. But the symptoms are similar.

Bug 1666146 occurs when TB reuses an expired news connection. Then it seems that the connection is not initialized completely or not in time.

Since you're asking for a déjà-vu: I have another one: Bug 1287336 (and duplicates), for steps to reproduce, see bug 1287336 comment #33.

Indeed. Want some more?
Bug 1485469 - circumflexed A ( Â ) is shown in message display that isn't in the message source
Bug 952861 - When switching views, message source is displayed in the preview pane

Confirming this even though I've only seen it in tests. Not sure why it doesn't happen to me as a user.

Status: UNCONFIRMED → NEW
Ever confirmed: true

Not sure why it doesn't happen to me as a user.

You need to use a message with non-ASCII content. Dragging attachment 9228820 [details] or attachment 9228871 [details] into a folder will show them garbled, if they are the first message you display in the session. They are also garbled after a restart or even if they are the first message received after a restart. Whilst the "western" message is still somewhat readable, the Japanese message is totally unreadable, so that will not be well received when shipped to a large number of users, Japan being a popular locale for TB.

Priority: -- → P1
Summary: Daily: First message displayed after start looks garbled → Daily: First message displayed after start looks garbled (non-ascii messages)

(In reply to José M. Muñoz from comment #3)

Henri, any idea why https://hg.mozilla.org/comm-central/rev/9941ab12f44ccfa75652bdc6e54d723a3e984b15 and https://hg.mozilla.org/mozilla-central/rev/640ad102b31631a639d0b0e875257e684fe10dea would have caused display issues on first display?

No idea why it's first display only.

However, I suppose on first display there is nothing that communicates the UTF-8ness of the input to the HTML parser. I suggest one of the following:

  • Making the channel that mailnews core uses for passing the converted-to-UTF-8 email to Gecko report UTF-8 as the channel charset.
  • Starting the converted-to-UTF-8 data with the UTF-8 BOM.
Flags: needinfo?(hsivonen)

Just to note, the changes here have caused the Conversations add-on to have a similar issue - except it sees it all the time (with non-ascii messages).

Looks like Henri was spot on, I've just seen this in the console:
The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol.

Likely that this removal should have been a replacement:
https://hg.mozilla.org/comm-central/rev/9941ab12f44ccfa75652bdc6e54d723a3e984b15#l1.12
mDocShell->SetCharset("UTF-8"_ns);, but what's the replacement?

(In reply to José M. Muñoz from comment #13)

Likely that this removal should have been a replacement:
https://hg.mozilla.org/comm-central/rev/9941ab12f44ccfa75652bdc6e54d723a3e984b15#l1.12
mDocShell->SetCharset("UTF-8"_ns);, but what's the replacement?

There is no direct replacement. The items Henri described in comment 10 will need to be implemented.

Henri's comment suggested "one of the following". So surely "Making the channel ... report UTF-8 as the channel charset" would be easier than converting the data to UTF-8 with BOM rather than UTF-8 (without BOM). So dropping in a "replacement" nsIChannel::SetContentCharset() in the right location will likely fix the issue.

https://searchfox.org/comm-central/search?q=setcontentcharset&path=mime&case=false&regexp=false

(In reply to José M. Muñoz from comment #15)

Henri's comment suggested "one of the following". So surely "Making the channel ... report UTF-8 as the channel charset" would be easier than converting the data to UTF-8 with BOM rather than UTF-8 (without BOM). So dropping in a "replacement" nsIChannel::SetContentCharset() in the right location will likely fix the issue.

https://searchfox.org/comm-central/search?q=setcontentcharset&path=mime&case=false&regexp=false

The BOM route is the most robust one and involves prepending 3 bytes to the data, so it should be very easy. The only downside is the expected buffer copy from the prepending operation.

Thanks for the discussion here. Until upstream finds a better solution, we're going with this fix which hacks into the GetContentCharset() implementations:
https://github.com/Betterbird/thunderbird-patches/blob/main/91/bugs/1718119-first-display-garbled.patch
Could be hard to find the right spot in the MIME code to prepend the UTF-8 BOM.

Sorry about the noise, we revised our first hacky version. Now it's down to this and two similar hunks:

 NS_IMETHODIMP nsMsgProtocol::GetContentCharset(nsACString& aContentCharset) {
-  aContentCharset.Assign(mCharset);
+  aContentCharset.Assign(mCharset.IsEmpty() ? "UTF-8"_ns : mCharset);
   return NS_OK;
 }
Assignee: nobody → geoff
Status: NEW → ASSIGNED

I considered changing the protocols too, but in the end decided not to.

Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/8cbcb7686639
When emitting messages for display, start with a UTF-8 BOM. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch

Comment on attachment 9231094 [details]
Bug 1718119 - When emitting messages for display, start with a UTF-8 BOM. r=mkmelin

[Approval Request Comment]
Regression caused by (bug #): bug 1717523 removed what was previously holding this together
User impact if declined: messages load in the wrong charset and look garbled
Testing completed (on c-c, etc.): on c-c, has a test that fails otherwise
Risk to taking this patch (and alternatives if risky): low

Attachment #9231094 - Flags: approval-comm-beta?

Comment on attachment 9231094 [details]
Bug 1718119 - When emitting messages for display, start with a UTF-8 BOM. r=mkmelin

[Triage Comment]
Approved for beta

Attachment #9231094 - Flags: approval-comm-beta? → approval-comm-beta+
Duplicate of this bug: 1720967
See Also: → 1666146
You need to log in before you can comment on or make changes to this bug.