Open Bug 1017768 Opened 10 years ago Updated 11 months ago

Garbled text in View Source - apparently ISO-8859-1 encoding is assumed

Categories

(MailNews Core :: Internationalization, defect)

Tracking

(Not tracked)

People

(Reporter: gingerbread_man, Unassigned)

Details

(Keywords: intl)

Attachments

(3 files)

Filed following bug 232867, comment 11.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Thunderbird/32.0a1
Built from https://hg.mozilla.org/mozilla-central/rev/e017c15325ae

1. Compose a new message. In the subject line and message body, enter something other than basic Latin characters, e.g. "líbily nápady".
2. Click the X button in the top right corner. Choose to save the draft.
3. Press Ctrl+U to bring up the View Source window. The text in the message body is garbled, and View → Character Encoding shows that Western is selected.

Also observed with SeaMonkey Mail & News, but not in the browser. It doesn't seem to make any difference whether or not the fallback character encoding actually is ISO-8859-1: ISO-8859-1 is used even if I change the default to UTF-8 (so the source still isn't shown in the correct encoding).

The message I'm looking at is UTF-8 encoded and declares this correctly in the charset attribute, so there is no HTML override. The main window correctly has "Unicode" selected in the menu; View Source, however, assumes "Western", and that is what I see (tested on Linux).
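For anyone trying to recognize the symptom: the garbling is what you get when the UTF-8 bytes of the message are decoded as ISO-8859-1 ("Western"), one byte per character. Below is a minimal C++ sketch of the conversion a viewer effectively performs under that assumption. It works on plain byte strings, uses nothing from the Mozilla code base, and the function name `latin1ToUtf8` is made up for illustration:

```cpp
#include <cassert>
#include <string>

// Reinterpret a byte string as ISO-8859-1 (each byte is one code point in
// U+0000..U+00FF) and re-encode it as UTF-8. For UTF-8 input this yields the
// mojibake a viewer shows when it wrongly assumes "Western".
std::string latin1ToUtf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out += static_cast<char>(b);  // ASCII is unchanged
        } else {
            // Code points U+0080..U+00FF need two UTF-8 bytes.
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

With the example text from comment #0, "nápady" comes out as "nÃ¡pady". In "líbily", the "í" turns into "Ã" followed by an invisible soft hyphen (U+00AD). windows-1252 only differs from ISO-8859-1 in the range 0x80 to 0x9F, which these bytes don't touch, so the result is the same for both.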

Related to old bug 111164, which somehow regressed? If so, it still seems to have worked when bug 232867 was resolved as WFM.
Component: Mail Window Front End → Internationalization
OS: Windows 7 → All
Product: Thunderbird → MailNews Core
Hardware: x86_64 → All
Possibly fallout from bug 943268?
Well, that's only a quarter megabyte of patch to go through... :-\

That's way too recent, though - I see this happening on a 29.0 build already.
Like bug 701694 and friends?
Not necessarily, given that this bug here is specific to View Source, but the underlying mechanism for disregarding the encoding of the source message may be the same.
Bug 597369 and Bug 701694 are about this:
the charset used in Forward/Edit As New or Edit Draft is not the charset of the mail; the charset of the previously shown mail is used.
What both cases have in common: the relevant mail has not been shown in the message pane yet.
This indicates that the "charset in the mail window" is set when a mail is loaded in the message pane.

In the View Source case, I don't know how to show the source without selecting the mail in the thread pane by clicking it, and without showing the mail in the message pane.
The only way I could do it was: hide the message pane, select a mail in the thread pane, press Ctrl+U.
That way, I could observe the same phenomenon (garbled display) as in Bug 701694 and Bug 597369.
  1. Open the message pane and select a UTF-8 or ISO-8859-1 mail in the thread pane. The mail is displayed in the message pane.
  2. Hide the message pane.
  3. Select an ISO-2022-JP mail in the thread pane.
  4. Ctrl+U => the charset used is always the one from step 1.
     If no mail has been shown in the message pane since the folder was opened, ISO-8859-1 appears to always be used.
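The behaviour in these steps can be modelled as a single piece of window state that only rendering in the message pane updates. This is a hypothetical C++ model: `MailWindowModel` and its methods are invented names, not Thunderbird code, and the initial ISO-8859-1 value is taken from the observation in step 4:

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the observed behaviour (not Thunderbird code).
// The window keeps one "current charset" that is only updated when a
// message is actually rendered in the message pane.
class MailWindowModel {
public:
    // Rendering a message in the pane records that message's charset.
    void showInMessagePane(const std::string& messageCharset) {
        currentCharset_ = messageCharset;
    }

    // Selecting a message while the pane is hidden renders nothing,
    // so the stored charset is left untouched.
    void selectWithPaneHidden(const std::string& /*messageCharset*/) {}

    // View Source (Ctrl+U) reuses the stored charset instead of the
    // selected message's own charset.
    std::string viewSourceCharset() const { return currentCharset_; }

private:
    std::string currentCharset_ = "ISO-8859-1";  // default seen in step 4
};
```

Replaying steps 1 to 4 against this model (show a UTF-8 mail, hide the pane, select an ISO-2022-JP mail) makes `viewSourceCharset()` return "UTF-8", which matches the garbled result described above.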
I couldn't reproduce the problem with Ctrl+U without hiding the message pane.

(a) Forward/Edit As New: because the message is not selected in the thread pane on right-click, the message pane can be kept open.
(b) Edit Draft: double-clicking a draft mail involves selecting it in the thread pane, so hiding the message pane is needed.
(c) Ctrl+U: hiding the message pane is the simplest way to show the message source without displaying the mail in the message pane.
I can't find any differences between these cases.
(In reply to rsx11m from comment #1)
> Also observed with SeaMonkey Mail & News, but not in the browser. 

Was the relevant mail selected in the thread pane? Was the mail content shown in the message pane?
Yes to both. Perhaps noteworthy: this was a multipart message, so another aspect may come into play here. The individual message parts have their own charset defined but the overall message doesn't, and that may be what ultimately decides the charset used to view the source.
(In reply to rsx11m from comment #8)

Original case of comment #0 is for simple mail.
   1. Compose a mail in iso-8859-1.
   2. Because non-ascii text is typed, draft is saved in UTF-8 automatically.
   3. Because the saved draft has not been opened yet, the charset in the mail window is that of the last opened draft, or the default if no draft has been opened yet.
At this step, I don't know how to view the message source with Ctrl+U while keeping the message pane open and without showing the draft mail (there is no View Message Source in the context menu).
However, if the message pane is closed, I can easily reproduce the same problem as Bug 597369 and Bug 701694 with Ctrl+U.

Your case is Bug 716983, which was duped to Bug 715823, isn't it?
I think the phenomenon is the same in Bug 597369/Bug 701694 and in Bug 715823:
   the "last used charset in the mail window" is used for Forward/Edit As New/Edit Draft/View Message Source,
   but the data is not converted from the charset of the original data to the charset being used.
(In reply to WADA from comment #9)
> Original case of comment #0 is for simple mail.
>    1. Compose a mail in iso-8859-1.
>    2. Because non-ascii text is typed, draft is saved in UTF-8 automatically.

That's not what I did. The draft is saved as UTF-8 automatically because that's the default, and I didn't change that. It doesn't matter whether the content of the message is all Latin letters: it's still saved as UTF-8.
Updating this bug report to confirm that it is still relevant for Thunderbird 45.7.0.
I'm still seeing this behaviour in Thunderbird 52.2.1.
Still relevant also in 52.7.0 :)
Attached image inspector.png

DOM of "view source" document in the inspector.

Attached image contentdocument.png

Dump of content document in the console.

The issue is that the content document has the charset windows-1252 (so most "modern" UTF-8 messages will show a garbled source). We see no way to pass the charset to the browser; it appears to determine the charset from the content of the document. That means a meta tag with the charset would need to be added. However, it's not clear where the HTML for the message source is emitted, or how, for example, the stylesheet link to "viewsource.css" in the header is added. This must happen in M-C (mozilla-central) code, since there is no reference to it in C-C (comm-central) code:
https://searchfox.org/comm-central/search?q=viewsource.css&path=&case=false&regexp=false

Henri, do you have a suggestion?

Flags: needinfo?(hsivonen)

I suggest making the mailbox: channel for the message report the encoding that mailnews would use when displaying the message as the return value of GetContentCharset.
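To illustrate where such a value could come from, a first approximation is the charset parameter of the message's Content-Type header. The sketch below is a simplified, hypothetical helper: `charsetFromContentType` is not an existing mailnews function, and it ignores whitespace around `=`, RFC 2231 parameter continuations and other edge cases. As suggested in comment #8, the top-level header of a multipart message usually has no charset at all, so a real implementation would need the charset mailnews actually uses for display, not just the top-level header:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not a mailnews API): extract the charset parameter
// from a Content-Type header value such as `text/plain; charset="UTF-8"`.
// Returns an empty string when there is no charset parameter.
std::string charsetFromContentType(const std::string& contentType) {
    // Parameter names are case-insensitive, so search a lowercased copy.
    std::string lower = contentType;
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    std::size_t pos = lower.find("charset=");
    if (pos == std::string::npos) return "";
    pos += 8;  // skip past "charset="

    // The value may be a quoted string or a bare token.
    const bool quoted = pos < contentType.size() && contentType[pos] == '"';
    if (quoted) ++pos;

    std::size_t end = pos;
    while (end < contentType.size()) {
        const char c = contentType[end];
        if (quoted ? c == '"' : (c == ';' || c == ' ' || c == '\t')) break;
        ++end;
    }
    return contentType.substr(pos, end - pos);  // original case preserved
}
```

An empty result, as for a `multipart/alternative` top-level header, is exactly the case where the channel would still have nothing useful to report.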

Flags: needinfo?(hsivonen)

Thanks for the suggestion. Something similar was already a possible solution for bug 1718119, but the accepted solution there was to prepend the UTF-8 BOM (which proved to be beneficial for bug 1736344). We'll look into it.
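For reference, the BOM approach mentioned here amounts to the following minimal C++ sketch. It assumes the data is already UTF-8, and `withUtf8Bom` is an illustrative name, not the code from bug 1718119. Under the WHATWG Encoding Standard, a decoder that finds a BOM uses the encoding the BOM indicates regardless of any other label, which is why this works. It would not help for a message source stored in another charset such as ISO-2022-JP:

```cpp
#include <cassert>
#include <string>

// Prepend a UTF-8 byte order mark (EF BB BF) unless one is already present,
// so that a BOM-sniffing decoder treats the text as UTF-8.
std::string withUtf8Bom(const std::string& utf8Text) {
    static const std::string kBom = "\xEF\xBB\xBF";
    // compare() clamps the length, so inputs shorter than 3 bytes are safe.
    if (utf8Text.compare(0, kBom.size(), kBom) == 0) return utf8Text;
    return kBom + utf8Text;
}
```

The check for an existing BOM keeps the function idempotent, so calling it twice on the same data doesn't produce a stray U+FEFF character in the output.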

Henri's suggestion works: experimentally returning a fixed UTF-8 charset in nsMsgProtocol::GetContentCharset() makes the message source of a UTF-8 message display correctly. That said, it's not easy to see where to take the correct charset from at that point. For displaying messages (not the source), the charset is set from MIME code:
https://searchfox.org/comm-central/rev/85f2768bcf455b5faf7fd63882f4f4491c6b86f2/mailnews/mime/emitters/nsMimeBaseEmitter.cpp#522,524
UpdateCharacterSet() is called from mimeEmitterUpdateCharacterSet():
https://searchfox.org/comm-central/search?q=UpdateCharacterSet&redirect=false

Debugging shows the following (already observed in bug 1718119): when displaying a message, GetContentCharset() is called before the MIME code sets the charset, so it returns nothing. When displaying the source of a message, there is no set call at all, and GetContentCharset() also returns nothing. A possible fix would be to locate the code path that retrieves the message source for display and make it call SetContentCharset() before Gecko queries the charset. At a guess, getting the message source doesn't involve any MIME parsing, and there is no connection between displaying a message and displaying its source. Also, the front-end JS code that launches the source display for the currently displayed message, and which has access to that message's charset, doesn't have access to the (back-end) channel that will be used to display the message source.
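The ordering problem described above can be reduced to a toy model. `MockChannel` and `consumerCharset` are hypothetical stand-ins with simplified signatures: the real `nsIChannel::GetContentCharset` uses an out-parameter and returns an `nsresult`. The windows-1252 fallback mirrors what the inspector showed for the view-source document:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the mailbox: channel (not the real nsIChannel;
// the getter returns by value instead of using an out-parameter).
class MockChannel {
public:
    void SetContentCharset(const std::string& charset) { charset_ = charset; }
    std::string GetContentCharset() const { return charset_; }

private:
    std::string charset_;  // stays empty until someone calls the setter
};

// Stand-in for the consumer (Gecko): it asks for the charset once, when it
// starts decoding, and falls back to windows-1252 if nothing was reported.
std::string consumerCharset(const MockChannel& channel) {
    const std::string reported = channel.GetContentCharset();
    return reported.empty() ? "windows-1252" : reported;
}
```

If the setter runs only after `consumerCharset()` (the current order for message display), or never runs (View Source), the consumer stays on windows-1252. Setting the charset first, as proposed, makes it pick up UTF-8.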

Severity: normal → S3
Duplicate of this bug: 1835543
Duplicate of this bug: 1448373
