Garbled text in View Source - apparently ISO-8859-1 encoding is assumed
Categories
(MailNews Core :: Internationalization, defect)
Tracking
(Not tracked)
People
(Reporter: gingerbread_man, Unassigned)
References
Details
(Keywords: intl)
Attachments
(3 files)
Comment 18•4 years ago
Note bug 1736248 comment #3 (quote):
... we should probably pass in the charset to the viewer.
https://searchfox.org/comm-central/rev/af4783ca444808036d2a5860444e7322e904c09f/mail/base/content/mailCore.js#46,49
https://searchfox.org/mozilla-central/rev/5122357c497684e01c5bb2d4a9bf8be1fe97a413/toolkit/components/viewsource/content/viewSourceUtils.js#218,226,242
(see https://bugzilla.mozilla.org/page.cgi?id=splinter.html&ignore=&bug=111164&attachment=99046)
Comment 19•4 years ago
DOM of "view source" document in the inspector.
Comment 20•4 years ago
Dump of content document in the console.
The issue is that the content document has charset windows-1252, so most "modern" UTF-8 messages will show a garbled source. We see no way to pass the charset to the browser; it appears to determine the charset from the content of the document. That means a meta tag with the charset would need to be added. However, it's not clear where the HTML for the message source is emitted or how, for example, the style-sheet link to "viewsource.css" in the header is added. This must happen in M-C code, since there is no reference in C-C code:
https://searchfox.org/comm-central/search?q=viewsource.css&path=&case=false&regexp=false
Henri, do you have a suggestion?
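The garbling described in this comment can be reproduced outside Thunderbird. The following is a minimal Python sketch, assuming only that the viewer decodes UTF-8 bytes as windows-1252; the sample string is made up for illustration and no Thunderbird code is involved.

```python
# Illustration only: decode UTF-8 bytes with the wrong charset.
sample = "Gr\u00fc\u00dfe, caf\u00e9"  # hypothetical message text ("Grüße, café")
raw = sample.encode("utf-8")           # bytes as stored in the message source

garbled = raw.decode("windows-1252")   # what a windows-1252 viewer would show
correct = raw.decode("utf-8")          # what a UTF-8-aware viewer would show

print(garbled)
print(correct)
```

Each non-ASCII character is two bytes in UTF-8, and windows-1252 maps each of those bytes to a separate character, which is why accented letters turn into two-character sequences.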
Comment 21•4 years ago
I suggest making the mailbox: channel for the message report the encoding that mailnews would use when displaying the message as the return value of GetContentCharset.
Comment 22•4 years ago
Thanks for the suggestion. Something similar was already a possible solution for bug 1718119, but the accepted solution there was to prepend the UTF-8 BOM (which proved to be beneficial for bug 1736344). We'll look into it.
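The BOM approach mentioned here can be illustrated in isolation. The Python sketch below is not Thunderbird code: `prepend_utf8_bom` and `sniff_charset` are hypothetical helpers, and the windows-1252 fallback is an assumption modelled on the behaviour reported in comment 20.

```python
import codecs

FALLBACK_CHARSET = "windows-1252"  # assumed default, as observed in comment 20


def prepend_utf8_bom(source: bytes) -> bytes:
    """Return source with a UTF-8 BOM in front, unless it already has one."""
    if source.startswith(codecs.BOM_UTF8):
        return source
    return codecs.BOM_UTF8 + source


def sniff_charset(source: bytes) -> str:
    """Toy stand-in for a viewer's detection: a BOM wins, else fall back."""
    if source.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # Python codec name that strips the BOM on decode
    return FALLBACK_CHARSET


raw = "Gr\u00fc\u00dfe".encode("utf-8")  # hypothetical message bytes ("Grüße")
marked = prepend_utf8_bom(raw)

print(marked.decode(sniff_charset(marked)))  # decoded as UTF-8
print(raw.decode(sniff_charset(raw)))        # decoded with the fallback
```

The point of the sketch is that a BOM lets the consumer pick UTF-8 from the bytes alone, without any charset being passed alongside the data.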
Comment 23•4 years ago
Henri's suggestion works: experimentally returning a fixed UTF-8 charset in nsMsgProtocol::GetContentCharset() leads to the message source of a UTF-8 message being displayed correctly. That said, it's not easy to see where to take the correct charset from at that point. For displaying messages (as opposed to their source), the charset is set from MIME code:
https://searchfox.org/comm-central/rev/85f2768bcf455b5faf7fd63882f4f4491c6b86f2/mailnews/mime/emitters/nsMimeBaseEmitter.cpp#522,524
UpdateCharacterSet() is called from mimeEmitterUpdateCharacterSet():
https://searchfox.org/comm-central/search?q=UpdateCharacterSet&redirect=false
Debugging shows the following (already observed in bug 1718119): For displaying a message, GetContentCharset() is called before the set call from the MIME code, so it returns nothing. For displaying the source of a message, there is no set call, and GetContentCharset() also returns nothing. A possible fix would be to locate the code path that retrieves the message source for display and make it call SetContentCharset() before Gecko gets the charset. At a guess, getting the message source doesn't involve any MIME parsing, and there is no connection between displaying a message and displaying its source. Also, the front-end JS code that launches the source display of the currently displayed message has access to that message's charset, but not to the (back-end) channel that will be used to display the message source.
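The call ordering described above can be modelled with a toy channel. This is a Python sketch, not the real nsMsgProtocol: `ToyChannel` and the three `open_*` functions are hypothetical names, and the only behaviour taken from this comment is the order in which the getter and setter run. Where `known_charset` would come from in practice is exactly the open question raised above.

```python
class ToyChannel:
    """Hypothetical stand-in for a channel that carries a content charset."""

    def __init__(self):
        self._charset = ""  # empty string models "returns nothing"

    def set_content_charset(self, charset):
        self._charset = charset

    def get_content_charset(self):
        return self._charset


def open_for_display(channel, mime_charset):
    """Message display: the consumer asks first, MIME code sets afterwards."""
    seen = channel.get_content_charset()
    channel.set_content_charset(mime_charset)
    return seen


def open_for_source(channel):
    """Source display: nobody calls the setter at all."""
    return channel.get_content_charset()


def open_for_source_fixed(channel, known_charset):
    """Proposed ordering: set the charset before the consumer asks."""
    channel.set_content_charset(known_charset)
    return channel.get_content_charset()


print(repr(open_for_display(ToyChannel(), "utf-8")))
print(repr(open_for_source(ToyChannel())))
print(repr(open_for_source_fixed(ToyChannel(), "utf-8")))
```

In this model only the third path hands a charset to the consumer, which matches the experiment with a fixed UTF-8 return value.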
Comment 27•8 months ago
Ok... Thank you for your attention :)