Thunderbird sometimes loads messages without specifying encoding
Categories
(MailNews Core :: Backend, defect)
Tracking
(seamonkey2.45 affected, seamonkey2.46 affected, seamonkey2.47 affected, seamonkey2.48 affected, seamonkey2.49esr affected)
People
(Reporter: merike, Unassigned)
References
Details
In normal usage this happens rarely, but every now and then I open a message and the non-ASCII characters are garbled. When I briefly select another message and go back to the first one, it then displays correctly. It nearly always happens when I go through many messages in quick succession, for example by holding down the f or b key for a short while. Doing that a few times, I typically get it at least once.

When it happens, the error console shows:
The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol.

Build 50.0a1 (2016-07-17)
Comment 1•8 years ago
You're not the first person observing this. I see it once every few weeks. Since switching tabs makes the problem go away, I think it's pretty much impossible to fix. Anyway, I'll shout dinner to anyone who can deliver a reproducible case.
Comment 2•8 years ago
Hmm, I was so eager to find the original bug that I didn't read comment #0 with the necessary attention. So holding down f or b will do it. Hmm, I tried on a folder with all German messages. I didn't see a bad one; however, they really went past quickly. I also looked at the error console and didn't see the message in question. That was in a release build. With my debug build I got the message and I accidentally even stopped on a garbled message. Let's get the debugger out ;-)
Comment 3•8 years ago
Note:
https://dxr.mozilla.org/comm-central/source/mozilla/dom/locales/en-US/chrome/layout/htmlparser.properties#8
https://dxr.mozilla.org/comm-central/source/mozilla/parser/html/nsHtml5StreamParser.cpp#1374
Comment 4•8 years ago
Let's dupe the other one here since we have more detail here ;-)
Comment 6•8 years ago
OK, reproducible, so perhaps I owe the reporter a dinner now. But hard to fix. Here's a stack trace:
xul.dll!nsHtml5StreamParser::ParseAvailableData() Line 1374 C++
xul.dll!nsHtml5StreamParser::DoStopRequest() Line 1041 C++
xul.dll!nsHtml5RequestStopper::Run() Line 1056 C++
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1068 C++
xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 290 C++
xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate) Line 354 C++
xul.dll!MessageLoop::RunInternal() Line 232 C++
xul.dll!MessageLoop::RunHandler() Line 226 C++
xul.dll!MessageLoop::Run() Line 206 C++
xul.dll!nsThread::ThreadFunc(void * aArg) Line 465 C++
nss3.dll!_PR_NativeRunThread(void * arg) Line 397 C
nss3.dll!pr_root(void * arg) Line 95 C
Aceman, Kent, Magnus, any ideas how to proceed here? I have the impression that we ship the document off for display without having fully retrieved it.
Comment 7•8 years ago
(In reply to Jorg K (GMT+2, PTO during summer, NI me) from comment #6)
> I have the impression that we ship the document off for display without
> having fully retrieved it.
Yes, I think that's how rendering works (in browsers too), and it then reflows when more info is encountered.
Comment 8•8 years ago
In the ...attaching hang bug, the underlying issue had some similarities. In that case, OnStartRequest was delayed (which is a risky thing to do) so that we could figure out the character set. Ultimately, it was never called. Something similar could be going on here. That is, there is some path that is setting the character encoding that is delayed, and that delay somehow never gets completed before the rendering is complete. Or at least that is one possibility. If this theory is roughly correct, you'll need to figure out how the encoding gets communicated, and what are the circumstances that prevent it from being transmitted. Not an easy task in multiple threads.
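To make the theory above concrete, here is a minimal sketch of the suspected ordering problem. It is not Thunderbird or Gecko code: the MessagePane class, its methods, and the windows-1252 fallback are all hypothetical names and assumptions chosen for illustration. It only shows that if rendering finishes before the charset has been communicated, the same bytes come out garbled, and that rendering again once the charset is known gives the correct text.

```python
from typing import Optional

# Assumed fallback charset, for illustration only.
FALLBACK_CHARSET = "windows-1252"


class MessagePane:
    """Toy stand-in for a viewer that decodes a body with whatever charset it knows."""

    def __init__(self) -> None:
        self.charset: Optional[str] = None  # not yet communicated

    def set_charset(self, charset: str) -> None:
        self.charset = charset

    def render(self, body: bytes) -> str:
        # If the charset has not arrived yet, fall back and risk garbled text.
        used = self.charset or FALLBACK_CHARSET
        return body.decode(used, errors="replace")


body = "Grüße".encode("utf-8")

# Bad ordering: rendering completes before the charset is communicated.
racy = MessagePane()
garbled = racy.render(body)
racy.set_charset("utf-8")     # arrives too late for the first render
reloaded = racy.render(body)  # re-selecting the message renders again

# Good ordering: the charset is known before rendering starts.
ordered = MessagePane()
ordered.set_charset("utf-8")
correct = ordered.render(body)

print(repr(garbled), repr(reloaded), repr(correct))
```

In this toy model the fix is purely a matter of ordering; the hard part described above is finding which path delays the charset in the real, multi-threaded code.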
Comment 10•7 years ago
REPRODUCIBLE with official en-US SeaMonkey 2.49a1 (NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 Build 20161005004944 (Default Classic Theme) on German WIN7 64bit after Test Bug 725700 comment #8:
1. Sort order in Thread Pane "By Date, last mail at the bottom"
2. In Thread Pane (all other messages in Western European languages) click an email about 15 emails before the last one
3. Hold down "f" on the keyboard
» Selection highlighting quickly moves down to the last email
» Message body of the (last) test email shows something strange instead of Japanese text
a) I think Bug 1309711 has the same roots or even is a DUP of this one.
b) MailNewsCore because SM is affected
Comment 11•7 years ago
For SM already reproducible with Version 2.40, so appearance probably TB 43 or earlier. Last good - first bad?
Comment 12•7 years ago
Are you 500% sure that this worked at some stage? Then we can ask Alice White (alice0775@yahoo.co.jp) to get us a regression window. He/She - I never worked out whether boy or girl ;-) - does some absolutely amazing things in no time at all. He/She has saved us so much time for locating bugs. I'm stunned and amazed every time I ask for help. BTW, his/her add-on https://addons.mozilla.org/en-US/firefox/addon/show-parent-folder/ is one of my "can't live without" add-ons for FF.
Comment 13•7 years ago
Strange: before Comment 10 I tested 3 times, 100% reproducible. Now, a few hours later, not reproducible at all ...
Comment 14•7 years ago
Also all attempts to reproduce with FF Daily 52.0a1 (2016-10-10) (64-bit) failed.
Comment 15•7 years ago
Take a good look at comment #8. This is clearly a timing/threading issue. In very rare cases, the message is rendered without the charset having been set. Viewing many messages in fast succession by holding down "F" can reproduce the problem (unreliably). Back in July in comment #2 I could reproduce the problem in a (slow) debug build.
Comment 16•7 years ago
(In reply to Jorg K (GMT+2) from comment #12)
> Are you 500% sure
No, but see Bug 1309711 comment #11
Comment 17•7 years ago
fwiw, i've been able to reproduce this 8/8 times on my (rather slow) laptop by holding f for a second or two. after starting thunderbird and reproducing the bug, i have never been able to reproduce it a second time, and had to restart before i could reproduce it again. on my desktop, i could not reproduce it at all (but it does happen there from time to time, just not reliably)
Comment 19•5 years ago
For SM 2.53 always reproducible. User agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0 SeaMonkey/2.53 Build identifier: 20181009130009
Comment 20•5 years ago
Error Console in case of incorrect presentation:
Error: TypeError: browsers[i] is undefined
Source File: chrome://navigator/content/tabbrowser.xml Line: 333
Error: The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol.
Source File: mailbox:///C:/Users.....
Error: TypeError: browsers[i] is undefined
Source File: chrome://navigator/content/tabbrowser.xml Line: 333

Error Console in case of correct presentation:
Error: TypeError: browsers[i] is undefined
Source File: chrome://navigator/content/tabbrowser.xml Line: 333
Error: TypeError: browsers[i] is undefined
Source File: chrome://navigator/content/tabbrowser.xml Line: 333

The same email was always displayed. Header contains:
Content-Type: text/plain; charset=utf-8
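As a quick check that a header like the one quoted above does declare an encoding, the following sketch parses it with Python's standard email package. The sample message body is made up for illustration; only the Content-Type line comes from this report. This says nothing about how Thunderbird or SeaMonkey read the header internally; it only confirms that the declaration is present in the message itself, which would mean the warning in the console is about the rendered document rather than about the mail.

```python
from email import message_from_string

# Only the Content-Type line is taken from the report; the body is a placeholder.
raw = (
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "placeholder body\n"
)

msg = message_from_string(raw)
content_type = msg.get_content_type()
declared = msg.get_content_charset()

print(content_type, declared)
```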
Comment 21•5 years ago
It makes no difference whether an email was previously displayed with the same encoding or another one. I have copied a German email five times, so these five are in a row at the end of the folder.
Content-Type: text/plain; charset=utf-8
If I go to the first of these five emails and then jump to the end with f, the last email is always displayed incorrectly. If I go to the third or fourth email and then jump to the end with f, the last email is always displayed correctly. Starting from the second of these emails, the result varies. If different emails come before the last one, the result is exactly the same. Starting from any email that is more than three before the last, the error always occurs.

Warnings like these appear before the error message:
Warning: Expected identifier for pseudo-class or pseudo-element but found ‘0’. Ruleset ignored due to bad selector.
Source File: about:blank Line: 6, Column: 8
Source Code: margin:0;
Warning: Error in parsing value for ‘-webkit-text-size-adjust’. Declaration dropped.
Source File: about:blank Line: 17, Column: 53
Source Code: background-color: #b6bfcc; -webkit-text-size-adjust: 100%;
Warning: Expected identifier for pseudo-class or pseudo-element but found ‘ ’. Ruleset ignored due to bad selector.
Source File: about:blank Line: 1, Column: 20
Source Code: .ReadMsgBody WIDTH: 100%
Warning: Unexpected end of file while searching for closing } of invalid rule set.
Source File: about:blank Line: 11
Warning: Expected identifier for pseudo-class or pseudo-element but found ‘ ’. Ruleset ignored due to bad selector.
Source File: about:blank Line: 1, Column: 38
Source Code: *#emailWrapperTable H1 MARGIN-BOTTOM: 2px; FONT-SIZE: 15px; FONT-FAMILY: Verdana , Arial
Warning: ‘src’ attribute of <script> element is not a valid URI: “js.ui-portal.de/apps/shared/jquery/1.8.3/jquery-1.8.3.min.js”
Source File: about:blank Line: 1

It looks like data is being processed before it has arrived, and again after it is gone.
Comment 22•5 years ago
Reliably reproducible with f, b, n, [ and ]. The error never occurs with the cursor keys.
Comment 25•5 years ago
Bug 1309711 describes another case of this error which is easy to reproduce in SM.
Comment 33•4 years ago
I found a "sort of easy" way to reproduce this. Instead of (quote from comment #0) "holding down f or b key for a short while", I see this frequently when using the back navigation "Go back one message" button that one can add to a toolbar via customisation. After visiting a few messages and then going back to one via the drop-down, that message is frequently displayed incorrectly.
Comment 35•4 years ago
With all due respect, I'm deassigning the bug from Ben (bblack). There's been no action for more than a year. Ben, do you intend to submit a patch sometime soon here? This is still one of the more puzzling bugs that has been with us for ages, at least since 2005 looking at bug 315957.
Comment 37•4 years ago
In case it helps, in bug 1309711 (closed as a duplicate of this one) I described steps which seem to reliably reproduce this using a couple of emails specifically composed in different encodings and switching once from one to the other - as opposed to most discussion here which seems to involve repeatedly switching between arbitrary messages until it might happen. Those steps still reproduce the issue in SeaMonkey 2.49.5 (now on Linux; I was using Windows when first reported).
In summary:
- Set up a folder containing one message encoded in UTF-8 ("Unicode") and another in windows-1252 ("Western"), each containing text using extended characters (e.g. "Iñtërnâtiônàlizætiøn") - more detail in bug 1309711
- Select the folder under Local Folders
- Select the message in windows-1252 encoding
- Press F8 to open the message pane (depending on previous actions, it may or may not display correctly, but that's not the concern at this step)
- Press F8 to close the message pane
- Select the message in UTF-8 encoding
- Press F8 to open the message pane
Expected Result: The message text should be "Iñtërnâtiônàlizætiøn"
Actual Result: The message text is "IÃ±tÃ«rnÃ¢tiÃ´nÃ lizÃ¦tiÃ¸n" instead, having been decoded using an incorrect encoding
- Press F8 to close the message pane
- Press F8 to open the message pane
Result: The message text is now correct "Iñtërnâtiônàlizætiøn"
- Press F8 to open the message pane [I think I probably meant to say "close" rather than "open" here, since the message pane is already open at this point]
The same effect can be seen by switching between the two messages in either direction (windows-1252 then UTF-8, or UTF-8 then windows-1252).
I also described how a similar process could lead to corruption of text in saved drafts, which was a bit nasty if the draft was modified and saved without noticing the corruption. I don't seem to be able to reproduce that one now, so perhaps that aspect is fixed, or only affects Windows.
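The kind of garbling described in the steps above can be imitated outside the mail client. The following sketch uses only Python's built-in codecs and the test string from this comment; it is an illustration of mis-decoding in general, not of what SeaMonkey does internally. It decodes UTF-8 bytes as windows-1252, and windows-1252 bytes as UTF-8 (with undecodable bytes replaced by U+FFFD), matching the two switching directions mentioned.

```python
text = "Iñtërnâtiônàlizætiøn"

# UTF-8 message decoded with the previous message's windows-1252 charset.
utf8_as_cp1252 = text.encode("utf-8").decode("windows-1252")

# windows-1252 message decoded as UTF-8; invalid bytes become U+FFFD.
cp1252_as_utf8 = text.encode("windows-1252").decode("utf-8", errors="replace")

# The first kind of damage is reversible as long as nothing has been re-saved.
recovered = utf8_as_cp1252.encode("windows-1252").decode("utf-8")

print(utf8_as_cp1252)
print(cp1252_as_utf8)
print(recovered == text)
```

The second direction loses information, since every accented character collapses to the same replacement character. That would explain why a draft saved after such a mis-decode could be corrupted for good.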