Open Bug 1287336 Opened 8 years ago Updated 1 year ago

Thunderbird sometimes loads messages without specifying encoding

Categories

(MailNews Core :: Backend, defect)

Tracking

(seamonkey2.45 affected, seamonkey2.46 affected, seamonkey2.47 affected, seamonkey2.48 affected, seamonkey2.49esr affected)

People

(Reporter: merike, Unassigned)

With normal usage it happens rarely, but every now and then I open a message and its non-ASCII characters are garbled. When I briefly select another message and go back to the first one, it then displays correctly.

It nearly always happens when I go through many messages in quick succession, for example by holding down the f or b key for a short while. When I do that a few times, I typically get it at least once. When it happens, the error console shows:

The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol.
Build 50.0a1 (2016-07-17)
You're not the first person observing this. I see it once every few weeks. Since switching tabs makes the problem go away, I think it's pretty much impossible to fix.

Anyway, I'll shout dinner to anyone who can deliver a reproducible case.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Hmm, I was so eager to find the original bug that I didn't read comment #0 with the necessary attention.

So holding down f or b will do it. Hmm, I tried on a folder with all German messages. I didn't see a bad one; however, they really went past quickly. I also looked at the error console and didn't see the message in question.

That was in a release build. With my debug build I got the message and I accidentally even stopped on a garbled message. Let's get the debugger out ;-)
Let's dupe the other one here since we have more detail here ;-)
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Status: REOPENED → NEW
OK, reproducible, so perhaps I owe the reporter a dinner now. But hard to fix. Here's a stack trace:

xul.dll!nsHtml5StreamParser::ParseAvailableData() Line 1374	C++
xul.dll!nsHtml5StreamParser::DoStopRequest() Line 1041	C++
xul.dll!nsHtml5RequestStopper::Run() Line 1056	C++
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1068	C++
xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 290	C++
xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate) Line 354	C++
xul.dll!MessageLoop::RunInternal() Line 232	C++
xul.dll!MessageLoop::RunHandler() Line 226	C++
xul.dll!MessageLoop::Run() Line 206	C++
xul.dll!nsThread::ThreadFunc(void * aArg) Line 465	C++
nss3.dll!_PR_NativeRunThread(void * arg) Line 397	C
nss3.dll!pr_root(void * arg) Line 95	C

Aceman, Kent, Magnus, any ideas how to proceed here?

I have the impression that we ship the document off for display without having fully retrieved it.
Flags: needinfo?(rkent)
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(acelists)
(In reply to Jorg K (GMT+2, PTO during summer, NI me) from comment #6)
> I have the impression that we ship the document off for display without
> having fully retrieved it.

Yes I think that's how rendering works (in browsers too), and it then reflows when more info is encountered.
Flags: needinfo?(mkmelin+mozilla)
In the ...attaching hang bug, the underlying issue had some similarities. In that case, OnStartRequest was delayed (which is a risky thing to do) so that we could figure out the character set. Ultimately, it was never called. Something similar could be going on here. That is, there is some path that is setting the character encoding that is delayed, and that delay somehow never gets completed before the rendering is complete. Or at least that is one possibility.

If this theory is roughly correct, you'll need to figure out how the encoding gets communicated, and what are the circumstances that prevent it from being transmitted. Not an easy task in multiple threads.
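
To make the theory above concrete, here is a minimal Python sketch of the suspected ordering problem. Everything in it is hypothetical: `MessageRenderer`, `render`, and the windows-1252 fallback are illustrative stand-ins, not Gecko or MailNews code. It only shows that if decoding happens before the charset notification arrives, the same bytes come out garbled.

```python
FALLBACK_CHARSET = "windows-1252"  # assumed fallback, for illustration only


class MessageRenderer:
    """Toy stand-in for a display pipeline; not an actual Gecko interface."""

    def __init__(self):
        self.charset = None  # unknown until someone tells us
        self.chunks = []     # raw bytes received so far

    def set_charset(self, charset):
        self.charset = charset

    def on_data(self, chunk):
        self.chunks.append(chunk)

    def on_stop(self):
        # Decoding uses whatever charset is known at this moment.
        charset = self.charset or FALLBACK_CHARSET
        return b"".join(self.chunks).decode(charset)


def render(body, declared, charset_arrives_late):
    renderer = MessageRenderer()
    if not charset_arrives_late:
        renderer.set_charset(declared)
    renderer.on_data(body)
    text = renderer.on_stop()
    if charset_arrives_late:
        # Too late: the text has already been decoded with the fallback.
        renderer.set_charset(declared)
    return text


body = "Grüße".encode("utf-8")
good = render(body, "utf-8", charset_arrives_late=False)
bad = render(body, "utf-8", charset_arrives_late=True)
print(repr(good), repr(bad))
```

In the real code the two orderings would be decided by thread timing rather than a flag, which would fit the observation that the problem shows up mostly when many messages are loaded in quick succession.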
Flags: needinfo?(rkent)
I have no idea here.
Flags: needinfo?(acelists)
REPRODUCIBLE with official en-US SeaMonkey 2.49a1  (NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 Build 20161005004944  (Default Classic Theme) on German WIN7 64bit after Test Bug 725700 comment #8:

1. Sort order in Thread Pane "By Date, last mail at the bottom"
2. In the Thread pane (all other messages are in Western European languages) click an
   email about 15 emails before last one
3. hold down "f" on keyboard
   » selection highlighting quickly moves down until last email
   » Message body of (last) test email shows something strange instead of
     Japanese text

a) I think Bug 1309711 has the same roots or even is a DUP of this one.
b) MailNewsCore because SM is affected
Component: Mail Window Front End → Backend
Product: Thunderbird → MailNews Core
See Also: → 1309711
Version: 50 Branch → 50
Already reproducible in SM 2.40, so the bug probably first appeared in TB 43 or earlier.
Last good - first bad?
Are you 500% sure that this worked at some stage? Then we can ask Alice White (alice0775@yahoo.co.jp) to get us a regression window. He/She - I never worked out whether boy or girl ;-) - does some absolutely amazing things in no time at all. He/She has saved us so much time for locating bugs. I'm stunned and amazed every time I ask for help.
BTW, his/her add-on https://addons.mozilla.org/en-US/firefox/addon/show-parent-folder/ is one of my "can't live without" add-ons for FF.
Strange: before Comment 10 I tested 3 times, 100% reproducible. Now few hours later not reproducible at all ...
Also all attempts to reproduce with FF Daily 52.0a1 (2016-10-10) (64-bit) failed.
Take a good look at comment #8. This is clearly a timing/threading issue. In very rare cases, the message is rendered without the charset having been set. Viewing many messages in fast succession by holding down "F" can reproduce the problem (unreliably). Back in July in comment #2 I could reproduce the problem in a (slow) debug build.
(In reply to Jorg K (GMT+2) from comment #12)
> Are you 500% sure 

No, but see Bug 1309711 comment #11
fwiw, i've been able to reproduce this 8/8 times on my (rather slow) laptop by holding f for a second or two. after starting thunderbird and reproducing the bug, i have never been able to reproduce it a second time, and had to restart before i could reproduce it again. on my desktop, i could not reproduce it at all (but it does happen there from time to time, just not reliably)
For SM 2.53 always reproducible.

User agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0 SeaMonkey/2.53
Build identifier: 20181009130009
Error Console in case of incorrect presentation:

Error: TypeError: browsers[i] is undefined
Source File: chrome://navigator/content/tabbrowser.xml Line: 333

Error: The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol.
Source File: mailbox:///C:/Users.....

Error: TypeError: browsers[i] is undefined
Source File: chrome://navigator/content/tabbrowser.xml Line: 333


Error Console in case of correct presentation:

Error: TypeError: browsers[i] is undefined
Source File: chrome://navigator/content/tabbrowser.xml Line: 333

Error: TypeError: browsers[i] is undefined
Source File: chrome://navigator/content/tabbrowser.xml Line: 333

The same email was always displayed.
Header contains:
Content-Type: text/plain; charset=utf-8
It makes no difference if an email was previously displayed with the same encoding or another.
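
For reference, the charset declared in such a header can be read with Python's standard `email` package. This is only a sketch of what "the encoding is declared in the message" means. The sample messages are made up, and it says nothing about how MailNews itself parses headers.

```python
from email import message_from_string

# Made-up sample message carrying the same header as the test email.
raw_with_charset = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
# Made-up sample message without any Content-Type header.
raw_without_charset = "Subject: test\n\nbody\n"

declared = message_from_string(raw_with_charset).get_content_charset()
missing = message_from_string(raw_without_charset).get_content_charset()

print(declared)  # the charset parameter, lowercased
print(missing)   # None when no charset is declared
```

Since the test email does declare its charset, the console error about an undeclared encoding suggests the declaration is lost or arrives too late somewhere between the message and the HTML parser, not that the message itself is at fault.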

I have copied a German email five times, so these five are in a row, at the end of the folder.
Content-Type: text/plain; charset=utf-8

If I go to the first of these five emails and then jump to the end with f, the last email is always displayed incorrectly.
If I go to the third or fourth email and then jump to the end with f, the last email will always be displayed correctly.
Starting from the second of these emails, the result varies.

If different emails precede the last one, the result is exactly the same.
Starting from any email that is more than three before the last, the error always occurs.

Such warnings come before the error message:
Warning: Expected identifier for pseudo-class or pseudo-element but found ‘0’.  Ruleset ignored due to bad selector.
Source File: about:blank Line: 6, Column: 8
Source Code: margin:0;

Warning: Error in parsing value for ‘-webkit-text-size-adjust’.  Declaration dropped.
Source File: about:blank Line: 17, Column: 53
Source Code: background-color: #b6bfcc; -webkit-text-size-adjust: 100%;

Warning: Expected identifier for pseudo-class or pseudo-element but found ‘ ’.  Ruleset ignored due to bad selector.
Source File: about:blank Line: 1, Column: 20
Source Code: .ReadMsgBody  WIDTH: 100%

Warning: Unexpected end of file while searching for closing } of invalid rule set.
Source File: about:blank Line: 11

Warning: Expected identifier for pseudo-class or pseudo-element but found ‘ ’.  Ruleset ignored due to bad selector.
Source File: about:blank Line: 1, Column: 38
Source Code: *#emailWrapperTable H1  MARGIN-BOTTOM: 2px; FONT-SIZE: 15px; FONT-FAMILY: Verdana , Arial

Warning: ‘src’ attribute of <script> element is not a valid URI: “js.ui-portal.de/apps/shared/jquery/1.8.3/jquery-1.8.3.min.js”
Source File: about:blank Line: 1


Looks like data processing before the data is there and data processing when the data is gone.
Reliably reproducible with f b n [ ].
The error never occurs with the cursor keys.
Assignee: nobody → bschwarze
Status: NEW → ASSIGNED
Bug 1309711 describes another case of this error which is easy to reproduce in SM.
See Also: → 74916

I found a "sort of easy" way to reproduce this. Instead of (quote from comment #0) "holding down f or b key for a short while", I see this frequently when using the back navigation button "Go back one message" that one can add to a toolbar using customisation. After visiting a few messages, and then going back to one via the drop-down, that message is frequently displayed incorrectly.

With all due respect, I'm deassigning the bug from Ben (bblack). There's been no action for more than a year. Ben, do you intend to submit a patch sometime soon here? This is still one of the more puzzling bugs that has been with us for ages, at least since 2005 looking at bug 315957.

Assignee: ben → nobody
Status: ASSIGNED → NEW

In case it helps, in bug 1309711 (closed as a duplicate of this one) I described steps which seem to reliably reproduce this using a couple of emails specifically composed in different encodings and switching once from one to the other - as opposed to most discussion here which seems to involve repeatedly switching between arbitrary messages until it might happen. Those steps still reproduce the issue in SeaMonkey 2.49.5 (now on Linux; I was using Windows when first reported).

In summary:

  • Set up a folder containing one message encoded in UTF-8 ("Unicode") and another in windows-1252 ("Western"), each containing text using extended characters (e.g. "Iñtërnâtiônàlizætiøn") - more detail in bug 1309711
  1. Select the folder under Local Folders
  2. Select the message in windows-1252 encoding
  3. Press F8 to open the message pane (depending on previous actions, it may or may not display correctly, but that's not the concern at this step)
  4. Press F8 to close the message pane
  5. Select the message in UTF-8 encoding
  6. Press F8 to open the message pane
    Expected Result: The message text should be "Iñtërnâtiônàlizætiøn"
    Actual Result: The message text is "IÃ±tÃ«rnÃ¢tiÃ´nÃ lizÃ¦tiÃ¸n" instead, having been decoded using an incorrect encoding
  7. Press F8 to close the message pane
  8. Press F8 to open the message pane
    Result: The message text is now correct "Iñtërnâtiônàlizætiøn"
  9. Press F8 to open the message pane [I think I probably meant to say "close" rather than "open" here, since the message pane is already open at this point]

The same effect can be seen by switching between the two messages in either direction (windows-1252 then UTF-8, or UTF-8 then windows-1252).
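
The garbling in both directions can be illustrated outside the mail client with a few lines of Python. This is only a sketch of what decoding with the wrong charset does to the sample text. It assumes the stale encoding from the previously shown message is applied to the new message's bytes, which is a guess at the mechanism, not something confirmed in the code.

```python
text = "Iñtërnâtiônàlizætiøn"

utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("windows-1252")

# UTF-8 message decoded with the previous message's windows-1252 charset:
# every non-ASCII character turns into two characters.
utf8_as_cp1252 = utf8_bytes.decode("windows-1252")

# windows-1252 message decoded as UTF-8: the high bytes are invalid
# sequences and become U+FFFD replacement characters.
cp1252_as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

print(utf8_as_cp1252)
print(cp1252_as_utf8)

# The first kind of damage is reversible as long as nothing was saved.
repaired = utf8_as_cp1252.encode("windows-1252").decode("utf-8")
print(repaired == text)
```

The second direction loses information, which would be consistent with the draft corruption described below being the nastier case.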

I also described how a similar process could lead to corruption of text in saved drafts, which was a bit nasty if the draft was modified and saved without noticing the corruption. I don't seem to be able to reproduce that one now, so perhaps that aspect is fixed, or only affects Windows.

Severity: normal → S3