Closed
Bug 333877
Opened 18 years ago
Closed 18 years ago
TB 1.5.0.2 - 100% processor utilization processing IMAP IDLE
Categories
(Thunderbird :: General, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: robin.grindley, Assigned: Bienvenu)
References
Details
(Keywords: fixed1.8.1, verified1.8.1.3)
Attachments
(2 files, 1 obsolete file)
8.47 KB,
text/plain
|
Details | |
565 bytes,
patch
|
mscott
:
superreview+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Thunderbird/1.5.0.2
I noticed in TB 1.5 that every so often it would start using 100% of the CPU. I built a debug version of TB using tag THUNDERBIRD_1_5_0_2_RC2, and the problem still occurs. It looks like the IMAP IDLE processing gets a message that it doesn't process properly, and it goes into an infinite loop continuously trying to process the message but not getting anywhere. Debug info below.
Reproducible: Sometimes
Steps to Reproduce:
1. Run TB
2. Wait hours or days
3. TB will go runaway
Actual Results:
TB goes runaway
Expected Results:
TB should not go runaway
Here is the call trace:
> plc4.dll!PL_strchr(const char * s=0x07547c6e, char c='
') Line 43 C
thunderbird.exe!nsMsgLineStreamBuffer::NextLineAvailable() Line 469 + 0x22 bytes C++
thunderbird.exe!nsImapProtocol::HandleIdleResponses() Line 1184 + 0xe bytes C++
thunderbird.exe!nsImapProtocol::ImapThreadMainLoop() Line 1162 C++
thunderbird.exe!nsImapProtocol::Run() Line 934 C++
xpcom_core.dll!nsThread::Main(void * arg=0x06e40970) Line 118 + 0x1c bytes C++
nspr4.dll!_PR_NativeRunThread(void * arg=0x07a3d790) Line 436 + 0xf bytes C
nspr4.dll!pr_root(void * arg=0x07a3d790) Line 112 + 0xf bytes C
Here are the local data structures in NextLineAvailable(), which shows the input string that seems to be stuck in its craw:
- this 0x072910c8 {m_eatCRLFs=0x00000000 m_allocateNewLines=0x00000001 m_dataBuffer=0x07547c58 "+ entering idle mode
* 4558 EXISTS
* 0 RECENT
" ...} nsMsgLineStreamBuffer * const
+ __vfptr 0x0183cc00 const nsMsgLineStreamBuffer::`vftable' *
m_eatCRLFs 0x00000000 int
m_allocateNewLines 0x00000001 int
+ m_dataBuffer 0x07547c58 "+ entering idle mode
* 4558 EXISTS
* 0 RECENT
" char *
m_dataBufferSize 0x00002000 unsigned int
m_startPos 0x00000016 unsigned int
m_numBytesInBuffer 0x0000001b unsigned int
m_lineToken 0x0a '␊' char
Note that there are line breaks in m_dataBuffer, so the buffer input is
+ entering idle mode
* 4558 EXISTS
* 0 RECENT
The local data structure in PL_strchr is:
+ s 0x07547c6e "* 4558 EXISTS
* 0 RECENT
" const char *
c 0x0a '␊' char
In the following code
void nsImapServerResponseParser::ParseIMAPServerResponse(const char *currentCommand, PRBool aIgnoreBadAndNOResponses)
{
NS_ASSERTION(currentCommand && *currentCommand != '\r' &&
*currentCommand != '\n' && *currentCommand != ' ', "Invailid command string");
PRBool sendingIdleDone = !strcmp(currentCommand, "DONE"CRLF);
if (sendingIdleDone)
fWaitingForMoreClientInput = PR_FALSE;
// Reinitialize the parser
SetConnected(PR_TRUE);
SetSyntaxError(PR_FALSE);
// Reinitialize our state
InitializeState();
// the default is to not pipeline
fNumberOfTaggedResponsesExpected = 1;
int numberOfTaggedResponsesReceived = 0;
char *copyCurrentCommand = PL_strdup(currentCommand);
if (copyCurrentCommand && !fServerConnection.DeathSignalReceived())
copyCurrentCommand is "11 IDLE", and fServerConnection.DeathSignalReceived() always evaluates to 1. So is it somehow erroneously thinking that the server connection is gone, but continuing to process anyway?
Assignee | ||
Comment 1•18 years ago
|
||
thx very much for the diagnosis. That's very helpful! I'll look at the code, but getting a protocol log of a session where this happens might be helpful: http://www.mozilla.org/quality/mailnews/mail-troubleshoot.html#imap
Assignee: mscott → bienvenu
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter | ||
Comment 2•18 years ago
|
||
It just went runaway again. I stopped TB via the debugger. Then started an Ethereal trace, started TB up, let it run wild for about 90 seconds, and during that time it never checked the server of its own volition, so I clicked on Get Mail once at the end. (Result was No New Messages on Server.) Not much IMAP traffic at all. Whatever TB thinks it is doing, it seems to be inside its own head. Summary of trace is: No. Time Source Destination Protocol Info 97 57.410492 10.11.128.170 10.11.1.7 IMAP Request: DONE 98 57.429618 10.11.1.7 10.11.128.170 IMAP Response: 565 OK IDLE completed 99 57.534600 10.11.128.170 10.11.1.7 IMAP Request: 566 check 100 57.553510 10.11.1.7 10.11.128.170 IMAP Response: 566 OK CHECK completed 101 57.554020 10.11.128.170 10.11.1.7 IMAP Request: 567 UID fetch 127687:* (FLAGS) 102 57.557064 10.11.1.7 10.11.128.170 IMAP Response: 567 OK FETCH completed. 104 57.818200 10.11.128.170 10.11.1.7 IMAP Request: 568 IDLE 106 57.853440 10.11.1.7 10.11.128.170 IMAP Response: + entering idle mode 173 88.553865 10.11.128.170 10.11.1.7 IMAP Request: DONE 182 93.741808 10.11.1.7 10.11.128.170 IMAP Response: 568 OK IDLE completed 183 93.863075 10.11.128.170 10.11.1.7 IMAP Request: 569 noop 185 93.887220 10.11.1.7 10.11.128.170 IMAP Response: 569 OK NOOP completed 186 93.887663 10.11.128.170 10.11.1.7 IMAP Request: 570 UID fetch 127687:* (FLAGS) 187 93.889747 10.11.1.7 10.11.128.170 IMAP Response: 570 OK FETCH completed. 189 94.378670 10.11.128.170 10.11.1.7 IMAP Request: 571 IDLE 191 94.419730 10.11.1.7 10.11.128.170 IMAP Response: + entering idle mode I'd assume I hit Get Mail around the 93-second point in the trace.
Reporter | ||
Comment 3•18 years ago
|
||
I was taking another look at the code, and in nsImapServerResponseParser::ParseIMAPServerResponse I noticed the code below. char *copyCurrentCommand = PL_strdup(currentCommand); if (copyCurrentCommand && !fServerConnection.DeathSignalReceived()) { .... code deleted ... } else if (!fServerConnection.DeathSignalReceived()) HandleMemoryFailure(); Shouldn't that else if clause be (fServer....) not (!fServer...)? I traced it and DeathSignalReceived() is TRUE, but the logic is inverted and it never executes the else if branch. I stepped it right up to the else if assembly line and changed the eax register (to invert the logic), and then TB went on to run fine.
Reporter | ||
Comment 4•18 years ago
|
||
Since reporting the bug I've upgraded to TB 1.5.0.4, and this runaway event occurs very frequently (multiple times per day).
Comment 5•18 years ago
|
||
I am not sure but I think I also experience this bug. If there is anything I can do help troubleshoot this, let me know. Unfortunately, I am not as knowledgeable as the OP. Running TB 1.5, latest version on Debian testing.
Comment 6•18 years ago
|
||
Please change OS to all since this does not appear to be solely a Windows issue. Please change version to 1.5.
Reporter | ||
Comment 8•18 years ago
|
||
Comment on attachment 226613 [details] [diff] [review] Fix for logic error in handling of IMAP DeathSignal Requesting review. Never done this before, so if I made a mistake, let me know.
Attachment #226613 -
Flags: review?(bienvenu)
Reporter | ||
Comment 9•18 years ago
|
||
Changing OS to All and Version to 1.5, as per comment 6.
OS: Windows XP → All
Version: unspecified → 1.5
Assignee | ||
Comment 10•18 years ago
|
||
Thx for the patch. Have you tried a 2.0 or trunk build to see if the bug still exists there? The code is different there, so this patch doesn't apply cleanly. In any case, I don't this patch is correct - it may fix the problem, but I don't think it's quite right. The code is trying to only call HandleMemoryFailure in the case that it couldn't copy the current response. You've switched that around so we call HandleMemoryFailure in the case the we didn't run out of memory, but we did have DeathSignalReceived. This has the side effect of killing the connection by nsIMAPGenericParser::HandleMemoryFailure() calling SetConnected(PR_FALSE). Switching it to just "else" would be better but still not quite right - we should be checking DeathSignalReceived and explicitly calling SetConnected(PR_FALSE) instead of doing it through HandleMemoryFailure(). Hope that makes sense! The really important question is how we fix this in 2.0/trunk builds, if it still indeed is a problem. Notice how the 2.0/trunk code is different: http://lxr.mozilla.org/seamonkey/source/mailnews/imap/src/nsImapServerResponseParser.cpp#203 I suspect we want to add an else to the trunk code that calls SetConnected(0) if we've received the death signal...I'll try to come up with a patch.
Assignee | ||
Comment 11•18 years ago
|
||
if we're aborting the connection, mark it as not connected...
Attachment #226613 -
Attachment is obsolete: true
Attachment #234291 -
Flags: superreview?(mscott)
Attachment #226613 -
Flags: review?(bienvenu)
Updated•18 years ago
|
Attachment #234291 -
Flags: superreview?(mscott) → superreview+
Assignee | ||
Comment 12•18 years ago
|
||
this is fixed on the trunk and 2.0 branch...thx, Robin!
Reporter | ||
Comment 13•18 years ago
|
||
Great. I will try it out once a 2.0 Beta is available.
Comment 14•18 years ago
|
||
*** Bug 326368 has been marked as a duplicate of this bug. ***
Comment 15•17 years ago
|
||
Robin: It would be great if you could test this bug using our Thunderbird 2 candidate build and report back: ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2.0.0.0-candidates/rc2
Reporter | ||
Comment 16•17 years ago
|
||
I've been using TB 2 betas for a couple of months now,and I have not seen the problem.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•