Closed Bug 333877 Opened 18 years ago Closed 18 years ago

TB 1.5.0.2 - 100% processor utilization processing IMAP IDLE

Categories

(Thunderbird :: General, defect)

x86
All
defect
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: robin.grindley, Assigned: Bienvenu)

References

Details

(Keywords: fixed1.8.1, verified1.8.1.3)

Attachments

(2 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Thunderbird/1.5.0.2

I noticed in TB 1.5 that every so often it would start using 100% of the CPU. I built a debug version of TB using tag THUNDERBIRD_1_5_0_2_RC2, and the problem still occurs. It looks like the IMAP IDLE processing gets a message that it doesn't process properly, and it goes into an infinite loop continuously trying to process the message but not getting anywhere. Debug info below. 

Reproducible: Sometimes

Steps to Reproduce:
1. Run TB
2. Wait hours or days
3. TB will go runaway

Actual Results:  
TB goes runaway

Expected Results:  
TB should not go runaway

Here is the call trace:

>	plc4.dll!PL_strchr(const char * s=0x07547c6e, char c='
')  Line 43	C
 	thunderbird.exe!nsMsgLineStreamBuffer::NextLineAvailable()  Line 469 + 0x22 bytes	C++
 	thunderbird.exe!nsImapProtocol::HandleIdleResponses()  Line 1184 + 0xe bytes	C++
 	thunderbird.exe!nsImapProtocol::ImapThreadMainLoop()  Line 1162	C++
 	thunderbird.exe!nsImapProtocol::Run()  Line 934	C++
 	xpcom_core.dll!nsThread::Main(void * arg=0x06e40970)  Line 118 + 0x1c bytes	C++
 	nspr4.dll!_PR_NativeRunThread(void * arg=0x07a3d790)  Line 436 + 0xf bytes	C
 	nspr4.dll!pr_root(void * arg=0x07a3d790)  Line 112 + 0xf bytes	C


Here are the local data structures in NextLineAvailable(), which shows the input string that seems to be stuck in its craw:

-		this	0x072910c8 {m_eatCRLFs=0x00000000 m_allocateNewLines=0x00000001 m_dataBuffer=0x07547c58 "+ entering idle mode
* 4558 EXISTS
* 0 RECENT
" ...}	nsMsgLineStreamBuffer * const
+		__vfptr	0x0183cc00 const nsMsgLineStreamBuffer::`vftable'	*
		m_eatCRLFs	0x00000000	int
		m_allocateNewLines	0x00000001	int
+		m_dataBuffer	0x07547c58 "+ entering idle mode
* 4558 EXISTS
* 0 RECENT
"	char *
		m_dataBufferSize	0x00002000	unsigned int
		m_startPos	0x00000016	unsigned int
		m_numBytesInBuffer	0x0000001b	unsigned int
		m_lineToken	0x0a '␊'	char

Note that there are line breaks in m_dataBuffer, so the buffer input is

+ entering idle mode
* 4558 EXISTS
* 0 RECENT


The local data structure in PL_strchr is:

+		s	0x07547c6e "* 4558 EXISTS
* 0 RECENT
"	const char *
		c	0x0a '␊'	char


In the following code

void nsImapServerResponseParser::ParseIMAPServerResponse(const char *currentCommand, PRBool aIgnoreBadAndNOResponses)
{
  
  NS_ASSERTION(currentCommand && *currentCommand != '\r' && 
    *currentCommand != '\n' && *currentCommand != ' ', "Invailid command string");
  PRBool sendingIdleDone = !strcmp(currentCommand, "DONE"CRLF);
  if (sendingIdleDone)
    fWaitingForMoreClientInput = PR_FALSE;

  // Reinitialize the parser
  SetConnected(PR_TRUE);
  SetSyntaxError(PR_FALSE);
  
  // Reinitialize our state
  InitializeState();
  
  // the default is to not pipeline
  fNumberOfTaggedResponsesExpected = 1;
  int numberOfTaggedResponsesReceived = 0;
  
  char *copyCurrentCommand = PL_strdup(currentCommand);
  if (copyCurrentCommand && !fServerConnection.DeathSignalReceived())



copyCurrentCommand is "11 IDLE", and fServerConnection.DeathSignalReceived() always evaluates to 1. So is it somehow erroneously thinking that the server connection is gone, but continuing to process anyway?
thx very much for the diagnosis.  That's very helpful!

I'll look at the code, but getting a protocol log of a session where this happens might be helpful:

http://www.mozilla.org/quality/mailnews/mail-troubleshoot.html#imap

Assignee: mscott → bienvenu
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attached file IMAP trace, text dump
It just went runaway again. I stopped TB via the debugger. Then started an Ethereal trace, started TB up, let it run wild for about 90 seconds, and during that time it never checked the server of its own volition, so I clicked on Get Mail once at the end. (Result was No New Messages on Server.) 

Not much IMAP traffic at all. Whatever TB thinks it is doing, it seems to be inside its own head. Summary of trace is: 

No.     Time        Source                Destination           Protocol Info
     97 57.410492   10.11.128.170         10.11.1.7             IMAP     Request: DONE
     98 57.429618   10.11.1.7             10.11.128.170         IMAP     Response: 565 OK IDLE completed
     99 57.534600   10.11.128.170         10.11.1.7             IMAP     Request: 566 check
    100 57.553510   10.11.1.7             10.11.128.170         IMAP     Response: 566 OK CHECK completed
    101 57.554020   10.11.128.170         10.11.1.7             IMAP     Request: 567 UID fetch 127687:* (FLAGS)
    102 57.557064   10.11.1.7             10.11.128.170         IMAP     Response: 567 OK FETCH completed.
    104 57.818200   10.11.128.170         10.11.1.7             IMAP     Request: 568 IDLE
    106 57.853440   10.11.1.7             10.11.128.170         IMAP     Response: + entering idle mode
    173 88.553865   10.11.128.170         10.11.1.7             IMAP     Request: DONE
    182 93.741808   10.11.1.7             10.11.128.170         IMAP     Response: 568 OK IDLE completed
    183 93.863075   10.11.128.170         10.11.1.7             IMAP     Request: 569 noop
    185 93.887220   10.11.1.7             10.11.128.170         IMAP     Response: 569 OK NOOP completed
    186 93.887663   10.11.128.170         10.11.1.7             IMAP     Request: 570 UID fetch 127687:* (FLAGS)
    187 93.889747   10.11.1.7             10.11.128.170         IMAP     Response: 570 OK FETCH completed.
    189 94.378670   10.11.128.170         10.11.1.7             IMAP     Request: 571 IDLE
    191 94.419730   10.11.1.7             10.11.128.170         IMAP     Response: + entering idle mode


I'd assume I hit Get Mail around the 93-second point in the trace.
I was taking another look at the code, and in 

nsImapServerResponseParser::ParseIMAPServerResponse

I noticed the code below. 

  char *copyCurrentCommand = PL_strdup(currentCommand);
  if (copyCurrentCommand && !fServerConnection.DeathSignalReceived())
  {
    .... code deleted ...
  }
  else if (!fServerConnection.DeathSignalReceived())
    HandleMemoryFailure();

Shouldn't that else if clause be (fServer....) not (!fServer...)? I traced it and DeathSignalReceived() is TRUE, but the logic is inverted and it never executes the else if branch. 

I stepped it right up to the else if assembly line and changed the eax register (to invert the logic), and then TB went on to run fine. 
Since reporting the bug I've upgraded to TB 1.5.0.4, and this runaway event occurs very frequently (multiple times per day).
I am not sure but I think I also experience this bug.  If there is anything I can do help troubleshoot this, let me know.  Unfortunately, I am not as knowledgeable as the OP.

Running TB 1.5, latest version on Debian testing.
Please change OS to all since this does not appear to be solely a Windows issue.
Please change version to 1.5.
see also bug 326368, bug 339690
Severity: normal → major
Comment on attachment 226613 [details] [diff] [review]
Fix for logic error in handling of IMAP DeathSignal

Requesting review. Never done this before, so if I made a mistake, let me know.
Attachment #226613 - Flags: review?(bienvenu)
Changing OS to All and Version to 1.5, as per comment 6.
OS: Windows XP → All
Version: unspecified → 1.5
Thx for the patch. Have you tried a 2.0 or trunk build to see if the bug still exists there? The code is different there, so this patch doesn't apply cleanly. 

In any case, I don't this patch is correct - it may fix the problem, but I don't think it's quite right. The code is trying to only call HandleMemoryFailure in the case that it couldn't copy the current response. You've switched that around so we call HandleMemoryFailure in the case the we didn't run out of memory, but we did have DeathSignalReceived. This has the side effect of killing the connection by nsIMAPGenericParser::HandleMemoryFailure() calling SetConnected(PR_FALSE). Switching it to just "else" would be better but still not quite right - we should be checking DeathSignalReceived and explicitly calling SetConnected(PR_FALSE) instead of doing it through HandleMemoryFailure(). Hope that makes sense!

The really important question is how we fix this in 2.0/trunk builds, if it still indeed is a problem. Notice how the 2.0/trunk code is different:

http://lxr.mozilla.org/seamonkey/source/mailnews/imap/src/nsImapServerResponseParser.cpp#203

I suspect we want to add an else to the trunk code that calls SetConnected(0) if we've received the death signal...I'll try to come up with a patch.
Attached patch fix for trunkSplinter Review
if we're aborting the connection, mark it as not connected...
Attachment #226613 - Attachment is obsolete: true
Attachment #234291 - Flags: superreview?(mscott)
Attachment #226613 - Flags: review?(bienvenu)
Attachment #234291 - Flags: superreview?(mscott) → superreview+
this is fixed on the trunk and 2.0 branch...thx, Robin!
Status: NEW → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Great. I will try it out once a 2.0 Beta is available. 
*** Bug 326368 has been marked as a duplicate of this bug. ***
Robin: It would be great if you could test this bug using our Thunderbird 2 candidate build and report back: ftp://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2.0.0.0-candidates/rc2
I've been using TB 2 betas for a couple of months now,and I have not seen the problem. 
Status: RESOLVED → VERIFIED
adding branch verified keyword per Comment 16.
Keywords: verified1.8.1.3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: