Closed Bug 518918 Opened 15 years ago Closed 15 years ago

Thunderbird 3pre high memory and consuming 100% CPU making mail nearly unusable, need to handle UID's > 0x7FFFFFFF

Categories

(Thunderbird :: General, defect)

x86
Windows XP
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0rc1

People

(Reporter: baffoni, Assigned: Bienvenu)

References

Details

(Keywords: hang, perf, Whiteboard: [no l10n impact])

Attachments

(4 files)

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4pre) Gecko/20090924 Shredder/3.0pre

Running Thunderbird, gloda appears to have discarded all prior db information in my sent folder and is reindexing the whole thing.  While it is indexing my IMAP sent (the only process showing in activity monitor, 3xxx out of 7260), my system's dual core CPU is 100%.  I think it is really cool that it has gone multithreaded and nonblocking, but now my system is slow in all other applications (setting to below->normal priority helps), and mail reading/deleting/editing/creating is suffering from the keyboard lag, retrieve lag issue.

Going offline does NOT reduce CPU load, and the indexing status stays the same in the activity monitor.  No errors due to indexing show in error console.

I can read emails, move/delete/etc. but it is just very slow, and you would think that it would drop in utilization when I went offline.

FWIW, Tbird is doing something, I see RAM (both mem usage and VM size, currently at 730MB and 1053MB respectively) fluctuating, but even afte 20 minutes of being in offline mode, indexing still shows active and CPU is 95% or higher.

Doing a "file->exit", CPU utilization of thunderbird drops briefly to 45%, but then goes back to 98% and after 10 minutes still has not properly quit.  I would say this looks like a race condition, but after a while connected, I do see the indexing numbers increment.

Trying to search locks up the mail client.

Attached is a IMAP:5 log of me starting the application up, get message once, delete one file, then quit, then force quit of the application after 5 minutes of 100% CPU utilization and not quitting.  Email me directly for a copy of the password to the zip file (I want to limit access to my listings of personal email folders, etc.).
Flags: blocking-thunderbird3?
Sorry, I should have said, doing a "search everywhere" locks up the mail client completely.  CPU drops and eventually returns so it is probably busy doing a lookup against my gloda db (global-messages-db.sqlite is 2.4GB), but there is no "progress" and it says "application not responding" for minutes at a time, probably not the intended behavior.  I'm not sure if this is a consequence of the 100%CPU issue, or just the size of the DB.

My system is a 2.53GHz Core2Duo running Windows XPSP3.  HD hardly sees any activity (some, but not solid) and total commit is under physical amount.
I've noticed as well that gloda seems to be a little bit more in the way than it was after rkent's checkin, and the changes to reduce the amount of nsMsgHdr js garbage created. I notice it especially when I start up in the morning, and a 100 or more new messages are added to my inbox.

I don't know about the hang on exit - I've exited a lot of times when gloda is busy, and haven't had it hang. If you want to e-mail me the password to the zip file, I can look at it.
(In reply to comment #2)
> I've noticed as well that gloda seems to be a little bit more in the way than
> it was after rkent's checkin

Are you saying that the checkin helped, and it since has gotten worse, or that it got worse when the checkin was done?

If the latter, which checkin do you mean?
the former - initially, things were pretty good, but when I got back from my vacation this Monday, it seemed a bit worse.
Keywords: perf
so the real the issue/root cause is redownloaded/resynced messages?
I don't know if I can answer that.  From a user perspective, background indexing should be just that, background and unnoticed.  I think that the redownloaded/resynced folders caused a surge of emails to be indexed and it caused the client to become unresponsive (and now that indexing is completed, the client seems to be behaving fairly normally), so if it is determined to be acceptable or at least unavoidable that the indexing causes noticeable/significant slowdown then I suppose you are correct.  

However, if it is intended to be nearly transparent to normal use, then I would say the root issue is that the code meant to do indexing in the background needs more work so that even when large mail stores are "discovered" by the indexing process.

However, the one issue that is definitely a problem is the 100% CPU when the client is taken offline in the middle of indexing.  Maybe some indexing subroutine is being hung/orphaned when going offline rather than properly closed.

I've had a few updates since this occurred, I'll see if I can take the client offline in the middle of the next indexing and see if the behavior occurs on smaller indexing jobs, or is some artifact of the larger indexing jobs.
My question in comment 5 was asked because "Yes" might make this a dupe of another bug. Versus indexing overhead. And "Search Everywhere" slowness is noted in another bug.

OTOH, 700+meg may be a better symptom of what's causing the slowness. What the size of your largest .msf files that would be getting updated?
In order of descending sizes 13MB, 9MB, 7.5MB, etc.
Flags: blocking-thunderbird3? → blocking-thunderbird3+
need strategy to narrow this.

For Michael...
question #1 is whether this is only during search. 
if not, we can remove that from the summary
question #2 is your machine specs - memory, cpu, and do you see this with no other apps running

rkent, bienevenu - a thought, might it be nice if we could turn off indexing of new messages, without turning off gloda, so that we can eliminate indexing as a cause for issues such as this?  Or, can we approximate that by not syncing new messages?
Summary: Thunderbird 3pre is consuming 100%CPU (doing Gloda search?) making mail nearly unusable → Thunderbird 3pre high memory and consuming 100%CPU (doing Gloda search?) making mail nearly unusable
Going offline turns off indexing (as a proxy for being on battery power).
It would also be fairly easy to add something, say a hidden preference, that would "pause" gloda indexing.
Whiteboard: [no l10n impact]
So I turned off gloda for testing another bug (bug 506809), and I see that 100% CPU continues with it turned off.  This issue is not during a search (searching with standard to/cc/subject filter doesn't seem to have an effect).  

Currently on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.4pre) Gecko/20090929 Shredder/3.0pre.  System is Windows XP (32bit) SP3, 2GB RAM, 2.53GHz Intel core2duo (Dell E6400), standard HDD.

I've just stopped and restarted Shredder, and it isn't doing it now, but was earlier today.  While in the 100% CPU condition, activity manager is blank, and using sysinternal filemon, NOTHING is being written to disk.  This activity was tested after a file->exit (all windows close but process is still running).  Currently waiting for activities that might kick off the run away process.
Summary: Thunderbird 3pre high memory and consuming 100%CPU (doing Gloda search?) making mail nearly unusable → Thunderbird 3pre high memory and consuming 100%CPU making mail nearly unusable
I've found out how to recreate this.  CPU goes to 100% every time I select my drafts folder (saving a file to drafts does not do this), nothing appears in error console or activity console when I do this.  I'm out of time now, but I will startup up a IMAP:5 log tomorrow.  Is there any other logging I should turn on to help with this?
Oh, cool.  Yes, you can turn on gloda logging.  In the Config Editor (Advanced prefs) turn on: mailnews.database.global.logging.console and mailnews.database.global.logging.dump

(.dump will go to stderr, so that works if you know how to run thunderbird from a shell/console)
Might be a good idea to imap:5,timestamp so we'll see if there's a timeout or something.
OK, attached is my IMAP:5, timestamp log, it is a password-protected .zip, please email me for the password, Wayne already has it though.  

I did set the mailnews.database.global.dump to true (.console was already true), and did a launch of tbird with >stderr.log, but it just created a 0 length file.

I launched the mail app with the drafts folder set as my last opened folder (so it would enter the error state as quickly as possible, I then did a file->exit at 15.45.30, waited a minute plus, and killed the process at 15.46.50.
Further troubleshooting results:
On the positive side: reindexing "drafts" fixes the 100%CPU problem.
On the negative side: I have no good way of inducing this behavior, so I don't know if this was an offline copy issue, replay IMAP instruction issue, etc.
did you keep the bad .msf ?
From the log, it looks like we're trying to set a flag on an invalid msg uid in the drafts folder (looks like a small negative number, probably having to do with an offline operation) - the imap protocol code seems to bail out before we talk to the server, but then subsequent attempts to do things with the drafts folder, like update its message counts via STATUS fail, because we think we're still trying to do something to the folder. I'm not sure why this would lead to 100% cpu, but I can try to simulate this in the debugger.
I did find a copy of my drafts offline copy and .msf from yesterday, I closed
thunderbird, and replaced those copies into the IMAP folder, restarted client,
and still not seeing 100% CPU issue.  I would assume this is NOT a offline copy
issue (?), but could this be a folder replay issue?  If so, where are the
replay instructions so that I can restore those to see if they initiate the
issue? I have a copy of my entire profile directory from before reindexing.
did you select the drafts folder when you did the experiment?

Whenever you select a folder, we try to playback offline operations - it looks like the .msf file has operations to mark a couple messages read, 0xFFFFFF7E and 0xFFFFFF80, which obviously don't exist in your drafts folder. 

We have a folder flag which tells us if the folder has any offline events to play back. This is stored in the .msf file, but also panacea.dat - you may need to restore panacea.dat from your copy of your profile, along with the .msf file.
Both those logs show the same behavior with the drafts folder.
I suspect the thread trying to store the sent flag on 0xFFFFFF7E is in an infinite loop, or trying to do a very large string operation (e.g., a length has gone negative). I'll try to reproduce it in the debugger.
Bingo, readding back panacea and the drafts and drafts.msf caused the bug to reappear.  Attaching all three files to this bug, using same password as other attachments.
thx, I've almost got this fixed, actually.
Assignee: nobody → bienvenu
Status: NEW → ASSIGNED
Attached patch proposed fixSplinter Review
Never minding how we ended up with those UIDs, since we should be handling UID's > 0x7FFFFFFF anyway, this fixes the problem for me.

Asking Neil for review since he might have some better ideas about how to convince all the relevant code that we want unsigned int and the string equivalents. nsTString.AppendInt(PRUint32, radix) is charmingly broken, from what I can see:

      inline void AppendInt( PRUint32 aInteger, PRInt32 aRadix = kRadix10 )
        {
          AppendInt(PRInt32(aInteger), aRadix);
        }

I'm going to try to make an xpcshell test case for these UID's > 0x7FFFFFFF - I think the fake server lets me assign UID's, so if I can just get JS to do the right thing, I should be able to do so.
Attachment #404123 - Flags: superreview?(neil)
Attachment #404123 - Flags: review?(neil)
Target Milestone: --- → Thunderbird 3.0rc1
Since these UID's are hard to recreate in the wild, I tweaked nsImapService as follows and used the debugger and a breakpoint to change the PC in order to recreate the issue and test the fix:

+++ b/mailnews/imap/src/nsImapService.cpp
@@ -1727,17 +1727,20 @@ nsresult nsImapService::DiddleFlags(nsIE
       urlSpec.Append('>');
       urlSpec.Append(messageIdsAreUID ? uidString : sequenceString);
       urlSpec.Append(">");
       urlSpec.Append(hierarchyDelimiter);
       nsCString folderName;
       GetFolderName(aImapMailFolder, folderName);
       urlSpec.Append(folderName);
       urlSpec.Append(">");
-      urlSpec.Append(messageIdentifierList);
+      if (PR_TRUE)
+        urlSpec.Append(messageIdentifierList);
+      else
+        urlSpec.Append("4294967166:4294967168");
Whiteboard: [no l10n impact] → [no l10n impact][has patch for review]
this probably accounts for some percentage of the hangs on shutdown, though I doubt it's a very high percentage.
Comment on attachment 404123 [details] [diff] [review]
proposed fix

Worse, external API only has one AppendInt which takes an int(!)

>-    // we don't need to null terminate currentKeyToken because atoi 
>+    // we don't need to null terminate currentKeyToken because strtoul 
>     // stops at non-numeric chars.
>-    curToken = atoi(currentKeyToken); 
>+    curToken = strtoul(currentKeyToken, nsnull, 10); 
Nit: might as well trim those trailing spaces ;-)
Attachment #404123 - Flags: superreview?(neil)
Attachment #404123 - Flags: superreview+
Attachment #404123 - Flags: review?(neil)
Attachment #404123 - Flags: review+
The solution that works with the external API is to snprintf into a local temporary character array and append that to the string.
fix checked in. I'll file a bug on the string classes just for fun...
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [no l10n impact][has patch for review] → [no l10n impact]
bug 520366 filed for the string library issue.
Michael Baffoni, can  you confirm problem gone using RC1?
Severity: major → critical
Keywords: hang
Summary: Thunderbird 3pre high memory and consuming 100%CPU making mail nearly unusable → Thunderbird 3pre high memory and consuming 100% CPU making mail nearly unusable, need to handle UID's > 0x7FFFFFFF
Gloda no longer takes 100%cpu on a multicore system, and it doesn't kill my apps anymore.  However, Tbird still goes "not responding" for 5-15 minutes at a time and one core goes 100% when Gloda is enabled, after a minute or two from startup/login (after it has looked for IMAP folders and started indexing).  Is that significantly different from the current bug description to warrant a new bug, or should I submit new information on this bug?
(In reply to comment #34)

> startup/login (after it has looked for IMAP folders and started indexing).  Is
> that significantly different from the current bug description to warrant a new
> bug, or should I submit new information on this bug?

File a new bug Michael .
See Also: → 1391128
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: