Open Bug 589310 Opened 14 years ago Updated 3 months ago

Beach balls (hangs UI several seconds to couple minutes) constantly at random times - large folders with server side filters (worse on Mac?)

Categories

(MailNews Core :: Database, defect)

x86
All
defect
Not set
major

Tracking

(Not tracked)

People

(Reporter: justdave, Unassigned)

References

(Depends on 3 open bugs)

Details

(Keywords: perf, Whiteboard: [notahang][has stacktrace][waiting on bug 588952])

Attachments

(4 files)

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.9pre) Gecko/20100817 Lanikai/3.1.3pre

for the last 3 or 4 weeks (I don't have an exact time, sorry - didn't realize it had become a common occurrence until I'd forgotten when it started) Lanikai has been hanging for 2 to 3 minutes at a time at random times.  Eventually it will unhang, and everything is back to normal, but if you're actively working on mail at the time it's a serious disruption to your workflow.  Seems to happen completely at random and not associated with any particular user action.

A process sample shows 100% of time spent being in morkParser::ReadGroup and 94% in morkTable::CutRow
so it can happen during periods when you are not doing anything in thunderbird?

irrc ludo has said the same thing, but I don't remember his profile results
Keywords: perf
(In reply to comment #1)
> so it can happen during periods when you are not doing anything in thunderbird?
> 
> irrc ludo has said the same thing, but I don't remember his profile results

Mine where SSL/imap related.
Component: General → Database
Product: Thunderbird → MailNews Core
QA Contact: general → database
Version: 3.1 → Trunk
Further samples are showing that almost all of the time inside morkTable::CutRow is actually in morkTable::CutSlot(morkEnv*, int) specifically.

And yeah, seems to happen pretty much at random, although it could well be related to processing incoming mail, since I do have scheduled mail checks.  Sometimes I'll try to switch to Lanikai from another app and it won't come to the front because it's hung.  Sometimes I'll be in the middle of typing an email and it'll just hang in the middle of a word.  I've also had it hang like this in the middle of opening an email to read or while trying to scroll while reading an email (basically, it doesn't really matter what I'm doing).  The hangs typically last anywhere from 30 seconds to 2 minutes.

I gather if bug 11050 ever gets fixed this might go away ;)
This is similar to ehsan's bug 588952.

Knowing your usage patterns, the most likely explanation is that you are deleting huge numbers of IMAP messages from a different IMAP mail client and autosync goes and looks at the folder and then has to delete tons of stuff.
This would be the only client I have logged in, the majority of the time.

However, I do have several very high traffic folders on an IMAP account that have a 30 day message retention policy set on them in the folder properties.  Would the internal autoexpire be doing this?
(In reply to comment #5)
> However, I do have several very high traffic folders on an IMAP account that
> have a 30 day message retention policy set on them in the folder properties. 
> Would the internal autoexpire be doing this?

Yes. How many messages at a time would the purging code be deleting, given that it runs every hour? The purging code does try to do things in small chunks, but it does have to open the db, and then iterate over the messages in the db, and delete the older rows.
The folders in question have between 60,000 and 150,000 messages in them at any given time (all received in the space of 30 days, so an hour's worth could easily be >200 messages.
ok, I'm not sure the auto-delete is actually what's causing it.  I just happened to see an auto-delete in action, and it nuked 287 messages out of a folder, and I watched the message count go down and the status message show up while I was in the middle of reading other mail and actively scrolling, and it didn't hang while it was doing it.  I have had it hang 2 or 3 times already today though, so I know the hangs haven't gone away either.
justdave, so when this happens, it is much more frequent than an hourly interval?  correct?
Summary: Lanikai beach balls (hangs) constantly at random times → Lanikai beach balls (hangs UI several seconds to couple minutes) constantly at random times
Whiteboard: [notahang]
yes, that correct, it's much more frequent than hourly.
(In reply to comment #9)
> an autosync log with timestamps may be useful to eliminate or finger autosync.
> https://wiki.mozilla.org/MailNews:Logging

So I want So I want NSPR_LOG_MODULES=ImapAutoSync:5,timestamp ?
(In reply to comment #12)
> (In reply to comment #9)
> > an autosync log with timestamps may be useful to eliminate or finger autosync.
> > https://wiki.mozilla.org/MailNews:Logging
> 
> So I want So I want NSPR_LOG_MODULES=ImapAutoSync:5,timestamp ?

yes
OK, after leaving it running for about 18 hours with ImapAutoSync logging enabled, the logfile contained this:

2010-10-18 11:17:32.407469 UTC - -1604692736[270b8e0]: autosync paused
2010-10-18 11:21:13.401694 UTC - -1604692736[270b8e0]: autosync resumed
2010-10-18 23:23:20.306146 UTC - -1604692736[270b8e0]: autosync paused
2010-10-18 23:23:27.916851 UTC - -1604692736[270b8e0]: autosync resumed

And that was it.
Here's what I'm up against if you're wanting to see how bad it's gotten lately.  Definitely something screwy in my system:  http://pastebin.mozilla.org/821944
I get this periodically on console, dunno if it's related:

2010-10-20 09:15:04     nsActivityManager       ERROR   Exception calling onAddedActivity[Exception... "'[JavaScript Error: "Gloda is not defined" {file: "chrome://messenger/content/mailWindow.js" line: 363}]' when calling method: [nsIActivityMgrListener::onAddedActivity]"  nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)"  location: "JS frame :: file:///Applications/Lanikai.app/Contents/MacOS/components/nsActivityManager.js :: anonymous :: line 115"  data: yes]
So just to rule out corrupted .msf files, I nuked my ImapMail folder out of my profile last night, and let Lanikai re-download all of my IMAP mail from scratch.  About 12 GB of data went away by doing so.  That folder is now about 2.8 GB in size after spending all night downloading stuff (I don't think it's anywhere near done downloading yet), and it's still hanging frequently, so that probably wasn't it.
I'm now running Shredder instead of Lanikai.  The hangs are much less frequent, but still just as long lasting and annoying when they do happen.

I'm seeing a lot of "The current operation on 'mailboxname' did not succeed. The mail server for account justdave@mozilla.com responded: parse error: maximum request size exceeded." for multiple values of 'mailboxname' in my Activity Manager window.  Unknown if that's related or not.
Still happening enough to prevent me from getting work done quite frequently.  I've been resorting to force-quit and switching to Mail.app for a couple hours any time I need to actually do heavy work in email, and that's not fun because I hate Mail.app's interface and a lot of the convenience tools are missing.

Process samples still indicate the hangs are spending all their time in Mork code.
Summary: Lanikai beach balls (hangs UI several seconds to couple minutes) constantly at random times → Lanikai and Shredder beach balls (hangs UI several seconds to couple minutes) constantly at random times
it would be helpful to know the first few frames of the stack above the actual call to open the db, to see what's opening it.

there are several likely background openers of db's:

imap autosync
gloda
background application of retention settings
updating of filter move targets after the filters are applied

Each of these could be made to open the db asynchronously, with varying degrees of difficulty. I could also fix the retention settings code to completely ignore folders that don't have any retention settings.

That being said, > 10MB .msf files are never going to perform well.  I'd rather spend time on moving away from .msf files than patching what we have now, so I'd really like to know exactly what the problem is before spending any time coding up a fix.
Depends on: 588952
Attached file Another process sample
This one is hung within the same function, but has a different caller stack leading up to it.
Another process sample, this one has a different parent trace than the others so far.
bienvenu, can you take an elfish peek at dave's stacks?
OS: Mac OS X → Windows XP
Whiteboard: [notahang] → [notahang][has stacktrace]
(In reply to comment #25)
> bienvenu, can you take an elfish peek at dave's stacks?

It looks like that stack trace was grabbed a bit late - we're just adding headers to an imap folder db, after fetching them from the server.
I think that's what you were doing while it was "hung", because it continued being hung for a while after taking that stack before it unhung.
This one was taken during a hang that happened while composing (probably just coincidence).  It continued staying hung for several minutes after the sample was taken.
Comment on attachment 501490 [details]
Yet Another Process Sample

that shows us opening a .msf file, presumably a very large one, so we can do folder status on the folder. Basically, each huge folder you have is a potential landmine, because we will try to open the .msf file for it from time to time.
Whiteboard: [notahang][has stacktrace] → [notahang][has stacktrace][waiting on bug 588952]
Dave, could you do a couple more samples so we can solidify which of the following needs attention?

(In reply to David :Bienvenu from comment #21)
> it would be helpful to know the first few frames of the stack above the
> actual call to open the db, to see what's opening it.
> 
> there are several likely background openers of db's:
> 
> imap autosync
> gloda
> background application of retention settings
> updating of filter move targets after the filters are applied
so the current question for justdave on IRC is whether bug 764306 helps (landed 06-21)

bug 759092 also gets high memory usage  
(don't know about bug 757291)
Blocks: 759092
This could be related:
http://superuser.com/questions/454729/google-calendar-sync-has-stopped-working
The error: 0[1a0f140]: autosync paused
When Thunderbird 14.0 tries to sync, it asks for the username and password over and over again.
Blocks: 757291
justdave's experience here sounds very familiar to what I experienced when I opened bug 757291... the UI would just randomly stall. We're in the same department and have a lot of the same shared mailboxes... huge message counts with server-side filtering, lots of people subscribed to them... so they can grow and shrink a lot behind the scenes.

My experience was substantially improved by disabling gloda, and seems improved a bit further in the last few weeks (possibly due to bug 764306 resolution... I run Earlybird 16.0a2 currently).
My experience was substantially improved by unsubscribing from the Apache RTS folder in our department shared mailbox.  That folder typically has a few dozen new messages per minute in it.  I've had gloda disabled for a couple years.
Dave, do you consider what you reported in IRC on 11-11 (below) to be a different issue?  In other words, this bug as originally reported doesn't involve such obscene memory usage?



<justdave>	wsm: so I have the msgdb logging enabled...
<justdave>	it got up to 59 open DBs with memory increasing slowly, then it dropped back to 16 and most of the memory freed up
<wsm>	whoa!
<justdave>	then memory usage suddenly shot back up with no activity in the msgdb output
<justdave>	and it's now using 6 GB of RAM with 5 open DBs
<jcranmer>	hmm
<jcranmer>	what's heap-unclassified in about:memory?
<justdave>	almost all of it
<jcranmer>	that's ... weird
<justdave>	it was only using about half a GB at the point it had 59 DBs open
<justdave>	(which is usually about normal forit)
<justdave>	wonder if I can get anything useful out of cleopatra before it swaps itself to hell
<justdave>	(now using 12 GB)
<justdave>	most time spent in memmove$VARIANT$sse42
<justdave>	go figure, if it's swapping :)
* wsm	wish irving were here
<wsm>	justdave: regarding the 59 open db, woulld your fillters/activity have hit that many folders in that period of time (whatever that was)?
<justdave>	if I select the block of time when the memory increase was in progress, time is being spent in layout::flush
<justdave>	wsm: it might have, that was the initial "check mail" after it launched
<justdave>	and there's 8 or so imap accounts and a couple pop3 in there
<wsm>	y. i can see that many db with that many accounts
* wsm	wonder if open db limit should be adjusted for #accounts.
<justdave>	up to 17 GB of memory in use, time to kill it
<justdave>	(5GB physical, 17GB virtual)
Yeah, it doesn't feel like it's the same issue.  This one feels like an outright leak.  It accumulates too fast once it starts snowballing and doesn't stop until I forcequit.
No longer blocks: 759092
Removing myslef on all the bugs I'm cced on. Please NI me if you need something on MailNews Core bugs from me.
comment 34 suggests bug 551209 would help. 
And other gloda performance bugs may be a factor - many are noted in bug 1023000
Depends on: 551209
OS: Windows XP → All
Summary: Lanikai and Shredder beach balls (hangs UI several seconds to couple minutes) constantly at random times → Lanikai and Shredder beach balls (hangs UI several seconds to couple minutes) constantly at random times - large folders with server-side filters
Blocks: 1264743
Summary: Lanikai and Shredder beach balls (hangs UI several seconds to couple minutes) constantly at random times - large folders with server-side filters → Lanikai and Shredder beach balls (hangs UI several seconds to couple minutes) constantly at random times - large folders with server-side filters (worse on Mac?)

If you should ever run into this again - a performance profile will be helpful https://github.com/thunderbird-conversations/thunderbird-conversations/wiki/Profiling-Conversation's-Performance

Also, any additional info would help, such as whether you have any Thunderbird filter rules on message body.

See Also: → 1744453

(In reply to Wayne Mery (:wsmwk) from comment #40)

If you should ever run into this again - a performance profile will be helpful https://github.com/thunderbird-conversations/thunderbird-conversations/wiki/Profiling-Conversation's-Performance

New instructions @ https://support.mozilla.org/en-US/kb/profiling-thunderbird-performance

Summary: Lanikai and Shredder beach balls (hangs UI several seconds to couple minutes) constantly at random times - large folders with server-side filters (worse on Mac?) → Beach balls (hangs UI several seconds to couple minutes) constantly at random times - large folders with server side filters (worse on Mac?)
See Also: → 1819895
Depends on: 1747311
See Also: → 1869260
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: