Closed Bug 757291 Opened 13 years ago Closed 12 years ago

Earlybird 14.0a2 100% CPU usage / beachballing

Categories

(Thunderbird :: Search, defect)

14 Branch
x86
macOS
defect
Not set
major

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: nmaul, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: perf, regression, Whiteboard: [workaround: disable gloda])

This just started today (May 21), and was not present on May 18. When I first noticed it I was running build ID 20120518042012... I have since upgraded to 20120521042012, and the problem still exists. About every 3-4 minutes, Earlybird shoots to 100% CPU usage (1 full core only) and stays there for 30 seconds. It does this whether I'm looking at my inbox, another folder, or composing a message... it seems not to matter what I'm doing, really. It is, however, *intensely* frustrating. The Activity Manager shows nothing of interest until *after* the beachballing is over (about 30 seconds), and then something enters and leaves it, to fast to read. Actual operations when not beachballing (sending, receiving, saving drafts) happen at a normal pace. What more information can I provide that might help?
> The Activity Manager shows nothing of interest until *after* the beachballing is over (about 30 seconds), and then something enters and leaves it, to fast to read. typically either compacting or indexing what happens if you disable compacting at options | advanced | network and disk ?
Keywords: perf, regression
Version: unspecified → 14
I've caught it a couple times since opening this bug where it seems to be indexing, rather than compacting. It would be nice if this could happen in such a way that it doesn't block the whole interface, as it seems to for me currently. I don't think that's new behavior though. One thing that seems to make it a better for a while is to close and re-open Earlybird/Thunderbird. This gets me a generally much better experience if I do this every morning. I've experienced this for quite a while on Thunderbird/OSX (and to an extent Firefox/Aurora also, but I have many addons there so I can't pinpoint for sure if it's any section of shared code... might be totally unrelated issues). However, It's also rather bad on startup... it seems to index random folders, some of which are very large. This happens even if it was just open, which to me indicates it isn't (or shouldn't be) a situation where it just downloaded a bunch of new messages in them and has to index them for the first time... they should have been fetched and indexed while it was previously running.
(In reply to Jake Maul [:jakem] from comment #2) > I've caught it a couple times since opening this bug where it seems to be > indexing, rather than compacting. It would be nice if this could happen in > such a way that it doesn't block the whole interface, as it seems to for me > currently. I don't think that's new behavior though. To check whehther it's indexing, open the activity manager and put the AM window off to the side. > One thing that seems > to make it a better for a while is to close and re-open > Earlybird/Thunderbird. This gets me a generally much better experience if I > do this every morning. I've experienced this for quite a while on > Thunderbird/OSX (and to an extent Firefox/Aurora also, but I have many > addons there so I can't pinpoint for sure if it's any section of shared > code... might be totally unrelated issues). If you received little or no messages in the period when TB was down, then there should be next to no indexing activity. The caveat is if it needs to index a large folder. In that case, the folder index (.msf) file must be loaded. > However, It's also rather bad > on startup... it seems to index random folders, some of which are very > large. This happens even if it was just open, which to me indicates it isn't > (or shouldn't be) a situation where it just downloaded a bunch of new > messages in them and has to index them for the first time... they should > have been fetched and indexed while it was previously running. How big is your inbox? How big are the folders to where you auto-filter messages to?
can you narrow the regression range from 20120518-20120521 down to a one day period please? That should help pin point the cause
(In reply to Wayne Mery (:wsmwk) from comment #3) > If you received little or no messages in the period when TB was down, then > there should be next to no indexing activity. > > The caveat is if it needs to index a large folder. In that case, the folder > index (.msf) file must be loaded. Some of the folders are rather large, but their properties are set as such: "When getting new messages for this account, always check this folder" - unchecked "Select this folder for offline use" - unchecked To me that implies it should only load those folders / fetch new messages / index them if I actually click on them. So I wouldn't expect this to show the original symptoms. > > However, It's also rather bad > > on startup... it seems to index random folders, some of which are very > > large. This happens even if it was just open, which to me indicates it isn't > > (or shouldn't be) a situation where it just downloaded a bunch of new > > messages in them and has to index them for the first time... they should > > have been fetched and indexed while it was previously running. > > How big is your inbox? > How big are the folders to where you auto-filter messages to? Filtering is all done server-side, and most of the folders should not automatically fetch new email... but some of them are quite large. Inbox is XXk messages - auto-fetched bugzilla-daemon@ is 31k messages - auto-fetched Cron - 207k messages - *not* auto-fetched I just made the mistake of clicking on "Cron", and now it's indexing almost 25k new messages. While this is ongoing, the folder list is non-operational. I can click on other folders and they become highlighted, but the display doesn't change to them. The activity manager shows the indexing job, but it's *extremely* slow... several minutes of 100% CPU for 1-2 messages indexed, on a Core i7 MBP. It's not beachballing this time- I can look at messages in the folder that's displayed... I just can't look at anything else. (In reply to Wayne Mery (:wsmwk) from comment #4) > can you narrow the regression range from 20120518-20120521 down to a one day > period please? That should help pin point the cause If you wanted to narrow it down though, I would pick 20120518 as the most likely culrpit, as this issue manifested on 5/21 *before* I pulled down a fresh Earlybird. However, I'm thinking this is not a regression in a build from one of those days, but rather one day of a particularly bad string of what has been a common event for me, for a very long time. Without knowing anything about the code base, it feels like the indexing is just very intensive and can block the UI.
(In reply to Jake Maul [:jakem] from comment #5) > Inbox is XXk messages - auto-fetched This was supposed to say 23k messages, auto-fetched. > I just made the mistake of clicking on "Cron", and now it's indexing almost > 25k new messages. While this is ongoing, the folder list is non-operational. > I can click on other folders and they become highlighted, but the display > doesn't change to them. The activity manager shows the indexing job, but > it's *extremely* slow... several minutes of 100% CPU for 1-2 messages > indexed, on a Core i7 MBP. It's not beachballing this time- I can look at > messages in the folder that's displayed... I just can't look at anything > else. Closing Earlybird and re-opening it resets the indexing job... now it's indexing the *entire* folder and not just the 25k new messages. Fortunately, I can change folders again, so it's not problematic. I wish I could offer better information... so much of the pain is stuff I can't reproduce. I can virtually guarantee though that over ~12 hours of use or so, my TB/EB will be suffer from enough of these sorts of issues (CPU usage + beachballing + general laggy interface and short UI stalls) that it becomes frustrating to use. Closing and reopening it works as a short-term fix.
you're sure that we're indexing the messages? If the messages aren't stored offline, I thought gloda wouldn't index them. The activity manager is saying that TB is indexing messages in the cron folder? Or by indexing, do you mean downloading headers and adding them to the .msf file?
(In reply to David :Bienvenu from comment #7) > you're sure that we're indexing the messages? If the messages aren't stored > offline, I thought gloda wouldn't index them. The activity manager is saying > that TB is indexing messages in the cron folder? Or by indexing, do you mean > downloading headers and adding them to the .msf file? I mean that Activity Manager says "Indexing messages", "Indexing 147953 of 191723 messages in Cron (77% complete)". Cron *is* selected for "Include messages in this folder in Global Search results", but the other 2 boxes ("always check this folder" and "select for offline use") are not checked. FWIW, anecdotally this basic issue (not indexing specifically, but overall responsiveness after prolonged usage) is a common theme in Mozilla IT- very few of us dogfood Thunderbird on OSX (even though most of us use OSX). One of the common reasons given is slowly worsening user experience over the course of a session: beachballing, 100% CPU usage, and UI stalls, necessitating a restart of TB/EB, which restores normal functionality for a time. For me it's bad enough that I start every day by closing Earlybird and Aurora and relaunching them. Like I said, addons in Aurora prevent me from definitively pointing at shared code, but the possibility is there... the behavior is often similar.
(In reply to Jake Maul [:jakem] from comment #8) > Cron *is* selected for "Include messages in this folder in Global Search > results", but the other 2 boxes ("always check this folder" and "select for > offline use") are not checked. Oh, right, I forgot about that option - it's relatively new. You probably don't really want that for the Cron folder, do you? Indexing isn't supposed to block the UI, and it's supposed to back off when you're not idle (though some aspects of gecko idle detection were broken for a while; I don't know if those idle fixes have made it to Earlybird, but Irving would know, though I don't see the bug in his list). > > FWIW, anecdotally this basic issue (not indexing specifically, but overall > responsiveness after prolonged usage) is a common theme in Mozilla IT- very > few of us dogfood Thunderbird on OSX (even though most of us use OSX). One > of the common reasons given is slowly worsening user experience over the > course of a session: beachballing, 100% CPU usage, and UI stalls, > necessitating a restart of TB/EB, which restores normal functionality for a > time. For me it's bad enough that I start every day by closing Earlybird and > Aurora and relaunching them. Like I said, addons in Aurora prevent me from > definitively pointing at shared code, but the possibility is there... the > behavior is often similar. So this sounds like memory bloat - if you look at Thunderbird's memory usage after running for a day, what does it look like? In a couple weeks, Earlybird will contain some code to close inactive msgdb's, which could give some relief, though I doubt it's the real issue. From what I understand, the memory reporter in Earlybird in a couple weeks will also be able to better pinpoint what's using memory because of the compartment stuff.
I have seen memory usage as high as 1-2GB in Earlybird. More interestingly, I have occasionally seen the thread count get very high - over 100 threads, as I recall. I think there was a bug on this at one point in the last few months, but I can't find it anymore. I haven't seen it do this in the last several weeks, though. I've just disabled "Global Search and Indexer", and deleted the global-messages-db.sqlite file (which was about 3GB... total profile directory with that file was around 12GB). I then went to account settings and looked over exactly which folders were selected for offline use... more than I expected. I disabled them and then deleted the corresponding files, based on info at this link: http://kb.mozillazine.org/Minimize_the_size_of_a_profile. Between that one sqlite file and the offline folders I didn't really want, my profile is now at around 1.8GB, down from almost 12GB. Presumably the actual presence of these files is irrelevant and all that matters is the position of the option checkbox that enables usage of them... but even so, perhaps this will have a net-positive effect on things. I will report back next week if there is any change. One thing the Activity Manager doesn't seen to show is Earlybird's activity when you click on a folder. In some cases for me this can take a long time (15+ sec) and beachball-lock the whole app until completion. Twice it has thrown a "Warning: Unresponsive script" message (Script: chrome://messenger/content/mailWidgets.xml:2504). Another similar issue: right now the CPU usage is around 20-40%, 26 threads, 1.09GB mem used, and activity manager is empty. The status bar on the main interface says "Downloading message...", presumably from my clicking around in several folders to try and get it to act up. This seems like something that should show up in the Activity Manager... especially since it's bound to be an exceedingly common operation. Ideally I'd like to see something in Activity Manager any time CPU usage is non-zero, but I realize that may not be an achievable goal (or even a desirable one, for most people).
My actions in comment 10 seem to have had a fairly significant effect on overall performance. Subjectively it feels much snappier, with less delay between actions. My CPU fan spins up much less frequently, which is my usual catalyst to checking OSX's Activity Monitor to see what's going on (usually it would be Aurora or Earlybird... now it's almost never EB). The initial issue (prolonged and repeated complete hang with 100% CPU usage) has not reappeared. I believe EB is now giving much more consistent performance over multiple days' usage without restarting. Memory usage is currently 273MB, 31 threads, and it's been running for well over 24 hours without being restarted. I think this is an improvement, but sadly I have not been diligently tracking memory usage over time to do a proper before-and-after comparison. I'm definitely looking forward to having a memory reporter built in.
So you've left gloda turned off? It might be interesting (more for us than you, perhaps :-) ) to turn it back on but turn it off for particular high traffic low value folders like the cron folder.
so, it's still rather unclear to me which issue was most impactful. And what is actionable for improvement. For example, large global indexes (like multi GB) have been common to those who have performance problems - but is there some inherent inefficiency in gloda which impacts overall performance, versus is there just poor folder performance? (In reply to David :Bienvenu from comment #12) > So you've left gloda turned off? It might be interesting (more for us than > you, perhaps :-) ) to turn it back on but turn it off for particular high > traffic low value folders like the cron folder. If cron folder was changed to NOT selected for offline use, as were many of the other folders, then indeed enabling gloda "should" end up not impacting performance, once all the messages are indexed. To recap ... before after profile 12GB 1.8G global 3GB - memory 1-2GB <300MB Inbox XXk messages - auto-fetched bugzilla 31k messages - auto-fetched Cron 207k messages - *not* auto-fetched "When getting new messages for this account, always check this folder" - unchecked "Select this folder for offline use" - unchecked
(In reply to Jake Maul [:jakem] from comment #8) > FWIW, anecdotally this basic issue (not indexing specifically, but overall > responsiveness after prolonged usage) is a common theme in Mozilla IT- very > few of us dogfood Thunderbird on OSX (even though most of us use OSX). One > of the common reasons given is slowly worsening user experience over the > course of a session: beachballing, 100% CPU usage, and UI stalls, > necessitating a restart of TB/EB, which restores normal functionality for a > time. For me it's bad enough that I start every day by closing Earlybird and > Aurora and relaunching them. Like I said, addons in Aurora prevent me from > definitively pointing at shared code, but the possibility is there... the > behavior is often similar. perhaps it would be useful to examine a couple more cases, in separate bug reports. *especially* helpful might be examples where the user does *not* have large folders, and also non-Mac users. FWIW, bug 759092 and bug 589310 may be compatriots of yours.
Severity: normal → major
(In reply to Wayne Mery (:wsmwk) from comment #13) > Cron 207k messages - *not* auto-fetched Gloda can still cause beach-balling for folders like this if anything happens in that folder. Gloda synchronously opens folders (.msf files) it wants to make sure should be up-to-date. bienvenu implemented support for async folder opening, but I never hooked it up. Since it's the .msf that's being opened, the beach-balling is not going to be impacted by whether the folder is offline or not, although whether auto-sync decides to open the folder will have an impact because it will affect whether gloda thinks there have been changes in the folder. Although gloda is clever enough to do event-driven indexing, it is still conservative about marking folders dirty/clean. If gloda hears about something happening in a folder that requires indexing, it will mark the folder dirty immediately and queue event-driven indexing. The next time gloda does a full indexing sweep (usually only at startup, but also if an overload condition causes event-driven indexing to go into overflow mode), it looks in all the dirty folders and will only then mark the folder clean after (efficiently) seeing if it actually has any work to do in the folder. The (easy) actionable items for gloda are: - hook up the async folder open logic - have gloda ignore folders with too many messages in them by default, marking the folder not to be indexed.
Just a quick status update: I've made no additional config changes since comment 10. Memory usage, CPU usage, and general responsiveness continues to be good, and I have little trouble leaving Earlybird open for days at a time before closing to update (presently on 15.0a2 2012-06-17). I have also upgraded from OSX 10.6.8 to 10.7.4 yesterday, and no issues all day today. Also to clarify comment 13, my inbox is currently around 24k messages. :)
Jake do the try builds in bug 764306 help ? If not could you try http://mikeconley.ca/blog/2012/06/15/gecko-profiler-now-works-in-thunderbird-daily/ so we can figure out where the cpu is being spent ?
Component: General → Search
Depends on: 759092, 589310
Whiteboard: [workaround: disable gloda]
That bug is now resolved and I'm running Earlybird 16.0a2 2012-08-27, which should contain that patch. It does indeed seem pretty snappy... but disabling Gloda was the real win for me, performance-wise. Even so today I got a long hang and then a beach-ball, opening a folder I hadn't looked at for some time. The profile for that is here: http://people.mozilla.com/~bgirard/cleopatra/?report=d83784d87c711f382f7bdc2df8febaa91f82d847 I suspect this is unrelated though... there was a good bit of new stuff in that folder (server-side filtering), so it's entirely reasonable for there to have been some work. It's a repeatable effect, too... I have several such folders, and they all behave like this the first time I open them and there's a lot of work to do. Sometimes even when going back to one I was just looking at will trigger another 30s of beachballing. Here's another one involving more occurrences of the problem, and some normal usage: http://people.mozilla.com/~bgirard/cleopatra/?report=f1220d5a76823bfc950170a981c60f04a84e3324
I just switched gloda back on and closed/reopened EB. CPU usage is much higher than normal... Activity Manager says it's indexing my inbox. This seems okay to me, just reporting it. I'll run this way for a while and see if memory usage climbs or if performance degrades. One thing is for sure... with or without gloda on, when I go poking around (looking for trouble) like in comment 18 it's pretty easy to generate a beachball for 30+ seconds... I just had one last over 2 full minutes. http://people.mozilla.com/~bgirard/cleopatra/?report=1472882666c4a83635c77158b3b2359cb83235c4 In the few minutes since I turned gloda back on and started writing this (and got that profiler dump), mem usage has shot up to 1.94GB... over a full GB higher than it ever was with gloda off. It has since fallen back down to around 1.15GB.
A few minutes' of usage (not even done indexing) and a couple emails sent is enough to convince me thoroughly that gloda is *definitely* the cause. The interface is absolutely terrible right now with it on. Switching folders switching messages opening (as opposed to just previewing) messages opening the compose new message window sending a message All are significantly slower than with gloda off. Possibly things could be improved with some prioritization on the gloda work... maybe it's just stealing too much CPU time from the UI. I am typically at or near 100% used by Earlybird right now with gloda on and indexing. Mem usage remains high... 1.5GB currently. Another profiler dump, because hey these things are free and easy to make. :) http://people.mozilla.com/~bgirard/cleopatra/?report=cc1787722ae2fafbf3fe94efc025a5afe5937642
gloda hasn't changed, but still I am tempted to ask, do you have any sense that version 15 is somehow worse performance-wise?
It's hard to tell from the profiles, but I suspect this is related to bug 784286 - some of the JavaScript code in gloda interacts very badly with compartment-per-global.
(In reply to Wayne Mery (:wsmwk) from comment #21) > gloda hasn't changed, but still I am tempted to ask, do you have any sense > that version 15 is somehow worse performance-wise? Sorry, don't know. I run Earlybird, so I haven't been on 15.0 in a while now. I'm on 17.0a2 (2012-08-31) now. FWIW, I haven't "felt" any performance regressions in the last few months that I can recall.
Jake, do you still see this problem?
Flags: needinfo?(nmaul)
Nope, nothing like what I was originally seeing. I'm on 18.0a2 now, with gloda enabled, and performance is acceptable. I think it's slightly less laggy with gloda disabled, but not enough that it bothers me to leave it on. I think most of the problems were around indexing... perhaps that is just too heavyweight of a job (at least for my mail and my system) or could benefit from different prioritization. In the normal case, when its not indexing (which is almost all the time, now), I don't notice nearly so many hangs or beachballs. My global-messages-db.sqlite file is 1.11GB right now, down from ~3GB right before I wiped it out back in comment 10. I suspect my problems were cured by a combination of things: cleaning up my profile (including deleting the gloda sqlite file), changing settings on the worst-offending folders, bug 764306, and the eventual clean re-indexing of everything. I do still get a hang and/or beachball when opening a folder I haven't looked at in some time (comment 18 and 19). I attribute this to the fact that server-side filtering has generally changed the folder a lot, and a lot of work has to be done. I wish this didn't hang the UI when it happens... but I do think it's inherently a different problem. The original beachballing had absolutely nothing to do with changing folders, and was occurring even with zero user interaction. I'm content to close this out as fixed.
Flags: needinfo?(nmaul)
WFM per reporter
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.