Closed Bug 398961 Opened 17 years ago Closed 17 years ago

800 MB spike in SM VM size when I start mailnews main window

Categories

(SeaMonkey :: MailNews: Message Display, defect)

x86
Windows XP
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: nelson, Unassigned)

Details

Attachments

(4 files)

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a9pre) Gecko/2007100703 SeaMonkey/2.0a1pre When I start SM 2.0a1pre (browser), there's just a relatively tiny increase in system memory usage, ~16MB. But when I then start the mailnews window (CTRL+2) for the first time, there is a HUGE memory spike, on the order of 800 MegaBytes in the VM space of SeaMonkey. This happens EVERY TIME I start mailnews in a new SM process. First it jumps up ~500 MB for about 7 seconds, then ~300 more to a peak of ~800MB for about 12 seconds, then it comes down by ~200 MB for about 35 seconds, then comes down another ~200 MB, leveling off at ~450 MB VM space for SM. 10 seconds later, the mail/news window finally appears on the screen. This whole series of events takes ~75 seconds. This is just unusable. SM mail just crawls, swapping to death. I am attaching a gif file that captures the system's aggregate VM space growth as I start SM mail/news. Happens in two different profiles which have essentially identical sets of mail accounts and folders. I have 1 IMAP, 3 POP, and 4 NNTP news accounts. NONE of them is configured to fetch messages at startup. The total size on disk of my largest profile's entire Mail and ImapMail folders is less than 650 MB, so even if it was reading all my folders into memory at once, along with their .msf files, that would still not account for ~800 MB of growth. According to about:plugins, the plugins are: - QuickTime 7.2 = Mozilla Default Plug-in - AlternaTiff - Shockwave Flash 9.0 r16 - Cult3D Player v5.3.0.154 - Java 1.4.2_04 - Adobe Acrobat 5.0 - Windows Media Player Plug-in (no version given) - Microsoft DRM (no version given) Also: FlashBlock (this doesn't show up in the list) Most of these are quite old, so I see little reason for them to have suddenly changed behavior dramatically. On my laptop, where I'm still running SM 1.5a, mailnews starts up almost instantaneously, with no visible increase in the VM size. That's what I want SM 2.0a1pre to do, too.
WORKSFPRME. VM size=22MB, after start of Browser & Mailnews & download POP3 mails under following environment. - MS Win-XP SP2 - Seamonkey trunk : Gecko/2007100703 SeaMonkey/2.0a1pre (same as yours) - Profile created by Seamonkey trunk a few weeks ago - No extension except standard one such as DOM inspector - 1 real POP3 account, 1 dummy POP3 account, Local Folders - 1 dummy SMTP server (no mail send is done yet) - Live bookmarks(RSS feeds) in default bookmark are deleted manually via UI
> Also: FlashBlock (this doesn't show up in the list) Similar issue to following bugs? > Bug 398035 – flash not shows at all if adblock installed after 2007092502 > Bug 398135 – [FIX]Since the landing of 372769, adblock plus fails to allow new filters to be added
I've had all my plugins (except 1) since 2006 without problems. Besides SM's own "default" or "null" plugin, which is updated every time I install a new trunk build, the only other plugin to have changed in 2007 is quicktime (which was updated due to a security vulnerability). So, as an experiment, I removed the quicktime plugin DLLs and class file, and (re)started SM. I checked in about:plugins to ensure that QT was gone. This had no effect on the VM size when I started the mailnews window. It's still ~450 MB after it settles down and the mailnews window finally appears.
> Also: FlashBlock (this doesn't show up in the list) Is there any message relates to FlashBlock in Error console? Can you test without FlashBlock? (Not disables status. Uninstalled or not-installed status)
When software goes haywire suddenly, I always ask: what changed? FlashBlock didn't change (AFAIK). Files are still dated in 2006.
Seamonkey has been apparently changed to build of 2007100703 in your environment from former builds. "FlashBlock doesn't show up even you installed it successfully and it worked well in the past" may be a result of mismatch between latest SM build and the extension. See Bug 320915 Comment #12 and comments follows it for severe memory leak problem by AdBlock when Firefox was changed from 1.0(1.7 branch) to 1.5(1.8.0 branch).
I eliminated Flashblock, and the 800 MB problem continues. It was not reduced at all.
I disabled all plugins. No difference. 800+ MB spike continues on mailnews startup. At this point, I'm looking for more steps to eliminate possibilities. Otherwise, I'll have to abandon the trunk.
Flags: blocking-seamonkey2.0a1?
> At this point, I'm looking for more steps to eliminate possibilities. My usual first problem determination procedure is: Test with newly created problem with minimum prefs.js change from default. e.g. Set "Leave on server" if real POP3 account is required for test Will problem be re-produced with new profile(with minimum change from default), single real IMAP account and mail read only?
(Off-Topic) > Otherwise, I'll have to abandon the trunk. "Stay at build just before problem starts" is always one of choices for nightly testers. See Bug 333540 which is the longest term of crash, 3 months, for me. I couldn't find any workaround or cause until suggestion 3 months after bug open by Kai Engert based on analysis of other bug reports.
I ran the nightly trunk build in the debugger, and set breakpoints in malloc and realloc to trigger on allocations greater than 0x0400 0000 bytes (~67 MB). The break point was hit 4 times between typing ctrl+2 and the appearance of the mailnews window. Each hit corresponded to a huge increase in stack space. There were a couple other jumps in size that did not correspond to these breakpoints, but these account for most of the huge increase. The stacks are attached. They do not contain symbols for all the functions. They contain library names and function name symbols only for functions in xpcom_core.dll. If I could download a complete debug build with symbols (or all the .pdb files) built with MSVC 2005 SP1, I could get better stacks. The huge reallocations are due to calls from nsACString_internal::Replace() and nsACString_internal::ReplaceASCII(). The big malloc call was ? from nsACString_internal::Replace(). These results are all reproducible.
You could also bite the bullet and just do a debug SeaMonkey build and run under the debugger to see what the stack is. I assume you did try removing the .mfl files we talked about over AIM...I think those would get loaded completely on startup, long before mailnews is launched, but I could be wrong. Occasionally running a debug build might help track down some of the other issues you've been seeing. Just a thought - I know it's painful...
I'm willing to RUN debug builds, but for MANY reasons, I'd rather not build them myself. If there was somewhere from which I could download a complete debug build ... I found that the symbols reported by the debugger as nsACString_internal::Replace() and nsACString_internal::ReplaceASCII() appear in the source code with a different class name. They are actually part of class nsTSubstring_CharT declared in http://mxr.mozilla.org/mozilla/source/xpcom/string/public/nsTSubstring.h#347 and defined in http://mxr.mozilla.org/mozilla/source/xpcom/string/src/nsTSubstring.cpp#468 I think it would be a good idea to change all those functions to place an upper bound on the size of the (supposed) text substring they replace. I'd suggest a few megabytes, like 10, or even 32. Just fail if the size is bigger than that. That would probably solve the immediate problem. In fact, I'm thinking of trying that out, just to see.
(In reply to comment #11) > They contain library names and function name symbols only for functions in > xpcom_core.dll. If I could download a complete debug build with symbols > (or all the .pdb files) built with MSVC 2005 SP1, I could get better stacks. > > The huge reallocations are due to calls from nsACString_internal::Replace() > and nsACString_internal::ReplaceASCII(). The big malloc call was ? from > nsACString_internal::Replace(). These results are all reproducible. Although knowing that it looks like string replacements/functions may be causing the huge reallocations partially helps, we really need to know where in mail.dll they are coming from, otherwise its a needle in a haystack unless someone finds a reliable way of reproducing it. I know Firefox was meant to be starting to post pdb files for builds, I'm not sure if that's been extended to Thunderbird (and afaik it isn't done for SeaMonkey).
Using assembler debugging, I set a breakpoint in nsStringBuffer::Alloc See http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/xpcom/string/src/nsSubstring.cpp&rev=1.8#201 It was a conditional breakpoint, stopping when variable "size" was greater than 50 million (50 000 000) decimal. I hit it more than once. Each time, I just made it return NULL. mailnews started successfully, and much more quickly. Now, I ask you, is there really any need to allocate strings larger than 50 Megabytes? (Apprently not, since mailnews worked OK after that). Could we just put code in that function to always fail requests for that much (or more) memory? We can fix all the broken code and achieve Nirvana later. Today I just want a mailnews client that doesn't run out of swap space.
mailnews is not the only client of the string api. I believe this problem needs to be fixed in the mailnews code...
I've gotten rid of the XUL cache files, and rebuilt all the msf files. No impact. I do know why it takes peaks at 800 MB for 12 seconds, then comes down by 200 MB. It's doing a collossal realloc. It takes 12 seconds to memcopy several hundred megabytes from the old buffer to the new one, given that it's swapping all the way. Then when the hundreds of megabytes are copied, it frees the old buffer, which drops about 200 MB. That's all happening in nsTSubstring_CharT::MutatePrep. Yup, somebody's trying to grow a 200+ MB string buffer. We need Nancy Reagan to review this code: Just say No!
This is a start. I'd call 50 million bytes an adequate, perhaps absurdly generous, upper bound.
Attachment #285255 - Flags: review?
You'll need to request a review from an XPCOM string peer/module owner. I'm not sure who this is at this point, since Darin's not involved anymore. Maybe bisi?
Just a FYI: Flashblock is /NOT/ an plugin, so it should never appear in about:plugins.
(In reply to comment #17) > I do know why it takes peaks at 800 MB for 12 seconds, then comes down > by 200 MB. It's doing a collossal realloc. FYI. I saw similar phenomenon during test of Bug 398684 (Shift+Del of mail mails). When Shift+Del of 40,000 mails on MS Win XP, as I reported to Bug 398684, peak VM size was acceptable, and Shift+Del completed in reasonable time. But, when I tried to Shift+Del of 320,000 mails, increasing speed of VM was very rapid, and peak VM easily exceeded 768MB, and took very long. I always killed Tb when this situation to avoid long term system slowdown by severe swapping (I have 1GB memory only), so I don't know what happen after completion of Shift+Del of 320,000 mails. "Very rapid VM increase(and probably released after end of process)" looked to be phenomenon of very faster "object creation requests, then memory allocation" than "destroy of old objects, garbage collection, then freemain". And your observation/description seems to explain my test result.
(In reply to comment #18) > Created an attachment (id=285255) [details] > Nancy Reagan patch > > This is a start. I'd call 50 million bytes an adequate, perhaps > absurdly generous, upper bound. This isn't a start. The problem (as indicated by the stacks) is ost likely something in mailnews that is causing the large allocations. All this patch will do is to cover it up for your instance. There may be someone out there with the same bug, but that bug in their case is causing 49 million bytes to be allocated 50 times on startup. Without properly debugging this, or at least finding where in mailnews this is coming from (which IMHO at the moment you are most ideally situated to do), we've got no idea what's causing it or why. If we don't find the real location/reason for why its happening then the possible case I've just stated could still occur (or be occurring), and we'd have to start again. Lets at least try and have some idea what is really happening to fix this properly rather than just coming up with a hack because its easier.
(has debug builds for most recent week been considered for http://ftp.mozilla.org/pub/mozilla.org/<whatever>/nightly/latest-trunk/ ... even if just for windows?) So this huge memory increase happens _before_ you access any account (since you have checking turned off for startup and timed interval)?
(In reply to comment #23) > (has debug builds for most recent week been considered for > http://ftp.mozilla.org/pub/mozilla.org/<whatever>/nightly/latest-trunk/ ... > even if just for windows?) Wayne, I'm not sure to whom that question is addressed, nor what exactly you're proposing. All the nighly trunk builds that I run come from the directory that you cite above. The stacks I reported in comments above come from those builds. The builds in that directory are NOT debug builds. > So this huge memory increase happens _before_ you access any account (since > you have checking turned off for startup and timed interval)? That's correct.
Attached image New mem spike graph
Even though I have not downloaded any update to my browser (and I think I have auto-update disabled), and about: shows the same date as before (Gecko/2007100703 ) today the memory spike behavior changed, dramatically. Now, when I start the mailnews part, the memory spike still happens, 800+ MB, as before, but now, before the mailnews window appears, the huge memory allocation goes completely away. The process VM size goes back down to below 40MB before the window appears. When I reported this bug, it would come down from 800+MB to about 450 MB and level off and stay there for the process lifetime. Now, it goes back down to where it belongs, after about 30 seconds. Attached is a new graph of the VM spike. The vertical scale is not the same as before, so I've labeled it. The horizontal scale is (or should be) similar to the previous graph (first attachment above). The process is a lot more usable now, than before, once the mailnews window opens. It would still be nice to eliminate that 800+ MB spike, but it doesn't seem as urgent now.
It's history. This morning, the mailnews window appeared in just a second or two after I clicked to start it. The old 800 MB spike, and the 30-90 second wait while it copied hundreds of megabytes in realloc just don't happen any more. I've exited and restarted it many times, always with the same result. No more 800 MB spike. It lasted for about, which is conincidentally the amount of time that newsgroup headers are kept by my prefs. So, I suspect it had something to do with newsgroups and header retention. But that's just a guess. I guess we'll never know. I
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME
(In reply to comment #26) > It's history. > > This morning, the mailnews window appeared in just a second or two > after I clicked to start it. The old 800 MB spike, and the 30-90 > second wait while it copied hundreds of megabytes in realloc just > don't happen any more. > > I've exited and restarted it many times, always with the same result. > No more 800 MB spike. with the same build 2007100703 ?
Wayne, Yes, with the very same build. When this behavior changed, so did other behaviors. Bug 400526 began at the time the 800 MB spike ended. There are a couple of words missing from comment 26. I meant to write: It lasted for about *30 days*, which is coincidentally the amount of time that newsgroup headers are kept by my prefs. So, I suspect it had something to do with newsgroups and header retention. Bug 400526 also has to do with newsgroup and header retention. I did not discover that bug until after I had written comment 26.
> > with the same build 2007100703 ? > Yes, with the very same build What was build number just before you transfered to 2007100703 build? As you say in Bug 400526, Bug 393620 was fixed on 2007-09-28(vendor ID change from mozilla.org to Mozilla), so you had to move profile location manually. Was event sequence as follows? (1) You were using builds before 9/28, so profile location was mozilla.org (2) You manually moved profile location(mozilla.org=>Mozilla) for 10/07 build (3) You started to use 10/07 build on 10/07 (date of your comment #0) Guide for manual profile location change is following page. http://wiki.mozilla.org/SeaMonkey:Moving_Profiles_from_mozilla.org_to_Mozilla This page says; > Files to remove > * XUL.mfasl > * XPC.mfasl > * panacea.dat > * xpti.dat > * compreg.dat Did you delete panacea.dat when manual profile location change? panacea.dat keeps cached information relate to account/folder/newsgroup etc. So old panecea.dat possibly causes re-subscription of newsgroups.
Wada, Yes, That's the approximate sequence of events. Previous build (to 2007-10-07) was 2007-09-01. I believe I did delete the files named in your comment 29 above. I never did find files with a suffix of .mfasl, but I did find files named XUL.mfl and XPL.mfl, and I removed them. Should we be discussing this in bug 400526?
Nelson, root cause of this bug and Bug 400526 is Bug 400648 when you migrated profile?
No, I think we don't know the cause of this bug. To state that either Bug 400526 or Bug 400648 are the cause of this bug is mere speculation.
If there is some way that a 200 KB newsgroup .rc file can cause an 800+ MB memory allocation, THAT should be fixed. But the 800+ MB allocation went away, and the 200KB .rc file didn't. So I wouldn't draw a conclusion about causality from that.
Flags: blocking-seamonkey2.0a1?
Attachment #285255 - Flags: review?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: