Closed Bug 21364 Opened 25 years ago Closed 25 years ago

[leak] Leak another 10K just viewing a pair of messages (each time)

Categories

(MailNews Core :: Backend, defect, P1)

x86
Other
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jar, Assigned: mscott)

References

Details

(Keywords: memory-leak)

Attachments

(5 files)

Using the 12/9 build.... Open mail Open inbox select a message view the message select next message view message select first message ...etc You will leak amounts of memory as high as 1M, but rarely less thatn 200K with each transition. The double transition (back to original message) seems to consistently cost in excess of 1M total. The messages I'm viewing are each about 5 screen in lengeth (not too big, but not tiny). My inbox has 4000 messages, but I don't expect this to have been critical.
This is probably XUL's fault. XUL has become quite leaky in the last week.
I just tried this with small messages, and the leak was a mere 20-40K per cycle, so I suspect the size of the message is significant.
QA Contact: lchiang → suresh
Assignee: phil → waterson
It turns out we seem to leak a nsDocLoaderImpl (which holds on to the XUL document) because of a circular reference between the docloader and the load group it creates. I'm gonna take a look at this, with mscott's help.
Whiteboard: [PDT+]
Putting on the PDT+ radar.
Assignee: waterson → mscott
mscott wrastled this from me.
Status: NEW → ASSIGNED
so right now I am not looking at mailnews. I'm trying to figure out why starting up and shutting down the browser apparently leaks a doc loader and load group as waterson mentioned. I've figured out why that leaks happens. I'm trying to come with some code to fix it right now. Some people try to create a doc loader instance as a service. The first instance is stored in a global variable and has no ties to a web shell. Keep in mind that doc loaders have a circular reference with a load group. So a doc loader's Destroy method needs to be called in order to break the circular reference. This happens naturally when a doc loader is created in conjunction with a webshell (because the webshell always calls it's doc loader's destroy method). But this doesn't happen when someone creates a "global" doc loader by calling into the service manager. So the circular reference is never broken and we leak the doc loader and load group. Adding rpotts to the cc list. My fix is going to involve force people who want to use the doc loader as a service to use a different object than those that use the doc loader as part of the web shell. You'll get the doc loader service, then ask it for the real doc loader. Then when the doc loader service is destroyed, it will call the destroy method on the global doc loader it contains.
hey scott, If the docloader is leaking because of the circular link to the loadgroup, then let me break that link with a nsWeakPtr... It used to be done this way, but for a short time we were proxying the notifications from the load group - and the weak ptr was lost. BTW. leaking the global docloader service should not be the cause this leak! This docloader has no associated webshell and *only* contains other docloaders (no channels). Since all of the other docloaders (and loadgroups) are cleaned up this one will be empty when it is leaked... -- rick
Hey Rick, So I missed this message of yours. I just went ahead and wrote a small snippet of code to make a doc loader service object which olds onto the global doc loader. This fixes the leak, but of course as you pointed out, the leak is quite small and not the source of the 400K leaks we are seeing on tinderbox which is what waterson and I were investigating.
waterson, I hate to have to think about bouncing this back to you but I've fixed the doc loader / load group problem in my tree and it doesn't effect the bloat at all. some of the bigger ticket items in the leak log are xul attributes (, atomimpl, 2 xul documents...
No problem. if nsDocLoaderImpl doesn't leak, but nsXULDocument still leaks, give it to me.
Update: The changes I have in my tree to fix the doc loader leak + some changes to fix webshells that were leaked by the global window have helped the mailnews leak story. That is, these changes fix the webshells that were leaked per message viewed. i.e. if you used to view 4 messages, we would leak 4 webshells. However, this change doesn't have as dramatic as an effect on the leak count as you might think because the webshells that were getting leaked before were being destroyed. i.e. there destroy method was being called so they weren't leaking anything they were holding on to. Just the actual webshell structure itself. And you won't notice that gain until you close the mail app too. That being said. When I view messages using todays bits, I'm not seeing jumps of 200K between message views. Sometimes it jumps up, sometimes it goes down depending on how big the message size is. But I'm not seeing increasing amounts of memory useage. Now that being said, the part that I think is still potentially a dogfood fix is the fact that starting up mail OR the browser leaks the xul document associated with the window for that application. I think this is contributing to a large about of the 400K we leak when you start up and shut down the browser (as seen on tinderbox)
More info: I do see my memory useage jump about 200K when I display a message but as soon as layout is done, the 200K is returned. i.e. if I view 5 or 6 messages that are about the same size, the mem useage when I view the first one vs when I view the last one is about the same. The memory is being returned. But i do see 200K spikes when laying out each message. Also, I've been trying to look into the XUL Document leak for mail and the browser. The bloat log swears it's getting leaked. Yet when I ask it for the serial number of the leaked document it won't give me one! So I can't get a leak tree for the xul document...arggh...
I'll look for that also. Do you have patches in your tree that I should pick up? If so, spank 'em into the bug.
Chris, I take it back. I am getting logs of nsXULDocument. I was incorrectly setting the class to be nsWebShell for ref count data and of course we weren't leaking the webshell. =) So I'm getting a valid ref counting graph for the xul document. There's one xtra ref count in there. I'm trying to track it down right now but feel free to look too. I'm attaching a patch which includes changes to globalWindowImpl, webshell, extensions.
Summary: [dogfood][leak] Clicking between adjacent messages leaks 200K-1M → [dogfood][leak] Leak 400K when starting up and shutting down the app
I'm going to change the summary of this bug to represent what we are working on: starting up and shutting down the browser leaks over 400K according to the leak tools on tinderbox. The patches I've already posted fix any webshell leaks related to viewing mailnews messages. In addition to these patches, I'm slowly chipping away at leaks for starting up and shutting down the browser. I found some nasty leaks in the cookie code and the wallet code earlier and I now have my linux box down to the point where it is leaking 350K instead of 425K. But I'm just chipping around the edges until we can solve one of the big leaks which I think is the XULDocument.
load balancing to waterson who will probably spot the xul document leak faster than me anyhow. I also have leak fixes that help bring this number down. I'll continue to try to help chris as well.
Assignee: mscott → waterson
Status: ASSIGNED → NEW
re-assigning for real this time.
QA Contact: suresh → gerardok
Depends on: 21661
Depends on: 21643
Status: NEW → ASSIGNED
Target Milestone: M12
Depends on: 21668
Summary: [dogfood][leak] Leak 400K when starting up and shutting down the app → [dogfood][leak] Leak another 400K just viewing a pair of messages (each time)
The bug is not a "startup and shutdown" bug. Sorry the focus changed by someone. I'm changing the title back to the original which involves about 1M plus leakage from just viewing two distinct messages (even as you view them again and again). Two message views add up to a 1 meg leak. That was the bug... not a startup/shutdown issue.
I'm sorry Jar. I was the one who morphed the subject title. Per my comments way up there somewhere, I'm not seeing a leak between viewing messages. Yes we use up 200K to display it but I see that memory getting freed. I suppose I should have marked this bug as works for me and we should have used a new one to track the large start up and shutdown leaks. One note about my inability to reproduce your problem: I now have lots of leak fixes in my tree for the startup / shutdown problem including changes that prevent us from leaking webshells when viewing messages)
I just tried this on the 12/13 build number 15 on windows (today's build). Without working too hard, (my second pair of messages I chose), I quickly found a pair that leaks about 350K each time I cycle back and forth through them. I can do this again and again, with ever increasing total leak (not just startup/stutdown). If there is a startup/shutdown bug, that should be filed... but tihs should really look at the problem. Please ring me up or come by to see (if my description is still not clear, and you can't reproduce). Thanks, Jim
jar: due to the nature of object ownership, if a XUL document is leaked, it will leak all the things that it owns.
I understand that if a XUL object is leaked, then all related objects go along with it (leak) in a reference-count system. The critical thing about this bug is that in the course of an oft repeated user activity (reading email) that a leak of several hundred K apppears for every couple of messages is a problem. IF there was a 400K leak that related to startup and shutdown *ONLY*, then it would not be a PDT+ bug ('cause it would not impact usability, since a user doesn't startup and shut down tons of times in each session.... especially mail... and especially with the startup/shutdown time performance in mail today). I understand that the leaked object(s) will typically be small... I'm just after finding the leaked object that transitively induces this mother of a leak. This bug will be gone when that object (or the one or two big helpers) is isolated, and repaired.
Every mail message is a XUL document.
jar: sorry, re-reading the comments in this bug, my previous comment was a complete tautology ("an object leaks the objects it owns" -- duh). What I *meant* to say was that the XUL document is a central object that has a very high ownership fan-out. It leaks the document content model and script event listeners, content model elements hold on to RDF datasources, etc. Fixing the startup leak of the XUL documents on startup will be necessary (although possibly not sufficient) to fix the leak of the mail message document.
chris, here are two small leak fixes that I also have in my tree of the global context. I'm attaching them here just so we are on the same page. Note: these fix ref counting leaks on the global context but even with these changes, it is still being leaked.
Rpotts confirmed that he could not reproduce this in his non-commercial build. He also confirmed that he could reproduce this in his comercial build. mscott: When you said you could not reproduce, were you using the comercial build? Please try it there... in my demo for rpotts, we were losing about 600K per pair of messages using the 12/13 build pulled from sweetlou. The obvious question is why are we leaking so differently on comercial build??
In a tree without mods, I see a 80-100K leak per message in the mozilla build. I see a 400-500Kb leak in the commercial build. I'm going to build a commercial tree with these mods and see what happens.
Some AIM stuff was recently turned on when displaying messages in the commercial build. I bet my bottom dollar it's coming from there...That would explain the difference.
Jar, I think I may have forced your problem to be fixed incidently today. Please see the PDT+ bug I filed on Alex Musil this afternoon: Bug #21710. Turns out they had added some stuff recently (within the last week) that was causing an address book to get opened for each email address in the email message. And I've been told that the address book is very very leaky. Alex fixed this bug for me and as a result we only open an address book up once per session instead of every time you view a message. I believe this was causing the leak you were seeing since it was only occurring in the commercial build. I'm running a commercial debug build with alex's changes for that bug and I'm seeing the same minimal leak behavior I reported to you earlier on the mozilla build when viewing messages. I belive we can mark this PDT bug fixed due to the changes in Alex's code tonight. Tell you what, why don't I wait and you can try out tomorrow's bits .if you aren't seeing the better leak behavior that I'm seeing in the commercial build we'll take a look again.
mscott: it looks like you are right. I just updated my tree, and am no longer seeing a large leak each time a new message is opened. jar: i'll let you mark this as fixed when the builds come out tomorrow.
It appears to be windows-specific. I cannot reproduce the leak in a commercial Linux build.
Target Milestone: M12 → M13
lets check this out to see if it looks good in 12/15 builds. if not fix in the first m13 carpool landings?
Both Waterson and I can still see the leak in the 12/15 morning build :-(. It is not gone :-/.
Priority: P3 → P1
jar: in a recent private email, you said that this seems to be down to l.t. 100K per message now. should we declare victory, or continue work on this? Or did I mis-read your email? ;-)
I was either sleepy, or miscommunicated (and I can't find the messag in my outbox :-/ ). Anyway... the problem is still visible on the Netscape Window's build, but not visible on the mozilla windows build, and not visible on the Linux Netscape build. Conclusion: This is a Windows only, Netscape version only, leak. Attached is a file that you can use to demonstrate the leak. Please just cycle back and forth between the two messages, and you should see about a 380K leak per cycle.
Blocks: 22176
Second try... this time I attached the actual mailbox, rather than the mailbox index. I tried a couple of messages. I can't tell if the leak is larger because of the number of folks on the To/CC list, or because of the complexity of the message. I could try to debug this down by hand editing the file so that I could tell what caused the larger leak. I'll do a little of this and comment back in a moment.
I tried reducing the number of folks on the CC list, and the leak persisted. It would appear that the size of the leak relates to the size and/or complexity of the message, not the email address count in the headers. Hope that helps, Jim p.s., This was tested on the M12 candidate for the comercial build, built 12/20
I see 320Kb per message on Linux; which is consistent with the 700Kb per "cycle", as you were defining it, Jim.
Ok, I've just done the Dumb Thing and instrumented PR_Malloc() to dump a stack trace when asked to allocate a block >8Kb. Turns out that message parsing does a -lot- of this. Specifically, we create ten to twenty 16-32Kb blocks (!) while parsing the "data:" URL that (presumably) is holding the message data. This is done to copy string data around. If leaked (or not re-usable because of subseqent re-poritioning to smaller blocks), this would account for jar's 300Kb leak per message. See http://lxr.mozilla.org/mozilla/source/netwerk/protocol/data/src/nsDataChannel.cpp#118 Specifically, the mURL->GetSpec() is creating 16-32Kb string copies! Not only does this seem to be -tremendously- inefficient, my guess that, even if it isn't leaking memory, it is seriously fragmenting memory. Can we think of a better way to do this? Maybe adding a [noscript] GetImmutableSpec() to nsIURI? (warren: this is your chance to say "I told you so.") Or using a mechanism other than a "data:" URL to transfer message data?
add bienvenu to cc list, who is another leak hero on the mail team.
Chris: i wouldn't spend too much time on this leak if it looks to be related to the message body being placed in the data url. I've been spending the last few days cooking up a new more efficient way to display messages that gets rid of the data url. It's much more efficient and is improving message display times by as much as 4 times on slower machines! Plus, it looks to be much more memory efficient as well. I'm planning on landing it when I get back from the holidays after New Years.
I will purify with Scott's new message loading stuff tomorrow and see what happens.
Assignee: waterson → mscott
Status: ASSIGNED → NEW
mscott: giving this one to you. when you land your mail reading changes, test it, see if we still leak. if we do, then you can give it back to me. if we don't then close it!
Status: NEW → ASSIGNED
I'm not seeing the large message leaks after landing my new souped up message display stuff. However, I don't consider myself to be a good test case 'cause I had such a hard time getting the large leak to happen before too. Jar, are you still seeing large leaks?
Summary: [dogfood][leak] Leak another 400K just viewing a pair of messages (each time) → [leak] Leak another 400K just viewing a pair of messages (each time)
Whiteboard: [PDT+]
The bug is now very small, and certainly not worthy of a PDT+ status (or even a dogfood request). I'm removing both comments. I think the leak is now under 10K per toggle, which is still a bug... but nothing critical at this point. IMO, you should mark this at least M15. Thanks to all, Jim
Target Milestone: M13 → M15
Thanks for the help jar. Bouncing out past beta 1 for the work involving tracking down the 10K or so we are still leaking when displaying a message.
updating the summary to reflect that the last known leak cnt was under 10K for some messages. I'm not actually seeing any signficant leakage displaying messages according to the refcounting logs and by monitoring my memory useage on windows when displaying messages. I'm just going to go ahead and mark this fixed. I will of course continue to keep an eye on leaks displaying messages as we go forward.
Summary: [leak] Leak another 400K just viewing a pair of messages (each time) → [leak] Leak another 10K just viewing a pair of messages (each time)
fixed.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
No longer blocks: 22176
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: