Closed Bug 21364 Opened 25 years ago Closed 24 years ago

[leak] Leak another 10K just viewing a pair of messages (each time)

Categories

(MailNews Core :: Backend, defect, P1)

x86
Other
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jar, Assigned: mscott)

References

Details

(Keywords: memory-leak)

Attachments

(5 files)

Using the 12/9 build....
Open mail
Open inbox
select a message
view the message
select next message
view message
select first message
...etc

You will leak amounts of memory as high as 1M, but rarely less thatn 200K with
each transition.  The double transition (back to original message) seems to
consistently cost in excess of 1M total.  The messages I'm viewing are each
about 5 screen in lengeth (not too big, but not tiny).  My inbox has 4000
messages, but I don't expect this to have been critical.
This is probably XUL's fault. XUL has become quite leaky in the last week.
I just tried this with small messages, and the leak was a mere 20-40K per cycle,
so I suspect the size of the message is significant.
QA Contact: lchiang → suresh
Assignee: phil → waterson
It turns out we seem to leak a nsDocLoaderImpl (which holds on to the XUL
document) because of a circular reference between the docloader and the load
group it creates. I'm gonna take a look at this, with mscott's help.
Whiteboard: [PDT+]
Putting on the PDT+ radar.
Assignee: waterson → mscott
mscott wrastled this from me.
Status: NEW → ASSIGNED
so right now I am not looking at mailnews. I'm trying to figure out why starting
up and shutting down the browser apparently leaks a doc loader and load group as
waterson mentioned.

I've figured out why that leaks happens. I'm trying to come with some code to
fix it right now. Some people try to create a doc loader instance as a service.
The first instance is stored in a global variable and has no ties to a web
shell. Keep in mind that doc loaders have a circular reference with a load
group. So a doc loader's Destroy method needs to be called in order to break the
circular reference. This happens naturally when a doc loader is created in
conjunction with a webshell (because the webshell always calls it's doc loader's
destroy method). But this doesn't happen when someone creates a "global" doc
loader by calling into the service manager. So the circular reference is never
broken and we leak the doc loader and load group.

Adding rpotts to the cc list. My fix is going to involve force people who want
to use the doc loader as a service to use a different object than those that use
the doc loader as part of the web shell. You'll get the doc loader service, then
ask it for the real doc loader. Then when the doc loader service is destroyed,
it will call the destroy method on the global doc loader it contains.
hey scott,
If the docloader is leaking because of the circular link to the loadgroup, then
let me break that link with a nsWeakPtr...  It used to be done this way, but for
a short time we were proxying the notifications from the load group - and the
weak ptr was lost.

BTW.  leaking the global docloader service should not be the cause this leak!
This docloader has no associated webshell and *only* contains other docloaders
(no channels).  Since all of the other docloaders (and loadgroups) are cleaned
up this one will be empty when it is leaked...

-- rick
Hey Rick, So I missed this message of yours. I just went ahead and wrote a small
snippet of code to make a doc loader service object which olds onto the global
doc loader.

This fixes the leak, but of course as you pointed out, the leak is quite small
and not the source of the 400K leaks we are seeing on tinderbox which is what
waterson and I were investigating.
waterson, I hate to have to think about bouncing this back to you but I've fixed
the doc loader / load group problem in my tree and it doesn't effect the bloat
at all. some of the bigger ticket items in the leak log are xul attributes (,
atomimpl, 2 xul documents...
No problem. if nsDocLoaderImpl doesn't leak, but nsXULDocument still leaks, give
it to me.
Update: The changes I have in my tree to fix the doc loader leak + some changes
to fix webshells that were leaked by the global window have helped the mailnews
leak story. That is, these changes fix the webshells that were leaked per
message viewed. i.e. if you used to view 4 messages, we would leak 4 webshells.
However, this change doesn't have as dramatic as an effect on the leak count as
you might think because the webshells that were getting leaked before were being
destroyed. i.e. there destroy method was being called so they weren't leaking
anything they were holding on to. Just the actual webshell structure itself.

And you won't notice that gain until you close the mail app too.

That being said. When I view messages using todays bits, I'm not seeing jumps of
200K between message views. Sometimes it jumps up, sometimes it goes down
depending on how big the message size is. But I'm not seeing increasing  amounts
of memory useage.

Now that being said, the part that I think is still potentially a dogfood fix is
the fact that starting up mail OR the browser leaks the xul document associated
with the window for that application. I think this is contributing to a large
about of the 400K we leak when you start up and shut down the browser (as seen
on tinderbox)
More info: I do see my memory useage jump about 200K when I display a message
but as soon as layout is done, the 200K is returned. i.e. if I view 5 or 6
messages that are about the same size, the mem useage when I view the first one
vs when I view the last one is about the same. The memory is being returned. But
i do see 200K spikes when laying out each message.

Also, I've been trying to look into the XUL Document leak for mail and the
browser. The bloat log swears it's getting leaked. Yet when I ask it for the
serial number of the leaked document it won't give me one! So I can't get a leak
tree for the xul document...arggh...
I'll look for that also. Do you have patches in your tree that I should pick
up? If so, spank 'em  into the bug.
Chris, I take it back. I am getting logs of nsXULDocument. I was incorrectly
setting the class to be nsWebShell for ref count data and of course we weren't
leaking the webshell. =) So I'm getting a valid ref counting graph for the xul
document. There's one xtra ref count in there. I'm trying to track it down right
now but feel free to look too.

I'm attaching a patch which includes changes to globalWindowImpl, webshell,
extensions.
Summary: [dogfood][leak] Clicking between adjacent messages leaks 200K-1M → [dogfood][leak] Leak 400K when starting up and shutting down the app
I'm going to change the summary of this bug to represent what we are working on:
starting up and shutting down the browser leaks over 400K according to the leak
tools on tinderbox. The patches I've already posted fix any webshell leaks
related to viewing mailnews messages.


In addition to these patches, I'm slowly chipping away at leaks for starting up
and shutting down the browser. I found some nasty leaks in the cookie code and
the wallet code earlier and I now have my linux box down to the point where it
is leaking 350K instead of 425K. But I'm just chipping around the edges until we
can solve one of the big leaks which I think is the XULDocument.
load balancing to waterson who will probably spot the xul document leak faster
than me anyhow. I also have leak fixes that help bring this number down. I'll
continue to try to help chris as well.
Assignee: mscott → waterson
Status: ASSIGNED → NEW
re-assigning for real this time.
QA Contact: suresh → gerardok
Depends on: 21661
Depends on: 21643
Status: NEW → ASSIGNED
Target Milestone: M12
Depends on: 21668
Summary: [dogfood][leak] Leak 400K when starting up and shutting down the app → [dogfood][leak] Leak another 400K just viewing a pair of messages (each time)
The bug is not a "startup and shutdown" bug.  Sorry the focus changed by
someone. I'm changing the title back to the original which involves about 1M
plus leakage from just viewing two distinct messages (even as you view them
again and again).  Two message views add up to a 1 meg leak.  That was the
bug... not a startup/shutdown issue.
I'm sorry Jar. I was the one who morphed the subject title. Per my comments way
up there somewhere, I'm not seeing a leak between viewing messages. Yes we use
up 200K to display it but I see that memory getting freed.  I suppose I should
have marked this bug as works for me and we should have used a new one to track
the large start up and shutdown leaks.

One note about my inability to reproduce your problem: I now have lots of leak
fixes in my tree for the startup / shutdown problem including changes that
prevent us from leaking webshells when viewing messages)
I just tried this on the 12/13 build number 15 on windows (today's build).
Without working too hard, (my second pair of messages I chose), I quickly found
a pair that leaks about 350K each time I cycle back and forth through them.  I
can do this again and again, with ever increasing total leak (not just
startup/stutdown).  If there is a startup/shutdown bug, that should be filed...
but tihs should really look at the problem.  Please ring me up or come by to see
(if my description is still not clear, and you can't reproduce).

Thanks,

Jim
jar: due to the nature of object ownership, if a XUL document is leaked, it
will leak all the things that it owns.
I understand that if a XUL object is leaked, then all related objects go along
with it (leak) in a reference-count system.  The critical thing about this bug
is that in the course of an oft repeated user activity (reading email) that a
leak of several hundred K apppears for every couple of messages is a problem.

IF there was a 400K leak that related to startup and shutdown *ONLY*, then it
would not be a PDT+ bug ('cause it would not impact usability, since a user
doesn't startup and shut down tons of times in each session.... especially
mail... and especially with the startup/shutdown time performance in mail today).

I understand that the leaked object(s) will typically be small... I'm just after
finding the leaked object that transitively induces this mother of a leak. This
bug will be gone when that object (or the one or two big helpers) is isolated,
and repaired.
Every mail message is a XUL document.
jar: sorry, re-reading the comments in this bug, my previous comment was a
complete tautology ("an object leaks the objects it owns" -- duh). What I
*meant* to say was that the XUL document is a central object that has a very
high ownership fan-out. It leaks the document content model and script event
listeners, content model elements hold on to RDF datasources, etc. Fixing the
startup leak of the XUL documents on startup will be necessary (although
possibly not sufficient) to fix the leak of the mail message document.
chris, here are two small leak fixes that I also have in my tree of the global
context. I'm attaching them here just so we are on the same page. Note: these
fix ref counting leaks on the global context but even with these changes, it is
still being leaked.
Rpotts confirmed that he could not reproduce this in his non-commercial build.
He also confirmed that he could reproduce this in his comercial build.

mscott: When you said you could not reproduce, were you using the comercial
build?  Please try it there... in my demo for rpotts, we were losing about 600K
per pair of messages using the 12/13 build pulled from sweetlou.

The obvious question is why are we leaking so differently on comercial build??
In a tree without mods, I see a 80-100K leak per message in the mozilla build. I
see a 400-500Kb leak in the commercial build. I'm going to build a commercial
tree with these mods and see what happens.
Some AIM stuff was recently turned on when displaying messages in the commercial
build. I bet my bottom dollar it's coming from there...That would explain the
difference.
Jar, I think I may have forced your problem to be fixed incidently today. Please
see the PDT+ bug I filed on Alex Musil this afternoon: Bug #21710. Turns out
they had added some stuff recently (within the last week) that was causing an
address book to get opened for each email address in the email message. And I've
been told that the address book is very very leaky. Alex fixed this bug for me
and as a result we only open an address book up once per session instead of
every time you view a message.

I believe this was causing the leak you were seeing since it was only occurring
in the commercial build. I'm running a commercial debug build with alex's
changes for that bug and I'm seeing the same minimal leak behavior I reported to
you earlier on the mozilla build when viewing messages.

I belive we can mark this PDT bug fixed due to the changes in Alex's code tonight.
 Tell you what, why don't I wait and you can try out tomorrow's bits .if you
aren't seeing the better leak behavior that I'm seeing in the commercial build
we'll take a look again.
mscott: it looks like you are right. I just updated my tree, and am no longer

seeing a large leak each time a new message is opened.



jar: i'll let you mark this as fixed when the builds come out tomorrow.
It appears to be windows-specific. I cannot reproduce the leak in a commercial
Linux build.
Target Milestone: M12 → M13
lets check this out to see if it looks good in 12/15 builds.
if not fix in the first m13 carpool landings?
Both Waterson and I can still see the leak in the 12/15 morning build :-(.
It is not gone :-/.
Priority: P3 → P1
jar: in a recent private email, you said that this seems to be down to l.t. 100K
per message now. should we declare victory, or continue work on this? Or did I
mis-read your email? ;-)
I was either sleepy, or miscommunicated (and I can't find the messag in my
outbox :-/ ).  Anyway... the problem is still visible on the Netscape Window's
build, but not visible on the mozilla windows build, and not visible on the
Linux Netscape build.  Conclusion: This is a Windows only, Netscape version
only, leak.

Attached is a file that you can use to demonstrate the leak.  Please just cycle
back and forth between the two messages, and you should see about a 380K leak
per cycle.
Blocks: 22176
Second try... this time I attached the actual mailbox, rather than the mailbox
index.   I tried a couple of messages.  I can't tell if the leak is larger
because of the number of folks on the To/CC list, or because of the complexity
of the message.  I could try to debug this down by hand editing the file so that
I could tell what caused the larger leak.  I'll do a little of this and comment
back in a moment.
I tried reducing the number of folks on the CC list, and the leak persisted.  It
would appear that the size of the leak relates to the size and/or complexity of
the message, not the email address count in the headers.

Hope that helps,

Jim

p.s., This was tested on the M12 candidate for the comercial build, built 12/20
I see 320Kb per message on Linux; which is consistent with the 700Kb per
"cycle", as you were defining it, Jim.
Ok, I've just done the Dumb Thing and instrumented PR_Malloc() to dump a stack
trace when asked to allocate a block >8Kb. Turns out that message parsing does a
-lot- of this.

Specifically, we create ten to twenty 16-32Kb blocks (!) while parsing the
"data:" URL that (presumably) is holding the message data. This is done to copy
string data around. If leaked (or not re-usable because of subseqent
re-poritioning to smaller blocks), this would account for jar's 300Kb leak per
message.

See

http://lxr.mozilla.org/mozilla/source/netwerk/protocol/data/src/nsDataChannel.cpp#118

Specifically, the mURL->GetSpec() is creating 16-32Kb string copies!

Not only does this seem to be -tremendously- inefficient, my guess that, even if
it isn't leaking memory, it is seriously fragmenting memory.

Can we think of a better way to do this? Maybe adding a [noscript]
GetImmutableSpec() to nsIURI? (warren: this is your chance to say "I told you
so.") Or using a mechanism other than a "data:" URL to transfer message data?
add bienvenu to cc list, who is another leak hero on the mail team.
Chris: i wouldn't spend too much time on this leak if it looks to be related to
the message body being placed in the data url. I've been spending the last few
days cooking up a new more efficient way to display messages that gets rid of
the data url. It's much more efficient and is improving message display times by
as much as 4 times on slower machines! Plus, it looks to be much more memory
efficient as well. I'm planning on landing it when I get back from the holidays
after New Years.
I will purify with Scott's new message loading stuff tomorrow and see what
happens.
Assignee: waterson → mscott
Status: ASSIGNED → NEW
mscott: giving this one to you. when you land your mail reading changes, test
it, see if we still leak. if we do, then you can give it back to me. if we don't
then close it!
Status: NEW → ASSIGNED
I'm not seeing the large message leaks after landing my new souped up message
display stuff. However, I don't consider myself to be a good test case 'cause I
had such a hard time getting the large leak to happen before too. Jar, are you
still seeing large leaks?
Summary: [dogfood][leak] Leak another 400K just viewing a pair of messages (each time) → [leak] Leak another 400K just viewing a pair of messages (each time)
Whiteboard: [PDT+]
The bug is now very small, and certainly not worthy of a PDT+ status (or even a
dogfood request).  I'm removing both comments.  I think the leak is now under
10K per toggle, which is still a bug... but nothing critical at this point.

IMO, you should mark this at least M15.

Thanks to all,

Jim
Target Milestone: M13 → M15
Thanks for the help jar. Bouncing out past beta 1 for the work involving
tracking down the 10K or so we are still leaking when displaying a message.
updating the summary to reflect that the last known leak cnt was under 10K for
some messages. I'm not actually seeing any signficant leakage displaying
messages according to the refcounting logs and by monitoring my memory useage on
windows when displaying messages.

I'm just going to go ahead and mark this fixed. I will of course continue to
keep an eye on leaks displaying messages as we go forward.
Summary: [leak] Leak another 400K just viewing a pair of messages (each time) → [leak] Leak another 10K just viewing a pair of messages (each time)
fixed.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
No longer blocks: 22176
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: