[leak] Leak another 10K just viewing a pair of messages (each time)

RESOLVED FIXED in M15

Status

MailNews Core
Backend
P1
normal
RESOLVED FIXED
19 years ago
10 years ago

People

(Reporter: Jim Roskind, Assigned: Scott MacGregor)

Tracking

({memory-leak})

Trunk
x86
Other
memory-leak
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(5 attachments)

(Reporter)

Description

19 years ago
Using the 12/9 build....
Open mail
Open inbox
select a message
view the message
select next message
view message
select first message
...etc

You will leak amounts of memory as high as 1M, but rarely less thatn 200K with
each transition.  The double transition (back to original message) seems to
consistently cost in excess of 1M total.  The messages I'm viewing are each
about 5 screen in lengeth (not too big, but not tiny).  My inbox has 4000
messages, but I don't expect this to have been critical.

Comment 1

19 years ago
This is probably XUL's fault. XUL has become quite leaky in the last week.
(Reporter)

Comment 2

19 years ago
I just tried this with small messages, and the leak was a mere 20-40K per cycle,
so I suspect the size of the message is significant.

Updated

19 years ago
QA Contact: lchiang → suresh

Updated

19 years ago
Assignee: phil → waterson

Comment 3

19 years ago
It turns out we seem to leak a nsDocLoaderImpl (which holds on to the XUL
document) because of a circular reference between the docloader and the load
group it creates. I'm gonna take a look at this, with mscott's help.

Updated

19 years ago
Whiteboard: [PDT+]

Comment 4

19 years ago
Putting on the PDT+ radar.

Updated

19 years ago
Assignee: waterson → mscott

Comment 5

19 years ago
mscott wrastled this from me.
(Assignee)

Updated

19 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 6

19 years ago
so right now I am not looking at mailnews. I'm trying to figure out why starting
up and shutting down the browser apparently leaks a doc loader and load group as
waterson mentioned.

I've figured out why that leaks happens. I'm trying to come with some code to
fix it right now. Some people try to create a doc loader instance as a service.
The first instance is stored in a global variable and has no ties to a web
shell. Keep in mind that doc loaders have a circular reference with a load
group. So a doc loader's Destroy method needs to be called in order to break the
circular reference. This happens naturally when a doc loader is created in
conjunction with a webshell (because the webshell always calls it's doc loader's
destroy method). But this doesn't happen when someone creates a "global" doc
loader by calling into the service manager. So the circular reference is never
broken and we leak the doc loader and load group.

Adding rpotts to the cc list. My fix is going to involve force people who want
to use the doc loader as a service to use a different object than those that use
the doc loader as part of the web shell. You'll get the doc loader service, then
ask it for the real doc loader. Then when the doc loader service is destroyed,
it will call the destroy method on the global doc loader it contains.

Comment 7

19 years ago
hey scott,
If the docloader is leaking because of the circular link to the loadgroup, then
let me break that link with a nsWeakPtr...  It used to be done this way, but for
a short time we were proxying the notifications from the load group - and the
weak ptr was lost.

BTW.  leaking the global docloader service should not be the cause this leak!
This docloader has no associated webshell and *only* contains other docloaders
(no channels).  Since all of the other docloaders (and loadgroups) are cleaned
up this one will be empty when it is leaked...

-- rick
(Assignee)

Comment 8

19 years ago
Hey Rick, So I missed this message of yours. I just went ahead and wrote a small
snippet of code to make a doc loader service object which olds onto the global
doc loader.

This fixes the leak, but of course as you pointed out, the leak is quite small
and not the source of the 400K leaks we are seeing on tinderbox which is what
waterson and I were investigating.
(Assignee)

Comment 9

19 years ago
waterson, I hate to have to think about bouncing this back to you but I've fixed
the doc loader / load group problem in my tree and it doesn't effect the bloat
at all. some of the bigger ticket items in the leak log are xul attributes (,
atomimpl, 2 xul documents...

Comment 10

19 years ago
No problem. if nsDocLoaderImpl doesn't leak, but nsXULDocument still leaks, give
it to me.
(Assignee)

Comment 11

19 years ago
Update: The changes I have in my tree to fix the doc loader leak + some changes
to fix webshells that were leaked by the global window have helped the mailnews
leak story. That is, these changes fix the webshells that were leaked per
message viewed. i.e. if you used to view 4 messages, we would leak 4 webshells.
However, this change doesn't have as dramatic as an effect on the leak count as
you might think because the webshells that were getting leaked before were being
destroyed. i.e. there destroy method was being called so they weren't leaking
anything they were holding on to. Just the actual webshell structure itself.

And you won't notice that gain until you close the mail app too.

That being said. When I view messages using todays bits, I'm not seeing jumps of
200K between message views. Sometimes it jumps up, sometimes it goes down
depending on how big the message size is. But I'm not seeing increasing  amounts
of memory useage.

Now that being said, the part that I think is still potentially a dogfood fix is
the fact that starting up mail OR the browser leaks the xul document associated
with the window for that application. I think this is contributing to a large
about of the 400K we leak when you start up and shut down the browser (as seen
on tinderbox)
(Assignee)

Comment 12

19 years ago
More info: I do see my memory useage jump about 200K when I display a message
but as soon as layout is done, the 200K is returned. i.e. if I view 5 or 6
messages that are about the same size, the mem useage when I view the first one
vs when I view the last one is about the same. The memory is being returned. But
i do see 200K spikes when laying out each message.

Also, I've been trying to look into the XUL Document leak for mail and the
browser. The bloat log swears it's getting leaked. Yet when I ask it for the
serial number of the leaked document it won't give me one! So I can't get a leak
tree for the xul document...arggh...

Comment 13

19 years ago
I'll look for that also. Do you have patches in your tree that I should pick
up? If so, spank 'em  into the bug.
(Assignee)

Comment 14

19 years ago
Chris, I take it back. I am getting logs of nsXULDocument. I was incorrectly
setting the class to be nsWebShell for ref count data and of course we weren't
leaking the webshell. =) So I'm getting a valid ref counting graph for the xul
document. There's one xtra ref count in there. I'm trying to track it down right
now but feel free to look too.

I'm attaching a patch which includes changes to globalWindowImpl, webshell,
extensions.
(Assignee)

Comment 15

19 years ago
Created attachment 3416 [details] [diff] [review]
current patch showing leak fixes in my tree for webshell and global window stuff
(Assignee)

Updated

19 years ago
Summary: [dogfood][leak] Clicking between adjacent messages leaks 200K-1M → [dogfood][leak] Leak 400K when starting up and shutting down the app
(Assignee)

Comment 16

19 years ago
I'm going to change the summary of this bug to represent what we are working on:
starting up and shutting down the browser leaks over 400K according to the leak
tools on tinderbox. The patches I've already posted fix any webshell leaks
related to viewing mailnews messages.


In addition to these patches, I'm slowly chipping away at leaks for starting up
and shutting down the browser. I found some nasty leaks in the cookie code and
the wallet code earlier and I now have my linux box down to the point where it
is leaking 350K instead of 425K. But I'm just chipping around the edges until we
can solve one of the big leaks which I think is the XULDocument.
(Assignee)

Comment 17

19 years ago
load balancing to waterson who will probably spot the xul document leak faster
than me anyhow. I also have leak fixes that help bring this number down. I'll
continue to try to help chris as well.
(Assignee)

Updated

19 years ago
Assignee: mscott → waterson
Status: ASSIGNED → NEW
(Assignee)

Comment 18

19 years ago
re-assigning for real this time.

Updated

19 years ago
QA Contact: suresh → gerardok

Updated

19 years ago
Depends on: 21661

Updated

19 years ago
Depends on: 21643

Updated

19 years ago
Status: NEW → ASSIGNED
Target Milestone: M12

Updated

19 years ago
Depends on: 21668
(Reporter)

Updated

19 years ago
Summary: [dogfood][leak] Leak 400K when starting up and shutting down the app → [dogfood][leak] Leak another 400K just viewing a pair of messages (each time)
(Reporter)

Comment 19

19 years ago
The bug is not a "startup and shutdown" bug.  Sorry the focus changed by
someone. I'm changing the title back to the original which involves about 1M
plus leakage from just viewing two distinct messages (even as you view them
again and again).  Two message views add up to a 1 meg leak.  That was the
bug... not a startup/shutdown issue.
(Assignee)

Comment 20

19 years ago
I'm sorry Jar. I was the one who morphed the subject title. Per my comments way
up there somewhere, I'm not seeing a leak between viewing messages. Yes we use
up 200K to display it but I see that memory getting freed.  I suppose I should
have marked this bug as works for me and we should have used a new one to track
the large start up and shutdown leaks.

One note about my inability to reproduce your problem: I now have lots of leak
fixes in my tree for the startup / shutdown problem including changes that
prevent us from leaking webshells when viewing messages)
(Reporter)

Comment 21

19 years ago
I just tried this on the 12/13 build number 15 on windows (today's build).
Without working too hard, (my second pair of messages I chose), I quickly found
a pair that leaks about 350K each time I cycle back and forth through them.  I
can do this again and again, with ever increasing total leak (not just
startup/stutdown).  If there is a startup/shutdown bug, that should be filed...
but tihs should really look at the problem.  Please ring me up or come by to see
(if my description is still not clear, and you can't reproduce).

Thanks,

Jim

Comment 22

19 years ago
jar: due to the nature of object ownership, if a XUL document is leaked, it
will leak all the things that it owns.
(Reporter)

Comment 23

19 years ago
I understand that if a XUL object is leaked, then all related objects go along
with it (leak) in a reference-count system.  The critical thing about this bug
is that in the course of an oft repeated user activity (reading email) that a
leak of several hundred K apppears for every couple of messages is a problem.

IF there was a 400K leak that related to startup and shutdown *ONLY*, then it
would not be a PDT+ bug ('cause it would not impact usability, since a user
doesn't startup and shut down tons of times in each session.... especially
mail... and especially with the startup/shutdown time performance in mail today).

I understand that the leaked object(s) will typically be small... I'm just after
finding the leaked object that transitively induces this mother of a leak. This
bug will be gone when that object (or the one or two big helpers) is isolated,
and repaired.

Comment 24

19 years ago
Every mail message is a XUL document.

Comment 25

19 years ago
jar: sorry, re-reading the comments in this bug, my previous comment was a
complete tautology ("an object leaks the objects it owns" -- duh). What I
*meant* to say was that the XUL document is a central object that has a very
high ownership fan-out. It leaks the document content model and script event
listeners, content model elements hold on to RDF datasources, etc. Fixing the
startup leak of the XUL documents on startup will be necessary (although
possibly not sufficient) to fix the leak of the mail message document.
(Assignee)

Comment 26

19 years ago
chris, here are two small leak fixes that I also have in my tree of the global
context. I'm attaching them here just so we are on the same page. Note: these
fix ref counting leaks on the global context but even with these changes, it is
still being leaked.
(Assignee)

Comment 27

19 years ago
Created attachment 3485 [details] [diff] [review]
fixes two leaks of the global context

Comment 28

19 years ago
Created attachment 3491 [details] [diff] [review]
more fixes. includes fixes for dependent bugs.
(Reporter)

Comment 29

19 years ago
Rpotts confirmed that he could not reproduce this in his non-commercial build.
He also confirmed that he could reproduce this in his comercial build.

mscott: When you said you could not reproduce, were you using the comercial
build?  Please try it there... in my demo for rpotts, we were losing about 600K
per pair of messages using the 12/13 build pulled from sweetlou.

The obvious question is why are we leaking so differently on comercial build??

Comment 30

19 years ago
In a tree without mods, I see a 80-100K leak per message in the mozilla build. I
see a 400-500Kb leak in the commercial build. I'm going to build a commercial
tree with these mods and see what happens.
(Assignee)

Comment 31

19 years ago
Some AIM stuff was recently turned on when displaying messages in the commercial
build. I bet my bottom dollar it's coming from there...That would explain the
difference.
(Assignee)

Comment 32

19 years ago
Jar, I think I may have forced your problem to be fixed incidently today. Please
see the PDT+ bug I filed on Alex Musil this afternoon: Bug #21710. Turns out
they had added some stuff recently (within the last week) that was causing an
address book to get opened for each email address in the email message. And I've
been told that the address book is very very leaky. Alex fixed this bug for me
and as a result we only open an address book up once per session instead of
every time you view a message.

I believe this was causing the leak you were seeing since it was only occurring
in the commercial build. I'm running a commercial debug build with alex's
changes for that bug and I'm seeing the same minimal leak behavior I reported to
you earlier on the mozilla build when viewing messages.

I belive we can mark this PDT bug fixed due to the changes in Alex's code tonight.
 Tell you what, why don't I wait and you can try out tomorrow's bits .if you
aren't seeing the better leak behavior that I'm seeing in the commercial build
we'll take a look again.

Comment 33

19 years ago
mscott: it looks like you are right. I just updated my tree, and am no longer

seeing a large leak each time a new message is opened.



jar: i'll let you mark this as fixed when the builds come out tomorrow.

Comment 34

19 years ago
It appears to be windows-specific. I cannot reproduce the leak in a commercial
Linux build.

Updated

19 years ago
Target Milestone: M12 → M13

Comment 35

19 years ago
lets check this out to see if it looks good in 12/15 builds.
if not fix in the first m13 carpool landings?
(Reporter)

Comment 36

19 years ago
Both Waterson and I can still see the leak in the 12/15 morning build :-(.
It is not gone :-/.

Updated

19 years ago
Priority: P3 → P1

Comment 37

19 years ago
jar: in a recent private email, you said that this seems to be down to l.t. 100K
per message now. should we declare victory, or continue work on this? Or did I
mis-read your email? ;-)
(Reporter)

Comment 38

19 years ago
I was either sleepy, or miscommunicated (and I can't find the messag in my
outbox :-/ ).  Anyway... the problem is still visible on the Netscape Window's
build, but not visible on the mozilla windows build, and not visible on the
Linux Netscape build.  Conclusion: This is a Windows only, Netscape version
only, leak.

Attached is a file that you can use to demonstrate the leak.  Please just cycle
back and forth between the two messages, and you should see about a 380K leak
per cycle.
(Reporter)

Comment 39

19 years ago
Created attachment 3612 [details]
Leak demo (cycle between messages in this local box to see leak)
(Reporter)

Updated

19 years ago
Blocks: 22176
(Reporter)

Comment 40

19 years ago
Created attachment 3681 [details]
This is a demo mailbox file, with two messages, that leaks 600-700K per viewing cycle
(Reporter)

Comment 41

19 years ago
Second try... this time I attached the actual mailbox, rather than the mailbox
index.   I tried a couple of messages.  I can't tell if the leak is larger
because of the number of folks on the To/CC list, or because of the complexity
of the message.  I could try to debug this down by hand editing the file so that
I could tell what caused the larger leak.  I'll do a little of this and comment
back in a moment.
(Reporter)

Comment 42

19 years ago
I tried reducing the number of folks on the CC list, and the leak persisted.  It
would appear that the size of the leak relates to the size and/or complexity of
the message, not the email address count in the headers.

Hope that helps,

Jim

p.s., This was tested on the M12 candidate for the comercial build, built 12/20

Comment 43

19 years ago
I see 320Kb per message on Linux; which is consistent with the 700Kb per
"cycle", as you were defining it, Jim.

Comment 44

19 years ago
Ok, I've just done the Dumb Thing and instrumented PR_Malloc() to dump a stack
trace when asked to allocate a block >8Kb. Turns out that message parsing does a
-lot- of this.

Specifically, we create ten to twenty 16-32Kb blocks (!) while parsing the
"data:" URL that (presumably) is holding the message data. This is done to copy
string data around. If leaked (or not re-usable because of subseqent
re-poritioning to smaller blocks), this would account for jar's 300Kb leak per
message.

See

http://lxr.mozilla.org/mozilla/source/netwerk/protocol/data/src/nsDataChannel.cpp#118

Specifically, the mURL->GetSpec() is creating 16-32Kb string copies!

Not only does this seem to be -tremendously- inefficient, my guess that, even if
it isn't leaking memory, it is seriously fragmenting memory.

Can we think of a better way to do this? Maybe adding a [noscript]
GetImmutableSpec() to nsIURI? (warren: this is your chance to say "I told you
so.") Or using a mechanism other than a "data:" URL to transfer message data?

Comment 45

19 years ago
add bienvenu to cc list, who is another leak hero on the mail team.
(Assignee)

Comment 46

19 years ago
Chris: i wouldn't spend too much time on this leak if it looks to be related to
the message body being placed in the data url. I've been spending the last few
days cooking up a new more efficient way to display messages that gets rid of
the data url. It's much more efficient and is improving message display times by
as much as 4 times on slower machines! Plus, it looks to be much more memory
efficient as well. I'm planning on landing it when I get back from the holidays
after New Years.

Comment 47

19 years ago
I will purify with Scott's new message loading stuff tomorrow and see what
happens.

Updated

19 years ago
Assignee: waterson → mscott
Status: ASSIGNED → NEW

Comment 48

19 years ago
mscott: giving this one to you. when you land your mail reading changes, test
it, see if we still leak. if we do, then you can give it back to me. if we don't
then close it!
(Assignee)

Updated

19 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 49

19 years ago
I'm not seeing the large message leaks after landing my new souped up message
display stuff. However, I don't consider myself to be a good test case 'cause I
had such a hard time getting the large leak to happen before too. Jar, are you
still seeing large leaks?
(Reporter)

Updated

19 years ago
Summary: [dogfood][leak] Leak another 400K just viewing a pair of messages (each time) → [leak] Leak another 400K just viewing a pair of messages (each time)
Whiteboard: [PDT+]
(Reporter)

Comment 50

19 years ago
The bug is now very small, and certainly not worthy of a PDT+ status (or even a
dogfood request).  I'm removing both comments.  I think the leak is now under
10K per toggle, which is still a bug... but nothing critical at this point.

IMO, you should mark this at least M15.

Thanks to all,

Jim
(Assignee)

Updated

19 years ago
Target Milestone: M13 → M15
(Assignee)

Comment 51

19 years ago
Thanks for the help jar. Bouncing out past beta 1 for the work involving
tracking down the 10K or so we are still leaking when displaying a message.
Keywords: mlk
(Assignee)

Comment 52

18 years ago
updating the summary to reflect that the last known leak cnt was under 10K for
some messages. I'm not actually seeing any signficant leakage displaying
messages according to the refcounting logs and by monitoring my memory useage on
windows when displaying messages.

I'm just going to go ahead and mark this fixed. I will of course continue to
keep an eye on leaks displaying messages as we go forward.
Summary: [leak] Leak another 400K just viewing a pair of messages (each time) → [leak] Leak another 10K just viewing a pair of messages (each time)
(Assignee)

Comment 53

18 years ago
fixed.
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED
(Reporter)

Updated

18 years ago
No longer blocks: 22176
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.