Closed Bug 51177 Opened 25 years ago Closed 25 years ago

message compose leaks 1M per window

Categories

(MailNews Core :: Composition, defect, P2)

x86
Linux
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: dbaron, Assigned: Bienvenu)

References

Details

(Keywords: memory-leak, Whiteboard: [nsbeta3+][PDTP2])

Attachments

(3 files)

Opening a message compose window leaks about 1 meg. I believe the leak is coming from a single un-rooted nsXULElement's script object. STEPS TO REPRODUCE: * setenv XPCOM_MEM_LEAK_LOG 1 * ./mozilla (to start browser window) * Ctrl-M (to start message compose) * click X (to close message compose) * click X (to close browser window) Leak log and list of leaked roots to be attached.
The tagname of the accused element is "treechildren".
There's an nsXULDocument leaking - won't that cause most of these leaks? I wonder if it's the same leak that the mail/news thread pane suffers from.
Yes. My guess above was that the nsXULDocument was leaking because there was a rooted script object for a treechildren element that prevented the nsXULDocument's global from being garbage collected and the nsXULDocument from being released.
Is this a dup of another bug against threadpane then?
Whiteboard: [b3 need info]
Not that I know of, no.
How is the treechildren element used in the mail compose window?
Is this really every single compose window?
Good point. There is a lott of shared structure, so it's about 1M for the first window and a little over 300K for each additional window. Also, it's not every window, since if you close the window immediately when you see it (before it finishes loading), it won't leak. But how many users do that?
300K per compose window is still a huge leak that is at least worth looking into and understanding.
Status: NEW → ASSIGNED
Target Milestone: --- → M18
[nsbeta3+]
Whiteboard: [b3 need info] → [nsbeta3+]
Moving to P2.
Priority: P3 → P2
I think the 300K figure is misleading. According to task manager, it's well over 1MB leaked per compose window opened. David Baron may be talking about numbers generated by the leak log, which only account for memory that's allocated for leak-tracing-enabled objects.
I think this bug involves clicking in the address widget. If I don't click in the address widget, the xul document is destroyed when the window is closed. This makes it even more similar to the thread pane problem, where the nsXULDocument leaks if you scroll the thread pane. That's probably a clue for someone who has a clue, unlike me :-). Something about the event handling is either causing a ref-counting problem or a circular reference.
You are correct that the 300K figure was only objects with leak logging. I see this leak even if I don't click anywhere in the window, as long as I let the window go through its full startup before closing it (with the X in the corner).
Oh... and I've already said the leaked JS root is probably the cause. What that means is that there's some content to which a SetDocument(nsnull, PR_TRUE, PR_TRUE) call did not get propogated. (Perhaps anonymous content? Maybe off of frames where there was a reframe (I know we miss this now - see bug 50999)? Or maybe something new???)
If I do a New Msg, the cursor is programmatically put in the address widget, and I see the leak. But If I do a reply, the cursor is put in the message body, and I don't see the leak (unless I click in an address widget).
I take it back about selecting the address widget causing the leak - all I know for sure is that New Msg leaks an nsXULDocument, and Reply does not. I changed new msg not to set focus in the address widget and that had no effect, nor does clicking in it for the reply case.
Where would there be a treechildren element that is accessed via JS during window open for new message but not for reply? (Perhaps some strange anonymous treechildren...)
To answer David Baron's question, I'm not sure. I would think the opposite would be true - reply would access more stuff through JS. What I did find out is that when the document viewer is getting destroyed, the mDocument nsXULDocument member var has 9 references when we enter the destructor, and 0 when we leave in the non-leak case, and 1 in the leak case. The pattern of releases is the same for both cases except when the nsIPresShell is released - in the non-leak case, releasing the nsIPresShell causes the ref count to go down by 2, but only 1 in the leak case. I'm not sure what all is held onto by the nsIPresShell because I haven't drilled down on that yet.
PDT agrees this is at least a P2, and if the leak is frequently encountered, could even be a P1
Whiteboard: [nsbeta3+] → [nsbeta3+][PDTP2]
Any clues what this could look like in JS? I'm trying to find out which code paths are different between reply and new msg, but it's difficult without knowing the code or being able to step through the js code :-(
Here's another way you might debug this (and I might try once my build finishes, although it's very slow over a remote connection...): In rdf/content/src/nsXULElement.cpp, in |GetScriptObject| (in the part where it creates the script object), put something you can break on (like an assertion) in an |if (tag.get() == nsXULAtoms::treechildren)|, and do the same thing for |SetDocument|. See how many calls to each there are (they should be unbalanced), and in particular see if the path to calling the unbalanced GetScriptObject shows you what kind of content this is...
thanks, David, I think that's going to be very helpful. We hit that code twice in both cases, but I suspect we hit it at different times in the two cases - New Msg hits it earlier than reply, because reply has to run the quoting url. Perhaps some state is different. I think I also found that disabling the code that sets the To: address in the reply case also causes the leak, but I need to verify that.
If I comment out the following line in addressingOverlayWidget.js: CompFields2Recipients(): awSetInputAndPopupFromArray(msgCompFields.SplitRecipients(msgCompFields.GetTo(), false), "addr_to", newTreeChildrenNode, templateNode); then the reply case *also* leaks the nsXULDocument. So something about setting this field (or setting it to a non-empty value) "fixes" the leak.
Was there a root to be released in both calls to SetDocument? Or did one of them not make it to the RemoveReference in one case...?
I haven't looked at that yet, but I have found the offending line of javascript and a potential fix. I need to make sure the new msg focus is still correct after my changes, though.
It would really be better to fix this by fixing the underlying problem rather than changing the JS, because this could be leaking similar amounts in other places.
I agree, but I'm pretty far away from understanding what the underlying cause is :-( I was also under the impression that it might be something like 50999, which looks like it might not get fixed right way. Perhaps the change I came up with might be a clue to the underlying cause. I'll attach the diffs.
Attached patch proposed fixSplinter Review
The affect of this change is to always execute these lines of code: ! var parent = treeChildren.parentNode; ! parent.replaceChild(newTreeChildrenNode, treeChildren); ! setTimeout("awFinishCopyNodes();", 0); JF, what do these lines of code do? They do seem to be related to treeChildren.
My completely undeducated guess about this is that we're cloning a node here: var newTreeChildrenNode = treeChildren.cloneNode(false); and then poking at it, but in the new msg case, failing to hook it up to the cloned node's parent, which I guess would make it orphaned.
Erm... I *think* this is a very basic problem. I think (based on a quick look at the code) that we leak the document whenever a document is used through JS to create an element that is not added to that document's content model. (I'm surprised we didn't see this earlier.) I have to look and see whether we set the document to null on creation, or whether we use the document whose factory method is being used. If the former, this could be easy to fix, although it means sacrificing persistence of user- decorated properties on elements before they are added to the document, in some cases. Does that make sense? I'll try to look into this more tonight...
(to respond to comment right before my previous one) Yes, my comment above was also suggesting that its orphaned state (of *never* being added to the document) is the problem.
Yes, that makes sense.
David, you can confirme your theory by setting a reply-to address and then do a New Message, you should not leak! What does the following code? ! var parent = treeChildren.parentNode; ! parent.replaceChild(newTreeChildrenNode, treeChildren); ! setTimeout("awFinishCopyNodes();", 0); When you load a compose message window, the tree has only one row. During the initialization, I build "off line" a new tree in which one I add all the needed rows. This for performance purpose. Then I swap the actual tree node with the one I've built dynamically. The last step (setTimeout("awFinishCopyNodes();", 0);) is a work around for bug 26344 & 26528) which I am pretty sure we don't need anymore!
I can see a similar leak on Mac: New Message leak about 500K, Reply leak about 45K. Also, we have 3 other small leak that have been identified: bug 50571, bug 50662, bug 50663
Depends on: 50571, 50662, 50663
reassigning to myself. What JF and I think we're going to do is check in my fix and then open a new bug on the base problem. Who would that be filed on, and what component? dom ?
Assignee: ducarroz → bienvenu
Status: ASSIGNED → NEW
...and it's the right stuff to do anyway. That will also fix bug 52179. Two for the price of one :-)
bienvenu, file it on DOM L1 unless it's XUL-only, in which case, file it on XUL (me).
fix checked in.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
It's XUL only. I'll file another bug, or two...
For reference, filed bug 52726 (and also the slightly related bug 52732).
Filed bug 53901 too. Both bug 53901 and bug 52726 need fixing to fix the root problem here. (Bug 52726 is a trivial fix.)
verifying this bug
QA Contact: lchiang → esther
Not verified on Mac but no significant leakage on Win and Linux
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: