Closed Bug 241518 Opened 20 years ago Closed 19 years ago

calling addEventListener with a closure holding a content node leaks the document

Categories

(Core :: DOM: Events, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9alpha1

People

(Reporter: spaceshaker.geo, Assigned: dbaron)

References

Details

(Keywords: fixed1.8.1, memory-leak, testcase, Whiteboard: [patch])

Attachments

(5 files, 16 obsolete files)

436 bytes, text/html; charset=UTF-8
Details
1.02 KB, text/html; charset=UTF-8
Details
116.08 KB, patch
Details | Diff | Splinter Review
2.93 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
1.38 KB, patch
bzbarsky
: review+
bzbarsky
: superreview+
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040115
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040115

Adding an event listener to a DOM element using addEventListener() without ever
calling removeEventListener() before the page is unloaded causes a memory leak
in nsGenericElement. The memory for the event handlers is _never_ being cleaned
up and is being indicated by an assertion at nsGenericElement.cpp:820 (Moz1.6)
at shutdown.


Reproducible: Always
Steps to Reproduce:
1. Make sure mozilla is compiled with debug symbols.
2. Call addEventListener() on a DOM object with a non-empty anonymous function
as the handler.
3. Load the page in the browser.
4. Close mozilla. Notice the assertion

Actual Results:  
###!!! ASSERTION: nsGenericElement's event listener manager hash not empty at
shutdown!: 'sEventListenerManagersHash.entryCount == 0', file
nsGenericElement.cpp, line 820
Use the following page as a test case:

<html>
<head>
<script type="text/javascript">
function bodyLoad() {
    var test = document.getElementById("test");
    var func = function() { alert('Test'); }
    test.addEventListener('click', func, false);

    //Uncomment the below line if you want to see the problem disappear
    test.removeEventListener('click', func, false);
}

</script>
</head>
<body onload="bodyLoad();">
<input type="Button" id="test"/>
</body>
</html>
If you use GDB and set a break point at nsGenericElement.cpp at line 820 (or so)
and you run the above script you find that entryCount is equal to 1. If you make
a small patch (see below) to the test page you notice that for each call to
addEventListener() entryCount is incremented.

for(var i = 0; i < 1000; ++i) {
    var node = document.createElement('input');
    node.type = 'button';
    node.addEventListener('click', function() { alert('test'); }, false);
    document.body.appendChild(node);
}

You'll also note that once the memory is created for event listeners it is not
released until shutdown. Several of my collegues have successfully reproduced
this on a number of version of Mozilla (and IE for that matter). Our javascript
code base (over 1 meg of JS) would leak memory until until "Out of memory"
errors (I've seen it over 150MB). We added a patch to our code base to use
obj.onclick instead of obj.addEventListener() and this successfully solved the
leak (gdb reports entryCount == 0).
Well, at least one of the problematic cycles is document -> content -> event
listener manager -> event listener and then the following cycle back via JS:

08d2ee30 object 0x8fd2520 HTMLDocument via
nsXPCWrappedJS::mJSObj[nsIDOMEventListener,0x8fee3c0,0x8d2ffc0](Function).__parent__(Call).test(HTMLInputElement).__parent__(HTMLDocument).
Status: UNCONFIRMED → NEW
Ever confirmed: true
Oh, I see.  The problem is the closure.  Where the "line to be uncommented" is,
another thing that would fix the leak is:

test = null;


This is because the test variable is in the scope chain for the anonymous
function (i.e., the function is a closure containing both code and variable
scope), so the function keeps the element alive, and keeping the element's JS
context alive keeps the document alive.

This makes me think we really do need to unhook event listeners in
SetDocument(nsnull).
(In reply to comment #5)
> Oh, I see.  The problem is the closure.  Where the "line to be uncommented" is,
> another thing that would fix the leak is:
> 
> test = null;
> 
> 
> This is because the test variable is in the scope chain for the anonymous
> function (i.e., the function is a closure containing both code and variable
> scope), so the function keeps the element alive, and keeping the element's JS
> context alive keeps the document alive.
> 
> This makes me think we really do need to unhook event listeners in
> SetDocument(nsnull).


I will test the above to verify.
> This makes me think we really do need to unhook event listeners in
> SetDocument(nsnull).

Unfortunately, nodes not in a document can still be targeted with events per the
DOM spec, as I understand it...
Per DOM you can't have nodes not in a document. So if ownerDocument has been set
to null, we're already past the point of non-compliance anyway.
(In reply to comment #8)
> Per DOM you can't have nodes not in a document. So if ownerDocument has been set
> to null, we're already past the point of non-compliance anyway.

SetDocument(null) doesn't set the ownerDocument to null. By "nodes not in a
document" bz probably means nodes having document and parent set to null but
ownerDocument not null.
(In reply to comment #9)
> SetDocument(null) doesn't set the ownerDocument to null. By "nodes not in a
> document" bz probably means nodes having document and parent set to null but
> ownerDocument not null.

Sigh, I meant to say "nodes having parentNode equal to null but ownerDocument
not null".
Yes, I'm talking about nodes whose parentNode chain does not include a document
node.  Put another way:

var foo = new Image();
foo.onerror = bar;
foo.onload = baz;
foo.src = "whatever";

One of those two event handlers should fire.
(In reply to comment #11)
> Yes, I'm talking about nodes whose parentNode chain does not include a document
> node.  Put another way:
> 
> var foo = new Image();
> foo.onerror = bar;
> foo.onload = baz;
> foo.src = "whatever";
> 
> One of those two event handlers should fire.


I may be stating the obvious here but as per my initial comments, setting the
event handler as a property (foo.onload) does not cause a leak. It only occurs
when using foo.addEventListener().
Yes, but the point is that event listeners should fire in general on detached
nodes, which means we probably can't clear out the hash when nodes are detached...
Well, the way to fix this that would provide the broadest API would be to add
additional C++ code to participate in the GC's mark phase, roughly as follows:

 * all of the roots (XPCWrappedNative or XPCWrappedJS, and probably others)
established by things "in" the document wouldn't be true GC roots, but would be
maintained by the document (this requires hacking XPConnect)

 * if any of the JS objects "in" the document is marked during the mark phase OR
if the document is currently being displayed, then all the documents roots are
marked.  (The case of the document currently being displayed could perhaps be
handled at the beginning of the GC to save a bit of work.)

(I put "in" in quotes because I've left it undefined, and defining it might be a
bit of work.)

I don't see any less complicated approach with which we don't either (a) have to
disconnect things at a given time or (b) leak.  Am I missing something?
Summary: Memory Leak in nsGenericElement.cpp caused by addEventListener() → calling addEventListener with a closure holding a content node leaks the document
We could make document participate in the mark phase, for sure.

Isn't it sufficient to know that any wrapper is "in" the document by checking
its XPConnect scope?

Cc'ing dbradley.

/be
Keywords: testcase
Perhaps I'm missing something, or misunderstanding something, but why should
there be any worry about raising events on detached elements?  If there is still
a reference (or chain of references) to a detached element, it can't be GC'd, so
its listeners should remain in place.  If there is no (chain of) reference to a
detached node, how can an event be raised on it?  The element is not in the
document, so the user can't click on it, keypress on it, or anything else, and
general document events can't traverse the document to reach the element.  Also,
there are no references to the element, so an event cannot be foisted on the
element by any JavaScript code because the code can't reach the element to do
the foisting.  Such an element should be GC'd along with all of its listeners,
no?  (Well, a given function may be attached to more than one element, so maybe
some of the listeners will survive, but their ref-count should drop by one, or
whatever.)

I'm not sure I've clearly explained myself, but I think the ideas that I meant
to express above would satisfy this bug, if implemented.  The memory footprint
of the project Curtis is mentioning grows most quickly when developers are
continually refreshing the browser to test changes to the code (I work with
Curtis, so I saw all the OOM errors).  If all the nodes (detached or otherwise)
were GC'd, along with their listeners, on document unload, most of this problem
would go away, or at least be mitigated.  If detached, unreachable nodes were
GC'd along with their listeners, I think everything about this bug would be
solved.  (Of course, the likelyhood that IE will suddenly fix its matching bug
is pretty close to nil, so we'll still have to use our workaround, but
whatever.)  Does it break the DOM spec to GC detached, unreachable DOM nodes?

Ian
FWIW, I think this is hitting http://maps.google.com/ in a big way.
Oh, Re comment 16:  the issue that makes this hard is that we do garbage
collection on the wrapper objects that reflect the DOM nodes into JavaScript,
but we don't garbage-collect the nodes themselves -- they're reference counted.
 A cycle that's partly reference-counted and partly garbage collected is just as
bad as one that's fully reference-counted.  See
http://www.mozilla.org/scriptable/avoiding-leaks.html (and also bug 283129
comment 0, but please don't add unnecessary noise to that bug -- or, for that
matter, this one).
Attached patch work in progress (obsolete) — Splinter Review
So I'm thinking that in the current world where we disconnect the global object
at a set time (likely to change with bryner's back/forward changes, I think),
there's no harm in disconnecting event listeners too.

This needs a lot more testing, and I need to figure out why the assertion
sometimes fires.
Assignee: events → dbaron
Status: NEW → ASSIGNED
Whiteboard: [patch]
Target Milestone: --- → mozilla1.8beta2
Attached patch work in progress (obsolete) — Splinter Review
Removal and reinsertion is a real problem, though.  I wish we had a "document
going away" instead of SetDocument.
Attachment #178015 - Attachment is obsolete: true
And this doesn't fix http://maps.google.com/ .  I'm guessing some of the
elements there were never in the document...
I just discussed this a bit with jst.  I think the thing to do here is, once bug
283129 lands:
 * make nsEventListenerManager QI added nsIDOMEventListener* objects to
nsIXPConnectWrappedJS , and for those that implement it, QI them to
nsISupportsWeakReference.  If they support nsISupportsWeakReference, store the
weak reference instead and store the nsIXPConnectJSObjectHolder (ancestor of
nsIXPConnectWrappedJS) in the preserved wrappers table created in bug 283129.
 * make the table of preserved wrappers handle things other than DOM nodes -- in
particular, the API needs to change nsIXPConnectWrappedNative to
nsIXPConnectJSObjectHolder, it needs to allow more than one wrapper per node
(probably just make the wrapper the key instead of the DOM node and change
ReleaseWrapper appropriately)
 * I need to find a way to call ReleaseWrapper so the preserved wrapper table
doesn't get bloated.  Not sure about this yet.
And it would be good to do this in a way that's reusable by other consumers of
interfaces that are marked [function] in the IDL.
Attached patch work in progress (obsolete) — Splinter Review
Attachment #178016 - Attachment is obsolete: true
Attached patch work in progress (obsolete) — Splinter Review
Attachment #178914 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Attachment #178978 - Attachment is obsolete: true
Comment on attachment 179236 [details] [diff] [review]
patch

This fixes both the testcase in this bug and the http://maps.google.com/ leak,
although it may be a little overkill as well.
Attachment #179236 - Attachment description: work in progress → patch
Comment on attachment 179236 [details] [diff] [review]
patch

Oh, and this patch doesn't technically compile since it requires tons of
REQUIRES changes.  The new header should probably be in content simply to avoid
that problem, but I wanted to discuss that...
Attached patch patch (obsolete) — Splinter Review
This one compiles since the new header is in content.  That may not be the
ideal location, but it reduces the need for REQUIRES changes.

Requesting review to at least get comments on this approach, if not actual
review.
Attachment #179236 - Attachment is obsolete: true
Attachment #179404 - Flags: review?(jst)
Attachment #182012 - Attachment description: testcase for recent (since 1.7) behavior change related to crash caused by this patch → testcase for recent (since 1.7) behavior change related to crash caused by this patch (filed as bug 292146)
Attached patch patch (obsolete) — Splinter Review
Merged to trunk.
Blocks: 295269
Attached patch patch (obsolete) — Splinter Review
Merged to trunk, and with null check to work around crash.
Attached patch patch (obsolete) — Splinter Review
Merged to trunk.
Attachment #187912 - Attachment is obsolete: true
Attached patch patch (no longer works) (obsolete) — Splinter Review
After merging with the splitwindow changes, this crashes on startup.  I've
removed these changes from my primary tree after running with them for over
three months.
I'd also note that I think I probably need to fix the XPConnect change so that
it doesn't allow all functions to support weak references, only certain ones. 
Not sure how, though.
(In reply to comment #36)
> After merging with the splitwindow changes, this crashes on startup.  I've
> removed these changes from my primary tree after running with them for over
> three months.

That actually probably wasn't from the merge, but was from the other
nsGlobalWindow::QueryInterface change that slipped into that patch.
The reason I'm crashing is that my marked JS function holder needs to get a
finalize callback to call ReleaseWrapper -- it's leaving bad wrappers in the
preserved wrapper table.
Blocks: 305536
Attached patch patch (obsolete) — Splinter Review
Attachment #190595 - Attachment is obsolete: true
Attachment #191158 - Attachment is obsolete: true
Attached patch patch (doesn't run) (obsolete) — Splinter Review
I think this is closer to what I want, but it doesn't run because the refcounting of XPConnect wrappers caused by HolderToWrappedJS pokes rt->gcPoke (by adding and removing a root) in the middle of every GC.
Attachment #199489 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
This patch works.  I decided to use the simplest ugly hack I could think of, which was an extra interface with dont-refcount semantics.
Attachment #202185 - Attachment is obsolete: true
Flags: blocking1.8.1?
Attached patch patch (obsolete) — Splinter Review
This has a few additional changes based on discussion with jst and mrbkap, and some additional comments.
Attachment #203057 - Attachment is obsolete: true
Attachment #203072 - Flags: superreview?(jst)
Attachment #203072 - Flags: review?(mrbkap)
Comment on attachment 203072 [details] [diff] [review]
patch

r=mrbkap
Attachment #203072 - Flags: review?(mrbkap) → review+
Comment on attachment 203072 [details] [diff] [review]
patch

Some nits I found:

Index: js/src/xpconnect/src/xpcwrappedjs.cpp
+    // This interface is a hack that says "don't AddRef me".
+    if(aIID.Equals(NS_GET_IID(nsWeakRefToIXPConnectWrappedJS))) {
nit: { gets its own line in xpconnect.
+        *aInstancePtr = NS_STATIC_CAST(nsWeakRefToIXPConnectWrappedJS*, this);
+        return NS_OK;

Index: dom/src/base/nsDOMClassInfo.cpp
   WrapperSCCEntry *entry = NS_STATIC_CAST(WrapperSCCEntry*, hdr);
-  new (entry) WrapperSCCEntry(NS_STATIC_CAST(nsIDOMNode*,
+  new (entry) WrapperSCCEntry(NS_STATIC_CAST(nsIDOMGCParticipant*,
                                 NS_CONST_CAST(void*, key)));
nit: these don't line up.

Index: dom/src/base/nsDOMClassInfo.h
+  /**
+   * Easier way to call the above just for DOM nodes (and better, since
+   * we get the performance benefits of having the same identity function.

nit: missing ).

Index: dom/src/base/nsGlobalWindow.cpp
   // Clear our mutation bitfield.
+  // XXX We don't always clear mutation listeners (aRemoveEventListeners)!

I'll fix this soon.

Index: dom/src/base/nsWindowRoot.cpp
+    mListenerManager->SetListenerTarget(
+      NS_STATIC_CAST(nsIDOMEventReceiver*,this));

Uber-nit: space after the comma, please.
Attached patch patchSplinter Review
This is merged to the trunk, and addresses mrbkap's comments.
Attachment #203072 - Attachment is obsolete: true
Attachment #203862 - Flags: superreview?(jst)
Attachment #203072 - Flags: superreview?(jst)
Comment on attachment 203862 [details] [diff] [review]
patch

- In js/src/xpconnect/idl/nsIXPConnect.idl:

+ * This interface is a complete hack.  It is used by the DOM code to get
+ * call QueryReferent on a weak reference to a wrapped JS object without

remove the "get"

- In nsEventReceiverSH::Mark():

+  nsISupports *native = wrapper->Native();
+  nsCOMPtr<nsIDOMGCParticipant> participant(do_QueryInterface(native));

Use do_QueryWrappedNative() here?

- In nsGlobalWindow::SetNewDocument():

   // Clear our mutation bitfield.
+  // XXX We don't always clear mutation listeners (aRemoveEventListeners)!
   mMutationBits = 0;

The aRemoveEventListeners argument no longer exists, and this does the right thing either way already (mutation bits are really only kept on the inner window, so clearing it on the outer does no harm). So remove that comment.

- In nsJSUtils.h:

+/**
+ * XXX WRITE ME
+ */

Yes, please do :)

The rest looks good to me. sr=jst
Attachment #203862 - Flags: superreview?(jst) → superreview+
Great patch, looking forward to it landing.  Word on the street is that closures as XMLHttpRequest handlers, probably again only if they entrain a DOM element in their environment, cause major leaks.  So we'd want to use nsMarkedJSFunctionHolder in more than just the ELM.

/be
(In reply to comment #48)
> Great patch, looking forward to it landing.  Word on the street is that
> closures as XMLHttpRequest handlers, probably again only if they entrain a DOM
> element in their environment, cause major leaks.  So we'd want to use
> nsMarkedJSFunctionHolder in more than just the ELM.

I think the XMLHttpRequest leak (is there a bug on that?) would be better solved by nulling out the 4 pointers (3 nsIDOMEventListener, 1 nsIOnReadyStateChangeHandler) at an appropriate time.  Presumably (not being at all an expert on XMLHttpRequest) while the HTTP request is being made there's no requirement that something hold onto the JS object for the XMLHttpRequest in order to get notifications when the XML loads; likewise there's a concrete point (OnStopRequest?) at which we know that the load either succeeded or failed and we'll never again notify the listeners.
Fix checked in to trunk.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Target Milestone: mozilla1.8beta2 → mozilla1.9alpha
Hrm.  Well, we do that already, actually, but that leaves bug 206520 plus perhaps a similar issue with successful multipart requests.
*** Bug 317818 has been marked as a duplicate of this bug. ***
Should nsGenericElement and nsGenericDOMDataNode append their ownerDocument to the reachable list?
That's a little scary to do. Wouldn't that cause leaking an element to leak the entire document?
(In reply to comment #53)
> Should nsGenericElement and nsGenericDOMDataNode append their ownerDocument to
> the reachable list?

Yes, they should.

(In reply to comment #54)
> That's a little scary to do. Wouldn't that cause leaking an element to leak the
> entire document?

No, and it only makes a difference for elements that have been removed from the document anyway.  (The document is still reachable from them thanks to ownerDocument; they're not reachable from it.)
One other thought: for nsIContent impls, both AppendReachableList() and GetSCCIndex() use only nsIContent apis, so we could even declare them non-pure-virtual on nsIContent and implement nsIContent::GetSCCIndex/AppendReachableList somewhere (eg in nsGenericElement.cpp?) instead of having two copies of that code in generic element and generic DOM data node.
we really need an nsIContent.cpp file, or somewhere where we place all the virtual  nsIContent functions that have default implementations.
Attachment #205010 - Flags: superreview?(jst)
Attachment #205010 - Flags: review?(bzbarsky)
Comment on attachment 205010 [details] [diff] [review]
add ownerDocument to reachable list

Looks good to me.  It looks like the reachable list must not contain null pointers, right?  In that case, nsGlobalWindow::AppendReachableList might need to null-check some things (eg an outer window is not guaranteed to have an inner, and we might end up in a situation where an inner has no outer).
Attachment #205010 - Flags: review?(bzbarsky) → review+
I think I should probably just null-check in nsDOMClassInfo.cpp after the call to AppendReachableList, i.e.,

-    MarkReachablePreservedWrappers(reachable[i], cx, arg);
+    if (reachable[i])
+      MarkReachablePreservedWrappers(reachable[i], cx, arg);
Attached patch null-checkSplinter Review
Attachment #205073 - Flags: superreview?(bzbarsky)
Attachment #205073 - Flags: review?(bzbarsky)
> I think the XMLHttpRequest leak (is there a bug on that?) would be better
> solved by nulling out the 4 pointers (3 nsIDOMEventListener, 1
> nsIOnReadyStateChangeHandler) at an appropriate time.

The "XMLHttpRequest leak" was merely a theory... not yet confirmed, so there is no bug on file (that I know of).
Bug 206520 is on file, and I think there may be additional problems with multipart -- see comment 51.
Comment on attachment 205010 [details] [diff] [review]
add ownerDocument to reachable list

sr=jst
Attachment #205010 - Flags: superreview?(jst) → superreview+
Comment on attachment 205073 [details] [diff] [review]
null-check

r+sr=bzbarsky
Attachment #205073 - Flags: superreview?(bzbarsky)
Attachment #205073 - Flags: superreview+
Attachment #205073 - Flags: review?(bzbarsky)
Attachment #205073 - Flags: review+
Both of those patches checked in to trunk.
Depends on: 319293
Depends on: 319642
Blocks: mlk1.8
Flags: blocking1.8.0.1?
Would love this for a 1.8.0.x release, but the still-open regressions are very scary.
Flags: blocking1.8.0.2?
Flags: blocking1.8.0.1?
Flags: blocking1.8.0.1-
*** Bug 323074 has been marked as a duplicate of this bug. ***
Is this (and regression fixes) ever going to be appropriate for an .0.x release, or should we just wait for 1.8.1?
(For what it's worth, I think http://maps.google.com/ has changed their code so they don't trigger this leak anymore.)
[oops, I thought I submitted this before the last comment, which was intended as an aside to this one]

Probably not, and perhaps not even for 1.8.1.
> (For what it's worth, I think http://maps.google.com/ has changed their code so
> they don't trigger this leak anymore.)

Yup, confirmed from my internal sources ;-)
This is too risky for the 1.8.0 branch, and still has open regressions.
Flags: blocking1.8.0.2? → blocking1.8.0.2-
*** Bug 334005 has been marked as a duplicate of this bug. ***
Is 321054 the only open, known regression remaining on the trunk?

This is a highly desireable fix for 1.8.1, I have to say...
*** Bug 337405 has been marked as a duplicate of this bug. ***
Flags: blocking1.8.1? → blocking1.8.1+
Fixed on MOZILLA_1_8_BRANCH by checkin of bug 336791.
Keywords: fixed1.8.1
*** Bug 351101 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.