Closed
Bug 28570
Opened 25 years ago
Closed 25 years ago
circular reference leak: nsJSContext and nsXULPrototypeDocument
Categories
(Core :: XUL, defect, P3)
Tracking
()
RESOLVED
FIXED
M15
People
(Reporter: dbaron, Assigned: waterson)
References
Details
(Keywords: memory-leak)
Attachments
(9 files)
6.62 KB,
text/plain
|
Details | |
30.97 KB,
text/plain
|
Details | |
10.18 KB,
patch
|
Details | Diff | Splinter Review | |
11.35 KB,
patch
|
Details | Diff | Splinter Review | |
3.67 KB,
patch
|
Details | Diff | Splinter Review | |
3.68 KB,
patch
|
Details | Diff | Splinter Review | |
12.61 KB,
patch
|
Details | Diff | Splinter Review | |
14.90 KB,
patch
|
Details | Diff | Splinter Review | |
15.08 KB,
patch
|
Details | Diff | Splinter Review |
Chris - this is the bug I mentioned to you in email last weekend, and I'm assigning it to you as you said I could. DESCRIPTION: An nsJSContext and an nsXULPrototypeDocument both seem to be leaking because they're holding references to each other. Both of these objects are probably holding on to a good bit of other stuff as well. STEPS TO REPRODUCE: * start mozilla * hit "Exit" in the profile manager (I'm 90% sure that's how I got the refcount balancer logs I'll attach) ADDITIONAL INFORMATION: I'll attach a refcount balancer log for both objects, made with --ignore-balanced. The log for the nsXULPrototypeDocument is huge, and I attempted to match up the addrefs and the corresponding releases with numbers at the left edge. The one with "??" at the left edge is the one that I **think** is leaking (which would mean this is a case of circular reference).
Reporter | ||
Comment 1•25 years ago
|
||
Reporter | ||
Comment 2•25 years ago
|
||
Assignee | ||
Updated•25 years ago
|
Status: NEW → ASSIGNED
Target Milestone: M15
Reporter | ||
Comment 3•25 years ago
|
||
I think the cause of this leak is that nsXULPrototypeDocument::GetScriptObject returns "this" as the script object. I don't think that's what's supposed to happen, because nsJSContext::InitContext creates a reference to the script object. In other words, I don't think a single class is intended to implement both nsIScriptObjectOwner and nsIScriptGlobalObject (or something like that).
Reporter | ||
Comment 4•25 years ago
|
||
I think my previous comment is wrong.
Reporter | ||
Comment 5•25 years ago
|
||
It might be worth comparing the way the JSContext is handled in this bug and the way it's handled in the webshell: http://lxr.mozilla.org/seamonkey/source/docshell/base/nsDocShell.cpp#2185 (created in EnsureScriptEnvironment) http://lxr.mozilla.org/seamonkey/source/webshell/src/nsWebShell.cpp#624 (destroyed in ~nsWebShell)
Reporter | ||
Comment 6•25 years ago
|
||
I have a patch that fixes this problem and dramatically reduces the leaks when clicking "Exit" in the profile manager (to 2352 bytes, from 12668 bytes). However, if I don't click exit in the profile manager, it hangs during the destruction of the JSContext that used to be leaked, with the following stack trace: #0 0x404ce1bb in __sigsuspend (set=0xbfffed94) at ../sysdeps/unix/sysv/linux/sigsuspend.c:48 #1 0x4024dfe0 in pthread_cond_wait (cond=0x8185d14, mutex=0x8185cb8) at restart.h:49 #2 0x4022e958 in PR_WaitCondVar (cvar=0x8185d10, timeout=4294967295) at ptsynch.c:360 #3 0x4008ef01 in js_GC (cx=0x82ade78) at jsgc.c:743 #4 0x4008eda5 in js_ForceGC (cx=0x82ade78) at jsgc.c:678 #5 0x4006c3f9 in js_DestroyContext (cx=0x82ade78, gcmode=JS_FORCE_GC) at jscntxt.c:179 #6 0x400615b3 in JS_DestroyContext (cx=0x82ade78) at jsapi.c:794 #7 0x40315ae8 in nsJSContext::~nsJSContext (this=0x82b7fe8, __in_chrg=3) at nsJSEnvironment.cpp:185 #8 0x40315c91 in nsJSContext::Release (this=0x82b7fe8) at nsJSEnvironment.cpp:194 #9 0x402879da in nsCOMPtr<nsIScriptContext>::assign_assuming_AddRef ( this=0x82cb178, newPtr=0x0) at ../../dist/include/nsCOMPtr.h:416 #10 0x40287a20 in nsCOMPtr<nsIScriptContext>::assign_with_AddRef ( this=0x82cb178, rawPtr=0x0) at ../../dist/include/nsCOMPtr.h:787 #11 0x402908c7 in nsCOMPtr<nsIScriptContext>::operator= (this=0x82cb178, rhs=0x0) at ../../dist/include/nsCOMPtr.h:526 #12 0x40c6e695 in nsXULPDGlobalObject::SetContext (this=0x82cb170, aContext=0x0) at nsXULPrototypeDocument.cpp:437 #13 0x40c6d8ad in nsXULPrototypeDocument::~nsXULPrototypeDocument ( this=0x8282bc0, __in_chrg=3) at nsXULPrototypeDocument.cpp:174 #14 0x40c6da20 in nsXULPrototypeDocument::Release (this=0x8282bc0) at nsXULPrototypeDocument.cpp:180 #15 0x40c88664 in nsCOMPtr<nsIXULPrototypeDocument>::~nsCOMPtr ( this=0x827ff50, __in_chrg=2) at ../../../dist/include/nsCOMPtr.h:434 #16 0x40c37643 in nsXULDocument::~nsXULDocument (this=0x827fd70, __in_chrg=3) at nsXULDocument.cpp:505 #17 0x40c37e44 in nsXULDocument::Release (this=0x827fd70) at nsXULDocument.cpp:598 I'll attach the patch in a minute. Any ideas on what's going on? Is it a problem with the fix or an existing problem? Ignoring the crash, does the patch look like the right thing to do in the first place?
Reporter | ||
Comment 7•25 years ago
|
||
Reporter | ||
Comment 8•25 years ago
|
||
Actually, it's not a "crash" (as I called it), it just hangs (in thread stuff).
Reporter | ||
Comment 9•25 years ago
|
||
The cause of the hang seems to be the thread-safety code in JS_GC (see http://lxr.mozilla.org/seamonkey/source/js/src/jsgc.c#682 ), which doesn't, if I understand it correctly, account for the possibility that there could be garbage collection occurring on the same thread but for a different JS context. I neglected to look farther down the stack trace I included above. It's all within a JSContext::~JSContext on a different object: #0 0x404ce1bb in __sigsuspend (set=0xbfffed94) at ../sysdeps/unix/sysv/linux/sigsuspend.c:48 #1 0x4024dfe0 in pthread_cond_wait (cond=0x8185d14, mutex=0x8185cb8) at restart.h:49 #2 0x4022e958 in PR_WaitCondVar (cvar=0x8185d10, timeout=4294967295) at ptsynch.c:360 #3 0x4008ef01 in js_GC (cx=0x8332b40) at jsgc.c:743 #4 0x4008eda5 in js_ForceGC (cx=0x8332b40) at jsgc.c:678 #5 0x4006c3f9 in js_DestroyContext (cx=0x8332b40, gcmode=JS_FORCE_GC) at jscntxt.c:179 #6 0x400615b3 in JS_DestroyContext (cx=0x8332b40) at jsapi.c:794 #7 0x40315ae8 in nsJSContext::~nsJSContext (this=0x8200d50, __in_chrg=3) at nsJSEnvironment.cpp:185 #8 0x40315c91 in nsJSContext::Release (this=0x8200d50) at nsJSEnvironment.cpp:194 #9 0x402879da in nsCOMPtr<nsIScriptContext>::assign_assuming_AddRef ( this=0x82b7ff0, newPtr=0x0) at ../../dist/include/nsCOMPtr.h:416 ... #28 0x400b0dda in js_FinalizeObject (cx=0x82850a0, obj=0x839ae30) at jsobj.c:1387 #29 0x4008f567 in js_GC (cx=0x82850a0) at jsgc.c:891 #30 0x4008eda5 in js_ForceGC (cx=0x82850a0) at jsgc.c:678 #31 0x4006c3f9 in js_DestroyContext (cx=0x82850a0, gcmode=JS_FORCE_GC) at jscntxt.c:179 #32 0x400615b3 in JS_DestroyContext (cx=0x82850a0) at jsapi.c:794 #33 0x40315ae8 in nsJSContext::~nsJSContext (this=0x8285070, __in_chrg=3) at nsJSEnvironment.cpp:185 #34 0x40315c91 in nsJSContext::Release (this=0x8285070) at nsJSEnvironment.cpp:194 #35 0x402879da in nsCOMPtr<nsIScriptContext>::assign_assuming_AddRef ( this=0x820aa04, newPtr=0x0) at ../../dist/include/nsCOMPtr.h:416 #36 0x40287a20 in nsCOMPtr<nsIScriptContext>::assign_with_AddRef ( this=0x820aa04, rawPtr=0x0) at ../../dist/include/nsCOMPtr.h:787 #37 0x402908c7 in nsCOMPtr<nsIScriptContext>::operator= (this=0x820aa04, rhs=0x0) at ../../dist/include/nsCOMPtr.h:526 #38 0x4027790a in nsWebShell::~nsWebShell (this=0x820a970, __in_chrg=3) at nsWebShell.cpp:630 #39 0x40277f64 in nsWebShell::Release (this=0x820a970) at nsWebShell.cpp:730
Reporter | ||
Comment 10•25 years ago
|
||
To see what would happen, I commented out a few lines of jsgc.c (the check that there wasn't JS GC going on on another context (which said on another thread) so that the JS GC would be allowed, and then I crashed in the outer JS GC. To be more specific, this: Index: jsgc.c =================================================================== RCS file: /cvsroot/mozilla/js/src/jsgc.c,v retrieving revision 3.18 diff -u -r3.18 jsgc.c --- jsgc.c 2000/02/04 02:01:41 3.18 +++ jsgc.c 2000/02/27 20:54:40 @@ -737,6 +737,9 @@ JS_NOTIFY_REQUEST_DONE(rt); } +#if 0 + /* Is this necessary? We've already checked to make sure there + * isn't GC on this context. */ /* If another thread is already in GC, don't attempt GC; wait instead. */ if (rt->gcLevel > 0) { while (rt->gcLevel > 0) @@ -746,6 +749,7 @@ JS_UNLOCK_GC(rt); return; } +#endif /* No other thread is in GC, so indicate that we're now in GC. */ rt->gcLevel = 1; led to this: #0 __libc_free (mem=0xdadadada) at malloc.c:2914 #1 0x40061c7f in JS_free (cx=0x8284ce0, p=0xdadadada) at jsapi.c:1042 #2 0x400b0daa in js_FinalizeObject (cx=0x8284ce0, obj=0x839af18) at jsobj.c:1392 #3 0x4008f507 in js_GC (cx=0x8284ce0) at jsgc.c:895 #4 0x4008eda5 in js_ForceGC (cx=0x8284ce0) at jsgc.c:678 #5 0x4006c3f9 in js_DestroyContext (cx=0x8284ce0, gcmode=JS_FORCE_GC) at jscntxt.c:179 #6 0x400615b3 in JS_DestroyContext (cx=0x8284ce0) at jsapi.c:794 #7 0x40315ae8 in nsJSContext::~nsJSContext (this=0x8284cb0, __in_chrg=3) at nsJSEnvironment.cpp:185 #8 0x40315c91 in nsJSContext::Release (this=0x8284cb0) at nsJSEnvironment.cpp:194 #9 0x402879da in nsCOMPtr<nsIScriptContext>::assign_assuming_AddRef ( this=0x820aa0c, newPtr=0x0) at ../../dist/include/nsCOMPtr.h:416 #10 0x40287a20 in nsCOMPtr<nsIScriptContext>::assign_with_AddRef ( this=0x820aa0c, rawPtr=0x0) at ../../dist/include/nsCOMPtr.h:787 #11 0x402908c7 in nsCOMPtr<nsIScriptContext>::operator= (this=0x820aa0c, rhs=0x0) at ../../dist/include/nsCOMPtr.h:526 #12 0x4027790a in nsWebShell::~nsWebShell (this=0x820a978, __in_chrg=3) at nsWebShell.cpp:630 #13 0x40277f64 in nsWebShell::Release (this=0x820a978) at nsWebShell.cpp:730 I think fixing this requires someone who knows something about the JS code...
Comment 11•25 years ago
|
||
It looks to me like nsJSContext::~nsJSContext should probably use JS_DestroyContextNoGC instead of JS_DestroyContext. There is altogether too much GC'ing going on anyway. nsJSContext has its own method (nsJSContext::GC) for forcing a GC. It should use that as necessary. nsJSContext::GC might want to also add a re-entrance guard to its GC method. It think it is broken for embedding code to be doing nested GC on the same thread and JSContext. I don't think the engine should have to guard against that (I could be wrong). Also note that http://bugzilla.mozilla.org/show_bug.cgi?id=13350 gets into the need to use JS_BeginRequest/JS_EndRequest to protect against GC on one thread while other threads are doing stuff.
Reporter | ||
Comment 12•25 years ago
|
||
But it's not GC on the same JSContext. It's GC on two different JSContexts, and the engine doesn't seem to like that.
Reporter | ||
Comment 13•25 years ago
|
||
For reference, the differences in the leaks when clicking exit in the profile manager are: http://dbaron.student.harvard.edu/u/david/leaks/myfixes/28570-patch-after.txt http://dbaron.student.harvard.edu/u/david/leaks/myfixes/28570-patch-before.txt (I have a few other small leak fixes in my tree that could account for some differences. I last updated my tree this morning.)
Reporter | ||
Comment 14•25 years ago
|
||
The "differences" I mention are not differences between my before and after, but between my data and what you might see. (Although the things in my tree are most likely entirely insignificant...)
Comment 15•25 years ago
|
||
I'm suggesting something like... (in nsJSEnvironment) ::JS_SetGlobalObject(mContext, nsnull); - ::JS_DestroyContext(mContext); + ::JS_DestroyContextNoGC(mContext); + /* do our own GC that has re-entrance protection */ + GC(); ... and ... NS_IMETHODIMP nsJSContext::GC() { + /* XXX ASSUME single threaded calling. */ + /* Protect against re-entry */ + static PRBool busy = PR_FALSE; + if (busy) + return NS_OK; + busy = PR_TRUE; ::JS_GC(mContext); + busy = PR_FALSE; return NS_OK; }
Reporter | ||
Comment 16•25 years ago
|
||
But what happens if a JSContext is destroyed but never garbage-collected?
Assignee | ||
Comment 17•25 years ago
|
||
(/me rushes in yelling about things he hardly knows about. added brendan and vidur to wipe my chin.) I don't think JavaScript "contexts" have any relevance to the garbage collector at all (so worrying that a "context won't be garbage collected" is a red herring). I believe that the scope of the garbage collector spans the entire JS "runtime", which may include many contexts. I'm afraid of trying to explain what a JS "context" is for fear that I'll just confuse the issue with my partial understanding. (brendan? vidur? jband?)
Reporter | ||
Comment 18•25 years ago
|
||
That makes sense. jband's patch works. I'm not sure why I didn't just try it earlier... Anyway, I'll attach the complete diffs of what I have in my tree, which works. I made one additional minor change to the code in nsXULPrototypeDocument.cpp. Waterson, could you take a close look at those changes and see if they're right? I realize that the contents of nsXULPrototypeDocument.cpp should probably be put in a different order, but it's a lot easier to see what was changed when that's not done.
Reporter | ||
Comment 19•25 years ago
|
||
Assignee | ||
Comment 20•25 years ago
|
||
At first blush, these patches seem fine to me. I'm gonna try running with them for a bit to make sure nothing evil jumps out. vidur/brendan: could you review the changes to nsJSEnvironment? I'm trying to think of a situation where the nsXULPDGlobalObject could outlive the nsXULPrototypeDocument, and thus risk returning a dangling pointer when calling GetGlobalObjectOwner? (If this is in fact a problem, we could solve it by using nsWeakRef or hand-coding something to null the back-pointer in nsXULPrototypeDocument's dtor.) And yes, I agree, the code should be re-arranged. :-)
Reporter | ||
Comment 21•25 years ago
|
||
I think that problem about the dangling pointer is the one thing I changed between the two versions of the patch, other than adding the JS stuff. Note that the destructor for the XULPrototypeDocument now has a mGlobalObject->SetGlobalObjectOwner(nsnull); for just that possibility, even though mGlobalObject will most likely be destroyed almost immediately afterward. Is that sufficient?
Assignee | ||
Comment 22•25 years ago
|
||
Oops! You're right. That's perfect.
Comment 23•25 years ago
|
||
****WARNING**** Do NOT checkin the code I suggested that does a nsJSContext::GC() *after* calling ::JS_DestroyContextNoGC(mContext). This makes BAD things happen because nsJSContext::GC() uses 'mContext' which JS has just been destroyed. Duh! We could get by with just not doing a GC here - there are other calls to GC. But we are really holding a timebomb here. We need to fix the reentrance protection in js_GC. (As dbaron pointed out) in the JS_THREADSAFE case we just don't deal correctly with nested JS calls on the same thread but on different JSContexts. The check "if (cx->gcActive)" just doesn't cut it. This fix is going to have to be very carefully made because we don't want to screwup the fancy dancing in the "gc called on multiple threads" stuff with its wait and notify code. I'll think about this. Anyone want to propose a fix?
Comment 24•25 years ago
|
||
Damn, bugzilla mail got through my sabbatical mail filters. All that bad old gc interlocking code comes from server-land, where jawahar and I developed it long ago, and where there is exactly one context per thread. The client, OTOH, uses a context per DOM window, yet DOM windows are accessed by one thread only, so worlds collide. Even for server-type embeddings, there is a stupid, longstanding AB-BA deadlock between js_DestroyContext and js_GC (see the XXX commment in jscntxt.c); chouck@geocast.com ran into this but worked around it as others have, by using JS_DestroyContextNoGC (which looks like it was added as a hack to avoid fixing the deadlock). It seems to me any fix here must distinguish thread from context, using a new JSRuntime state variable, say void *gcThread; to point at the thread currently running the GC. I leave that as an exercise for one of you non-sabbaticalized JS guys. And please fix the js_DestroyContext vs. js_GC deadlock while you're at it, ok? /be
Comment 25•25 years ago
|
||
Comment 26•25 years ago
|
||
...Looking for reviewers.... My proposed patch to JS engine is attached above. The avoidance of re-entry on the same thread seemed pretty straightforward. As brendan suggested, I also include a fix for the potential deadlock when js_DestroyContext does js_GC and some other thread is waiting in js_GC for the first thread's request to end. My fix is a punt to return early from js_GC in the thread doing the js_DestroyContext. Comment on that appreciated.
Comment 27•25 years ago
|
||
r=brendan@mozilla.org, looks good (only one nit: nested assignment to currentThread in jsgc.c is a little funky, might pull it out to an assignment statement right before the if). Thanks, /be
Comment 28•25 years ago
|
||
Comment 29•25 years ago
|
||
Yeah, sometimes the urge to write clever code is too much for me. Patch with nit fixed is attached. I'll check it in when the tree opens after the beta1 branch. Thanks.
Reporter | ||
Comment 30•25 years ago
|
||
It turns out that either: * I did a really horrible job testing my fix * something else changed since I wrote it and it's impossible to click on links in the browser with my fix. I really have no idea what's causing this. It doesn't seem like it's because objects aren't leaked, because putting an NS_ADDREF_THIS in the constructor for nsXULPrototypeDocument doesn't fix things. I really have no idea what's causing this. I imagine that there could be something that's trying to QueryInterface an nsIScriptGlobalObjectOwner to an nsIScriptGlobalObject, or vice versa, or nsIScriptGlobalObject to something else, and that's now failing. However, I don't know what it is, or where to look. I guess the thing to try is writing QueryInterface functions for these two classes that return the other object in certain cases. Is this a good thing to do? Or could the problem be somewhere else?
Reporter | ||
Comment 31•25 years ago
|
||
Hacking the QueryInterface to return the other object fixes the problem. Is that bad style? Right now I have it as a mix of code and macros, which I'll need to clean up, unless there's an easy way to write a queryinterface like this...
Reporter | ||
Comment 32•25 years ago
|
||
Except that with this fix it's leaked again... back to the refcount balancer...
Reporter | ||
Comment 33•25 years ago
|
||
Reporter | ||
Comment 34•25 years ago
|
||
Reporter | ||
Comment 35•25 years ago
|
||
Comment 36•25 years ago
|
||
I just checked in the fixes to the JS engine to protect against the gc reentrance problem.
Comment 37•25 years ago
|
||
*IGNORE* - massive spam changing open XPToolkit bug's QA contact to jrgm@netscape.com
QA Contact: paulmac → jrgm
Reporter | ||
Comment 38•25 years ago
|
||
Fix checked in 2000-03-15 17:11 PST.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Component: XP Toolkit/Widgets: XUL → XUL
QA Contact: jrgmorrison → xptoolkit.widgets
You need to log in
before you can comment on or make changes to this bug.
Description
•