Closed
Bug 398219
Opened 17 years ago
Closed 17 years ago
function objects cloned by XPConnect keep hidden window alive late into shutdown
Categories
(Core :: XPConnect, defect, P2)
Tracking
()
RESOLVED
FIXED
People
(Reporter: dbaron, Assigned: igor)
References
Details
(Keywords: memory-leak)
Attachments
(3 files, 4 obsolete files)
13.44 KB,
patch
|
brendan
:
review+
brendan
:
approval1.9+
|
Details | Diff | Splinter Review |
2.80 KB,
patch
|
Details | Diff | Splinter Review | |
1.20 KB,
patch
|
Details | Diff | Splinter Review |
I was investigating why the NoScript extension leaks on shutdown, and I came across something that seems like a rather general problem. I have a cycle collector trace that looks like this: nsCycleCollector: JS Object (BackstagePass) 0xb733a7e0 was not collected due to 1 external references (643 total - 642 known) An object expected to be garbage could be reached from it by the path: JS Object (BackstagePass) 0xb733a7e0 JS Object (Object) 0xb692fd20 JS Object (Object) 0xb692c700 JS Object (XPCWrappedNative_NoHelper) 0xb7f5b9e0 JS Object (Function - removeObserver) 0xb0b23320 JS Object (Function - removeObserver) 0xb0b25bc0 JS Object (ChromeWindow) 0xb35d0260 nsGlobalWindow 0x8ab742c This is bad -- something in a JS component keeping a window alive. (It's not so bad since it's the hidden window.) From a JS_DumpGCHeap right before this, I have the following GC traces that provide a little bit more information: 0xb0b23360 Function b0b27858 via nsXPCWrappedJS[nsIFactory,0x8875418].mJSObj(0xb692fd20 Object)._instance(0xb692c700 Object).caps(0xb7f5b8c0 XPCWrappedNative_NoHelper).removeObserver 0xb0b27858 function removeObserver via nsXPCWrappedJS[nsIFactory,0x8875418].mJSObj(0xb692fd20 Object)._instance(0xb692c700 Object).caps(0xb7f5b8c0 XPCWrappedNative_NoHelper).removeObserver(0xb0b23360 Function).private 0xb0b25bc0 Function b0b27858 via nsXPCWrappedJS[nsIFactory,0x8875418].mJSObj(0xb692fd20 Object)._instance(0xb692c700 Object).caps(0xb7f5b8c0 XPCWrappedNative_NoHelper).removeObserver(0xb0b23360 Function).object What's going on here is that "caps" is an XPConnect wrapper for an nsIPrefBranch2 (representing preferences under "capability.policy."). Its removeObserver property is a cloned function object -- presumably created by a call to JS_CloneFunctionObject from xpc_CloneFunctionObject from DefinePropertyIfFound, which clones the function object cached on an XPCNativeMember. However, cloned functions in JS keep the original function alive, and in this case the original function's parent is the global of the safe JS context at the time we first used the function, which is the hidden window. (But by this point in shutdown, the hidden window is no longer the safe JS context -- we've created a new safe JS Context.) (The window being alive, in turn, prevents nsLayoutStatics from shutting down, which keeps many services, including nsXPConnect alive. This eventually gets partly cleaned up at shutdown, although not enough to destroy the window -- then again, it may not be the actual root cause of the leak, but I'm reasonably confident (thanks to DEBUG_CC output) that it's why the leak entrains the hidden window and thus everything held by nsLayoutStatics.) I'm not entirely sure that something is really wrong here, but it seems like if this weren't the case we'd probably have an easier time shutting down without leaking.
Comment 1•17 years ago
|
||
The "cloned function" concept seems to be causing us all kinds of problems :(
Assignee | ||
Comment 2•17 years ago
|
||
The reason the cloning keeps the original function alive is that the cloning, while sharing JSFunction between the clone and the original function object, keeps JSFunction.object pointing to the original. That pointer is used for 2 purposes: 1. The object is used as a hashtable to store argument name mappings through so-called hidden properties. 2. SpiderMonkey exposes JSFunction through public API and allows to access the function object through JSFunction. So to avoid keeping alive the original function object together with its parent chain the following has to be done: 1. The storage of hidden properties has to be moved into JSFunction itself. 2. JSFunction can be renamed into JSFunctionInternal and JSFunction defined as an alias for JSObject preserving API compatibility. I will try to see how much efforts is necessary for the first item.
Comment 3•17 years ago
|
||
It's true that the very old (1995, pre-dating SpiderMonkey) API design decision is causing us stress. If we can equate public JSFunction and JSObject pointer, and make the private state (suggest it still be called JSFunction -- the public type is opaque and used only to type-tag pointers) not link to any object, great. If you can get rid of SPROP_IS_DUPLICATE and related code, even better. Just don't add too much mechanism that duplicates the scope/property-tree, and don't break any torture tests that synthesize functions with lots (approaching the 64K limit) of locals/formals. Thanks, /be
Assignee | ||
Comment 4•17 years ago
|
||
It turned it is not that simple to move the storage for argument and variable names out of hidden properties to eventually address the leak. So I will split that process into separated bugs that can be tested and landed separately.
Assignee: nobody → igor
Assignee | ||
Comment 5•17 years ago
|
||
I filed bug 398609 with the intention to hide the details of argument and variable name lookup for the functions behind tailored API.
Reporter | ||
Comment 6•17 years ago
|
||
Just to be clear -- from a Firefox/Gecko perspective, I don't think this bug is particularly important.
Assignee | ||
Comment 7•17 years ago
|
||
(In reply to comment #6) > Just to be clear -- from a Firefox/Gecko perspective, I don't think this bug is > particularly important. On the other hand resolving it should reduce the memory footprint for the browser through better data structures in SM. Plus it will simply SM code as the patch for bug 398609 witnesses.
Comment 8•17 years ago
|
||
I *think* I'm seeing this in bug 388573 and bug 401345. Any chance we could fix this in 1.9?
Assignee | ||
Comment 9•17 years ago
|
||
(In reply to comment #8) > I *think* I'm seeing this in bug 388573 and bug 401345. Any chance we could fix > this in 1.9? > Yes: after landing bug 398609 and bug 399544 the fix would be straightforward.
Updated•17 years ago
|
Flags: blocking1.9?
Comment 10•17 years ago
|
||
Moving to B+, P2 based on the fact that this might help address the extension leaks. If that is an incorrect reading of the comments please adjust...
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Assignee | ||
Comment 11•17 years ago
|
||
[The patch should be applied on top of the patch from bug 404935 comment 7.] This is more of a workaround rather a proper solution. The idea behind the patch is to clear for to-be-cloned functions and regexps proto and parent slots. These slots are not used in any case so clearing them allows to break the references to the variable object which is used during compilation. This patch makes it very explicit that there is no need to create JSObject wrapper for JSFunction created during the compilation. Unfortunately even with the bug 399544 fixed the compiler still have to create the useless wrappers for API compatibility. The problem with the plan described in comment 2 is newScriptHook which takes JSFunction and from which an implementation may call JS_GetFunctionObject.
Attachment #289954 -
Flags: review?(brendan)
Assignee | ||
Comment 12•17 years ago
|
||
The patch from comment 11 should be applied on top of the patch from bug 404935 comment 10.
Comment 13•17 years ago
|
||
(In reply to comment #11) > This patch makes it very explicit that there is no need to create JSObject > wrapper for JSFunction created during the compilation. Unfortunately even with > the bug 399544 fixed the compiler still have to create the useless wrappers for > API compatibility. The problem with the plan described in comment 2 is > newScriptHook which takes JSFunction and from which an implementation may call > JS_GetFunctionObject. How about we break API compatibility by adding a leading cx parameter to JS_GetFunctionObject and making it fallible -- making it lazily construct the function's object when called? Or to keep API compatibility (but we don't need to for Mozilla 2 at least, and we probably could get away with select breakage where justified even for 1.9), let it return null but add a new JS_GetOrCreateFunctionObject, or some such? /be
Assignee | ||
Comment 14•17 years ago
|
||
(In reply to comment #13) > > The problem with the plan described in comment 2 is > > newScriptHook which takes JSFunction and from which an implementation may call > > JS_GetFunctionObject. ... > Or to keep API compatibility..., > let it return null but add a new JS_GetOrCreateFunctionObject, or some such? I see an incompatible change in the behavior of the API that still allows code to compile is worse than the change that would force to alter the exist code. Since the problem here comes from newScriptHook, I would suggest to change that not to taker JSFunction at all. Since the bug 379410 has already shown the deficiency of the current API, the change should address that bug as well and making at least one user of SM happier as a result.
Assignee | ||
Comment 15•17 years ago
|
||
This is a CVS diff version of the patch.
Attachment #289954 -
Attachment is obsolete: true
Attachment #290720 -
Flags: review?(brendan)
Attachment #289954 -
Flags: review?(brendan)
Comment 16•17 years ago
|
||
Comment on attachment 290720 [details] [diff] [review] v1b > static JSFunction * > NewCompilerFunction(JSContext *cx, JSTreeContext *tc, JSAtom *atom, > JSBool lambda) > { > JSObject *parent; >+ JSFunction *fun; > > parent = (tc->flags & TCF_IN_FUNCTION) ? tc->fun->object : cx->fp->varobj; >- return js_NewFunction(cx, NULL, NULL, 0, >- JSFUN_INTERPRETED | (lambda ? JSFUN_LAMBDA : 0), >- parent, atom); >+ fun = js_NewFunction(cx, NULL, NULL, 0, >+ JSFUN_INTERPRETED | (lambda ? JSFUN_LAMBDA : 0), Might be worth reparameterizing the few (formerly one) functions in jsparse.c that take JSBool lambda, to take uintN lambda, which is JSFUN_LAMBDA or 0, to avoid ?: here. Depends on how much it costs for callers to synthesize the flag constant vs. 1 in the "lambda" case. r/a=me with that, or not if it turns out to lose in code size. /be
Attachment #290720 -
Flags: review?(brendan)
Attachment #290720 -
Flags: review+
Attachment #290720 -
Flags: approval1.9+
Assignee | ||
Comment 17•17 years ago
|
||
This is an alternative to v1 that passes 0 or JSFUN_LAMBDA, not JS_FALSE or JS_TRUE to the function parsing code. This gives the same amount of code with GcC at -O, but it is slightly more readable. To emphasis that the type of lambda is numerical the patch uses explicit compare against 0 when testing if the function is a lambda.
Attachment #291131 -
Flags: review?(brendan)
Updated•17 years ago
|
Attachment #291131 -
Flags: review?(brendan)
Attachment #291131 -
Flags: review+
Attachment #291131 -
Flags: approval1.9+
Updated•17 years ago
|
Whiteboard: [needs landing]
Assignee | ||
Comment 18•17 years ago
|
||
I checked in the patch from comment 17 to the cVS trunk: Checking in jsfun.c; /cvsroot/mozilla/js/src/jsfun.c,v <-- jsfun.c new revision: 3.241; previous revision: 3.240 done Checking in jsinterp.c; /cvsroot/mozilla/js/src/jsinterp.c,v <-- jsinterp.c new revision: 3.392; previous revision: 3.391 done Checking in jsparse.c; /cvsroot/mozilla/js/src/jsparse.c,v <-- jsparse.c new revision: 3.314; previous revision: 3.313 done Checking in jsregexp.c; /cvsroot/mozilla/js/src/jsregexp.c,v <-- jsregexp.c new revision: 3.170; previous revision: 3.169 done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 19•17 years ago
|
||
Igor, I think this caused the following regressions: ecma/extensions/15-1.js MyObject.prototype.__proto__ == Object.prototype expected: true actual: false reason: wrong value ecma/extensions/15-1.js TestCase.prototype.__proto__ == Object.prototype expected: true actual: false reason: wrong value ecma_2/extensions/instanceof-002.js pat.__proto__.__proto__.__proto__.__proto__ == Object.prototype expected: true actual: false reason: wrong value ecma_2/extensions/instanceof-002.js pat.__proto__.__proto__.__proto__.__proto__.__proto__ == null expected: true actual: false reason: wrong value js1_3/extensions/proto_10.js pat.__proto__.__proto__.__proto__.__proto__ == Object.prototype expected: true actual: false reason: wrong value js1_3/extensions/proto_10.js pat.__proto__.__proto__.__proto__.__proto__.__proto__ == null expected: true actual: false reason: wrong value
Assignee | ||
Comment 20•17 years ago
|
||
I backed out the check in.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 21•17 years ago
|
||
The new version removes the code that tries to shares function.prototype between the clones to address the regressions above. The sharing effectively leads to the same kind of reference problems that this bug is about as it stores in a shared JSFunction.object an object that depends of the compilation scope.
Attachment #290720 -
Attachment is obsolete: true
Attachment #291131 -
Attachment is obsolete: true
Attachment #292825 -
Flags: review?(brendan)
Assignee | ||
Comment 22•17 years ago
|
||
Comment 23•17 years ago
|
||
Comment on attachment 292825 [details] [diff] [review] v3 Sorry, should have remembered this misbegotten code in fun_resolve. The test will need adjusting still, right? /be
Attachment #292825 -
Flags: review?(brendan)
Attachment #292825 -
Flags: review+
Attachment #292825 -
Flags: approval1.9+
Comment 24•17 years ago
|
||
(In reply to comment #22) > Created an attachment (id=292826) [details] > delta between v2 and v3 Looks corrupted, nearly empty with some quilt mumbling at the start. /be
Assignee | ||
Comment 25•17 years ago
|
||
Here is a file for real
Attachment #292826 -
Attachment is obsolete: true
Is this ready to reland?
Assignee | ||
Comment 27•17 years ago
|
||
I checked in the patch from comment 21 to the CVS trunk: http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1198070280&maxdate=1198070451&who=igor%mir2.org
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 28•17 years ago
|
||
So it seems like the underlying bug here isn't really fixed -- I'm still seeing a window (presumably the hidden window) being kept alive through functions -- there's just one additional step in the GC traces now, so they look like this: nsCycleCollector: JS Object (BackstagePass) 0x2aaaafc9d500 was not collected due to external references An object expected to be garbage could be reached from it by the path: JS Object (BackstagePass) 0x2aaaafc9d500 JS Object (XPCWrappedNative_NoHelper) 0x2aaab5443880 JS Object (Function - newURI) 0x2aaab6d4c400 JS Object 0x2aaabab1b5a0 JS Object (Function - newURI) 0x2aaaba3ae0c0 JS Object (ChromeWindow) 0x2aaab6d52300 nsGlobalWindow 0xb67f70
Assignee | ||
Comment 29•17 years ago
|
||
(In reply to comment #28) Any hints about what is exactly that JS Object 0x2aaabab1b5a0?
Reporter | ||
Comment 30•17 years ago
|
||
It's a JS function (an object traced with JSTRACE_FUNCTION).
Assignee | ||
Comment 31•17 years ago
|
||
(In reply to comment #30) > It's a JS function (an object traced with JSTRACE_FUNCTION). This is strange as it means that there is a link from the clone to the function. I will check it an file a new bug.
Comment 32•17 years ago
|
||
I'm definitely still seeing this too. See for example: 0xa734f20 ChromeWindow 15e66f60 via /Users/peterv/Library/Application Support/Firefox-debug/Profiles/debug/extensions/{0538E3E3-7E9B-4d49-8831-A227C80A7AD3}/components/nsForecastfox.js(0x16ee7e20 BackstagePass).gHelpersBundle(0xf3a1ec0 XPCWrappedNative_NoHelper).GetStringFromName(0x15de6e40 Function).private(0x16edf6f0 function).object(0x16ee9680 Function).__parent__ I tried the hack in this attachment and a bunch of extension leaks go away, so we definitely need to figure out how to fix this. STOBJ_GET_PARENT(fun->object) and STOBJ_GET_PROTO(fun->object) are still true in xpc_CloneJSFunction, wasn't that what the patches in this bug were trying to change? I'm pretty sure this is not the right fix though. Igor, when is it ok to set parent and proto of fun->object to null?
Comment 33•17 years ago
|
||
Another way I can fix those leaks is by setting proto and parent of funobj to null in XPCNativeMember::Resolve (http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/xpconnect/src/xpcwrappednativeinfo.cpp&rev=1.16&mark=195#180).
Comment 34•17 years ago
|
||
I filed bug 412491 on the problem noted in comment 32.
Whiteboard: [needs landing]
You need to log in
before you can comment on or make changes to this bug.
Description
•