Closed
Bug 294893
Opened 19 years ago
Closed 19 years ago
Major leakage on Linux balsa Dep GTK1 (gcc 3.4)
Categories
(Core :: XPConnect, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla1.8beta2
People
(Reporter: Gijs, Assigned: bzbarsky)
References
()
Details
(Keywords: memory-leak, perf, regression)
Attachments
(9 files)
4.84 KB,
patch
|
Details | Diff | Splinter Review | |
6.50 KB,
patch
|
Details | Diff | Splinter Review | |
7.93 KB,
patch
|
bzbarsky
:
review+
bzbarsky
:
superreview+
benjamin
:
approval1.8b2+
|
Details | Diff | Splinter Review |
2.25 KB,
patch
|
jst
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
14.98 KB,
patch
|
bzbarsky
:
review+
bzbarsky
:
superreview+
|
Details | Diff | Splinter Review |
1.43 KB,
patch
|
bzbarsky
:
review+
bzbarsky
:
superreview+
|
Details | Diff | Splinter Review |
3.84 KB,
patch
|
brendan
:
review+
brendan
:
approval1.8b2+
|
Details | Diff | Splinter Review |
3.36 KB,
patch
|
bzbarsky
:
review+
bzbarsky
:
superreview+
|
Details | Diff | Splinter Review |
1.89 KB,
patch
|
brendan
:
review+
brendan
:
approval1.8b3+
|
Details | Diff | Splinter Review |
As of the 2005-05-19 20:26 build, trace_malloc_leakage on this build has skyrocketed from 300k to 1.35MB I've attempted to CC everyone who checked in something that might relate to this bug. I apologize if I was overzealous here, I'm trying to do the best I can while not knowing an awful lot about Tinderbox / build mechanisms. Leak graph: http://build-graphs.mozilla.org/graph/query.cgi?tbox=balsa&testname=trace_malloc_leaks&autoscale=1&size=&days=7&units=bytes<ype=&points=&avg=1&showpoint= Bonsai timeframe: http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2005-05-19+17%3A52%3A00&maxdate=2005-05-19+20%3A26%3A00&cvsroot=%2Fcvsroot Note: this probably doesn't belong in Firefox > General, but I have no idea where else it ought to go for now. In a somewhat similar timeframe, the Seamonkey build also got 200k extra leakage - I don't know if this is related.
Comment 1•19 years ago
|
||
Regression from bug 281988 i think -- see bug 281988 comment 144
Keywords: regression
Reporter | ||
Comment 2•19 years ago
|
||
Marking depends per comment 1 and discussion on IRC
Depends on: 281988
Assignee | ||
Updated•19 years ago
|
Assignee | ||
Comment 3•19 years ago
|
||
Minimal testcase is in the URL bar.... For some reason, we end up with: ###!!! ASSERTION: nsGenericElement's event listener manager hash not empty at shutdown!: 'sEventListenerManagersHash.entryCount == 0', file ../../../../mozilla/content/base/src/nsGenericElement.cpp, line 804 and leak the world when we load that testcase. I've confirmed that disabling the XPCNativeWrapper stuff (by just hacking NativeInterface2JSObject to never do it) avoids the leak. :( We really don't want to ship 1.8b2 with this. Debugging now...
Assignee: nobody → dbradley
Component: General → XPConnect
Flags: blocking1.8b2?
Keywords: mlk
Product: Firefox → Core
QA Contact: general → pschwartau
Assignee | ||
Updated•19 years ago
|
Assignee: dbradley → bzbarsky
Priority: -- → P1
Target Milestone: --- → mozilla1.8beta2
Updated•19 years ago
|
Flags: blocking1.8b2? → blocking1.8b2+
Updated•19 years ago
|
Flags: blocking1.8b2+ → blocking1.8b2?
Updated•19 years ago
|
Flags: blocking1.8b2? → blocking1.8b2+
Assignee | ||
Updated•19 years ago
|
Comment 4•19 years ago
|
||
I don't fully understand this yet, but this is IMO what we want regardless of whether this really does fix the leak (which it appears to do with the trivial testcase in this bug). What the current code does is to mark mNativeWrapper when a wrapped native is marked, and that makes the nw stay around as long as the wn stays around. I don't see a reason for that, if all script references to a nw are gone then we might as well let the nw go, even if the wn is still referenced (in a different scope, most likely). I'm going to check this in now, there's noone around to r/sr, so bz, please r+sr after the fact, and back this out if there's something really wrong with this. I want testing for this in tomorrows build, and I want to see how this affects the leak numbers ASAP. With some luck this'll make the leak go away and we can ship with this even if we don't yet fully understand all aspects of this leak etc.
Updated•19 years ago
|
Attachment #184387 -
Flags: superreview?(bzbarsky)
Attachment #184387 -
Flags: review?(bzbarsky)
Comment 5•19 years ago
|
||
Probable leak fix checked in, a=asa (over IRC).
Comment 6•19 years ago
|
||
The first cycle with the leak fix (double-checked with build log CVS output that the fix is in) still leaks like before.
Comment 7•19 years ago
|
||
We need to understand the leak to fix it. Also, I think we want that mark of the nw from its wn, or auto-magically created nw's will be ephemeral. That's bloaty and (if you can stick a property on an nw, as you can for certain ids) arguably user-hostile. Did anyone get a GC-heap dump that matched the one bz saw, where the BackstagePass object was implicated, and certain prototype chains had one too many Object protos near the end? /be
Assignee | ||
Comment 8•19 years ago
|
||
jst, I think comment 7 is right -- we want to keep the XPCNativeWrapper alive just like we keep XPCWrappedNatives alive if they have random user-defined props on them.... So this is what we found yesterday, just so we all have the same info. Note that I was testing with the minimal testcase in this bug, so we may also have other issues (given that jst's change fixes this testcase but not tinderbox). 1) The leak on this testcase went away if every time XPCNativeWrapper::GetNewOrUsed was called I had it create a new wrapper _and_ set the new wrapper on the XPCWrappedNative. If I just had it create a new wrapper every time but left the wrapped native alone if it already had an mNativeWrapper, we still leaked. This is probably the issue that jst's patch tries to address... 2) I beat things about the head and shoulders till we stopped leaking the JS runtime service (through XBL and XUL), so I got a printout of the remaining roots at runtime shutdown. I also hacked XBL to use slightly more informative root names. This is what I get at shutdown: JS engine warning: leaking GC root ' -- chrome://global/content/bindings/scrollbar.xml#scrollbar -- nsXBLProtoImplMethod::mJSMethodObject' at 0x85a7d64 JS engine warning: leaking GC root 'res->input' at 0x876d7a4 JS engine warning: leaking GC root 'initScrollbar -- chrome://global/content/bindings/scrollbar.xml#scrollbar -- nsXBLProtoImplMethod::mJSMethodObject' at 0x85a80a4 JS engine warning: leaking GC root 'res->input' at 0x8598624 JS engine warning: 4 GC roots remain after destroying the JSRuntime. These roots may point to freed memory. Objects reachable through them have not been finalized. So the leaking roots are two regexp-related things and the initScrollbar method and constructor of the scrollbar binding.... I'm not really sure where to go with that. The address 0x85a80a4 does not appear in the gc heap dump that I (or emacs i-search, and I did just search for 85a80a4) can see. What _does_ appear in the dump, is this: 08473ef0 object 0x87712d8 Window via global object(Window). I'm not really sure what that means, exactly. Is the global object an implicit GC root or something? Still trying to sort out how I can use this gc heap dump... jst and I were also a little concerned about the calls to JS_NewObject for XPCNativeWrappers whose parent is null (i.e. the wrapper for the window's wrapped native). We tried setting the parent to the window's JSObject (the one off the wrapped native), but that didn't seem to help.
Assignee | ||
Comment 9•19 years ago
|
||
I posted the full output of a leaking build at http://web.mit.edu/bzbarsky/www/gc_log.txt (and a gzipped version at http://web.mit.edu/bzbarsky/www/gc_log.txt.gz if people want to download it via "save link as" -- the file is about 4MB). In addition to the GC heap dump, this file contains two more types of debug output: Lines like: Created wrapper: '[object Window @ 0x87712d8]', object is: 0x8473ef0 mean we created an XPCWrappedNative for a Window object, the address of the XPCWrappedNative is 0x87712d8 and the address of its flat JSObject is 0x8473ef0. Lines like: Created extra-wrapper: 0x8474f20, for: [object Window @ 0x87712d8] mean we created an XPCNativeWrapper for the XPCWrappedNative at address 0x87712d8. The address of the XCPNativeWrapper JSObject is 0x8474f20. The gc dump is also somewhat hacked; I changed the path-dumping in gc_dump_thing as follows: while (next) { path = JS_sprintf_append(path, "%s(%s).", next->name, gc_object_class_name(next->thing)); + if ((*js_GetGCThingFlags(next->thing) & GCF_TYPEMASK) == GCX_OBJECT) { + path = JS_sprintf_append(path, "[was %p]", + (JSObject*)next->thing); + } next = next->next; } in the hopes of getting addresses for things in the path. Note that this puts the address after the '.' because I was lazy... So a line like: 08473ef0 object 0x87712d8 Window via global object(Window).[was 8473ef0] would mean that the JSObject at 0x08473ef0 (with private data 0x87712d8, so this is exactly our Window object above) had on the path the JSObject at 0x08473ef0, if I understand what this code is doing right... I didn't include the actual log XPCOM_MEM_LEAK_LOG creates, since all it tells me is that we leak a window and a document and everything that entails. Note that this is all without jst's patch. I'll try applying that now and seeing whether I can create a minimal leak testcase with it applied.
Comment 10•19 years ago
|
||
(In reply to comment #7) > We need to understand the leak to fix it. Also, I think we want that mark of > the nw from its wn, or auto-magically created nw's will be ephemeral. That's > bloaty and (if you can stick a property on an nw, as you can for certain ids) > arguably user-hostile. What id's would that be? I can't imagine we would ever want anyone to be able to store anything on these wrappers. What am I missing?
Comment 11•19 years ago
|
||
Attachment #184427 -
Flags: superreview?(bzbarsky)
Attachment #184427 -
Flags: review?(bzbarsky)
Comment 12•19 years ago
|
||
Attachment #184428 -
Flags: superreview?(bzbarsky)
Attachment #184428 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 13•19 years ago
|
||
Comment on attachment 184428 [details] [diff] [review] Same as above w/ more newborn object protection. r+sr=bzbarsky; requesting approval for yet more attempted leak-fixing.
Attachment #184428 -
Flags: superreview?(bzbarsky)
Attachment #184428 -
Flags: superreview+
Attachment #184428 -
Flags: review?(bzbarsky)
Attachment #184428 -
Flags: review+
Attachment #184428 -
Flags: approval1.8b2?
Comment 14•19 years ago
|
||
Comment on attachment 184428 [details] [diff] [review] Same as above w/ more newborn object protection. per deerpark meeting, any reviewed patch for this bug is auto-approved
Attachment #184428 -
Flags: approval1.8b2? → approval1.8b2+
Comment 15•19 years ago
|
||
Additional change landed now too.
Comment 16•19 years ago
|
||
Bz's the man, he figured this out (assuming that this *really* does fix the leak, and we think we know why it would now). What's happening here is that we end up with a wrapped native for a content window and a native wrapper for that WN. Chrome code accesses content.document, which ends up wrapping the content document and sticking it in the property cache on the NW. If the window is then closed, we end up clearing the scope, but leaving the NW around with a reference to the document. At least before my earlier changes, the cycle was something along these lines: window > window::mNativeWrapper > nsHTMLDocument > child nodes > nsEventListenerManager > nsJSEventListener > nsJSContext > window With my earlier changes the cycle must be different, but I haven't been able to reproduce a leak w/ my earlier changes so I can't say for sure what the cycle really is. This patch is going in, r+sr=bzbarsky (over IRC).
Updated•19 years ago
|
Attachment #184466 -
Flags: superreview+
Attachment #184466 -
Flags: review+
Comment 17•19 years ago
|
||
Great work, sorry I wasn't around to help. jst, I'm still wondering (comment 7): don't we want to mark mNativeWrapper from XPCWrappedNative::MarkBeforeJSFinalize? And therefore not need a finalize hook in XPCNativeWrapper? At least toString may be overridden on the nw, and it seems like a waste to let the GC destroy all the "faulted-in" ids on an busy nw for a live wn. /be
Comment 18•19 years ago
|
||
balsa is green again, still leaks like before.
Assignee | ||
Comment 19•19 years ago
|
||
So in the range in which leaks went up, the following patches went in: Bug 294808 Bug 294831 Bug 294630 Bug 294815 Bug 281988 -- first patch only (so wrappers NOT enabled at that point) Bug 258650 I'm pretty sure the first three things on that list don't leak. Bug 294815 could, in theory, have issues... Bug 281988 could have issues if wrappers are being created manually or if somehow code was getting wrappers because it had a null script filename... or if there is a bug in the rest of the code somewhere, but I don't see one. Bug 258650 is probably safe. There _are_ some manual XPCNativeWrapper creations in the code -- see <http://lxr.mozilla.org/seamonkey/search?string=XPCNativeWrapper%28>. The only ones I see in Firefox code are pageinfo and the DOMWillOpenModalDialog handler in tabbrowser. I'll try removing the latter and seeing what happens, I guess. It really would be nice if I could reproduce the leak at this point...
Assignee | ||
Comment 20•19 years ago
|
||
I think I've finally reproduced this. Balsa is reregistering components on every single startup (why? I have no idea). If I remove my compreg.dat and then start, I do see a leak. It's not leaking through the wrappers added in bug 281988, though...
Comment 21•19 years ago
|
||
Attachment #184556 -
Flags: superreview?(bzbarsky)
Attachment #184556 -
Flags: review?(bzbarsky)
Updated•19 years ago
|
Attachment #184427 -
Flags: superreview?(bzbarsky)
Attachment #184427 -
Flags: review?(bzbarsky)
Updated•19 years ago
|
Attachment #184387 -
Flags: superreview?(bzbarsky)
Attachment #184387 -
Flags: review?(bzbarsky)
Comment 22•19 years ago
|
||
Regarding #294893 [Core]-Major leakage on Linux balsa Dep GTK1 (gcc 3.4) [Lin] Attempt 4.... GTK2+ is out, why haven't they updated? I have noticed this recently myself, 512MB of ram and fresh reboot say, I am using near 32% of it, then loads FF and it goes up to near 80% use. Does not even release it either. Has to do a Cold reboot to clear it out. Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b2) Gecko/20050524 Firefox/1.0+ ID:2005052421 One last note, when I control T (open new Tab) then try to control v (paste) tonew tab, it does not default to the URL like it did in the previous build I had. Ok Graphical Representation here [url=http://images5.theimagehosting.com/Memory1.png]With FF Loaded[/url] [url=http://images5.theimagehosting.com/Memory2.png]Without FF loaded and a Reboot[/url] would have shown here, but for some reason they are the size of my Display 1280x1024 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b2) Gecko/20050525 Firefox/1.0+ ID:2005052509 Since I did this, the Prometheus Build is what had the problem.
Assignee | ||
Comment 23•19 years ago
|
||
Comment on attachment 184556 [details] [diff] [review] Also clear the scope of every explicitly created nw, and put back the marking of the "primary" nw when wn's are marked. > Index: js/src/xpconnect/src/XPCNativeWrapper.cpp + JS_ClearScope(d->cx, (JSObject*)entry->key); ::JS_ClearScope, please. >Index: js/src/xpconnect/src/xpcjsruntime.cpp >+ mExplicitNativeWrapperMap(XPCNativeWrapperMap::newMap(XPC_NATIVE_WRAPPER_MAP_SIZE)), You need to delete this map in ~XPCJSRuntime. You also need to null-check this map in newXPCJSRuntime(). r+sr=bzbarsky with those fixed.
Attachment #184556 -
Flags: superreview?(bzbarsky)
Attachment #184556 -
Flags: superreview+
Attachment #184556 -
Flags: review?(bzbarsky)
Attachment #184556 -
Flags: review+
Assignee | ||
Comment 24•19 years ago
|
||
So I've been debugging the leak on startup and registration. We seem to be leaking two chrome windows -- the hidden window and the main browser window. I can sorta tell them apart, since the hidden window is a ChromeWindow with an HTMLDocument hanging off it: 086bc088 object 0x86e3b60 HTMLDocument via XPCWrappedNativeProto::mJSProtoObject(XPC_WN_ModsAllowed_Proto_JSClass @ 0x081c65f0).outerWidth getter(Function @ 0x08653580).__proto__(Function @ 0x08653578).__proto__(Function @ 0x08502f20).__proto__(Object @ 0x08502f50).__proto__(Object @ 0x08502e88).__parent__(ChromeWindow @ 0x08502e48).document(HTMLDocument @ 0x086bc088). 08502e48 object 0x851cbf0 ChromeWindow via XPCWrappedNativeProto::mJSProtoObject(XPC_WN_ModsAllowed_Proto_JSClass @ 0x081c65f0).outerWidth getter(Function @ 0x08653580).__proto__(Function @ 0x08653578).__proto__(Function @ 0x08502f20).__proto__(Object @ 0x08502f50).__proto__(Object @ 0x08502e88).__parent__(ChromeWindow @ 0x08502e48). The other window we're leaking is: 081c65f8 object 0x85ad1c0 ChromeWindow via atom(AnyName @ 0x0812c438).constructor(Function @ 0x0821e930).__proto__(Function @ 0x081c66d8).__proto__(Object @ 0x081c6c10).__proto__(Object @ 0x081c6640).__parent__(ChromeWindow @ 0x081c65f8). This is the root cause of the leaks; leaking this window means we leak various stuff that entrains the chrome window as the compilation scope (you can see that above). This is the only path via which this window is marked. The leak log says "(1 references; 0 from COMPtrs)", so the only ref to it is held from the XPCWrappedNative we're leaking here. Brendan, what's this atom(AnyName) thing that claims to be a GCX_OBJECT in its typemask (otherwise it wouldn't get an "@ pointer" thing to go with it)? More importantly, why's it acting like a root? Fwiw, I checked with auto-wrapping completely disabled (and there's no manual wrapping in Firefox outside of page info, which I did not open), and the startup leak was still there....
Assignee | ||
Comment 25•19 years ago
|
||
We (brendan, jst, I) have no idea why this wasn't a problem before, but this AnyName atom is what's entraining the non-hidden chrome window... Just landed this patch, with r+a=brendan, sr=jst.
Attachment #184577 -
Flags: superreview+
Attachment #184577 -
Flags: review+
Assignee | ||
Comment 26•19 years ago
|
||
And it looks like that patch causes crashes during standard class init -- we manage to somehow lose our just-constructed proto for the AnyName class to GC.
Assignee | ||
Comment 27•19 years ago
|
||
So the reason for the crashes is that js_GetAnyName returns rt->atomState.lazy.anynameAtom if that's set. If we don't pin the atom, that can get GC'd, and then we're returning bogus stuff. Since that atom is also the proto for the class, as far as I can tell, bad things happen when we try to js_SetClassPrototype in JS_InitClass. We have to be initing the class not for the first time, but that's ok -- we have plenty of global objects around. Brendan, this is sounding like an all-you kinda thing...
Assignee | ||
Comment 28•19 years ago
|
||
This just nulls out the atomstate entries when the atom is finalized. With this change, I don't leak and don't crash, even with TOO_MUCH_GC.
Assignee | ||
Comment 29•19 years ago
|
||
Comment on attachment 184597 [details] [diff] [review] Better, stronger, faster brendan, whatcha think?
Attachment #184597 -
Flags: review?(brendan)
Comment 30•19 years ago
|
||
First balsa build after bz' checking. RLk:816B Lk:307KB MH:9.48MB A:259K
Comment 31•19 years ago
|
||
The following is for comparative purposes (presumably to be used in working out whether this should be "Status -> RESOLVED, Resolution -> FIXED" or not.) The previous values before this bug was filed according to http://forums.mozillazine.org./viewtopic.php?t=271385&postdays=0&postorder=asc&postsperpage=15&start=15 (and backed-up by bug 281988 comment 144) were: RLk:780B Lk:308KB MH:8.00MB A:247K
Assignee | ||
Comment 32•19 years ago
|
||
Marking fixed, again pending Brendan's review. Remaining issues, if any, should get separate bugs.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Comment 33•19 years ago
|
||
Looks like both beast and pacifica tinderboxes crash on TP after this commit
Comment 34•19 years ago
|
||
Updated•19 years ago
|
Attachment #184615 -
Flags: superreview?(bzbarsky)
Attachment #184615 -
Flags: review?(bzbarsky)
Assignee | ||
Updated•19 years ago
|
Attachment #184615 -
Flags: superreview?(bzbarsky)
Attachment #184615 -
Flags: superreview+
Attachment #184615 -
Flags: review?(bzbarsky)
Attachment #184615 -
Flags: review+
Comment 35•19 years ago
|
||
finalize hook change landed.
Comment 36•19 years ago
|
||
Bug 295634 for random crashes on Win after last part landed
Comment 37•19 years ago
|
||
Comment on attachment 184597 [details] [diff] [review] Better, stronger, faster Great, thanks for covering. /be
Attachment #184597 -
Flags: review?(brendan)
Attachment #184597 -
Flags: review+
Attachment #184597 -
Flags: approval1.8b2+
Comment 38•19 years ago
|
||
His code should compile to the same machine code as mine, for release builds. I changed some names, registerized rt, and added an assertion. I'll check this in for 1.8b3. /be
Attachment #184728 -
Flags: review+
Attachment #184728 -
Flags: approval1.8b3+
Comment 39•18 years ago
|
||
*** Bug 295634 has been marked as a duplicate of this bug. ***
You need to log in
before you can comment on or make changes to this bug.
Description
•