Closed Bug 391318 Opened 17 years ago Closed 16 years ago

Leak EM-related stuff after bug 382963

Categories

(SeaMonkey :: UI Design, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED
seamonkey2.0a3

People

(Reporter: ajschult784, Assigned: kairo)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak, regression, Whiteboard: [Workaround: comment 8])

Attachments

(3 files, 2 obsolete files)

From bug 382963,
comment 14:
with this landed, [I] see 5616 bytes leaked on bloatcycle.html

comment 16:
We now load EM on startup, so if it leaks, then so do we. File a bug on EM?
--------------------------------

Further investigation shows that

Before bug 382963, loading the EM (Tools->Add-on Manager) did not cause a leak.
Firefox leaks nothing whether I opened the Add-on Manager or not.
If I load the Add-on Manager in SeaMonkey now, I leak ~240K.

It seems that the change from 382963 is keeping something alive.
> It seems that the change from 382963 is keeping something alive.

From looking at the leak stats, could this be the RDF Service/Datasources?

> Firefox leaks nothing whether I opened the Add-on Manager or not.

Did you try leaving closing the Add-on manager until last? That way the RDF datasource will hopefully be freed/closed in a similar way as when we shut the browser down.
> Did you try leaving closing the Add-on manager until last? That way the RDF
> datasource will hopefully be freed/closed in a similar way as when we shut the
> browser down.

That also does not leak in firefox.
If you start with a non-browser window, open EM, open browser, does that leak?
After the followup fix for bug 383116, we seem to leak even more, at least from what the nye test results tell us.
> If you start with a non-browser window, open EM, open browser, does that leak?

No.  Opening addressbook, then EM, then browser, I don't get a leak.
Please can you try to see which combinations (if any) of these statements leak:
(using ./seamonkey -silent -jsconsole)

top.em=Components.classes['@mozilla.org/extensions/manager;1'].getService(Components.interfaces.nsIExtensionManager);

top.ds=Components.classes['@mozilla.org/rdf/rdf-service;1'].getService(Components.interfaces.nsIRDFService).GetDataSource('rdf:extensions');

If they leak, do they leak in Firefox and/or Thunderbird too? Thanks.
> Please can you try to see which combinations (if any) of these
> statements leak:

no leak in seamonkey or firefox
Curiouser and curiouser.

If you open the browser, then open the JS console, then evaluate this, does that still leak?

this.ds=top.opener.document.getElementsByAttribute('datasources','rdf:extensions')[0].database;while(this.ds.GetDataSources().hasMoreElements())this.ds.RemoveDataSource(this.ds.GetDataSources().getNext())

If you change datasources="rdf:extensions" to datasources="rdf:local-store" in navigatorOverlay.xul does that still leak? (Unlikely)
I saw a random comment on another bug that mentioned datasources, so I don't know if its anything to do with this problem or not but just in case:

https://bugzilla.mozilla.org/show_bug.cgi?id=392959#c1
> If you open the browser, then open the JS console, then evaluate
> this, does that still leak?

No

> If you change datasources="rdf:extensions" to datasources="rdf:local-
> store" in navigatorOverlay.xul does that still leak? (Unlikely)

No
(In reply to comment #10)
>>If you open the browser, then open the JS console, then evaluate
>>this, does that still leak?
>No
Interesting that manually removing the EM datasource from the database fixes the leak; I thought that the composite datasource specifically participates in cycle collection to avoid leaks!
(In reply to comment #11)
> Interesting that manually removing the EM datasource from the database fixes
> the leak; I thought that the composite datasource specifically participates in
> cycle collection to avoid leaks!

Just because an object participates in cycle collection doesn't mean that any cycle involving it will be collected. A cycle will only be collected if all objects that (directly or indirectly) hold a strong reference to an object in the cycle participate in cycle collection themselves. You can always manually break a cycle that the cycle collector can't collect, but you want to look for references that the cycle collector doesn't know about to really fix this.
(In reply to comment #12)
>Just because an object participates in cycle collection doesn't mean that any
>cycle involving it will be collected. A cycle will only be collected if all
>objects that (directly or indirectly) hold a strong reference to an object in
>the cycle participate in cycle collection themselves. You can always manually
>break a cycle that the cycle collector can't collect, but you want to look for
>references that the cycle collector doesn't know about to really fix this.
Then why should breaking a reference between two cycle-collected objects work?
(In reply to comment #13)
> Then why should breaking a reference between two cycle-collected objects work?

Not really sure what you mean, breaking a cycle usually makes all the objects in the cycle go away, whether they're participating in cycle collection or not.
(In reply to comment #14)
>(In reply to comment #13)
>>Then why should breaking a reference between two cycle-collected objects work?
>Not really sure what you mean, breaking a cycle usually makes all the objects
>in the cycle go away, whether they're participating in cycle collection or not.
Well, we have an RDF template that uses the rdf:extensions data source. This apparently leaks 5616 bytes on on shutdown, however if we remove the data source from the template database and quit then we don't leak those bytes.
(In reply to comment #15)
> Well, we have an RDF template that uses the rdf:extensions data source. This
> apparently leaks 5616 bytes on on shutdown, however if we remove the data
> source from the template database and quit then we don't leak those bytes.

So something else is holding on to an object in that cycle. Could be a second cycle through some objects unknown to the GC, or just an edge into the cycle that is not part of a cycle itself.
OK, so I spent some fruitless time trying to track down this leak.
When starting up with the browser window and shutting down, the data source has a refcount of 2 (including 1 in case of weak references) before shutdown.
When starting up with the browser window, disconnecting the composite DS, and shutting down, the data source is deleted on shutdown.
When starting up with the JS console, opening EM, opening browser, the data source has a refcount of 4 before shutdown!
So, during a straight browser startup/shutdown, there are three references to the extensions datasource:
* one from the JS component loader
* one from the service manager
* one from the composite data source
The first two references get removed during shutdown so there are no uncollectable references to the extensions datasource. I haven't looked for uncollectable references to the composite data source as yet.
OK, so on shudown the RDF composite data source has two outstanding references. I'm assuming one here is the EM datasource; the other appears to be the local store, which unfortunately isn't cycle collected yet...
OK, so it's not the local store; that doesn't support RDF observers.
Having been unable to find any non-JS references to the EM DS, I tried looking at the composite DS instead.

The composite DS is created as part of creating the template builder. When the composite DS is created, it obviously only has the one reference (nsXULTemplateQueryProcessorRDF::GetDatasource::compDB). We then call compDB->AddDataSource(ds); (where ds is the EM DS). compDB calls aDataSource->AddObserver(this); which passes the compDB info xpconnect (since the EM DS is a JS object). It gets QI'd into XPCWrappedNative::GetNewOrUsed::identity (1+1=2). A new XPCWrappedNative is created (2+1=3). It gets QI'd by XPCWrappedNative::InitTearOff (3+1=4). XPCWrappedNative::GetNewOrUsed::identity gets destroyed (4-1=3). The EM DS then calls this._inner.AddObserver(observer), which makes XPCConvert::JSObject2NativeInterface QI it (3+1=4). _inner is an XML DS, which passes the compDS to its mInner, which is a memory DS, which adds compDS to its mObservers nsCOMArray (4+1=5). XPCWrappedNative::CallMethod then releases a ref to balance XPCConvert::JSObject2NativeInterface (5-1=4). nsXULTemplateQueryProcessorRDF::GetDatasource then copies compDS to its db local, since there's no infer data source (4+1=5), then QI's it to its out parameter (5+1=6), then releases its locals (-2=4). nsXULTemplateBuilder::LoadDataSourceUrls (which owns that out parameter as its mDataSource nsCOMPtr) then copies that to its mCompDB and mDB nsCOMPtrs (4+2=6). The nsXULDocument then checks the hookup a couple of times, but that doesn't effectively alter the reference count.

After the document load, frames are constructed and script is evaluated (the fields evaluate at an unsafe time bug). This triggers a GC, which sweeps the XPCWrappedNativeTearOff created earlier (-1=5).

When the window is closed, the cycle collector collects the nsXULTemplateBuilder object thus releasing its three references (-3=2).

Summary of references:
1. nsXULTemplateQueryProcessorRDF::GetDatasource::compDB
   Local nsCOMPtr, added by do_CreateInstance, relased on return
2. XPCWrappedNative::GetNewOrUsed::identity, local nsCOMPtr, added by
   do_QueryInterface, released on return
3. XPCWrappedNative::mIdentity, member raw ptr, added by
   XPCWrappedNative::XPCWrappedNative
4. XPCWrappedNativeTearOff::mNative, member raw ptr, previously added by
   QueryInterface in XPCWrappedNative::InitTearOff, released by GC
5. dummy reference, added by XPCConvert::JSObject2NativeInterface, released by
   XPCWrappedNative::CallMethod
6. InMemoryDataSource::mObservers, member nsCOMArray, added by AddObserver
7. nsXULTemplateQueryProcessorRDF::GetDatasource::db, local nsCOMPtr, added by
   assignment, relased on return
8. nsXULTemplateBuilder::mDataSource, member nsCOMPtr, added by outparam,
   released by Unlink
9. nsXULTemplateBuilder::mCompDB, member nsCOMPtr, added by do_QueryInterface,
   released by Unlink
10.nsXULTemplateBuilder::mDB, member nsCOMPtr, added by do_QueryInterface,
   released by Unlink

Note that references 3 and 6 are still outstanding on shutdown.
Cycle for reference 3:
The composite DS's mObservers[1] refers to a (wrapped) JS object that contains an array that contains a wrapped native for the composite DS.
Cycle for reference 6:
The composite DS's mObservers[2] refers to a (wrapped) JS object that refers to an RDFXMLDataSource that refers to an InMemoryDataSource that contains an nsCOMArray that contains a reference to the composite DS.
Do you know why we don't collect those cycles? I think all those objects participate and the edges you mention are known to the cycle collector, so it must be something else holding onto one of the objects in one of those 2 cycles. Debugging this will be painful because by definition we're looking for an edge, into one of a number of objects, that the cycle collector doesn't know about. I think you could try calling nsCycleCollector_DEBUG_shouldBeFreed on the composite DS after the cycle collector has unlinked the nsXULTemplateBuilder and look at the debugging output.
(In reply to comment #22)
>Do you know why we don't collect those cycles?
ajschult, are you quite sure the EM itself doesn't leak? It's the only thing holding on to the composite DS.
> ajschult, are you quite sure the EM itself doesn't leak? It's the only thing
> holding on to the composite DS.

Do you mean, "Does just opening the EM (without opening a navigator window) cause leaks" or "Is the EM one of the things that gets leaked when just opening a navigator window"?

I'm certain that the answer to the first is "no".  I don't know how to diagnose the second.  I don't know what it would show up as in the leak log.
OK, I still don't know whether the EM itself is leaked, but I verified that ExtensionManager._shutdown() is called and I added |this._ds = null;| so that it would explicitly drop its reference to the data source (this is what you're referring to as the "composite DS", right?).  This did not fix the leak.
(In reply to comment #25)
>OK, I still don't know whether the EM itself is leaked, but I verified that
>ExtensionManager._shutdown() is called and I added |this._ds = null;| so that
>it would explicitly drop its reference to the data source (this is what you're
>referring to as the "composite DS", right?).  This did not fix the leak.
No, that's what I'm referring to as the EM DS.
So I'm seeing several leaks of in-memory and xml data sources caused by the reference added in RDFServiceImpl::GetDataSourceBlocking never being released.  So it's looking to me like some of the JS stuff here leaks and entrains things...
In particular, doing:

    this._ds._inner = null;
    this._ds = null;

in _shutdown cleans up some but not all of the leaks.  So the ExtensionManager JSObject must be leaking somehow.

This part of the JS heap dump is rather suspicious:

0xafdec800 global_for_XPCJSContextStack_SafeJSContext 8a14fc8 via global object
I added CTOR/DTOR logging to XPCJSContextStack and we don't seem to leak those.  So I'm not sure hwo we leaked a global_for_XPCJSContextStack_SafeJSContext here...
Is it possible that the last CC run happens before ~XPCJSContextStack?
And in fact that is EXACTLY what happens.  I added printfs to both the mOwnsJSContext case in ~XPCJSContextStack and 
nsCycleCollector::Shutdown and one of the former happens after the latter.

And it looks like we kill that stack from ~nsXPConnect, which doesn't happen until CapsModuleDtor calls nsScriptSecurityManager::Shutdown.  I wonder whether we leak things through this safe context somehow...
Changing caps doesn't help, because then ~nsXPConnect happens under nsXPConnect::ReleaseXPConnectSingleton called from xpcModuleDtor.

That's where the GC dump happens too (before the release).  But I don't see anything in the JS heap dump that would obviously point to thigns we care about being leaked through the global_for_XPCJSContextStack_SafeJSContext...
We are leaking an XPCSafeJSObjectWrapper function wrapper through that.  Not sure whether that matters....  For certain, but the time we do the shutdown GC heap dump the only JS GC roots left are the nsXPCWrappedJS[nsIRDFDataSource,0x86fbbf0].mJSObj and the global_for_XPCJSContextStack_SafeJSContext.
Do we have any progress on that? It would be nice to have tinderbox leaks numbers return to something reasonable...
This should probably get moved to a core component, or at least depend on some core bugs.  Until we fix the issue in comment 31, not much else we can do here...
I know too little about the details here, but I guess it's best to file comment 31 as a new core bug against CC or so and make this depend on that one. Could you do that?
Depends on: 406914
OS: Linux → All
Whiteboard: [Workaround: comment 8]
Attached file Leak Log (obsolete) —
[Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1a2pre) Gecko/20080823140037 SeaMonkey/2.0a1pre] (home, optim default) (W2Ksp4)

Ftr, leaks 8557 bytes from 383 objects.
[Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1a2pre) Gecko/20080823180458 SeaMonkey/2.0a1pre] (home, debug default) (W2Ksp4)

Ftr, leaks 9369 bytes from 386 objects.

Fwiw,
this is a little worse than the Windows optim build:
+ 1 nsBaseAppShell, + 1 nsRunnable, + 1 nsThread;
(and 'Per-Inst' size is increased a little on some of the objects).

NB: I removed the additional debug lines from the report.
No longer blocks: SmTestFail
Assignee: jag → nobody
QA Contact: ui-design
Maybe trivial cases from bug 457440 could give a hint for this bug ?
I don't think so, as that's just a symptom. We already know we're leaking the full EM RDF, so anything that changes that info stored by EM in size changes that size of the leak, but we need to break the cycle or whatever it is that causes it in the first place.
No longer blocks: 454524
This bug is so disturbing all our leak testing :(

There must be a solution how we can break this cycle when shutting down, i.e. when we don't need to use the EM RDF any more. We can put a listener for app shutdown into nsSuiteGlue and remove the connection to that datasource in some way from there, would that help?
The other possibility is to rework the menu and build it with JS instead of RDF templates, which avoids the problem altogether.

Who would be able to work on either of those solutions, given they work?
I attached a patch and tool in bug 466157 that helps me in finding the objects that have missing edges for cycle collection. Need to apply the patch and build with DEBUG_CC. Don't know if it'll help.
[Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko/20081217 SeaMonkey/2.0a3pre] (home, optim default) (W2Ksp4)
(http://hg.mozilla.org/mozilla-central/rev/4a4a42520901
 +http://hg.mozilla.org/comm-central/rev/877154dd96a2 + bug 469606 patch)

I just noticed that
bug 465556 reduced this leak by 7 kB on all 3 platforms :-)

mochitest: 22.5 -> 15.5 kB
chrome   :  8.5 ->  1.5 kB
browser  :  8.5 ->  1.5 kB
(See bug 450983 where I will add more detailed figures.)
Attachment #335181 - Attachment is obsolete: true
Depends on: 465556
Flags: wanted-seamonkey2?
(In reply to comment #43)
> This bug is so disturbing all our leak testing :(
> 
> There must be a solution how we can break this cycle when shutting down, i.e.
> when we don't need to use the EM RDF any more. We can put a listener for app
> shutdown into nsSuiteGlue and remove the connection to that datasource in some
> way from there, would that help?
> The other possibility is to rework the menu and build it with JS instead of RDF
> templates, which avoids the problem altogether.
> 
> Who would be able to work on either of those solutions, given they work?

Is there a reason you can't use the code from comment 8 in an unload handler for any window using the theme menu?
Attached patch remove datasource on unload (obsolete) — Splinter Review
This patch reduces leaks from a mochitest chrome run by 1460 bytes on my machine (Linux debug, self-built), and no RDF stuff is left over, so I guess it really fixes that one (what I have left locally seems to be related to some plugin things, at least to a part).

This is basically what comment #8 had, but giving the element an explicit ID so that calling it more more straight-forward.
Assignee: nobody → kairo
Status: NEW → ASSIGNED
Attachment #353909 - Flags: superreview?(neil)
Attachment #353909 - Flags: review?(neil)
Comment on attachment 353909 [details] [diff] [review]
remove datasource on unload

>+  while(extDS.GetDataSources().hasMoreElements())
>+    extDS.RemoveDataSource(extDS.GetDataSources().getNext())
My quick-and-dirty code wasn't well written - you should of course only call GetDataSources once and there should be a space after the while.

Another possibility that comes to mind is to set the datasources attribute to "rdf:null" which should also clear out the database.
This patch should fix Neil's comments. The alternative variant seems not to work for me.
Attachment #353909 - Attachment is obsolete: true
Attachment #353917 - Flags: superreview?(neil)
Attachment #353917 - Flags: review?(neil)
Attachment #353909 - Flags: superreview?(neil)
Attachment #353909 - Flags: review?(neil)
Attachment #353917 - Flags: superreview?(neil)
Attachment #353917 - Flags: superreview+
Attachment #353917 - Flags: review?(neil)
Attachment #353917 - Flags: review+
Comment on attachment 353917 [details] [diff] [review]
remove datasource on unload, v1.1

>+    extDB.RemoveDataSource(extDS.getNext())
Nit: missing trailing semicolon
Pushed as http://hg.mozilla.org/comm-central/rev/a36eb2e47cb2 - we should not report this leak any more!
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Flags: wanted-seamonkey2?
Resolution: --- → FIXED
I filed bug 470709 and bug 470710 about the leaks that comment 8 workaround hides but that the patch did not fix.

V.Fixed.
Status: RESOLVED → VERIFIED
Target Milestone: --- → seamonkey2.0a3
(In reply to comment #52)
> I filed bug 470709 and bug 470710 about the leaks that comment 8 workaround
> hides but that the patch did not fix.

Er, actually I don't know if that's true for MacOSX bug 470710...
No longer blocks: 458484
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: