Closed Bug 366578 Opened 18 years ago Closed 18 years ago

XPCOM cycle collector does not respect leak-gauge

Categories

(Core :: XPCOM, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: graydon, Assigned: graydon)

References

Details

Attachments

(1 file)

The XPCOM cycle collector does not purge its buffered pointers during shutdown. In fact, it does not do *anything* special during shutdown. As a result, leak-gauge sees all buffered pointers as possible leaks. Worse than that, it gets confused and crashes. Try to work out why it's crashing and what the appropriate thing to do on shutdown is, to report leaks well.
Do you mean leak-gauge.pl (http://dbaron.org/log/2006-01#e20060110a) or leakstats (used with trace-malloc)?

leak-gauge.pl does not crash that I know of (it doesn't crash for me).  leakstats does crash on balsa (bug 366169), but doesn't crash when I run it interactively.  So it's probably crashing due to something specific it sees in balsa's log.

both leak-gauge.pl and leakstats do report lots of leaks.
I'd say the right thing to do during shutdown is to collect whatever we would have "eventually" collected.  We need to do this at several places probably; in particular we need to do it at about the same time as the last-ditch GC runs.

ccing bsmedberg, since shutdown destruction ordering is something he's been working to specify, so he may have specific thoughts on how this should work.
Flags: blocking1.9?
Blocks: 366440
Draft spec that's about half-implemented can be found at http://wiki.mozilla.org/XPCOM_Shutdown

I suggest that we should perform a cycle collection at the end of xpcom-shutdown-threads and during/after xpcom-shutdown-gc.

Of course xpcom-shutdown-gc doesn't currently exist, so you have to pretend and insert the cycle collection here: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/xpcom/build/nsXPComInit.cpp&rev=1.250&mark=774#770
It would be nice if there were someway to determine if we are doing leak detection or not and make the shutdown cycle collection conditional on that fact.  If we are not actually trying to detect leaks, doing a cycle collection during shutdown is just a huge waste of effort and a big performance hit for the average user.

Just my $0.02.
(In reply to comment #4)
> fact.  If we are not actually trying to detect leaks, doing a cycle collection
> during shutdown is just a huge waste of effort and a big performance hit for
> the average user.

No idea why you think "big" and "huge" are appropriate.  If it were big/huge, we wouldn't be able to do it at other random times either.


(That said, most of the leak regressions reported are actual leak regressions, not shutdown leaks.)
Comment on attachment 251863 [details] [diff] [review]
a patch hooking into XPCOM shutdown as recommended by bsmedberg

>+NS_COM void nsCycleCollector_shutdown();

Technically this doesn't need to be NS_COM, since it's only called from within the xpcom sharedlib. This way is fine, if you think there's a reason other sharedlibs might need access to it.
Attachment #251863 - Flags: review?(benjamin) → review+
Fix landed on the trunk. Marking FIXED, please reopen if there's still work to be done here.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
This had no discernible impact on RLk numbers.  Was that the intent?
No, the intent was to remove the possibility that the RLk numbers were false positives. It would have been *nice* for the RLk numbers to go down -- who would complain? -- but the affect of this patch is more of the "confirming the hypothesis that there are real new leaks" nature.
So given a cycle that remains at shutdown, would it be possible for the cycle collector to output (in some sort of debug or prlog or something mode) which exact object(s) in the cycle were referenced by things outside the cycle?  That is, which objects had a refcount that was too high to allow the cycle to be collected?

That would give us a place to start looking, at least.
Not easily. It has probably already forgotten about the leaked objects. It gives each object exactly one chance to demonstrate that its suspiciousness is caused by a real garbage cycle. If it can't prove a suspicious object is a part of a garbage cycle, it *forgets* about the object, flushing it from its buffers. It will not scan the suspicious object again until it is re-suspected. In the case of dormant garbage, that'll never happen.

It's a classical time/space tradeoff: better to keep collecting what it can, at a reasonable speed per collection, than to keep re-scanning stuff it can't collect and getting progressively slower.

I could potentially modify the collector to differentiate forgotten-due-to-scanning-failure and forgotten-due-to-explicit-deregistration (eg. destruction). That would add some complexity but give you candidates to look at, at shutdown. 

I can probably also (or instead) hook dbaron's leak monitor extension up to an API similar to this shutdown hook, and it already tracks objects that it thinks ought to be collected. That should also give a smaller and more immediate candidate set, while the browser is running. I'd sort of prefer to do that, but I can probably do either or both.

I can't really do either of these this week though. Next week I can.
We should probably file a followup bug on doing something to make things more debuggable.

Note that I'd even be happy with an at-startup (or maybe even compile-time) switch to allow us to get the info we need, as long as there is _some_ way to do it given a leaking semi-reduced testcase.
(In reply to comment #11)
> So given a cycle that remains at shutdown, would it be possible for the cycle
> collector to output (in some sort of debug or prlog or something mode) which
> exact object(s) in the cycle were referenced by things outside the cycle?  That
> is, which objects had a refcount that was too high to allow the cycle to be
> collected?

Have you looked at the leaksoup trace-malloc reader?  (Or is trace-malloc still broken with GTK2?)

That said, even with nsTraceRefcnt, start with something suspicious, and you shouldn't be too far from the problem.
All tests done with leak-gauge.pl and firefox using a new profile and about:blank as the homepage.
1. Load firefox with logging enabled (about:blank homepage)
2. Close firefox
3. Analyze nspr.log

Pre-Graydon's XPCOM Cycle Collector:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a2pre) Gecko/2007010404 Minefield/3.0a2pre
Leaked 0 out of 6 DOM Windows
Leaked 0 out of 35 documents
Leaked 0 out of 3 docshells

Post-Graydon's XPCOM Cycle Collector & Pre-XPCOM Cycle Collector Does Not Respect Leak-Gauge fix
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a2pre) Gecko/2007012404 Minefield/3.0a2pre
Leaked 2 out of 6 DOM Windows
Leaked 24 out of 36 documents
Leaked 0 out of 3 docshells

Post-Graydon's XPCOM Cycle Collector & Post-XPCOM Cycle Collector Does Not Respect Leak-Gauge fix
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a2pre) Gecko/2007012602 Minefield/3.0a2pre
Leaked 2 out of 6 DOM Windows
Leaked 24 out of 35 documents
Leaked 0 out of 3 docshells

So it seems this patch has fixed the leakage of one document, but not a lot else. Should a seperate bug be filed on the new leakage, or is the patch not doing what is expected?
It's looking very likely that these checkins turned the luna Seamonkey tinderbox orange on the 24th... and it's been orange since.
Flags: blocking1.9?
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a3pre) Gecko/20070308 Minefield/3.0a3pre ID:2007030804 [cairo]

After the landing of bug 368773, with new profile, start firefox, about:blank homepage, close firefox, analyse nspr.log

Leaked 0 out of 6 DOM Windows
Leaked 0 out of 36 documents
Leaked 0 out of 3 docshells
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: