Last Comment Bug 366578 - XPCOM cycle collector does not respect leak-gauge
: XPCOM cycle collector does not respect leak-gauge
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: x86 Linux
: -- normal with 1 vote (vote)
: ---
Assigned To: Graydon Hoare :graydon
:
Mentors:
: 369625 (view as bug list)
Depends on:
Blocks: cycle-collector 366440
  Show dependency treegraph
 
Reported: 2007-01-10 09:28 PST by Graydon Hoare :graydon
Modified: 2007-03-08 06:05 PST (History)
19 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
a patch hooking into XPCOM shutdown as recommended by bsmedberg (2.39 KB, patch)
2007-01-17 20:04 PST, Graydon Hoare :graydon
benjamin: review+
Details | Diff | Review

Description Graydon Hoare :graydon 2007-01-10 09:28:15 PST
The XPCOM cycle collector does not purge its buffered pointers during shutdown. In fact, it does not do *anything* special during shutdown. As a result, leak-gauge sees all buffered pointers as possible leaks. Worse than that, it gets confused and crashes. Try to work out why it's crashing and what the appropriate thing to do on shutdown is, to report leaks well.
Comment 1 Andrew Schultz 2007-01-12 20:26:10 PST
Do you mean leak-gauge.pl (http://dbaron.org/log/2006-01#e20060110a) or leakstats (used with trace-malloc)?

leak-gauge.pl does not crash that I know of (it doesn't crash for me).  leakstats does crash on balsa (bug 366169), but doesn't crash when I run it interactively.  So it's probably crashing due to something specific it sees in balsa's log.

both leak-gauge.pl and leakstats do report lots of leaks.
Comment 2 Boris Zbarsky [:bz] 2007-01-14 22:32:22 PST
I'd say the right thing to do during shutdown is to collect whatever we would have "eventually" collected.  We need to do this at several places probably; in particular we need to do it at about the same time as the last-ditch GC runs.

ccing bsmedberg, since shutdown destruction ordering is something he's been working to specify, so he may have specific thoughts on how this should work.
Comment 3 Benjamin Smedberg [:bsmedberg] 2007-01-16 07:07:44 PST
Draft spec that's about half-implemented can be found at http://wiki.mozilla.org/XPCOM_Shutdown

I suggest that we should perform a cycle collection at the end of xpcom-shutdown-threads and during/after xpcom-shutdown-gc.

Of course xpcom-shutdown-gc doesn't currently exist, so you have to pretend and insert the cycle collection here: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/xpcom/build/nsXPComInit.cpp&rev=1.250&mark=774#770
Comment 4 Bill Gianopoulos [:WG9s] 2007-01-16 18:51:20 PST
It would be nice if there were someway to determine if we are doing leak detection or not and make the shutdown cycle collection conditional on that fact.  If we are not actually trying to detect leaks, doing a cycle collection during shutdown is just a huge waste of effort and a big performance hit for the average user.

Just my $0.02.
Comment 5 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2007-01-16 19:08:55 PST
(In reply to comment #4)
> fact.  If we are not actually trying to detect leaks, doing a cycle collection
> during shutdown is just a huge waste of effort and a big performance hit for
> the average user.

No idea why you think "big" and "huge" are appropriate.  If it were big/huge, we wouldn't be able to do it at other random times either.


(That said, most of the leak regressions reported are actual leak regressions, not shutdown leaks.)
Comment 6 Graydon Hoare :graydon 2007-01-17 20:04:29 PST
Created attachment 251863 [details] [diff] [review]
a patch hooking into XPCOM shutdown as recommended by bsmedberg
Comment 7 Benjamin Smedberg [:bsmedberg] 2007-01-19 08:38:17 PST
Comment on attachment 251863 [details] [diff] [review]
a patch hooking into XPCOM shutdown as recommended by bsmedberg

>+NS_COM void nsCycleCollector_shutdown();

Technically this doesn't need to be NS_COM, since it's only called from within the xpcom sharedlib. This way is fine, if you think there's a reason other sharedlibs might need access to it.
Comment 8 Johnny Stenback (:jst, jst@mozilla.com) 2007-01-24 16:44:10 PST
Fix landed on the trunk. Marking FIXED, please reopen if there's still work to be done here.
Comment 9 Andrew Schultz 2007-01-24 19:29:08 PST
This had no discernible impact on RLk numbers.  Was that the intent?
Comment 10 Graydon Hoare :graydon 2007-01-24 21:23:45 PST
No, the intent was to remove the possibility that the RLk numbers were false positives. It would have been *nice* for the RLk numbers to go down -- who would complain? -- but the affect of this patch is more of the "confirming the hypothesis that there are real new leaks" nature.
Comment 11 Boris Zbarsky [:bz] 2007-01-24 21:35:55 PST
So given a cycle that remains at shutdown, would it be possible for the cycle collector to output (in some sort of debug or prlog or something mode) which exact object(s) in the cycle were referenced by things outside the cycle?  That is, which objects had a refcount that was too high to allow the cycle to be collected?

That would give us a place to start looking, at least.
Comment 12 Graydon Hoare :graydon 2007-01-24 21:45:34 PST
Not easily. It has probably already forgotten about the leaked objects. It gives each object exactly one chance to demonstrate that its suspiciousness is caused by a real garbage cycle. If it can't prove a suspicious object is a part of a garbage cycle, it *forgets* about the object, flushing it from its buffers. It will not scan the suspicious object again until it is re-suspected. In the case of dormant garbage, that'll never happen.

It's a classical time/space tradeoff: better to keep collecting what it can, at a reasonable speed per collection, than to keep re-scanning stuff it can't collect and getting progressively slower.

I could potentially modify the collector to differentiate forgotten-due-to-scanning-failure and forgotten-due-to-explicit-deregistration (eg. destruction). That would add some complexity but give you candidates to look at, at shutdown. 

I can probably also (or instead) hook dbaron's leak monitor extension up to an API similar to this shutdown hook, and it already tracks objects that it thinks ought to be collected. That should also give a smaller and more immediate candidate set, while the browser is running. I'd sort of prefer to do that, but I can probably do either or both.

I can't really do either of these this week though. Next week I can.
Comment 13 Boris Zbarsky [:bz] 2007-01-24 22:00:16 PST
We should probably file a followup bug on doing something to make things more debuggable.

Note that I'd even be happy with an at-startup (or maybe even compile-time) switch to allow us to get the info we need, as long as there is _some_ way to do it given a leaking semi-reduced testcase.
Comment 14 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2007-01-24 22:02:43 PST
(In reply to comment #11)
> So given a cycle that remains at shutdown, would it be possible for the cycle
> collector to output (in some sort of debug or prlog or something mode) which
> exact object(s) in the cycle were referenced by things outside the cycle?  That
> is, which objects had a refcount that was too high to allow the cycle to be
> collected?

Have you looked at the leaksoup trace-malloc reader?  (Or is trace-malloc still broken with GTK2?)

That said, even with nsTraceRefcnt, start with something suspicious, and you shouldn't be too far from the problem.
Comment 15 Steve England [:stevee] 2007-01-26 10:16:33 PST
All tests done with leak-gauge.pl and firefox using a new profile and about:blank as the homepage.
1. Load firefox with logging enabled (about:blank homepage)
2. Close firefox
3. Analyze nspr.log

Pre-Graydon's XPCOM Cycle Collector:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a2pre) Gecko/2007010404 Minefield/3.0a2pre
Leaked 0 out of 6 DOM Windows
Leaked 0 out of 35 documents
Leaked 0 out of 3 docshells

Post-Graydon's XPCOM Cycle Collector & Pre-XPCOM Cycle Collector Does Not Respect Leak-Gauge fix
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a2pre) Gecko/2007012404 Minefield/3.0a2pre
Leaked 2 out of 6 DOM Windows
Leaked 24 out of 36 documents
Leaked 0 out of 3 docshells

Post-Graydon's XPCOM Cycle Collector & Post-XPCOM Cycle Collector Does Not Respect Leak-Gauge fix
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a2pre) Gecko/2007012602 Minefield/3.0a2pre
Leaked 2 out of 6 DOM Windows
Leaked 24 out of 35 documents
Leaked 0 out of 3 docshells

So it seems this patch has fixed the leakage of one document, but not a lot else. Should a seperate bug be filed on the new leakage, or is the patch not doing what is expected?
Comment 16 Boris Zbarsky [:bz] 2007-01-28 09:38:24 PST
It's looking very likely that these checkins turned the luna Seamonkey tinderbox orange on the 24th... and it's been orange since.
Comment 17 Peter Van der Beken [:peterv] 2007-02-07 10:06:22 PST
*** Bug 369625 has been marked as a duplicate of this bug. ***
Comment 18 Steve England [:stevee] 2007-03-08 06:05:30 PST
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a3pre) Gecko/20070308 Minefield/3.0a3pre ID:2007030804 [cairo]

After the landing of bug 368773, with new profile, start firefox, about:blank homepage, close firefox, analyse nspr.log

Leaked 0 out of 6 DOM Windows
Leaked 0 out of 36 documents
Leaked 0 out of 3 docshells

Note You need to log in before you can comment on or make changes to this bug.