366578 - XPCOM cycle collector does not respect leak-gauge

Assignee

Description

•

18 years ago

The XPCOM cycle collector does not purge its buffered pointers during shutdown. In fact, it does not do *anything* special during shutdown. As a result, leak-gauge sees all buffered pointers as possible leaks. Worse than that, it gets confused and crashes. Try to work out why it's crashing and what the appropriate thing to do on shutdown is, to report leaks well.

Andrew Schultz

Comment 1

•

18 years ago

Do you mean leak-gauge.pl (http://dbaron.org/log/2006-01#e20060110a) or leakstats (used with trace-malloc)?

leak-gauge.pl does not crash that I know of (it doesn't crash for me).  leakstats does crash on balsa (bug 366169), but doesn't crash when I run it interactively.  So it's probably crashing due to something specific it sees in balsa's log.

both leak-gauge.pl and leakstats do report lots of leaks.

Boris Zbarsky [:bzbarsky]

Comment 2

•

18 years ago

I'd say the right thing to do during shutdown is to collect whatever we would have "eventually" collected.  We need to do this at several places probably; in particular we need to do it at about the same time as the last-ditch GC runs.

ccing bsmedberg, since shutdown destruction ordering is something he's been working to specify, so he may have specific thoughts on how this should work.

Flags: blocking1.9?

Boris Zbarsky [:bzbarsky]

Updated

•

18 years ago

Blocks: 366440

Benjamin Smedberg

Comment 3

•

18 years ago

Draft spec that's about half-implemented can be found at http://wiki.mozilla.org/XPCOM_Shutdown

I suggest that we should perform a cycle collection at the end of xpcom-shutdown-threads and during/after xpcom-shutdown-gc.

Of course xpcom-shutdown-gc doesn't currently exist, so you have to pretend and insert the cycle collection here: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/xpcom/build/nsXPComInit.cpp&rev=1.250&mark=774#770

Bill Gianopoulos [:WG9s]

Comment 4

•

18 years ago

It would be nice if there were someway to determine if we are doing leak detection or not and make the shutdown cycle collection conditional on that fact.  If we are not actually trying to detect leaks, doing a cycle collection during shutdown is just a huge waste of effort and a big performance hit for the average user.

Just my $0.02.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 5

•

18 years ago

(In reply to comment #4)
> fact.  If we are not actually trying to detect leaks, doing a cycle collection
> during shutdown is just a huge waste of effort and a big performance hit for
> the average user.

No idea why you think "big" and "huge" are appropriate.  If it were big/huge, we wouldn't be able to do it at other random times either.


(That said, most of the leak regressions reported are actual leak regressions, not shutdown leaks.)

Graydon Hoare :graydon

Assignee

Comment 6

•

18 years ago

Attached patch a patch hooking into XPCOM shutdown as recommended by bsmedberg — Details — Splinter Review

Attachment #251863 - Flags: review?(benjamin)

Benjamin Smedberg

Comment 7

•

18 years ago

Comment on attachment 251863 [details] [diff] [review]
a patch hooking into XPCOM shutdown as recommended by bsmedberg

>+NS_COM void nsCycleCollector_shutdown();

Technically this doesn't need to be NS_COM, since it's only called from within the xpcom sharedlib. This way is fine, if you think there's a reason other sharedlibs might need access to it.

Attachment #251863 - Flags: review?(benjamin) → review+

Johnny Stenback (:jst)

Comment 8

•

18 years ago

Fix landed on the trunk. Marking FIXED, please reopen if there's still work to be done here.

Status: ASSIGNED → RESOLVED

Closed: 18 years ago

Resolution: --- → FIXED

Andrew Schultz

Comment 9

•

18 years ago

This had no discernible impact on RLk numbers.  Was that the intent?

Graydon Hoare :graydon

Assignee

Comment 10

•

18 years ago

No, the intent was to remove the possibility that the RLk numbers were false positives. It would have been *nice* for the RLk numbers to go down -- who would complain? -- but the affect of this patch is more of the "confirming the hypothesis that there are real new leaks" nature.

Boris Zbarsky [:bzbarsky]

Comment 11

•

18 years ago

So given a cycle that remains at shutdown, would it be possible for the cycle collector to output (in some sort of debug or prlog or something mode) which exact object(s) in the cycle were referenced by things outside the cycle?  That is, which objects had a refcount that was too high to allow the cycle to be collected?

That would give us a place to start looking, at least.

Graydon Hoare :graydon

Assignee

Comment 12

•

18 years ago

Not easily. It has probably already forgotten about the leaked objects. It gives each object exactly one chance to demonstrate that its suspiciousness is caused by a real garbage cycle. If it can't prove a suspicious object is a part of a garbage cycle, it *forgets* about the object, flushing it from its buffers. It will not scan the suspicious object again until it is re-suspected. In the case of dormant garbage, that'll never happen.

It's a classical time/space tradeoff: better to keep collecting what it can, at a reasonable speed per collection, than to keep re-scanning stuff it can't collect and getting progressively slower.

I could potentially modify the collector to differentiate forgotten-due-to-scanning-failure and forgotten-due-to-explicit-deregistration (eg. destruction). That would add some complexity but give you candidates to look at, at shutdown. 

I can probably also (or instead) hook dbaron's leak monitor extension up to an API similar to this shutdown hook, and it already tracks objects that it thinks ought to be collected. That should also give a smaller and more immediate candidate set, while the browser is running. I'd sort of prefer to do that, but I can probably do either or both.

I can't really do either of these this week though. Next week I can.

Boris Zbarsky [:bzbarsky]

Comment 13

•

18 years ago

We should probably file a followup bug on doing something to make things more debuggable.

Note that I'd even be happy with an at-startup (or maybe even compile-time) switch to allow us to get the info we need, as long as there is _some_ way to do it given a leaking semi-reduced testcase.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 14

•

18 years ago

(In reply to comment #11)
> So given a cycle that remains at shutdown, would it be possible for the cycle
> collector to output (in some sort of debug or prlog or something mode) which
> exact object(s) in the cycle were referenced by things outside the cycle?  That
> is, which objects had a refcount that was too high to allow the cycle to be
> collected?

Have you looked at the leaksoup trace-malloc reader?  (Or is trace-malloc still broken with GTK2?)

That said, even with nsTraceRefcnt, start with something suspicious, and you shouldn't be too far from the problem.

Steve England [:stevee]

Comment 15

•

18 years ago

All tests done with leak-gauge.pl and firefox using a new profile and about:blank as the homepage.
1. Load firefox with logging enabled (about:blank homepage)
2. Close firefox
3. Analyze nspr.log

Pre-Graydon's XPCOM Cycle Collector:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a2pre) Gecko/2007010404 Minefield/3.0a2pre
Leaked 0 out of 6 DOM Windows
Leaked 0 out of 35 documents
Leaked 0 out of 3 docshells

Post-Graydon's XPCOM Cycle Collector & Pre-XPCOM Cycle Collector Does Not Respect Leak-Gauge fix
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a2pre) Gecko/2007012404 Minefield/3.0a2pre
Leaked 2 out of 6 DOM Windows
Leaked 24 out of 36 documents
Leaked 0 out of 3 docshells

Post-Graydon's XPCOM Cycle Collector & Post-XPCOM Cycle Collector Does Not Respect Leak-Gauge fix
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a2pre) Gecko/2007012602 Minefield/3.0a2pre
Leaked 2 out of 6 DOM Windows
Leaked 24 out of 35 documents
Leaked 0 out of 3 docshells

So it seems this patch has fixed the leakage of one document, but not a lot else. Should a seperate bug be filed on the new leakage, or is the patch not doing what is expected?

Boris Zbarsky [:bzbarsky]

Comment 16

•

18 years ago

It's looking very likely that these checkins turned the luna Seamonkey tinderbox orange on the 24th... and it's been orange since.

Martijn Wargers (dead)

Updated

•

17 years ago

Flags: blocking1.9?

Steve England [:stevee]

Comment 18

•

17 years ago

Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a3pre) Gecko/20070308 Minefield/3.0a3pre ID:2007030804 [cairo]

After the landing of bug 368773, with new profile, start firefox, about:blank homepage, close firefox, analyse nspr.log

Leaked 0 out of 6 DOM Windows
Leaked 0 out of 36 documents
Leaked 0 out of 3 docshells