Last Comment Bug 423032 - Use static analysis to find missing cycle collector class, Traverse and Unlink declarations
: Use static analysis to find missing cycle collector class, Traverse and Unlin...
Status: RESOLVED FIXED
[MemShrink:P1]
:
Product: Core
Classification: Components
Component: Rewriting and Analysis (show other bugs)
: Trunk
: All All
: -- normal (vote)
: ---
Assigned To: Andrew McCreight [:mccr8]
:
:
Mentors:
https://github.com/amccreight/ccdehydra
: 570416 645498 (view as bug list)
Depends on: 669810 674658
Blocks: analyses MemShrinkTools
  Show dependency treegraph
 
Reported: 2008-03-14 15:20 PDT by (dormant account)
Modified: 2011-11-10 16:53 PST (History)
30 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-


Attachments
My version of the script (2.44 KB, text/plain)
2008-03-14 17:58 PDT, David Mandelin [:dmandelin]
no flags Details
Patch with script sources (10.51 KB, patch)
2008-03-21 14:51 PDT, David Mandelin [:dmandelin]
no flags Details | Diff | Splinter Review

Description (dormant account) 2008-03-14 15:20:59 PDT
Look for nsISupports implementors that contain nsRefPtrs or nsCOMPtrs but aren't cycle collected.

Cycle collected means having nsCycleCollectingAutoRefCnt mRefCnt member.
Comment 1 (dormant account) 2008-03-14 16:15:10 PDT
Need to ignore interfaces(with members)
Comment 2 (dormant account) 2008-03-14 16:44:38 PDT
http://hg.mozilla.org/users/tglek_mozilla.com/dehydra-gcc/?file/62ce148cfa95/test/cycle-collect.js 

this is my implementation of the spec. Extended spec to look for nsCOMArray members too

Comment 3 David Mandelin [:dmandelin] 2008-03-14 17:58:34 PDT
Created attachment 309565 [details]
My version of the script

Well, here's my take on the spec. Apparently the results aren't identical to Taras', but we are trying to do about the same thing. I might have more extra hashtable classes listed or something.

Results in the form of a list of "bad" classes are at http://people.mozilla.com/~dmandelin/ccc.txt.
Comment 4 David Mandelin [:dmandelin] 2008-03-14 18:17:01 PDT
There's another check that might be needed (mentioned by bent) that I just don't understand in enough detail to implement yet. If I understand right, an example of what this check would look for is:

class nsFoo : public nsISupports {
  nsSomeStruct *s;
  // doesn't participate in CC
};
class nsSomeStruct {
  nsCOMPtr<nsBar> mBar;
};

My limited understanding is that this is bad "in the same way as if nsFoo had the nsCOMPtr<nsBar> itself". What I need more guidance on is:

1. What kinds of members of nsFoo do we have to look at? "nsQux"? "struct nsQux*"? "struct nsQux&"? Any smart pointers of interest here? Etc.

2. What condition applies to nsSomeStruct to make this bad? Anything? Or are there cases where it's OK?
Comment 5 Benjamin Smedberg [:bsmedberg] 2008-03-15 06:42:51 PDT
Are excluding threadsafe classes? Threadsafe classes cannot be cycle-collected.

You don't know the answer about nsSomeStruct *s without additional knowledge... is "s" an owning pointer or a weak pointer? I tend to think we should either ignore that case or issues a weak warning.
Comment 6 Brendan Eich [:brendan] 2008-03-15 12:16:29 PDT
There are some bogus NS_IMPL_THREADSAFE_ISUPPORTS* uses for classes that are truly main-thread-only. We shouldn't skip them. Probably we need to get a list of such classes and take out the ones with classinfo where the classinfo both (a) does not say MAIN_THREAD_ONLY and (b) does say THREADSAFE.

/be
Comment 7 David Mandelin [:dmandelin] 2008-03-21 14:51:33 PDT
Created attachment 311056 [details] [diff] [review]
Patch with script sources

Here is a patch with this script, the script for related by 423070, and a library used by both, for possible checkin to mozilla-central. The patch places the files in the xpcom dir, next to static-checking.js, but I don't really know what the right place would be.
Comment 8 Brendan Eich [:brendan] 2008-03-21 17:15:57 PDT
Probably don't want stuff directly in xpcom. Could co-locate with the cycle collector itself, in xpcom/base IIRC.

/be
Comment 9 :Ehsan Akhgari 2010-06-06 13:20:55 PDT
*** Bug 570416 has been marked as a duplicate of this bug. ***
Comment 10 :Ehsan Akhgari 2010-06-06 13:22:06 PDT
Is there more work to be done here, except for moving the js file to xpcom/analysis?  I think we _really_ need to have this analysis.  I've recently discovered large cycles created by spell checking which cause really bad leaks, and I'm sure that there are more similar leaks yet to be uncovered.
Comment 11 Robert Sayre 2010-08-06 11:50:58 PDT
yeah, we need to be running this.
Comment 12 dwitte@gmail.com 2010-08-06 11:59:11 PDT
Ehren is working on getting an analysis tinderbox up, so we can actually run this in the wild...
Comment 13 Andrew McCreight [:mccr8] 2011-06-01 09:31:27 PDT
I'm working on looking at Dave Mandelin's scripts.  There have been at least two instances this year of classes failing to declare fields to the cycle collector, which are an enormous pain to find manually, so it would be good to have this working.  dmandelin's script checks for that, even though that isn't in the title of this bug, so I'll just work on it here.

Taras, if you happen to have your old script sitting around I'd be interested in looking at that, too.
Comment 14 (dormant account) 2011-06-01 09:58:13 PDT
(In reply to comment #13)
> Taras, if you happen to have your old script sitting around I'd be
> interested in looking at that, too.

I do not.
Comment 15 Andrew McCreight [:mccr8] 2011-06-01 11:36:10 PDT
The Dehydra script requires that all nsRefPtrs and nsCOMPtrs are declared to the CC, but this seems overly restrictive in the case where the contained class is not a CC class.  Though I do see a potential danger in general where the inner class has a subclass that is a CC class that would need to be considered.
Comment 16 Andrew McCreight [:mccr8] 2011-06-01 18:05:01 PDT
ccbase.js and cc-traverse-unlink-audit.js were not using .bases correctly: the elements of .bases are (at least now) objects containing a type, not types themselves.  This is fixed by adding in .type to every element of a .base.  I'll upload a patched version of the scripts at some point.  Sooner if somebody wants them.

Once that is fixed, cc-bad-pointer-holder.js thinks a ton of classes should be CC classes, including very basic classes like Element.  The problem is that the analysis is not taking into account the type-based pruning of acyclic objects from the CC graph: if a certain class can never be a member of a cycle, then it does not need to be a CC class.  For a class to be acyclic using this analysis, Bacon-Rajan01 require that an object contain only scalars, pointers to acyclic final classes, or arrays of either.  This is hopefully roughly similar to the criteria used in Firefox, though I think we allow non-refcounted pointers to any classes.

BR01 need classes to be final because they do this calculation at runtime, and Java has dynamic class loading.  We're working at compile time, but as the compilation is happening, so in some sense we have the same problem: we can load class A, decide it is acyclic, then load class B that makes it so that class A now can be part of a cycle, invalidating our previous analysis.

I think we can work around this, though: if we later decide that class A is now part of a cycle, we can just go back and confirm that it is part of a cycle.  What we do then is when we load a new class A, see if it can be part of any cycles.  If it can, then we must confirm that all members of all potential cycles through A are cycle collected.  I conjecture that all new potential cycle induced by adding a new class must pass through that class.  I'll probably want to memoize the is_cc function.
Comment 17 Andrew McCreight [:mccr8] 2011-06-03 17:00:51 PDT
- The biggest chunk of code I have written is the Cheriyan-Mehlhorn-Gabow algorithm for finding strongly connected components.  (Probably overkill, but the algorithm is simple enough.) We run this on a graph where each class is a node, and there's an edge from class A to B if A can contain a ref counted reference to B, including subtypes.  If a class is a member of an SCC of size 1, and can't contain a reference to itself, it doesn't need to be cycle collected.  Otherwise, it needs to be, or at least, type safety is not a strong enough property to ensure a lack of cycles involving class A.

- To determine if a type is a ref counted reference, instead of using a regexp, I perform a structural analysis, and also return the contained class.

- The script determined if a class was CCed by looking for a special field.  Unfortunately, only some CCed classes have this field.  Native CCed classes lack it, so instead we look for an inner class inheriting from one of the cycle collection participant classes.

- I filter out typedefs.

- A primitive whitelisting facility.  nsCSSStyleSheet can create cycles from a type perspective, but bz said this is always going to be a tree in practice.  I would like to make the white list more fine grained, so we can declare, say, that a specific field won't contain cyclic pointers, rather than just a blanket whitelist.  That way if somebody added some other kind of field to nsCSSStyleSheet that could cause a cycle, the analysis would find it.

I put the basic algorithm in place, and then started investigating places where the analysis is wrong, either with a false negative or a false positive, assuming the file I'm looking at is right.

My current problem is that nsIDocument has a field |nsCOMPtr<nsISupports> mSecurityInfo|.  This is bad because every class we are considering is a subclass of nsISupports, thus anything that contains a ref counted reference to nsIDocument can be a member of a cycle, at least from a type perspective.  Looking over the entire code base, fields with type |nsCOMPtr<nsISupports>| appear to be common, so I'll need to come up with a scalable solution.  I'll have to investigate how these fields are actually used before I can come up with a plan of attack.  I could imagine that some dataflow analysis about mSecurity info in each subclass of nsIDocument might be able to help, if it is used fairly simply.
Comment 18 Andrew McCreight [:mccr8] 2011-06-07 10:14:07 PDT
I switched the whitelist definition to be a set of fields.  If a field is on the whitelist, it will not add an edge to the type graph.  I decided to punt on nsIDocument::mSecurityInfo and add it to the whitelist.

The next big issue I need to deal with is to reason about cycles that pass through Javascript.  peterv said this is why nsBaseContentList and nsRange are cycle collected, so I need to understand that and figure out a way to represent that in my analysis.
Comment 19 Nicholas Nethercote [:njn] 2011-06-10 08:59:58 PDT
*** Bug 645498 has been marked as a duplicate of this bug. ***
Comment 20 Andrew McCreight [:mccr8] 2011-06-15 09:52:18 PDT
Bug 335998 and Bug 637099 have links to a ton of cycle collector bugs that we should be able to automatically find.
Comment 21 Andrew McCreight [:mccr8] 2011-07-06 11:02:48 PDT
I've made a little bit of progress on Traverse/Unlink analysis.  When run on dom/indexedDB/IDBRequest.o, the script produces this output:

Unrefed fields in mozilla::dom::indexedDB::IDBRequest::cycleCollection::Traverse(void*, nsCy\
cleCollectionTraversalCallback&):
  (/home/amccreight/tm/dom/indexedDB/IDBRequest.cpp:297:1)
    mozilla::dom::indexedDB::IDBRequest::mTransaction

Unrefed fields in mozilla::dom::indexedDB::IDBRequest::cycleCollection::Unlink(void*):
  (/home/amccreight/tm/dom/indexedDB/IDBRequest.cpp:305:1)
    mozilla::dom::indexedDB::IDBRequest::mTransaction

This is precisely the problem that Bug 658637 fixed.  I'm using an old checkout of TM before peterv and smaug landed all of their cycle collector fixes to see if my scripts will find the errors.
Comment 22 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-07-06 11:04:21 PDT
Hot!
Comment 23 Andrew McCreight [:mccr8] 2011-07-11 15:18:49 PDT
I put the latest version of the scripts on GitHub.

https://github.com/amccreight/ccdehydra
Comment 24 Andrew McCreight [:mccr8] 2011-07-25 14:28:52 PDT
I started posting various issues I am finding on the github issues tracker.

https://github.com/amccreight/ccdehydra/issues
Comment 25 Andrew McCreight [:mccr8] 2011-07-25 16:20:57 PDT
My analysis finds the missing Traverse/Unlink edges fixed in Bug 664455.  I was looking at a new version of the code after this was fixed, so I assumed my analysis was broken and just not finding them.  On the other hand, it also thinks there are four other fields that should be Traverse/Unlinked, which are presumably okay...
Comment 26 Andrew McCreight [:mccr8] 2011-07-28 17:17:05 PDT
I'm working my way through content/base/src, and I've mostly analyzed the analysis, aside from nsDocument.  In the course of this, I found Bug 674658, which Smaug has fixed.  I have a more detailed status in this issue: https://github.com/amccreight/ccdehydra/issues/12

I need to add support for looking in procedure calls in Traverse and Unlink, as this is used 3 times in the directory.

Aside from fixing up a few minor issues, getting things to go through mainly consisted of looking at fields that were failing, and trying to figure out if the type of the field was cycle collected or not.

Currently the white list of non-cycle-collected classes is: nsIURI, nsIDocShell, mozilla::css::Loader, nsITimer, nsIDOMFileError, nsICharsetConverterManager, nsIDocumentEncoderNodeFixup, nsIContentSerializer, nsIOutputStream, nsIUnicodeEncoder, nsIAtom, nsIPrincipal,nsIChannel,nsIDOMBlob, nsIRunnable

This list is manually generated, so it could have mistakes.  A few of these classes had a huge number of subclasses, so I didn't check them all.

Also, fields of type nsINodeInfo are apparently supposed to be Traversed, but not Unlinked.


There are a few more classes of fields that weren't traversed that I wasn't confident about.  This is detailed a bit more at the github link, but these were nsIObserver, nsIInterfaceRequestor and nsIAsyncVerifyRedirectCallback.  Each of these had at least one subclass that is cycle collected, so it wasn't really clear to me that it is safe to not Traverse or Unlink them.  But on the other hand there were a bunch of other subclasses that weren't cycle collected, so you probably can't just traverse them in general.
Comment 27 Andrew McCreight [:mccr8] 2011-08-01 13:58:49 PDT
This is what the full output for nsContentSink's Traverse looks like, as an example, at maximum verbosity:

DEBUG>> Checking Traverse for class nsContentSink.
DEBUG>>     ++ mDocument
DEBUG>>     ++ mParser
DEBUG>>     ++ mNodeInfoManager
DEBUG>>     ++ mScriptLoader
DEBUG>>     ++ mScriptElements
DEBUG>>     -- nsRevocableEventPtr<nsRunnableMethod<nsContentSink, void, false> >
DEBUG>>
DEBUG>>     found mDocument
DEBUG>>     found mParser
DEBUG>>     found mNodeInfoManager
DEBUG>>
Unrefed fields in nsContentSink::cycleCollection::Traverse(void*, nsCycleCollectionTraversa\
lCallback&):
  (/home/amccreight/tm/content/base/src/nsContentSink.cpp:182:1)
    nsContentSink::mScriptElements
    nsContentSink::mScriptLoader

The ++ indicates that the analysis decided that field should be traversed.  The -- means that the analysis found a field with the given type, and wasn't sure if the field should be traversed or not, but decided to skip it.  This lets you see if the analysis skipped a field it shouldn't have.  It tries to filter out fields that it skipped that it is "sure" about, like integers, pointers to explicitly whitelisted classes, etc., to reduce the noise, but this filtering isn't perfect.  After that, the script lists as "found" any field it found a reference to in the Traverse/Unlink.  Then, as the "real output" it prints out any fields that it found interesting, but that were not visited.
Comment 28 Andrew McCreight [:mccr8] 2011-08-09 12:37:46 PDT
I'm looking at content/html/content/src right now, and I'm finding a lot of the missing unlinks in my old version that Olli fixed, which is nice.

Yesterday I added an analysis for inherited Traverse/Unlinks, which has helped turn up many of these errors: if a class adds fields that should be communicated to the cycle collector, and it just uses its parents unlink (the latter is fairly common, due to the use of NS_DECL_CYCLE_COLLECTION_CLASS_INHERITED_NO_UNLINK), then that is bad.  This was tricky for the analysis because there's no Traverse/Unlink present in the class to actually analyze.  Instead, we have to check every type declaration to see if it is a cycle collected inner class.  If it is, then if it is missing a Traverse or Unlink then we do an analysis similar to what happens if we see a call to a parent T/U.  The actual analysis is straightforward.
Comment 29 Andrew McCreight [:mccr8] 2011-10-27 09:10:23 PDT
I think this is mostly "done" at this point.  There's certainly still more that can be done with it, but I think the most effective use at this point is for people who are familiar with the code being analyzed to try it out if they want to look for leaks.  I tried gathering a bunch of information from people, and use that to see if things should be Traversed or Unlinked (missing unlinks were far more common), but even with their input I found it too difficult to really figure out what to do without knowing anything about the code.

The missing cycle collector class analysis is probably in a broken state at this point.  It seems like a hard problem, and anyways it seems like a much rarer problem than missing Traverse/Unlinks.

I'll make sure I have the latest version of my code on git hub, try to document it a bit, and then close this out.
Comment 30 Nicholas Nethercote [:njn] 2011-11-08 00:23:48 PST
> I'll make sure I have the latest version of my code on git hub, try to
> document it a bit, and then close this out.

Any progress on this?
Comment 31 Andrew McCreight [:mccr8] 2011-11-08 06:09:42 PST
Ah, I keep forgetting!  I'll do that this week.
Comment 32 Andrew McCreight [:mccr8] 2011-11-10 16:44:25 PST
I'm going to declare this fixed for now.  The Traverse-Unlink matching analysis is in a decent state, and could probably be used by a domain expert to find more missing things.  I basically hit the point where I didn't know enough about the existing code to figure out whether missing things were okay or not.  I think the analysis is good enough that using it is much better than manually trying to look between the class definition and the traverse definition, which are in different files, and the class definition often has so much other stuff in it that it is hard to make sure you find everything.  It skips over a number of common fields that don't need to be CCed which should help.  I guess the next step would be to evangelize to people who look for those kinds of leaks, which probably means peterv and smaug. :)

The cycle collector class analysis, which is supposed to figure out whether a class should be cycle collector or not, is in a much rougher state.  It probably isn't really useful for anything.  I think that's not too bad, as deciding whether a class needs to be CCed or not needs to be done much less rarely, and is not something that changes much.

Note You need to log in before you can comment on or make changes to this bug.