Last Comment Bug 649532 - improve cycle collector logging information
: improve cycle collector logging information
Status: RESOLVED FIXED
[MemShrink:P2]
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla9
Assigned To: Andrew McCreight [:mccr8]
:
Mentors:
Depends on: 694436
Blocks: 668695 638299 MemShrinkTools
  Show dependency treegraph
 
Reported: 2011-04-12 17:12 PDT by Andrew McCreight [:mccr8]
Modified: 2011-10-13 15:41 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
improve cycle collector logging information (11.05 KB, patch)
2011-04-25 17:39 PDT, Andrew McCreight [:mccr8]
no flags Details | Diff | Review
Python parser library for new CC graph format (5.59 KB, text/x-python-script)
2011-04-26 12:36 PDT, Andrew McCreight [:mccr8]
no flags Details
Python graph reverser, using the parser I wrote (1.24 KB, text/x-python-script)
2011-04-26 12:37 PDT, Andrew McCreight [:mccr8]
no flags Details
Python cycle collector, results comparer (3.25 KB, text/x-python-script)
2011-04-26 12:50 PDT, Andrew McCreight [:mccr8]
no flags Details
Python parser library for new CC graph format (5.37 KB, text/x-python-script)
2011-05-06 11:50 PDT, Andrew McCreight [:mccr8]
no flags Details
first crack at a root finder (2.85 KB, text/x-python-script)
2011-05-06 16:57 PDT, Andrew McCreight [:mccr8]
no flags Details
single character edge label analyzer (1.42 KB, text/x-python-script)
2011-05-24 18:10 PDT, Andrew McCreight [:mccr8]
no flags Details
improve cycle collector logging information (15.09 KB, patch)
2011-05-26 11:07 PDT, Andrew McCreight [:mccr8]
no flags Details | Diff | Review
Python parser library for new CC graph format (6.17 KB, text/x-python-script)
2011-05-26 15:59 PDT, Andrew McCreight [:mccr8]
no flags Details
Python cycle collector, results comparer (4.65 KB, text/x-python-script)
2011-05-27 09:20 PDT, Andrew McCreight [:mccr8]
no flags Details
make cycle collector logging output complete CC graph (15.12 KB, patch)
2011-06-22 10:45 PDT, Andrew McCreight [:mccr8]
no flags Details | Diff | Review
make cycle collector logging output complete CC graph (15.87 KB, patch)
2011-06-28 12:11 PDT, Andrew McCreight [:mccr8]
no flags Details | Diff | Review
make cycle collector logging output complete CC graph (15.90 KB, patch)
2011-07-14 16:00 PDT, Andrew McCreight [:mccr8]
peterv: review+
Details | Diff | Review

Description Andrew McCreight [:mccr8] 2011-04-12 17:12:14 PDT
I would like to make the cycle collector logger produce more complete information.

Goals:
- dump all info in the cycle collector graph to enable implementing a CC outside of FF for debugging purposes
- log information about optimizations (edge deletion, node merging, etc.) in a flexible way without interfering with tools that don't care about that info
- maintain the current node/edge info and human readability

These modifications will require altering nsICycleCollectorListener.idl and any of its clients.  I think the only client is nsCycleCollectorLogger?  It will also require changing any scripts that use the CC log info, which I think only includes peterv's and mine.

The logger has two phases, as now, but with a more clear divide in their purpose.  In the first, the CC graph is dumped.  This should contain complete information about the building of the graph.  The second phase contains complete information about the results of the cycle collection: internal counts calculated, roots, garbage nodes.

First phase:

node info:
  - ref counted: address, ref count, obj name
  - gced: address, isMarked, obj name
edge info:
  - target, edge name
Currently the source of the edge is also passed in, but this seems pointless, as the source is always the node visited prior to the edge (this is specified, and the Logger depends on it), so I'd remove that.

In addition, I'd pass in an additional string to indicate any optimization results.  If this node or edge was not optimized, the string will be null.

node concrete syntax would look like:
0x123567 [rc:12] {pruned by magic} JS Object
0x123567 [gc:marked] {pruned by science} JS Object
0x123567 [gc:unmarked] Foo bar
First is the address in whatever format.  Then a bracketed description.  A node is either refcounted (rc) or garbage collected (gc).  For a ref counted node, the ref count follows the colon, while for a gc'ed node, the colon is followed by either "marked" or "unmarked".  Something like "m" or "u" could be done instead, but that's a little harder to read.  After that, is an optional description of the optimization, bounded by {}.  I'm open to other suggestions, but [] won't work because at least for edge names some names are bounded by [].  Then, after a space and up until the end of the line is the optional name of the node.

Edge info would be the same as now except with the optional optimization tag:
> 0x193939 {pruned using math} my name goes here
> 0x292929 [global object]

I need to think a little more about what the best way to produce the results of cycle collection is.  We want to convey the set of roots, along with their respective external reference counts.  For non-roots, we want to know whether the CC classified the node as garbage or not.  It suffices to produce only one of these sets, and the set of garbage nodes is probably much smaller than non-garbage non-root nodes, so I'm inclined to go with the former, if dumping them won't be too much of a problem.

I'm open to suggestions on any of this.
Comment 1 Andrew McCreight [:mccr8] 2011-04-25 16:57:10 PDT
I've iterated on this a bit.

Ref counted objects are now indicated by [rc=12]
For GCed objects, unmarked objects are indicated by [gc], and marked are indicated with [gc.marked], as they are less common.

For representing the results, there are two kinds of lines:
0x122533c30 [garbage]
0x1051c9a80 [known=12]

In the first case, the CC has determined the object is garbage.  In the latter case, the CC has determined the object is a RCed root, and found 12 of the edges.  Comparing to the total RC in the object description will give how many unknown edges there are.  All other objects were either black GCed objects or RCed objects where all edges were found, but were not garbage.

I'm working on some python scripts to act as replacements for peterv's scripts, as changing the output format will obviously break them.  I wrote an edge reverser.  Actually reversing the edges is fairly fast, but actually parsing the file seems a bit pokey to me.  Reading a 610k line file takes 8 seconds on my computer.  It takes 5 seconds more to reverse it twice.  Probably bad regexps or something.
Comment 2 Andrew McCreight [:mccr8] 2011-04-25 17:39:43 PDT
Created attachment 528236 [details] [diff] [review]
improve cycle collector logging information

First attempt at a patch.

Currently, DescribeGarbage is done inside BeginCollection (all white objects are described as garbage), but possibly it might make sense to do this later, like inside CollectWhite where white nodes are actually cleaned up.
Comment 3 Andrew McCreight [:mccr8] 2011-04-26 12:36:33 PDT
Created attachment 528397 [details]
Python parser library for new CC graph format
Comment 4 Andrew McCreight [:mccr8] 2011-04-26 12:37:17 PDT
Created attachment 528398 [details]
Python graph reverser, using the parser I wrote

Doesn't do anything to the results, just dumps out the new graph reversed.
Comment 5 Andrew McCreight [:mccr8] 2011-04-26 12:50:54 PDT
Created attachment 528402 [details]
Python cycle collector, results comparer

This file checks the results of the FF CC in the following steps:
1. Parse the edge file using the Python lib I already uploaded.
2. Run a cycle collector on the CC graph.
3. Compare the two cycle collector results.

Cycle collector: I implemented a cycle collector in Python using a slightly higher level version of Bacon-Rajan 01.
1. compute internal counts (ref counts in the CC graph)
2. compute set of ref counted roots (RC nodes where internal count < ref count)
3. compute all RCed objects reachable from either RCed roots or marked GCed objects (these are the live objects, ie non-garbage objects)

Comparing results:
1. Are the sets of garbage objects identical?
2. Are the sets of RCed roots identical?  Are the internal counts computed for the RCed roots identical?

I've tested this a little. It works on three different edge log files without errors, and messing with the results a little bit (adding garbage, removing garbage, changing counts, etc.) produces the expected error messages.
Comment 6 Andrew McCreight [:mccr8] 2011-05-06 11:50:25 PDT
Created attachment 530682 [details]
Python parser library for new CC graph format

Tweaking the parser, speeding it up by about 25%.  Still pretty pokey.  About 1/4 of the time parsing is spent storing data about edges.  Takes 5.7 seconds to parse a 21MB on my laptop.  I'm going to see if PyPy helps.
Comment 7 Andrew McCreight [:mccr8] 2011-05-06 16:57:04 PDT
Created attachment 530775 [details]
first crack at a root finder

First crack at reimplmenting Peterv's script from Bug 466157 in Python for the new log format.  Still needs niceties like being able to select a node on the command line.  A few twists I added are being able to configure what is a root and what is not, based on my experience with helping roc and co investigate the Gmail leak.  For ref counted roots and JS roots, you can set them to be used as roots for the root finder or not.  Turning off JS roots generates a huge number of results, as you would expect, so I'm not sure how practical it is without some additional sophistication.

I added the notion of "stop node labels", which is a user-specified list of node labels.  The root finder will treat these as roots.  In the gmail case,  we were looking for nsRanges that held on to the windows in question.

Another fun little feature is that when there are multiple edges from one node to another, with separate labels, the script will print them all out.

I tried pypy but it was a few seconds slower on my large file.  The parsing takes long enough that I worry that the best route for the exploration of large graphs may be some kind of REPL, though that is fairly heavyweight.

Maybe I can come up with a format for storing edge labels that is faster, to speed up repeated accesses.  Right now, they are only consulted when we print out paths, so slower access wouldn't be too bad.

host-4-131:2 amccreight$ python ../../../find_roots.py cc-edges-1.log
Parsing edge file cc-edges-1.log... done.

0x1046e7220 [XPCWrappedNative (ChromeWindow)]
    --[mIdentity]-> 0x1046e3ef8 [nsGlobalWindow]

    Root 0x1046e7220 has a stop node label.

0x10491dc00 [nsDocument (XUL) chrome://browser/content/browser.xul]
    --[mScriptGlobalObject]-> 0x1046e3ef8 [nsGlobalWindow]

    Root 0x10491dc00 is a ref counted object.
    known edges:
       0x1052afb48 [nsGlobalWindow] --[mDoc,mDocument]-> 0x10491dc00
       0x10519cbe0 [XPCWrappedNative (XULDocument)] --[mIdentity]-> 0x10491dc00
       0x104a51c00 [nsPresContext] --[mDocument]-> 0x10491dc00
       0x1046e2b40 [nsXULCommandDispatcher] --[mDocument]-> 0x10491dc00
       0x1046e3ef8 [nsGlobalWindow] --[mDoc,mDocument]-> 0x10491dc00
Comment 8 Andrew McCreight [:mccr8] 2011-05-06 17:02:23 PDT
Ah, you can actually see in my sample output the multiple edges thing come into play.  0x1052afb48 is connected to 0x10491dc00 by two fields, mDoc and mDocument.
Comment 9 Andrew McCreight [:mccr8] 2011-05-13 14:53:53 PDT
On further thought, I think the optimization annotations {like this} aren't sufficiently flexible.  In the SCC condensation optimization, after a node is traversed, it is basically broken up and stuffed into some data structures.  Its fate will not be determined until later.  At that point, when the algorithm decides if the node will live or die, the edges are not really associated with their original node any more.  It is possible to figure out which edge they are related to, but that would require additional work.

I think it makes more sense to output both the original input CC graph that is "told" to the builder, and the pruned CC graph that is actually constructed.  These two graphs will either have to be intermixed (perhaps with the pruned graph description being preceeded by a special character) or put in separate files.

Anyways, I think for now I'll split out dealing with recording the pruned graph in a separate bug, once I have a better feel for how that should be handled.
Comment 10 Andrew McCreight [:mccr8] 2011-05-20 16:56:25 PDT
peterv: Are edge names like this expected?  This looks kind of... weird.  Is this the result of the patches you checked in recently?  Or maybe my logging patch is doing something weird?

0x13b157f70 [gc] JS Object (Object) (global=134b65990)
> 0x134b199d8 proto
> 0x134b65990 parent
> 0x13b15a068 P
> 0x13b15a0d0 t
> 0x13b15a138 r
> 0x13b15a1a0 I
> 0x13b15a208 n
> 0x13b15a270 f
> 0x13b15a2d8 o
0x13b176478 [gc] JS Object (Object) (global=134b65990)
> 0x134b199d8 proto
> 0x134b65990 parent
> 0x13b1764e0 s
> 0x13b176548 t
> 0x13b1765b0 r
> 0x13b176618 u
> 0x13b176680 c
> 0x13b176888  
> 0x13b17c208 {
> 0x13b181478 v
> 0x13b1814e0 o
> 0x13b181548 i
> 0x13b1815b0 d
> 0x13b1818f0 *
> 0x13b181a90 ;
Comment 11 Andrew McCreight [:mccr8] 2011-05-20 20:00:00 PDT
I mean Bug 652135.  There is something that adds new edge names, but only via |#_field|, and |{| and | | aren't valid field names, so it must be coming from some other place.  Weird edge names like ";", "{" and " " don't show up in any of my previous logs, and in fact only a single one of my logs from my recent run had these odd edge names, and it was the second to last CC of the run.
Comment 12 Peter Van der Beken [:peterv] 2011-05-23 09:03:57 PDT
Weird, I reread the patch and can't see how that would happen :-/.
Comment 13 Andrew McCreight [:mccr8] 2011-05-23 11:19:08 PDT
I don't think this is the problem here, but mNextEdgeName should probably be initialized to the empty string.  If the first time a Note*Child functions is called NoteNextEdgeName isn't called before it, and WantDebugInfo() is true, then it looks like uninitialized garbage will be copied in.

It appears that on subsequent calls, if an intervening NoteNextEdgeName is called, then you'll just get the empty string, which is what we want.  Though maybe if the callee frees the string in the meanwhile there could be a problem?  I don't know exactly how memory management of nsCStrings works.

I haven't found any particular instance of this that could have caused the above, but in NoteJSChild, if a trace is being done in a non-DEBUG build, no EdgeName will be set before NoteScriptChild is done.  Maybe this needs to be fixed?

If these sound like problems, Peter, I can file separate bugs on them.
Comment 14 Peter Van der Beken [:peterv] 2011-05-24 06:52:28 PDT
(In reply to comment #13)
> I don't think this is the problem here, but mNextEdgeName should probably be
> initialized to the empty string.

It is initialized to the empty string, nsCString does that in its constructor.
Comment 15 Peter Van der Beken [:peterv] 2011-05-24 07:02:15 PDT
Are these all JS objects? If so, looks like it would a problem with this code: http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/src/nsXPConnect.cpp#637
Comment 16 Andrew McCreight [:mccr8] 2011-05-24 11:58:47 PDT
(In reply to comment #14)
> It is initialized to the empty string, nsCString does that in its
> constructor.
Ah, that's good.  I'm not used to high level languages like C++. ;)

I think they are all JS Objects.  I'll write a little script to confirm this, then try to figure out what is happening.
Comment 17 Andrew McCreight [:mccr8] 2011-05-24 18:10:30 PDT
Created attachment 534956 [details]
single character edge label analyzer

This is a summary of the single-character edge names in my most recent log.  The source nodes were all JS Objects.  I stripped out anything following a " - " describing the type of the object. (So, "JS Object (Function - foo)" is listed under Function below.)  The list of characters is the list of edge names that there were.

Function: FPb
Object:  "#'*+,-./0123456789:;=>?ABCDEFGIJKLMNOPQRSTUVWXY[]^_`abcdefghijklmnopqrstuvwxyz{~
HTMLFormElement: q
Window: $_
Call: $ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
Array: 0189

Looking at some previous directories, only Object looks particularly unusual.
Comment 18 Andrew McCreight [:mccr8] 2011-05-26 11:07:37 PDT
Created attachment 535412 [details] [diff] [review]
improve cycle collector logging information

First we report the entire cycle collector graph.  Previously, this information was not quite complete.  With the new output, we can run a cycle collector algorithm using the log information.  NoteObject is split into NoteRefCountedObject and NoteGCedObject, because we need different information for each.  For the former, we report the ref count, and for the latter we report whether it is marked or not.  For NoteEdge, we no longer pass the source address of the edge, as it is already baked into the behavior that it must be an edge from the last node we noted.  As before, each node and edge can have a description.

After the cycle collector graph is dumped, we report the results of the cycle collection.  BeginDescriptions() is now BeginResults(). There are two types of results.  DescribeRoot() is used to report all black nodes where the number of internal references found is not equal to the ref count of the node.  DescribeGarbage() is used to report all nodes that were found to be garbage.  Any nodes that were not reported as either are not garbage and were found to have internal references equal to their ref count.  I don't explicitly report these because they are probably much more common than roots+garbage, so it will keep the log file size down a bit.

In contrast to the prior logging, garbage nodes are reported right in the middle of rooting and unlinking.  This is supposed to catch cases where an object is marked white, but not actually unlinked, but it does complicate the implementation a little by requiring that the listener be passed into FinishCollection and CollectWhite.
Comment 19 Andrew McCreight [:mccr8] 2011-05-26 15:59:28 PDT
Created attachment 535497 [details]
Python parser library for new CC graph format

Removed support for optimization descriptions, cleaned up control flow a little, added some simple printing functions.
Comment 20 Andrew McCreight [:mccr8] 2011-05-27 09:20:00 PDT
Created attachment 535657 [details]
Python cycle collector, results comparer

Changed the output of the Python CC to match the output of the Fx CC, to make comparing results simpler and more reliable.  Now the output must be identical.

Added a simple graph formation checks.

Made a simple example of a bad file, and the checker successfully finds the errors there.

Added support for checking multiple files at once.
  python check_cycle_collector.py cc-edges*.log
will check all the edge files in the current directory, for instance.

Ran it on membuster a few times, and it doesn't find any errors.  At some point, I'll do a full Mochitest etc. run and check the results.
Comment 21 Andrew McCreight [:mccr8] 2011-06-22 10:45:51 PDT
Created attachment 541105 [details] [diff] [review]
make cycle collector logging output complete CC graph

Un-bitrotting, plus some cleanup.  I'll probably put this up for review soon.
Comment 22 Andrew McCreight [:mccr8] 2011-06-23 12:33:35 PDT
Try server run was mostly okay, but I got "TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/components/sessionstore/test/browser/browser_394759.js | Test timed out" on both Linux debug and Linux64 debug Moth.  Rerunning Linux64 got the same error again.  This appears to have become a permaorange yesterday that was fixed early this morning, so I'll rebase and try the two Linux debugs again.
Comment 23 Andrew McCreight [:mccr8] 2011-06-28 12:11:22 PDT
Created attachment 542550 [details] [diff] [review]
make cycle collector logging output complete CC graph

Another round of un-bitrotting, in the wake of the recent enum removal.  Also probably a little bit of cleanup, too.  I'm going to land the new CC scripts in a separate bug.
Comment 24 Andrew McCreight [:mccr8] 2011-07-01 12:04:01 PDT
Try server run was good http://tbpl.mozilla.org/?tree=Try&rev=83e4a7eba790
Comment 25 Andrew McCreight [:mccr8] 2011-07-14 16:00:23 PDT
Created attachment 546026 [details] [diff] [review]
make cycle collector logging output complete CC graph

fixed some minor bit rotting in the patch
Comment 26 Andrew McCreight [:mccr8] 2011-08-12 10:46:14 PDT
Comment on attachment 546026 [details] [diff] [review]
make cycle collector logging output complete CC graph

review ping
Comment 27 Nicholas Nethercote [:njn] 2011-08-28 19:25:17 PDT
peterv, it's been 6 weeks.  Any chance you could review mccr8's patch?
Comment 28 Peter Van der Beken [:peterv] 2011-08-29 06:40:45 PDT
Comment on attachment 546026 [details] [diff] [review]
make cycle collector logging output complete CC graph

The reason NoteEdge took the start and end of the edge was that there might be listeners that need both, and this way they don't have to cache the start themselves (and the CC already caches it anyway). I guess it doesn't matter much.

I'm not sure I like it that if a refcounted object is a root then that's noted after beginResults, but if a GC'ed object is a root then that's noted in the description (.marked). I suppose the scripts just have to deal.
Comment 29 Andrew McCreight [:mccr8] 2011-08-29 06:45:21 PDT
(In reply to Peter Van der Beken [:peterv] from comment #28)
> The reason NoteEdge took the start and end of the edge was that there might
> be listeners that need both, and this way they don't have to cache the start
> themselves (and the CC already caches it anyway). I guess it doesn't matter
> much.
The start end thing seemed like a convenience feature, but nobody seems to actually use it, and it is pretty easy to work around.

> I'm not sure I like it that if a refcounted object is a root then that's
> noted after beginResults, but if a GC'ed object is a root then that's noted
> in the description (.marked). I suppose the scripts just have to deal.

I was thinking of it from the perspective of everything at the top is the input to the cycle collector, and the rest is the output, but I suppose it is a little odd for anything aside from a script that is doing a CC.
Comment 30 Peter Van der Beken [:peterv] 2011-08-29 07:17:51 PDT
(In reply to Andrew McCreight [:mccr8] from comment #29)
> I was thinking of it from the perspective of everything at the top is the
> input to the cycle collector, and the rest is the output, but I suppose it
> is a little odd for anything aside from a script that is doing a CC.

I see what you mean, I'm fine with this landing as is.
Comment 31 Andrew McCreight [:mccr8] 2011-08-31 11:40:16 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/220b6e404f1b
Comment 32 Ed Morley [:emorley] 2011-09-01 01:53:42 PDT
http://hg.mozilla.org/mozilla-central/rev/220b6e404f1b

Note You need to log in before you can comment on or make changes to this bug.