Closed Bug 638299 Opened 9 years ago Closed 8 years ago

analyze the Firefox cycle collector & heap [meta]

Categories

(Mozilla Labs :: Research, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: gal, Assigned: mccr8)

References

(Depends on 2 open bugs)

Details

Attachments

(14 files, 3 obsolete files)

2.41 KB, text/plain
Details
1.56 KB, text/plain
Details
1.04 KB, text/plain
Details
1.15 KB, text/plain
Details
1.94 KB, text/plain
Details
508 bytes, text/plain
Details
1.39 KB, text/x-python-script
Details
27.50 KB, text/x-python-script
Details
1.48 MB, application/pdf
Details
144.32 KB, application/pdf
Details
57.92 KB, application/pdf
Details
50.11 KB, application/pdf
Details
1.32 KB, text/plain
Details
49.95 KB, text/x-python-script
Details
No description provided.
Assignee: nobody → continuation
I've noticed that the number of objects that are Suspected() is greater than the number of objects that are Forgotten(), by a few hundred per collection (which is less than 1% of suspected objects).  This violates the "purple safety" described in the cycle collector comments.

I should be able to log when objects are Suspected and Forgotten, and figure out which objects are purple-unsafe, in the manner of David Baron's Leak Gauge.
Explanation of the example cycle collector statistics: each line starts with the name of the variable being categorized, followed by the values of that variable at the end of each cycle collection.  So, the 3rd value for mCollectedNode gives the number of collected nodes during the third cycle collection.

mCollectedNode: number of nodes that were collected.  This does not include any JS objects that may have been part of the cycle.

mSuspectNode/mForgetNode: number of times the Suspect or Forget functions are called.  Suspect is called when the number of references to an object drops to a non-zero value, and Forget is called when the number of references drops to zero or increases.

mPrePurpleRoots: number of JS objects added as cycle collector roots

setColorBlack/setColorWhite: number of times these colors were set.  setColorWhite is smaller than mCollectedNode because the color can go from white to black before collection.

visitedNode: number of nodes added to the graph the CC traverses
visited{GCMarked,GCUnmarked,RefCounted}Node: splits visitedNode into 3 categories depending on the type of the object being visited.  I think GCMarked = marked black by JS GC and GCUnmarked = marked grey by JS GC.

walkedGraph: number of nodes visited during scan phase

The number of black JS objs visited roughly equals the number of grey JS objs visited, which is about double the number of ref counted objs visited.

At a glance, some of the collections look fairly sad, like collection 9 when over 200k objects are visited only to free 7 objects.  No telling how many JS objects that freed, though, or how big the involved objects are.
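The stats lines described above can be parsed mechanically.  Here is a minimal sketch, assuming each line is the counter name followed by comma-separated per-collection values (the exact delimiter in the real NSPR log may differ):

```python
def parse_stats(lines):
    """Parse stat-log lines of the form 'name: v1, v2, v3, ...'
    into a dict mapping each counter name to its per-collection values."""
    stats = {}
    for line in lines:
        name, _, rest = line.partition(":")
        # Treat commas as whitespace so trailing commas are harmless.
        values = [int(v) for v in rest.replace(",", " ").split()]
        stats[name.strip()] = values
    return stats
```

With data in this shape, it is easy to ask questions like "how many nodes were collected in the third CC" by indexing into the list for mCollectedNode.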
> mCollectedNode: number of nodes that were collected.  This does not include any
> JS objects that may have been part of the cycle.

The CC breaks the cycles, the next JS GC will dispose of these.
free_distrib.txt gives the distribution of the classes of the cycle collection participants freed by the cycle collector.  The top 3 account for 54.5%, 18.9% and 10.4% of the objects collected.  The 4th is only 3.2% of objects collected.

Unfortunately, I haven't figured out how to get the name of the classes, so the first column of the data is the address the cycle collector participant is located at.  The most common class has an address that is really far from the others, which is probably interesting.

I tried to make a Traverse to extract the name (DescribeNode is fed the name), but for some reason it just crashes the browser, even with a trivial visitor.
Attachment #516746 - Attachment mime type: application/octet-stream → text/plain
Attachment #516652 - Attachment mime type: application/octet-stream → text/plain
(In reply to comment #1)
> I've noticed that the number of objects that are Suspected() is greater than
> the number of objects that are Forgotten(), by a few hundred per collection
> (which is less than 1% of suspected objects).

Counting between what points? Note that unlinking might resuspect an object (if it was unlinked but not collected).
By collected I mean freed. This can happen because some of our unlinking implementations are incomplete (mostly due to uncertainty about the safety of unlinking all members) and because of the delayed unlinking of JS objects (delayed until the next GC).
There are more Suspects than Forgets both within a collector epoch (though some do have more Forgets than Suspects), and across the lifespan of the program.  Here's an example: 2040, 397, 169, 22, 291, 634, -249, 3904, -3151.  Which is a total of 4057 more Suspects than Forgets.  This is out of almost 1.3 million Suspects, so it is a pretty small percentage.  I'm mostly curious about it because the cycle collector comments are very dire about the consequences of not being purple safe.

That's a good point about Suspects and Forgets that happen during collection.  I can try to count them separately.
Looking at the code some more, the suspect and forget counters are not incremented when the scan is in progress, so there can't actually be any during the collection.  But reading over your comment again, I guess you meant ones that happen outside of a collection.
(In reply to comment #12)
> Looking at the code some more, the suspect and forget counters are not
> incremented when the scan is in progress, so there can't actually be any during
> the collection.

I don't think that's true, we ignore suspect/forget during traversal but not during unlinking. We should change the early return during traversal to asserts actually, we don't support addref/release (and thus suspect/forget) during traversal anymore since we changed the QI to nsCycleCollectionISupports to not do any refcounting.
Differences between Suspects and Forgets over the lifespan of the program sounds worrying though.
Though I guess technically you're both right and wrong :-). We ignore suspect and forget during traversal/scan, but not during unlinking, so over the whole collection (traversal+unlinking) we might get some new suspects.
Ah, right, I see that now.

I noticed that the existing CC pointer logging actually is logging Suspect/Forgets, so I removed my own code to do the same.  I wrote a little log processor (added as an attachment above).  The log processor matches up suspects and forgets, noting when something is suspected twice in a row, or forgotten without being suspected.  It also shows how many objects are "live" at the end (suspected, but not forgotten).

I went to a few random JS-heavy sites, and out of about 1.9 million Suspects, there were 4139 double suspects, and 66 objects left "live" at the end.  The total of these, 4205, matches the suspect/forget discrepancy generated by the stat logging.  That split is probably not very meaningful.

There were 0 instances of an unsuspected node being forgotten.
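The matching the log processor does can be sketched roughly as follows, assuming a hypothetical log format of "S <addr>" and "F <addr>" lines (the real pointer log's syntax differs):

```python
def match_suspects(lines):
    """Match Suspect (S) and Forget (F) events by address.

    Returns (double_suspects, unmatched_forgets, live), where `live` is
    the set of addresses suspected but never forgotten.
    """
    live = set()
    double_suspects = 0
    unmatched_forgets = 0
    for line in lines:
        op, addr = line.split()
        if op == "S":
            if addr in live:
                double_suspects += 1  # suspected twice in a row
            live.add(addr)
        elif op == "F":
            if addr in live:
                live.discard(addr)
            else:
                unmatched_forgets += 1  # forgotten without being suspected
    return double_suspects, unmatched_forgets, live
```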
Interesting. I wonder what those objects are.
I added a little flag that indicates what the current CC phase is, and then print that out whenever a suspect or forget is done.  This results in the following split counts for a short sample run on some JS heavy pages:

donegc: S=2759628, F=2661503, 
initialized: S=236107, F=229110, 
purple: F=99769, 
prep: S=3190, F=2790, 
forcedGC: S=92, F=2476, 
collectWhite: S=60826, F=55278, 

initialized and donegc are when the CC is running (initialized is the first chunk prior to the first time the CC is run).  Purple is the pseudo-forgetting done when the purple buffer is cleared out.  prep is the phase between the start of the CC and when the forced GC is potentially done.  I'm not sure what code is even run there.  forcedGC covers the part where the JS GC is forced to run.  collectWhite is during CollectWhite(), as you may guess.

There is a decent amount of suspect/forget activity during CollectWhite().  Most of this probably involves objects that are being cleared anyways (that will be removed from the purple buffers by the end of this phase), but some of this may involve objects outside of garbage cycles the collector has found.

I think it should be sound to clear the purple buffers at the end of this phase, as the cycle collector should find all garbage nodes: if removing an edge from a garbage node creates a garbage cycle, then the collector should already have discovered that new garbage cycle.

Just lopping out the forget/suspects done during the GC didn't get rid of all the unbalanced forget/suspects.

For a different, though similar, run this is the number of elements in the purple buffer after each CC: 164,49,54,1,0,159,21,21,0,2014,0.  So there are a few things in there, though a fairly small number compared to other sources of roots.
Andreas pointed out that ptrInfo has the class of each object already sitting around, so I was able to finish my code that spits out what objects are freed.  I had to combine a bunch of categories in the output because there were too many, but hopefully this gives a little bit of a sense of it.  There's more info about what kinds of JS Object, etc. show up if that is relevant.  "Other" is the combination of classes that showed up less than 100 times each.

63% are JS objects
14.9% are nsGenericElement (xhtml)
7.5% are nsGenericDOMDataNode

The next largest category is only 2.4% of the total, so it falls off quickly.
I just remembered that DEBUG_CC turns on WANT_ALL_TRACES by default, so all of the stats I've given so far are not accurate, as they are ignoring the GC color information.  That seems to drop mSetColorBlack from about 500k to 150k, which is a much less dire number.  GCMarkedNode drops from 200k to 12k, etc.
I found an off by one error in the enumeration of purple roots in debugging code (see bug 639675).  This accounts for 1 mismatched Suspect/Forget pair in every cycle collection with at least one purple root.
Depends on: 639675
I did some further investigations into the Suspect/Forget mismatches.  I realized that Suspect adds something to the purple buffer, and Forget removes it. At the same time, the purple buffer is cleared near the start of a collection.  Thus it isn't really possible to "leak" a Suspect: every suspect must either be matched by a Forget or it will be cleared out during the cycle collection.

It also means that the span we should consider when tracking the lifetime of a purple object is between clearings of the purple buffer, rather than from the end of one collection to the start of the next.  The latter split produces weird results because as noted above Suspects occur during the collection.

Once I had this idea, I could see in the log the precise moment objects should have been forgotten, but where this fact was not actually logged.  I started with an easy case, where 2 objects were in the purple buffer, but only 1 was recorded, which led me to the bug in the previous comment.

One thing I did to try to find that bug was to dump out the objects that are added to the graph by the language runtimes.  I noticed that one of the unaccounted for objects was in this set of objects.  After messing with my scripts a bit, I found that all of the remaining unaccounted for Suspected objects were added to the graph by the runtime.

I'm not entirely sure what that means, but at least I can account for what happens to all of the purple nodes.
With mark-based pruning in place, many fewer objects are visited by the JS.  On one JS-heavy example I looked at, the split of node visits was:
GC marked nodes 40k
GCunmarked nodes 210k
refcounted node 242k
With pruning in place, the ratio of marked:unmarked:RCed is about 0.16:0.88:1.  Compare this to a previous run without pruning, where the ratio was about 2.9:2.8:1.

Similarly, with pruning about 1 object was collected for every 4 black objects marked, compared to without pruning where 1 object was collected for every 124 black objects marked.

So the pruning really does cut out a huge number of nodes.  Sorry for the skewed statistics before.
"visited by the JS" should be "visited by the CC"
Can you define pruning?
I just mean the "pruning" that occurs when WANT_ALL_TRACES is not set (i.e., WantAllTraces() is false).  When this flag is not set, the CC will only add wrapped natives that are gray to the model heap, will skip JSContexts with outstanding requests, will only visit JS children that are gray, and won't traverse marked JS objects.
Ok, I think we should probably use the terms "full graph" and "gray graph" then. Also, dbaron pointed out the gray part of the graph can be smaller. We should measure that too. I will ask him for the bug # again.
Depends on: 640853
I've been working on visualization of the heap that the cycle collector traverses. With the DEBUG_CC compile flag set, and the XPCOM_DRAW_GRAPHS env var set, the existing FF code dumps information about the shadow heap that the collector builds.  It also dumps out information about the type of each reachable node (GCed vs ref counted), but not for non-root nodes.  This patch makes it so for every reachable node you can tell if it is a root or not, and whether it lives in the GC or ref counted heap.  Any object that appears in the graph, but has not had any other info about its node recorded must be a node the CC found to be garbage.
This Python script processes the graph information dumped by the CC with XPCOM_DRAW_GRAPHS turned on, and turns it into a DOT format, which can be processed by the Graphviz tools.  Use sfdp, as the other Graphviz tools can't handle large graphs.

GCed objects are squares, ref counted objects are circles.  Filled objects are roots.  If a GCed object is a root, that means the JS GC found that it is reachable from a root.  If an RCed object is a root, that means the CC found that it is a root.  Objects that the CC has decided are garbage are triangles.

This is a dumb script, that just reformats the input file, but it still is enough to give you a general sense of what the heap looks like.  It was enough to easily see that a bunch of the garbage cycles had a common form: a star with 9 points.  By tweaking the script a bit, and digging around in the raw graph dump, Andreas and I were able to find the class of the objects involved in these cycles.  Andreas filed this as bug 640853.  That bug also has an example dot file for a whole heap.
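The node-styling conventions above map onto Graphviz node statements along these lines (a sketch; the node names and attribute order are illustrative, not the script's exact output):

```python
def node_to_dot(addr, kind, is_root, is_garbage):
    """Emit one Graphviz node statement using the conventions above:
    GCed objects are boxes (squares), refcounted ones circles, roots are
    filled, and garbage nodes are triangles."""
    if is_garbage:
        shape = "triangle"
    else:
        shape = "box" if kind == "gc" else "circle"
    style = ', style="filled"' if is_root else ""
    return '"%s" [shape="%s"%s]' % (addr, shape, style)
```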
Attached file heap graph analyzer (obsolete) —
This Python script also parses in heap graphs, but it performs some analysis.  It parses the file into a graph, then uses a union-find algorithm with path compression to split the single graph into the constituent disjoint subgraphs.

It then finds all duplicates of certain graphs of size 1, 2 and 3 to reduce the clutter a bit.  If two graphs have the same shape, and the same node types, they will only be printed once.  The one that is printed has a count on it that indicates how many of that kind of graph were found.

After this combining, it dumps out a dot file.

There's also an option to carry out a ref count based collection on the input, which eliminates nodes not reachable from loops, which tend to be boring.

There are a couple of uses of this script that I can think of.  One is that it is easy to filter out graphs of certain sizes, if you want to look at one by itself (using a tool besides sfdp if you can carve out a piece that is small enough), or carry out other modifications to eliminate clutter.  Another is that by "Pythonizing" real Firefox cycle collector heap data I can carry out experiments on the effect of changes to the CC algorithm, and write those experiments in Python, which is much faster than modifying the C++ code, both because of the fast edit-compile loop and the high-level nature of the language.  It takes less than a second to process the data with my current sloppily written script, so the speed seems sufficient for this size of data set.
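The union-find pass the script performs can be sketched like this (a simplified version; the real script's data structures differ, and `edges`/`nodes` here are illustrative inputs):

```python
def connected_components(edges, nodes):
    """Split a graph into its disjoint (weakly) connected subgraphs
    using union-find with path compression."""
    parent = {n: n for n in nodes}

    def find(x):
        # Find the root, then path-compress: point each visited node at it.
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in edges:
        parent[find(a)] = find(b)

    comps = {}
    for n in nodes:
        comps.setdefault(find(n), set()).add(n)
    return list(comps.values())
```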
One fun little bit of data that I have extracted is the distribution of the sizes of these disconnected subgraphs that the CC visits.  The data is in the following format: graphSize=numberOfGraphs(totalNumberOfNodes)

1=1925(1925), 2=2898(5796), 3=532(1596), 4=30(120), 5=39(195), 6=12(72), 7=435(3045), 8=4(32), 9=6(54), 10=8(80), 11=32(352), 12=1(12), 13=18(234), 14=2(28), 15=2(30), 17=5(85), 18=2(36), 19=5(95), 20=6(120), 21=9(189), 23=1(23), 27=2(54), 29=1(29), 33=1(33), 35=1(35), 42=1(42), 49=2(98), 52=2(104), 55=2(110), 73=2(146), 81=1(81), 94=1(94), 222=1(222), 330=1(330), 374=1(374), 488=1(488), 617=1(617), 11501=1(11501),

Out of a total of 30k nodes, about a third are of graphs of size 1, 2 and 3.  Another third is in that single megastructure at the end.  In this cycle collection, the megastructure contained about 1100 garbage nodes.

Anyways, I think a huge number of the tiny graphs (and a fair number of the larger graphs) can be quickly eliminated, because they are almost all acyclic, and thus of no interest to the CC.  In many of these graphs, all nodes within a step or two have no out edges, and thus must be acyclic.

Not adding these nodes and edges to the model graph will save memory, and speed up the later scanning phases.  Andreas said I should file a bug for it, so I'll do that, and try to quantify exactly how many nodes could be eliminated using various approaches, then try to implement the approaches that seem most effective in the collector itself.
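The acyclicity argument above suggests an iterative pruning pass.  Here is a rough sketch of the idea, repeatedly deleting nodes with no outgoing edges, since such nodes cannot participate in a cycle (assuming `succ` maps every node to its, possibly empty, child set):

```python
def prune_acyclic(succ):
    """Iteratively remove nodes with no outgoing edges; such nodes
    cannot be part of a cycle, so the CC need not consider them.
    Returns the set of surviving nodes."""
    succ = {n: set(kids) for n, kids in succ.items()}  # defensive copy
    changed = True
    while changed:
        changed = False
        dead = [n for n, kids in succ.items() if not kids]
        for n in dead:
            del succ[n]
            changed = True
        # Dropping a node may leave its parents childless, so loop again.
        for kids in succ.values():
            kids.difference_update(dead)
    return set(succ)
```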
This is awesome work Andrew!
Depends on: 641243
I added some code to my object heap grapher to parse in the pointer_log to mark nodes that are purple (part of the initial root set).  The duplication removing code doesn't take purpleness into account yet, so using them together isn't necessarily accurate.  A pretty huge percentage of nodes are marked purple, so it isn't very useful.  There may be a bug in my code somewhere.
I noticed that the JS parts of the graphs tend to have a lot of identical nodes: let's define two JS nodes as identical if they have the same parents and the same children.  If a and b are identical, then we can remove b from the model graph without affecting the results. b is a JS node so we are never going to free it anyways.

Here are some stats from some consecutive CCs:

3015 / 23972 duplicate nodes ( 12 %)
994 / 11820 duplicate nodes ( 8 %)
23438 / 55459 duplicate nodes ( 42 %)
23800 / 47834 duplicate nodes ( 49 %)
23973 / 46726 duplicate nodes ( 51 %)
0 / 8684 duplicate nodes ( 0 %)
0 / 7660 duplicate nodes ( 0 %)

When the model heap is small, there isn't much duplication, but as the webpage loading frenzy reaches its crescendo, huge JS data structures are pulled into the CC graph, which have a ton of duplication (remember, only JS nodes can be duplicates, so this is a lower bound on the number of JS nodes!).  This is also kind of nice in that the sorts of CC graphs that have a lot of duplication seem to also have lower levels of acyclic nodes.  However, note that some duplicate nodes will also be acyclic, and thus would be removed by the other analysis.

Unfortunately, I can't really think of any way to compute these on the fly.  The biggest problem is computing the set of parents of an object.
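The offline duplicate analysis can be sketched as follows, with `is_js` a hypothetical predicate marking JS nodes, and `succ` assumed to map every node to its (possibly empty) child set:

```python
def count_duplicates(succ, is_js):
    """Count JS nodes that duplicate an earlier JS node, where two nodes
    are duplicates if they have identical parent and child sets."""
    # Computing the parent sets is the expensive part the comment above
    # mentions: it requires a full pass over every edge.
    preds = {n: set() for n in succ}
    for n, kids in succ.items():
        for k in kids:
            preds[k].add(n)
    seen = set()
    dupes = 0
    for n in succ:
        if not is_js(n):
            continue
        key = (frozenset(preds[n]), frozenset(succ[n]))
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes
```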
This patch adds some additional logging information to DEBUG_CC, to give more complete information for a creating a picture.  Mostly just minor tweaks to existing infrastructure.

To compile with this active, you must make sure DEBUG_CC is set in nsCycleCollector.h and nsCycleCollectionParticipant.h

Once you have built it, set the following environment variables:

export NSPR_LOG_MODULES=CycleCollectorStats:5
export NSPR_LOG_FILE=my_log_file.txt

export XPCOM_CC_DRAW_GRAPHS=1
  // enable dumping of the CC graph
export XPCOM_CC_LOG_POINTERS=1
  // log suspects and forgets, and some info about the purple buffer
export XPCOM_CC_REPORT_STATS=1
  // calc misc stats that can be handy

This will cause FF to dump out 3 kinds of log files
  cc-edges-*.log   
  pointer_info
  my_log_file.txt
On Windows, they will be in ., and on the Mac they will be in /
I have a couple of Python scripts that will process these logs and produce various analysis or graphs.  I will upload the latest version of those next, along with some explanation of how to use them.
Attachment #518878 - Attachment is obsolete: true
This is the latest revision of my CC heap graph analysis script.  It takes the logging output produced by the CC described in attachment 521871 [details] [diff] [review] and produces a PDF file as output.  As part of the parsing process, it splits the data into Python data structures, which can be further analyzed.  The script also includes some things like acyclicity analysis and duplication analysis.  There are also a variety of options for controlling what exactly is displayed on the graph, which can be useful for exploring various structures.  Read the start of the script to find the various options.
Attachment #518893 - Attachment is obsolete: true
I was reading over "Efficient On-the-Fly Cycle Collection" by Paz et al., which is a follow-up to the paper that the current cycle collector is based on.  They present what might be an interesting optimization for Firefox.  The basic idea is generational purple buffers.  Instead of having a single purple buffer, you have a number of them.  When an object gets suspected, it is knocked back to the youngest purple buffer.  This is okay to do because if an object is suspected, it must have been live at that point.  When a CC happens, only the oldest generation is actually traced, then the rest of the objects are moved into an old generation.

One twist for our setting compared to the paper is that the JS runtime adds a bunch of objects to be traced.  This is kind of a Suspect of those objects, but since the JS runtime keeps around a handle for those objects, I don't know that we can infer that an object being repeatedly Suspected by the JS runtime implies that it is actually live.  In my modelling below, I have taken a conservative approach and just always added the JS runtime objects to the set of objects to be traced (removing them from the rest of the purple buffer where applicable).  This accounts for 20% to 60% of objects.

This reduces the number of objects that are scanned in two ways:
1. It takes longer for an object to be scanned, so it is more likely it will die in that time.
2. Knocking re-suspected objects to the young buffer means that active objects won't be scanned.

I'm not sure how much the first effect really helps, because it seems like you are just trading a delay in when garbage is collected for improved time performance.  On the other hand, the incremental nature of the generational buffer allows you to make this tradeoff without making the purple buffer too big, saving space and reducing pause times compared to just not running the CC as often.

Using some suspect/forget log data I have sitting around, I rigged up a Python script to measure the second effect.  These are for a fairly short run with 7 CCs.  This isn't quite a base Firefox, but close enough I think.  The total includes all objects in the buffer at the end of the CCs, which includes a higher % of objects as the number of generations increases.  I'm not entirely sure how those should be accounted for.

1 generation (should be same as default behavior): 65217
2 generations: 53250
3 generations: 50875
4 generations: 49341
5 generations: 47736

5 generations is an improvement of 26.8% over the base behavior.  This is only a crude proxy for CC size, as a radically different number of objects are reachable from each purple root.  I could imagine that the roots that tend to survive are smaller, but who knows.

I could hack up a prototype of this by keeping a single purple buffer, but adding an age field to each entry.  When a purple object is suspected, the age field is reset to 0.  When the purple buffer is scanned, objects at the max age are added to the graph, and the age of all other objects is incremented.  There's some dire warning about how changing the representation of the purple buffer will require going to Suspect3 and Forget3, but hopefully I can rig it together well enough to at least get some numbers.
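The age-field prototype described above might look roughly like this (a sketch in Python rather than the actual nsCycleCollector change; class and method names are illustrative):

```python
class AgedPurpleBuffer:
    """Generational purple buffer simulated with an age field:
    Suspect resets an entry's age to 0; at CC time only entries at the
    maximum age are traced, and every other entry ages by one."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.ages = {}

    def suspect(self, obj):
        self.ages[obj] = 0  # (re)suspected: knock back to youngest generation

    def forget(self, obj):
        self.ages.pop(obj, None)

    def collect(self):
        # Trace only the oldest generation; everything else ages.
        traced = [o for o, age in self.ages.items() if age >= self.max_age]
        for o in traced:
            del self.ages[o]
        for o in self.ages:
            self.ages[o] += 1
        return traced
```

With two generations (max_age=1), an object that keeps getting re-suspected never reaches the traced generation, which is exactly the second effect measured above.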
Depends on: 649456
Depends on: 649532
Depends on: 653019
Depends on: 653191
Need to visualize CC once bug 335998 has a patch which doesn't leak.
(I think I'm down to one leak - xul rdf templates need still some tweaking.)
This is a more "pure" fixing up of the CC logging information than my previous patch, which was a bit of a monster.  The basic problem is that the CC logger doesn't print out information about the type of non-root nodes.  With this patch in place, it still won't print out info about the type of garbage nodes, but garbage nodes are fairly rare so it shouldn't matter.  My script relies on data being in this format.

To use this with my script:
1. enable DEBUG_CC in nsCycleCollector.h and nsCycleCollectionParticipant.h
2. comment out the sole call to ExplainLiveExpectedGarbage() in nsCycleCollector.cpp due to Bug 652985.
3. apply this patch
4. set the XPCOM_CC_DRAW_GRAPHS environment variable
export XPCOM_CC_DRAW_GRAPHS=1
5. run Firefox.  It will dump out a log file of the form cc-edges-*.log that contains edge information.  On my Mac, it puts these files in /, but I think in Windows and Linux they will be in the directory you invoke FF from.
6. run the python log processor (cc_grapher2 above):
python cc_grapher2 1
would process cc-edges-1.log and produce cc-edges-1.dot as output.
7. turn the dot file into a pdf file using GraphViz
sfdp -Tpdf -O cc-edges-1.dot
(on smaller files dot and twopi sometimes work)

WANT_ALL_TRACES tends to make the graphs stupendously immense, so if you are mostly interested in DOM stuff you should comment out the part of the code where it is set.  You still want the other one, DEBUG_INFO or whatnot.

I'll try to upload my updated script at some point (though the only notable additional feature is collapsing death stars).
Attachment #521871 - Attachment is obsolete: true
Depends on: 658386
Depends on: 658672
Depends on: 666417
Depends on: 668695
Depends on: 663920
Depends on: 670283
Depends on: 671661
Depends on: 697115
Depends on: 697134
Bug 649532 landed in September, and made the CC log dump [1] produce more complete information.  I've started updating my CC-dump-to-graphviz-format script to work with this new format, using my parsing library.  Grab my heap graph repo at github [2] and use the misnamed dotify.py script.  I don't yet have all of the nice features working, but it kind of works.

[1] https://wiki.mozilla.org/Performance:Leak_Tools#Cycle_collector_heap_dump
[2] https://github.com/amccreight/heapgraph/tree/master/cc
Depends on: 698629
Depends on: 698919
Depends on: 701878
Depends on: 706164
No longer depends on: 698919
Depends on: 698919
I haven't really put anything in here in a while so I'm just going to close it.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE