Closed Bug 377787 Opened 15 years ago Closed 10 years ago

Cycle collector performance tracking


(Core Graveyard :: Tracking, defect)

Not set


(Not tracked)



(Reporter: bzbarsky, Assigned: chofmann)


(Blocks 1 open bug)


(Keywords: meta, perf)


(1 file, 1 obsolete file)

This is a tracking bug for performance work on the cycle collector.
Depends on: 377606
Keywords: meta, perf
OS: Linux → All
Hardware: PC → All
Attached file Zipped-up profile as of today (obsolete) —
This is a profile of a build with the patches to bug 373693 applied.

To get this, I loaded a large bonsai query in one window, and browsed in another one.  Every so often I moused over the bonsai window to trigger traversal of all that stuff.

I started and stopped jprof programmatically around the three calls to nsCycleCollector_collect in nsJSEnvironment; if we have other entry points into it (other than shutdown), I'd love to know.

The profile largely shows hashtable stuff, actually.  I should try the patch in bug 377606; didn't realize it was there.
Actually, looks like I can't apply that on top of the patch to bug 373693.  I'll just wait for more stuff to land.
Attachment #261877 - Attachment is patch: false
Attachment #261877 - Attachment mime type: text/plain → application/zip
changing Browser.sessionhistory.max_total_viewers in about:config from -1 to 3 produced  a noticeable performance increase.  perhaps an at least temporary "workaround" would be to submit a patch that changes it to (at least) 3 upon install and maybe if its possible changes it if ff is already installed.

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a4pre) Gecko/20070417 Minefield/3.0a4pre - Build ID: 2007041704
Depends on: 377884
Depends on: 372110
Attached file Updated profile
This was generated the same way that the profile in comment 1 was generated, but in a build with the patches for bug 372110, bug 373693, bug 374872, bug 377606 applied (basically, all the patches checked in as of right now as far as I can tell).  Every so often I hovered the bonsai page to make sure it was purple (from the mousemove event dispatch).

There are 455673 hits under nsCycleCollector::Collect, broken down mostly as as:

             191121 nsCycleCollector::MarkRoots()
             164912 nsCycleCollector::ScanRoots()
              43962 nsCycleCollector::CollectWhite()
              31329 nsXPConnect::BeginCycleCollection()
               9144 PL_DHashTableFinish
               7560 nsXPConnect::FinishCycleCollection()
               6565 memset

nsXPConnect::BeginCycleCollection is spent under JS_GC, memset (allocating the hashtable, I guess), and PL_DHashTableFinish.

nsCycleCollector::CollectWhite just calls PL_DHashTableEnumerate, which pretty much just calls FindWhiteCallback.  About 40% of the time is spent in FindWhiteCallback itself, another 30% under nsCycleCollector::Forget, 5% QI on nsTextNodes, 5% UnmarkPurple on generic elements and text nodes, 10% in QI on various HTML elements.

nsCycleCollector::ScanRoots basically calls GraphWalker::Walk, which more or less calls nsCycleCollectionXPCOMRuntime::Traverse (there is a detour that gets there through nsXPConnect::Traverse calling GraphWalker::DescribeNode calling scanWalker::VisitNode calling GraphWalker::Walk).  The time under there is basically split between:

  canonicalize() -- 24% of ScanRoots
  PL_DHashTableOperate -- 27% of ScanRoots, split up by caller more or less as:
    50% -- GraphWalker::NoteXPCOMChild
    25% -- GraphWalker::Walk
    10% -- nsDocument::GetReference
    10% -- nsBindingManager::GetBinding
     3% -- GraphWalker::NoteScriptChild
     2% -- nsContentUtils::TraverseListenerManager
  ToParticipant -- 10% of ScanRoots.
  nsDeque push and pop -- 8% of ScanRoots (from Walk and NoteXPCOMChild).
  ScanBlackWalker::ShouldVisitNode -- 2.5% of ScanRoots
  In (not under) GraphWalker::NoteXPCOMChild -- 2.5% of ScanRoots
  In (not under) nsGenericElement::cycleCollection::Traverse -- 4% of ScanRoots
  Various other non-hotspotty stuff.

nsCycleCollector::MarkRoots calls GraphWalker::Walk, which mostly calls nsCycleCollectionXPCOMRuntime::Traverse, etc.  From the bottom up, we have:

  canonicalize() -- 20% of MarkRoots
  PL_DHashTableOperate -- 37% of MarkRoots, split up by caller more or less as:
    66% -- GraphWalker::NoteXPCOMChild
    16% -- GraphWalker::Walk
     6% -- nsDocument::GetReference
     4% -- nsBindingManager::GetBinding
     2% -- GraphWalker::NoteScriptChild
     2% -- nsContentUtils::TraverseListenerManager
  ToParticipant -- 9% of MarkRoots.
  nsDeque push and pop -- 6% of MarkRoots (from Walk and NoteXPCOMChild).
  In (not under) GraphWalker::NoteXPCOMChild -- 2.5% of MarkRoots
  In (not under) nsGenericElement::cycleCollection::Traverse -- 3% of ScanRoots
  Various other non-hotspotty stuff, I think.

Things that jump out at me:  canonicalize() and ToParticipant() together make up 
24% of the total cycle collection time.  I wonder whether there's some way to at least cache the results between ScanRoots and MarkRoots.

PL_DHashTableOperate makes up another 25% of the total.  Some of these tables we could try to optimize a bit; for example, could we use bits on nodes to indicate whether they have a binding or a preserved wrapper so that we don't have to hit the hashtable for all the ones that do not?

Basic problem really is that we're walking a _lot_ of stuff here.  :(  The pause I was profiling is about 3 seconds long on my machine, without jprof running.

Note:  I got some of the data above using -i and -e options to jprof; I'm not sure it's possible to get some of those numbers directly out of the profile that I'm attaching.
Attachment #261877 - Attachment is obsolete: true
Filed bug 378389 on the preserved wrapper table and bug 378390 on the binding manager table.
Depends on: 378389, 378390
Depends on: 378514
(In reply to comment #4)
 5% QI on
> nsTextNodes,  10% in QI on
> various HTML elements.

Could we improve QIing, changing the order of NS_INTERFACE_MAP_ENTRYs.
Yes, we could.  That could hurt performance elsewhere, of course....
Yes, improving QI is 1 of the 3 things we're going to try next:

1) Don't traverse some root objects (probably nsGlobalWindow and nsDocument) that
   hang on to a large number of other objects, if we can know for sure that they
   are alive (black). Sicking will file a bug and make a patch.
2) Improve QI by moving cycle collection entries. I will file a bug and make a
3) Record the traversal, so that we can play it back instead of traversing
   multiple times. Dbaron filed bug 378514 for this.
Depends on: 378595
Depends on: 378987
Depends on: 379593
Can we mark this fixed, or is there more work to do?
Depends on: 470684
Depends on: 491364
Depends on: 490695
I'm going to mark this fixed as it hasn't been touched in a few years.  Later CC perf tracking bugs: bug 549533 and bug 698919.
Blocks: 698919
Closed: 10 years ago
Resolution: --- → FIXED
No longer blocks: 698919
Blocks: 698919
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.