Last Comment Bug 377787 - Cycle collector performance tracking
: Cycle collector performance tracking
Status: RESOLVED FIXED
: meta, perf
Product: Core Graveyard
Classification: Graveyard
Component: Tracking (show other bugs)
: Trunk
: All All
: -- normal with 5 votes (vote)
: ---
Assigned To: chris hofmann
: chris hofmann
Mentors:
Depends on: 372110 373693 373694 374872 377606 377884 378389 378390 378514 378595 378987 379593 470684 490695 491364
Blocks: 698919 cycle-collector
  Show dependency treegraph
 
Reported: 2007-04-17 10:19 PDT by Boris Zbarsky [:bz] (still a bit busy)
Modified: 2016-07-15 12:13 PDT (History)
31 users (show)
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Zipped-up profile as of today (590.78 KB, application/zip)
2007-04-17 16:04 PDT, Boris Zbarsky [:bz] (still a bit busy)
no flags Details
Updated profile (70.69 KB, application/zip)
2007-04-21 23:59 PDT, Boris Zbarsky [:bz] (still a bit busy)
no flags Details

Description Boris Zbarsky [:bz] (still a bit busy) 2007-04-17 10:19:24 PDT
This is a tracking bug for performance work on the cycle collector.
Comment 1 Boris Zbarsky [:bz] (still a bit busy) 2007-04-17 16:04:57 PDT
Created attachment 261877 [details]
Zipped-up profile as of today

This is a profile of a build with the patches to bug 373693 applied.

To get this, I loaded a large bonsai query in one window, and browsed in another one.  Every so often I moused over the bonsai window to trigger traversal of all that stuff.

I started and stopped jprof programmatically around the three calls to nsCycleCollector_collect in nsJSEnvironment; if we have other entry points into it (other than shutdown), I'd love to know.

The profile largely shows hashtable stuff, actually.  I should try the patch in bug 377606; didn't realize it was there.
Comment 2 Boris Zbarsky [:bz] (still a bit busy) 2007-04-17 16:06:26 PDT
Actually, looks like I can't apply that on top of the patch to bug 373693.  I'll just wait for more stuff to land.
Comment 3 echoes 2007-04-17 22:06:49 PDT
changing Browser.sessionhistory.max_total_viewers in about:config from -1 to 3 produced  a noticeable performance increase.  perhaps an at least temporary "workaround" would be to submit a patch that changes it to (at least) 3 upon install and maybe if its possible changes it if ff is already installed.

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a4pre) Gecko/20070417 Minefield/3.0a4pre - Build ID: 2007041704
Comment 4 Boris Zbarsky [:bz] (still a bit busy) 2007-04-21 23:59:08 PDT
Created attachment 262393 [details]
Updated profile

This was generated the same way that the profile in comment 1 was generated, but in a build with the patches for bug 372110, bug 373693, bug 374872, bug 377606 applied (basically, all the patches checked in as of right now as far as I can tell).  Every so often I hovered the bonsai page to make sure it was purple (from the mousemove event dispatch).

There are 455673 hits under nsCycleCollector::Collect, broken down mostly as as:

             191121 nsCycleCollector::MarkRoots()
             164912 nsCycleCollector::ScanRoots()
              43962 nsCycleCollector::CollectWhite()
              31329 nsXPConnect::BeginCycleCollection()
               9144 PL_DHashTableFinish
               7560 nsXPConnect::FinishCycleCollection()
               6565 memset

nsXPConnect::BeginCycleCollection is spent under JS_GC, memset (allocating the hashtable, I guess), and PL_DHashTableFinish.

nsCycleCollector::CollectWhite just calls PL_DHashTableEnumerate, which pretty much just calls FindWhiteCallback.  About 40% of the time is spent in FindWhiteCallback itself, another 30% under nsCycleCollector::Forget, 5% QI on nsTextNodes, 5% UnmarkPurple on generic elements and text nodes, 10% in QI on various HTML elements.

nsCycleCollector::ScanRoots basically calls GraphWalker::Walk, which more or less calls nsCycleCollectionXPCOMRuntime::Traverse (there is a detour that gets there through nsXPConnect::Traverse calling GraphWalker::DescribeNode calling scanWalker::VisitNode calling GraphWalker::Walk).  The time under there is basically split between:

  canonicalize() -- 24% of ScanRoots
  PL_DHashTableOperate -- 27% of ScanRoots, split up by caller more or less as:
    50% -- GraphWalker::NoteXPCOMChild
    25% -- GraphWalker::Walk
    10% -- nsDocument::GetReference
    10% -- nsBindingManager::GetBinding
     3% -- GraphWalker::NoteScriptChild
     2% -- nsContentUtils::TraverseListenerManager
  ToParticipant -- 10% of ScanRoots.
  nsDeque push and pop -- 8% of ScanRoots (from Walk and NoteXPCOMChild).
  ScanBlackWalker::ShouldVisitNode -- 2.5% of ScanRoots
  In (not under) GraphWalker::NoteXPCOMChild -- 2.5% of ScanRoots
  In (not under) nsGenericElement::cycleCollection::Traverse -- 4% of ScanRoots
  Various other non-hotspotty stuff.

nsCycleCollector::MarkRoots calls GraphWalker::Walk, which mostly calls nsCycleCollectionXPCOMRuntime::Traverse, etc.  From the bottom up, we have:

  canonicalize() -- 20% of MarkRoots
  PL_DHashTableOperate -- 37% of MarkRoots, split up by caller more or less as:
    66% -- GraphWalker::NoteXPCOMChild
    16% -- GraphWalker::Walk
     6% -- nsDocument::GetReference
     4% -- nsBindingManager::GetBinding
     2% -- GraphWalker::NoteScriptChild
     2% -- nsContentUtils::TraverseListenerManager
  ToParticipant -- 9% of MarkRoots.
  nsDeque push and pop -- 6% of MarkRoots (from Walk and NoteXPCOMChild).
  In (not under) GraphWalker::NoteXPCOMChild -- 2.5% of MarkRoots
  In (not under) nsGenericElement::cycleCollection::Traverse -- 3% of ScanRoots
  Various other non-hotspotty stuff, I think.

Things that jump out at me:  canonicalize() and ToParticipant() together make up 
24% of the total cycle collection time.  I wonder whether there's some way to at least cache the results between ScanRoots and MarkRoots.

PL_DHashTableOperate makes up another 25% of the total.  Some of these tables we could try to optimize a bit; for example, could we use bits on nodes to indicate whether they have a binding or a preserved wrapper so that we don't have to hit the hashtable for all the ones that do not?

Basic problem really is that we're walking a _lot_ of stuff here.  :(  The pause I was profiling is about 3 seconds long on my machine, without jprof running.

Note:  I got some of the data above using -i and -e options to jprof; I'm not sure it's possible to get some of those numbers directly out of the profile that I'm attaching.
Comment 5 Boris Zbarsky [:bz] (still a bit busy) 2007-04-22 13:04:45 PDT
Filed bug 378389 on the preserved wrapper table and bug 378390 on the binding manager table.
Comment 6 Olli Pettay [:smaug] (way behind * queues, especially ni? queue) 2007-04-24 01:33:04 PDT
(In reply to comment #4)
 5% QI on
> nsTextNodes,  10% in QI on
> various HTML elements.

Could we improve QIing, changing the order of NS_INTERFACE_MAP_ENTRYs.
Comment 7 Boris Zbarsky [:bz] (still a bit busy) 2007-04-24 02:06:37 PDT
Yes, we could.  That could hurt performance elsewhere, of course....
Comment 8 Peter Van der Beken [:peterv] 2007-04-24 02:18:27 PDT
Yes, improving QI is 1 of the 3 things we're going to try next:

1) Don't traverse some root objects (probably nsGlobalWindow and nsDocument) that
   hang on to a large number of other objects, if we can know for sure that they
   are alive (black). Sicking will file a bug and make a patch.
2) Improve QI by moving cycle collection entries. I will file a bug and make a
   patch.
3) Record the traversal, so that we can play it back instead of traversing
   multiple times. Dbaron filed bug 378514 for this.
Comment 9 Peter Van der Beken [:peterv] 2008-01-30 18:37:39 PST
Can we mark this fixed, or is there more work to do?
Comment 10 Andrew McCreight [:mccr8] 2011-11-30 08:43:30 PST
I'm going to mark this fixed as it hasn't been touched in a few years.  Later CC perf tracking bugs: bug 549533 and bug 698919.

Note You need to log in before you can comment on or make changes to this bug.