Closed
Bug 377787
Opened 18 years ago
Closed 14 years ago
Cycle collector performance tracking
Categories
(Core Graveyard :: Tracking, defect)
Core Graveyard
Tracking
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: chofmann)
References
Details
(Keywords: meta, perf)
Attachments
(1 file, 1 obsolete file)
70.69 KB,
application/zip
|
Details |
This is a tracking bug for performance work on the cycle collector.
Updated•18 years ago
|
![]() |
Reporter | |
Comment 1•18 years ago
|
||
This is a profile of a build with the patches to bug 373693 applied.
To get this, I loaded a large bonsai query in one window, and browsed in another one. Every so often I moused over the bonsai window to trigger traversal of all that stuff.
I started and stopped jprof programmatically around the three calls to nsCycleCollector_collect in nsJSEnvironment; if we have other entry points into it (other than shutdown), I'd love to know.
The profile largely shows hashtable stuff, actually. I should try the patch in bug 377606; didn't realize it was there.
![]() |
Reporter | |
Comment 2•18 years ago
|
||
Actually, looks like I can't apply that on top of the patch to bug 373693. I'll just wait for more stuff to land.
![]() |
Reporter | |
Updated•18 years ago
|
Attachment #261877 -
Attachment is patch: false
Attachment #261877 -
Attachment mime type: text/plain → application/zip
changing Browser.sessionhistory.max_total_viewers in about:config from -1 to 3 produced a noticeable performance increase. perhaps an at least temporary "workaround" would be to submit a patch that changes it to (at least) 3 upon install and maybe if its possible changes it if ff is already installed.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a4pre) Gecko/20070417 Minefield/3.0a4pre - Build ID: 2007041704
![]() |
Reporter | |
Comment 4•18 years ago
|
||
This was generated the same way that the profile in comment 1 was generated, but in a build with the patches for bug 372110, bug 373693, bug 374872, bug 377606 applied (basically, all the patches checked in as of right now as far as I can tell). Every so often I hovered the bonsai page to make sure it was purple (from the mousemove event dispatch).
There are 455673 hits under nsCycleCollector::Collect, broken down mostly as as:
191121 nsCycleCollector::MarkRoots()
164912 nsCycleCollector::ScanRoots()
43962 nsCycleCollector::CollectWhite()
31329 nsXPConnect::BeginCycleCollection()
9144 PL_DHashTableFinish
7560 nsXPConnect::FinishCycleCollection()
6565 memset
nsXPConnect::BeginCycleCollection is spent under JS_GC, memset (allocating the hashtable, I guess), and PL_DHashTableFinish.
nsCycleCollector::CollectWhite just calls PL_DHashTableEnumerate, which pretty much just calls FindWhiteCallback. About 40% of the time is spent in FindWhiteCallback itself, another 30% under nsCycleCollector::Forget, 5% QI on nsTextNodes, 5% UnmarkPurple on generic elements and text nodes, 10% in QI on various HTML elements.
nsCycleCollector::ScanRoots basically calls GraphWalker::Walk, which more or less calls nsCycleCollectionXPCOMRuntime::Traverse (there is a detour that gets there through nsXPConnect::Traverse calling GraphWalker::DescribeNode calling scanWalker::VisitNode calling GraphWalker::Walk). The time under there is basically split between:
canonicalize() -- 24% of ScanRoots
PL_DHashTableOperate -- 27% of ScanRoots, split up by caller more or less as:
50% -- GraphWalker::NoteXPCOMChild
25% -- GraphWalker::Walk
10% -- nsDocument::GetReference
10% -- nsBindingManager::GetBinding
3% -- GraphWalker::NoteScriptChild
2% -- nsContentUtils::TraverseListenerManager
ToParticipant -- 10% of ScanRoots.
nsDeque push and pop -- 8% of ScanRoots (from Walk and NoteXPCOMChild).
ScanBlackWalker::ShouldVisitNode -- 2.5% of ScanRoots
In (not under) GraphWalker::NoteXPCOMChild -- 2.5% of ScanRoots
In (not under) nsGenericElement::cycleCollection::Traverse -- 4% of ScanRoots
Various other non-hotspotty stuff.
nsCycleCollector::MarkRoots calls GraphWalker::Walk, which mostly calls nsCycleCollectionXPCOMRuntime::Traverse, etc. From the bottom up, we have:
canonicalize() -- 20% of MarkRoots
PL_DHashTableOperate -- 37% of MarkRoots, split up by caller more or less as:
66% -- GraphWalker::NoteXPCOMChild
16% -- GraphWalker::Walk
6% -- nsDocument::GetReference
4% -- nsBindingManager::GetBinding
2% -- GraphWalker::NoteScriptChild
2% -- nsContentUtils::TraverseListenerManager
ToParticipant -- 9% of MarkRoots.
nsDeque push and pop -- 6% of MarkRoots (from Walk and NoteXPCOMChild).
In (not under) GraphWalker::NoteXPCOMChild -- 2.5% of MarkRoots
In (not under) nsGenericElement::cycleCollection::Traverse -- 3% of ScanRoots
Various other non-hotspotty stuff, I think.
Things that jump out at me: canonicalize() and ToParticipant() together make up
24% of the total cycle collection time. I wonder whether there's some way to at least cache the results between ScanRoots and MarkRoots.
PL_DHashTableOperate makes up another 25% of the total. Some of these tables we could try to optimize a bit; for example, could we use bits on nodes to indicate whether they have a binding or a preserved wrapper so that we don't have to hit the hashtable for all the ones that do not?
Basic problem really is that we're walking a _lot_ of stuff here. :( The pause I was profiling is about 3 seconds long on my machine, without jprof running.
Note: I got some of the data above using -i and -e options to jprof; I'm not sure it's possible to get some of those numbers directly out of the profile that I'm attaching.
Attachment #261877 -
Attachment is obsolete: true
![]() |
Reporter | |
Comment 5•18 years ago
|
||
Filed bug 378389 on the preserved wrapper table and bug 378390 on the binding manager table.
Depends on: 378514
(In reply to comment #4)
5% QI on
> nsTextNodes, 10% in QI on
> various HTML elements.
Could we improve QIing, changing the order of NS_INTERFACE_MAP_ENTRYs.
![]() |
Reporter | |
Comment 7•18 years ago
|
||
Yes, we could. That could hurt performance elsewhere, of course....
Comment 8•18 years ago
|
||
Yes, improving QI is 1 of the 3 things we're going to try next:
1) Don't traverse some root objects (probably nsGlobalWindow and nsDocument) that
hang on to a large number of other objects, if we can know for sure that they
are alive (black). Sicking will file a bug and make a patch.
2) Improve QI by moving cycle collection entries. I will file a bug and make a
patch.
3) Record the traversal, so that we can play it back instead of traversing
multiple times. Dbaron filed bug 378514 for this.
Depends on: 378987
Comment 9•17 years ago
|
||
Can we mark this fixed, or is there more work to do?
Depends on: 470684
Depends on: 491364
Depends on: 490695
Comment 10•14 years ago
|
||
I'm going to mark this fixed as it hasn't been touched in a few years. Later CC perf tracking bugs: bug 549533 and bug 698919.
Updated•14 years ago
|
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•9 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•