Closed
Bug 556133
Opened 14 years ago
Closed 2 years ago
GC Performance Graphs
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
INACTIVE
People
(Reporter: gwagner, Unassigned)
Details
Attachments
(13 files)
9.14 KB,
image/png
|
Details | |
9.37 KB,
image/png
|
Details | |
9.43 KB,
image/png
|
Details | |
1.75 KB,
text/plain
|
Details | |
6.31 KB,
image/png
|
Details | |
6.55 KB,
image/png
|
Details | |
6.37 KB,
image/png
|
Details | |
9.10 KB,
image/png
|
Details | |
7.75 KB,
image/png
|
Details | |
7.75 KB,
image/png
|
Details | |
5.39 KB,
image/png
|
Details | |
5.29 KB,
image/png
|
Details | |
5.94 KB,
image/png
|
Details |
All our GC performance numbers and graphs are posted in different bugs and hard to find. Collecting the graphs makes it easier to keep track of the GC performance.
Reporter | ||
Comment 1•14 years ago
|
||
The GC pause time since last week increased on this benchmark. The time we spend in DestoyChunks increased from about 40 to 60 cycles * 1E6. Is there a bug that caused that? I also applied the OS X update today. Maybe that's the reason.
Reporter | ||
Comment 2•14 years ago
|
||
I also cleaned my mozconfig file and removed all the crosscompile flags that were necessary for OS X 10.6 at the beginning. The GC pause time looks much worse with 64 bit :( The DestroyChunk function is the bottleneck!
Reporter | ||
Comment 3•14 years ago
|
||
This graph is created with an optimized browser build. I start the clock benchmark, open the firefox topcrash page, close it after about 20 seconds and close the browser. The graph shows that long lived objects are the bottleneck here. It seems we spend all the time in the object finalizer where we iterate over live objects. The compartment approach should solve this bottleneck.
Comment 4•14 years ago
|
||
(In reply to comment #3) > The graph shows that long lived objects are the bottleneck here. ... > It seems we spend all the time in the object finalizer where we iterate over > live objects. From the graph I see that the mark time is significantly less than object finalization time. The mark time includes marking of live objects. This is significantly more complex than skipping over them during the finalization. From that I see that it is finalization of short-lived objects that takes time, not long-lived. Thus it would be interesting to know what exactly in FinalizeObject from jsgc.cpp takes most of the time.
Reporter | ||
Comment 5•14 years ago
|
||
> From the graph I see that the mark time is significantly less than object
> finalization time. The mark time includes marking of live objects. This is
> significantly more complex than skipping over them during the finalization.
> From that I see that it is finalization of short-lived objects that takes time,
> not long-lived. Thus it would be interesting to know what exactly in
> FinalizeObject from jsgc.cpp takes most of the time.
I will dig deeper into it and see where we spend all the time.
Safari in comparison calls the GC more often but the GC pause is much shorter.
A typical GC run with the clock benchmark and the crash page looks like:
mark: 21.839928, sweep: 0.000064, m+s: 21.839992, end-enter: 21.862544
mark: 22.412496, sweep: 0.000064, m+s: 22.412560, end-enter: 22.435472
Comment 6•14 years ago
|
||
Question: How do I read these graphs? In particular, what is the X axis?
Reporter | ||
Comment 7•14 years ago
|
||
The x-axis represents the GC events. So one entry on the X axis per GC run. The (left) Y axis represents the cycles[1e6] we spend for the GC run. The total value is cumulative mark and sweep. The sweep phase is also divided into finalize objects, doubles and string.
Reporter | ||
Comment 8•14 years ago
|
||
I was wrong with the guess that the long living objects are the bottleneck. It seems that we run the GC less frequent and therefore generate more garbage. The Finalizer function for Objects sees the following: Just Clock benchmark: thingKind: 0, cycles[1e6]: 69.333576 , arenas: 13845, objects: 899925, finalizer called: 876077 Clock benchmark + open crashstat page thingKind: 0, cycles[1e6]: 238.623136 , arenas: 47875, objects: 3111875, finalizer called: 3036280 The cycles represent the cycles we spend in the FinalizeArenaList function for objects. arenas represent iterations in the first for(;;) loop objects represent iterations in the second for (;; thing += sizeof(T), gcCellIndex += sizeof(T) >> GC_CELL_SHIFT) finalizer called counts how often the finalizer was called. So we have about 3-4 times more dead objects and therefore the GC pause is much longer. For the crashstat page I use following URL: http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6
Reporter | ||
Comment 9•14 years ago
|
||
Reporter | ||
Updated•14 years ago
|
Assignee: general → anygregor
Reporter | ||
Comment 10•13 years ago
|
||
Time for some updates here.
Summary: TM: GC Performance Graphs → GC Performance Graphs
Reporter | ||
Comment 11•13 years ago
|
||
I wish I could add more graphs per post... Ok lets start with the V8 benchmark suite from the browser for Firefox 5. The GC pauses are about 20 - 22 msec that are mainly dominated by sweeping. The splay benchmark at the end of the suite is dominated by marking and sweeping and the GC pauses are up to 140 msec. Legend for the graphs: X-Axis: GC Events Y-Axis: red line: Total GC pause time in msec. green line: Marking Time blue line: Sweeping Time (Marking and Sweeping add up to total Time) For example the first GC takes 26msec where we spend 3msec marking and 21msec sweeping. It does not completely add up to the total time because there is other code involved that is not covered.
Reporter | ||
Comment 12•13 years ago
|
||
The current situation looks like follows: Background finalization of objects and strings (bug 627200) and the new marking stack (bug 616666) improved the situation. The time we spend in sweeping is almost completely gone. We have to identify if we can get rid of the 15msec sweepings in the middle of the benchmark suite. The spikes for the splay benchmarks at the end of the suite reduced from about 140 msec to around 55msec. The marking time is now the overall bottleneck.
Reporter | ||
Comment 13•13 years ago
|
||
The patch for bug 638660 tries to minimize the marking time by adding another thread that helps marking. The overall marking time is already very small with about 2-7msec but it reduces by another 20-40%. The marking time in the splay benchmark reduces from around 54msec to 36msec. So we reduced the GC pause time for the splay benchmark from 140msec (FF5) to 40msec. I guess that is the most we can get out of the current GC model as short term fixes.
Comment 14•13 years ago
|
||
Nice results. For the record, do you have the V8 benchmark total scores for those configurations?
Reporter | ||
Comment 15•13 years ago
|
||
(In reply to comment #14) > Nice results. For the record, do you have the V8 benchmark total scores for > those configurations? Sure. FF5 is a TM revision from the day after the FF5 branch. You know how "stable" the results are but the splay benchmark definitely shows a nice improvement. FF5: Score: 4536 Richards: 7925 DeltaBlue: 4661 Crypto: 7725 RayTrace: 3420 EarleyBoyer: 4598 RegExp: 1667 Splay: 5281 Current Trunk: Score: 4837 Richards: 7904 DeltaBlue: 4893 Crypto: 8701 RayTrace: 4091 EarleyBoyer: 4768 RegExp: 1655 Splay: 5704 With parallel marking: Score: 4927 Richards: 8014 DeltaBlue: 4979 Crypto: 8597 RayTrace: 4046 EarleyBoyer: 4772 RegExp: 1741 Splay: 6112
Comment 16•13 years ago
|
||
A 10% improvement overall => Excellent!
Reporter | ||
Comment 17•13 years ago
|
||
Lets continue with the Flight of the Navigator. For the FF5 implementation the GC pause time is caused by about two third marking and one third sweeping. Excluding startup and shutdown, the average GC pause time is about 120msec with peaks of 155msec.
Reporter | ||
Comment 18•13 years ago
|
||
The background finalization optimization reduced the average pause time to about 95msec. The bottleneck is now clearly the marking time.
Reporter | ||
Comment 19•13 years ago
|
||
The parallel Marking patch helps a lot. Excluding startup and shutdown, we have now total GC pauses of about 70msec with a peak of 80msec. Overall an reduction from 120msec with a peak of 155msec to 70msec with a peak of 80msec.
Reporter | ||
Comment 20•13 years ago
|
||
And the best one at the end! For the realtime raytracer the mark/sweep ration is about 50/50 in FF5. [http://29a.ch/2010/6/2/realtime-raytracing-in-javascript] Our average GC pause time is 210msec.
Reporter | ||
Comment 21•13 years ago
|
||
The background finalization helped a lot and our average GC pause time reduced to 93msec.
Reporter | ||
Comment 22•13 years ago
|
||
The parallel marking patch reduces the average marking time from 90msec to 74msec and the overall GC pause time is now about 77msec. There might still be room for some fine tuning in order to get better results. In comparison to FF5 we reduced the GC pause time from 210msec to 77msec!
Comment 23•2 years ago
|
||
The bug assignee didn't login in Bugzilla in the last 7 months.
:sdetar, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee: anygregor → nobody
Flags: needinfo?(sdetar)
Updated•2 years ago
|
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(sdetar)
Resolution: --- → INACTIVE
You need to log in
before you can comment on or make changes to this bug.
Description
•