Status

()

9 years ago
7 years ago

People

(Reporter: gwagner, Assigned: gwagner)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(13 attachments)

(Assignee)

Description

9 years ago
All our GC performance numbers and graphs are posted in different bugs and hard to find.
Collecting the graphs makes it easier to keep track of the GC performance.
(Assignee)

Comment 1

9 years ago
Created attachment 436054 [details]
Canopy Mac

The GC pause time since last week increased on this benchmark.
The time we spend in DestoyChunks increased from about 40 to 60 cycles * 1E6.
Is there a bug that caused that? I also applied the OS X update today. Maybe that's the reason.
(Assignee)

Comment 2

9 years ago
Created attachment 436058 [details]
Canopy Mac not cross compiled

I also cleaned my mozconfig file and removed all the crosscompile flags that were necessary for OS X 10.6 at the beginning. 

The GC pause time looks much worse with 64 bit :(
The DestroyChunk function is the bottleneck!
(Assignee)

Comment 3

9 years ago
Created attachment 445256 [details]
Tip: clock benchmark + topcrash site

This graph is created with an optimized browser build.
I start the clock benchmark, open the firefox topcrash page, close it after about 20 seconds and close the browser.

The graph shows that long lived objects are the bottleneck here. 
It seems we spend all the time in the object finalizer where we iterate over live objects.

The compartment approach should solve this bottleneck.

Comment 4

9 years ago
(In reply to comment #3)
> The graph shows that long lived objects are the bottleneck here.
...
> It seems we spend all the time in the object finalizer where we iterate over
> live objects.

From the graph I see that the mark time is significantly less than object finalization time. The mark time includes marking of live objects. This is significantly more complex than skipping over them during the finalization. From that I see that it is finalization of short-lived objects that takes time, not long-lived. Thus it would be interesting to know what exactly in FinalizeObject from jsgc.cpp takes most of the time.
(Assignee)

Comment 5

9 years ago
> From the graph I see that the mark time is significantly less than object
> finalization time. The mark time includes marking of live objects. This is
> significantly more complex than skipping over them during the finalization.
> From that I see that it is finalization of short-lived objects that takes time,
> not long-lived. Thus it would be interesting to know what exactly in
> FinalizeObject from jsgc.cpp takes most of the time.

I will dig deeper into it and see where we spend all the time.

Safari in comparison calls the GC more often but the GC pause is much shorter.
A typical GC run with the clock benchmark and the crash page looks like:
mark: 21.839928, sweep: 0.000064, m+s: 21.839992, end-enter: 21.862544
mark: 22.412496, sweep: 0.000064, m+s: 22.412560, end-enter: 22.435472
Question: How do I read these graphs? In particular, what is the X axis?
(Assignee)

Comment 7

9 years ago
The x-axis represents the GC events. So one entry on the X axis per GC run.
The (left) Y axis represents the cycles[1e6] we spend for the GC run. 
The total value is cumulative mark and sweep. 
The sweep phase is also divided into finalize objects, doubles and string.
(Assignee)

Comment 8

9 years ago
I was wrong with the guess that the long living objects are the bottleneck.
It seems that we run the GC less frequent and therefore generate more garbage.

The Finalizer function for Objects sees the following: 
Just Clock benchmark:
thingKind: 0, cycles[1e6]: 69.333576 , arenas: 13845, objects: 899925, finalizer called: 876077

Clock benchmark + open crashstat page
thingKind: 0, cycles[1e6]: 238.623136 , arenas: 47875, objects: 3111875, finalizer called: 3036280

The cycles represent the cycles we spend in the FinalizeArenaList function for objects.
arenas represent iterations in the first for(;;) loop
objects represent iterations in the second  for (;; thing += sizeof(T), gcCellIndex += sizeof(T) >> GC_CELL_SHIFT)
finalizer called counts how often the finalizer was called.

So we have about 3-4 times more dead objects and therefore the GC pause is much longer.

For the crashstat page I use following URL:
http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6
(Assignee)

Comment 9

9 years ago
Created attachment 458035 [details]
Current V8 GC Performance
(Assignee)

Updated

9 years ago
Assignee: general → anygregor
(Assignee)

Comment 10

8 years ago
Time for some updates here.
Summary: TM: GC Performance Graphs → GC Performance Graphs
(Assignee)

Comment 11

8 years ago
Created attachment 528440 [details]
V8 browser benchmark. FF5

I wish I could add more graphs per post...
Ok lets start with the V8 benchmark suite from the browser for Firefox 5.
The GC pauses are about 20 - 22 msec that are mainly dominated by sweeping. 
The splay benchmark at the end of the suite is dominated by marking and sweeping and the GC pauses are up to 140 msec. 

Legend for the graphs:
X-Axis: GC Events
Y-Axis: 
red line:   Total GC pause time in msec.
green line: Marking Time
blue line:  Sweeping Time
(Marking and Sweeping add up to total Time)
For example the first GC takes 26msec where we spend 3msec marking and 21msec sweeping. It does not completely add up to the total time because there is other code involved that is not covered.
(Assignee)

Comment 12

8 years ago
Created attachment 528441 [details]
V8 browser benchmark. Current Trunk

The current situation looks like follows:
Background finalization of objects and strings (bug 627200) and the new marking stack (bug 616666) improved the situation.

The time we spend in sweeping is almost completely gone. We have to identify if we can get rid of the 15msec sweepings in the middle of the benchmark suite.
The spikes for the splay benchmarks at the end of the suite reduced from about 140 msec to around 55msec. 
The marking time is now the overall bottleneck.
(Assignee)

Comment 13

8 years ago
Created attachment 528443 [details]
V8 browser benchmark. With parallel marking.

The patch for bug 638660 tries to minimize the marking time by adding another thread that helps marking.

The overall marking time is already very small with about 2-7msec but it reduces by another 20-40%.
The marking time in the splay benchmark reduces from around 54msec to 36msec.
So we reduced the GC pause time for the splay benchmark from 140msec (FF5) to 40msec.

I guess that is the most we can get out of the current GC model as short term fixes.
Nice results. For the record, do you have the V8 benchmark total scores for those configurations?
(Assignee)

Comment 15

8 years ago
(In reply to comment #14)
> Nice results. For the record, do you have the V8 benchmark total scores for
> those configurations?

Sure. FF5 is a TM revision from the day after the FF5 branch.
You know how "stable" the results are but the splay benchmark definitely shows a nice improvement.

FF5:
Score: 4536
Richards: 7925
DeltaBlue: 4661
Crypto: 7725
RayTrace: 3420
EarleyBoyer: 4598
RegExp: 1667
Splay: 5281

Current Trunk:
Score: 4837
Richards: 7904
DeltaBlue: 4893
Crypto: 8701
RayTrace: 4091
EarleyBoyer: 4768
RegExp: 1655
Splay: 5704

With parallel marking:
Score: 4927
Richards: 8014
DeltaBlue: 4979
Crypto: 8597
RayTrace: 4046
EarleyBoyer: 4772
RegExp: 1741
Splay: 6112
A 10% improvement overall => Excellent!
(Assignee)

Comment 17

8 years ago
Created attachment 528462 [details]
Flight of the navigator. FF5

Lets continue with the Flight of the Navigator.
For the FF5 implementation the GC pause time is caused by about two third marking and one third sweeping.
Excluding startup and shutdown, the average GC pause time is about 120msec with peaks of 155msec.
(Assignee)

Comment 18

8 years ago
Created attachment 528463 [details]
Flight of the navigator. Current Trunk

The background finalization optimization reduced the average pause time to about 95msec. The bottleneck is now clearly the marking time.
(Assignee)

Comment 19

8 years ago
Created attachment 528464 [details]
Flight of the navigator. With parallel Marking

The parallel Marking patch helps a lot.
Excluding startup and shutdown, we have now total GC pauses of about 70msec with a peak of 80msec.

Overall an reduction from 120msec with a peak of 155msec to 70msec with a peak of 80msec.
(Assignee)

Comment 20

8 years ago
Created attachment 528472 [details]
Realtime raytracer. FF5

And the best one at the end!
For the realtime raytracer the mark/sweep ration is about 50/50 in FF5.
[http://29a.ch/2010/6/2/realtime-raytracing-in-javascript]

Our average GC pause time is 210msec.
(Assignee)

Comment 21

8 years ago
Created attachment 528474 [details]
Realtime raytracer. Current Trunk

The background finalization helped a lot and our average GC pause time reduced to 93msec.
(Assignee)

Comment 22

8 years ago
Created attachment 528476 [details]
Realtime raytracer. With parallel marking.

The parallel marking patch reduces the average marking time from 90msec to 74msec and the overall GC pause time is now about 77msec.
There might still be room for some fine tuning in order to get better results.

In comparison to FF5 we reduced the GC pause time from 210msec to 77msec!
You need to log in before you can comment on or make changes to this bug.