Closed Bug 556133 Opened 14 years ago Closed 2 years ago

GC Performance Graphs

Tracking

()

Status:

RESOLVED INACTIVE

People

(Reporter: gwagner, Unassigned)

Details

Attachments

(13 files)

Canopy Mac 14 years ago Gregor Wagner [:gwagner] 9.14 KB, image/png		Details
Canopy Mac not cross compiled 14 years ago Gregor Wagner [:gwagner] 9.37 KB, image/png		Details
Tip: clock benchmark + topcrash site 14 years ago Gregor Wagner [:gwagner] 9.43 KB, image/png		Details
Current V8 GC Performance 14 years ago Gregor Wagner [:gwagner] 1.75 KB, text/plain		Details
V8 browser benchmark. FF5 13 years ago Gregor Wagner [:gwagner] 6.31 KB, image/png		Details
V8 browser benchmark. Current Trunk 13 years ago Gregor Wagner [:gwagner] 6.55 KB, image/png		Details
V8 browser benchmark. With parallel marking. 13 years ago Gregor Wagner [:gwagner] 6.37 KB, image/png		Details
Flight of the navigator. FF5 13 years ago Gregor Wagner [:gwagner] 9.10 KB, image/png		Details
Flight of the navigator. Current Trunk 13 years ago Gregor Wagner [:gwagner] 7.75 KB, image/png		Details
Flight of the navigator. With parallel Marking 13 years ago Gregor Wagner [:gwagner] 7.75 KB, image/png		Details
Realtime raytracer. FF5 13 years ago Gregor Wagner [:gwagner] 5.39 KB, image/png		Details
Realtime raytracer. Current Trunk 13 years ago Gregor Wagner [:gwagner] 5.29 KB, image/png		Details
Realtime raytracer. With parallel marking. 13 years ago Gregor Wagner [:gwagner] 5.94 KB, image/png		Details

Gregor Wagner [:gwagner]

Reporter

Description

•

14 years ago

All our GC performance numbers and graphs are posted in different bugs and hard to find.
Collecting the graphs makes it easier to keep track of the GC performance.

Gregor Wagner [:gwagner]

Reporter

Comment 1

•

14 years ago

Attached image Canopy Mac — Details

The GC pause time since last week increased on this benchmark.
The time we spend in DestoyChunks increased from about 40 to 60 cycles * 1E6.
Is there a bug that caused that? I also applied the OS X update today. Maybe that's the reason.

Gregor Wagner [:gwagner]

Reporter

Comment 2

•

14 years ago

Attached image Canopy Mac not cross compiled — Details

I also cleaned my mozconfig file and removed all the crosscompile flags that were necessary for OS X 10.6 at the beginning. 

The GC pause time looks much worse with 64 bit :(
The DestroyChunk function is the bottleneck!

Gregor Wagner [:gwagner]

Reporter

Comment 3

•

14 years ago

Attached image Tip: clock benchmark + topcrash site — Details

This graph is created with an optimized browser build.
I start the clock benchmark, open the firefox topcrash page, close it after about 20 seconds and close the browser.

The graph shows that long lived objects are the bottleneck here. 
It seems we spend all the time in the object finalizer where we iterate over live objects.

The compartment approach should solve this bottleneck.

Igor Bukanov

Comment 4

•

14 years ago

(In reply to comment #3)
> The graph shows that long lived objects are the bottleneck here.
...
> It seems we spend all the time in the object finalizer where we iterate over
> live objects.

From the graph I see that the mark time is significantly less than object finalization time. The mark time includes marking of live objects. This is significantly more complex than skipping over them during the finalization. From that I see that it is finalization of short-lived objects that takes time, not long-lived. Thus it would be interesting to know what exactly in FinalizeObject from jsgc.cpp takes most of the time.

Gregor Wagner [:gwagner]

Reporter

Comment 5

•

14 years ago

> From the graph I see that the mark time is significantly less than object
> finalization time. The mark time includes marking of live objects. This is
> significantly more complex than skipping over them during the finalization.
> From that I see that it is finalization of short-lived objects that takes time,
> not long-lived. Thus it would be interesting to know what exactly in
> FinalizeObject from jsgc.cpp takes most of the time.

I will dig deeper into it and see where we spend all the time.

Safari in comparison calls the GC more often but the GC pause is much shorter.
A typical GC run with the clock benchmark and the crash page looks like:
mark: 21.839928, sweep: 0.000064, m+s: 21.839992, end-enter: 21.862544
mark: 22.412496, sweep: 0.000064, m+s: 22.412560, end-enter: 22.435472

David Mandelin [:dmandelin]

Comment 6

•

14 years ago

Question: How do I read these graphs? In particular, what is the X axis?

Gregor Wagner [:gwagner]

Reporter

Comment 7

•

14 years ago

The x-axis represents the GC events. So one entry on the X axis per GC run.
The (left) Y axis represents the cycles[1e6] we spend for the GC run. 
The total value is cumulative mark and sweep. 
The sweep phase is also divided into finalize objects, doubles and string.

Gregor Wagner [:gwagner]

Reporter

Comment 8

•

14 years ago

I was wrong with the guess that the long living objects are the bottleneck.
It seems that we run the GC less frequent and therefore generate more garbage.

The Finalizer function for Objects sees the following: 
Just Clock benchmark:
thingKind: 0, cycles[1e6]: 69.333576 , arenas: 13845, objects: 899925, finalizer called: 876077

Clock benchmark + open crashstat page
thingKind: 0, cycles[1e6]: 238.623136 , arenas: 47875, objects: 3111875, finalizer called: 3036280

The cycles represent the cycles we spend in the FinalizeArenaList function for objects.
arenas represent iterations in the first for(;;) loop
objects represent iterations in the second  for (;; thing += sizeof(T), gcCellIndex += sizeof(T) >> GC_CELL_SHIFT)
finalizer called counts how often the finalizer was called.

So we have about 3-4 times more dead objects and therefore the GC pause is much longer.

For the crashstat page I use following URL:
http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6

Gregor Wagner [:gwagner]

Reporter

Comment 9

•

14 years ago

Attached file Current V8 GC Performance — Details

Gregor Wagner [:gwagner]

Reporter

Updated

•

14 years ago

Assignee: general → anygregor

Gregor Wagner [:gwagner]

Reporter

Comment 10

•

13 years ago

Time for some updates here.

Summary: TM: GC Performance Graphs → GC Performance Graphs

Gregor Wagner [:gwagner]

Reporter

Comment 11

•

13 years ago

Attached image V8 browser benchmark. FF5 — Details

I wish I could add more graphs per post...
Ok lets start with the V8 benchmark suite from the browser for Firefox 5.
The GC pauses are about 20 - 22 msec that are mainly dominated by sweeping. 
The splay benchmark at the end of the suite is dominated by marking and sweeping and the GC pauses are up to 140 msec. 

Legend for the graphs:
X-Axis: GC Events
Y-Axis: 
red line:   Total GC pause time in msec.
green line: Marking Time
blue line:  Sweeping Time
(Marking and Sweeping add up to total Time)
For example the first GC takes 26msec where we spend 3msec marking and 21msec sweeping. It does not completely add up to the total time because there is other code involved that is not covered.

Gregor Wagner [:gwagner]

Reporter

Comment 12

•

13 years ago

Attached image V8 browser benchmark. Current Trunk — Details

The current situation looks like follows:
Background finalization of objects and strings (bug 627200) and the new marking stack (bug 616666) improved the situation.

The time we spend in sweeping is almost completely gone. We have to identify if we can get rid of the 15msec sweepings in the middle of the benchmark suite.
The spikes for the splay benchmarks at the end of the suite reduced from about 140 msec to around 55msec. 
The marking time is now the overall bottleneck.

Gregor Wagner [:gwagner]

Reporter

Comment 13

•

13 years ago

Attached image V8 browser benchmark. With parallel marking. — Details

The patch for bug 638660 tries to minimize the marking time by adding another thread that helps marking.

The overall marking time is already very small with about 2-7msec but it reduces by another 20-40%.
The marking time in the splay benchmark reduces from around 54msec to 36msec.
So we reduced the GC pause time for the splay benchmark from 140msec (FF5) to 40msec.

I guess that is the most we can get out of the current GC model as short term fixes.

David Mandelin [:dmandelin]

Comment 14

•

13 years ago

Nice results. For the record, do you have the V8 benchmark total scores for those configurations?

Gregor Wagner [:gwagner]

Reporter

Comment 15

•

13 years ago

(In reply to comment #14)
> Nice results. For the record, do you have the V8 benchmark total scores for
> those configurations?

Sure. FF5 is a TM revision from the day after the FF5 branch.
You know how "stable" the results are but the splay benchmark definitely shows a nice improvement.

FF5:
Score: 4536
Richards: 7925
DeltaBlue: 4661
Crypto: 7725
RayTrace: 3420
EarleyBoyer: 4598
RegExp: 1667
Splay: 5281

Current Trunk:
Score: 4837
Richards: 7904
DeltaBlue: 4893
Crypto: 8701
RayTrace: 4091
EarleyBoyer: 4768
RegExp: 1655
Splay: 5704

With parallel marking:
Score: 4927
Richards: 8014
DeltaBlue: 4979
Crypto: 8597
RayTrace: 4046
EarleyBoyer: 4772
RegExp: 1741
Splay: 6112

David Mandelin [:dmandelin]

Comment 16

•

13 years ago

A 10% improvement overall => Excellent!

Gregor Wagner [:gwagner]

Reporter

Comment 17

•

13 years ago

Attached image Flight of the navigator. FF5 — Details

Lets continue with the Flight of the Navigator.
For the FF5 implementation the GC pause time is caused by about two third marking and one third sweeping.
Excluding startup and shutdown, the average GC pause time is about 120msec with peaks of 155msec.

Gregor Wagner [:gwagner]

Reporter

Comment 18

•

13 years ago

Attached image Flight of the navigator. Current Trunk — Details

The background finalization optimization reduced the average pause time to about 95msec. The bottleneck is now clearly the marking time.

Gregor Wagner [:gwagner]

Reporter

Comment 19

•

13 years ago

Attached image Flight of the navigator. With parallel Marking — Details

The parallel Marking patch helps a lot.
Excluding startup and shutdown, we have now total GC pauses of about 70msec with a peak of 80msec.

Overall an reduction from 120msec with a peak of 155msec to 70msec with a peak of 80msec.

Gregor Wagner [:gwagner]

Reporter

Comment 20

•

13 years ago

Attached image Realtime raytracer. FF5 — Details

And the best one at the end!
For the realtime raytracer the mark/sweep ration is about 50/50 in FF5.
[http://29a.ch/2010/6/2/realtime-raytracing-in-javascript]

Our average GC pause time is 210msec.

Gregor Wagner [:gwagner]

Reporter

Comment 21

•

13 years ago

Attached image Realtime raytracer. Current Trunk — Details

The background finalization helped a lot and our average GC pause time reduced to 93msec.

Gregor Wagner [:gwagner]

Reporter

Comment 22

•

13 years ago

Attached image Realtime raytracer. With parallel marking. — Details

The parallel marking patch reduces the average marking time from 90msec to 74msec and the overall GC pause time is now about 77msec.
There might still be room for some fine tuning in order to get better results.

In comparison to FF5 we reduced the GC pause time from 210msec to 77msec!

BugBot [:suhaib / :marco/ :calixte]

Comment 23

•

2 years ago

The bug assignee didn't login in Bugzilla in the last 7 months.
:sdetar, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: anygregor → nobody

Flags: needinfo?(sdetar)

Jon Coppeard (:jonco)

Updated

•

2 years ago

Status: NEW → RESOLVED

Closed: 2 years ago

Flags: needinfo?(sdetar)

Resolution: --- → INACTIVE

You need to log in before you can comment on or make changes to this bug.