Closed Bug 704716 Opened 13 years ago Closed 11 years ago

investigate snappiness of v8's "spinning balls" benchmark

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

RESOLVED INCOMPLETE

People

(Reporter: mccr8, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [Snappy:P2])

With their release of an incremental collector, the Chrome people put up this JS-focused pause time benchmark.  It may be too JS-heavy to be a good benchmark for total browser responsiveness, but it is something that is easy for us to investigate, and is pretty sexy looking so it could catch fire in the popular imagination.  We should look into our performance here.

Some things off the top of my head that might be worth looking into:

- How does this benchmark relate to the other responsiveness measurements we're working on? E.g., how good a test of regular browser responsiveness is it, and is our telemetry capturing the things it finds? Should we use this as an automated regression test?
- What is causing long pauses in this benchmark?  Is it just GC?  Other JS things?  Other browser things?
- Gregor has already identified one possible issue with running this benchmark with multiple tabs open, in Bug 704577. That raises the interesting question of what happens to the test in a more realistic environment than "fire up browser, run test".
I ran this just now, looking at the GC/CC results in the error console.  During the run, there were 18 GCs, all compartmental, and 0 CCs.  According to the error console, the "total" field (which I assume is the total time in ms?) was about 70ms for each, except for one that was 29ms and one that was 45ms.

Here are the results the benchmark gave:

8000/320 14(max = 196) ms 4090 frames

Score 31
0-10ms	=> 31
10-20ms	=> 3989
20-30ms	=> 36
30-40ms	=> 14
40-50ms	=> 1
50-60ms	=> 1
60-70ms	=> 1
90-100ms => 13
100-110ms => 3
190-200ms => 1

This doesn't seem to agree with the GC times generated in the error console.  Maybe the 90-100ms bucket includes the longer GCs?  Even if we attribute all of the 90ms+ events to the longer GCs, there's still one event unaccounted for.  Not really super useful, but it is a start.
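The bucket arithmetic above can be checked with a short script. This is only a rough sketch in Python, not part of the benchmark: `parse_histogram` and `count_at_or_above` are hypothetical helper names, and the figure of 16 "longer" GCs assumes that means the 18 logged GCs minus the 29ms and 45ms ones.

```python
import re

# Histogram lines as printed by the benchmark run above.
HISTOGRAM = """\
0-10ms => 31
10-20ms => 3989
20-30ms => 36
30-40ms => 14
40-50ms => 1
50-60ms => 1
60-70ms => 1
90-100ms => 13
100-110ms => 3
190-200ms => 1
"""

def parse_histogram(text):
    """Return {(low_ms, high_ms): frame_count} from 'A-Bms => N' lines."""
    buckets = {}
    for match in re.finditer(r"(\d+)-(\d+)ms\s*=>\s*(\d+)", text):
        low, high, count = map(int, match.groups())
        buckets[(low, high)] = count
    return buckets

def count_at_or_above(buckets, threshold_ms):
    """Count frames whose bucket starts at or above threshold_ms."""
    return sum(n for (low, _), n in buckets.items() if low >= threshold_ms)

buckets = parse_histogram(HISTOGRAM)
long_frames = count_at_or_above(buckets, 90)
# Assumption: 18 GCs were logged and two of them (29ms and 45ms) were short.
longer_gcs = 18 - 2
print("frames at 90ms or more:", long_frames)
print("longer GCs:", longer_gcs)
print("frames left unaccounted for:", long_frames - longer_gcs)
```

The bucket counts also sum to the 4090 frames the benchmark reported, which is a quick sanity check that no histogram line was dropped when copying.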
Marking this P2 only because it is a public benchmark.  From my measurement in Comment 1, it looks like this is probably mostly just GC, which will be addressed by incremental GC, but we should remeasure once we have some good pause profiling tools up and running.
Whiteboard: [Snappy] → [Snappy:P2]
I used to get a score of 60 on my MBP but now I only get around 16.
Depends on: 712853
I ran this on a release version of Chrome on my computer and I get a score of 963!

8000/320 21(max = 73) ms 3576 frames
Score 963

0-10ms	 => 3
10-20ms	 => 2777
20-30ms	 => 795
70-80ms	 => 1


Compared to m-c tip with a local build with incremental GC:

8000/320 19(max = 43) ms 2842 frames

Score 70
0-10ms	=> 7
10-20ms	=> 1202
20-30ms	=> 1556
30-40ms	=> 76
40-50ms	=> 1

Compared to my old numbers, it completely eliminates those >50ms spikes, but it looks like about half of the little pauses moved up a bucket.
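To put a number on "about half moved up a bucket", the share of frames per bucket can be compared between the two runs. A minimal sketch, assuming "my old numbers" refers to the first Firefox run quoted in this bug; `bucket_shares` is a hypothetical helper, and the counts are copied by hand from the pasted output.

```python
def bucket_shares(counts):
    """Map each bucket label to its fraction of all recorded frames."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Frame counts copied from the two Firefox runs quoted in this bug.
before = {"0-10": 31, "10-20": 3989, "20-30": 36, "30-40": 14, "40-50": 1,
          "50-60": 1, "60-70": 1, "90-100": 13, "100-110": 3, "190-200": 1}
after = {"0-10": 7, "10-20": 1202, "20-30": 1556, "30-40": 76, "40-50": 1}

for name, counts in (("before", before), ("after", after)):
    shares = bucket_shares(counts)
    print(name, {label: round(share, 3) for label, share in shares.items()})
```

With these counts, the 10-20ms bucket holds the large majority of frames in the old run, while in the incremental GC run a little over half of the frames land in 20-30ms.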
I have very different results:

Chrome Canary: 19.0.1046.0

8000/320 19(max = 41) ms 3601 frames
Score 1635

0-10ms => 10
10-20ms => 3068
20-30ms => 516
30-40ms => 5
40-50ms => 2

In some iterations (60 sec each) Chrome reached 16000, but in most of them it scored about 1000-2000 points.


Firefox Without IGC (javascript.options.mem.gc_incremental -> false):

8000/320 16(max = 81) ms 3601 frames
Score 156

0-10ms => 3
10-20ms => 3578
20-30ms => 8
30-40ms => 1
50-60ms => 2
60-70ms => 6
70-80ms => 2
80-90ms => 1


With IGC (javascript.options.mem.gc_incremental -> true):

8000/320 15(max = 28) ms 3701 frames
Score 4018

0-10ms => 6
10-20ms => 3614
20-30ms => 81

Very consistent across all iterations: about 3000-4000 points every time.


It's true that with tabs open the score is much worse, but that happens in both FF and Chrome.
32bit build

8000/320 18(max = 30) ms 3479 frames

Score 4472
0-10ms	=> 8
10-20ms	=> 3393
20-30ms	=> 77
30-40ms	=> 1


64bit build

8000/320 17(max = 32) ms 3515 frames

Score 2028
0-10ms	=> 8
10-20ms	=> 3412
20-30ms	=> 87
30-40ms	=> 8
I just tested FF 14 and current nightly FF 16.0a1 (2012-06-27)
The total number of frames dropped by around 30% and the animation looks a lot slower now.

FF14:
8000/320 16(max = 83) ms 3476 frames
Score 79
0-10ms => 84
10-20ms => 3189
20-30ms => 171
30-40ms => 15
40-50ms => 1
50-60ms => 2
60-70ms => 6
70-80ms => 6
80-90ms => 2


FF 16.0a1 (2012-06-27)
8000/320 18(max = 63) ms 2615 frames
Score 18
0-10ms	=> 112
10-20ms	=> 1143
20-30ms	=> 706
30-40ms	=> 635
40-50ms	=> 16
50-60ms	=> 2
110-120ms => 1
Not for me.

FF x64 16.0a1 (2012-06-27)
8000/320 13(max = 57) ms 3753 frames

Score 138
0-10ms	=> 60
10-20ms	=> 2536
20-30ms	=> 1123
30-40ms	=> 16
40-50ms	=> 6
50-60ms	=> 12
Was this on your netbook, Gregor? I can't reproduce. My nightly (from 6/22) does better than FF14.
(In reply to Bill McCloskey (:billm) from comment #9)
> Was this on your netbook, Gregor? I can't reproduce. My nightly (from 6/22)
> does better than FF14.

No I am using my MBP and the animation looks really slow. I don't know what's going on.
Can you try with incremental GC disabled?

It might also be a graphics issue. I made a version of the benchmark that doesn't draw the balls, and I remember it running much faster.
(In reply to Bill McCloskey (:billm) from comment #11)
> Can you try with incremental GC disabled?
> 
> It might also be a graphics issue. I made a version of the benchmark that
> doesn't draw the balls, and I remember it running much faster.

Yeah it doesn't look like a GC problem. The whole animation looks much slower.
I'm seeing some strange behavior.

I'm testing spinning-balls (without ball rendering) on a clean profile:

FF16-18 -> works perfectly, about 3000-5000 points
Nightly -> 100-300 points


The funny thing is that if I open, for example, two Firefox instances (with the profile manager) and run spinning-balls in either of them, it works fine, with acceptable scores of 700-4000. If I close either of the FF instances, the score is much, much worse (100-300 points again).
This benchmark didn't really set the world on fire, so I think we don't need a bug sitting around for it.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Resolution: FIXED → INCOMPLETE