I'm told we don't GC in the V8 benchmarks when we run them once (or when we run in the SunSpider harness?), but it really does look like we are hitting the GC in the V8 harness. We need to know what's going on here.

Specifically, in both shell and browser on x86, for each V8-V6 benchmark, measure:

- t_raw, the time taken to run one iteration all by itself (with no GC runs)
- t, the average time taken to run one iteration
- n, the number of iterations run in one second (or whatever the limit is) in the V8 harness. (Unless there's even more going on, we should get that 1000/t = n, and n*100 = the benchmark score.)
- n_gc, the number of GCs run in one second in the harness
- t_gc, the average time spent by a GC run (should get t ~= t_raw + t_gc) (Probably want min/max or some other variance parameters here.)

From that, we can compute some other things of interest:

- What our score would be with zero GC cost
- What our score would be with no mutator cost (which tells us the max score the current GC will allow us to get)
- Whether browser GC helps/hurts our score compared to shell GC, and how much

It would also be great to get the same stats for V8, so we can see how much of the gap is due to mutator perf and how much due to GC.
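The bookkeeping above can be sketched in a few lines of Python. All numbers here are hypothetical placeholders, not real measurements, and the formula for t uses the per-iteration GC share (the corrected form from later in this bug):

```python
# Sketch of the derived quantities described above.  All inputs are
# hypothetical placeholders, not real measurements.

def derived_stats(t_raw_ms, t_gc_ms, n, n_gc, limit_ms=1000.0):
    """t_raw_ms: mutator-only time per iteration (ms)
    t_gc_ms:  average time per GC run (ms)
    n:        iterations completed within limit_ms
    n_gc:     GC runs within limit_ms"""
    # Each iteration carries its share (n_gc / n) of the GC pauses.
    t = t_raw_ms + t_gc_ms * (n_gc / n)
    score = (limit_ms / t) * 100                  # n * 100, since n = limit / t
    score_zero_gc = (limit_ms / t_raw_ms) * 100   # score if GC were free
    gc_share = t_gc_ms * (n_gc / n)
    score_zero_mutator = (limit_ms / gc_share) * 100  # ceiling the current GC allows
    return t, score, score_zero_gc, score_zero_mutator

# Example: 40 ms of mutator work per iteration, 12 ms GC pauses,
# 24 iterations and 6 GCs in a one-second run.
t, score, zero_gc, zero_mutator = derived_stats(40.0, 12.0, 24, 6)
# t = 43.0 ms; zero-GC score = 2500
```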
Created attachment 482927 [details]
preliminary data

Here's what I got from the jsgcstats code when running individual V8 benchmarks inside the V8 test harness. As I understand this data, we're doing 22 GCs inside earley-boyer. The total time spent inside GC for earley-boyer appears to be 823ms. (I think all these numbers have to be divided by 2.67, which is my processor speed.) The whole benchmark runs for 4197ms, so that's about 20% in GC.

We spend 303ms doing 7 GCs in raytrace, which runs for a total of 3024ms. So that's about 10%.
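The two percentages can be checked with trivial arithmetic, using the ms figures as quoted (i.e. after the 2.67 division):

```python
# GC fraction per benchmark, from the numbers quoted above.
gc_fraction_earley_boyer = 823 / 4197   # ~0.196, about 20%
gc_fraction_raytrace = 303 / 3024       # ~0.100, about 10%
```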
Oops. Ignore the extra crap at the end of the attachment.
> (should get t ~= t_raw + t_gc)

Or rather t ~= t_raw + t_gc * (n_gc / n), right?
(In reply to comment #3)
> > (should get t ~= t_raw + t_gc)
>
> Or rather t ~= t_raw + t_gc * (n_gc / n), right?

Yes. I got my |n|s mixed up. :-)
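A quick numeric check of the corrected formula, with made-up numbers: each iteration only carries its share (n_gc / n) of the GC pauses, not a whole GC.

```python
# Hypothetical numbers: 30 ms mutator time, 20 ms GC pauses, 10 iterations, 4 GCs.
t_raw, t_gc, n, n_gc = 30.0, 20.0, 10, 4
t = t_raw + t_gc * (n_gc / n)   # 30 + 20 * 0.4 = 38.0 ms, not 30 + 20 = 50
```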
The memory distribution changed a lot after Brian's patch. In the shell we now allocate about 120 chunks for the V8 suite instead of the 90 before the patch. I see a very strange pattern in the browser: there we allocate 300 chunks instead of 200 now, which means 100MB more GC heap. We have to see where all the extra overhead comes from and whether we reduce malloc by an equal amount. But this also means that all the shell tuning is completely irrelevant for the browser. Besides that, we definitely have to adapt the GC parameters to the new settings. My idea is to land bug 593729 pretty soon and go from there.

BTW: Why do you think we should measure single benchmark runs?
(In reply to comment #5)
> BTW: Why do you think we should measure single benchmark runs?

I just want all the detail about the data, so we fully understand what's happening. It's kind of hard to understand what's going on in a pile of data that is already aggregated over many runs.
I was comparing the time both browsers spend in GC. Chrome's profiling page says they spend 10% of a V8 benchmark run in GC, or about 2.15 sec. I measured with the current tip (plus the parallel marking patch), and our 60 GCs during the benchmark suite add up to about 615ms. That would mean we spend about 3% in GC.
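A back-of-envelope check of those figures; the total run times below are implied by the stated GC times and percentages, not measured directly.

```python
# Implied total run times from the quoted GC times and percentages.
chrome_total_s = 2.15 / 0.10   # Chrome: 2.15 s is 10% -> ~21.5 s total
our_total_s = 0.615 / 0.03     # us: 615 ms at ~3% implies a ~20.5 s run
```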