Closed Bug 712853 Opened 13 years ago Closed 13 years ago

Regression on Spinning Balls

Categories: Core :: JavaScript Engine, defect
Status: RESOLVED DUPLICATE of bug 707162
People: (Reporter: dmandelin, Unassigned)
Whiteboard: [Snappy]
Several people have noticed a regression. The history is complicated; so far I've come up with these scores in nightly builds:
08/08 181
(regression?)
08/24 140
(regression?)
08/30 108
09/08 103
10/06 102
(slight regression??)
11/05 90
(regression?)
12/02 69
(most regression went away?)
12/06 128
12/12 126
(even more improvement?)
12/16 151
(huge regression)
12/17 65
12/18 80
(regression?)
12/19 44
Only the last two have 1-day ranges; the others need more bisection. Even after that, I still want to know the exact changesets.
Comment 1•13 years ago
We should also figure out what exactly the score means. I found it to be fairly reproducible, where "fairly" means that I got about the same score when running it twice each on 8, 9, and 10 (not twice in a row, but intermixed with runs of various versions).
OS: Windows 7 → All
Hardware: x86_64 → All
Whiteboard: [Snappy]
Comment 2•13 years ago
FYI, when I looked into this before, every pause spike in the display at the bottom, which is presumably what drives the score, seemed to correspond to GCs.
Blocks: 704716
Comment 3•13 years ago

The score is the number of frames displayed in 60 seconds, scaled by 1000 and divided by the sum of squared excesses (over 20 ms) of the pauses. Here's the relevant code:

function updateStats(pause) {
  numberOfFrames++;
  if (pause > 20) {
    sumOfSquaredPauses += (pause - 20) * (pause - 20);
  }
  pauseDistribution[Math.floor(pause / 10)] |= 0;  // initialize the bucket to 0 if undefined
  pauseDistribution[Math.floor(pause / 10)]++;
}

function renderStats() {
  ...
  msg.innerHTML = "Score " +
      Math.round(numberOfFrames * 1000 / sumOfSquaredPauses);
  ...
}
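To make the formula concrete, here is a small self-contained sketch of the same scoring rule. The frame and pause counts below are invented for illustration, not taken from any benchmark run:

```javascript
// Standalone sketch of the scoring formula above; inputs are made up.
function scoreFor(numberOfFrames, pauses) {
  let sumOfSquaredPauses = 0;
  for (const pause of pauses) {
    if (pause > 20) {
      // Only the excess over 20 ms is penalized, quadratically.
      sumOfSquaredPauses += (pause - 20) * (pause - 20);
    }
  }
  return Math.round(numberOfFrames * 1000 / sumOfSquaredPauses);
}

// One 120 ms pause costs (120-20)^2 = 10000 -- as much as a
// hundred 30 ms pauses at (30-20)^2 = 100 each.
console.log(scoreFor(3600, [30, 30, 45, 120])); // → 333
```

Because the penalty is quadratic in the excess over 20 ms, a single long pause drags the score down far more than many short ones.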
Comment 4•13 years ago
When I compared 8, 9 and 10, the number of frames was about the same (decreasing slightly as version number went up), so the massive swings may be explained mostly by pauses.
Reporter
Comment 5•13 years ago
Filtering out the noise, it looks like there was a regression on 8/26 from the compartment check landing, which then got removed in early Dec.
The only other thing I found was that there seemed to be a regression with bug 711095. I think the regression was an increase in GC times. It may be just that there are more outliers, but I think the pause times might actually be a bit longer. From the description of bug 711095, it sounds like the patch might have (temporarily?) disabled some optimizations. I don't think it's a huge deal because that bug was to enable IGC, and IGC should help a lot with our pause times anyway.
I think it would still be good to check on bug 711095, to find out whether an increase in pause times was expected, and if not, whether there actually was an increase in pause times.
Comment 6•13 years ago
(In reply to David Mandelin from comment #5)
> The only other thing I found was that there seemed to be a regression with
> bug 711095.
That bug changed the order of object traversal and increased mark stack usage so that it grew beyond our fixed space. That may well have adversely influenced the result. It should be easy to fix.
Comment 7•13 years ago
Could we also look at using Peptest to do some automatic regression analysis? This could also be a good test to get them to add, as it is fairly simple.
Comment 8•13 years ago
(In reply to David Mandelin from comment #0)
> Several people have noticed a regression. The history is complicated; so far
> I've come up with these scores in nightly builds:
This is on Windows, right?
I compiled a build with MOZ_GCTIMER support and this is what I got for GC timing and benchmark results for the December 16 and 17 builds:

2011-12-16: Score 204
  Pause distribution:
    0-10ms    => 1
    10-20ms   => 3131
    20-30ms   => 464
    30-40ms   => 1
    40-50ms   => 10
    50-60ms   => 2
    70-80ms   => 1
    90-100ms  => 1
  GC times (full, mark, finalize; ms):
    22.716000  22.113000  0.563000
    30.025000  26.545000  2.954000
    27.802000  26.815000  0.947000
    28.144000  27.111000  0.961000
    27.740000  26.764000  0.937000
    28.218000  27.237000  0.933000
    27.903000  26.945000  0.920000
    27.638000  26.659000  0.932000
    23.676000  23.166000  0.460000
    31.486000  30.507000  0.942000
    28.124000  27.137000  0.949000
    28.100000  27.106000  0.945000
    27.649000  26.559000  1.047000

2011-12-17: Score 165
  Pause distribution:
    0-10ms    => 2
    10-20ms   => 3103
    20-30ms   => 490
    30-40ms   => 2
    40-50ms   => 11
    70-80ms   => 1
    110-120ms => 1
  GC times (full, mark, finalize; ms):
    21.210000  20.630000  0.539000
    27.237000  24.411000  2.387000
    26.256000  25.257000  0.959000
    26.103000  24.947000  0.925000
    26.592000  25.598000  0.943000
    26.119000  24.998000  0.922000
    25.806000  24.852000  0.917000
    26.790000  25.807000  0.919000
    25.751000  24.795000  0.918000
    26.329000  25.363000  0.915000
    26.396000  25.411000  0.930000
    25.779000  24.846000  0.881000
    22.024000  21.341000  0.481000
This shows that the GC marking time improved. The only change in this area between the two builds is bug 711095, consistent with the results for other benchmarks in that bug. So I think bug 711095 cannot be responsible for the massive score drop in the results from comment 0.
Comment 9•13 years ago
I observed that the score result is extremely noisy. For example, for three separate browser runs like:
firefox http://v8.googlecode.com/svn/branches/bleeding_edge/benchmarks/spinning-balls/index.html
on a Linux box I got scores of 204, 145, and 315. Are the results on Windows more consistent?
Another observation, based on the comment 8 results, is that the GC timing is rather consistent, staying at 25-30 ms during the benchmark run; that can explain the 15 cases in the above results where a pause went beyond 30 ms. However, it cannot explain the couple of outliers that go beyond 60 ms.
So something else is contributing to those big pauses.
Comment 10•13 years ago
I ran the benchmark on a quiet Linux box for 20000 seconds (about 6 hours) against mozilla-central tip. During the benchmark the GC time stayed within the 22-30 ms range and there were about 3400 GC runs. Here are the resulting numbers:
Score 789
0-10ms => 462
10-20ms => 1013517
20-30ms => 141176
30-40ms => 2337
40-50ms => 644
50-60ms => 9
60-70ms => 74
70-80ms => 6
110-120ms => 7
120-130ms => 2
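To see how the quadratic penalty weights these buckets, here is a rough sketch that apportions the sum-of-squared-pauses term across the histogram above. Using bucket midpoints is an assumption on my part; real pauses are not spread evenly inside a bucket, so the shares are only approximate:

```javascript
// Approximate each bucket's share of the squared-pause penalty,
// taking the midpoint of each bucket as its representative pause.
const histogram = { // counts from the 20000-second run above
  25: 141176, 35: 2337, 45: 644, 55: 9,
  65: 74, 75: 6, 115: 7, 125: 2,
};
let total = 0;
const penalty = {};
for (const [mid, count] of Object.entries(histogram)) {
  penalty[mid] = (mid - 20) ** 2 * count;
  total += penalty[mid];
}
for (const [mid, p] of Object.entries(penalty)) {
  console.log(`${mid}ms bucket: ${(100 * p / total).toFixed(1)}% of the penalty`);
}
```

Even with this crude midpoint estimate, each of the two 120-130 ms pauses costs (125-20)^2 = 11025, over 400 times the cost of a 25 ms pause, which is why a handful of outliers beyond 60 ms can hurt the score badly even when GC times stay in the 22-30 ms range.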
Comment 11•13 years ago
As yet another confirmation that the regression is caused by the GC scheduling changes, I see that the patch from bug 707162 noticeably improved the benchmark scores. In particular, it reduced the number of particularly bad pauses and increased the score from 150-300 to 300-450 over a 60 second run.
Comment 12•13 years ago
On Windows I have not observed any consistency in the benchmark results for 60 second runs. For example, for the nightly build from 2011-12-26 I got scores in the range 30-38, over 25% variation. In all cases this was with a freshly started browser.
So I presume the benchmark can be used only as a very rough estimate of the GC pause time.
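For scale, the three Linux scores reported in comment 9 (204, 145, and 315) show a similar spread. A quick sketch of their coefficient of variation (population statistics over just three runs, so treat the number as indicative only):

```javascript
// Spread of the three Linux scores from comment 9.
const scores = [204, 145, 315];
const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
const variance =
    scores.reduce((a, s) => a + (s - mean) ** 2, 0) / scores.length;
const cv = Math.sqrt(variance) / mean; // coefficient of variation
console.log(cv.toFixed(2)); // → "0.32"
```

A roughly 32% relative spread on Linux is in the same ballpark as the 25%+ variation seen on Windows, supporting the point that the score is only a rough indicator.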
Reporter
Comment 13•13 years ago
Thanks for checking out bug 711095, Igor. Yes, the results are very noisy. We could probably modify the scoring to make it more repeatable.
For now, it looks like this bug, if it exists at all, is just a dup of 707162, so I'm closing it out.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE