Regression on Spinning Balls

Status: RESOLVED DUPLICATE of bug 707162
Component: Core :: JavaScript Engine
Reporter: dmandelin (Unassigned)
Version: Trunk
Whiteboard: [Snappy]

(Reporter)

Description

Several people have noticed a regression. The history is complicated; so far I've come up with these scores in nightly builds:

08/08    181
  (regression?)
08/24    140
  (regression?)
08/30    108
09/08    103
10/06    102
  (slight regression??)
11/05     90
  (regression?)
12/02     69
  (most regression went away?)
12/06    128
12/12    126
  (even more improvement?)
12/16    151
  (huge regression)
12/17     65
12/18     80
  (regression?)
12/19     44

Only the last two deltas have 1-day ranges; the others need more bisection. Even after that, I still want to know the exact changesets.
We should also figure out what exactly the score means. I found it to be fairly reproducible, where "fairly" means that I got about the same score when running it twice each on Firefox 8, 9, and 10 (not twice in a row, but intermixed with running various versions).
OS: Windows 7 → All
Hardware: x86_64 → All
Whiteboard: [Snappy]
FYI, when I looked into this before, every pause spike in the display at the bottom, which is presumably what drives the score, seemed to correspond to GCs.
Blocks: 704716
The score is the number of frames displayed in the 60-second run, divided by the sum of squares of the amounts by which pauses exceed 20ms (and scaled by 1000). Here's the relevant code:

function updateStats(pause) {
  numberOfFrames++;
  // Only the portion of a pause beyond 20ms is penalized, and quadratically.
  if (pause > 20) {
    sumOfSquaredPauses += (pause - 20) * (pause - 20);
  }
  // Histogram pauses into 10ms buckets; `|= 0` turns an undefined bucket
  // entry into 0 before the increment.
  pauseDistribution[Math.floor(pause / 10)] |= 0;
  pauseDistribution[Math.floor(pause / 10)]++;
}

function renderStats() {
  ...
  msg.innerHTML = "Score " +
    Math.round(numberOfFrames * 1000 / sumOfSquaredPauses);
  ...
}
When I compared 8, 9 and 10, the number of frames was about the same (decreasing slightly as version number went up), so the massive swings may be explained mostly by pauses.
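
To make the weighting concrete, here is a hypothetical back-of-the-envelope check (the score() helper and the numbers are invented for illustration; only the formula comes from the benchmark code above). With the frame count held fixed, a few long pauses cost far more score than many slightly-late frames:

function score(numberOfFrames, pauses) {
  var sumOfSquaredPauses = 0;
  for (var i = 0; i < pauses.length; i++) {
    if (pauses[i] > 20) {
      var excess = pauses[i] - 20;            // only the part beyond 20ms counts
      sumOfSquaredPauses += excess * excess;
    }
  }
  return Math.round(numberOfFrames * 1000 / sumOfSquaredPauses);
}

var manySmall = [];
for (var i = 0; i < 400; i++) manySmall.push(25);  // 400 pauses of 25ms each
var fewLarge = [100, 100, 100, 100, 100];          // 5 pauses of 100ms each

score(3600, manySmall);  // 3600 * 1000 / (400 * 5 * 5)   => 360
score(3600, fewLarge);   // 3600 * 1000 / (5 * 80 * 80)   => 113

So a run with five 100ms pauses scores roughly a third of a run with four hundred 25ms pauses, which fits the observation that the swings track pauses rather than frame counts.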
(Reporter)

Comment 5

Filtering out the noise, it looks like there was a regression on 8/26 from the compartment check landing, which then got removed in early Dec. 

The only other thing I found was that there seemed to be a regression with bug 711095. I think the regression was an increase in GC times. It may be just that there are more outliers, but I think the pause times might actually be a bit longer. From the description of bug 711095, it sounds like the patch might have (temporarily?) disabled some optimizations. I don't think it's a huge deal because that bug was to enable IGC, and IGC should help a lot with our pause times anyway.

I think it would still be good to check on bug 711095, to find out whether an increase in pause times was expected, and if not, whether there actually was an increase in pause times.

Comment 6

(In reply to David Mandelin from comment #5)
> The only other thing I found was that there seemed to be a regression with
> bug 711095.

That bug changed the order of object traversal, and it also increased mark stack usage so that the stack grew beyond our fixed space. That may well have adversely influenced the result. It should be easy to fix.
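
For context, here is a minimal sketch of that mechanism (an illustration in JavaScript, not SpiderMonkey's actual collector; the object shape and function names are invented). Marking drains a fixed-capacity stack, and once the stack overflows the collector has to make extra whole-heap passes to recover objects that were marked but never scanned:

// Sketch only: heap objects look like { marked: false, children: [...] }.
function mark(heap, roots, capacity) {
  var stack = [];
  var overflowed = false;

  function visit(obj) {
    if (obj.marked) return;
    obj.marked = true;
    if (stack.length < capacity) stack.push(obj);  // queued for scanning
    else overflowed = true;  // marked, but its children won't be scanned
  }

  roots.forEach(visit);
  for (;;) {
    while (stack.length) stack.pop().children.forEach(visit);
    if (!overflowed) break;
    overflowed = false;
    // Overflow fallback: walk the whole heap looking for children that
    // still need marking. Every extra pass like this lengthens the pause.
    heap.forEach(function (obj) {
      if (obj.marked) obj.children.forEach(visit);
    });
  }
}

A change in traversal order changes how deep the stack grows at any moment, and therefore how often the overflow path is taken.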
Could we also look at using Peptest to do some automatic regression analysis?  This could also be a good test to get them to add, as it is fairly simple.

Comment 8

(In reply to David Mandelin from comment #0)
> Several people have noticed a regression. The history is complicated; so far
> I've come up with these scores in nightly builds:

This is on Windows, right?

I compiled a build with MOZ_GCTIMER support and this is what I got for GC timing and benchmark results for the December 16 and 17 builds (the left column is the score and the pause histogram; the three columns on the right are the full/mark/finalize times of successive GCs, in ms):


2011-12-16:         gc time
                    full      mark      finalize
                    22.716000 22.113000 0.563000 
Score 204           30.025000 26.545000 2.954000
0-10ms  => 1        27.802000 26.815000 0.947000
10-20ms => 3131     28.144000 27.111000 0.961000
20-30ms => 464      27.740000 26.764000 0.937000
30-40ms => 1        28.218000 27.237000 0.933000
40-50ms => 10       27.903000 26.945000 0.920000
50-60ms => 2        27.638000 26.659000 0.932000
70-80ms => 1        23.676000 23.166000 0.460000
90-100ms => 1       31.486000 30.507000 0.942000
                    28.124000 27.137000 0.949000
                    28.100000 27.106000 0.945000
                    27.649000 26.559000 1.047000

2011-12-17:         gc time
                    full      mark      finalize
                    21.210000 20.630000 0.539000
Score 165           27.237000 24.411000 2.387000
0-10ms  => 2        26.256000 25.257000 0.959000
10-20ms => 3103     26.103000 24.947000 0.925000
20-30ms => 490      26.592000 25.598000 0.943000
30-40ms => 2        26.119000 24.998000 0.922000
40-50ms => 11       25.806000 24.852000 0.917000
70-80ms => 1        26.790000 25.807000 0.919000
110-120ms => 1      25.751000 24.795000 0.918000
                    26.329000 25.363000 0.915000
                    26.396000 25.411000 0.930000
                    25.779000 24.846000 0.881000
                    22.024000 21.341000 0.481000

This shows that the GC marking time improved. The only change in that area between the two builds is bug 711095, consistent with the results reported in that bug for other benchmarks. So I think bug 711095 cannot be responsible for the massive score drop in the results from comment 0.

Comment 9

I observed that the score result is extremely noisy. For example, for 3 separate browser runs like:

  firefox http://v8.googlecode.com/svn/branches/bleeding_edge/benchmarks/spinning-balls/index.html

on a Linux box I got scores of 204, 145, and 315. Are the results on Windows more consistent?

Another observation based on the comment 8 results is that the GC timing is rather consistent, staying at 25-30 ms during the benchmark run; that can explain the 15 cases in the above results where a pause went beyond 30 ms. However, it cannot explain the couple of outliers that go beyond 60 ms.

So something else is contributing to those big pauses.

Comment 10

I ran the benchmark on a quiet Linux box for 20000 seconds (about 6 hours) against mozilla-central tip. During the benchmark the GC time stayed within the 22-30 ms range and there were about 3400 GC runs. Here are the resulting numbers:

Score 789
0-10ms    => 462
10-20ms   => 1013517
20-30ms   => 141176
30-40ms   => 2337
40-50ms   => 644
50-60ms   => 9
60-70ms   => 74
70-80ms   => 6
110-120ms => 7
120-130ms => 2

Comment 11

As yet another confirmation that the regression is caused by the GC scheduling changes, I see that the patch from bug 707162 noticeably improved the benchmark scores. In particular, it reduced the number of particularly bad pauses and increased the score from 150-300 to 300-450 for a 60-second run.

Comment 12

On Windows I have not observed any consistency in the benchmark results for 60-second runs. For example, for the nightly build from 2011-12-26 I got scores in the range 30-38, or over 25% variation. In all cases this was with a newly started browser.

So I presume the benchmark can be used only as a very rough estimate of GC pause times.
(Reporter)

Comment 13

Thanks for checking out bug 711095, Igor. Yes, the results are very noisy. We could probably modify the scoring to make it more repeatable. 
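
One hypothetical way to do that (my sketch, not a change that landed anywhere): run the benchmark several times and report the median score, so a single noisy run cannot dominate the reported number.

// Aggregate repeated runs; the median resists outliers better than the mean.
function medianScore(scores) {
  var sorted = scores.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2
    ? sorted[mid]
    : Math.round((sorted[mid - 1] + sorted[mid]) / 2);
}

medianScore([204, 145, 315]);  // the three Linux runs from comment 9 => 204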

For now, it looks like this bug, if it exists at all, is just a dup of 707162, so I'm closing it out.
Status: NEW → RESOLVED
Resolution: --- → DUPLICATE
Duplicate of bug: 707162