Closed Bug 918862 Opened 11 years ago Closed 11 years ago

Talos TART regressions from Spidermonkey deadlock fixes

Categories

(Core :: JavaScript Engine, defect)

26 Branch
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox26 --- affected
firefox27 --- affected

People

(Reporter: mbrubeck, Assigned: bhackett1024)

References

Details

(Keywords: perf, regression)

It looks like this changeset (Bug 916753, Bug 916531, Bug 916504) probably regressed the TART benchmark by about 3%-9% on Windows and Linux:
http://hg.mozilla.org/mozilla-central/rev/4bcf9b261b94

These regressions were seen when the patch first landed on inbound, and also when it was backported to aurora; the changeset above is the only one common to both regression ranges:
https://groups.google.com/d/topic/mozilla.dev.tree-management/0QRQ-VSAlt0/discussion
https://groups.google.com/d/topic/mozilla.dev.tree-management/jvJQv6dvdGI/discussion
Brian - should we backout the landings mentioned in comment 0 or do you have another idea?
Assignee: general → bhackett1024
I don't think we should just back these out; in a tradeoff between stability and a small perf win, we want stability.  We should fix the perf only if it's possible to do it without reintroducing the deadlocks.
I don't know why that changeset would regress anything.  To try to confirm I did a couple try pushes:

With bug 916753 backed out: https://tbpl.mozilla.org/?tree=Try&rev=60545e8645ff
With all three bugs backed out: https://tbpl.mozilla.org/?tree=Try&rev=22cf26346c80

The datazilla links on the try pushes are all broken, and I don't know how else to see what the tart numbers are.
You can see the TART result in the footer of TBPL when you click on one of the "s" jobs.  You can use these compare-talos links to compare each of the Try pushes to their parent revision (72681e08a35d):

http://perf.snarkfest.net/compare-talos/index.html?oldRevs=72681e08a35d&newRev=60545e8645ff&submit=true
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=72681e08a35d&newRev=22cf26346c80&submit=true

I retriggered each of the TART jobs so we can get more datapoints.  If you reload the compare-talos links after the new runs are finished (in about 20 minutes), then they will have more useful TART data.
Backing out bug 916753 alone had no significant effect, but backing out all three bugs improved TART results by about 2-4%.
(In reply to Matt Brubeck (:mbrubeck) from comment #4)
> You can see the TART result in the footer of TBPL when you click on one of
> the "s" jobs.  You can use these compare-talos links to compare each of the
> Try pushes to their parent revision (72681e08a35d):
> 
> http://perf.snarkfest.net/compare-talos/index.
> html?oldRevs=72681e08a35d&newRev=60545e8645ff&submit=true
> http://perf.snarkfest.net/compare-talos/index.
> html?oldRevs=72681e08a35d&newRev=22cf26346c80&submit=true

MattN's got a slightly updated version of compare talos, which shows a bit more info (on the details page - esp. old/new absolute values, with some specific prettyfying of TART results).

Here are the same comparisons on Matt's page:

Backing out bug 916753 only:
http://compare-talos.mattn.ca/?oldRevs=72681e08a35d&newRev=60545e8645ff&server=graphs.mozilla.org&submit=true

Backing out all 3 bugs:
http://compare-talos.mattn.ca/?oldRevs=72681e08a35d&newRev=22cf26346c80&server=graphs.mozilla.org&submit=true

If we look at the highest TART regression on the second link, which is for Ubuntu 64 which shows as 4% improvement (but is our suspected regression since "new" is a backout of all 3 bugs suspected in causing the regression), at the "details" page:

http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29702891,29818013,29818173&newTestIds=29806647,29817955,29818855&testName=tart&osName=Ubuntu%2064&server=graphs.mozilla.org

We see that most changes are at the ".error" subtests, and are of relatively small absolute magnitude. These .error values are the difference between the designated animation duration (250ms) to the actual measured animation duration.

Since all these .error changes are well under 1ms, it makes them very much negligible. Even if they were 5ms I would still consider them not meaningful enough.

The details of the other platform in this comparison (ubuntu 32, win xp/7/8) have similar patterns.

Bottom line: if these comparisons indeed represent the regression which we suspect (do they?), then they're absolutely meaningless and they should not block landing of these bugs.
(In reply to Avi Halachmi (:avih) from comment #6)
> Bottom line: if these comparisons indeed represent the regression which we
> suspect (do they?), then they're absolutely meaningless and they should not
> block landing of these bugs.

WRT TART only. I didn't inspect comparisons of other talos tests.
WONTFIX?
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: needinfo?(mbrubeck)
Resolution: --- → FIXED
Oops, didn't mean to resolve it myself...
Sounds good.  Thanks for the additional investigation!
Flags: needinfo?(mbrubeck)
Resolution: FIXED → WONTFIX
You need to log in before you can comment on or make changes to this bug.