2 - 15.37% almost all talos tests (windows7-32, windows8-64, windowsxp) regression on push 559a80645f20 (Thu Mar 24 2016)

RESOLVED WONTFIX

Status

defect
RESOLVED WONTFIX
3 years ago
Last year

People

(Reporter: jmaher, Unassigned)

Tracking

({perf, regression})

49 Branch

Firefox Tracking Flags

(firefox48 disabled, firefox49+ wontfix)

Details

(Whiteboard: [talos_regression])

Talos has detected a Firefox performance regression from push 559a80645f20. Since you authored one of the patches included in that push, we need your help to address this regression.

This is a list of all known regressions and improvements related to the push:
https://treeherder.mozilla.org/perf.html#/alerts?id=600

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see:
https://wiki.mozilla.org/Buildbot/Talos/Tests#a11y
https://wiki.mozilla.org/Buildbot/Talos/Tests#ts_paint
https://wiki.mozilla.org/Buildbot/Talos/Tests#tpaint
https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5
https://wiki.mozilla.org/Buildbot/Talos/Tests#Dromaeo_Tests
https://wiki.mozilla.org/Buildbot/Talos/Tests#tsvg-opacity
https://wiki.mozilla.org/Buildbot/Talos/Tests#TART.2FCART
https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5o_scroll
https://wiki.mozilla.org/Buildbot/Talos/Tests#installer size
https://wiki.mozilla.org/Buildbot/Talos/Tests#CanvasMark
https://wiki.mozilla.org/Buildbot/Talos/Tests#tabpaint
https://wiki.mozilla.org/Buildbot/Talos/Tests#tsvgx
https://wiki.mozilla.org/Buildbot/Talos/Tests#xperf
https://wiki.mozilla.org/Buildbot/Talos/Tests#DAMP
https://wiki.mozilla.org/Buildbot/Talos/Tests#tps
https://wiki.mozilla.org/Buildbot/Talos/Tests#sessionrestore.2Fsessionrestore_no_auto_restore

Reproducing and debugging the regression:

If you would like to re-run this Talos test on a potential fix, use try with the following syntax:

try: -b o -p win32,win64 -u none -t all[Windows 7,Windows 8,Windows XP] --rebuild 5  # add "mozharness: --spsProfile" to generate profile data

(we suggest --rebuild 5 to be more confident in the results)

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e [path]/firefox -a a11yr:ts_paint:tpaint:tp5o:dromaeo_css:tsvgr_opacity:tart:cart:tp5o_scroll:tcanvasmark:tabpaint:tsvgx:tp5n:damp:tps:sessionrestore_no_auto_restore:sessionrestore

(add --e10s to run tests in e10s mode)
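
For anyone scripting the local run, the command above could be assembled along these lines. This is only a sketch: the helper name talos_command is made up for illustration, the flags are the ones quoted above (--develop, -e, -a, --e10s), and "[path]/firefox" is still the placeholder from the original command, so substitute a real path before running it.

```python
import shlex

# Test names copied verbatim from the talos command above.
TESTS = [
    "a11yr", "ts_paint", "tpaint", "tp5o", "dromaeo_css", "tsvgr_opacity",
    "tart", "cart", "tp5o_scroll", "tcanvasmark", "tabpaint", "tsvgx",
    "tp5n", "damp", "tps", "sessionrestore_no_auto_restore", "sessionrestore",
]


def talos_command(firefox_path, tests, e10s=False):
    """Build the argument list for a local Talos run (hypothetical helper)."""
    # Talos takes the active tests as one colon-separated string after -a.
    args = ["talos", "--develop", "-e", firefox_path, "-a", ":".join(tests)]
    if e10s:
        args.append("--e10s")
    return args


# "[path]/firefox" is a placeholder, not a real location.
print(shlex.join(talos_command("[path]/firefox", TESTS)))
```

Keeping the test names in a list makes it easy to rerun only the tests that regressed on a given platform.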

Making a decision:
As the patch author, your feedback is needed to help us handle this regression.
*** Please let us know your plans by Tuesday, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter

Comment 1

3 years ago
:gps, this is showing quite a difference from what we saw on try server.  I believe we should back this out and work on figuring out why our perf regressed so much on just about every test.  Here is a comparison of the vs2015 push vs the previous one:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=8d59e338a5bd&newProject=mozilla-inbound&newRevision=559a80645f20&framework=1

If it was just the 3 tests that we saw on try regressing, I would be more comfortable with it.  As Windows is our top platform for desktop users, making an across the board perf hit on what we ship (pgo, we really don't see regressions on opt!) doesn't seem like a good win.

Possibly there are other thoughts or ideas on how to reduce/resolve these regressions?
Flags: needinfo?(gps)
Reporter

Updated

3 years ago
Component: Untriaged → Build Config
Product: Firefox → Core
The difference between Try and non-Try results appears alarming, I agree.

Percentage-wise, the biggest regression is in a11yr, with a 5.5%-17% decrease. Looking into this deeper, I think something is wonky with PGO and this test. The base for a11yr opt windows7-32 is 747.78 ± 0.92%. Base for PGO is 393.22 ± 0.40%. That's nearly a 2x difference. Percentage-wise, that's much larger than the benefit we typically see from PGO. There is something fishy going on.
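
As a quick sanity check on the "nearly 2x" figure, the ratio can be worked out from the two a11yr base values quoted here. This is a rough sketch only: the function and variable names are made up, it ignores the ± noise on each number, and it assumes a lower a11yr score is better, as the comparison implies.

```python
def ratio(opt_base, pgo_base):
    """Return how many times larger the opt score is than the PGO score."""
    return opt_base / pgo_base


# a11yr windows7-32 base values quoted in this comment.
A11YR_OPT = 747.78
A11YR_PGO = 393.22

opt_over_pgo = ratio(A11YR_OPT, A11YR_PGO)
# Relative reduction of the PGO score compared with the opt score.
pgo_reduction_pct = (1 - A11YR_PGO / A11YR_OPT) * 100

print(f"opt/PGO ratio: {opt_over_pgo:.2f}x")
print(f"PGO reduction vs opt: {pgo_reduction_pct:.1f}%")
```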

cart is showing a <5% regression. I /think/ in bug 1254767 you were only reporting regressions >5%, so cart didn't make the list there. I suspect it was always regressing, but just under the reporting threshold in the try runs. Ditto for some other tests like tps which also didn't regress by more than 2-3%.

As for backing out VS2015 because of perf, that's not my call and I'm not sure whose it is. Perhaps we should escalate to RelMan? It's worth pointing out that in all cases PGO results are better than non-PGO results. It's just that VS2015u1 PGO isn't as good as VS2013 PGO, it appears.

Apparently there are a number of improvements coming in VS2015u2 (which is currently in RC). We could do a try push with VS2015u2 and compare against VS2015u1. If Update 2 claws back the perf, perhaps we can live with Update 1 for a few weeks until Update 2 is officially released. We are already tentatively planning on aggressively adopting Update 2 (see bug 1259782). It would be a difficult pill to swallow to back out VS2015u1 and wait for Update 2 final, as I feel having central/Nightly on VS2015u1 is valuable.

If we spend the engineering time to investigate why there are PGO regressions, we could report them to Microsoft and see if they can improve PGO performance in future Visual Studio releases. Google has had success getting Microsoft to listen. If we adopt and test Visual Studio releases quicker, we can catch these regressions and hopefully get them fixed sooner. Perhaps we should be standing up automation that periodically builds with the next pre-release versions of Microsoft's toolchains. We can certainly do that with VS2015u2RC right now...
Flags: needinfo?(gps)
I was able to build with Visual Studio 2015 Update 2 RC on Try. Here are PGO results comparing VS2015u1 and VS2015u2 RC:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=56540b2bbe8a&newProject=try&newRevision=df78bff64a89&framework=1&showOnlyImportant=0

Somewhat surprisingly, there were only 2 changes of statistical significance. There goes my theory that VS2015u2 would claw back some performance losses :/
Reporter

Comment 4

3 years ago
Thanks for trying out the vs2015u2 release. It is too bad that didn't change our pgo performance.  If we do think we can get somewhere in the next few weeks with other options, then I wouldn't see harm in leaving this in.

Regarding pgo vs opt- We only ship pgo (at least for Windows), so the biggest concern for performance is the pgo numbers of vs2013 vs pgo numbers of vs2015.  I agree that pgo numbers are noticeably better than non pgo- that is a good thing!

In the past we have had pgo regressions which pop up related to random code added/removed- I suspect this is an outcome of the overall pgo process.  Are there other flags/options that we could use?  Should we talk to Microsoft's compiler team and see if they have suggestions?  Maybe we could try different actions while running the browser to generate a different profile.
I did something different with yesterday's Try experiment. First, I triggered multiple Talos jobs from the initial build job. This showed 2 "important" changes (>2%). Later, I triggered whole new *build* jobs. These in turn scheduled Talos jobs. So, instead of Talos jobs derived from a single build job, we have ~6 Talos jobs derived from a single build job and another 10-11 Talos jobs derived from separate build jobs. The end result is the "important" changes disappeared!

What I suspect is happening is that variations between PGO profile runs produce binaries with statistically different performance characteristics, and these in turn show up as differences in Talos results.
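
To make that suspicion concrete, one way to look at it is to split the spread in Talos scores into a within-build part (reruns against the same binary) and a between-build part (separate PGO builds). The sketch below is an assumption about how one might do that, not what Perfherder does; the function name is hypothetical and the scores are invented purely for illustration.

```python
from statistics import mean, pstdev


def variance_breakdown(runs_by_build):
    """Split score spread into within-build and between-build parts.

    runs_by_build maps a build id to the list of scores from the Talos
    jobs that ran against that build.
    """
    build_means = [mean(scores) for scores in runs_by_build.values()]
    # Average spread of reruns against one and the same binary.
    within = mean(pstdev(scores) for scores in runs_by_build.values())
    # Spread of the per-build averages across separately built binaries.
    between = pstdev(build_means)
    return within, between


# Made-up scores for illustration only; these are NOT real Talos data.
example = {
    "build_a": [100.0, 101.0, 99.0],
    "build_b": [104.0, 105.0, 103.0],
    "build_c": [97.0, 98.0, 96.0],
}
within, between = variance_breakdown(example)
print(f"within-build spread:  {within:.2f}")
print(f"between-build spread: {between:.2f}")
```

If the between-build spread is much larger than the within-build spread, retriggering Talos on a single build will understate the noise, which would fit the "important" changes disappearing once more builds were added.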

I think I'll conduct the same experiment with VS2013. This should introduce more variance in the VS2013 base numbers and should hopefully paint a clearer picture of what the change in behavior between VS2013 and VS2015 actually is.
Reporter

Comment 6

3 years ago
Thanks for pointing that out. I would imagine we would see variance per build; maybe this will help answer some of the unknowns.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=66a0b196cbe7&newProject=try&newRevision=56540b2bbe8a&framework=1&showOnlyImportant=0 compares VS2013 and VS2015u1 using multiple PGO builds. The results appear more or less consistent with the regressions reported with a single PGO build :/ Although I didn't dig into that data too deeply.
Duplicate of this bug: 1259919
Reporter

Updated

3 years ago
Blocks: 1267562
No longer blocks: 1256666
Version: unspecified → 49 Branch
It sounds like we have a lot of good reasons to make these changes but aren't sure what impact it may have on Firefox users. So, good for developers to have a faster build time and better toolchain, but I worry about a 15% perf hit across the board for users, so it's a tradeoff. Tracking for 49 for now.

Comment 10

3 years ago
There are a few options on what we can investigate/do here that I can think of, in no particular order.

1. Give up attempting to switch to Visual Studio 2015 and retry with a newer version of Visual Studio later.
2. Verify that we're actually PGOing.  I remember something about a bug at some point which caused us to actually not PGO stuff as part of a PGO build which manifested itself as Talos regressions.  For example using pgomgr to verify that we're collecting runtime statistics: <https://msdn.microsoft.com/en-us/library/2kw46d8w.aspx>
3. Enhance the set of things that we run in the profiling phase of the PGO build in the hopes of getting the PGO compiler to optimize more things.
4. Take the Talos regressions and move on.
5. Investigate other large projects such as Chromium to see if they have encountered similar issues when switching to 2015.
6. Try to investigate what's going on and report to Microsoft, similar to <https://randomascii.wordpress.com/2016/03/24/compiler-bugs-found-when-porting-chromium-to-vc-2015/> for example.
I'm not sure if we need to decide this and act before the merge (next Monday) or if we can let 49 go to aurora and decide and change our approach then.  Let's meet and discuss it today.
We followed up by looking at other ways to test performance and nothing else showed a performance hit. So I don't think this needs to block vs2015 builds for 49. We should keep an eye out for bugs in aurora and beta. I'll mention that to QA and in the channel meeting.   However, I'm not sure what that means for our tests. I'll leave that to Joel and the perf/Talos team. 

For external benchmarking: arewefastyet, looking at mostly JS but also DOM, WebGL & Web Audio, showed no noticeable slowdown for vs2015. Jukka tested for the openwebgames project: overall it looks like most demos have a tiny win for vs2015.

From manual testing by Andrei and Engineering QA:
The only difference we saw in terms of performance, following the comparison of these builds, is related to page scrolling and switching between tabs, both in favour of the vs2015 build. We compared loading pages, opening new windows, new tabs, scrolling pages, switching between tabs, etc. We've also benchmarked these builds using Dromaeo and CanvasMark.
Is this bug a wontfix at this point?
I would say wontfix, I will let :gps make the final call.
Flags: needinfo?(gps)
Yeah, it's a wontfix given comment #12.
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(gps)
Resolution: --- → WONTFIX

Updated

Last year
Product: Core → Firefox Build System