Talos has detected a Firefox performance regression from push 76e8f6ad9ded. As author of one of the patches included in that push, we need your help to address this regression.

This is a list of all known regressions and improvements related to the push:
https://treeherder.mozilla.org/perf.html#/alerts?id=877

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see:
https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5

Reproducing and debugging the regression:

If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p win32 -u none -t tp5o-e10s[Windows XP] --rebuild 5 # add "mozharness: --spsProfile" to generate profile data
(we suggest --rebuild 5 to be more confident in the results)

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e [path]/firefox -a tp5o --e10s

Making a decision:

As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Thursday, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Product: Firefox → Core
I pushed to try to bisect this down (xp takes a while on try, this should be ready in 12 hours or so): https://email@example.com&selectedJob=19623552&fromchange=0488ff56c381&tochange=a9ca7f760697 :terrence, this is your favorite test this cycle of firefox! as far as I know this is windows xp only, keep that in mind. Can you help make a decision here?
(In reply to Joel Maher (:jmaher) from comment #1) > :terrence, this is your favorite test this cycle of firefox! as far as I > know this is windows xp only, keep that in mind. Can you help make a > decision here? What exactly does that mean? Does this test run only on WinXP or did it only regress on WinXP?
sorry, it appears to only have regressed on windows xp.
Also, can you please (1) fix the broken link to https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code and (2) include the commit messages in the description? Having to run |hg log -r| to figure out what's even implicated is super annoying.
ack, thanks for the catch on the broken link, here it is: https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code as for the commit messages, I have a script that bisects, it would be nice to update it with commit messages- that is a good tip. what I do, is look at the try pushes: https://firstname.lastname@example.org&selectedJob=19623552&fromchange=0488ff56c381&tochange=a9ca7f760697 then I match it up to: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=76e8f6ad9ded
(In reply to Joel Maher (:jmaher) from comment #3) > sorry, it appears to only have regressed on windows xp. Ugh, it's the same binary on all windows platforms. Does WinXP run on different hardware?
winxp/win7 are the same binary and same hardware, but different OS. here is a graph of the 3 windows platforms: https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,22b942243cce9b43b263b83c473af3256a138e58,1%5D&series=%5Bmozilla-inbound,bf41e17491286132034748a5f95035a0aaf50458,1%5D&series=%5Bmozilla-inbound,3ece050aac95021a51363205ef7747cce7a62b24,1%5D&zoom=1460335708874.058,1461001200000,1.5055762081784394,12.54646840148699 I have been working on bumping up the priority for the winxp jobs so we can get results faster. here is a link to the machines we use in automation: https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation A few of the try jobs are starting to run
I have little data, but I am leaning towards: https://hg.mozilla.org/integration/mozilla-inbound/rev/b23a6286c125
(In reply to Joel Maher (:jmaher) from comment #8) > I have little data, but I am leaning towards: > https://hg.mozilla.org/integration/mozilla-inbound/rev/b23a6286c125 I'd keep looking: that patch does no work without the later patches that landed.
ok, a lot of overlap in the data, with 6 data points each we have: https://hg.mozilla.org/integration/mozilla-inbound/rev/86bd74d49e63 you can see this on graph server: https://treeherder.mozilla.org/perf.html#/graphs?series=%5Btry,22b942243cce9b43b263b83c473af3256a138e58,1%5D&zoom=1461002328283.2976,1461002554000,4.9645945984684685,5.878229348728537&selected=%5Btry,22b942243cce9b43b263b83c473af3256a138e58,100949,19629212,1%5D
(In reply to Joel Maher (:jmaher) from comment #10) > ok, a lot of overlap in the data, with 6 data points each we have: > https://hg.mozilla.org/integration/mozilla-inbound/rev/86bd74d49e63 Thanks for the testing. We're doing the same amount of work before and after, but split into smaller chunks. The assumption we're making is that the work-stealing queue is basically zero cost. Which is true everywhere else. The one glaring difference on WinXP is the software condition variable emulation. Looks like I need to land the optimizations for this in bug 956899.
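To make the reasoning in the comment above concrete, here is a toy cost model. It is only a sketch and not the GC code: the function name, the chunk counts, and every number are hypothetical, chosen to show that splitting a fixed amount of work into more chunks costs nothing when per-chunk synchronization is free, and costs more in proportion to the chunk count when it is not.

```python
# Toy cost model; NOT the SpiderMonkey implementation.
# All names and numbers are hypothetical and exist only for illustration.


def total_time_ms(total_work_ms: float, num_chunks: int, per_chunk_sync_ms: float) -> float:
    """Time for one worker to process a fixed amount of work in chunks.

    The work itself is constant. Only the per-chunk synchronization
    overhead (queue handoff, condition variable wakeup) scales with
    the number of chunks.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    return total_work_ms + num_chunks * per_chunk_sync_ms


if __name__ == "__main__":
    work = 100.0  # hypothetical total work, identical before and after
    for label, sync_cost in (("cheap sync", 0.0), ("expensive sync", 0.5)):
        coarse = total_time_ms(work, num_chunks=4, per_chunk_sync_ms=sync_cost)
        fine = total_time_ms(work, num_chunks=64, per_chunk_sync_ms=sync_cost)
        print(f"{label}: coarse={coarse:.1f} ms, fine={fine:.1f} ms")
```

Under this model the finer split is harmless wherever the synchronization primitive is effectively free, which matches the assumption described above, and becomes visible only where each handoff has a real cost.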
great update, I see recent activity on bug 956899 including reviewing a patch- looking forward to it landing. This regression will roll into Aurora next week. We don't need to uplift the fix for this there, but it would be nice.
Created attachment 8744023 [details] [diff] [review] work_around_winxp_threading_slowness-v0.diff This is an absolutely vile hack. It simply disables sweeping and compaction parallelization on winxp. We might want to take it as a temporary measure until the new software CV is landed, assuming it actually gets us back the perf we lost.
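The general shape of such a workaround can be sketched as below. This is not the contents of the attached patch: the function names and the thread-count policy are hypothetical. The one fact it relies on is that native condition variables arrived with Windows Vista (NT 6.0), so Windows XP (NT 5.1) has to fall back to software emulation.

```python
# Hypothetical sketch of a platform gate; NOT the attached patch.
from typing import Tuple

# Native Win32 condition variables were introduced with Windows Vista (NT 6.0).
FIRST_NT_VERSION_WITH_NATIVE_CV = (6, 0)


def has_native_condition_variables(nt_version: Tuple[int, int]) -> bool:
    """Return True if this NT (major, minor) version has native CVs."""
    return nt_version >= FIRST_NT_VERSION_WITH_NATIVE_CV


def helper_thread_count(nt_version: Tuple[int, int], cpu_count: int) -> int:
    """Hypothetical policy: how many helper threads to use for parallel work.

    Returns 0 (do the work serially) when condition variables would have
    to be emulated in software, otherwise one helper per CPU, minimum 1.
    """
    if not has_native_condition_variables(nt_version):
        return 0
    return max(1, cpu_count)


if __name__ == "__main__":
    print(helper_thread_count((5, 1), 4))  # Windows XP
    print(helper_thread_count((6, 1), 4))  # Windows 7
```

Gating on the capability rather than on the OS name keeps the check in one place, so it can simply be removed once a faster software CV lands.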
oh no, I see build failures on that try push, not sure if it is a bad base- the error wasn't obvious to me looking at the patch.
this is on aurora now!
Checking in: is there anything left to do here?
I think this is seeing the same CV wakeup ordering issue that Nick saw in his CV landings. In particular, the score jumps back down to where it was a few days later and seems to be fairly bistable around the two scores.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME