Closed
Bug 1265480
Opened 9 years ago
Closed 9 years ago
7.03% tp5o responsiveness e10s (windowsxp) regression on push 76e8f6ad9ded (Thu Apr 14 2016)
Categories
(Core :: JavaScript: GC, defect, P3)
Tracking
RESOLVED
WORKSFORME
Flag | Tracking | Status |
---|---|---|
e10s | + | --- |
People
(Reporter: jmaher, Unassigned)
References
Details
(Keywords: perf, regression, Whiteboard: [talos_regression])
Attachments
(1 file)
2.68 KB,
patch
Talos has detected a Firefox performance regression from push 76e8f6ad9ded. Because you are the author of one of the patches included in that push, we need your help to address this regression.
This is a list of all known regressions and improvements related to the push:
https://treeherder.mozilla.org/perf.html#/alerts?id=877
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see:
https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5
Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p win32 -u none -t tp5o-e10s[Windows XP] --rebuild 5 # add "mozharness: --spsProfile" to generate profile data
(we suggest --rebuild 5 to be more confident in the results)
To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code
Then run the following command from the directory where you set up Talos:
talos --develop -e [path]/firefox -a tp5o --e10s
Making a decision:
Because you are the patch author, we need your feedback to help us handle this regression.
*** Please let us know your plans by Thursday, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter | ||
Updated•9 years ago
Component: Untriaged → JavaScript: GC
Product: Firefox → Core
Reporter | ||
Comment 1•9 years ago
I pushed to try to bisect this down (XP takes a while on try, so this should be ready in 12 hours or so):
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&selectedJob=19623552&fromchange=0488ff56c381&tochange=a9ca7f760697
:terrence, this is your favorite test this cycle of firefox! as far as I know this is windows xp only, keep that in mind. Can you help make a decision here?
Flags: needinfo?(terrence)
Comment 2•9 years ago
(In reply to Joel Maher (:jmaher) from comment #1)
> :terrence, this is your favorite test this cycle of firefox! as far as I
> know this is windows xp only, keep that in mind. Can you help make a
> decision here?
What exactly does that mean? Does this test run only on WinXP or did it only regress on WinXP?
Flags: needinfo?(terrence)
Reporter | ||
Comment 3•9 years ago
sorry, it appears to only have regressed on windows xp.
Comment 4•9 years ago
Also, can you please (1) fix the broken link to https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code and (2) include the commit messages in the description? Making me have to |hg log -r| to figure out what's even implicated is super annoying.
Reporter | ||
Comment 5•9 years ago
ack, thanks for the catch on the broken link, here it is:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code
As for the commit messages, I have a script that bisects; it would be nice to update it to include commit messages. That is a good tip.
What I do is look at the try pushes:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&selectedJob=19623552&fromchange=0488ff56c381&tochange=a9ca7f760697
then I match it up to:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=76e8f6ad9ded
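A minimal sketch of how a bisection script could attach commit messages to each revision, assuming a local Mercurial clone is available. The function names and the `repo_path` parameter are hypothetical and are not taken from the actual bisection script; only the `hg log` options and template keywords are standard Mercurial.

```python
import subprocess


def hg_log_command(rev, repo_path="."):
    """Build the argv that prints one changeset's short hash and summary line.

    `rev` is any revision identifier hg accepts (for example a short hash).
    `repo_path` is a hypothetical path to a local clone.
    """
    return [
        "hg", "log",
        "-R", repo_path,          # which repository to read
        "-r", rev,                # which revision to show
        "--template", "{node|short} {desc|firstline}\n",
    ]


def commit_summary(rev, repo_path="."):
    """Run hg and return '<short hash> <first line of the commit message>'."""
    result = subprocess.run(
        hg_log_command(rev, repo_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Calling `commit_summary` for each revision in the try range would give one summary line per bisection push, which could then be pasted into the bug description.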
Comment 6•9 years ago
(In reply to Joel Maher (:jmaher) from comment #3)
> sorry, it appears to only have regressed on windows xp.
Ugh, it's the same binary on all windows platforms. Does WinXP run on different hardware?
Reporter | ||
Comment 7•9 years ago
WinXP and Win7 use the same binary and the same hardware, but a different OS. Here is a graph of the 3 Windows platforms:
https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,22b942243cce9b43b263b83c473af3256a138e58,1%5D&series=%5Bmozilla-inbound,bf41e17491286132034748a5f95035a0aaf50458,1%5D&series=%5Bmozilla-inbound,3ece050aac95021a51363205ef7747cce7a62b24,1%5D&zoom=1460335708874.058,1461001200000,1.5055762081784394,12.54646840148699
I have been working on bumping up the priority for the winxp jobs so we can get results faster.
here is a link to the machines we use in automation:
https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation
A few of the try jobs are starting to run
Reporter | ||
Comment 8•9 years ago
I have little data, but I am leaning towards:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b23a6286c125
Comment 9•9 years ago
(In reply to Joel Maher (:jmaher) from comment #8)
> I have little data, but I am leaning towards:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/b23a6286c125
I'd keep looking: that patch does no work without the later patches that landed.
Reporter | ||
Comment 10•9 years ago
ok, a lot of overlap in the data, with 6 data points each we have:
https://hg.mozilla.org/integration/mozilla-inbound/rev/86bd74d49e63
you can see this on graph server:
https://treeherder.mozilla.org/perf.html#/graphs?series=%5Btry,22b942243cce9b43b263b83c473af3256a138e58,1%5D&zoom=1461002328283.2976,1461002554000,4.9645945984684685,5.878229348728537&selected=%5Btry,22b942243cce9b43b263b83c473af3256a138e58,100949,19629212,1%5D
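A rough sketch of how "a lot of overlap" between two small sample sets can be checked alongside the percent change in their means. The sample values are hypothetical placeholders, not the scores from these try pushes, and the helper names are made up for illustration. Whether a higher score counts as a regression depends on the test.

```python
from statistics import mean


def percent_change(before, after):
    """Percent change of the mean of `after` relative to the mean of `before`."""
    base = mean(before)
    return (mean(after) - base) / base * 100.0


def ranges_overlap(before, after):
    """True when the [min, max] ranges of the two sample sets intersect."""
    return min(before) <= max(after) and min(after) <= max(before)


# Hypothetical scores with 6 data points each, NOT taken from this bug.
before = [5.0, 5.1, 5.2, 5.0, 5.3, 5.1]
after = [5.2, 5.5, 5.4, 5.6, 5.3, 5.5]

print(f"change: {percent_change(before, after):.2f}%")
print(f"ranges overlap: {ranges_overlap(before, after)}")
```

When the ranges overlap, a shift in the mean is weaker evidence, which is one reason to collect more data points per revision before settling on a culprit.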
Updated•9 years ago
tracking-e10s:
--- → ?
Updated•9 years ago
Priority: -- → P3
Comment 11•9 years ago
(In reply to Joel Maher (:jmaher) from comment #10)
> ok, a lot of overlap in the data, with 6 data points each we have:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/86bd74d49e63
Thanks for the testing. We're doing the same amount of work before and after, but split into smaller chunks. The assumption we're making is that the work-stealing queue is basically zero cost, which is true everywhere else. The one glaring difference on WinXP is the software condition variable emulation. Looks like I need to land the optimizations for this in bug 956899.
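The reasoning above can be illustrated with a deliberately simplified cost model: the same total work plus a fixed handoff cost per chunk. This is only an illustration under stated assumptions, not a measurement or a model of the actual GC code. It ignores parallelism entirely, and `handoff_cost` is a hypothetical stand-in for whatever each queue operation costs on a given platform.

```python
def total_cost(work_units, chunk_size, handoff_cost):
    """Total cost = the work itself + one queue handoff per chunk.

    All quantities are in arbitrary, hypothetical units.
    """
    chunks = -(-work_units // chunk_size)  # ceiling division
    return work_units + chunks * handoff_cost


# With a free queue, chunk size does not change the total.
print(total_cost(1000, 100, 0), total_cost(1000, 10, 0))

# With a costly handoff, the same work in smaller chunks gets more expensive.
print(total_cost(1000, 100, 5), total_cost(1000, 10, 5))
```

In this model, splitting work into smaller chunks is harmless only while the per-chunk handoff is close to free, which is the assumption described as breaking down on WinXP.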
Reporter | ||
Comment 12•9 years ago
Great update. I see recent activity on bug 956899, including a patch under review; looking forward to it landing. This regression will roll into Aurora next week. We don't need to uplift the fix for this there, but it would be nice.
Comment 13•9 years ago
This is an absolutely vile hack. It simply disables sweeping and compaction parallelization on winxp. We might want to take it as a temporary measure until the new software CV is landed, assuming it actually gets us back the perf we lost.
Comment 14•9 years ago
Reporter | ||
Comment 15•9 years ago
Oh no, I see build failures on that try push. I'm not sure if it is a bad base; the error wasn't obvious to me from looking at the patch.
Reporter | ||
Comment 16•9 years ago
this is on aurora now!
Comment 17•9 years ago
Updated•9 years ago
Version: unspecified → Trunk
Reporter | ||
Comment 18•9 years ago
I am checking in: is there anything remaining to do here?
Flags: needinfo?(terrence)
Comment 19•9 years ago
I think this is seeing the same CV wakeup ordering issue that Nick saw in his CV landings. In particular, the score jumps back down to where it was a few days later and seems to be fairly bistable around the two scores.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(terrence)
Resolution: --- → WORKSFORME