Closed
Bug 1230571
Opened 9 years ago
Closed 9 years ago
4% Win7 tp5o private bytes regression on Mozilla-Inbound (v.45) on December 03, 2015 from push 65f787c9fd4e
Categories
(Core Graveyard :: Plug-ins, defect)
Tracking
(e10s?, firefox45 affected)
RESOLVED
WONTFIX
People
(Reporter: jmaher, Assigned: dvander)
References
Details
(Keywords: perf, regression, Whiteboard: [talos_regression][e10s])
Talos has detected a Firefox performance regression from your commit 65f787c9fd4e5ed7013c32f26ae3f6dfcea88bd8 in bug 1217665. We need you to address this regression.
This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=65f787c9fd4e5ed7013c32f26ae3f6dfcea88bd8&showAll=1
On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5
Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p win32 -u none -t tp5o # add "mozharness: --spsProfile" to generate profile data
To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code
Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a tp5o
Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Monday, or the offending patch will be backed out! ***
Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Comment 1•9 years ago
Are we creating more devices than before?
Reporter | ||
Comment 2•9 years ago
this seems to affect windows 7 regular and e10s. e10s didn't register for this revision because all talos e10s tests were broken for about 10 pushes which included this one. I am collecting more data to see what other tests are affected.
:dvander, can you take the lead here on determining why this is happening and what we should do?
Reporter | ||
Updated•9 years ago
Flags: needinfo?(dvander)
Reporter | ||
Comment 3•9 years ago
not sure if I have the right person; the author of the patch is danderson@mozilla.com, so let's get the needinfo correct! Sadly, danderson@mozilla.com is not accepting needinfo requests.
Flags: needinfo?(dvander)
Reporter | ||
Comment 4•9 years ago
seems as though :dvander is the right person! I believe I had this same mistake 5 months ago. :dvander, can you please comment on this issue and maybe sort out your bugzilla/commit email address to avoid confusion in the future :)
Flags: needinfo?(dvander)
Reporter | ||
Comment 5•9 years ago
a compare view to show differences between this revision and the previous one:
https://treeherder.allizom.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=2e33a92988cd&newProject=mozilla-inbound&newRevision=65f787c9fd4e&showOnlyConfident=1
it shows the bytes, not sure about the other tests- keep in mind anything on here is a hint, not a certain improvement/regression:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=2e33a92988cd&newProject=mozilla-inbound&newRevision=65f787c9fd4e&showOnlyConfident=1
Reporter | ||
Comment 6•9 years ago
also talos page switch (tps) from the 'g2' job seems to have regressed by 5% on linux64 e10s. Looking at the graphs, it appears to have regressed on other platforms as well:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,637a7f061cf5e18c4a14cf10f342b19a345f8e3c,1]&series=[mozilla-inbound,ba8ccda021618c02de072c68b0b56ba251f42abd,1]&series=[mozilla-inbound,2134698ffee49a2f33235c3640553083cf715b8a,1]&series=[mozilla-inbound,9aa54c581c4196b42d8c278e04a7503b0d840f0f,1]&series=[mozilla-inbound,e745d8cee342396f6bf95131c8972abdd48c75d6,1]
You can see that on the compare view for linux64/win7/winxp:
https://treeherder.allizom.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=2e33a92988cd&newProject=mozilla-inbound&newRevision=65f787c9fd4e&showOnlyConfident=1
the subtests show quite a list of regressions:
https://treeherder.allizom.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=2e33a92988cd&newProject=mozilla-inbound&newRevision=65f787c9fd4e&originalSignature=637a7f061cf5e18c4a14cf10f342b19a345f8e3c&newSignature=637a7f061cf5e18c4a14cf10f342b19a345f8e3c
Assignee | ||
Comment 7•9 years ago
Since these patches almost entirely added unused code, I'm guessing the changes to DidComposite are to blame. I'll do some try pushes today to confirm.
Assignee: nobody → dvander
Status: NEW → ASSIGNED
Flags: needinfo?(dvander)
Reporter | ||
Comment 8•9 years ago
cool. Let me know how I can help, e.g. if you need help analyzing the try pushes.
Reporter | ||
Comment 9•9 years ago
hmm, ts_paint seems to be affected by 5% on windows 7 as well. I suspect this is the last regression associated with this push, so it is good to know the full list :)
Assignee | ||
Comment 10•9 years ago
with 5c38d2f6fb93 backed out:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8925298f1f88
Reporter | ||
Comment 11•9 years ago
did some retriggers and looking at it compared to what is on inbound, this doesn't seem to move the needle:
https://treeherder.mozilla.org/perf.html#/graphs?series=[try,4d391f08c110db750d98aa3d09075646c47940ff,1]&series=[mozilla-inbound,4d391f08c110db750d98aa3d09075646c47940ff,1]&highlightedRevisions=8925298f1f88
Assignee | ||
Comment 12•9 years ago
Well I'm at a loss, esp. since this is purportedly cross-platform. Here's a (more minimal) talos run for each cset in the push.
part 11: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a43db2f5881
part 10: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0186489c76d9
part 8: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c692cf1a1b53
part 7: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f158949b03cc
part 6: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5b0a3bd81778
part 5: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6b3d6f78cb8
part 4: https://treeherder.mozilla.org/#/jobs?repo=try&revision=54a99bc3f7d8
part 3: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fc9380655fbf
part 2: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3705f7a86929
part 1: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ec802b56c04f
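As the next comment clarifies, these pushes are cumulative: the first backs out only part 11, the next backs out parts 11 and 10, and so on. Under that reading, the first push in the sequence whose metric drops clearly below the baseline points at the likely culprit. A minimal sketch of that reasoning in Python follows; the function name, the 2% threshold, and every number in it are hypothetical and are not taken from the Talos results.

```python
from typing import Optional, Sequence, Tuple


def first_fixing_backout(
    baseline: float,
    runs: Sequence[Tuple[str, float]],
    min_drop_fraction: float = 0.02,
) -> Optional[str]:
    """Return the label of the first cumulative backout whose metric is at
    least min_drop_fraction below the baseline, or None if none qualifies.

    `runs` must be ordered the way the backouts were stacked
    (part 11 first, then parts 11+10, and so on).
    """
    for label, value in runs:
        if (baseline - value) / baseline >= min_drop_fraction:
            return label
    return None


# Hypothetical values for illustration only (not measured data).
example_runs = [
    ("part 11", 96.0),
    ("part 10", 95.5),
    ("part 8", 96.2),
]
culprit = first_fixing_backout(100.0, example_runs)
print(culprit)
```

Because the backouts stack, later pushes staying low does not implicate the later parts; only the first push where the drop appears is informative.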
Reporter | ||
Comment 13•9 years ago
from digging into the try history, each of these built upon each other, so part 11 was backed out, then the next push kept 11 backed out and then backed out part 10, and so forth.
This means that we can see the cumulative effect of backing these out. From what I can tell, part 11 was backed out first and part 1 last (which makes sense). After doing retriggers on the jobs:
https://treeherder.mozilla.org/#/jobs?repo=try&author=danderson@mozilla.com&filter-searchStr=tp5%20win&fromchange=3b2c7b147679&selectedJob=14457629
it helps me believe that part 11 is the problem! Looking at the base revisions the try pushes are based off:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=5ba77225c957&filter-searchStr=Windows%207%20opt%20Talos%20Performance%20Talos%20tp%20T%28tp%29&selectedJob=2832899
I did some retriggers, looking at the baseline on m-c (win7 private bytes) we have ~214,000,000 bytes used.
Do this for the first push (backout part 11):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a43db2f5881
and we end up with values more in the ~205,000,000 range.
As a note, we only collect this private byte information on windows 7, so this could affect other platforms; we just don't collect the memory there.
I assume this information is useful, please let me know what else I can do to help out.
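For reference, the two approximate figures quoted above (~214,000,000 bytes with part 11 and ~205,000,000 with it backed out) can be turned into a relative change. This is only a sketch of the arithmetic; the helper name is made up, and the inputs are the rounded values from this comment rather than exact Talos results.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Rounded values quoted in the comment above.
without_part_11 = 205_000_000  # private bytes with part 11 backed out
with_part_11 = 214_000_000     # private bytes on the m-c baseline

change = percent_change(without_part_11, with_part_11)
print(f"{change:.1f}%")
```

That works out to a bit over 4%, which is in line with the 4% regression named in the bug title.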
Assignee | ||
Comment 14•9 years ago
(In reply to Joel Maher (:jmaher) from comment #13)
>
> it helps me believe that part 11 is the problem! Looking at the base revisions the try pushes are based off:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=5ba77225c957&filter-searchStr=Windows%207%20opt%20Talos%20Performance%20Talos%20tp%20T%28tp%29&selectedJob=2832899
Thanks, that makes sense. That patch started creating a D3D11 device that was previously never created on Windows 7 pre-SP1 and earlier versions of Windows. I guess this regression is therefore expected: the test does not run on other versions of Windows, and it would not regress there anyway.
You mentioned a Linux tps regression as well - does anything in the above try runs point at a likely culprit?
Flags: needinfo?(jmaher)
Reporter | ||
Comment 15•9 years ago
collecting more data, we care about tps on e10s- it is easy to see the regression on a graph:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-central,637a7f061cf5e18c4a14cf10f342b19a345f8e3c,1]&series=[mozilla-inbound,637a7f061cf5e18c4a14cf10f342b19a345f8e3c,1]&series=[fx-team,637a7f061cf5e18c4a14cf10f342b19a345f8e3c,1]
I am looking for when data goes from a range of:
original: 100-110
new: 105-115
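One rough way to look for such a shift without eyeballing the graph is to compare the median of a window of runs before each point with the median of the window after it. The sketch below does that in Python; the function name, window size, threshold, and sample series are all hypothetical, and this is not how Talos or Perfherder alerts are actually computed.

```python
from statistics import median
from typing import Optional, Sequence


def first_shift_index(
    values: Sequence[float],
    window: int = 5,
    min_shift: float = 4.0,
) -> Optional[int]:
    """Return the first index i where the median of values[i:i+window]
    exceeds the median of values[i-window:i] by at least min_shift,
    or None if no such index exists."""
    for i in range(window, len(values) - window + 1):
        before = median(values[i - window:i])
        after = median(values[i:i + window])
        if after - before >= min_shift:
            return i
    return None


# Hypothetical tps values for illustration only: five runs in the
# old 100-110 range followed by five in the new 105-115 range.
series = [104, 106, 103, 108, 105, 110, 109, 112, 108, 111]
print(first_shift_index(series))
```

With overlapping ranges like these, a single run says little, so the window should span enough retriggers to make the medians stable.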
Reporter | ||
Comment 16•9 years ago
Most likely patch 11 caused the tps regression for linux64 e10s:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-central,637a7f061cf5e18c4a14cf10f342b19a345f8e3c,1]&series=[mozilla-inbound,637a7f061cf5e18c4a14cf10f342b19a345f8e3c,1]&series=[fx-team,637a7f061cf5e18c4a14cf10f342b19a345f8e3c,1
Previously we had no data points returning >110, but with patch 11 added we have a few points >110. As the ranges overlap, it is hard to know with certainty. Overall, it does look like the culprit.
The question is: what can we do to fix this? Do we need to accept this because a fix is not realistic? Maybe there is some simple fix to reduce this?
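The ">110" check described in this comment can be written down as a simple count of how often runs exceed the old ceiling. The following Python sketch assumes that framing; the helper name and both sample lists are hypothetical, and only the ceiling of 110 comes from the discussion.

```python
from typing import Sequence


def fraction_above(values: Sequence[float], ceiling: float) -> float:
    """Fraction of values strictly greater than ceiling (0.0 if empty)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v > ceiling) / len(values)


OLD_CEILING = 110  # upper end of the original range quoted in the thread

# Hypothetical tps values for illustration only.
before_patch = [102, 107, 105, 109, 104]
after_patch = [106, 112, 108, 111, 109]

print(fraction_above(before_patch, OLD_CEILING))
print(fraction_above(after_patch, OLD_CEILING))
```

A jump from no exceedances to a few is suggestive rather than conclusive, which matches the caveat about overlapping ranges.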
Flags: needinfo?(jmaher) → needinfo?(dvander)
Assignee | ||
Comment 17•9 years ago
Joel, here are the new try runs:
baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f31fcfb4c2e8
part 11 backed out: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4bb3a57d509
Flags: needinfo?(dvander)
Reporter | ||
Comment 18•9 years ago
this is now on Aurora. I just did some retriggers on the two try runs; let's see what the compare looks like in a half hour or so:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=f31fcfb4c2e8&newProject=try&newRevision=b4bb3a57d509&framework=1
Reporter | ||
Comment 19•9 years ago
oh, this fixes the private bytes. the 'tp5o Modified Page List Bytes opt' is showing a regression due to an outlier.
Reporter | ||
Comment 20•9 years ago
this is now on beta.
Updated•9 years ago
tracking-e10s:
--- → ?
Assignee | ||
Comment 22•9 years ago
In comment #14 I explained that this was expected.
Flags: needinfo?(milan)
Assignee | ||
Comment 23•9 years ago
On Windows 7 SP1+PU and higher we create a D3D11 content device on startup. Versions of Windows older than this did not, until this patch. Since Win7 Talos does not run on SP1+PU, it would see this change in behavior, whereas other versions of Windows would see no change.
Making this lazily initialized is probably not worth the effort given the added complexity and limited benefit, unless we think Win 7 pre-SP1 users will be very adversely affected.
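To make the trade-off concrete, here is a generic illustration of eager versus lazy initialization in Python. It is not Gecko code and does not touch D3D11; `FakeDevice`, `EagerCompositor`, and `LazyCompositor` are hypothetical stand-ins that only show where the allocation cost would move.

```python
from functools import cached_property


class FakeDevice:
    """Hypothetical stand-in for an expensive resource such as a
    graphics device; it only counts how many instances were created."""

    instances_created = 0

    def __init__(self) -> None:
        FakeDevice.instances_created += 1


class EagerCompositor:
    """Creates the device at construction time (startup cost, always paid)."""

    def __init__(self) -> None:
        self.content_device = FakeDevice()


class LazyCompositor:
    """Creates the device on first access and caches it afterwards."""

    @cached_property
    def content_device(self) -> FakeDevice:
        return FakeDevice()


lazy = LazyCompositor()
print(FakeDevice.instances_created)  # nothing created yet
device = lazy.content_device         # first access creates the device
print(FakeDevice.instances_created)
```

The lazy variant saves memory only for sessions that never touch the device, and every caller has to tolerate creation happening at first use, which is the kind of complexity the comment weighs against the benefit.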
Updated•9 years ago
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Updated•2 years ago
Product: Core → Core Graveyard