Open
Bug 1426743
Opened 6 years ago
Updated 2 years ago
3.68% cpstartup content-process-startup (windows10-64) regression on push 40eb00c21e7c776d4781c28e88a9b0917b4396ce (Thu Dec 21 2017)
Categories
(Core :: DOM: Content Processes, defect, P3)
Tracking
()
NEW
People
(Reporter: jmaher, Unassigned)
References
Details
(Keywords: perf, regression, talos-regression)
Talos has detected a Firefox performance regression from push: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=40eb00c21e7c776d4781c28e88a9b0917b4396ce As author of one of the patches included in that push, we need your help to address this regression. Regressions: 4% cpstartup content-process-startup windows10-64 pgo e10s 278.58 -> 288.83 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=11042 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter | ||
Comment 1•6 years ago
|
||
:sylvestre, I see you landed all the patches in this commit: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=51a3a1b8513ce4bf424dce4d14cb2465529de4f4&tochange=40eb00c21e7c776d4781c28e88a9b0917b4396ce I understand you might be out for the holidays (the fun of landing code the day before you leave for a long break), so I am fine with you fixing this the first week in January.
Flags: needinfo?(sledru)
Comment 2•6 years ago
|
||
This is just renaming our application from Nightly to Firefox Nightly. I wonder how this could regress the startup. Are you sure there isn't something else?
Reporter | ||
Comment 3•6 years ago
|
||
this change shows up on pgo and opt, and I have a few extra retriggers to show before/after, the data holds true. I don't see other regressions, just the cpstartup (this was added in bug 1336389 by :gabor who is inactive on bugzilla now), possibly ping :mconley to learn more about the test?
Comment 4•6 years ago
|
||
Mike, do you have a clue why the renaming from Nightly to Firefox Nightly regress the performances by 3.7%? I had to land some fixes in parallel but they are just build & packaging plumbing. List of patches is available in comment #1. (this can wait for next year :)
Flags: needinfo?(sledru) → needinfo?(mconley)
Comment 5•6 years ago
|
||
The ts_paint regressions bellow were also noticed for this bug: Regressions: 2% ts_paint windows10-64 pgo e10s 490.67 -> 498.25 1% ts_paint windows10-64 opt e10s 572.42 -> 580.25
Comment 6•6 years ago
|
||
That's pretty mysterious. It's also interesting to see that we're losing roughly 8-10ms on all of these regressions. I don't have a great explanation or theory for this. The only thing I can recommend is to extract some profiles to see if we can suss out what is going on here. Hey igoldan - is the mechanism for automatically creating performance profiles for the before and after case set up yet? Or should I do this myself manually for now?
Flags: needinfo?(mconley) → needinfo?(igoldan)
Comment 7•6 years ago
|
||
(In reply to Mike Conley (:mconley) (:⚙️) - Backlogged on reviews and needinfos from comment #6) > Hey igoldan - is the mechanism for automatically creating performance > profiles for the before and after case set up yet? Or should I do this > myself manually for now? I don't think we added this mechanism yet. Robert, have you managed to look over this feature?
Flags: needinfo?(igoldan) → needinfo?(rwood)
Comment 8•6 years ago
|
||
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7) > (In reply to Mike Conley (:mconley) (:⚙️) - Backlogged on reviews and > needinfos from comment #6) > > Hey igoldan - is the mechanism for automatically creating performance > > profiles for the before and after case set up yet? Or should I do this > > myself manually for now? > I don't think we added this mechanism yet. > > Robert, have you managed to look over this feature? No that's correct we don't have that automation. Only the --geckoProfile flag for profiling the current talos run. I did a search and don't see a bug filed for that ability, although seems to me I heard something in Austin about this perhaps. Is this part of the larger idea of running talos twice on a single push (i.e. on the current rev and the rev -1) and providing the diff?
Flags: needinfo?(rwood)
Flags: needinfo?(mconley)
Flags: needinfo?(jmaher)
Reporter | ||
Comment 9•6 years ago
|
||
we don't have the ability for this- I think in the coming weeks we will get our sheriffing process updated to have two try pushes (before/after) which include geckoProfile; To do this on existing pushes has a lot of small problems to work around and tooling to do, so adding an extra sheriffing step is the path of least resistance for now. :mconley, I would just do 2 try pushes for now;
Flags: needinfo?(jmaher)
Comment hidden (obsolete) |
Comment 11•6 years ago
|
||
Just coming back round to this now. Apparently, these try pushes failed. :( Timing out in ts_paint for some reason. I'll remove all tests except for cpstartup to see if that helps.
Comment 12•6 years ago
|
||
Baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=708fc693a099ae3ffad4279e7a052f74b5f4ae5c Regression: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c0ba37714940332ed7be13ba919cf18af9c81a4 I'll keep the needinfo on myself for now until these are done and I get some profiles posted.
Comment hidden (obsolete) |
Comment 14•6 years ago
|
||
I just took a quick peek at these profiles and the scores that the cpstartup tests got with them, and it's not very encouraging. Here are the scores: Baseline: 11:45:40 INFO - PID 7012 | ------- Summary: start ------- 11:45:40 INFO - PID 7012 | Number of tests: 1 11:45:40 INFO - PID 7012 | 11:45:40 INFO - PID 7012 | [#0] content-process-startup Cycles:2 Average:491.50 Median:525.00 stddev:47.38 (9.0%) 11:45:40 INFO - PID 7012 | Values: 525.0 458.0 11:45:40 INFO - PID 7012 | -------- Summary: end -------- Regression: 11:45:42 INFO - PID 8968 | ------- Summary: start ------- 11:45:42 INFO - PID 8968 | Number of tests: 1 11:45:42 INFO - PID 8968 | 11:45:42 INFO - PID 8968 | [#0] content-process-startup Cycles:2 Average:395.50 Median:432.00 stddev:51.62 (11.9%) 11:45:42 INFO - PID 8968 | Values: 432.0 359.0 11:45:42 INFO - PID 8968 | -------- Summary: end -------- This means that the "regression" patch actually performed better than the baseline when profiling. This means the profiles in comment 13 are not likely to contain anything actionable for this issue.
Flags: needinfo?(mconley)
Comment 15•6 years ago
|
||
(In reply to Mike Conley (:mconley) (:⚙️) - Backlogged on reviews and needinfos from comment #14) > I just took a quick peek at these profiles and the scores that the cpstartup > tests got with them, and it's not very encouraging. Here are the scores: > > Baseline: > > 11:45:40 INFO - PID 7012 | ------- Summary: start ------- > 11:45:40 INFO - PID 7012 | Number of tests: 1 > 11:45:40 INFO - PID 7012 | > 11:45:40 INFO - PID 7012 | [#0] content-process-startup Cycles:2 > Average:491.50 Median:525.00 stddev:47.38 (9.0%) > 11:45:40 INFO - PID 7012 | Values: 525.0 458.0 > 11:45:40 INFO - PID 7012 | -------- Summary: end -------- > > > Regression: > > 11:45:42 INFO - PID 8968 | ------- Summary: start ------- > 11:45:42 INFO - PID 8968 | Number of tests: 1 > 11:45:42 INFO - PID 8968 | > 11:45:42 INFO - PID 8968 | [#0] content-process-startup Cycles:2 > Average:395.50 Median:432.00 stddev:51.62 (11.9%) > 11:45:42 INFO - PID 8968 | Values: 432.0 359.0 > 11:45:42 INFO - PID 8968 | -------- Summary: end -------- > > > This means that the "regression" patch actually performed better than the > baseline when profiling. > > This means the profiles in comment 13 are not likely to contain anything > actionable for this issue. Can we do collect other kinds of insight data, so we can figure this out?
Comment 16•6 years ago
|
||
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #15) > Can we do collect other kinds of insight data, so we can figure this out? I'm not sure. Profiling is our biggest hammer. If enabling profiling causes the issue to disappear, that means we're dealing with a bug we can't really observe with our instruments. When that fails, sometimes one can try to instrument things manually - have Gecko in the content process dump process-start-deltas during its start-up steps until the point it sends the result to the parent process. Do this for both revisions with and without the regressions changeset, and see if a gap opens up somewhere where one step tends to take longer for the regressor than the baseline.
Updated•6 years ago
|
Component: Untriaged → DOM: Content Processes
Product: Firefox → Core
Updated•6 years ago
|
Priority: -- → P3
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•