Closed Bug 1467043 Opened 6 years ago Closed 6 years ago

8.48% sessionrestore (windows7-32) regression on push 8a13b7657c5a600042ab3da6d9fdb45702f4ab92 (Thu May 31 2018)

Categories

(Firefox :: General, defect)

61 Branch
Unspecified
Windows
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: igoldan, Unassigned)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?changeset=8a13b7657c5a600042ab3da6d9fdb45702f4ab92 As author of one of the patches included in that push, we need your help to address this regression. Regressions: 8% sessionrestore windows7-32 opt e10s stylo 225.00 -> 244.08 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=13656 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Product: Testing → Firefox
I believe this is a WONTFIX.
Flags: needinfo?(mconley)
Does this mirror a previous improvement from when bug 1447719 landed?
Flags: needinfo?(igoldan)
(In reply to Dão Gottwald [::dao] from comment #2) > Does this mirror a previous improvement from when bug 1447719 landed? I don't see improvements for when bug 1447719 landed.
Flags: needinfo?(igoldan)
So it seems like a regression landed under the radar while browser.startup.blankWindow was enabled? This reminds me of the concern about ts_paint expressed in bug 1456270.
This sounds like the Windows version of bug 1465015.
I guess there are a few possibilities here: 1. We shipped a sessionrestore performance regression in the non-blank-first-paint-window case, but didn't detect this until now (What dao fears in comment 4) 2. Some follow-up work to blank-first-paint-window caused a performance improvement on sessionrestore tests (bug 1454908) in only the configuration where blank-first-paint-window is enabled, and by disabling blank-first-paint-window, those gains have been lost. What would be useful is if we could backfill a bit. florian, do you have the cycles for this?
Flags: needinfo?(mconley) → needinfo?(florian)
Try push of talos with the profiler enabled: - for the current beta tree: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2a15a356ea7235f2792c6ff05014ab9ae41c506d - with blank first paint re-enabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc3aa0e825f58aa22f5d31bd0e41ed2d37f4fa80 Looking at the 'sessionRestoreInit' and 'sessionRestored' markers in the profiles seems a good enough approximation of what the talos test measures for the 'sessionrestore' test. Here are some example profiles from the above pushes, bounded at these 2 markers: Windows 7 32bits: - current beta: 682ms - https://perfht.ml/2Ly4oCL - with blank paint: 663ms - https://perfht.ml/2Lx2RfR Windows 10 64bits: - current beta: 460ms - https://perfht.ml/2LttAtS - with blank paint: 430ms - https://perfht.ml/2JmGY2o It's unfortunately much harder to extract meaningful information from these profiles than I would like, but one thing I've noticed is that with the blank window disabled, there's an 'activate' event on the browser window between the 'load' event and the start of _delayedStartup, and this activate event causes a restyle and repaint. I'm not convinced this fully explains the measured difference, but it seems to be at least a large part of it.
Flags: needinfo?(florian)
The sessionrestore talos test measures the time between when we start reading the session file, and when we are done restoring the session. This means: - we start measuring at a point in the middle of startup (after initializing a bunch of stuff including the add-on manager, telemetry, processing command line parameters, ...) - this measurement includes the (random) time it takes to do async IO with OS.File on a different thread. - this measurement includes the (random) time it takes to do (random) async stuff on the main thread while we wait for the session file to be fully read. The best way to optimize the values from this sessionstore test would be to delay as much as possible the beginning of the measurement (ie. when we start initializing the session restore code), so that its initialization happens when Firefox isn't busy with other I/O and other async tasks done during startup. This would likely improve the numbers, but be a significant regression for users. Given all of this, I don't think we should care much about the values from the current sessionrestore talos test. Here are other things we could (and probably should) measure that would be more meaningful: - as a high level startup test, measure the time between process creation and when we are done restoring the session. This is the 'sessionRestored' value from the startup timeline that I was already suggesting we should collect (see bug 1456270 comment 6). - the time it takes to restore a session once we are done reading the session file with async I/O. This would be a micro-benchmark relevant only to the people working on session restore code, but should be less noisy and shouldn't be prone to variations caused by unrelated changes, as it would only measures what happens (almost synchronously) once we start actually restoring the session. (fwiw, this relevant subset is 182ms out of the currently measured 663ms).
Given that: - even if we were confident there's a real regression here, we wouldn't have time to uplift a perf fix in 61, - we are planning to ship blank first paint on 62, - sessionrestore isn't a very relevant benchmark, as explained in comment 8, I think this bug is strongly headed in a WONTFIX direction.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
:florian, if we do not care about the sessionrestore test, then please remove it (and I assume many_window and no_auto_restore variants). If we are going to run the test we need to take it seriously. what should we do?
Flags: needinfo?(florian)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #10) > :florian, if we do not care about the sessionrestore test, then please > remove it (and I assume many_window and no_auto_restore variants). If we > are going to run the test we need to take it seriously. what should we do? We should first introduce tests that measure what we care about (for startup that's covered by the suggestions in bug 1456270), and once we have good coverage in place, we should remove the older tests.
Flags: needinfo?(florian)
that seems reasonable, but our perf sherrifs are spending time investigating bugs that are ignored because the data is useless. Nobody is working on bug 1456270. Can we please treat sessionrestore as a useful test (and revisit this bug) or agree to stop running it?
It's possible my previous comments about the value of the sessionrestore test sounded stronger than I meant. I'm not suggesting we ignore it without a replacement. And given the time I spent investigating here (see comment 7), I'm clearly not ignoring it. Comment 9 mentions 2 other reasons for resolving as wontfix, and comment 1 was already suggesting wontfix.
and to be fair it is ok to wontfix a regression, this happens often; I just want to make sure that we are running tests that have value and only ask people to investigate failures from tests that are useful- I suspect sessionrestore is useful in general, just not in many cases and we can make a better startup test.
You need to log in before you can comment on or make changes to this bug.