8.28% sessionrestore (osx-10-10) regression on push 997a22336e37ce7892960a1ed6a0408be44b8b45 (Wed May 17 2017)

RESOLVED WONTFIX

Status

RESOLVED WONTFIX
2 years ago
2 years ago

People

(Reporter: igoldan, Unassigned)

Tracking

({perf, regression, talos-regression})

unspecified

Firefox Tracking Flags

(firefox53 unaffected, firefox54 unaffected, firefox55 affected)

Details

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=997a22336e37ce7892960a1ed6a0408be44b8b45

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  8%  sessionrestore osx-10-10 opt e10s     872.29 -> 944.50
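For reference, the percentage in the alert is the relative change between the two scores listed above. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of Talos or Perfherder, and it assumes the test reports a time, so a higher value is a regression.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Values taken from the alert line above (sessionrestore osx-10-10 opt e10s).
delta = percent_change(872.29, 944.50)
print(f"{delta:.2f}%")  # prints 8.28%
```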


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=6696

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: Untriaged → Security: PSM
Product: Firefox → Core
:florian Could you please confirm the validity of this regression? It looks like it is only happening on macosx; the other platforms are not affected.
Flags: needinfo?(florian)
Is it possible to see a profile of running this test before and after bug 1362364?

From reading https://wiki.mozilla.org/Buildbot/Talos/Tests#sessionrestore.2Fsessionrestore_no_auto_restore it's unclear to me what this is trying to measure. Especially because of the "summarization: subtest: ignore first data point, then take the median of the remaining 9 data points" part. It seems to me that the first time we run this is the only one that actually matters to users, and that running it several times artificially doesn't mean much about what users will perceive. Yoric, is there something I'm missing here?
Flags: needinfo?(florian) → needinfo?(dteller)
Dumping profiles from try runs is supposed to be possible, but unfortunately it's been broken every time I've tried it in the past few months. It tends to get broken again and eventually fixed on a fairly regular basis...

You can get profiles from local runs, though, with `mach talos-test --geckoProfile`. They just don't always correspond very well to what you see on infra.

That said, I'd be surprised if that change caused any regression at all, let alone one this big. I suppose it's possible, if we happen to be deferring NSS initialization until a time when disk IO is unusually high, though.

But looking at the graph, I'm not really convinced that this changeset is responsible:

https://treeherder.mozilla.org/perf.html#/graphs?timerange=172800&series=%5Bautoland,c22155824871783bb5d8f514b3a01b4c885034c0,1,1%5D&selected=%5Bautoland,c22155824871783bb5d8f514b3a01b4c885034c0,205081,274600986,1%5D

status-firefox53: --- → unaffected
status-firefox54: --- → unaffected
status-firefox55: --- → affected
:florian To be able to have a comparison view like the one above, these are the steps to follow:

1. Update to a changeset that you'll use as a baseline for the Talos tests.
   Simply add a commit message with try syntax, indicating the Talos tests to run.
   I used | try: -b o -p macosx64 -u none -t other-e10s --rebuild 5 |. If you want other options, use https://mozilla-releng.net/trychooser/ as a helper.
   A good idea is to first mention in the commit the hash of the changeset you've updated to. That way, you'll quickly identify it.
   e.g: https://hg.mozilla.org/try/rev/aacd84c1da19959005e5ee250159f6bc94d38985

2. Push to try

3. (To check how a backout performs) Update to the initial changeset and back out the undesired changes.
   Add the same try syntax to the commit message. Again, don't forget to first mention the hash of the changeset you've updated to.
   e.g: try backout 997a22336e37 (Bug 1362364), alert 6696; try: -b o -p macosx64 -u none -t other-e10s --rebuild 5

   The precise | try: <whatever_options_you_require> | construct is extracted using plain regex.

4. Push this to try

(After the jobs finish, with results)

5. Go to: https://treeherder.mozilla.org/perf.html#/comparechooser
   Pick the try project for both Base and New, then provide the hashes of the jobs that you've pushed to try.
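To illustrate the "extracted using plain regex" remark in step 3, here is a rough sketch of pulling the try options out of a commit message. The pattern and the function name are assumptions made for illustration; this is not the actual try parser, only an example of the idea.

```python
import re

# Assumed pattern: everything after the first "try:" up to the end of that line.
TRY_SYNTAX = re.compile(r"try:\s*(?P<options>.+)$", re.MULTILINE)


def extract_try_options(commit_message):
    """Return the option string following 'try:', or None if there is none."""
    match = TRY_SYNTAX.search(commit_message)
    return match.group("options").strip() if match else None


# Commit message from the example in step 3.
message = (
    "try backout 997a22336e37 (Bug 1362364), alert 6696; "
    "try: -b o -p macosx64 -u none -t other-e10s --rebuild 5"
)
print(extract_try_options(message))
```

Note that the leading "try backout" text is not picked up, because the pattern requires the colon.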
Flags: needinfo?(florian)
(In reply to Florian Quèze [:florian] [:flo] from comment #2)
> Is it possible to see a profile of running this test before and after bug
> 1362364?
> 
> From reading
> https://wiki.mozilla.org/Buildbot/Talos/Tests#sessionrestore.2Fsessionrestore_no_auto_restore
> it's unclear to me what this is trying to
> measure. Especially because of the "summarization: subtest: ignore first
> data point, then take the median of the remaining 9 data points" part. It
> seems to me that the first time we run this is the only one that actually
> matters to users, and that running it several times artificially doesn't
> mean much about what users will perceive. Yoric, is there something I'm
> missing here?

The no-auto-restore variant basically tries to measure the time we're *wasting* on session restore when we actually read the data but decide to not restore.

Also, for the "ignore first data point", that's how Talos works, so bring this complaint to jmaher :) My guess is that the first run has much more variation than the following runs, and since we're only interested in trends, we just want something rather stable. Note that we're restoring the same profile for each of the 10 runs, so we should always be measuring the same thing.
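The summarization rule quoted from the wiki ("ignore first data point, then take the median of the remaining 9 data points") can be sketched as follows. The function name and the sample timings are hypothetical; this is not the Talos implementation, only the rule as described.

```python
from statistics import median
from typing import Sequence


def summarize_subtest(data_points: Sequence[float]) -> float:
    """Drop the first data point, then return the median of the rest."""
    if len(data_points) < 2:
        raise ValueError("need at least two data points")
    return median(data_points[1:])


# Hypothetical timings (ms) for 10 runs; the first run is a noisy outlier.
runs = [1210.0, 880.0, 872.0, 869.0, 875.0, 871.0, 878.0, 870.0, 874.0, 873.0]
print(summarize_subtest(runs))  # prints 873.0
```

With data like this, the outlier in the first run has no effect on the reported value, which is consistent with the guess above that the rule is there for stability.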
Flags: needinfo?(dteller)
I tried to follow https://wiki.mozilla.org/Buildbot/Talos/Profiling to get profiles.

Here is the try push for the backout: https://treeherder.mozilla.org/#/jobs?repo=try&revision=94466a4211698d2743fe0f40f7f7b2f0f64d54d4

Here is the try push for the baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=de3eb4aad9a169b339215b029d2771d900e0eb9a

After being queued for 12 hours, these jobs have no profiles visible in treeherder.

Comparing the results shows about a 1% improvement for the backout, and that's within the noise: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=de3eb4aad9a169b339215b029d2771d900e0eb9a&newProject=try&newRevision=94466a4211698d2743fe0f40f7f7b2f0f64d54d4&framework=1&showOnlyImportant=0
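For anyone repeating this comparison with other try pushes, the compare link above can be assembled from the two revisions. This is a small sketch: the helper name is made up, and the query parameters are simply copied from the link above (the meaning of `framework=1` and `showOnlyImportant=0` is assumed, not verified).

```python
from urllib.parse import urlencode

COMPARE_BASE = "https://treeherder.mozilla.org/perf.html#/compare"


def compare_url(base_rev, new_rev, project="try", framework=1):
    """Build a Perfherder compare link for two revisions of one project."""
    params = {
        "originalProject": project,
        "originalRevision": base_rev,
        "newProject": project,
        "newRevision": new_rev,
        "framework": framework,
        "showOnlyImportant": 0,
    }
    # urlencode keeps the insertion order of the dict and stringifies the ints.
    return f"{COMPARE_BASE}?{urlencode(params)}"


# Baseline and backout revisions from the two try pushes above.
print(compare_url(
    "de3eb4aad9a169b339215b029d2771d900e0eb9a",
    "94466a4211698d2743fe0f40f7f7b2f0f64d54d4",
))
```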

Joel, is there something I missed here?
Flags: needinfo?(florian) → needinfo?(jmaher)
When you run with --spsProfile, the numbers are not useful; the only value there is the profile, which doesn't seem to be working, as there are no artifacts.

Oddly I don't see spsProfile getting passed through to the harness, so this might be a regular comparison.

In other cases we have seen that backing out a patch at the original point in time shows the improvement, but later on the product changes and the backout shows little to no effect. That is the case for linux64-sessionrestore, so the startup/restore paths on osx might be shifting in similar ways?
Flags: needinfo?(jmaher)
This bug is not actionable, closing.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WONTFIX