2.19 - 16.89% glvideo Mean tick time across 100 ticks: / sessionrestore / sessionrestore_no_auto_restore / tart / tp5n main_startup_fileio (linux64, windows7-32, windows8-64) regression on push bc5dd1f16c3a61cdcbc9dd0d67e8b263ca85e570 (Tue May 23 2017)

RESOLVED WORKSFORME

Status

defect
RESOLVED WORKSFORME
2 years ago
2 years ago

People

(Reporter: igoldan, Unassigned)

Tracking

({perf, regression, talos-regression})

Trunk

Firefox Tracking Flags

(Not tracked)

Details

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=bc5dd1f16c3a61cdcbc9dd0d67e8b263ca85e570

As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 17%  glvideo Mean tick time across 100 ticks:  windows8-64 pgo e10s     40.91 -> 47.82
 14%  glvideo Mean tick time across 100 ticks:  windows8-64 opt e10s     42.00 -> 47.74
  9%  sessionrestore_no_auto_restore windows7-32 pgo e10s                675.65 -> 733.17
  4%  tp5n main_startup_fileio windows7-32 opt e10s                      58,534,582.08 -> 61,040,343.00
  4%  tart summary windows7-32 opt e10s                                  7.27 -> 7.54
  4%  sessionrestore_no_auto_restore windows8-64 opt e10s                699.46 -> 725.25
  4%  sessionrestore_no_auto_restore linux64 opt e10s                    864.79 -> 895.42
  3%  sessionrestore linux64 opt e10s                                    836.52 -> 865.08
  3%  tart summary windows8-64 opt e10s                                  6.02 -> 6.21
  3%  tp5n main_startup_fileio windows7-32 pgo e10s                      70,808,811.19 -> 72,837,829.08
  2%  sessionrestore windows8-64 opt e10s                                673.12 -> 687.83

Improvements:

  5%  tp5o Private Bytes windows7-32 opt e10s     149,910,325.09 -> 142,422,852.96
  4%  tp5o Private Bytes windows7-32 pgo e10s     148,609,949.07 -> 141,963,855.82
  4%  tp5o Main_RSS windows7-32 opt e10s          122,717,469.29 -> 118,212,607.46
  3%  tp5o Main_RSS windows7-32 pgo e10s          117,477,390.71 -> 113,626,149.72
  3%  damp summary linux64 opt e10s               298.65 -> 290.87
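The percentages in the lists above are relative changes from the old value to the new one. The Python sketch below recomputes them from the before/after values copied out of this alert. It assumes every metric here is lower-is-better (tick times, session-restore times, file IO bytes, memory bytes), so a positive change is a regression and a negative one is an improvement. It also checks two things: that each recomputed value rounds to the whole percent listed, and that the regression range matches the "2.19 - 16.89%" in the bug title.

```python
# Sketch: recompute the relative changes reported in the Talos alert above.
# Assumption: all metrics here are lower-is-better, so a positive change is a
# regression and a negative change is an improvement.


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (test and platform, listed whole percent, before, after), from the Regressions list.
REGRESSIONS = [
    ("glvideo windows8-64 pgo e10s", 17, 40.91, 47.82),
    ("glvideo windows8-64 opt e10s", 14, 42.00, 47.74),
    ("sessionrestore_no_auto_restore windows7-32 pgo e10s", 9, 675.65, 733.17),
    ("tp5n main_startup_fileio windows7-32 opt e10s", 4, 58_534_582.08, 61_040_343.00),
    ("tart summary windows7-32 opt e10s", 4, 7.27, 7.54),
    ("sessionrestore_no_auto_restore windows8-64 opt e10s", 4, 699.46, 725.25),
    ("sessionrestore_no_auto_restore linux64 opt e10s", 4, 864.79, 895.42),
    ("sessionrestore linux64 opt e10s", 3, 836.52, 865.08),
    ("tart summary windows8-64 opt e10s", 3, 6.02, 6.21),
    ("tp5n main_startup_fileio windows7-32 pgo e10s", 3, 70_808_811.19, 72_837_829.08),
    ("sessionrestore windows8-64 opt e10s", 2, 673.12, 687.83),
]

# Same layout, from the Improvements list.
IMPROVEMENTS = [
    ("tp5o Private Bytes windows7-32 opt e10s", 5, 149_910_325.09, 142_422_852.96),
    ("tp5o Private Bytes windows7-32 pgo e10s", 4, 148_609_949.07, 141_963_855.82),
    ("tp5o Main_RSS windows7-32 opt e10s", 4, 122_717_469.29, 118_212_607.46),
    ("tp5o Main_RSS windows7-32 pgo e10s", 3, 117_477_390.71, 113_626_149.72),
    ("damp summary linux64 opt e10s", 3, 298.65, 290.87),
]


def change_range(rows):
    """Return the (smallest, largest) percent change, rounded to 2 places."""
    changes = [percent_change(before, after) for _, _, before, after in rows]
    return round(min(changes), 2), round(max(changes), 2)


def matches_listed(rows):
    """True if every recomputed change rounds to the whole percent listed."""
    return all(round(abs(percent_change(b, a))) == pct for _, pct, b, a in rows)


if __name__ == "__main__":
    for name, _, before, after in REGRESSIONS + IMPROVEMENTS:
        print(f"{percent_change(before, after):+7.2f}%  {name}")
    print("regression range:", change_range(REGRESSIONS))
```

The smallest and largest regressions (sessionrestore on windows8-64 and glvideo pgo on windows8-64) give the 2.19% and 16.89% bounds quoted in the title.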


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=6793

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: Untriaged → Talos
Product: Firefox → Testing
Version: unspecified → Trunk
These are most of the tests that were affected by the Talos changes. Others may be added to this list.

Joel, could you take a second look over some of these alerts, to confirm the triage?
Flags: needinfo?(jmaher)
thanks for filing this bug.

I pushed a backout of this to try, and it looks as though my patch did cause some oddities.

glvideo: this is a bimodal test, and my patch made it unimodal (more reliable, but settled on the higher of the two modes).  I think this is ok.
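A mean-based score can rise even when a test becomes more stable, which is the situation described for glvideo. The sketch below illustrates this with made-up per-run values (hypothetical, not real glvideo data, loosely modeled on the alert's 40.91 -> 47.82). The "before" values sit on two modes and the "after" values sit only on the higher one. `split_modes` is a hypothetical helper that counts modes with a crude gap-based 1-D clustering. It is an illustration, not how Talos analyzes results.

```python
import statistics

# Hypothetical per-run results (NOT real Talos data).
BEFORE = [41.0, 40.8, 47.9, 41.1, 47.6, 40.9, 48.0, 41.2]  # two modes
AFTER = [47.8, 47.7, 47.9, 47.8, 47.6, 47.9, 47.8, 47.7]   # one, higher mode


def split_modes(values, gap):
    """Group sorted values into clusters wherever neighbours differ by > gap.

    A crude 1-D clustering heuristic, good enough to tell one mode from two.
    """
    if not values:
        return []
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > gap:
            clusters.append([])  # a large jump starts a new cluster
        clusters[-1].append(cur)
    return clusters


def describe(values, gap=2.0):
    """Mean, population standard deviation and estimated number of modes."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
        "modes": len(split_modes(values, gap)),
    }


if __name__ == "__main__":
    print("before:", describe(BEFORE))
    print("after: ", describe(AFTER))
```

With these inputs, the "after" runs have a higher mean but a much smaller spread. A mean-based alert would report that as a regression even though the results became more repeatable.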

the tp5n file IO bytes (main_startup_fileio) are ok, as I am loading the new addon at startup.

that leaves sessionrestore/tart; these confuse me.

here are the sessionrestore wins I see:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1363ff096293&newProject=try&newRevision=1e950cc870185a6230ca9d26e509b938d5ae41c8&framework=1&filter=sessionrestore&showOnlyImportant=0

and there are some wins on tart as well:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1363ff096293&newProject=try&newRevision=1e950cc870185a6230ca9d26e509b938d5ae41c8&framework=1&filter=tart&showOnlyImportant=0


so this patch does affect those numbers.  What is odd is that this patch changes how we create and initialize the profile, not how the tests run.  In fact, I was going to use marionette to connect to the browser and install addons, but it caused tart/cart to fail and made sessionrestore run unreliably.

what I did was back out one patch:
https://hg.mozilla.org/try/rev/442758c706e768b17a19c29833487cb8e7501b05

The big difference is that I use mozrunner to launch the browser and manage the process while initializing the profile, instead of talosprocess (which is a subclass of mozprocess).

I think we need to investigate this more.  In the meantime, sessionrestore appears to be back to its earlier values, thanks to an improvement that landed shortly after my patch did.

One interesting thing about sessionrestore is that we modify the profile for each browser launch, whereas all other tests reuse the profile created by the initialization run.

Some next steps would be:
* look at the raw numbers for changes in patterns
* determine if there are differences in the command line to launch the browser via mozrunner vs talosprocess
* determine if there are profile differences
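The last two next steps can be started with the Python standard library. The sketch below assumes two inputs: the argv each launcher used (captured, for example, from the job logs) and two profile directories already on disk. `diff_command_lines` and `diff_profiles` are hypothetical helpers, not part of Talos, mozrunner, or mozprocess.

```python
import filecmp
import os


def diff_command_lines(old_argv, new_argv):
    """Return (only_in_old, only_in_new) arguments, ignoring order.

    Comparing as sets ignores argument order and repeated flags.
    """
    old, new = set(old_argv), set(new_argv)
    return sorted(old - new), sorted(new - old)


def diff_profiles(old_dir, new_dir):
    """Recursively compare two profile directories.

    Returns three sorted lists of relative paths: only in old, only in new,
    and present in both but different. filecmp.dircmp compares common files
    shallowly: files whose os.stat signature (type, size, mtime) matches are
    treated as equal without reading their contents.
    """
    only_old, only_new, changed = [], [], []

    def walk(cmp, prefix):
        only_old.extend(os.path.join(prefix, n) for n in cmp.left_only)
        only_new.extend(os.path.join(prefix, n) for n in cmp.right_only)
        changed.extend(os.path.join(prefix, n) for n in cmp.diff_files)
        # Recurse into subdirectories present on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(old_dir, new_dir), "")
    return sorted(only_old), sorted(only_new), sorted(changed)


if __name__ == "__main__":
    import sys

    # Usage: python diff_profiles.py OLD_PROFILE_DIR NEW_PROFILE_DIR
    for label, paths in zip(("only old", "only new", "changed"),
                            diff_profiles(sys.argv[1], sys.argv[2])):
        print(f"{label}: {paths}")
```

A set-based argv comparison is enough for a first pass. However, it would not flag a change in argument order, so a positional diff may be worth adding if the sets match.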

:rwood, do you have any other ideas on how to investigate this?  I will be in need of some help on this.
Flags: needinfo?(jmaher) → needinfo?(rwood)
> :rwood, do you have any other ideas on how to investigate this?  I will be
> in need of some help on this.

Hmmm... As far as I know we've never used Marionette and all its dependencies in Talos. Maybe try a simple patch that just starts a marionette client session and ends it, and see how that looks? (But runs the talos tests and installs the addons the normal/old way.)

If that looks alright, then maybe try using marionette to install the extension but start/close the browser the old way, i.e. use marionette but not mozrunner (is that possible?).
Flags: needinfo?(rwood)
not using mozrunner is a good idea.  Right now, using mozrunner is really the only difference for sessionrestore, tart, etc.
(In reply to Joel Maher ( :jmaher) from comment #4)
> not using mozrunner is a good idea.  Right now, using mozrunner is really the
> only difference for sessionrestore, tart, etc.

Trying out a patch that uses marionette but not mozrunner (comment 5):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7a7f2d6e172da3b67c9e3a74ae7242c8097c674
Whiteboard: [PI:June]
Robert, any update on using marionette instead of mozrunner?
Flags: needinfo?(rwood)
(In reply to Chris Peterson [:cpeterson] from comment #7)
> Robert, any update on using marionette instead of mozrunner?

No, I haven't had a chance to get back to this yet.
Flags: needinfo?(rwood)
Whiteboard: [PI:June] → [PI:August]
we don't use marionette anymore, so any differences here would come from using mozrunner vs. mozprocess.
Whiteboard: [PI:August] → [PI:September]
I think this has been in long enough and there is nothing actionable to do here: mozrunner gets us using the same process management tools as mochitest/reftest/xpcshell.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
Whiteboard: [PI:September]