Closed Bug 1468150 Opened 6 years ago Closed 6 years ago

5.08 - 7.31% damp (linux64-qr, windows10-64, windows7-32) regression on push 7eb872d2f548c167e82454fef02354619cfcb754 (Fri Jun 8 2018)

Categories

(Core :: Graphics: Layers, defect)

62 Branch
defect
Not set
normal

Tracking


RESOLVED INVALID
Tracking Status
firefox62 --- fix-optional

People

(Reporter: igoldan, Unassigned)

References

Details

(Keywords: perf, regression, talos-regression, Whiteboard: [gfx-noted])

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=7eb872d2f548c167e82454fef02354619cfcb754

You are listed as the author of one of the patches included in that push, and we need your help to address this regression.

Regressions:

  7%  damp windows10-64 pgo e10s stylo     83.87 -> 90.00
  6%  damp windows7-32 opt e10s stylo      95.08 -> 100.54
  5%  damp windows10-64 opt e10s stylo     88.76 -> 93.57
  5%  damp linux64-qr opt e10s stylo       84.97 -> 89.29


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=13734

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: General → Graphics: Layers
Product: Testing → Core
Flags: needinfo?(dothayer)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #1)
> Here are the Gecko profiles:
> 
> before:
> https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.
> net%2Fv1%2Ftask%2FT-Axt05VQeOk-
> BR3tszWcg%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_damp.zip
> 
> after:
> https://perf-html.io/from-url/https%3A%2F%2Fqueue.taskcluster.
> net%2Fv1%2Ftask%2FFZM6FugeQCaBBldcyoj7oA%2Fruns%2F0%2Fartifacts%2Fpublic%2Fte
> st_info%2Fprofile_damp.zip

These are for Windows 10 PGO build.
Joel, do you have any insights into what might be going on here? In each of these cases I'm seeing one test that seems to get bitten, and it's a different test for each platform:

osx-10-10: https://treeherder.mozilla.org/perf.html#/graphs?highlightedRevisions=8b1213f470f4&highlightedRevisions=7eb872d2f548&series=%5B%22autoland%22,%22c52fc1fde83812d8842b78a8f11761e90b4a1865%22,1,%221%22%5D&timerange=604800

windows7-32: https://treeherder.mozilla.org/perf.html#/graphs?highlightedRevisions=8b1213f470f4&highlightedRevisions=7eb872d2f548&series=%5B%22autoland%22,%22fdcc76333df8031ac7e69c1380346217cd98277d%22,1,%221%22%5D&timerange=604800

windows10-64: https://treeherder.mozilla.org/perf.html#/graphs?highlightedRevisions=8b1213f470f4&highlightedRevisions=7eb872d2f548&series=%5B%22autoland%22,%220ade5d811caa426ca3eb3fb2ef4351c419add103%22,1,%221%22%5D&timerange=604800

windows10-64 pgo: https://treeherder.mozilla.org/perf.html#/graphs?highlightedRevisions=8b1213f470f4&highlightedRevisions=7eb872d2f548&series=%5B%22autoland%22,%22b53247881c77ac53bfd9299051cc9b2f79cf82a2%22,1,%221%22%5D&timerange=604800

In each case, a test that had a very low score turns into a medium or medium-high score, which blows up the percentage increase. This makes me think some fixed cost in the suite as a whole is just getting pushed around, and the score blows up because the summary uses a geometric rather than an arithmetic mean. Thoughts?
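The geometric-mean sensitivity described above is easy to demonstrate: shifting a fixed cost from a high-scoring subtest onto a very low-scoring one leaves the arithmetic mean unchanged but inflates the geometric mean. A minimal sketch (illustrative numbers only, not actual damp subtest scores):

```javascript
// Geometric and arithmetic means of an array of positive numbers.
const geomean = xs => Math.exp(xs.reduce((s, x) => s + Math.log(x), 0) / xs.length);
const amean = xs => xs.reduce((s, x) => s + x, 0) / xs.length;

// A suite with one very low-scoring subtest...
const before = [2, 100, 100, 100];
// ...after a fixed 8ms cost shifts from a high subtest onto the low one.
const after = [10, 92, 100, 100];

console.log(amean(before), amean(after));     // 75.5 75.5 — unchanged
console.log(geomean(before), geomean(after)); // geometric mean jumps sharply
```

Because the geometric mean multiplies ratios, a subtest going from 2 to 10 (5x) dominates the summary even though the total time spent is identical.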
Flags: needinfo?(dothayer) → needinfo?(jmaher)
Hey bgrins, any insights on what dthayer might be seeing here?
Flags: needinfo?(bgrinstead)
The affected subtests seem to be *.inspector.open.settle (except for win10 pgo). This does seem odd, and I suspect we are hitting some timing issue. Do we see changes in any tests other than *.inspector.open.settle? I don't really understand the differences between the subtests.

I do know that :ochameau has done a lot of work recently on damp to make it more stable/reliable/useful.  Maybe he has some insight?
Flags: needinfo?(jmaher) → needinfo?(poirot.alex)
(In reply to Mike Conley (:mconley) (:⚙️) (Catching up on needinfos / reviews) from comment #4)
> Hey bgrins, any insights on what dthayer might be seeing here?

I was going to forward this question to Alex, but Joel beat me to it.
Flags: needinfo?(bgrinstead)
"*.settle" tests measure the pending paints that follow each individual test execution.

So this patch most likely made a couple of tests leak new paints, or exposed a race in tests where we do not wait correctly for all async operations to finish.


For example, if you look into "custom.inspector.open.settle" test:
https://treeherder.mozilla.org/perf.html#/graphs?highlightedRevisions=8b1213f470f4&highlightedRevisions=7eb872d2f548&series=%5B%22autoland%22,%22fdcc76333df8031ac7e69c1380346217cd98277d%22,1,%221%22%5D&timerange=604800

This "settle" subtest actually runs right after the "custom.inspector.open" test, run here:
  https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/devtools/addon/content/tests/inspector/custom.js#14

The open test opens the inspector, like this:
https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/devtools/addon/content/tests/head.js#106-112
  let test = runTest(name + ".open.DAMP");
  let toolbox = await openToolbox(tool, onLoad);
  test.done();

  test = runTest(name + ".open.settle.DAMP");
  await waitForPendingPaints(toolbox);
  test.done();

And the "settle" test basically measures any pending paints that happen after the main test has finished:
https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/devtools/addon/content/damp.js#167-180
    let utils = window.QueryInterface(Ci.nsIInterfaceRequestor)
                      .getInterface(Ci.nsIDOMWindowUtils);
    window.performance.mark("pending paints.start");
    while (utils.isMozAfterPaintPending) {
      await new Promise(done => {
        window.addEventListener("MozAfterPaint", function listener() {
          window.performance.mark("pending paint");
          done();
        }, { once: true });
      });
    }
    window.performance.measure("pending paints", "pending paints.start");
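The loop above follows a simple pattern: while a "pending" predicate holds, await the next completion signal. A generalized sketch (hypothetical `settle` helper, not part of damp.js; runnable outside the browser with any predicate/signal pair):

```javascript
// Hypothetical generalization of the damp.js settle loop: keep awaiting
// the next completion signal while the predicate reports pending work.
// Returns the number of signals drained before settling.
async function settle(isPending, nextSignal) {
  let drained = 0;
  while (isPending()) {
    await nextSignal();
    drained++;
  }
  return drained;
}
```

In damp.js, `isPending` is `utils.isMozAfterPaintPending` and `nextSignal` is a one-shot MozAfterPaint listener; the time spent draining is what the settle subtest reports.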

These settle tests exist to ensure we measure the full test action, including any pending reflows.
But I realize this may be very confusing for contributors external to DevTools.
I'm wondering if we should keep running them and collecting the data for the DevTools team,
but stop tracking them in the summary, so that there is no sheriff report about them...

Also, this code waiting for MozAfterPaint events may be imperfect;
if you have a better way to track pending reflows, I'm open to alternatives!
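One commonly suggested alternative (a hedged sketch only, with a hypothetical helper name, untested against the real harness) is to wait for two consecutive animation frames instead of looping on MozAfterPaint; by the second frame, layout and paint triggered by the test should have been flushed:

```javascript
// Hypothetical alternative to the MozAfterPaint loop: resolve after two
// consecutive animation frames, by which point pending layout and paint
// work should have been processed.
function waitForDoubleRaf(win) {
  return new Promise(resolve => {
    win.requestAnimationFrame(() => win.requestAnimationFrame(resolve));
  });
}
```

The trade-off is that double-rAF guarantees only that the compositor cycle ran twice, not that every queued paint was observed, so it could under-report long settle tails that the MozAfterPaint loop catches.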
Flags: needinfo?(poirot.alex)
Whiteboard: [gfx-noted]
(In reply to Alexandre Poirot [:ochameau] from comment #7)
> These settle test are here, to ensure we are measuring the full test action,
> including any pending reflow.
> But I realize this may be very confusing for contributors external to
> DevTools.
> I'm wondering if we should keep running them and collect the data for
> DevTools team,
> but stop tracking them in the summary so that there is no sheriff report
> about them...
> 
> Also this code waiting for MozAfterPaint events may be imperfect,
> if you have better ways to track pending reflow, I'm all up for an
> alternative!

Have you decided how we should proceed? If you filed any bugs in this direction, please link them to bug 1468150.
Flags: needinfo?(poirot.alex)
Depends on: 1473322
I just opened bug 1473322 to stop recording these subtests.
While they would help keep tests waiting correctly for everything,
they are very hard to fix when the regression isn't obviously a missing event in the panel code...
Can this bug be closed? Is there anything left to do here that isn't in the followup bugs?
Yes, these tests are now disabled, and the regression is more about test quality than any user-visible regression.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(poirot.alex)
Resolution: --- → INVALID