Closed Bug 1540128 Opened 6 years ago Closed 6 years ago

12.76 - 37.04% tart (osx-10-10-shippable, windows7-32-shippable) regression on push 2ae5ad0cc2e26e35d3a3c0827f8ac54b8d16be83 (Fri Mar 29 2019)

Categories

(Core :: Web Painting, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: Bebe, Unassigned)

References

(Regression)

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=2ae5ad0cc2e26e35d3a3c0827f8ac54b8d16be83

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

37% tart windows7-32-shippable opt e10s stylo 2.04 -> 2.80
13% tart osx-10-10-shippable opt e10s stylo 8.56 -> 9.65
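
For reference, the percentages above are the relative change between the before and after scores. A minimal Python sketch (the function name is hypothetical, and the scores listed here are rounded, so the result can differ slightly from the percentages in the bug title):

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Scores copied from the two regression lines above.
win7_change = percent_change(2.04, 2.80)
osx_change = percent_change(8.56, 9.65)

print(f"windows7-32-shippable: {win7_change:.1f}%")
print(f"osx-10-10-shippable:   {osx_change:.1f}%")
```

For tart a higher score is worse, since the score is a frame interval, so a positive change is reported as a regression.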

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=20196

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling

Blocks: 1539306, 1534654
Component: General → Web Painting
Product: Testing → Core
Version: Version 3 → unspecified

The profiles don't have many samples, but the markers may give us some idea of what happened.

The marker chart on the compositor thread shows markers with the name "NoCompositorScreenshot because nothing changed" (even though profiler screenshots are not enabled, which is a bit surprising to me), and we can use those markers to count how many empty composites were detected.

In the "before" profile, one of the test runs shows lots of empty composites during some of the animations and almost none during other animations. In the "after" profile, both test runs show tons of empty composites during all of the animations.
Here are two 22ms windows from similar parts of the profile: before and after. In the "after" profile, only the composites that immediately follow a transaction do actual work. The other composites are no-ops. In the "before" profile, there are no no-op composites in this window.
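
To make the counting concrete, here is a rough Python sketch of tallying no-op composites inside a time window. It assumes the markers have already been pulled out of the profile into a plain list of (time_ms, name) tuples. That layout, the "Composite" marker name, and the sample values are all made up for illustration and are not the profiler's real data format:

```python
NO_OP_MARKER = "NoCompositorScreenshot because nothing changed"


def count_no_op_composites(markers, start_ms, end_ms):
    """Count no-op composite markers with start_ms <= time < end_ms."""
    return sum(
        1
        for time_ms, name in markers
        if start_ms <= time_ms < end_ms and name == NO_OP_MARKER
    )


# Hypothetical markers; "Composite" stands in for a composite that did work.
markers = [
    (1.0, "Composite"),
    (5.0, NO_OP_MARKER),
    (9.0, NO_OP_MARKER),
    (30.0, NO_OP_MARKER),  # falls outside the 22 ms window below
]

no_ops_in_window = count_no_op_composites(markers, 0.0, 22.0)
print(no_ops_in_window)
```

Running the same count over matching windows of the "before" and "after" profiles would give the comparison described above.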

The different distribution of composite counts, and of no-op composites versus "actual work" composites, probably confuses the test measurement somehow, but I don't know how.

The intent of this test is to time the durations between composites that originate from main-thread changes made to the tabstrip; the test reports the mean of these durations.

Before bug 1539306, we were generating unnecessary composites between the 'real' ones, and were including these in the durations (more composites, but shorter intervals).

The change removes the unnecessary composites, so the reported result is worse, but this more accurately represents the work that we're hoping to measure.
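
A small Python sketch of why the mean goes up. The timestamps are invented, and the model is an assumption that simplifies the real test: "before" counts every composite, and "after" counts only the composites that did real work.

```python
from statistics import mean


def mean_interval(timestamps):
    """Mean gap between consecutive composite timestamps (same unit as input)."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return mean(gaps)


# Hypothetical composites as (time_ms, did_real_work) pairs.
composites = [
    (0, True), (8, False), (16, True), (24, False),
    (32, True), (40, False), (48, True),
]

# Before: unnecessary composites are included, so the intervals are short.
before = mean_interval([t for t, _ in composites])

# After: only the 'real' composites remain, so the intervals are longer.
after = mean_interval([t for t, real in composites if real])

print(before, after)
```

With this made-up data the mean interval doubles from 8 ms to 16 ms even though the real composites happen at exactly the same times, which is the sense in which the score changes without the underlying work getting slower.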

Given that, we should take this as a change in the measurements, and not a regression.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED

We typically close harness/test updates as WONTFIX. So I'll just tweak the resolution here.

Resolution: FIXED → WONTFIX
No longer blocks: 1539306
Regressed by: 1539306
Has Regression Range: --- → yes