12.76 - 37.04% tart (osx-10-10-shippable, windows7-32-shippable) regression on push 2ae5ad0cc2e26e35d3a3c0827f8ac54b8d16be83 (Fri Mar 29 2019)
Categories
(Core :: Web Painting, defect)
Tracking
()
People
(Reporter: Bebe, Unassigned)
References
(Regression)
Details
(Keywords: perf, regression, talos-regression)
Talos has detected a Firefox performance regression from push 2ae5ad0cc2e26e35d3a3c0827f8ac54b8d16be83.
As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
37% tart windows7-32-shippable opt e10s stylo 2.04 -> 2.80
13% tart osx-10-10-shippable opt e10s stylo 8.56 -> 9.65
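As a quick sanity check, the percentages can be recomputed from the rounded before/after scores listed above. This is only a sketch: the function name `percent_change` is made up for illustration, and the rounded inputs give about 37% and 13%, so the slightly different figures in the bug title presumably come from unrounded data.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Rounded tart scores copied from the alert summary above.
regressions = {
    "windows7-32-shippable": (2.04, 2.80),
    "osx-10-10-shippable": (8.56, 9.65),
}

for platform, (before, after) in regressions.items():
    print(f"{platform}: {percent_change(before, after):.1f}%")
```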
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=20196
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Tests
For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Performance_sheriffing/Talos/Running
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Performance_sheriffing/Talos/RegressionBugsHandling
Comment 1•6 years ago
Comment 2•6 years ago
The profiles don't have many samples, but the markers may give us some idea of what happened.
The marker chart on the compositor thread shows markers with the name "NoCompositorScreenshot because nothing changed" (even though profiler screenshots are not enabled, which is a bit surprising to me), and we can use those markers to count how many empty composites were detected.
In the "before" profile, one of the test runs shows lots of empty composites during some of the animations and almost none during other animations. In the "after" profile, both test runs show tons of empty composites during all of the animations.
Here are two 22ms windows from similar parts of the profile: before and after. In the "after" profile, only the composites that immediately follow a transaction do actual work. The other composites are no-ops. In the "before" profile, there are no no-op composites in this window.
The different distribution of composite counts, and of no-op composites vs "actual work" composites, probably confuses the test measurement somehow, but I don't know how.
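The marker counting described above could be sketched roughly as follows. This assumes the markers have already been exported into a simple list of `(name, time_ms)` pairs, which is a hypothetical simplification and not the real profile format; `count_noop_composites` and the sample data are made up for illustration.

```python
from collections import Counter

# Marker name as it appears in the compositor thread's marker chart.
NOOP_MARKER = "NoCompositorScreenshot because nothing changed"


def count_noop_composites(markers, window_ms=22.0):
    """Count no-op composite markers per fixed-size time window.

    `markers` is an iterable of (name, time_ms) pairs (assumed layout).
    Returns a Counter mapping window index -> number of no-op composites.
    """
    counts = Counter()
    for name, time_ms in markers:
        if name == NOOP_MARKER:
            counts[int(time_ms // window_ms)] += 1
    return counts


# Hypothetical sample: one no-op in the first window, two in the second.
sample = [
    (NOOP_MARKER, 1.0),
    ("Composite", 5.0),
    (NOOP_MARKER, 23.0),
    (NOOP_MARKER, 30.0),
]
print(count_noop_composites(sample))
```

Comparing the per-window counts for the "before" and "after" profiles would make the difference in no-op composites easier to quantify than eyeballing the marker chart.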
Comment 3•6 years ago
The intent of this test is to time the durations between composites that originate from main-thread changes made to the tabstrip; it reports the mean of these durations.
Before bug 1539306, we were generating unnecessary composites between the 'real' ones, and were including these in the durations (more composites, but shorter intervals).
The change removes the unnecessary composites, so the reported result is worse, but this more accurately represents the work that we're hoping to measure.
Given that, we should take this as a change in the measurements, and not a regression.
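To illustrate why the number goes up without any real slowdown, here is a toy calculation with made-up timestamps (not data from the actual test run, and not the harness's own code). The helper `mean_interval` is hypothetical; it just averages the gaps between consecutive composite times.

```python
from statistics import mean


def mean_interval(timestamps):
    """Mean gap between consecutive composite timestamps (ms)."""
    ts = sorted(timestamps)
    return mean(b - a for a, b in zip(ts, ts[1:]))


# Made-up example: 'real' composites driven by main-thread changes.
real_composites = [0.0, 10.0, 20.0, 30.0]

# Same real composites plus an unnecessary composite between each pair,
# as was happening before the change.
with_extra_composites = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0]

print(mean_interval(with_extra_composites))  # old behaviour: shorter mean
print(mean_interval(real_composites))        # new behaviour: longer mean
```

The real composites happen at the same times in both lists; only the extra ones disappear, and the mean interval rises purely because there are fewer samples to divide by.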
Comment 4•6 years ago
We typically close harness/test updates as WONTFIX. So I'll just tweak the resolution here.