Closed Bug 1461631 Opened 6 years ago Closed 5 years ago

4.56 - 68.24% damp / rasterflood_gradient / rasterflood_svg / tsvgx (windows10-64, windows7-32) regression on push 4632e7d90224f898dcfb675c47cacfa9784be5f5 (Sun May 6 2018)

Categories

(Testing :: Talos, defect, P3)

Unspecified
Windows
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: igoldan, Unassigned)

References

Details

(4 keywords)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?changeset=4632e7d90224f898dcfb675c47cacfa9784be5f5

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  5%  rasterflood_svg windows7-32 opt e10s stylo     10,662.08 -> 11,150.99

Improvements:

 12%  tp5o_webext responsiveness linux64 opt e10s stylo     1.19 -> 1.05


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=13154

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
We ended up with this 5% perf regression, which didn't previously replicate on mozilla-inbound or autoland. And still doesn't.
Given what we discussed during the work week in Baia Mare, I'm handing out the investigation of this to Release Management.
:ryanvm- could you redirect this as needed?  this was seen on the 61 merge to beta- possibly related to a config change.
Flags: needinfo?(ryanvm)
Patrick, maybe you can help find someone to investigate? This could probably be bisected on Try, though it'll be a bit tedious to do so (you'd need to manually apply the Beta-specific config changes on top of every rev prior to pushing). There do appear to be a number of devtools changes in that push.
Flags: needinfo?(ryanvm) → needinfo?(pbrosset)
Looking at DevTools-only changesets in this pushlog, I think we can safely ignore the following ones:

82c6cbd1db13  Gabriel Luong — Bug 1458745 - Only get the notification box in the inspector for the debugger warning only if the debugger is paused. r=pbro
19576ace167c  Gabriel Luong — Bug 1433718 - Follow up: fix eslint errors in rule-editor.js; r=me
b7aeba47bb1f  Gabriel Luong — Bug 1433718 - Make unit tests pass with the 3 pane inspector on in nightly. r=pbro
f64dc9958646  Gabriel Luong — Bug 1458749 - Remove checks for old traits in the inspector. r=pbro
1bf642a3af2a  Jan Odvarko — Bug 1458092 - Netmonitor - minor visual tweaks to new toolbar; r=davidwalsh
af05b5813fc7  Gabriel Luong — Bug 1459257 - Check the inspector is visible before showing the 3 pane onboarding tooltip. r=jdescottes
daaa9249868c  J. Ryan Stinnett — Bug 1457804 - Use content size for DevTools highlighter writing mode adjustments. r=gl
a1abfeb6a79a  Michael Ratcliffe — Bug 1098374 - Telemetry: Stop all monkey patching in devtools telemetry tests r=yulia
1a810001d12e  yulia — Bug 1441792 - add waitForRequestData to netmonitor test head r=Honza,ochameau
f7b95c137dba  yulia — Bug 1441792 - fix browser_net_resend.js and convert to async/await r=yulia
2c55003986b3  yulia — Bug 1382580 - Delete old event emitter; r=nchevobbe
398457c41176  yulia — Bug 1382580 - Remove old-event-emitter alias from webconsole; r=nchevobbe
31d43eae6c14  yulia — Bug 1382580 - Replace mention of the old event emitter with the new one in documentation; r=nchevobbe
2fc5c1baf4d3  yulia — Bug 1382580 - remove old-event-emitter from old debugger; r=nchevobbe
5bc4513eef70  Gabriel Luong — Bug 1458770 - Use the toolbox's Telemetry instance in the Inspector. r=miker

Indeed, rasterflood_svg seems to be a graphics benchmark test where devtools never gets opened. So these changes should not be related to the regression.

Now, there is this interesting change:

8f88c817b334  Alexandre Poirot — Bug 1454580 - Add a DAMP test to watch RDP/protocol.js performance. r=jryans
This one adds a new test inside the DAMP talos suite. So DAMP regressed as a result, but this was expected because we added the new test. So we're good there, but I'm just wondering if there's a chance one talos test could negatively impact another talos test.

The only other DevTools-related change I can see there is:
4832928b4bb4  Gabriel Luong — Bug 1433718 - Make unit tests pass with the 3 pane inspector on in nightly. r=pbro
and I know it had an impact on perf but: a) it should only impact DAMP and b) it got backed out in the same pushlog.

Finally, I see the following 4 changesets from Brad who's on the DevTools team:
d462d502c26b  Brad Werth — Bug 1436431 Part 4: Disable and update an existing mochitest of find-in-page. r=bz
dc092a92c289  Brad Werth — Bug 1436431 Part 3: Add a test to ensure overflow text in the viewport is findable. r=bz
ec1b3f950848  Brad Werth — Bug 1436431 Part 2: Change nsTypeAheadFind::IsRangeVisible to additionally check for visibility of range rects; not just the range's primary frame. r=bz
b0c260eb196c  Brad Werth — Bug 1436431 Part 1: Extend PresShell::GetRectVisibility to consider the target frame's scrollable ancestors. r=bz

those are the only changes that may be related to my team's work which could potentially have had that perf impact. So let me needinfo Brad so he can take a look at this.
Flags: needinfo?(pbrosset) → needinfo?(bwerth)
(In reply to Patrick Brosset <:pbro> from comment #4)

> Finally, I see the following 4 changesets from Brad who's on the DevTools
> team:
> d462d502c26b  Brad Werth — Bug 1436431 Part 4: Disable and update an
> existing mochitest of find-in-page. r=bz
> dc092a92c289  Brad Werth — Bug 1436431 Part 3: Add a test to ensure overflow
> text in the viewport is findable. r=bz
> ec1b3f950848  Brad Werth — Bug 1436431 Part 2: Change
> nsTypeAheadFind::IsRangeVisible to additionally check for visibility of
> range rects; not just the range's primary frame. r=bz
> b0c260eb196c  Brad Werth — Bug 1436431 Part 1: Extend
> PresShell::GetRectVisibility to consider the target frame's scrollable
> ancestors. r=bz

I just ran the test locally and it didn't hit PresShell::GetRectVisibility -- which makes sense because this code is only called from nsTypeAheadFind::IsRangeVisible. From reading the test description, there's no reason to think it would trigger find-in-page behavior. I don't think these changes are responsible for the slowdown.
Flags: needinfo?(bwerth)
Thoughts on how to continue with the investigations?
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #6)
> Thoughts on how to continue with the investigations?

Since the regression is in SVG performance, it's just really unlikely to be devtools related since our tools currently don't attach to SVG documents. I would look for anything about "display" and I found this:

2fe7f347fbbd	Matt Woodrow — Bug 1456534 - Make sure we do a full display list rebuild on the next frame after creating a displayport during painting. r=mstange
(In reply to Brad Werth [:bradwerth] from comment #7)
> (In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #6)
> > Thoughts on how to continue with the investigations?
> 
> Since the regression is in SVG performance, it's just really unlikely to be
> devtools related since our tools currently don't attach to SVG documents. I
> would look for anything about "display" and I found this:
> 
> 2fe7f347fbbd	Matt Woodrow — Bug 1456534 - Make sure we do a full display
> list rebuild on the next frame after creating a displayport during painting.
> r=mstange

:mattwoodrow Do you agree bug 1456534 is related to our perf regression?
Flags: needinfo?(matt.woodrow)
Component: Untriaged → Talos
Product: Firefox → Testing
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #8)
> (In reply to Brad Werth [:bradwerth] from comment #7)
> > (In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #6)
> > > Thoughts on how to continue with the investigations?
> > 
> > Since the regression is in SVG performance, it's just really unlikely to be
> > devtools related since our tools currently don't attach to SVG documents. I
> > would look for anything about "display" and I found this:
> > 
> > 2fe7f347fbbd	Matt Woodrow — Bug 1456534 - Make sure we do a full display
> > list rebuild on the next frame after creating a displayport during painting.
> > r=mstange
> 
> :mattwoodrow Do you agree bug 1456534 is related to our perf regression?

That bug just disables retained-dl in some situation, so it's possible, but only if RDL improved this test previously. I don't remember seeing an improvement alert, but it might have happened.

I'd say it's still worth bisecting this to be sure.
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #9)
> (In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #8)
> > (In reply to Brad Werth [:bradwerth] from comment #7)
> > > (In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #6)
> > > > Thoughts on how to continue with the investigations?
> > > 
> > > Since the regression is in SVG performance, it's just really unlikely to be
> > > devtools related since our tools currently don't attach to SVG documents. I
> > > would look for anything about "display" and I found this:
> > > 
> > > 2fe7f347fbbd	Matt Woodrow — Bug 1456534 - Make sure we do a full display
> > > list rebuild on the next frame after creating a displayport during painting.
> > > r=mstange
> > 
> > :mattwoodrow Do you agree bug 1456534 is related to our perf regression?
> 
> That bug just disables retained-dl in some situation, so it's possible, but
> only if RDL improved this test previously. I don't remember seeing an
> improvement alert, but it might have happened.
> 
> I'd say it's still worth bisecting this to be sure.

I'm working on the bisection to confirm this.
I've come back with the results [1].

The backout doesn't highlight any kind of difference. So I guess bug 1456534 isn't the culprit.

[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=44736afc01b3&newProject=try&newRevision=a0d0338609e7&framework=1&filter=rasterflood_svg
Any updates on this?
Flags: needinfo?(igoldan)
Priority: -- → P3

(In reply to Vicky Chin [:vchin] from comment #12)

Any updates on this?

Not really. This is a regression older than one year & we don't store data this old.

Flags: needinfo?(igoldan)

I think at this point we have no option but to accept this regression due to the amount of time that has passed since it was detected. We need to address our workflow to prevent such a long time passing without a decision. Bug 1520714 may help to ensure the assigned sheriff follows up in a timely manner, as well as tighter collaboration with release management.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.