9.23% tabpaint (osx-10-10) regression on push bbcec17958a9f17a7bd9fb5a581c4a51a29803de (Fri Jul 28 2017)

RESOLVED WONTFIX

Status

()

defect
RESOLVED WONTFIX
2 years ago
2 years ago

People

(Reporter: jmaher, Assigned: stone)

Tracking

({perf, talos-regression})

56 Branch
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox55 unaffected, firefox56 disabled, firefox57 affected)

Details

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=bbcec17958a9f17a7bd9fb5a581c4a51a29803de

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  9%  tabpaint summary osx-10-10 opt e10s     60.54 -> 66.13


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=8422

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
this regression occurred while backing out many bugs/patches at once on both autoland/inbound:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=18525ab193663e4564d96d61b69dd9fc8220d4a3&tochange=bbcec17958a9f17a7bd9fb5a581c4a51a29803de

given the fact that the others are test only or have relanded with no improvements, I have narrowed it down to two possible bugs, bug 1351148, or bug 1383485- and both are in the same component with same author and reviewer :)

:stone, can you look at the two bugs and determine which one might be causing a regression.  I do not see a corresponding improvement when they originally landed, so something is causing an improvement and backing out one or more of your patches caused a regressions.  possibly just cleaning up the patches to land again would help.
Component: Untriaged → DOM: Events
Flags: needinfo?(sshih)
Product: Firefox → Core
bug 1351148 is trying the handle the input events faster. This one may impact the performance.
bug 1383485 is sending one more IPC in some specific condition. I think it's not the one you are looking for.

I'm fixing some regressions caused by the patches of bug 1351148 and will land it once all the regressions are fixed.
Flags: needinfo?(sshih)
Stone's working on bug 1351148 and when that is landed, let's dupe this to that (where we hopefully see a talos improvement :).
Assignee: nobody → sshih
Priority: -- → P1
Version: 53 Branch → 56 Branch
Hi Joel, since bug 1351148 has been relanded, have you seen talos improvement since that?
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #4)
> Hi Joel, since bug 1351148 has been relanded, have you seen talos
> improvement since that?

Oh, sorry, we landed 1351148 with the event queue disabled, so it's not the time to re-trigger talos.
Blocks: 1390044
(In reply to Andrew Overholt [:overholt] from comment #3)
> Stone's working on bug 1351148 and when that is landed, let's dupe this to
> that (where we hopefully see a talos improvement :).

Hey Andrew, should we dupe this bug now?
Flags: needinfo?(overholt)
It's weird here because this regression was found in code that has since been backed out. It's re-landed now (in a newer form with lots of improvements!) but *that* is preffed off and will be preffed on by bug 1390044. Since I don't want to sweep the performance considerations under the rug, let's leave this open and have it block bug 1390044 as Stone has done but remove the regression keyword.
Flags: needinfo?(overholt)
Keywords: regression
Andrew mentioned this should be marked as disabled for 56.
Tried to enable the input event queue and re-triggered talos
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=b369b091b8a1b76a91278efc3513b136a2a225e2&newProject=try&newRevision=5d9407065d40d5d69670a3ca0ba58f1f2d2fdbb8&framework=1&showOnlyImportant=0

some tests got worse results
bloom_basic_singleton pgo e10s stylo (windows7-32) 2.31% worse
tresize opt e10s (linux64-stylo-sequential) 4.36% worse
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #10)
> if you compare the graphs to the longer trend (14 days) on mozilla-central,
> the specific tests look to be in the general range:

Thanks for this information. Wondering should I worry about those results with low confidence?

e.g.
https://treeherder.mozilla.org/perf.html#/graphs?series=try,1539858,1,1&highlightedRevisions=b369b091b8a1&highlightedRevisions=5d9407065d40

https://treeherder.mozilla.org/perf.html#/graphs?series=try,1497897,1,1&highlightedRevisions=b369b091b8a1&highlightedRevisions=5d9407065d40
I wouldn't invest time into those results- the low confidence exists because we have a data which is 'noisy' and the change is overlapping greatly with the baseline.  I always focus on the high confidence changes and take an extra look at the medium confidence changes, but not invest a lot of time unless the medium changes look realistic.
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #12)
> I wouldn't invest time into those results- the low confidence exists because
> we have a data which is 'noisy' and the change is overlapping greatly with
> the baseline.  I always focus on the high confidence changes and take an
> extra look at the medium confidence changes, but not invest a lot of time
> unless the medium changes look realistic.

Thanks. It's helpful for me to move forward.
(In reply to Ming-Chou Shih [:stone] from comment #13)
> (In reply to Joel Maher ( :jmaher) (UTC-5) from comment #12)
> > I wouldn't invest time into those results- the low confidence exists because
> > we have a data which is 'noisy' and the change is overlapping greatly with
> > the baseline.  I always focus on the high confidence changes and take an
> > extra look at the medium confidence changes, but not invest a lot of time
> > unless the medium changes look realistic.
> 
> Thanks. It's helpful for me to move forward.

Hi Stone, any progress here?
(In reply to Hsin-Yi Tsai [:hsinyi] from comment #14)
> (In reply to Ming-Chou Shih [:stone] from comment #13)
> > (In reply to Joel Maher ( :jmaher) (UTC-5) from comment #12)
> > > I wouldn't invest time into those results- the low confidence exists because
> > > we have a data which is 'noisy' and the change is overlapping greatly with
> > > the baseline.  I always focus on the high confidence changes and take an
> > > extra look at the medium confidence changes, but not invest a lot of time
> > > unless the medium changes look realistic.
> > 
> > Thanks. It's helpful for me to move forward.
> 
> Hi Stone, any progress here?
I didn't find any obvious performance improvements or regressions after the patches are landed. (I filtered out uncertain results and checked the graph of the rests)
Hi Joel, per comment 15 and the weird nature of this bug that the regression was caused by backing out a code which didn't bring obvious talos improvement when the code had been landed. I hardly see actionable actions to move this forward, and am removing P1 flag of this bug. You may want to reconsider the regression window if it's possible or to suggest the next step.
Flags: needinfo?(jmaher)
Priority: P1 → --
it is common that code which causes a regression at one point in time will not at another point in time- this is typical as we land 1000+ changes/week in the tree and there are many complex interactions.
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(jmaher)
Resolution: --- → WONTFIX
I see, thanks for the explanation, Joel.
You need to log in before you can comment on or make changes to this bug.