Closed Bug 1411243 Opened 3 years ago Closed 3 years ago

4.32 - 5.56% damp (linux64, windows10-64, windows7-32) regression on push 60ff8f8364959d64e9dfc838693f8ca3c9fc9913 (Mon Oct 23 2017)

Categories

(DevTools :: Inspector, defect)

40 Branch
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: igoldan, Assigned: jdescottes)

References

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=60ff8f8364959d64e9dfc838693f8ca3c9fc9913

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  6%  damp summary windows10-64 pgo e10s     216.44 -> 228.47
  5%  damp summary windows7-32 pgo e10s      211.30 -> 222.83
  5%  damp summary windows7-32 opt e10s      267.79 -> 279.90
  4%  damp summary windows10-64 opt e10s     254.83 -> 266.24
  4%  damp summary linux64 pgo e10s          231.35 -> 241.48
  4%  damp summary linux64 opt e10s          257.31 -> 268.43


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=10142

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: Untriaged → Developer Tools: Inspector
:jdescottes We noticed these regressions? Can we fix them or is this something we should accept?
Flags: needinfo?(jdescottes)
Investigating. 

None of the talos pushes I did for Bug 1171482 highlighted a global regression, but I did modify the last patch in afterwards.
I can't really understand how my modifications would affect tests on the performance tool.

If needed, only the last patch needs to be backed out, the other ones should have no impact.

Ionut: Just to make sure, these patches add a new test: inspector.mutations. I assume the results of this test are not counted in the summary, since,the previous revision does not have the test.
Flags: needinfo?(jdescottes) → needinfo?(igoldan)
Assignee: nobody → jdescottes
Status: NEW → ASSIGNED
(In reply to Julian Descottes [:jdescottes] from comment #2)
> Ionut: Just to make sure, these patches add a new test: inspector.mutations.
> I assume the results of this test are not counted in the summary, since,the
> previous revision does not have the test.
I see 6 subtest regressions ranging between 3 and 11%.
I think you made a good point though, regarding the new test. I'm ni?ing :jmaher for this, as I am not yet sure.
Flags: needinfo?(igoldan)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #4)
> (In reply to Julian Descottes [:jdescottes] from comment #2)
> > Ionut: Just to make sure, these patches add a new test: inspector.mutations.
> > I assume the results of this test are not counted in the summary, since,the
> > previous revision does not have the test.
> I see 6 subtest regressions ranging between 3 and 11%.

Yep there's definitely something off, I'm investigating. I just wanted to clarify the potential impact of the new test

> I think you made a good point though, regarding the new test. I'm ni?ing
> :jmaher for this, as I am not yet sure.

adding ni? flag
Flags: needinfo?(jmaher)
Unfortunately, I think that's a know issue of the median algorithm used on Talos.

This changeset added this new DAMP subtest:
  damp inspector.mutations - 2,229.87 ± 1.74% 

Which is significant new value in the test set.
That would explain the summary regression.


Is this run a significant one? That's the only one I could get from the alert page with subtests being run:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=autoland&originalRevision=f9b62ceada2a74d3005d983ff4ccffcc048dd3fd&newProject=autoland&newRevision=60ff8f8364959d64e9dfc838693f8ca3c9fc9913&originalSignature=f687ee347bcbdc55828148af029908049de69b7f&newSignature=f687ee347bcbdc55828148af029908049de69b7f&framework=1

As Julian said, the subtests regressions are surprising.
This patch should not affect performance and console much.
It may highlight some mix up into DAMP subtests...
(I hope to reduce this kind of weird mixup in bug 1407826)
... or unexpected consequences of this patch?
(In reply to Alexandre Poirot [:ochameau] from comment #6)
> Unfortunately, I think that's a know issue of the median algorithm used on
> Talos.
> 
> This changeset added this new DAMP subtest:
>   damp inspector.mutations - 2,229.87 ± 1.74% 
> 
> Which is significant new value in the test set.
> That would explain the summary regression.
> 
> 
> Is this run a significant one? That's the only one I could get from the
> alert page with subtests being run:
> https://treeherder.mozilla.org/perf.html#/
> comparesubtest?originalProject=autoland&originalRevision=f9b62ceada2a74d3005d
> 983ff4ccffcc048dd3fd&newProject=autoland&newRevision=60ff8f8364959d64e9dfc838
> 693f8ca3c9fc9913&originalSignature=f687ee347bcbdc55828148af029908049de69b7f&n
> ewSignature=f687ee347bcbdc55828148af029908049de69b7f&framework=1
> 

You can get subtests from other platforms if you go to the linked alert: https://treeherder.mozilla.org/perf.html#/alerts?id=10143

(no reruns, but it looks like linux shows the same kind of regressions, for instance on the simple performance open test)
Some results are in:

baseline vs new test: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4b1a25fda6395ee936f54b11fbd11150d69831d0&newProject=try&newRevision=639956abc5e52cfd3cdc4f35d816588a4975289a&framework=1

The summary shows the alert, which means that the new test is most likely responsible for the initial alert here.

When it comes to subtests, the new test by itself seems to slow down:
- complicated.webconsole.close
- simple.styleeditor.reload

baseline vs throttle: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4b1a25fda6395ee936f54b11fbd11150d69831d0&newProject=try&newRevision=d4d5e9b9e9dd814dfeda50fadbd5fd7ca4798615
(I had to push again, my initial throttle push contained additional perf fixes...)

The summary is still red, but the subtests don't really have the same profile as the ones from the alert (in particular simple.performance.open doesn't show up).

> (no reruns, but it looks like linux shows the same kind of regressions, for instance on the simple performance open test)
Oops, I misread the results. I looked at complicated rather than simple here.

Maybe this is a windows-only behavior. Will spawn new jobs.
any new tests will change the summary, we use a geometric mean and adding tests will change the value posted.  

typically we ignore subtests as they might be different on each OS and harder to debug, if there is work we can do to fix the subtest regressions, that is great.
Flags: needinfo?(jmaher)
Windows results are in: simply adding the new test seems to slow down all the "simple" tests, especially the performance.open test.

See baseline vs new test: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4b1a25fda6395ee936f54b11fbd11150d69831d0&newProject=try&newRevision=639956abc5e52cfd3cdc4f35d816588a4975289a&framework=1
that is really odd, but maybe it is related to runtime or other browser cleanup/interactions.
You may want to try with such change applied in your base (in your base, as it shuffles all DAMP results!):
https://hg.mozilla.org/try/rev/6155265071fcf9931a444c606e506d09c5e93a26
I should fix various issues where tests are polluting the next ones.
(for example by introducing significant GC during next tests, or by having long lasting async call still running up)
(In reply to Alexandre Poirot [:ochameau] from comment #12)
> You may want to try with such change applied in your base (in your base, as
> it shuffles all DAMP results!):
> https://hg.mozilla.org/try/rev/6155265071fcf9931a444c606e506d09c5e93a26
> I should fix various issues where tests are polluting the next ones.
> (for example by introducing significant GC during next tests, or by having
> long lasting async call still running up)

Sure!
- baseline: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=4a28cd7e3409612885297a59ae30647818a0019c
- new test: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=e2adee1cdc8a621330797cba1fec192120b618c1
- baseline vs new test: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4a28cd7e3409612885297a59ae30647818a0019c&newProject=try&newRevision=e2adee1cdc8a621330797cba1fec192120b618c1&framework=1

(just pushed so need to wait a bit to get the results)
The results look much better. My new test seems to have very little impact on the other tests when rebasing on top of Alex patches. (note that g2 running time nearly doubles with those patches though :/)

I think so far we identified that:
- the summary alert was simply coming from the new test
- the subtests variations should be fixed by the changes Alex is working on

Alex: should we deactivate the new test until your changes land? I think it can be nice to avoid polluting DAMP if possible.
Flags: needinfo?(poirot.alex)
(In reply to Julian Descottes [:jdescottes] from comment #14)
> The results look much better. My new test seems to have very little impact
> on the other tests when rebasing on top of Alex patches. (note that g2
> running time nearly doubles with those patches though :/)
> 
> I think so far we identified that:
> - the summary alert was simply coming from the new test
> - the subtests variations should be fixed by the changes Alex is working on
> 
> Alex: should we deactivate the new test until your changes land? I think it
> can be nice to avoid polluting DAMP if possible.

Hum. Isn't it already polluted?

The new test will bump the summary value no matter what we do.
For the others, we do not track performance panel much, so we can ignore it.
If that affects console, you may just put a comment in this doc:
  https://docs.google.com/document/d/1hwq0mnxS4Ycpq8pd8l-v3fRu2IQxAVh9ATu5-wYV2i4/edit
  (you can also comment about the summary bump)
Flags: needinfo?(poirot.alex)
Ionut: the regressions spotted only related to the new test added to DAMP, so we will accept this change.
Flags: needinfo?(igoldan)
Ok, thanks for sharing this!
Flags: needinfo?(igoldan)
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.