3.75 - 6.03% ts_paint / ts_paint_webext (windows10-64-qr) regression on push 2cee53dd5773 (Wed Oct 10 2018)

Status: NEW (defect, P3, normal)
Opened: 9 months ago; last updated: 3 months ago

People: (Reporter: igoldan, Assigned: jrmuizel)
Tracking: (Blocks 3 bugs, 4 keywords)
Firefox Tracking Flags: (Not tracked)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=4f55976a9e9115c9f41075843bc48955684364d9&tochange=2cee53dd577363866c3cc6ed7baf679a9936abbf

As the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  6%  ts_paint_webext windows10-64-qr opt e10s stylo     327.00 -> 346.73
  4%  ts_paint windows10-64-qr opt e10s stylo            324.50 -> 336.67

Improvements:

 32%  glterrain windows10-64-qr opt e10s stylo          2.14 -> 1.45
 22%  sessionrestore windows10-64-qr opt e10s stylo     401.33 -> 314.64
  8%  tp5o_scroll windows10-64-qr opt e10s stylo        2.82 -> 2.59
  5%  tscrollx linux64-qr opt e10s stylo                2.35 -> 2.25


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=16667
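
For reference, the summary percentages are just the relative change between the before and after values quoted above. A minimal sketch in plain Python (values copied from the tables above; this is not part of the Talos output):

  # Relative change for each before -> after pair quoted in this report.
  results = {
      "ts_paint_webext windows10-64-qr": (327.00, 346.73),
      "ts_paint windows10-64-qr":        (324.50, 336.67),
      "glterrain windows10-64-qr":       (2.14, 1.45),
      "sessionrestore windows10-64-qr":  (401.33, 314.64),
  }

  for name, (before, after) in results.items():
      delta = (after - before) / before * 100  # positive = regression, negative = improvement
      print(f"{name}: {delta:+.2f}%")

  # ts_paint_webext works out to about +6.03% and ts_paint to about +3.75%,
  # which is where the "3.75 - 6.03%" range in the bug summary comes from.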

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Flags: needinfo?(jmuizelaar)
These regressions were caused by one of the following bugs:

bug 1496670
bug 1461239
bug 1495902

:jrmuizelaar, which one is most likely related to our problem?
We also noticed these AWSY regressions were caused by the same patch:

== Change summary for alert #16687 (as of Wed, 10 Oct 2018 00:20:39 GMT) ==

Regressions:

  4%  Explicit Memory windows10-64-qr opt stylo     318,964,286.10 -> 333,257,110.10
  3%  Resident Memory windows10-64-qr opt stylo     823,379,704.81 -> 850,549,640.65

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16687
Comment 3 (Assignee), 8 months ago
I guess we should try to narrow down what commit actually caused this.
I bisected on try:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=86400&series=try,1682838,1,1&zoom=1539700822850.41,1539702991000,320.61610532610604,341.5149817306004

and this (bug 1496670) is the root cause:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2cee53dd5773

I really don't get this: that push is just a change to reftest.list, yet the data is clear that it caused the exact regression seen for win10-qr ts_paint. Could the builds be non-deterministic?
Comment 5 (Assignee), 8 months ago
That's super weird. I have no idea what it would mean. I suppose we could try other reftest.list changes and see if it's just changing that particular one that causes the problem.
Flags: needinfo?(jmuizelaar)
I am going to try some try pushes as "backouts" instead; that might give us a better idea.
The backouts had a lot of conflicts; I could only get one to back out, and the other one had a build issue. Needless to say, backing out the reftest.list change didn't cause a perf difference.
Did a more precise bisect within that WR update [1], which gives us [2] plus the gecko-side patch in bug 1495902 (which was necessary to make things compile).

Unless we think Glenn's shadow flattening work might be to blame, that seems like pretty strong evidence that the shader caching work is responsible here, at least for the AWSY regressions. We could retrigger the two linked try pushes to include ts_paint if that's of interest.


[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=68738fc5a9ce6a0ec0c1a13bb1e7e43528e63db0&newProject=try&newRevision=5e08c890491b725dcedfeb49a963d5619349ef98&framework=4
[2] https://github.com/servo/webrender/compare/3c3f9a4e919b81639f078d7bd101012de61b9396...9dd465162183c127e7cefbe50ad9173e6ec27bb3
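
As a side note, the WebRender commits in the suspect range from [2] can also be listed programmatically. A rough sketch using GitHub's public compare API (assumes network access and that unauthenticated rate limits are fine for a one-off query; not part of any bug tooling):

  # List the WebRender commits between the two revisions referenced in [2]
  # via GitHub's compare API: /repos/{owner}/{repo}/compare/{base}...{head}
  import json
  import urllib.request

  BASE = "3c3f9a4e919b81639f078d7bd101012de61b9396"
  HEAD = "9dd465162183c127e7cefbe50ad9173e6ec27bb3"
  url = f"https://api.github.com/repos/servo/webrender/compare/{BASE}...{HEAD}"

  with urllib.request.urlopen(url) as resp:
      data = json.load(resp)

  for c in data["commits"]:
      # abbreviated sha plus the first line of each commit message
      print(c["sha"][:12], c["commit"]["message"].splitlines()[0])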
Flags: needinfo?(matt.woodrow)
Blocks: wr-memory
Any updates here?
Flags: needinfo?(jmuizelaar)
:vchin, we need your help concluding this bug, as it has been stuck for more than 2 weeks.
Flags: needinfo?(vchin)
Given that this only affects WebRender, which is not yet riding the trains, I think it's ok to wait for Matt (or someone else) to get to it.
Flags: needinfo?(vchin)
Flags: needinfo?(jmuizelaar)
Priority: -- → P3
Updated (Assignee), 5 months ago
Component: General → Graphics: WebRender
Product: Testing → Core

Updated (Assignee), 5 months ago
Assignee: nobody → jmuizelaar
Comment 14 (Assignee), 5 months ago

And just to confirm with the shadow changes removed:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=4f55976a9e9115c9f41075843bc48955684364d9&newProject=try&newRevision=81b5499bde1afc9a19fd1a9e8e1bc1a3a44d77e5&framework=1

Comment 15 (Assignee), 4 months ago

Profiles from the time of the regression show a clear cause (we're spending a bunch of time compiling shaders when we don't expect to). However, recent profiles comparing WR and non-WR don't show this same problem. Further, when profiling is enabled, the startup time regression disappears, and this happens even with the sampling rate lowered to 10 ms.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f312e8e2f13bd0cf6e0eeb42dba92ff0b99cb839&selectedJob=227836045

Given this, I'm inclined to think that the regression we're seeing here may not be real and may just be down to the particular timing on the test machines.

Yeah, I did a bunch of work around shader compilation in December, so that's likely to change things.

Flags: needinfo?(matt.woodrow)
Comment 17 (Assignee), 4 months ago

Dropping to P4 because this isn't actionable right now.

Priority: P3 → P4
Updated (Assignee), 4 months ago
Blocks: wr-67
No longer blocks: stage-wr-trains
Priority: P4 → P3
Updated (Assignee), 3 months ago
Blocks: wr-68
No longer blocks: wr-67