Closed Bug 1498220 Opened 6 years ago Closed 5 years ago

3.75 - 6.03% ts_paint / ts_paint_webext (windows10-64-qr) regression on push 2cee53dd5773 (Wed Oct 10 2018)

Categories

(Core :: Graphics: WebRender, defect, P3)

Tracking

RESOLVED WONTFIX
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- disabled
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- fix-optional

People

(Reporter: igoldan, Assigned: jrmuizel)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=4f55976a9e9115c9f41075843bc48955684364d9&tochange=2cee53dd577363866c3cc6ed7baf679a9936abbf

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  6%  ts_paint_webext windows10-64-qr opt e10s stylo     327.00 -> 346.73
  4%  ts_paint windows10-64-qr opt e10s stylo            324.50 -> 336.67

Improvements:

 32%  glterrain windows10-64-qr opt e10s stylo          2.14 -> 1.45
 22%  sessionrestore windows10-64-qr opt e10s stylo     401.33 -> 314.64
  8%  tp5o_scroll windows10-64-qr opt e10s stylo        2.82 -> 2.59
  5%  tscrollx linux64-qr opt e10s stylo                2.35 -> 2.25
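For reference, the percentages in the summary and in the tables above are consistent with a plain relative change between the "before" and "after" values. A minimal sketch, assuming that definition; the helper name `percent_change` is illustrative and is not part of Talos or Perfherder:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Before/after values copied from the alert tables above.
alerts = {
    "ts_paint_webext": (327.00, 346.73),
    "ts_paint": (324.50, 336.67),
    "glterrain": (2.14, 1.45),
}

for name, (before, after) in alerts.items():
    print(f"{name}: {percent_change(before, after):+.2f}%")
# ts_paint_webext: +6.03%
# ts_paint: +3.75%
# glterrain: -32.24%
```

The first two results match the 3.75 - 6.03% range in the bug summary. For these tests a lower value is better, so a positive change is a regression.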


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=16667

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Flags: needinfo?(jmuizelaar)
These regressions were caused by one of the following bugs:

bug 1496670
bug 1461239
bug 1495902
bug 1496670

:jrmuizelaar which one is more related to our problem?
We also noticed these AWSY regressions were caused by the same patch:

== Change summary for alert #16687 (as of Wed, 10 Oct 2018 00:20:39 GMT) ==

Regressions:

  4%  Explicit Memory windows10-64-qr opt stylo     318,964,286.10 -> 333,257,110.10
  3%  Resident Memory windows10-64-qr opt stylo     823,379,704.81 -> 850,549,640.65

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16687
I guess we should try to narrow down what commit actually caused this.
I bisected on try:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=86400&series=try,1682838,1,1&zoom=1539700822850.41,1539702991000,320.61610532610604,341.5149817306004

and this (bug 1496670) is the root cause:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2cee53dd5773

I really don't get this, because that push is just a change to reftest.list, yet the data is clear that it causes the exact regression seen for win10-qr ts_paint. Could the builds be non-deterministic?
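Bisecting a push range like this amounts to a binary search for the first push whose score is regressed. A minimal sketch of that idea, assuming the pushes are ordered oldest to newest, the first is known good, the last is known bad, and the regression persists once introduced. The revision names, scores and threshold below are hypothetical placeholders, not data from this bug, and in practice each check is a try push with Talos retriggers rather than a dictionary lookup:

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first revision for which `is_bad` is true.

    Assumes revisions[0] is good, revisions[-1] is bad, and every
    revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # revisions[lo] is good, revisions[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return revisions[hi]


# Hypothetical revisions and ts_paint-like scores, for illustration only.
revisions = ["r0", "r1", "r2", "r3", "r4"]
scores = {"r0": 324.0, "r1": 325.0, "r2": 324.5, "r3": 337.0, "r4": 336.5}

culprit = first_bad(revisions, lambda rev: scores[rev] > 330.0)
print(culprit)
```

With noisy tests the `is_bad` check needs several runs per revision; a single run can send the search down the wrong half of the range.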
That's super weird. I have no idea what it would mean. I suppose we could try other reftest.list changes and see if it's just changing that particular one that causes the problem.
Flags: needinfo?(jmuizelaar)
I am going to try some try pushes as "backouts" instead; that might give us a better idea.
The backouts had a lot of conflicts; I could only get one to back out, and the other one had a build issue. Needless to say, backing out the reftest.list change didn't cause a perf difference.
I just came across this bug. I've been bisecting the AWSY regressions as well.

I narrowed them down [1] to this push: [2]. I'm bisecting further, will report back.


[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=68738fc5a9ce6a0ec0c1a13bb1e7e43528e63db0&newProject=try&newRevision=4c1fcac81edcf4f6e218a6fb9ac30b4976ec6380&framework=4

[2] https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=7801a4fb37db
Did a more precise bisect within that WR update [1], which gives us [2] plus the gecko-side patch in bug 1495902 (which was necessary to make things compile).

Unless we think Glenn's shadow flattening work might be to blame, that seems like pretty strong evidence that the shader caching work is responsible here, at least for the AWSY regressions. We could retrigger the two linked try pushes to include ts_paint if that's of interest.


[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=68738fc5a9ce6a0ec0c1a13bb1e7e43528e63db0&newProject=try&newRevision=5e08c890491b725dcedfeb49a963d5619349ef98&framework=4
[2] https://github.com/servo/webrender/compare/3c3f9a4e919b81639f078d7bd101012de61b9396...9dd465162183c127e7cefbe50ad9173e6ec27bb3
Flags: needinfo?(matt.woodrow)
Any updates here?
Flags: needinfo?(jmuizelaar)
:vchin we need your help concluding this bug, as it has been stuck for more than 2 weeks.
Flags: needinfo?(vchin)
Given that this only affects WebRender, which is not yet riding the trains, I think it's ok to wait for Matt (or someone else) to get to it.
Flags: needinfo?(vchin)
Flags: needinfo?(jmuizelaar)
Priority: -- → P3
Component: General → Graphics: WebRender
Product: Testing → Core
Assignee: nobody → jmuizelaar

And just to confirm with the shadow changes removed:
Project=mozilla-inbound&originalRevision=4f55976a9e9115c9f41075843bc48955684364d9&newProject=try&newRevision=81b5499bde1afc9a19fd1a9e8e1bc1a3a44d77e5&framework=1

Profiles from the time of the regression show a clear reason for the regression (we're spending a bunch of time compiling shaders when we don't expect to). However, recent profiles comparing wr and non-wr don't show this same problem. Further, when profiling is enabled, the startup time regression disappears. This happens even when the sampling rate is lowered to one sample every 10ms.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f312e8e2f13bd0cf6e0eeb42dba92ff0b99cb839&selectedJob=227836045

Given this, I'm inclined to think that the regression we're seeing here is not all that real and may just be related to the particular timing on the test machines.

Yeah, I did a bunch of work around shader compilation in December, so that's likely to change things.

Flags: needinfo?(matt.woodrow)

Dropping to P4 because of inactionability.

Priority: P3 → P4
Blocks: wr-67
No longer blocks: stage-wr-trains
Priority: P4 → P3
Blocks: wr-68
No longer blocks: wr-67
Blocks: wr-70
No longer blocks: wr-68
No longer blocks: wr-70

:jrmuizel Is there anything we can do on this?

Can we close this as wontfix?

Flags: needinfo?(jmuizelaar)

Yeah, let's.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(jmuizelaar)
Resolution: --- → WONTFIX