Closed Bug 1297004 Opened 3 years ago Closed 3 years ago

3.51 - 154.44% cart/damp/glterrain/sessionrestore/sessionrestore_no_auto_restore/tabpaint/tart/tp5o/tp5o Main_RSS/tp5o Private Bytes/tpaint/tps/tresize/ts_paint/tscrollx/tsvgr_opacity/tsvgx (linux64) regression on push 32fd2e8a3be8 (Fri Aug 19 2016)

Categories

(Core :: Graphics, defect)

52 Branch
defect
Not set

Tracking

RESOLVED FIXED
mozilla53
Tracking Status
firefox48 --- unaffected
firefox49 --- unaffected
firefox50 --- unaffected
firefox51 --- disabled
firefox52 --- disabled
firefox53 --- fixed

People

(Reporter: ashiue, Assigned: lsalzman)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: perf, regression, talos-regression)

Talos has detected a Firefox performance regression from push 32fd2e8a3be820c0a8c9d2210e6cfae990a561d8. As you are the author of one of the patches included in that push, we need your help to address this regression.

Summary of tests that regressed:

  ts_paint linux64 opt: 1260.75 -> 1335 (5.89% worse)
  tpaint linux64 opt: 261.34 -> 330.22 (26.36% worse)
  sessionrestore linux64 opt: 869.83 -> 1196.92 (37.6% worse)
  sessionrestore_no_auto_restore linux64 opt: 914.33 -> 1254.08 (37.16% worse)
  tabpaint summary linux64 opt: 86.1 -> 93.65 (8.77% worse)
  tresize linux64 opt: 22.66 -> 40.81 (80.1% worse)
  tp5o summary linux64 opt: 353.26 -> 368.45 (4.3% worse)
  tsvgx summary linux64 opt: 362.87 -> 375.6 (3.51% worse)
  tart summary linux64 opt: 5.59 -> 7.44 (33.11% worse)
  cart summary linux64 opt: 35.71 -> 37.93 (6.21% worse)
  damp summary linux64 opt: 298.9 -> 317.58 (6.25% worse)
  tps summary linux64 opt: 69.59 -> 82.08 (17.95% worse)
  glterrain summary linux64 opt: 8.62 -> 12.24 (42.03% worse)
  ts_paint linux64 opt e10s: 1172.5 -> 1469.92 (25.37% worse)
  sessionrestore linux64 opt e10s: 775.33 -> 1067.67 (37.7% worse)
  sessionrestore_no_auto_restore linux64 opt e10s: 811.92 -> 1111.75 (36.93% worse)
  tabpaint summary linux64 opt e10s: 61.7 -> 75.25 (21.95% worse)
  tpaint linux64 opt e10s: 258.56 -> 399.1 (54.35% worse)
  tresize linux64 opt e10s: 24.01 -> 61.08 (154.44% worse)
  tp5o summary linux64 opt e10s: 351.97 -> 378.21 (7.45% worse)
  tp5o Main_RSS linux64 opt e10s: 179856684.67 -> 215745598.6 (19.95% worse)
  tp5o Private Bytes linux64 opt e10s: 1064702689.46 -> 1124697682.67 (5.63% worse)
  tsvgx summary linux64 opt e10s: 231.39 -> 242.87 (4.97% worse)
  tsvgr_opacity summary linux64 opt e10s: 487.95 -> 510.08 (4.53% worse)
  tart summary linux64 opt e10s: 6.41 -> 8.11 (26.47% worse)
  tscrollx summary linux64 opt e10s: 3.38 -> 4.41 (30.18% worse)
  cart summary linux64 opt e10s: 38.13 -> 39.5 (3.59% worse)
  glterrain summary linux64 opt e10s: 9.23 -> 10.99 (19.02% worse)
  damp summary linux64 opt e10s: 269.63 -> 283.96 (5.32% worse)
  tps summary linux64 opt e10s: 50.97 -> 66 (29.47% worse)

Summary of tests that improved:

  tscrollx summary linux64 opt: 7.61 -> 4.33 (43.14% better)
  tsvgr_opacity summary linux64 opt: 552.64 -> 535.6 (3.08% better)
  tp5o_scroll summary linux64 opt: 8.82 -> 5.58 (36.79% better)
  tp5o_scroll summary linux64 opt e10s: 5.02 -> 4.26 (15.03% better)
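
For anyone cross-checking the numbers, the percentages above appear to be the plain relative change between the old and new values. A minimal sketch under that assumption follows; `percent_change` is a hypothetical helper written for illustration, not part of Talos or Perfherder.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change of `after` versus `before`, in percent.

    Assumption: this is how the alert percentages are derived. For these
    tests lower is better, so a positive result means "worse".
    """
    return (after - before) / before * 100.0


# Values copied from the regression summary above.
ts_paint = percent_change(1260.75, 1335.0)
tpaint = percent_change(261.34, 330.22)

print(f"ts_paint linux64 opt: {ts_paint:.2f}% worse")
print(f"tpaint linux64 opt: {tpaint:.2f}% worse")
```

The listed before/after values are themselves rounded, so some recomputed percentages differ slightly from the alert (tresize e10s comes out near 154.39% rather than 154.44%).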


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=2571

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
This issue might be caused by one of the following changesets:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c807d56a4f2175493cb865ff1a3968f1a978b44e&tochange=32fd2e8a3be820c0a8c9d2210e6cfae990a561d8

Hi Andrew, as you are the patch author, can you take a look at this and determine the root cause? Thanks!
Flags: needinfo?(andrew)
For sure; a lot of these are expected outcomes of bug 594876 (both regressions and improvements), but there are some outliers.

I'll look into these.
Flags: needinfo?(andrew)
(In reply to Andrew Comminos [:acomminos] from comment #2)
> For sure; a lot of these are expected outcomes of bug 594876 (both
> regressions and improvements), but there are some outliers.
> 
> I'll look into these.

What is the plan for enabling this feature on Aurora, beta, etc.? It concerns me a bit that what's now on Nightly has such different perf characteristics from what's being shipped. Then again, I guess this is just Linux. ¯\_(ツ)_/¯
(In reply to William Lachance (:wlach) from comment #3)
> (In reply to Andrew Comminos [:acomminos] from comment #2)
> > For sure; a lot of these are expected outcomes of bug 594876 (both
> > regressions and improvements), but there are some outliers.
> > 
> > I'll look into these.
> 
> What is the plan for enabling this feature on Aurora, beta, etc.? It
> concerns me a bit that what's now on Nightly has such different perf
> characteristics from what's being shipped. Then again, I guess this is just
> Linux. ¯\_(ツ)_/¯

We're going to test it out on Nightly for a bit and gather data; mostly, whether we have the control necessary to work around broken GL implementations. Unfortunately, that does mean that we won't be testing the performance of basic composition on Linux nightlies (but we don't test it extensively on other platforms anyway).
Depends on: 1297257
Depends on: 1297537
Here's a bit of an update on some of the remaining regressions:

glterrain:

- Performance losses here are due to texture uploads and GL contention between the content and compositor threads.
- We can't really improve this further.

sessionrestore:

- Compositor initialization takes longer: we need round trips and work done on the X server to initialize a GLX context.
- Performance here will be implementation dependent.
- We also spend a non-trivial amount of time blocking the main thread while waiting for the VSync thread to power up.
- While we could initialize our VSync source asynchronously, we need to get proper multithreaded X error handling to make this work well. It's safer for now just to block the main thread during initialization.
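
To illustrate the blocking pattern described in the sessionrestore notes, here is a minimal sketch in Python. It is not Gecko code: `VsyncSource`, `start_blocking`, and the 50 ms sleep are hypothetical stand-ins, assumed only to show a caller waiting for a helper thread to report that it is ready.

```python
import threading
import time


class VsyncSource:
    """Hypothetical stand-in for a VSync source that runs on its own thread."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Simulated setup cost (an assumption, standing in for server round trips).
        time.sleep(0.05)
        self._ready.set()

    def start_blocking(self, timeout: float = 1.0) -> bool:
        """Start the thread and block the caller until it signals readiness."""
        self._thread.start()
        return self._ready.wait(timeout)


start = time.monotonic()
source = VsyncSource()
ready = source.start_blocking()
blocked_for = time.monotonic() - start

print(f"ready={ready}, caller blocked for {blocked_for * 1000:.0f} ms")
```

The time the caller spends in `start_blocking` is the startup cost that lands on the calling thread. An asynchronous variant would return immediately and check readiness later, at the price of handling setup errors on another thread.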

I don't think there are any major regressions remaining. Considering that we need to switch to accelerated layers on Linux, I think we should keep these in mind as we move forward.
(In reply to Andrew Comminos [:acomminos] from comment #5)
> glterrain:
> 
> - Performance losses here are due to texture uploads and GL contention
> between the content and compositor threads.
> - We can't really improve this further.

The test itself just loads a scene (not measured) and then rotates the camera (and maybe lighting), which is what is measured.

Do we upload textures while the test is running? I _think_ we don't, but if we do, should the test be changed?
(In reply to Avi Halachmi (:avih) from comment #6)
> The test itself just loads a scene (not measured) and then rotates the
> camera (and maybe lighting), which is what is measured.
> 
> Do we upload textures while the test is running? I _think_ we don't, but if
> we do, should the test be changed?

Since there's no texture sharing between content and the compositor on Linux, each WebGL frame needs an upload to a texture (it could be in host or unified memory; either way it's an extra copy). It's not really a fault of the test, but a consequence of not supporting shared pixmaps on Linux. We don't support them because various vendors and driver versions implement them differently and without a spec guarantee.
:acomminos, is there remaining work to do here? Possibly all that remains is tracked in bug 1297257?
Flags: needinfo?(andrew)
I don't believe so; the only outstanding issue found in profiling is bug 1297257 (as you mentioned).

Thanks!
Flags: needinfo?(andrew)
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
I was too quick; we have a bug that is tracking the fixes for this!
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Component: Untriaged → Graphics
Product: Firefox → Core
Hi :acomminos,
May I know if there are any updates here?
Flags: needinfo?(andrew)
Hi :acomminos, this bug is a regression, it is assigned to nobody, and we need someone to work on it. Since you seem to be involved, I am assigning it to you; feel free to reassign it to someone else if you disagree.
Assignee: nobody → andrew
This regression is Nightly-only. Linux acceleration is not riding the trains yet.
Assignee: andrew → nobody
Flags: needinfo?(andrew) → needinfo?(milan)
Version: 51 Branch → 52 Branch
Blocks: 1302124
No longer blocks: 1291351
See Also: → 1323991
This was backed out.
Status: REOPENED → RESOLVED
Closed: 3 years ago → 3 years ago
Resolution: --- → FIXED
Flags: needinfo?(milan)
Assignee: nobody → lsalzman
Depends on: 1323284
Target Milestone: --- → mozilla53