Closed Bug 1287707 Opened 4 years ago Closed 4 years ago

Check that canvas performance are where we expect them to be after copy-on-write canvas updates is enabled again.

Categories

(Core :: Graphics, defect)

50 Branch
defect
Not set

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox50 --- affected

People

(Reporter: ashiue, Assigned: nical)

References

Details

(Keywords: perf, talos-regression, Whiteboard: [gfx-noted])

Talos has detected a Firefox performance regression from push 3a276d3d8d6b2f0581a4fd275b3301567fc7fd09. As author of one of the patches included in that push, we need your help to address this regression.

Summary of tests that regressed:

  tcanvasmark summary windows7-32 opt - 2.12% worse
  tcanvasmark summary windows7-32 pgo - 2.06% worse


You can find links to graphs and comparsion views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=1921

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
I did some retriggers for this alert, here is the zooming better view:
https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,f5411e04c56c3d5a8fa830e52e3b7c8c243843b6,1,1%5D&zoom=1468111998323.28,1468410945797.361,7567.164179104478,8753.731343283582

This issue might be caused by one of these 3 patches.
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=a45c596478b6e03e536a252bef8b2ba08602d7e6&tochange=3a276d3d8d6b2f0581a4fd275b3301567fc7fd09

Hi Nicolas, as you are the patch author, can you take a look at this and determine what is the root cause? Thanks!
Flags: needinfo?(nical.bugzilla)
(In reply to Alison Shiue from comment #1)
> Hi Nicolas, as you are the patch author, can you take a look at this and
> determine what is the root cause? Thanks!

One of the three patches is disabling an optimization that I had landed a week or two before and that caused a few regressions. I am working on these regressions in order to be able the re-enable optimization.
Flags: needinfo?(nical.bugzilla)
thanks :nical!
Whiteboard: [gfx-noted]
Sounds like this is not really a performance regression, in that it just gets us back to where we were before the code that gave us the improvements had to be disabled?  Is there a point in keeping this bug open?
Or did I misunderstand the state of the world here?
Flags: needinfo?(nical.bugzilla)
(In reply to Milan Sreckovic [:milan] from comment #4)
> Sounds like this is not really a performance regression, in that it just
> gets us back to where we were before the code that gave us the improvements
> had to be disabled?

I'll sort this out and double check when copy-on-write canvas gets reenabled (right now we are in a sort of intermediate state where numbers don't reflect where we'll be after we reenable it). I also suspect there may be a snapshot copy that could be avoided in some cases, not sure whether it was already the case before (these happen automagically) and how often that happens.
Assignee: nobody → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
Summary: 2.06 - 2.12% tcanvasmark (windows7-32) regression on push 3a276d3d8d6b2f0581a4fd275b3301567fc7fd09 (Mon Jul 11 2016) → Check that canvas performance are where we expect them to be after copy-on-write canvas updates is enabled again.
as a note, this change will be merged to Aurora next week- let me know if there is anything I can do to help related to collecting numbers
I expect canvas perf to have gone back (just before the uplift) to where it was before the regression. the graph looks better in the last few days, although somewhat noisy.
I just re-triggered a handful of win32 opt canvasmark runs on my push on inbound that has the remaining optimizations[1]. I assume this is how we are supposed to do it, I didn't find how to do that from perfherder.

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=e5db12322fd393fe7970e726cd1f4b64845f6d23
ok, I did a few extra retriggers (win7 non-e10s/e10s) and then did a few retriggers on the previous push:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=d0cb076a4e73&newProject=mozilla-inbound&newRevision=e5db12322fd3&framework=1&filter=canv&showOnlyImportant=0

win7 shows no difference.  If we are concerned about other platforms, we should collect more data.
There are a bunch of different things that landed at different dates in play. It's best to compare the current state of affairs against the time when the regression happened.

something like: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=3a276d3d8d6b&newProject=mozilla-inbound&newRevision=e5db12322fd3&framework=1&filter=canv&showOnlyImportant=0

Which shows a 1.73% improvement, not quite at the level of the initial 2.16% regression.

The remaining thing I can think of that may affect performance is that we tend to flush the DrawTarget earlier now. I can revert that part although I would have preferred to avoid it. I'll try.
it is hard to compare against stuff to far back in history, but overall that shows a long term trend as long as there are no test/tooling changes.
Is there anything left to do here?
Flags: needinfo?(nical.bugzilla)
(In reply to Andrew Overholt [:overholt] from comment #11)
> Is there anything left to do here?

Yes, I am still fiddling with copy-on-write canvas related things that affect performance (sometimes in surprising ways) and I use this bug as I reminder. I'll close it when the copy-on-write canvas stuff stabilizes and we are happy with the perf numbers (right now I am more focused on crashes and correctness regression than measuring perf).
Flags: needinfo?(nical.bugzilla)
I'm going to take off the "regression" keyword since this doesn't need to be tracked as a regression per se (because it is the result of backing out of a buggy perf improvement, rather than a regression compared to release versions). I'll leave the talos-regression keyword though since I don't care about that.
Keywords: regression
should we be tracking this against firefox 50?  I would rather mark this as resolved/wontfix indicating that we shouldn't be bugging folks if there is no work to be done as this moves to beta this week and ships in a few weeks.
Looks like keeping the bug open is generating confusion, let's close it and I'll open a new one if need be.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.