Closed Bug 1294351 Opened 5 years ago Closed 5 years ago
.62 - 5 .77% tcanvasmark (osx-10-10) regression on push 9beaa75894056714aefe9737fbafc19afa3e8df7 (Wed Aug 10 2016)
6.17 KB, patch
|Details | Diff | Splinter Review|
5.47 KB, patch
|Details | Diff | Splinter Review|
5.77 KB, patch
|Details | Diff | Splinter Review|
Talos has detected a Firefox performance regression from push 9beaa75894056714aefe9737fbafc19afa3e8df7. As author of one of the patches included in that push, we need your help to address this regression. Summary of tests that regressed: tcanvasmark summary osx-10-10 opt: 6325.46 -> 5969.71 (5.62% worse) tcanvasmark summary osx-10-10 opt e10s: 6281.52 -> 5919.33 (5.77% worse) You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=2368 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Here is the zooming better view: https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,b74035a92eb4e2b79045542475410a750b36553e,1,1%5D&series=%5Bmozilla-inbound,f75a09e50e3478d7b9af69bf4a3bc73afca3b9de,1,1%5D&zoom=1470749048702.4324,1470772777703.3489,5500,6977.528089887641 This issue might be caused by push https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=ddd470e0e85a3e7702a51fd4f379927ff22835b2&tochange=9beaa75894056714aefe9737fbafc19afa3e8df7. Hi Nicolas, as you are the patch author, can you take a look at this and determine what is the root cause? Thanks!
Wow, I didn't expect that.
Assignee: nobody → nical.bugzilla
> *** Please let us know your plans within 3 business days, or the offending > patch(es) will be backed out! *** I'll be away tomorrow and Monday so it will take a little longer for me to figure this out.
It looks like the regression is caused by the extra book-keeping we do to serialize and replay clips and transforms every frame, which was not previously done with skiagl. A workaround would be to have a specific code path for the non-shared persistent buffer provider, which would make us not pop all clips at the end of the frame and not re-apply them at the beginning of the next frame (we can't do that with the shared buffer provider unfortunately, because we get a different DrawTarget every frame). This is probably going to be quite the ugly hack.
I would like to avoid ugly hacks if possible- is this a regression we should accept?
While conceptually ugly it's not that bad in terms of added code, so I guess it'll do. Hopefully we will eventually get to a place where enough configurations use PersistentBufferProviderShared and the performance of the basic one don't matter (and we and remove this hack), although it's not clear whether it will be possible for skiagl in all cases.
Attachment #8781586 - Flags: review?(bas)
Attachment #8781586 - Flags: review?(bas) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/775fa3ede160 Avoid removing and re-applying the drawing states with PersistentBufferProviderBasic. r=Bas
Backed out for failing in crashtest 647480.html: https://hg.mozilla.org/integration/mozilla-inbound/rev/7d3e8cea2291 Push with failures: https://hg.mozilla.org/integration/mozilla-inbound/rev/7d3e8cea2291 Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=34093924&repo=mozilla-inbound
There is also a mochitest on Linux which failed twice (at the moment): 03:52:04 INFO - 1843 INFO TEST-UNEXPECTED-FAIL | dom/canvas/test/test_canvas.html | pixel 20,20 of c628 is 0,0,0,0; expected 0,255,0,255 +/- 0 03:52:04 INFO - isPixel@dom/canvas/test/test_canvas.html:140:5 03:52:04 INFO - test_initial_reset_clip@dom/canvas/test/test_canvas.html:19953:1 03:52:04 INFO - runTests@dom/canvas/test/test_canvas.html:25437:3 03:52:04 INFO - SimpleTest._newCallStack/rval@SimpleTest/SimpleTest.js:144:17 03:52:04 INFO - EventHandlerNonNull*this.addLoadEvent@SimpleTest/SimpleTest.js:169:13 03:52:04 INFO - @SimpleTest/SimpleTest.js:1369:5 https://treeherder.mozilla.org/logviewer.html#?job_id=34093932&repo=mozilla-inbound Please check the push for more failures when it's complete.
Lots of things broke from this patch. I would recommend a full try run before relanding this.
try run with fixed patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2b324f0a4ea2e54bb0bd53fd7acb66541387a75b The result is full of intermittents but none of them are actually canvas2d related, except test_streams_capture_origin.html which I am confident is not related to my patch.
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/d35d1cbc59e8 Avoid removing and re-applying the drawing states with PersistentBufferProviderBasic. r=Bas
I looked at the graphs linked in comment 0 and I don't see any change in the values.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I have bisected the problem and I have a patch that restores the performance back ton where it was before the regression. I am looking into why it does, but in any case, a patch is coming soon.
This improves the tcanvasmark score, see: https://treeherder.mozilla.org/perf.html#/graphs?series=%5Btry,f75a09e50e3478d7b9af69bf4a3bc73afca3b9de,1,1%5D&selected=%5Btry,f75a09e50e3478d7b9af69bf4a3bc73afca3b9de,121374,26219380%5D
Attachment #8784254 - Flags: review?(bas)
Attachment #8784254 - Flags: review?(bas) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/ac16b2ccd7a1 Move restoring the canvas clip stack to its own method and early return form EnsureTarget. r=Bas
Status: REOPENED → RESOLVED
Closed: 5 years ago → 5 years ago
Resolution: --- → FIXED
I have verified this via the graphs!
Status: RESOLVED → VERIFIED
Comment on attachment 8781586 [details] [diff] [review] Skip the push/pop clip dance if we can Approval Request Comment [Feature/regressing bug #]: [User impact if declined]: There are patches that we need to uplift, and are risky to uplift without uplifting this one as well. [Describe test coverage new/current, TreeHerder]: None. [Risks and why]: low, this has been in central fo a little while without causing issues. [String/UUID change made/needed]: None.
Attachment #8781586 - Flags: approval-mozilla-aurora?
Comment on attachment 8784254 [details] [diff] [review] Return early from EnsureTarget in the common case. Review of attachment 8784254 [details] [diff] [review]: ----------------------------------------------------------------- Uplift request for the same reason as the other patch (this is in the middle of a series of patch that need to be uplifted, and skipping this makes it risky to uplift the others). This patch also fixes a 5% talos regression.
Comment on attachment 8784254 [details] [diff] [review] Return early from EnsureTarget in the common case. I'm trying to find the smallest set of patches that I need uplift if copy-on-write canvas is disabled on aurora. This one should not be needed.
(In reply to Nicolas Silva [:nical] from comment #20) > Comment on attachment 8781586 [details] [diff] [review] > Skip the push/pop clip dance if we can > [User impact if declined]: There are patches that we need to uplift, and are > risky to uplift without uplifting this one as well. Now that I've changed my aurora plans, this patch is more important than I originally explained because it makes the canvas code avoid some risky code paths and lets us skip uplifting the numerous patches that fix these code paths.
Comment on attachment 8781586 [details] [diff] [review] Skip the push/pop clip dance if we can Fixes a perf regression, fix was verified on Nightly, Aurora50+
Attachment #8781586 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
This hit conflicts on uplift, can we get a rebase patch?
This should be applied after the patch in bug 1292870.
You need to log in before you can comment on or make changes to this bug.