Closed Bug 1214868 Opened 10 years ago Closed 9 years ago

25% Linux 64 tsvgr_opacity regression on Fx-Team on October 13, 2015 from push e769087466d9d7edeade46e8edd19fd9270ac7fa

Categories

(Firefox :: Theme, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned, NeedInfo)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Talos has detected a Firefox performance regression from your commit e769087466d9d7edeade46e8edd19fd9270ac7fa in bug 1214315. We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=e769087466d9d7edeade46e8edd19fd9270ac7fa&showAll=1

On the page above you can see the Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see:
https://wiki.mozilla.org/Buildbot/Talos/Tests#tsvg-opacity

Reproducing and debugging the regression:

If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux64 -u none -t svgr  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a tsvgr_opacity

Making a decision:

As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Monday, or the offending patch will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Summary: 25% Linux 64 tsvgr_opacity regression on Fx-Team on October 14, 2015 from push e769087466d9d7edeade46e8edd19fd9270ac7fa → 25% Linux 64 tsvgr_opacity regression on Fx-Team on October 13, 2015 from push e769087466d9d7edeade46e8edd19fd9270ac7fa
According to the compare view in perfherder:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=fx-team&originalRevision=2287d2415cb7&newProject=fx-team&newRevision=e769087466d9&originalSignature=6981e256ea8173cdb53dec0741ec05e8ace13f30&newSignature=6981e256ea8173cdb53dec0741ec05e8ace13f30

big-optimizable-group-opacity-2500.svg is the page where we regressed.

I have triggered more tests to see if other things have regressed/improved; we will either see alerts or data on:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=fx-team&originalRevision=2287d2415cb7&newProject=fx-team&newRevision=e769087466d9

:dao, can you look into this and see what we can do to explain/fix this regression?
Flags: needinfo?(dao)
Visually that patch only affected lightweight themes, so I suspect the regression is due to the CSS variable I added, like previously in bug 1179756. That said, I have no idea why this would affect an svg pageload test. I suspect that either we're hitting a severe layout bug or the test is not exactly doing what it's supposed to do.
Flags: needinfo?(dao)
It only regresses one of the subtests by ~45%, while the other subtest (there are two overall) is completely unaffected, both in percentage and in the absolute diff in ms.

Another factor is that other "pure" page load tests are unaffected (mainly tp5o), so I don't think it's related to slower rendering of the chrome following a location change.

Historically, this specific test was affected by chrome changes around the location bar which were done by fx-team. My hypothesis is that these pages load relatively quickly, so any increase in chrome rendering time affects the load time more in percentage (since it measures from before the location change until after the page load event and the first paint), but I don't think we were able to confirm it 100%.

But maybe, and it's a pure guess, it's related to internal SVG interactions, since svg is used (maybe?) both at the location bar and at the content.

The content of the page/subtest which regressed is super simple: 2500 identical 600x600 rectangles with 0.5 opacity which are fully overlapping. The test is supposedly designed to "measure" how well this case is being optimized.

I think that this could be a good case to figure out, maybe once and for all, why this test (or maybe a specific subtest) is affected so much by this change. Seth, Matt, any idea what might be going on here?
The data:
- The "offending" patch: https://bugzilla.mozilla.org/attachment.cgi?id=8673235&action=diff
- The page (subtest) which regressed ~45% (file size is 120159 bytes): http://hg.mozilla.org/mozilla-central/file/e193b4da0a8c/testing/talos/talos/tests/svg_opacity/big-optimizable-group-opacity-2500.svg
- The page (subtest) which didn't regress at all (file size is 191667 - yes, the "small" test is actually a bigger file): http://hg.mozilla.org/mozilla-central/file/e193b4da0a8c/testing/talos/talos/tests/svg_opacity/small-group-opacity-2500.svg

The measurement is from just before changing the location to the test URL, until the first mozafterpaint event which occurred after the page load event, and the page is served from a local HTTP server. Joel, can you confirm this?
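To make the "fast pages are hit harder in percentage" hypothesis from the comment above concrete, here is a minimal arithmetic sketch. It assumes the chrome change adds a fixed cost to every page load; the 20 ms cost and both baseline times are hypothetical values chosen for illustration, not measurements from this bug, and `relative_regression` is an illustrative helper, not part of Talos.

```python
def relative_regression(baseline_ms: float, added_ms: float) -> float:
    """Return the percentage increase caused by adding added_ms to baseline_ms."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return 100.0 * added_ms / baseline_ms


# Hypothetical numbers for illustration only (not data from this bug).
fixed_chrome_cost_ms = 20.0
for name, baseline_ms in [("fast page", 80.0), ("slow page", 800.0)]:
    pct = relative_regression(baseline_ms, fixed_chrome_cost_ms)
    total_ms = baseline_ms + fixed_chrome_cost_ms
    print(f"{name}: {baseline_ms:.0f} ms -> {total_ms:.0f} ms (+{pct:.1f}%)")
```

Under that assumption the same absolute cost shows up as a tenfold larger percentage on the page that loads ten times faster. This would not by itself explain why only one of the two svg_opacity subtests regressed.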
Flags: needinfo?(seth)
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jmaher)
It wasn't the CSS variable but background-clip:padding-box. This diagnosis patch appears to undo the regression: https://hg.mozilla.org/try/rev/6b41b8599b2f https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=fe364d4c5b1c&newProject=try&newRevision=6b41b8599b2f I could try to narrow it down further to one of the two places where I had set background-clip.
good find :dao. :avih, yes, our measurements are from right before we try to load the URL to when we receive a mozafterpaint event.
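As a rough illustration of the measurement window described in comment 3 (from just before the location change until the first mozafterpaint after the page load event), the logic could be sketched as below. This is not the Talos implementation: the event tuples, the "load" and "mozafterpaint" strings, and `measure_pageload` are assumptions made for the sketch.

```python
from typing import Iterable, Optional, Tuple

Event = Tuple[float, str]  # (timestamp in ms, event name); hypothetical format


def measure_pageload(start_ms: float, events: Iterable[Event]) -> Optional[float]:
    """Return ms from start_ms to the first 'mozafterpaint' seen after 'load'.

    Returns None if no paint event follows the load event.
    """
    loaded = False
    for timestamp_ms, name in sorted(events):
        if timestamp_ms < start_ms:
            continue  # ignore anything before the location change
        if name == "load":
            loaded = True
        elif name == "mozafterpaint" and loaded:
            return timestamp_ms - start_ms
    return None


# Hypothetical event trace: paints before 'load' do not end the measurement.
trace = [(5.0, "mozafterpaint"), (40.0, "load"),
         (47.5, "mozafterpaint"), (60.0, "mozafterpaint")]
print(measure_pageload(0.0, trace))
```

Because the window starts before the location change, any chrome work triggered by that change (for example repainting the urlbar) falls inside the measured interval.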
Flags: needinfo?(jmaher)
(In reply to Dão Gottwald [:dao] from comment #4)
> It wasn't the CSS variable but background-clip:padding-box. This diagnosis
> patch appears to undo the regression:
>
> https://hg.mozilla.org/try/rev/6b41b8599b2f
>
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=fe364d4c5b1c&newProject=try&newRevision=6b41b8599b2f
>
> I could try to narrow it down further to one of the two places where I had
> set background-clip.

Both background-clip:padding-box instances have some impact, but that on the urlbar more than that on the back and forward buttons:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=fe364d4c5b1c&newProject=try&newRevision=796feed2560e
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=fe364d4c5b1c&newProject=try&newRevision=4357d9cffd14
Dao, thanks, that's some excellent info. Joel also tried to run the test while reversing the (two) subtests order, and got the same kind of regression values per subtest, however, he's not 100% sure that his procedure is correct, so he wants to redo this experiment.
(In reply to Avi Halachmi (:avih) from comment #3)
> It only regresses one of the subtests by ~45%, while the other subtest
> (there are two overall) is completely unaffected, neither in percentage nor
> when looking at the absolute diff in ms.

This is really unexpected for a browser chrome change.

> But maybe, and it's a pure guess, it's related to internal SVG interactions,
> since svg is used (maybe?) both at the location bar and at the content.

I can't think of any code that would have problems like this, but I guess it's plausible.

Can we get before/after profiles for just this subtest to see if anything stands out?

Given that chrome changes have caused variances in this test in the past, it seems plausible that something in this page is causing the chrome to invalidate and repaint, but I don't have any ideas as to how that is possible.
Flags: needinfo?(matt.woodrow)
this seems to be on aurora and beta (not sure how it is on beta). I would like to figure out 2 things: 1) why this appears to affect mozilla-beta 2) can we fix this on trunk/aurora?
At this point, we're shipping this. Is it useful to keep this open? Joel?
Flags: needinfo?(jmaher)
there is no point in keeping this open. I would prefer to mark this as resolved:wontfix.
Flags: needinfo?(jmaher)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX