Closed Bug 1325910 Opened 3 years ago Closed 3 years ago
.17 - 15 .68% tps (linux64, osx-10-10, windows7-32, windows8-64) regression on push c2f49fba3f4d316188afc39c5d85a7336c17b75d (Sun Dec 25 2016)
Talos has detected a Firefox performance regression from push c2f49fba3f4d316188afc39c5d85a7336c17b75d. As author of one of the patches included in that push, we need your help to address this regression. Regressions: 16% tps summary linux64 opt e10s 42.56 -> 49.23 15% tps summary linux64 pgo e10s 35.96 -> 41.48 11% tps summary windows8-64 opt e10s 38.11 -> 42.48 8% tps summary windows7-32 opt e10s 41.23 -> 44.63 8% tps summary osx-10-10 opt e10s 39.95 -> 43.22 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=4635 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
After doing some retriggers, this issue might be caused by this patch: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d83cea57d22f0c74afa56611e6ce9ec6b2b25a92&tochange=c2f49fba3f4d316188afc39c5d85a7336c17b75d Hi Timothy, as you are the patch author, can you take a look at this and determine what is the root cause? Thanks!
I pushed a guess to try to see.
Looks like my guess had no effect.
My current theory is that all the runnables used to async notify for each image are flooding the main thread, delaying the "tab has switched" message. Non-e10s probably uses a different method to determine when the tab switch has completed, so that might explain why this is e10s only.
Component: Untriaged → ImageLib
Product: Firefox → Core
Still investigating. It seems part of (but not all of) the regression is caused because we are able to draw an image that hasn't notified yet (because of bug 1325297), we then notify after the paint, and then we invalid and we have to paint the image again. The theory from comment 4 might be responsible for the rest of the regression, but I haven't verified that yet. Meanwhile I backed out the offending part of bug 1317562 because it's looking like the fix for this isn't going to be quick and easy.
One thing that might make us more immune to this sort of thing would be to provide the compositing timestamp with the MozAfterPaint event (bug 1264798). Currently, we have a number of tests that start a stopwatch, does a thing, and stops that stopwatch once MozAfterPaint is seen. However, MozAfterPaint might be handled an arbitrary number of ms after the paint / composite actually occurred, which makes us sensitive to things like many runnables in a queue. If we had the composition time in the MozAfterPaint event, we could use that instead, and we'd be measuring more of the right thing.
Fixed by backout. When we reland we should not re-regress this.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.