Talos has detected a Firefox performance regression from push c838d2546cadd65bf8d5579db20a268c8b6e4b87. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

100% glterrain summary linux64 pgo e10s 10.9 -> 21.75
 98% glterrain summary linux64 opt e10s 11.04 -> 21.89
  3% tp5o summary linux64 opt e10s 354.71 -> 364.99
  2% tsvgr_opacity summary linux64 opt e10s 431.72 -> 442.06

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=3656

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
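For anyone checking the numbers, the percentages in the alert are consistent with a plain relative change between the before and after values. The sketch below assumes that is all the alert reports; `percent_change` is a hypothetical helper name, not part of Talos or Perfherder, and the values are copied from the alert.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (test, platform, before, after), copied from the alert text.
alerts = [
    ("glterrain", "linux64 pgo e10s", 10.9, 21.75),
    ("glterrain", "linux64 opt e10s", 11.04, 21.89),
    ("tp5o", "linux64 opt e10s", 354.71, 364.99),
    ("tsvgr_opacity", "linux64 opt e10s", 431.72, 442.06),
]

for test, platform, before, after in alerts:
    # Round to whole percent, as the alert does.
    print(f"{test} {platform}: {percent_change(before, after):+.0f}%")
```

This prints +100%, +98%, +3% and +2%, matching the alert. Higher is worse for all four tests here, so a positive change is a regression.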
As a note, Talos has detected a Firefox performance improvement from push 510cf5f0eccabf5c96385a9a11d6f460f8afb227. Improvements: 48% glterrain summary linux64 pgo e10s 21.74 -> 11.34 48% glterrain summary linux64 opt e10s 21.82 -> 11.4 For up-to-date info, please refer to: https://treeherder.mozilla.org/perf.html#/alerts?id=3657 Hi Gian-Carlo, I am not sure whether these two pushes depend on each other. As the author of these patches, can you take a look and determine the root cause? Thanks!
Potentially related to bug 1308851.
Alison, is the talos regression OK now?
I don't think so. Bug 1308851 looks like it only fixed the glterrain regressions. Both tp5o and tsvgr_opacity still have the perf issue. Please refer to the perf graphs: [tp5o] https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bautoland,80984697abf1f1ff2b058e2d9f0b351fd9d12ad9,1,1%5D [tsvgr_opacity] https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bautoland,3ff68dab85d334ade039992e6d8cd0ebc05cbcf4,1,1%5D
Hmm, thinking about it, this looks like the regression from bug 1284912 is showing up in more tests: https://bugzilla.mozilla.org/show_bug.cgi?id=1284912#c3 Probably because the seccomp-bpf filter is a bit bigger, and there's some additional overhead on filesystem calls now.
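One rough way to test the "extra overhead on filesystem calls" theory in isolation is to time a cheap filesystem syscall in a tight loop and compare the per-call cost across configurations. This is only a sketch under assumptions: `mean_stat_ns` is a hypothetical helper, and run as-is it measures an unsandboxed Python process. It does not install Firefox's seccomp-bpf filter, so it is only meaningful if the same loop is run under each filter configuration being compared.

```python
import os
import time


def mean_stat_ns(path: str = ".", iterations: int = 100_000) -> float:
    """Average wall-clock nanoseconds per os.stat() call on `path`."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        os.stat(path)  # one stat-family syscall per iteration
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    print(f"stat(): {mean_stat_ns():.0f} ns per call")
```

The figure includes Python interpreter overhead, so only the difference between configurations is interesting, not the absolute value.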
I'm editing the title to make it clear the outstanding regression is 3%, not 100%.
:gcp, it has been a couple of weeks without an update in this bug, is there any update or plans?
(In reply to Joel Maher ( :jmaher) from comment #7) > :gcp, it has been a couple of weeks without an update in this bug, is there > any update or plans? I'm investigating in https://bugzilla.mozilla.org/show_bug.cgi?id=1284912#c27.
Created attachment 8808748 seccomperf.txt I tried to dig in here, working from: https://bugzilla.mozilla.org/show_bug.cgi?id=1284912#c27 Which showed: tp5o indiatimes.com opt e10s graph 363.82 ± 0.95% < 393.27 ± 3.08% 8.10% 7.76 (high) 11 / 11 i.e. an extremely significant 8% performance regression when file brokering is enabled. I can't reproduce this at all: - Logging in various ways shows no file IO at all in content, so there is nothing for the broker to do. - I ran the tp5o test, with the manifest edited to only cover indiatimes.com. The performance is identical with seccomp, seccomp+brokering, or everything disabled (logs attached). Given the 8% figure above, a difference of that size should be easily visible if it were present.
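To make the "should be easily visible" point concrete, here is a crude check of the comparison row quoted from bug 1284912: it turns each mean ± percentage into a range and tests whether the ranges overlap. This is a sketch, not Perfherder's method (the 7.76 in the row is Perfherder's own confidence value, which this does not reproduce). It also assumes the first value is the run without brokering and the second the run with it; `spread` is a hypothetical helper.

```python
def spread(mean: float, noise_pct: float) -> tuple:
    """Return (low, high) for a mean with a ± noise given in percent."""
    delta = mean * noise_pct / 100.0
    return mean - delta, mean + delta


# Values from the quoted tp5o indiatimes.com row.
base = spread(363.82, 0.95)  # assumed: file brokering disabled
new = spread(393.27, 3.08)   # assumed: file brokering enabled

change_pct = (393.27 - 363.82) / 363.82 * 100.0
# Two ranges overlap when each one starts before the other ends.
overlap = base[0] <= new[1] and new[0] <= base[1]

print(f"change: {change_pct:.2f}%")
print(f"base range: {base[0]:.2f} .. {base[1]:.2f}")
print(f"new range:  {new[0]:.2f} .. {new[1]:.2f}")
print(f"ranges overlap: {overlap}")
```

The ranges do not overlap, so a regression of this size on try would sit well outside the reported noise. A local run where all three configurations come out identical is therefore a real discrepancy rather than measurement jitter.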
which platform are you running on? By looking at the attachment, I assume linux? Sometimes we find regressions that only show up on our hardware vs locally. Here is the hardware we currently run on: https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation
(In reply to Joel Maher ( :jmaher) from comment #10) > which platform are you running on? By looking at the attachment, I assume > linux? > > Sometimes we find regressions that only show up on our hardware vs locally. > Here is the hardware we currently run on: > https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation I doubt this is the case. I have an SSD and a slightly newer CPU, but those would do little to explain the difference. The GPU can be a factor, as we saw in the gl tests, but I actually also have an nvidia card in here (using ubuntu/proprietary drivers). The most telling thing for me is that I can find no evidence of the file broker activating at all. I added code to specifically dump internal things to /tmp and it's empty. The content process doesn't seem to be doing any file IO at all. In light of this, having identical performance with file brokering enabled or disabled seems like the expected result. But why is try Talos different?
I really don't know. Right now we just have the tsvgr_opacity and tp5o regressions remaining, both on linux64 e10s. I have seen tsvgr_opacity have issues in the past with odd changes. Have you looked at the network? I know we use localhost, but that still requires networking code.
Alison, Joel, do you think this is still an issue? I'm marking this wontfix for 53 as there hasn't really been any action on the bug and we're about to ship 53.
Odd: the glterrain issue seems to have been fixed a few days later and tsvgr_opacity looks fixed, but the tp5o issue remains. So many changes to the tests happen over time that if we don't fix a regression within a few weeks, it realistically doesn't get fixed.