2.74 - 3.57% ts_paint_webext (windows10-64-shippable) regression on push df91fb44b2d4bd8cf55cb2a5360094feb2c6ca6b (Wed July 1 2020)
Categories
(Firefox :: Toolbars and Customization, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox-esr78 | --- | unaffected |
firefox78 | --- | unaffected |
firefox79 | --- | unaffected |
firefox80 | --- | fixed |
People
(Reporter: alexandrui, Assigned: emmamalysz)
References
(Regression)
Details
(4 keywords)
Attachments
(1 file)
Perfherder has detected a talos performance regression from push df91fb44b2d4bd8cf55cb2a5360094feb2c6ca6b. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
4% ts_paint_webext windows10-64-shippable opt e10s stylo 299.79 -> 310.50
3% ts_paint_webext windows10-64-shippable opt e10s stylo 301.42 -> 309.67
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
Reporter | ||
Updated•4 years ago
|
Comment 1•4 years ago
|
||
What caused the corresponding jump in the opposite direction a little while earlier on autoland? ( https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&selected=1926121,1175301106&series=autoland,1926121,1,1&timerange=1209600&zoom=1593437987455,1593479085063,284.0470203986355,319.7247710351851 )
Comment 2•4 years ago
|
||
Also, what is this test, what does it do, and how does it differ from ts_paint ? There's no documentation on https://wiki.mozilla.org/TestEngineering/Performance/Talos/Tests .
Comment 3•4 years ago
|
||
I triggered profiles but they appear to be no help; the profile runs just as often show the post-patch result being faster than the pre-patch result, and it's not clear where to look for the time range that the test measures. Averaging the actual results of all the profiles before and after the change, the difference is (if I've not made any mistake; the numbers aren't published so I had to dig them out of the log and average them myself) between 376.8ms and 374.95ms, so just under 2ms - but again, no idea where in the profile (they all run about 1 full second), and the noise on the individual measurements is between 10 and 20ms.
Florian, do you have any guidance here, given that you looked at the usefulness of these profiles recently?
Comment 4•4 years ago
|
||
(In reply to :Gijs (he/him) from comment #2)
Also, what is this test, what does it do, and how does it differ from ts_paint ? There's no documentation on https://wiki.mozilla.org/TestEngineering/Performance/Talos/Tests .
From looking at https://searchfox.org/mozilla-central/rev/31d8600b73dc85b4cdbabf45ac3f1a9c11700d8e/testing/talos/talos/test.py#170-171 my guess is that it's the same as ts_paint but with a dummy webextension (testing/talos/talos/webextensions/dummy/dummy.xpi) installed to check the impact of add-ons on startup. That dummy add-on has the id talos@mozilla.org.
Comment 5•4 years ago
|
||
(In reply to :Gijs (he/him) from comment #3)
I triggered profiles but they appear to be no help; [...] it's not clear where to look for the time range that the test measures. [...], no idea where in the profile (they all run about 1 full second)
I can answer that part of the question, I don't have good answers for the rest.
The test measures at https://searchfox.org/mozilla-central/rev/e11d919cdc8a909354eb2c3e19904d9229c55d88/testing/talos/talos/startup_test/tspaint_test.html#29, so the value is the time between process creation, and this point: https://searchfox.org/mozilla-central/rev/e11d919cdc8a909354eb2c3e19904d9229c55d88/view/nsViewManager.cpp#344
A close enough approximation in the profiles is looking for the first screenshot in the screenshot track, or the first "ForwardTransaction" marker.
Comment 6•4 years ago
|
||
Set release status flags based on info from the regressing bug 1648639
Updated•4 years ago
|
Comment 7•4 years ago
|
||
baseline: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4ad3b917903a91c024d2b5262cc675f10527b9dc
full patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=15564de3339d886c584e09f93f6aee183975caac
only change CUI.jsm: https://treeherder.mozilla.org/#/jobs?repo=try&revision=85a74d6b4193734ad0128cdcefd154176c08df8a
change CUI.jsm + panelUI.js: https://treeherder.mozilla.org/#/jobs?repo=try&revision=80c69b12bc9331e5a2d53f4b9f9442ff851c36a1
Reporter | ||
Comment 8•4 years ago
•
|
||
Seems like in the "full patch" push you forgot the patch and pushed the same code as in the "baseline". Anyways, seems like "change CUI.jsm + panelUI.js" introduced the regression.
Comment 9•4 years ago
|
||
(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #8)
Seems like in the "full patch" push you forgot the patch and pushed the same code as in the "baseline".
I did not? The two patches have different parents, compare:
https://hg.mozilla.org/try/rev/4ad3b917903a91c024d2b5262cc675f10527b9dc
and
https://hg.mozilla.org/try/rev/15564de3339d886c584e09f93f6aee183975caac
the latter has parent https://hg.mozilla.org/try/rev/df91fb44b2d4bd8cf55cb2a5360094feb2c6ca6b, ie https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=df91fb44b2d4bd8cf55cb2a5360094feb2c6ca6b quoted in comment #0 as being the regressing patch.
So I don't understand why the trypush doesn't show a regression at all.
Anyways, seems like "change CUI.jsm + panelUI.js" introduced the regression.
This link compares a revision with itself, but also even https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4ad3b917903a91c024d2b5262cc675f10527b9dc&newProject=try&newRevision=80c69b12bc9331e5a2d53f4b9f9442ff851c36a1&framework=1&pageTitle=CUI%2BPanelUI does not show a regression.
Any idea what's going wrong here? Even the baseline trypush has the "regressed" numbers from comment #0.
Comment 10•4 years ago
|
||
Oh, I know what the issue is - the trypushes all have qr jobs, and the regression is in the non-qr jobs. I'll add some retriggers...
Reporter | ||
Comment 11•4 years ago
•
|
||
(In reply to :Gijs (he/him) from comment #9)
I did not? The two patches have different parents, compare:
https://hg.mozilla.org/try/rev/4ad3b917903a91c024d2b5262cc675f10527b9dc
and
https://hg.mozilla.org/try/rev/15564de3339d886c584e09f93f6aee183975caac
Sorry, my bad!
Reporter | ||
Comment 12•4 years ago
|
||
(In reply to :Gijs (he/him) from comment #10)
Oh, I know what the issue is - the trypushes all have qr jobs, and the regression is in the non-qr jobs. I'll add some retriggers...
Looking forward for the results.
Comment 13•4 years ago
|
||
OK, so https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4ad3b917903a91c024d2b5262cc675f10527b9dc&newProject=try&newRevision=85a74d6b4193734ad0128cdcefd154176c08df8a&framework=1&pageTitle=CUI shows no significant change but if you look closely at the base revision, you can see all the replicates for the base revision and ts_paint_webext on windows10-64-shippable (not the qr one) are just under 300ms - except one, which is 359ms, and which skews the average to be the same as the post-patch state. More retries would probably fix the perfherder UI to show this as a regression, but as these numbers match the ones in comment #0 let's not waste machine time on this.
So it appears my suspicion was correct and the diff in CustomizableUI.jsm was responsible for this, ie https://hg.mozilla.org/try/rev/d2efc64d0d0b2fd088b38bba76790b196c971a22 .
Next to work out is whether this is due to just loading PMV or due to the implementation of getViewNode
or something else. Push that just loads PMV: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c11fcbef8fea814f7e719cb430cff98a97082e8
Comment 14•4 years ago
|
||
(In reply to :Gijs (he/him) from comment #13)
Next to work out is whether this is due to just loading PMV or due to the implementation of
getViewNode
or something else. Push that just loads PMV: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c11fcbef8fea814f7e719cb430cff98a97082e8
These numbers are the regressed numbers - but this is an artifact build, so I don't fully trust them. I meant to push without artifact builds, https://treeherder.mozilla.org/#/jobs?repo=try&revision=a648653d9e3b7947230d74a6a9ad900b604db522 .
These numbers are also bad! So I guess the issue is loading PMV at all...
Comment 15•4 years ago
|
||
Passing this off to Emma - she figured out that we can probably just remove some of the code in CustomizableUI.jsm as PanelMultiView takes care of the same things if/when the view is eventually opened.
Assignee | ||
Comment 16•4 years ago
|
||
Going to get a patch up soon that adds the node to the PanelUI-subView
class, but commenting out https://searchfox.org/mozilla-central/source/browser/components/customizableui/CustomizableUI.jsm#1760-1778 seems promising: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2468aed51b7366fec3e8311d32de5dfb0fe1ec26&selectedTaskRun=RfBZ37NGTbOrYX-dsxOqlw.0
Assignee | ||
Comment 17•4 years ago
|
||
Comment 18•4 years ago
|
||
Updated•4 years ago
|
Comment 19•4 years ago
|
||
bugherder |
Comment 20•4 years ago
|
||
So I'm not a perf sheriff but this doesn't look fixed to me, from looking at https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&series=autoland,1926121,1,1&timerange=604800 . Potentially because other stuff landed since the regressing patch...
Comment 21•4 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a94d5e87f67171b53cb7f65de9d9ddd85f812bfb&selectedTaskRun=DNu9G4WCSou2_nViOX8LNA.0 shows that the patch worked at the time. So something else also tripped us into the "bad" case here between when this regressed and when we fixed it... now the question is... what?
Comment 22•4 years ago
|
||
Did 4 or 5 pushes to narrow down the week-long window a bit.
good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c9475703f658c2b8c620fd11b70f8d024b00f522&selectedTaskRun=dQSSfweKSy22Zb6awGf-sw.0 ( https://hg.mozilla.org/mozilla-central/rev/b48777a21aabc35311956a1a1395a5dbba8c1a77 )
bad: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9f0684f5a9c59b05e370a9fffcd568c0d607bf36 ( https://hg.mozilla.org/mozilla-central/rev/6087e976924f95018479c6f5881878c95b8bd8e2 )
Doing another load of trypushes to bisect that to a specific m-c push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=399dc39e734dd3bd00b4a3b3deb5ab1cb7071703 (push https://hg.mozilla.org/mozilla-central/rev/2da2c9f85102b92961435e5434f469fef0ebf08c )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=80d036f9f118f0a04cbe53ccd145e65fb29837a4 (push https://hg.mozilla.org/mozilla-central/rev/c050478f22e6b4930cdcc54f8caabf4da9d58abf )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c1f282718eeaba109ac4b0a16ec114c3e5aa5aa (push https://hg.mozilla.org/mozilla-central/rev/efa2336315eda0aaf78c2e94d6c9c98106ea136b )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb4d2a098d42c83815f5764b7378f61538d066d5 (push https://hg.mozilla.org/mozilla-central/rev/2ccff3de01bf2aa6c7fb1f17f9a47f367d2c05fb )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6f0a720ba05c7803c7737ab82bb29c1548e5896 (push https://hg.mozilla.org/mozilla-central/rev/ec01c146f756b74d18e4892b4fd3aecba00da93e )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2c00e145473fffe76ceddee95c28cdd9683b5fa5 (push https://hg.mozilla.org/mozilla-central/rev/75966ee1fe658b1767d7459256175c0662d14c25 )
Comment 23•4 years ago
•
|
||
(In reply to :Gijs (he/him) from comment #22)
Doing another load of trypushes to bisect that to a specific m-c push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=399dc39e734dd3bd00b4a3b3deb5ab1cb7071703 (push https://hg.mozilla.org/mozilla-central/rev/2da2c9f85102b92961435e5434f469fef0ebf08c )
good ^^
https://treeherder.mozilla.org/#/jobs?repo=try&revision=80d036f9f118f0a04cbe53ccd145e65fb29837a4 (push https://hg.mozilla.org/mozilla-central/rev/c050478f22e6b4930cdcc54f8caabf4da9d58abf )
bad ^^
All the WPR stuff is test-only, so it shouldn't be as bad as it looks...
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a83e593e6bc004436322e370b9c099d3848ee24a ( https://hg.mozilla.org/integration/autoland/rev/d0ff8bcce2867b19ac0ffe8f4a2b3ff7eb91bef8 )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d47e51c0c91857d4876d305e68e19667de9ae1f2 ( https://hg.mozilla.org/integration/autoland/rev/2b90c377a85ad6879d29ce2bae191e570992b2b4 )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=12f0a6e9cb4c0064a1628f0f3250a6debbaa02ff ( https://hg.mozilla.org/integration/autoland/rev/83710b53eef098a0b2a2184ab02e8bcd76a00ab9 )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=279e0f8ad6c2503f2fa10b0b9e694664472f20e1 ( https://hg.mozilla.org/integration/autoland/rev/297e0aa8233df4912e2f6bd8568f264a4a4813d4 )
Comment 24•4 years ago
|
||
I filed bug 1652273 for addressing the next regressor, though as I discussed there I'm not sure what the best approach is.
I think we're done as far as this bug is concerned, even though the numbers haven't recovered.
Updated•4 years ago
|
Description
•