Closed Bug 1650404 Opened 1 month ago Closed 29 days ago

2.74 - 3.57% ts_paint_webext (windows10-64-shippable) regression on push df91fb44b2d4bd8cf55cb2a5360094feb2c6ca6b (Wed July 1 2020)

Categories

(Firefox :: Toolbars and Customization, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 80
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox78 --- unaffected
firefox79 --- unaffected
firefox80 --- fixed

People

(Reporter: alexandrui, Assigned: emalysz)

References

(Blocks 1 open bug, Regression)

Details

(4 keywords)

Attachments

(1 file)

Perfherder has detected a talos performance regression from push df91fb44b2d4bd8cf55cb2a5360094feb2c6ca6b. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

4% ts_paint_webext windows10-64-shippable opt e10s stylo 299.79 -> 310.50
3% ts_paint_webext windows10-64-shippable opt e10s stylo 301.42 -> 309.67

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(emalysz)
Component: Performance → Toolbars and Customization
Product: Testing → Firefox

Also, what is this test, what does it do, and how does it differ from ts_paint ? There's no documentation on https://wiki.mozilla.org/TestEngineering/Performance/Talos/Tests .

I triggered profiles but they appear to be no help; the profile runs just as often show the post-patch result being faster than the pre-patch result, and it's not clear where to look for the time range that the test measures. Averaging the actual results of all the profiles before and after the change, the difference is (if I've not made any mistake; the numbers aren't published so I had to dig them out of the log and average them myself) between 376.8ms and 374.95ms, so just under 2ms - but again, no idea where in the profile (they all run about 1 full second), and the noise on the individual measurements is between 10 and 20ms.

Florian, do you have any guidance here, given that you looked at the usefulness of these profiles recently?

Pre-patch: https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FD8Eb7NpyQ7iSZYovdcUIQQ%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_ts_paint_webext.zip

Post-patch: https://profiler.firefox.com/from-url/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FHFCPgKJGSJ-mz6WLSL4mLw%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2Fprofile_ts_paint_webext.zip

Flags: needinfo?(florian)

(In reply to :Gijs (he/him) from comment #2)

Also, what is this test, what does it do, and how does it differ from ts_paint ? There's no documentation on https://wiki.mozilla.org/TestEngineering/Performance/Talos/Tests .

From looking at https://searchfox.org/mozilla-central/rev/31d8600b73dc85b4cdbabf45ac3f1a9c11700d8e/testing/talos/talos/test.py#170-171 my guess is that it's the same as ts_paint but with a dummy webextension (testing/talos/talos/webextensions/dummy/dummy.xpi) installed to check the impact of add-ons on startup. That dummy add-on has the id talos@mozilla.org.

(In reply to :Gijs (he/him) from comment #3)

I triggered profiles but they appear to be no help; [...] it's not clear where to look for the time range that the test measures. [...], no idea where in the profile (they all run about 1 full second)

I can answer that part of the question, I don't have good answers for the rest.

The test measures at https://searchfox.org/mozilla-central/rev/e11d919cdc8a909354eb2c3e19904d9229c55d88/testing/talos/talos/startup_test/tspaint_test.html#29, so the value is the time between process creation, and this point: https://searchfox.org/mozilla-central/rev/e11d919cdc8a909354eb2c3e19904d9229c55d88/view/nsViewManager.cpp#344

A close enough approximation in the profiles is looking for the first screenshot in the screenshot track, or the first "ForwardTransaction" marker.

Flags: needinfo?(florian)
Flags: needinfo?(gijskruitbosch+bugs)

Seems like in the "full patch" push you forgot the patch and pushed the same code as in the "baseline". Anyways, seems like "change CUI.jsm + panelUI.js" introduced the regression.

Flags: needinfo?(aionescu)

(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #8)

Seems like in the "full patch" push you forgot the patch and pushed the same code as in the "baseline".

I did not? The two patches have different parents, compare:

https://hg.mozilla.org/try/rev/4ad3b917903a91c024d2b5262cc675f10527b9dc

and

https://hg.mozilla.org/try/rev/15564de3339d886c584e09f93f6aee183975caac

the latter has parent https://hg.mozilla.org/try/rev/df91fb44b2d4bd8cf55cb2a5360094feb2c6ca6b, ie https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=df91fb44b2d4bd8cf55cb2a5360094feb2c6ca6b quoted in comment #0 as being the regressing patch.

So I don't understand why the trypush doesn't show a regression at all.

Anyways, seems like "change CUI.jsm + panelUI.js" introduced the regression.

This link compares a revision with itself, but also even https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4ad3b917903a91c024d2b5262cc675f10527b9dc&newProject=try&newRevision=80c69b12bc9331e5a2d53f4b9f9442ff851c36a1&framework=1&pageTitle=CUI%2BPanelUI does not show a regression.

Any idea what's going wrong here? Even the baseline trypush has the "regressed" numbers from comment #0.

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(aionescu)

Oh, I know what the issue is - the trypushes all have qr jobs, and the regression is in the non-qr jobs. I'll add some retriggers...

(In reply to :Gijs (he/him) from comment #9)

I did not? The two patches have different parents, compare:

https://hg.mozilla.org/try/rev/4ad3b917903a91c024d2b5262cc675f10527b9dc

and

https://hg.mozilla.org/try/rev/15564de3339d886c584e09f93f6aee183975caac

Sorry, my bad!

Flags: needinfo?(aionescu)

(In reply to :Gijs (he/him) from comment #10)

Oh, I know what the issue is - the trypushes all have qr jobs, and the regression is in the non-qr jobs. I'll add some retriggers...

Looking forward for the results.

OK, so https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=4ad3b917903a91c024d2b5262cc675f10527b9dc&newProject=try&newRevision=85a74d6b4193734ad0128cdcefd154176c08df8a&framework=1&pageTitle=CUI shows no significant change but if you look closely at the base revision, you can see all the replicates for the base revision and ts_paint_webext on windows10-64-shippable (not the qr one) are just under 300ms - except one, which is 359ms, and which skews the average to be the same as the post-patch state. More retries would probably fix the perfherder UI to show this as a regression, but as these numbers match the ones in comment #0 let's not waste machine time on this.

So it appears my suspicion was correct and the diff in CustomizableUI.jsm was responsible for this, ie https://hg.mozilla.org/try/rev/d2efc64d0d0b2fd088b38bba76790b196c971a22 .

Next to work out is whether this is due to just loading PMV or due to the implementation of getViewNode or something else. Push that just loads PMV: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c11fcbef8fea814f7e719cb430cff98a97082e8

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to :Gijs (he/him) from comment #13)

Next to work out is whether this is due to just loading PMV or due to the implementation of getViewNode or something else. Push that just loads PMV: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c11fcbef8fea814f7e719cb430cff98a97082e8

These numbers are the regressed numbers - but this is an artifact build, so I don't fully trust them. I meant to push without artifact builds, https://treeherder.mozilla.org/#/jobs?repo=try&revision=a648653d9e3b7947230d74a6a9ad900b604db522 .

These numbers are also bad! So I guess the issue is loading PMV at all...

Flags: needinfo?(gijskruitbosch+bugs)

Passing this off to Emma - she figured out that we can probably just remove some of the code in CustomizableUI.jsm as PanelMultiView takes care of the same things if/when the view is eventually opened.

Assignee: nobody → emalysz
Status: NEW → ASSIGNED
Flags: needinfo?(emalysz)
Pushed by rmaries@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e7bb5cb77c6c
rely on panelUI to add "PanelUI-subView" attribute r=Gijs
Priority: -- → P1
Status: ASSIGNED → RESOLVED
Closed: 29 days ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 80

So I'm not a perf sheriff but this doesn't look fixed to me, from looking at https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&series=autoland,1926121,1,1&timerange=604800 . Potentially because other stuff landed since the regressing patch...

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a94d5e87f67171b53cb7f65de9d9ddd85f812bfb&selectedTaskRun=DNu9G4WCSou2_nViOX8LNA.0 shows that the patch worked at the time. So something else also tripped us into the "bad" case here between when this regressed and when we fixed it... now the question is... what?

Did 4 or 5 pushes to narrow down the week-long window a bit.

good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c9475703f658c2b8c620fd11b70f8d024b00f522&selectedTaskRun=dQSSfweKSy22Zb6awGf-sw.0 ( https://hg.mozilla.org/mozilla-central/rev/b48777a21aabc35311956a1a1395a5dbba8c1a77 )

bad: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9f0684f5a9c59b05e370a9fffcd568c0d607bf36 ( https://hg.mozilla.org/mozilla-central/rev/6087e976924f95018479c6f5881878c95b8bd8e2 )

Doing another load of trypushes to bisect that to a specific m-c push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=399dc39e734dd3bd00b4a3b3deb5ab1cb7071703 (push https://hg.mozilla.org/mozilla-central/rev/2da2c9f85102b92961435e5434f469fef0ebf08c )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=80d036f9f118f0a04cbe53ccd145e65fb29837a4 (push https://hg.mozilla.org/mozilla-central/rev/c050478f22e6b4930cdcc54f8caabf4da9d58abf )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c1f282718eeaba109ac4b0a16ec114c3e5aa5aa (push https://hg.mozilla.org/mozilla-central/rev/efa2336315eda0aaf78c2e94d6c9c98106ea136b )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb4d2a098d42c83815f5764b7378f61538d066d5 (push https://hg.mozilla.org/mozilla-central/rev/2ccff3de01bf2aa6c7fb1f17f9a47f367d2c05fb )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6f0a720ba05c7803c7737ab82bb29c1548e5896 (push https://hg.mozilla.org/mozilla-central/rev/ec01c146f756b74d18e4892b4fd3aecba00da93e )
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2c00e145473fffe76ceddee95c28cdd9683b5fa5 (push https://hg.mozilla.org/mozilla-central/rev/75966ee1fe658b1767d7459256175c0662d14c25 )

(In reply to :Gijs (he/him) from comment #22)

Doing another load of trypushes to bisect that to a specific m-c push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=399dc39e734dd3bd00b4a3b3deb5ab1cb7071703 (push https://hg.mozilla.org/mozilla-central/rev/2da2c9f85102b92961435e5434f469fef0ebf08c )

good ^^

https://treeherder.mozilla.org/#/jobs?repo=try&revision=80d036f9f118f0a04cbe53ccd145e65fb29837a4 (push https://hg.mozilla.org/mozilla-central/rev/c050478f22e6b4930cdcc54f8caabf4da9d58abf )

bad ^^

pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2da2c9f85102b92961435e5434f469fef0ebf08c&tochange=c050478f22e6b4930cdcc54f8caabf4da9d58abf

All the WPR stuff is test-only, so it shouldn't be as bad as it looks...

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a83e593e6bc004436322e370b9c099d3848ee24a ( https://hg.mozilla.org/integration/autoland/rev/d0ff8bcce2867b19ac0ffe8f4a2b3ff7eb91bef8 )

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d47e51c0c91857d4876d305e68e19667de9ae1f2 ( https://hg.mozilla.org/integration/autoland/rev/2b90c377a85ad6879d29ce2bae191e570992b2b4 )

https://treeherder.mozilla.org/#/jobs?repo=try&revision=12f0a6e9cb4c0064a1628f0f3250a6debbaa02ff ( https://hg.mozilla.org/integration/autoland/rev/83710b53eef098a0b2a2184ab02e8bcd76a00ab9 )

https://treeherder.mozilla.org/#/jobs?repo=try&revision=279e0f8ad6c2503f2fa10b0b9e694664472f20e1 ( https://hg.mozilla.org/integration/autoland/rev/297e0aa8233df4912e2f6bd8568f264a4a4813d4 )

See Also: → 1652273

I filed bug 1652273 for addressing the next regressor, though as I discussed there I'm not sure what the best approach is.

I think we're done as far as this bug is concerned, even though the numbers haven't recovered.

You need to log in before you can comment on or make changes to this bug.