13.02 - 132.42% tabswitch / tp5n main_startup_fileio (linux64-shippable, windows10-64-shippable) regression on push 45bbeabb0c89cf73d644b59898cb207e4462f147 (Mon October 26 2020)
Categories
(Firefox :: Bookmarks & History, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox81 | --- | unaffected |
firefox82 | --- | unaffected |
firefox83 | --- | unaffected |
firefox84 | + | fixed |
People
(Reporter: alexandrui, Assigned: Gijs)
References
(Regression)
Details
(4 keywords)
Attachments
(1 file)
Perfherder has detected a talos performance regression from push 45bbeabb0c89cf73d644b59898cb207e4462f147. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
132% | tabswitch | linux64-shippable | e10s stylo | 8.23 -> 19.14 | |
15% | tp5n | main_startup_fileio | windows10-64-shippable | e10s stylo | 727,109.17 -> 836,677.00 |
13% | tabswitch | windows10-64-shippable | e10s stylo | 12.00 -> 13.56 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
Reporter | ||
Updated•4 years ago
|
Comment 1•4 years ago
|
||
Set release status flags based on info from the regressing bug 727668
Updated•4 years ago
|
Assignee | ||
Comment 3•4 years ago
|
||
The tabswitch thing I understand and I might have an easy fix. Maybe. Depends a little bit on precisely what the test does, and I haven't checked yet.
(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #0)
| 15% | tp5n | main_startup_fileio | windows10-64-shippable | e10s stylo | 727,109.17 -> 836,677.00 |
This doesn't make a lot of sense, though. I've done a bunch of these bugs now, and I always struggle finding the exact list of IO that has changed. Has that situation been improved yet? Like, is there a list somewhere we can diff to see what the additional IO is ? What files are we reading? (is it reading? Or writing?)
Assignee | ||
Comment 4•4 years ago
|
||
Based on https://treeherder.mozilla.org/jobs?repo=try&revision=ab9d52bd1006650697040f95863233e3390e1575 , https://treeherder.mozilla.org/jobs?repo=try&revision=a87fe49e3b8cfd3cf26fb3a1758a39d41852d316 and https://treeherder.mozilla.org/jobs?repo=try&revision=5bd6db1ef27c1e3cf0887885beaa6b8576fc4429 , looks to me like the tabswitch
issue is https://searchfox.org/mozilla-central/rev/7b07725fe9dc5d59a525990fc1dba78bc8b82af1/browser/base/content/browser.js#5395-5403 .
I'll look at creating a patch to optimize that code.
Assignee | ||
Comment 5•4 years ago
|
||
It would help if I could run talos locally on Windows. :-\
(filed bug 1674736)
Assignee | ||
Comment 6•4 years ago
|
||
So as always, it pays to check your assumptions... yesterday I looked into this more and figured the slow part would be checking the count of bookmarks in the toolbar - the test flicks between a blank page on which we show the toolbar, to a webpage where we do not.
So I tried commenting out the checks for the number of bookmarks, and it made sod all difference.
From a profile, it would seem that actually, the bigger cost is creating and destroying the places view when we show/hide the toolbar, and calling inferFromText
every time we show/hide the toolbar (tripped from the toolbarvisibilitychange
event). For the bookmarks toolbar, we could probably just adopt the data from the navbar, which doesn't show/hide.
Reporter | ||
Comment 7•4 years ago
|
||
Sorry for replying so late. I was in PTO yesterday.
I did some retriggers to all 3 alerts items in the summary, just to confirm the regression. Only tabswitch windows10-64-shippable but I wouldn't rely on that. I will look into it more thorouly and see what I can find. Indeed, those talos tests are sometimes problematic.
Reporter | ||
Updated•4 years ago
|
Assignee | ||
Comment 8•4 years ago
•
|
||
(In reply to :Gijs (he/him) from comment #6)
So as always, it pays to check your assumptions... yesterday I looked into this more and figured the slow part would be checking the count of bookmarks in the toolbar - the test flicks between a blank page on which we show the toolbar, to a webpage where we do not.
So I tried commenting out the checks for the number of bookmarks, and it made sod all difference.
From a profile, it would seem that actually, the bigger cost is creating and destroying the places view when we show/hide the toolbar, and calling
inferFromText
every time we show/hide the toolbar (tripped from thetoolbarvisibilitychange
event). For the bookmarks toolbar, we could probably just adopt the data from the navbar, which doesn't show/hide.
Unfortunately, despite what the profile says, this didn't seem to help much, so it's back to bisecting the change to figure out what is broken. :-\
( https://treeherder.mozilla.org/jobs?repo=try&revision=71c64fc49f79850164e733cef7fd9e54e83e8dcd for reference; I might still end up submitting that because it's a net reduction of work, and well, I've written it now...)
Assignee | ||
Comment 9•4 years ago
|
||
I talked to Mike Conley, and conversation suggests that because the tabswitch test is meant to measure the overhead of waiting for layers etc. of painting different tabs' content, it's fair game to disable the pref for this test. The test as-is continuously swaps between the new tab page and a page with a website, so we're showing and hiding the toolbar on every tab switch, which isn't particularly realistic anyway - I'd expect most tabswitches to happen between pages that have the toolbar disabled anyway.
That said, I'm still kind of curious why it impacts so much, and I'll see if I can get at least slightly more info on that and/or improve the situation a bit...
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 10•4 years ago
|
||
(all that said, still no idea what the startup file IO thing is about)
Reporter | ||
Comment 11•4 years ago
|
||
Hi Gijs. Startup file IO tracks the IO operations made by browser at startup. If this is 0, then no IO is made, which is a problem. Still, ideally the IO needs to be as low as possible.
Judging after the title, your patch might "force" some IO from the file on the disk that contains the bookmarks (assuming that the browser saves the bookmarks to a file on the disk). In the new tab page, I guess the browser combines the most used sides with the bookmarks, probably interogates the bookmarks depending on most used websites.
I'll re-read your comments and let you know if I find anything that might be related.
Assignee | ||
Comment 12•4 years ago
|
||
(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #11)
Hi Gijs. Startup file IO tracks the IO operations made by browser at startup. If this is 0, then no IO is made, which is a problem. Still, ideally the IO needs to be as low as possible.
This part is completely clear to me.
Judging after the title, your patch might "force" some IO from the file on the disk that contains the bookmarks (assuming that the browser saves the bookmarks to a file on the disk). In the new tab page, I guess the browser combines the most used sides with the bookmarks, probably interogates the bookmarks depending on most used websites.
No. We haven't changed anything relating to the new tab page itself. We only changed whether we show the bookmarks toolbar for the default homepages. Anyway, my point is, I want to get away from "guess" and towards "know", in terms of which files were accessed and how much IO we did in them. Last I checked, it was hard to figure out what the difference in IO was.
Having had a bit more practice, I now know to look for mainthread_readbytes
in the log - though tbh, the test outputs machine-readable versions of that data, and it'd be nice if our tooling exposed this and showed clearly where the IO values differed between the two tasks... now I have to do a kind of manual diff (also because the profile path directories are different between different tasks).
It looks like there's more places.sqlite
IO, presumably from the bookmarks toolbar being loaded and/or checking if it has child nodes.
Reporter | ||
Comment 13•4 years ago
•
|
||
(In reply to :Gijs (he/him) from comment #9)
I talked to Mike Conley, and conversation suggests that because the tabswitch test is meant to measure the overhead of waiting for layers etc. of painting different tabs' content, it's fair game to disable the pref for this test. The test as-is continuously swaps between the new tab page and a page with a website, so we're showing and hiding the toolbar on every tab switch, which isn't particularly realistic anyway - I'd expect most tabswitches to happen between pages that have the toolbar disabled anyway.
I understand. Mike Conley's suggestion sounds reasonable. Talos tests get more outdated as the time passes by and I don't know about a team that's working actively on improving them. Seems like this investigation is a dead-end and is consuming your time. Let's leave this open, I'll needinfo one of the team's members that own this tests and see what's their suggestion.
Reporter | ||
Comment 14•4 years ago
|
||
Looks like Mike is actually the person I was looking for. Let's disable the pref for this test and close the bug as WONTFIX.
Assignee | ||
Comment 15•4 years ago
|
||
(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #14)
Looks like Mike is actually the person I was looking for. Let's disable the pref for this test and close the bug as WONTFIX.
I mean, that works for tabswitch, but doesn't address the file io thing, which I'm still finding a bit worrying.
Come to that, the tabswitch thing is also worrying - even if we don't mean to test for frontend tab switch overhead in this test, the fact that we added some is unfortunate.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 16•4 years ago
|
||
Recap: this bug is about 2 things.
- tabswitch. To fix this, we're going to disable the change from bug 727668 for the test. Per discussions with Mike, browser frontend overhead isn't what this test is trying to measure. This feature is already going to be disabled for general release on 84, so we're OK with taking some time to figure out the broader perf work - that's bug 1675809.
- mainthread file IO increases for places.sqlite . After conversations with Marco Bonardo and several other folks, this reflects the added cost of starting with the bookmarks toolbar enabled. Those costs aren't new with 727668 - users who enable the bookmarks toolbar everywhere already pay those costs. But we're potentially increasing the number of people who are exposed to these costs. Genuinely fixing this would mean making all our places access async, which is probably a several-quarters-long project and definitely out of scope for 84. However, we have an idea that delaying the initial load of the bookmarks toolbar data until after startup (but making sure the toolbar takes up its full size) will take this out of the startup path without having to fix all of places - while making it easier to adopt an async approach to loading that and other places views in the future. We'll be investigating that in bug 1667237, where we hope to have a resolution for 84, though it might slip to 85: there too, we feel it doesn't need to block what's happening with 727668 - the feature is going to be disabled for general release on 84, and we don't expect that the increase in IO is going to impact results from the experiment.
So in this bug, I'm going to disable the feature for tabswitch and verify that gets the numbers back down. I think we'll be accepting the mainthread IO regression, though once 84 hits beta we can verify that it doesn't occur there.
Assignee | ||
Comment 17•4 years ago
|
||
Comment 18•4 years ago
|
||
Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/78ac9b95e0a8 do not toggle the bookmarks toolbar in the middle of the tabswitch test, r=mconley,perftest-reviewers,kimberlythegeek
Comment 19•4 years ago
|
||
bugherder |
Assignee | ||
Comment 20•4 years ago
|
||
both show neatly that we've gone back to normal here for the tabswitch times.
Updated•3 years ago
|
Description
•