Fix sessionrestore_many_windows Talos test to not ignore SSTabRestored events (was showing ~80% improvement on Windows shippable with Fission enabled)
Categories
(Firefox :: Session Restore, defect, P2)
Tracking
()
| Fission Milestone | M7a |
People
(Reporter: mconley, Assigned: jesup)
References
(Blocks 3 open bugs)
Details
(Whiteboard: fission-perf)
| Reporter | ||
Updated•2 years ago
|
| Reporter | ||
Comment 1•2 years ago
|
||
The good news is that I can reproduce this result locally. Starting to investigate now...
| Reporter | ||
Comment 2•2 years ago
|
||
Okay, I've figured out the problem.
The way that the SessionRestore Talos tests work is by restoring a session, and then listening for SSTabRestored events firing in each window that gets restored. 10 seconds after the last SSTabRestored event is observed, the session is considered restored, and we record the delta between process start and the last SSTabRestored event time.
The problem is here:
Wayyyyy back in bug 1261657, I made it so that StartupPerformance ignores SSTabRestored events caused by remoteness flips. See that bug for the reason why.
With Fission enabled, for each tab that's getting restored automatically, we're doing a remoteness flip because the initial remoteType of the initial tab ("web") does not match what is being loaded in the test ("webIsolated=http://localhost"). So each of the restored tabs does a remoteness flip.
And that means that we don't count any of the SSTabRestored events. Because we see no SSTabRestored events, the delta that we end up recording is between process start and the sessionstore-restoring-on-startup, so it's no wonder that Fission is reporting a better score here - none of the tab restorations are being counted!
| Reporter | ||
Comment 3•2 years ago
|
||
Hey pbone, from what I can tell from the Fission meeting notes, you're working on moving us to the new process flipping mechanism, and (blessedly!) removing that responsibility from front-end / SessionStore. Do you have any WIP patches that I can look at, or planning documents or notes or something I can look at to see how that factors in with this test?
Comment 4•2 years ago
|
||
I have some patches that are still in the "embarrassing" range of WIPness. It's possible I can tidy them up and show you tomorrow. The plan is to remove the onMayChangeProcess stuff from SessionStore.jsm and move it into nsDocumentChannelParent.cpp. Moving the majority of this code into C++ where we can debug it a bit easier. It was originally in SessionStore because of some historical reasons that I was probably told but don't remember. I'm not sure how this could affect the SSTabRestored notification (it'll probably still be broken when my patches land).
If you think you've got a quick-ish fix to the session store code then land it now, if it conflicts with mine I'll figure it out. My change won't land until at least the week after next since I'm going on PTO next week.
Comment 5•2 years ago
|
||
We'll want to fix this for Fission milestone M6 (enable Fission in Nightly).
| Reporter | ||
Comment 7•2 years ago
|
||
Having spoken with pbone, mattwoodrow and kmag, there's still quite a bit of work remaining to get process switching out of the wheelhouse of SessionStore.
I don't think it makes sense to try to fix this until that work is done.
Updated•1 year ago
|
Updated•10 months ago
|
| Reporter | ||
Comment 9•6 months ago
|
||
Started looking at this again. It looks like with Fission enabled, we can probably stop ignoring the isRemotenessUpdate SSTabRestored events, which puts us back in the ballpark again. There does, however, still appear to be a slight regression - we're updating remoteness with Fission enabled from a "web" type process to a "webIsolated" process, which is eating up some time and causing a regression.
| Reporter | ||
Comment 10•6 months ago
|
||
Spoke with nika about this. We need to wait until SessionStore no longer needs to manually do the remoteness flipping of browsers, which involves making SHIP working outside of Fission.
Comment 11•6 months ago
|
||
That won't be complete in time for M6c so pushing this out to M7. It can be reviewed and re-triaged again, if needed.
Comment 12•4 months ago
|
||
Process switching is largely out of the wheelhouse of SessionStore now (at least when SHIP is enabled, which it always is with Fission) - could we perhaps keep the old logic, but only for the non-ship case, and start getting performance results here?
Updated•4 months ago
|
Updated•3 months ago
|
| Reporter | ||
Comment 13•3 months ago
|
||
could we perhaps keep the old logic, but only for the non-ship case, and start getting performance results here?
Yes, this sounds like a reasonable approach - we should special-case the logic in the test for the Fission-on case to start getting measurements here.
Comment 14•3 months ago
|
||
I took a quick look at the profile which is being restored (from https://searchfox.org/mozilla-central/source/testing/talos/talos/startup_test/sessionrestore/profile/sessionstore.js, though I imagine the actual restore is from sessionstore.jsonlz4 in the same directory), and noticed that the URLs appear to all be of the form http://localhost:8080/?httpsurl=www.whatever.com/whatever. This means that the test won't use multiple content processes under Fission (as they'll all be served under the same site http://localhost), and that we'll probably be measuring inaccurate data there.
If we can get away with it, we should probably also fix the profiles which we're loading to actually use the correct domain names through a proxy so that we get a more accurate process allocation structure.
Comment 15•3 months ago
|
||
Mike said he won't have time to work on this for Fission M7, because of Proton work assigned to him.
Updated•3 months ago
|
Comment 16•3 months ago
|
||
Not a M7 beta blocker but we'd like this test re-enabled and working correctly as soon as possible so pushing this out to M7a.
Updated•3 months ago
|
Comment 17•2 months ago
|
||
Randell will start looking into fixing this.
Updated•27 days ago
|
Description
•