Open Bug 1588165 Opened 2 years ago Updated 9 days ago

Fix sessionrestore_many_windows Talos test to not ignore SSTabRestored events (was showing ~80% improvement on Windows shippable with Fission enabled)

Categories

(Firefox :: Session Restore, defect, P2)

defect

Tracking

()

ASSIGNED
Fission Milestone M7a

People

(Reporter: mconley, Assigned: jesup)

References

(Blocks 3 open bugs)

Details

(Whiteboard: fission-perf)

Assignee: nobody → mconley

The good news is that I can reproduce this result locally. Starting to investigate now...

Okay, I've figured out the problem.

The way that the SessionRestore Talos tests work is by restoring a session, and then listening for SSTabRestored events firing in each window that gets restored. 10 seconds after the last SSTabRestored event is observed, the session is considered restored, and we record the delta between process start and the last SSTabRestored event time.

The problem is here:

https://searchfox.org/mozilla-central/rev/7536d7f480a7f18c941a590a2d4c5119d9f52770/browser/components/sessionstore/StartupPerformance.jsm#225

Wayyyyy back in bug 1261657, I made it so that StartupPerformance ignores SSTabRestored events caused by remoteness flips. See that bug for the reason why.

With Fission enabled, for each tab that's getting restored automatically, we're doing a remoteness flip because the initial remoteType of the initial tab ("web") does not match what is being loaded in the test ("webIsolated=http://localhost"). So each of the restored tabs does a remoteness flip.

And that means that we don't count any of the SSTabRestored events. Because we see no SSTabRestored events, the delta that we end up recording is between process start and the sessionstore-restoring-on-startup, so it's no wonder that Fission is reporting a better score here - none of the tab restorations are being counted!

Hey pbone, from what I can tell from the Fission meeting notes, you're working on moving us to the new process flipping mechanism, and (blessedly!) removing that responsibility from front-end / SessionStore. Do you have any WIP patches that I can look at, or planning documents or notes or something I can look at to see how that factors in with this test?

Flags: needinfo?(pbone)

I have some patches that are still in the "embarrassing" range of WIPness. It's possible I can tidy them up and show you tomorrow. The plan is to remove the onMayChangeProcess stuff from SessionStore.jsm and move it into nsDocumentChannelParent.cpp. Moving the majority of this code into C++ where we can debug it a bit easier. It was originally in SessionStore because of some historical reasons that I was probably told but don't remember. I'm not sure how this could affect the SSTabRestored notification (it'll probably still be broken when my patches land).

If you think you've got a quick-ish fix to the session store code then land it now, if it conflicts with mine I'll figure it out. My change won't land until at least the week after next since I'm going on PTO next week.

Flags: needinfo?(pbone)

We'll want to fix this for Fission milestone M6 (enable Fission in Nightly).

Fission Milestone: --- → M6

Paul, is there work remaining for this bug?

Flags: needinfo?(pbone)

Having spoken with pbone, mattwoodrow and kmag, there's still quite a bit of work remaining to get process switching out of the wheelhouse of SessionStore.

I don't think it makes sense to try to fix this until that work is done.

Flags: needinfo?(pbone)

AIUI it depends on Bug 1586985

Depends on: 1586985
Fission Milestone: M6 → M6c
Summary: Investigate surprising ~80% improvement on sessionrestore_many_windows Talos test on Windows shippable with Fission enabled → Fix sessionrestore_many_windows Talos test to not ignore SSTabRestored events (was showing ~80% improvement on Windows shippable with Fission enabled)

Started looking at this again. It looks like with Fission enabled, we can probably stop ignoring the isRemotenessUpdate SSTabRestored events, which puts us back in the ballpark again. There does, however, still appear to be a slight regression - we're updating remoteness with Fission enabled from a "web" type process to a "webIsolated" process, which is eating up some time and causing a regression.

Spoke with nika about this. We need to wait until SessionStore no longer needs to manually do the remoteness flipping of browsers, which involves making SHIP working outside of Fission.

Depends on: fission-history

That won't be complete in time for M6c so pushing this out to M7. It can be reviewed and re-triaged again, if needed.

Fission Milestone: M6c → M7

Process switching is largely out of the wheelhouse of SessionStore now (at least when SHIP is enabled, which it always is with Fission) - could we perhaps keep the old logic, but only for the non-ship case, and start getting performance results here?

Flags: needinfo?(mconley)
Whiteboard: fission-perf

could we perhaps keep the old logic, but only for the non-ship case, and start getting performance results here?

Yes, this sounds like a reasonable approach - we should special-case the logic in the test for the Fission-on case to start getting measurements here.

Flags: needinfo?(mconley)

I took a quick look at the profile which is being restored (from https://searchfox.org/mozilla-central/source/testing/talos/talos/startup_test/sessionrestore/profile/sessionstore.js, though I imagine the actual restore is from sessionstore.jsonlz4 in the same directory), and noticed that the URLs appear to all be of the form http://localhost:8080/?httpsurl=www.whatever.com/whatever. This means that the test won't use multiple content processes under Fission (as they'll all be served under the same site http://localhost), and that we'll probably be measuring inaccurate data there.

If we can get away with it, we should probably also fix the profiles which we're loading to actually use the correct domain names through a proxy so that we get a more accurate process allocation structure.

Mike said he won't have time to work on this for Fission M7, because of Proton work assigned to him.

Assignee: mconley → nobody
Fission Milestone: M7 → M8

Not a M7 beta blocker but we'd like this test re-enabled and working correctly as soon as possible so pushing this out to M7a.

Fission Milestone: M8 → M7a
See Also: → 1591124

Randell will start looking into fixing this.

Assignee: nobody → rjesup
Severity: normal → S2
Status: NEW → ASSIGNED
Type: task → defect
Priority: -- → P2
No longer depends on: fission-history
You need to log in before you can comment on or make changes to this bug.