Closed Bug 1259770 Opened 4 years ago Closed 4 years ago

Session restore regressed 400%-500% under e10s with changes in bug 1245891

Categories

(Testing :: Talos, defect, P1)

Version 3
defect

Tracking

(e10s+)

RESOLVED FIXED
Tracking Status
e10s + ---

People

(Reporter: jimm, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

https://treeherder.mozilla.org/perf.html#/e10s?filter=sessionrestore

David, any ideas on what caused this? FWIW this will likely block e10s rollout so we'll need to figure this out.
Flags: needinfo?(dteller)
Flags: needinfo?(dteller)
Blocks: e10s-perf
Priority: -- → P3
e10s team triaged this. This test appears to measure non-default behavior, session restore without restore on demand. Therefor it doesn't block e10s rollout.
The first test, `sessionrestore`, measures the behavior in which we automatically restore the session. The second test, `sessionrestore_no_auto_restore`, measures the behavior in which we load the session but do not restore it.

**Neither test changes the preference that determines whether tabs are restored on demand.**

The measurement methodology for this test has changed on 2016-03-16, so changes on that day are 100% expected.

I have checked again: the numbers I observe on my machine for Session Restore both with and without e10s seem to match what I am seeing on screen. I do not witness noticeable slowdowns in real usage with e10s wrt without, but the variance is so huge that it's hard to be sure.
Flags: needinfo?(jmathies)
(In reply to David Rajchenbach-Teller [:Yoric] (please use "needinfo") from comment #2)
> The first test, `sessionrestore`, measures the behavior in which we
> automatically restore the session. The second test,
> `sessionrestore_no_auto_restore`, measures the behavior in which we load the
> session but do not restore it.
> 
> **Neither test changes the preference that determines whether tabs are
> restored on demand.**
> 

Ah, that was my mistake. I assumed both sessionrestore and sessionrestore_no_auto_restore were both doing tab/window restoration, but that sessionrestore_no_auto_restore did restore on demand.

I understand now - sessionrestore will restore the windows / tabs using the default pref (so it's restoring on demand). The sessionrestore_no_auto_restore is just until we read the session info off the disk and fire sessionRestored.

So I actually do think our users will get impacted by this, and this should be investigated.
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #3)
> (In reply to David Rajchenbach-Teller [:Yoric] (please use "needinfo") from
> comment #2)
> > The first test, `sessionrestore`, measures the behavior in which we
> > automatically restore the session. The second test,
> > `sessionrestore_no_auto_restore`, measures the behavior in which we load the
> > session but do not restore it.
> > 
> > **Neither test changes the preference that determines whether tabs are
> > restored on demand.**
> > 
> 
> Ah, that was my mistake. I assumed both sessionrestore and
> sessionrestore_no_auto_restore were both doing tab/window restoration, but
> that sessionrestore_no_auto_restore did restore on demand.
> 
> I understand now - sessionrestore will restore the windows / tabs using the
> default pref (so it's restoring on demand). The
> sessionrestore_no_auto_restore is just until we read the session info off
> the disk and fire sessionRestored.

It sounds like sessionrestore and sessionrestore_no_auto_restore should be testing the same thing. What's the difference between these tests if the default pref specifies no auto-restore?
Flags: needinfo?(jmathies) → needinfo?(dteller)
(In reply to Jim Mathies [:jimm] from comment #4)
> (In reply to Mike Conley (:mconley) - Needinfo me! from comment #3)
> > (In reply to David Rajchenbach-Teller [:Yoric] (please use "needinfo") from
> > comment #2)
> > > The first test, `sessionrestore`, measures the behavior in which we
> > > automatically restore the session. The second test,
> > > `sessionrestore_no_auto_restore`, measures the behavior in which we load the
> > > session but do not restore it.
> > > 
> > > **Neither test changes the preference that determines whether tabs are
> > > restored on demand.**
> > > 
> > 
> > Ah, that was my mistake. I assumed both sessionrestore and
> > sessionrestore_no_auto_restore were both doing tab/window restoration, but
> > that sessionrestore_no_auto_restore did restore on demand.
> > 
> > I understand now - sessionrestore will restore the windows / tabs using the
> > default pref (so it's restoring on demand). The
> > sessionrestore_no_auto_restore is just until we read the session info off
> > the disk and fire sessionRestored.
> 
> It sounds like sessionrestore and sessionrestore_no_auto_restore should be
> testing the same thing. What's the difference between these tests if the
> default pref specifies no auto-restore?

I think the confusion lies in what we mean by "auto-restore". So, there is restoring tabs on demand, which is what we do by default, and what we originally though these two tests were comparing.

What these two tests are actually comparing are these two states:

1) Don't automatically restore the user's windows and tabs from last time. Show about:home instead, with a button that allows them to restore their windows and tabs from last time.

2) Restore windows and tabs from last time.
Yep, as said by mconley.
Flags: needinfo?(dteller)
> What these two tests are actually comparing are these two states:
> 
> 1) Don't automatically restore the user's windows and tabs from last time.
> Show about:home instead, with a button that allows them to restore their
> windows and tabs from last time.
> 
> 2) Restore windows and tabs from last time.

Requesting triage again, #2 above does not involve loading content in restored tabs, we open empty tabs  (based on the default behavior set in prefs) and somehow regress that. We should confirm this is actually what's happening because it seems highly dubious that e10s regresses this.
Priority: P3 → --
Okay, I think I know what's going on.

The work in bug 1245891 takes advantage of a module that got added called StartupPerformance.jsm.

Briefly, StartupPerformance.jsm works by noticing when session restore init has begun, and then records that time. It also starts a 10 second timer. It then starts listening for SSTabRestored events to be fired. Every time one of the SSTabRestored events comes in, we record the current time, and the 10 second timer is reset.

Once the timer finally goes off (so we haven't seen an SSTabRestored event in 10 seconds), the SSTabRestored event listener detaches itself, and the observer that the sessionrestore talos test is listening for is fired off. The talos test then takes the delta between sessionrestore init and the last SSTabRestored event time.

So that's how the sessionrestore test is supposed to work.

There's a small problem though, and I think it's causing us to report different results in the e10s case (even though e10s is not performing worse).

The problem is that the talos test loads an index.html file along with the session it's restoring from. That index.html file is not part of the sessionrestore set, and is passed in the cmdline args to Firefox in order to do the reporting of the test results back to talos. It's also the selected tab.

The final puzzle piece is that the initial browser in a new browser window is non-remote by default. In order for it to go anywhere, some DocShell / SessionStore stuff occurs where any pre-existing state is scooped out of the browser, the remoteness is flipped, and then the scooped out state is sent back down to the new browser / DocShell.

That's important, because that scoop/remoteness-flip/sending of state results in an SSTabRestored event to be fired, since we're using the same tab restoration mechanism to do the sending of state.

So how this ties all together is that the browser starts up, it loads up the session, and in the non-e10s case, because a non-session tab is selected and loaded (the index.html file), an SSTabRestored event never fires (since index.html wasn’t in the old session, remember - it was loaded as part of the cmdline). When StartupPerformance never gets any SSTabRestored events, the delta that gets computed is the delta between session restore init and when restoration starts.

For e10s, it’s the same up until the index.html load. In order to load that into the initial tab, we do the remoteness flip, which fires an SSTabRestored event, which StartupPerformance notices.

That’s where the discrepancy is.

TL;DR:

non-e10s is measuring the delta between session restore init and when tab restoration starts.

e10s is measuring the delta between session restore init, and when the index.html of the talos test content page has finished being restored.

This explains the big gap in the reading: we're not measuring the same thing in both cases.

I'll see if I can come up with some ways of making these measurements more comparable.
To prove my case, here's a try push where I forced the initial browser in the window to be remote (this might also contribute to wins to tpaint, so this was a thing I was already pursuing):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba7b791b1ac8095ffd090b295909c2663c6c2cbe

Try to ignore the high scores from the Linux64 spot instances - I'm not sure how to hide those, and unfortunately, Perfherder mixes the AWS-instance results with the normal Linux64 talos machine results, which kinda throws off the numbers (see bug 1260926).

In this case, we appear to at least match non-e10s. On some platforms e10s beats non-e10s significantly.

So, to sum, I think e10s is not regressing sessionstore performance when we're restoring tabs on demand. In fact, I think it performs better. We just need to make sure the test starts comparing things properly.
Status: NEW → RESOLVED
Closed: 4 years ago
Priority: -- → P1
Resolution: --- → FIXED
Thanks Mike!
You need to log in before you can comment on or make changes to this bug.