Closed Bug 1114040 Opened 5 years ago Closed 5 years ago

Session restore doesn't remember gmail


(Firefox :: Session Restore, defect)

Not set



Firefox 38
Tracking Status
firefox36 --- fixed
firefox37 --- fixed
firefox38 --- fixed


(Reporter: mccr8, Assigned: billm)




(2 files, 3 obsolete files)

If I open a new tab, then go to then when I restart the browser, it doesn't remember that the tab was open to anything, and I just get a blank tab.

What may be part of the problem here is that when I use that address, it automatically navigates to which actually loads the page.  If I open a tab and then go directly to then session restore does remember the address.

I think this started happening in the last week or two.  This is in e10s mode, if that matters.
I can reproduce this as follows.

1. Create a profile and log it into gmail. Then restart.
2. Load bugzilla into first tab.
3. Load into second tab.
4. Restart and restore session.
I've been seeing this for probably a few months in non-e10s, but never quite got around to trying without my add-ons or filing. I think it's some sort of race condition, because it seems to happen less frequently when gmail isn't the first pinned tab (which are restored eagerly, in order). When this happens, my homepage ( often also doesn't restore. gmail also gets stuck in the 'loading' screen fairly often. In that case you can simply refresh it and it will load.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #2)
> When this happens, my homepage often also doesn't restore.

Correction: doesn't *load* - that case isn't restoring anything on startup.
I also have seen the problem where gmail gets stuck loading. I'm not sure if it has anything to do with this bug though.

It looks like we're not collecting the session data correctly--sessionstore.js doesn't include info about gmail after quitting. The reason we collect bad data is that __SS_data is set for the gmail tab. That property is initially set by LoadInOtherProcess when we transition from about:newtab (non-remote) to gmail (remote). __SS_data is supposed to be cleared once we're done loading gmail into the tab. However, I'm guessing that somehow gets screwed up by the redirect.
So the normal sequence when we restore a page is as follows:

1. Content script gets "load" event, sends restoreDocumentComplete message
2. ProgressListener.onStateChange fires in ContentRestore.jsm and we send restoreTabContentComplete message

However, in the gmail case, we get the onStateChange message first because of the redirect. Later we get the load event. The message handler for restoreTabContentComplete calls _resetLocalTabRestoringState, which clears the epoch value on the <browser> element. Then when we get the restoreDocumentComplete message, there's no epoch, so we ignore the message. As a consequence, __SS_data is never cleared.

The funny thing about this is that I always assumed the restoreTabContentComplete message should come before the restoreDocumentComplete message. Somehow, we have to make either ordering work I guess.
Attached patch gmail-session-fix (obsolete) — Splinter Review
This seems like the obvious way to go. It passes tests locally.
Assignee: nobody → wmccloskey
Attachment #8543476 - Flags: review?(ttaubert)
Oddly, I haven't actually noticed this problem recently.
The messages are described in SessionStore.jsm as follows:

  // All network loads for a restoring tab are done, so we should consider
  // restoring another tab in the queue.

  // The document has been restored, so the restore is done. We trigger
  // SSTabRestored at this time.

This would make me think that would expect "restoreTabContentComplete" to fire first to be able to quickly start loading the next tab as soon as all the network activity has finished.

"restoreDocumentComplete" would then be fired off the load event handler (as it is) once the page has fully loaded to mark the tab as having finished restoring.

Looking at the current code it seems that the load event always fires before the onStateChange notification, only in the case of Gmail it works as expected - which the code obviously doesn't handle right.

STATE_IS_WINDOW is actually the second STATE_STOP notification, STATE_IS_DOCUMENT is the first one that will also fire the "load" event so that seems about right. It looks like the load event is also fired when the network activity stops.

Would DOMContentLoaded give us better guarantees? It doesn't actually tell use whether resources are still loading but at least it always fires before the load event. OTOH, is starting to restore the next tab really that time-critical? Couldn't we merge those two messages and just start restoring the next tab when receiving "restoreDocumentComplete"? Less complexity is always nice...
Flags: needinfo?(wmccloskey)
Comment on attachment 8543476 [details] [diff] [review]

Clearing review request until we figure out the way forward.
Attachment #8543476 - Flags: review?(ttaubert)
Duplicate of this bug: 1107922
Listening for STATE_STOP would probably be the right way to go, considering bug 1107445. DOMContentLoaded seems to fire as well for error pages, but that doesn't signal the end of network activity.
Duplicate of this bug: 1117935
Attached patch automated test (obsolete) — Splinter Review
This is an automated test that reproduces the problem, it currently times out.

Bill, are you still actively working on it or do you want me to take this?
I'd still like to fix this.
Flags: needinfo?(wmccloskey)
Comment on attachment 8544672 [details] [diff] [review]
automated test

Review of attachment 8544672 [details] [diff] [review]:

Well feel free to use this test if you don't already have one
Attachment #8544672 - Flags: review?(wmccloskey)
Thanks Dave. I did some work on this today, but there are some subtle issues. Maybe you guys will have ideas.

I consolidated the restoreTabContentComplete and restoreDocumentComplete messages into a single message that's sent the first time onStateChange(STATE_STOP) happens. That's the time when we restore form fields, clear __SS_data and the epoch, fire SSTabRestored, and start restoring the next tab.

Unfortunately, there are some tricky timing dependencies. The first problem I ran into is that there are tests that listen for "load" events and then check that form fields were restored properly. So we need to ensure that form data restoration (which happens when we send the unified restoreComplete message) happens *before* the page's load event. (Alternatively, we could change all our tests to use some other event. I'm not sure if add-ons also depend on form data being present when the load event comes in.)

The "load" event is triggered from the docshell's onStateChange(STATE_STOP | STATE_IS_DOCUMENT) progress listener. Based on the code, it looks like we can depend on it always being the last such listener to be invoked. Based on that, it seems like we should be able to trigger the unified restore message based on onStateChange(STATE_STOP | STATE_IS_DOCUMENT).

However, there's one test failure, browser_formdata.js, that reveals an additional problem. The test restores a tab, then collects tab data again and checks that form data was correctly restored and collected. Here's the order in which that goes:

1. restoreTab calls TabStateCache.update and passes it the correct formdata.
2. The load begins and FrameTree.jsm's onStateChange(STATE_START) listener fires onFrameTreeReset, which causes FormDataListener.onFrameTreeReset to broadcast a formdata=null message to SessionStore.jsm. When that message is received, formdata is cleared from the TabStateCache.
3. We get onStateChange(STATE_STOP | STATE_IS_DOCUMENT). We first fire the ContentRestore.jsm progress listener, which restores form data and sends the unified restoreComplete message. While restoring form data, an "input" event fires for the input field we're restoring, and we land in FormDataListener.handleEvent. It checks if gFrameTree.contains(frame), which returns false. That's because the onStateChange(STATE_START) listener cleared out FrameTree._frames. Consequently, we don't broadcast the form data change to SessionStore.jsm.
4. After that, FormData.jsm's onStateChange(STATE_STOP | STATE_IS_DOCUMENT) listener fires and fills in FrameTree._frames.
5. When the test collects form data, it gets nothing because we broadcast null formdata and then ignored the "input" event that would have sent the correct data.

The reason this didn't happen before is because the ContentRestore.jsm progress listener used to trigger based on onStateChange(STATE_STOP | STATE_IS_WINDOW). So it fired after the FormData.jsm listener in step 4 (since STATE_IS_WINDOW always follows STATE_IS_DOCUMENT).

It would be nice if we could force ContentRestore.jsm's onStateChange listener to run after FormData.jsm's. That's difficult because FormData.jsm's listener is installed when the frame script starts up, while ContentRestore.jsm's listener is installed when we start restoring. Listeners are run starting with the most recently added.

One solution would be to have a global progress listener installed by content-sessionStore.jsm that would be used by both ContentRestore and FrameTree. Then we could determine the order they're run.

I also came up with a hacky solution. If ContentRestore.jsm triggers based on a "load" event instead of onStateChange(STATE_STOP | STATE_DOCUMENT), then it will run after FrameTree.jsm's listener (since the docshell runs the load event, and its progress listener is installed before any others) and before any page-level load events (since the TabChildGlobal event target is higher up in the DOM tree). However, we also need to use a web progress listener as a backup in case the page fails to load.

A final possibility is to stop relying on load events in the tests as I mentioned above.
Attached patch gmail-fix-2 (obsolete) — Splinter Review
Here's the hacky second solution I mentioned. It still needs some comments. Let me know if you'd rather take a different approach.

This patch doesn't completely pass Dave's test. When handling the JS redirect, it still records the pre-redirect URL for the session history entry. I haven't delved into this yet. The important thing is that it doesn't just time out, which is what happens without the patch.

Also, this patch passes existing tests locally, with and without e10s, and fixes bug 1107445 in addition.
Attachment #8543476 - Attachment is obsolete: true
Attachment #8545700 - Flags: feedback?(ttaubert)
Sorry for the wall of text, Tim, but I really would like to get this fixed. Do you think you'll be able to get to it soon?
Flags: needinfo?(ttaubert)
Comment on attachment 8544672 [details] [diff] [review]
automated test

Review of attachment 8544672 [details] [diff] [review]:

Thanks Dave! These tests look great to me. We just need to fix the small issue with the JS redirect I mentioned below:

::: browser/components/sessionstore/test/browser_restore_redirect.js
@@ +43,5 @@
> +  yield promiseTabState(tab, state);
> +
> +  info("Restored tab");
> +
> +  TabState.flush(browser);

The flush() call must wait until the page's JS was run, the new page was loaded and OnHistoryReplaceEntry() was called. If you add a second or so of waiting before flushing the test succeeds.
Attachment #8544672 - Flags: feedback+
Comment on attachment 8545700 [details] [diff] [review]

Review of attachment 8545700 [details] [diff] [review]:

Bill, apologies for the time it took me to answer. I needed a little more to think about a solution and tinker with your patch and Dave's tests.

What I think we should do is to consolidate the two messages. The EventHandler should continue calling restoreDocument(), it just shouldn't send a message anymore.

The actual "restoreTabContentComplete" message is sent by the ProgressListener as to catch error page loads as well.

This doesn't break any tests for me locally and passes Dave's tests as well.

::: browser/components/sessionstore/ContentRestore.jsm
@@ +407,5 @@
>    let webProgress = docShell.QueryInterface(Ci.nsIInterfaceRequestor)
>                              .getInterface(Ci.nsIWebProgress);
>    webProgress.addProgressListener(this, Ci.nsIWebProgress.NOTIFY_STATE_WINDOW);
> +  docShell.chromeEventHandler.addEventListener("load", this, true);

Don't need to change anything here when we keep the EventListener.

::: browser/components/sessionstore/SessionStore.jsm
@@ +61,5 @@
>    // time; if we did it before, the load would overwrite it.
>    "SessionStore:restoreTabContentStarted",
>    // All network loads for a restoring tab are done, so we should consider
>    // restoring another tab in the queue.

Should update the docs here.

::: browser/components/sessionstore/content/content-sessionStore.js
@@ -75,5 @@
>  /**
> - * Listens for and handles content events that we need for the
> - * session store service to be notified of state changes in content.
> - */
> -let EventListener = {

We should keep the whole EventListener and just remove the sendAsyncMessage() call at the bottom.

@@ +113,5 @@
> +            return;
> +          }
> +
> +          // Restore the form data and scroll position.
> +          gContentRestore.restoreDocument();

As said above we should keep the listener and not do that here.

@@ +119,3 @@
>            // Tell SessionStore.jsm that it may want to restore some more tabs,
>            // since it restores a max of MAX_CONCURRENT_TAB_RESTORES at a time.
> +          sendAsyncMessage("SessionStore:restoreComplete", {epoch: epoch});

Can we call it "restoreTabContentComplete"? That would be more in line with "restoreTabContent".

@@ +125,5 @@
>          let didStartLoad = gContentRestore.restoreTabContent(data.loadArguments, finishCallback);
>          sendAsyncMessage("SessionStore:restoreTabContentStarted", {epoch: epoch});
>          if (!didStartLoad) {

Need to remove sending "restoreDocumentComplete" from here.
Attachment #8545700 - Flags: feedback?(ttaubert)
Flags: needinfo?(ttaubert)
Attached patch gmail-fix-3Splinter Review
Thanks Tim. I like this approach a lot.
Attachment #8545700 - Attachment is obsolete: true
Attachment #8552672 - Flags: review?(ttaubert)
Attachment #8552672 - Flags: review?(ttaubert) → review+

Dave, do you want to fix the test or should I do it?
Flags: needinfo?(dtownsend)
Keywords: leave-open
sorry had to back this out, seems this caused perma failures like
(In reply to Carsten Book [:Tomcat] from comment #23)
> sorry had to back this out, seems this caused perma failures like

Looks like we made bug 1093655 easier to debug ;)
(In reply to Tim Taubert [:ttaubert] from comment #26)
> Re-landed together with a fix for bug 1093655:

Forgot to update the patch author after applying locally. Sorry for messing with your stats, Bill :)
Thanks Tim. How far do you think we should backport this? I'm pretty sure this affects non-e10s, and it fixes bug 1107445, which was reported against FF34.
Flags: needinfo?(ttaubert)
(In reply to Bill McCloskey (:billm) from comment #29)
> Thanks Tim. How far do you think we should backport this? I'm pretty sure
> this affects non-e10s, and it fixes bug 1107445, which was reported against
> FF34.

We should land the test and then try to backport as far as possible. The code changes aren't super risky but I would feel better with a test I think.
Flags: needinfo?(ttaubert)
Blocks: 1107445
(In reply to Bill McCloskey (:billm) from comment #22)
> Dave, do you want to fix the test or should I do it?

I'm catching up on stuff after PTO so it might go faster if you do it.
Flags: needinfo?(dtownsend)
Attached patch gmail-testSplinter Review
Here's the revised test. I just waited for the target page of the redirect to load. I think that should be sufficient. I confirmed that the JS part of the test fails when the main patch is backed out. In fact it times out waiting for promiseTabState to resolve.
Attachment #8544672 - Attachment is obsolete: true
Attachment #8544672 - Flags: review?(wmccloskey)
Attachment #8554767 - Flags: review?(ttaubert)
Comment on attachment 8554767 [details] [diff] [review]

Review of attachment 8554767 [details] [diff] [review]:

::: browser/components/sessionstore/test/browser_restore_redirect.js
@@ +20,5 @@
> +
> +  TabState.flush(browser);
> +  let data = TabState.collect(tab);
> +  is(data.entries.length, 1, "Should be one entry in session history");
> +  is(data.entries[0].url, TARGET, "Shhould be the right session history entry");

Nit: Should

@@ +59,5 @@
> +
> +  TabState.flush(browser);
> +  let data = TabState.collect(tab);
> +  is(data.entries.length, 1, "Should be one entry in session history");
> +  is(data.entries[0].url, TARGET, "Shhould be the right session history entry");

Nit: Should
Attachment #8554767 - Flags: review?(ttaubert) → review+
Comment on attachment 8552672 [details] [diff] [review]

Approval Request Comment
[Feature/regressing bug #]: bug 942374
[User impact if declined]: Session restore may fail to restore some tabs correctly.
[Describe test coverage new/current, TreeHerder]: We have a test for the regression. The changes have been on mc for about a week.
[Risks and why]: The patch is fairly low risk.
[String/UUID change made/needed]: None.
Attachment #8552672 - Flags: approval-mozilla-beta?
Attachment #8552672 - Flags: approval-mozilla-aurora?
I guess the bug can be closed now.
Attachment #8552672 - Flags: approval-mozilla-beta?
Attachment #8552672 - Flags: approval-mozilla-beta+
Attachment #8552672 - Flags: approval-mozilla-aurora?
Attachment #8552672 - Flags: approval-mozilla-aurora+
Comment on attachment 8554767 [details] [diff] [review]

[Triage Comment]
We love tests. I guess there is no reason not to take it.
Attachment #8554767 - Flags: approval-mozilla-beta+
Attachment #8554767 - Flags: approval-mozilla-aurora+
Setting as qe-verify- since this is covered by automated tests. Please set to qe-verify+ if you think manual testing is needed.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.