Closed Bug 1427007 Opened 6 years ago Closed 6 years ago

Firefox can't autosave correctly session on browser exit (the list of tabs and their state) (sometimes)

Categories

(Firefox :: Session Restore, defect, P2)

57 Branch
Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
Firefox 60
Tracking Status
firefox59 --- fixed
firefox60 --- fixed

People

(Reporter: robert_abc3, Assigned: mikedeboer)

References

(Blocks 2 open bugs)

Details

(Keywords: regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36

Steps to reproduce:

The same problem reported by multiple users:
https://www.reddit.com/r/firefox/comments/7lbc69/sometimes_firefox_cant_remember_the_current_site/
https://www.reddit.com/r/firefox/comments/7lnpp6/how_do_i_get_firefox_to_properly_restore_my_last/


Actual results:

See links


Expected results:

Session should be restored with all tabs.
Please visit https://support.mozilla.org/questions/firefox to get help. Thank you.
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Component: Untriaged → Session Restore
Resolution: --- → INCOMPLETE
You can merge this bug with https://bugzilla.mozilla.org/show_bug.cgi?id=1426913
I noticed the other bug after I opened this one.
Severity: normal → major
Status: RESOLVED → UNCONFIRMED
Priority: -- → P3
Resolution: INCOMPLETE → ---
Summary: Sometimes Firefox can't remember the current site in a tab, so instead it will restore an old website → Firefox can't autosave sometimes the state of tabs
Firefox does (at least should) save tab state on an interval - default is every 15 seconds. The setting is browser.sessionstore.interval. But it doesn't work properly, as when the browser left minimized for hours, closed it, went back to it later in the day and not have the correct state of tabs loaded. 

The example would be if in single tab different sites will be visited (for example reddit threads) and previous tread (not the last one visited) will be present after Firefox restart.

See here:
https://www.reddit.com/r/firefox/comments/7lbc69/sometimes_firefox_cant_remember_the_current_site/drpoxcw/
Blocks: 450886
Severity: major → normal
OS: Unspecified → Windows
(In reply to Robert Ab from comment #4)
> This problem is reported by multiple users:
> https://www.reddit.com/r/firefox/comments/7lbc69/sometimes_firefox_cant_remember_the_current_site/
Doesn't restore correct thread

> https://www.reddit.com/r/firefox/comments/7lnpp6/how_do_i_get_firefox_to_properly_restore_my_last/
But this one is different - restores tabs which were previously closed
(In reply to Wayne Mery (:wsmwk) from comment #5)
> (In reply to Robert Ab from comment #4)
> > This problem is reported by multiple users:
> > https://www.reddit.com/r/firefox/comments/7lbc69/sometimes_firefox_cant_remember_the_current_site/
> Doesn't restore correct thread
> 
> > https://www.reddit.com/r/firefox/comments/7lnpp6/how_do_i_get_firefox_to_properly_restore_my_last/
> But this one is different - restores tabs which were previously closed


So the problem with Firefox session restoration is even more serious. I will need to change title.
Summary: Firefox can't autosave sometimes the state of tabs → Firefox can't autosave correctly session on browser exit (the list of tabs and their state) (sometimes)
Depends on: 1426913
Severity: normal → major
I encounter something like this on Nightly. There's an error in the Console: 

Could not write session state file  Object { operation: "move", path: "C:\Users\user\AppData\Roaming\Mozilla\Firefox\Profiles\7w701c2s.default\sessionstore-backups\recovery.jsonlz4", winLastError: 5, stack: "", fileName: "(unknown module)", lineNumber: undefined } SessionFile.jsm:396
(In reply to Krzysztof from comment #9)
> I encounter something like this on Nightly. There's an error in the Console: 
> 
> Could not write session state file  Object { operation: "move", path:
> "C:\Users\user\AppData\Roaming\Mozilla\Firefox\Profiles\7w701c2s.
> default\sessionstore-backups\recovery.jsonlz4", winLastError: 5, stack: "",
> fileName: "(unknown module)", lineNumber: undefined } SessionFile.jsm:396

Krzysztof, your log is very valuable! Thanks a lot!

The log shows that the web worker failed to write the session to files [1] for some reasons we don't know yet,
and I think this can explain the issue here.

:mikedeboer had already foreseen this and did some great works in bug 1402267 to catch/count the worker failures and restart the worker.
We can observe the failure at [2], like WORKER_RESTART_COUNT.
(This data collection in release channel is out-out.)

I assume that this issue happened between two timings:
1. Web worker starts to fail.
2. The reported failures are not accumulated to exceed the threshold, which is currently set to 5.

Krzysztof, if possible, could you help to gather more logs? 
This could help us to understand more, like the reason behind the worker failures.
Many thanks for your support!

Mike, is the assumption reasonable to you?
Do you think that adjusting the threshold to 1 is a bad idea?
(Sorry! I understand that now is not the good time to NI you, so take your time :) )
Thanks!


[1]
https://searchfox.org/mozilla-central/rev/03877052c151a8f062eea177f684a2743cd7b1d5/browser/components/sessionstore/SessionFile.jsm#396

[2]
https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=median!5th-percentile!95th-percentile&cumulative=0&end_date=null&keys=&max_channel_version=nightly%252F59&measure=SCALARS_BROWSER.SESSION.RESTORE.WORKER_RESTART_COUNT&min_channel_version=nightly%252F56&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=null&trim=1&use_submission_date=0

"A counter incremented every time the SessionFile worker is restarted due to too many failures, as defined in the browser.sessionstore.max_write_failures pref."
Flags: needinfo?(mdeboer)
Flags: needinfo?(krzysztof.glebowicz)
Today I got plenty of errors like below at SessionFile.jsm:396:

Could not write session state file  TypeError: this.Paths is null

write@resource:///modules/sessionstore/SessionWorker.js:208:9
worker.dispatch@resource:///modules/sessionstore/SessionWorker.js:23:24
handleMessage@resource://gre/modules/workers/PromiseWorker.js:124:16
@resource:///modules/sessionstore/SessionWorker.js:32:41
EventListener.handleEvent*@resource:///modules/sessionstore/SessionWorker.js:32:1
Flags: needinfo?(krzysztof.glebowicz)
This looks like a worker restart issue once more where the SessionFile worker (which is where the session data is written to disk) hasn't been initialized properly, it seems. I'll see if I can trigger this situation myself.
Blocks: 1402267
Flags: needinfo?(mdeboer)
Flags: needinfo?(mdeboer)
So the information provided by Krzysztof and Will is super useful here - I haven't had a change to take a good look at the telemetry probe I recently added to the new worker health check - and allowed me to find a fix for this issue!
Thanks!
Assignee: nobody → mdeboer
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(mdeboer)
Priority: P3 → P2
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

https://reviewboard.mozilla.org/r/214634/#review220444

::: browser/components/sessionstore/SessionFile.jsm:217
(Diff revision 2)
>    },
>  
>    async _readInternal(useOldExtension) {
>      let result;
>      let noFilesFound = true;
> +    this._usingOldExtension = useOldExtension;

Could you document this field?

::: browser/components/sessionstore/SessionFile.jsm:308
(Diff revision 2)
>          source: "",
>          parsed: null,
>          useOldExtension: false
>        };
>      }
> +    this._readOrigin = result.origin;

Also this, could you document it? In particular, *when* does it become usable?

::: browser/components/sessionstore/SessionFile.jsm:322
(Diff revision 2)
> +  },
> +
> +  // Initialize the worker in the background, whilst allowing callees to block on
> +  // its completion. This will never throw an error.
> +  _initWorker() {
> +    this._initializationStarted = true;

Could you comment that this line **MUST** be called immediately (in the current tick, microtask, etc.) in `_initWorker`, in order to avoid race conditions?

Also, just to be defensive, could you assert that `this._readOrigin` has already been set?

::: browser/components/sessionstore/SessionFile.jsm:337
(Diff revision 2)
>    },
>  
> -  // Post a message to the worker, making sure that it has been initialized
> +  // Post a message to the worker, making sure that it has been initialized first.
> -  // first.
>    async _postToWorker(...args) {
>      if (!this._initializationStarted) {

If `_initializationStarted` has been set to `true` on line 283, I believe that you may end up calling `SessionWorker.post` before `this._initWorker`.

Also, I'd be more comfortable if the code that sets `_initializationStarted` to `true` was in the same place as that that checks whether it's `true`. This would help ensure that we don't add a race condition accidentally.
David, I hope I fixed all the issues you raised during review!
David, do you have a bit of time to review this patch, perhaps? Or would you like to proxy the request?
Flags: needinfo?(dteller)
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

Apologies, I'll try to review this later today.
Flags: needinfo?(dteller)
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

https://reviewboard.mozilla.org/r/214634/#review222086

::: browser/components/sessionstore/SessionFile.jsm:325
(Diff revisions 2 - 3)
>  
>      return result;
>    },
>  
>    // Initialize the worker in the background, whilst allowing callees to block on
>    // its completion. This will never throw an error.

The documentation isn't true anymore, since the returned promise doesn't guarantee that initialization is complete.

I'm nearly sure that the code still works, because: 

1. `SessionWorker.post` will serialize messages;
2. a worker won't handle a message before it has run-to-completion the previous message;
3. consequently, the next call to `_postToWorker` will be executed by the worker only after the `SessionWorker.post("init", ...)` has been executed to completion.


However, these is non-trivial reasoning, so it deserves to be documented :)

Also, since your patch is about reliability in face of worker crash, you should document the behavior in case of crash, including both crashes during initialization and after initialization – yes, I know that your code will handle both the same way, but one line of documentation to mention it won't hurt :)

::: browser/components/sessionstore/SessionFile.jsm:333
(Diff revisions 2 - 3)
> +      if (this._initializationStarted) {
> +        resolve();
> +        return;
> +      }
> +
> +      if (!this._readOrigin)

Nit: Let's avoid duplicating Apple's mistakes and keep the brackets for a `if ()` :)

::: browser/components/sessionstore/SessionFile.jsm:208
(Diff revision 3)
>  
> +  // A string that will be set to the session file name part that was read from
> +  // disk. It will be available _after_ a session file read() is done.
> +  _readOrigin: null,
> +
> +  // `true` if the old, uncompressed, file format was use read from disk as a

Typo: "was use**d**".
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

https://reviewboard.mozilla.org/r/214634/#review222100

Forgot the r+.

Apologies about the delays, I've been a bit under the weather recently.
Attachment #8944432 - Flags: review?(dteller) → review+
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #20)
> Apologies,

No need to do that, David! I already deduced as much a while ago what you said in comment 22 ;-)

Get well soon! Thanks for the review, especially considering the circumstances.
I'm not landing this yet, since the SessionWorker tests appear to be perma-failing with the patch applied. The semantics of wipe() and read() changed slighty, so I need to look at the tests to figure out how to fix things properly.
Comment on attachment 8947543 [details]
Bug 1427007 - Part 2: the SessionFile unit tests dependended on a very specific interaction between wipe() and read(), which changed in part 1. This fixes the tests up by making the dependency a bit more clear and explicit.

https://reviewboard.mozilla.org/r/217242/#review223062

::: browser/components/sessionstore/SessionFile.jsm:455
(Diff revision 1)
>      });
>    },
>  
>    wipe() {
> -    return this._postToWorker("wipe");
> +    return this._postToWorker("wipe").then(() => {
> +      this._initializationStarted = false;

Could you document why we need to reinitialize after a call to `wipe`?
Attachment #8947543 - Flags: review?(dteller) → review+
Pushed by mdeboer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/986dab420f52
Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data. r=Yoric
https://hg.mozilla.org/integration/autoland/rev/79b25461de35
Part 2: the SessionFile unit tests dependended on a very specific interaction between wipe() and read(), which changed in part 1. This fixes the tests up by making the dependency a bit more clear and explicit. r=Yoric
Pushed by mdeboer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9cf13efe8783
Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data. r=Yoric
https://hg.mozilla.org/integration/autoland/rev/3ad66bc5f746
Part 2: the SessionFile unit tests dependended on a very specific interaction between wipe() and read(), which changed in part 1. This fixes the tests up by making the dependency a bit more clear and explicit. r=Yoric
https://hg.mozilla.org/mozilla-central/rev/9cf13efe8783
https://hg.mozilla.org/mozilla-central/rev/3ad66bc5f746
Status: ASSIGNED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 60
Thank you for solving this bug.

Probably Bug 1426913 should also be marked as solved since they describing the same problem.
No longer depends on: 1426913
I'm seeing this bug on dev edition (59.0b7) frequently, can the patch be uplifted?
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1402267
[User impact if declined]: We've seen reports of sessions that are not saved to disk properly and highly suspect this is the main cause. See e.g. bug 1423305, bug 1423672, bug 1433026 and quite a few others where we are unable to reproduce the issues reported, but they are evidently there.
We have Telemetry data backing that a worker restart is not uncommon at [1], so that's something to consider as well.
[Is this code covered by automated tests?]: Yes, see patch number two.
[Has the fix been verified in Nightly?]: I think comment 38 hints at that, but since we know about [1], it's prudent to fix this properly.
[Needs manual test from QE? If yes, steps to reproduce]: No, since it's very hard to reproduce reliably.
[List of other uplifts needed for the feature/fix]: n/a.
[Is the change risky?]: Minimal risk.
[Why is the change risky/not risky?]: Since the worker restart mechanism added in bug 1402267 was incomplete. 
[String changes made/needed]: n/a.

[1] https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=median!5th-percentile!95th-percentile&cumulative=0&end_date=null&keys=&max_channel_version=nightly%252F59&measure=SCALARS_BROWSER.SESSION.RESTORE.WORKER_RESTART_COUNT&min_channel_version=nightly%252F56&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=null&trim=1&use_submission_date=0
Flags: needinfo?(mdeboer)
Attachment #8944432 - Flags: approval-mozilla-beta?
Comment on attachment 8947543 [details]
Bug 1427007 - Part 2: the SessionFile unit tests dependended on a very specific interaction between wipe() and read(), which changed in part 1. This fixes the tests up by making the dependency a bit more clear and explicit.

Please see commit 41 for the Approval Request comment.
Attachment #8947543 - Flags: approval-mozilla-beta?
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

Fixes a regression in saving session (needed for session restore), new automated tests (yay!), Beta59+
Attachment #8944432 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8947543 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.