Firefox can't autosave correctly session on browser exit (the list of tabs and their state) (sometimes)

RESOLVED FIXED in Firefox 59

Status

()

defect
P2
major
RESOLVED FIXED
a year ago
8 months ago

People

(Reporter: robert_abc3, Assigned: mikedeboer)

Tracking

(Blocks 2 bugs, {regression})

57 Branch
Firefox 60
Unspecified
Windows
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox59 fixed, firefox60 fixed)

Details

Attachments

(2 attachments)

(Reporter)

Description

a year ago
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36

Steps to reproduce:

The same problem reported by multiple users:
https://www.reddit.com/r/firefox/comments/7lbc69/sometimes_firefox_cant_remember_the_current_site/
https://www.reddit.com/r/firefox/comments/7lnpp6/how_do_i_get_firefox_to_properly_restore_my_last/


Actual results:

See links


Expected results:

Session should be restored with all tabs.
Please visit https://support.mozilla.org/questions/firefox to get help. Thank you.
Status: UNCONFIRMED → RESOLVED
Last Resolved: a year ago
Component: Untriaged → Session Restore
Resolution: --- → INCOMPLETE
(Reporter)

Comment 2

a year ago
You can merge this bug with https://bugzilla.mozilla.org/show_bug.cgi?id=1426913
I noticed the other bug after I opened this one.
(Reporter)

Updated

a year ago
Severity: normal → major
Status: RESOLVED → UNCONFIRMED
Priority: -- → P3
Resolution: INCOMPLETE → ---
Summary: Sometimes Firefox can't remember the current site in a tab, so instead it will restore an old website → Firefox can't autosave sometimes the state of tabs
(Reporter)

Comment 3

a year ago
Firefox does (at least should) save tab state on an interval - default is every 15 seconds. The setting is browser.sessionstore.interval. But it doesn't work properly, as when the browser left minimized for hours, closed it, went back to it later in the day and not have the correct state of tabs loaded. 

The example would be if in single tab different sites will be visited (for example reddit threads) and previous tread (not the last one visited) will be present after Firefox restart.

See here:
https://www.reddit.com/r/firefox/comments/7lbc69/sometimes_firefox_cant_remember_the_current_site/drpoxcw/
(Reporter)

Updated

a year ago
(Reporter)

Updated

a year ago
Blocks: 450886
Severity: major → normal
OS: Unspecified → Windows
(In reply to Robert Ab from comment #4)
> This problem is reported by multiple users:
> https://www.reddit.com/r/firefox/comments/7lbc69/sometimes_firefox_cant_remember_the_current_site/
Doesn't restore correct thread

> https://www.reddit.com/r/firefox/comments/7lnpp6/how_do_i_get_firefox_to_properly_restore_my_last/
But this one is different - restores tabs which were previously closed
(Reporter)

Comment 6

a year ago
(In reply to Wayne Mery (:wsmwk) from comment #5)
> (In reply to Robert Ab from comment #4)
> > This problem is reported by multiple users:
> > https://www.reddit.com/r/firefox/comments/7lbc69/sometimes_firefox_cant_remember_the_current_site/
> Doesn't restore correct thread
> 
> > https://www.reddit.com/r/firefox/comments/7lnpp6/how_do_i_get_firefox_to_properly_restore_my_last/
> But this one is different - restores tabs which were previously closed


So the problem with Firefox session restoration is even more serious. I will need to change title.
(Reporter)

Updated

a year ago
Summary: Firefox can't autosave sometimes the state of tabs → Firefox can't autosave correctly session on browser exit (the list of tabs and their state) (sometimes)
(Reporter)

Updated

a year ago
Depends on: 1426913
(Reporter)

Updated

a year ago
Severity: normal → major

Comment 9

a year ago
I encounter something like this on Nightly. There's an error in the Console: 

Could not write session state file  Object { operation: "move", path: "C:\Users\user\AppData\Roaming\Mozilla\Firefox\Profiles\7w701c2s.default\sessionstore-backups\recovery.jsonlz4", winLastError: 5, stack: "", fileName: "(unknown module)", lineNumber: undefined } SessionFile.jsm:396
(In reply to Krzysztof from comment #9)
> I encounter something like this on Nightly. There's an error in the Console: 
> 
> Could not write session state file  Object { operation: "move", path:
> "C:\Users\user\AppData\Roaming\Mozilla\Firefox\Profiles\7w701c2s.
> default\sessionstore-backups\recovery.jsonlz4", winLastError: 5, stack: "",
> fileName: "(unknown module)", lineNumber: undefined } SessionFile.jsm:396

Krzysztof, your log is very valuable! Thanks a lot!

The log shows that the web worker failed to write the session to files [1] for some reasons we don't know yet,
and I think this can explain the issue here.

:mikedeboer had already foreseen this and did some great works in bug 1402267 to catch/count the worker failures and restart the worker.
We can observe the failure at [2], like WORKER_RESTART_COUNT.
(This data collection in release channel is out-out.)

I assume that this issue happened between two timings:
1. Web worker starts to fail.
2. The reported failures are not accumulated to exceed the threshold, which is currently set to 5.

Krzysztof, if possible, could you help to gather more logs? 
This could help us to understand more, like the reason behind the worker failures.
Many thanks for your support!

Mike, is the assumption reasonable to you?
Do you think that adjusting the threshold to 1 is a bad idea?
(Sorry! I understand that now is not the good time to NI you, so take your time :) )
Thanks!


[1]
https://searchfox.org/mozilla-central/rev/03877052c151a8f062eea177f684a2743cd7b1d5/browser/components/sessionstore/SessionFile.jsm#396

[2]
https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=median!5th-percentile!95th-percentile&cumulative=0&end_date=null&keys=&max_channel_version=nightly%252F59&measure=SCALARS_BROWSER.SESSION.RESTORE.WORKER_RESTART_COUNT&min_channel_version=nightly%252F56&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=null&trim=1&use_submission_date=0

"A counter incremented every time the SessionFile worker is restarted due to too many failures, as defined in the browser.sessionstore.max_write_failures pref."
Flags: needinfo?(mdeboer)
Flags: needinfo?(krzysztof.glebowicz)

Comment 11

a year ago
Today I got plenty of errors like below at SessionFile.jsm:396:

Could not write session state file  TypeError: this.Paths is null

write@resource:///modules/sessionstore/SessionWorker.js:208:9
worker.dispatch@resource:///modules/sessionstore/SessionWorker.js:23:24
handleMessage@resource://gre/modules/workers/PromiseWorker.js:124:16
@resource:///modules/sessionstore/SessionWorker.js:32:41
EventListener.handleEvent*@resource:///modules/sessionstore/SessionWorker.js:32:1
Flags: needinfo?(krzysztof.glebowicz)
(Assignee)

Comment 12

a year ago
This looks like a worker restart issue once more where the SessionFile worker (which is where the session data is written to disk) hasn't been initialized properly, it seems. I'll see if I can trigger this situation myself.
Blocks: 1402267
Flags: needinfo?(mdeboer)
(Assignee)

Updated

a year ago
Flags: needinfo?(mdeboer)
Comment hidden (mozreview-request)
(Assignee)

Comment 14

a year ago
So the information provided by Krzysztof and Will is super useful here - I haven't had a change to take a good look at the telemetry probe I recently added to the new worker health check - and allowed me to find a fix for this issue!
Thanks!
Assignee: nobody → mdeboer
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(mdeboer)
Priority: P3 → P2
Comment hidden (mozreview-request)
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

https://reviewboard.mozilla.org/r/214634/#review220444

::: browser/components/sessionstore/SessionFile.jsm:217
(Diff revision 2)
>    },
>  
>    async _readInternal(useOldExtension) {
>      let result;
>      let noFilesFound = true;
> +    this._usingOldExtension = useOldExtension;

Could you document this field?

::: browser/components/sessionstore/SessionFile.jsm:308
(Diff revision 2)
>          source: "",
>          parsed: null,
>          useOldExtension: false
>        };
>      }
> +    this._readOrigin = result.origin;

Also this, could you document it? In particular, *when* does it become usable?

::: browser/components/sessionstore/SessionFile.jsm:322
(Diff revision 2)
> +  },
> +
> +  // Initialize the worker in the background, whilst allowing callees to block on
> +  // its completion. This will never throw an error.
> +  _initWorker() {
> +    this._initializationStarted = true;

Could you comment that this line **MUST** be called immediately (in the current tick, microtask, etc.) in `_initWorker`, in order to avoid race conditions?

Also, just to be defensive, could you assert that `this._readOrigin` has already been set?

::: browser/components/sessionstore/SessionFile.jsm:337
(Diff revision 2)
>    },
>  
> -  // Post a message to the worker, making sure that it has been initialized
> +  // Post a message to the worker, making sure that it has been initialized first.
> -  // first.
>    async _postToWorker(...args) {
>      if (!this._initializationStarted) {

If `_initializationStarted` has been set to `true` on line 283, I believe that you may end up calling `SessionWorker.post` before `this._initWorker`.

Also, I'd be more comfortable if the code that sets `_initializationStarted` to `true` was in the same place as that that checks whether it's `true`. This would help ensure that we don't add a race condition accidentally.
Comment hidden (mozreview-request)
(Assignee)

Comment 18

a year ago
David, I hope I fixed all the issues you raised during review!
(Assignee)

Comment 19

a year ago
David, do you have a bit of time to review this patch, perhaps? Or would you like to proxy the request?
Flags: needinfo?(dteller)
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

Apologies, I'll try to review this later today.
Flags: needinfo?(dteller)
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

https://reviewboard.mozilla.org/r/214634/#review222086

::: browser/components/sessionstore/SessionFile.jsm:325
(Diff revisions 2 - 3)
>  
>      return result;
>    },
>  
>    // Initialize the worker in the background, whilst allowing callees to block on
>    // its completion. This will never throw an error.

The documentation isn't true anymore, since the returned promise doesn't guarantee that initialization is complete.

I'm nearly sure that the code still works, because: 

1. `SessionWorker.post` will serialize messages;
2. a worker won't handle a message before it has run-to-completion the previous message;
3. consequently, the next call to `_postToWorker` will be executed by the worker only after the `SessionWorker.post("init", ...)` has been executed to completion.


However, these is non-trivial reasoning, so it deserves to be documented :)

Also, since your patch is about reliability in face of worker crash, you should document the behavior in case of crash, including both crashes during initialization and after initialization – yes, I know that your code will handle both the same way, but one line of documentation to mention it won't hurt :)

::: browser/components/sessionstore/SessionFile.jsm:333
(Diff revisions 2 - 3)
> +      if (this._initializationStarted) {
> +        resolve();
> +        return;
> +      }
> +
> +      if (!this._readOrigin)

Nit: Let's avoid duplicating Apple's mistakes and keep the brackets for a `if ()` :)

::: browser/components/sessionstore/SessionFile.jsm:208
(Diff revision 3)
>  
> +  // A string that will be set to the session file name part that was read from
> +  // disk. It will be available _after_ a session file read() is done.
> +  _readOrigin: null,
> +
> +  // `true` if the old, uncompressed, file format was use read from disk as a

Typo: "was use**d**".
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

https://reviewboard.mozilla.org/r/214634/#review222100

Forgot the r+.

Apologies about the delays, I've been a bit under the weather recently.
Attachment #8944432 - Flags: review?(dteller) → review+
(Assignee)

Comment 23

a year ago
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #20)
> Apologies,

No need to do that, David! I already deduced as much a while ago what you said in comment 22 ;-)

Get well soon! Thanks for the review, especially considering the circumstances.
Comment hidden (mozreview-request)
(Assignee)

Comment 25

a year ago
I'm not landing this yet, since the SessionWorker tests appear to be perma-failing with the patch applied. The semantics of wipe() and read() changed slighty, so I need to look at the tests to figure out how to fix things properly.
Comment hidden (mozreview-request)
Comment on attachment 8947543 [details]
Bug 1427007 - Part 2: the SessionFile unit tests dependended on a very specific interaction between wipe() and read(), which changed in part 1. This fixes the tests up by making the dependency a bit more clear and explicit.

https://reviewboard.mozilla.org/r/217242/#review223062

::: browser/components/sessionstore/SessionFile.jsm:455
(Diff revision 1)
>      });
>    },
>  
>    wipe() {
> -    return this._postToWorker("wipe");
> +    return this._postToWorker("wipe").then(() => {
> +      this._initializationStarted = false;

Could you document why we need to reinitialize after a call to `wipe`?
Attachment #8947543 - Flags: review?(dteller) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 33

a year ago
Pushed by mdeboer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/986dab420f52
Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data. r=Yoric
https://hg.mozilla.org/integration/autoland/rev/79b25461de35
Part 2: the SessionFile unit tests dependended on a very specific interaction between wipe() and read(), which changed in part 1. This fixes the tests up by making the dependency a bit more clear and explicit. r=Yoric
Comment hidden (mozreview-request)

Comment 36

a year ago
Pushed by mdeboer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9cf13efe8783
Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data. r=Yoric
https://hg.mozilla.org/integration/autoland/rev/3ad66bc5f746
Part 2: the SessionFile unit tests dependended on a very specific interaction between wipe() and read(), which changed in part 1. This fixes the tests up by making the dependency a bit more clear and explicit. r=Yoric

Comment 37

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9cf13efe8783
https://hg.mozilla.org/mozilla-central/rev/3ad66bc5f746
Status: ASSIGNED → RESOLVED
Last Resolved: a year agoa year ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 60
(Reporter)

Comment 38

a year ago
Thank you for solving this bug.

Probably Bug 1426913 should also be marked as solved since they describing the same problem.
(Reporter)

Updated

a year ago
No longer depends on: 1426913

Comment 40

a year ago
I'm seeing this bug on dev edition (59.0b7) frequently, can the patch be uplifted?
(Assignee)

Comment 41

a year ago
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1402267
[User impact if declined]: We've seen reports of sessions that are not saved to disk properly and highly suspect this is the main cause. See e.g. bug 1423305, bug 1423672, bug 1433026 and quite a few others where we are unable to reproduce the issues reported, but they are evidently there.
We have Telemetry data backing that a worker restart is not uncommon at [1], so that's something to consider as well.
[Is this code covered by automated tests?]: Yes, see patch number two.
[Has the fix been verified in Nightly?]: I think comment 38 hints at that, but since we know about [1], it's prudent to fix this properly.
[Needs manual test from QE? If yes, steps to reproduce]: No, since it's very hard to reproduce reliably.
[List of other uplifts needed for the feature/fix]: n/a.
[Is the change risky?]: Minimal risk.
[Why is the change risky/not risky?]: Since the worker restart mechanism added in bug 1402267 was incomplete. 
[String changes made/needed]: n/a.

[1] https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=median!5th-percentile!95th-percentile&cumulative=0&end_date=null&keys=&max_channel_version=nightly%252F59&measure=SCALARS_BROWSER.SESSION.RESTORE.WORKER_RESTART_COUNT&min_channel_version=nightly%252F56&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=null&trim=1&use_submission_date=0
Flags: needinfo?(mdeboer)
Attachment #8944432 - Flags: approval-mozilla-beta?
(Assignee)

Comment 42

a year ago
Comment on attachment 8947543 [details]
Bug 1427007 - Part 2: the SessionFile unit tests dependended on a very specific interaction between wipe() and read(), which changed in part 1. This fixes the tests up by making the dependency a bit more clear and explicit.

Please see commit 41 for the Approval Request comment.
Attachment #8947543 - Flags: approval-mozilla-beta?
Comment on attachment 8944432 [details]
Bug 1427007 - Part 1: When a SessionWorker is restarted, we also need to re-initialize it with the correct SessionFile data.

Fixes a regression in saving session (needed for session restore), new automated tests (yay!), Beta59+
Attachment #8944432 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8947543 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.