Closed Bug 1190627 Opened 9 years ago Closed 8 years ago

Fennec lost session's open tabs after restart

Categories

(Firefox for Android Graveyard :: General, defect)

defect
Not set
normal

Tracking

(firefox50 wontfix, firefox51 fixed)

RESOLVED FIXED
Firefox 51
Tracking Status
firefox50 --- wontfix
firefox51 --- fixed

People

(Reporter: bugzilla, Assigned: JanH)

References

Details

Attachments

(4 files, 4 obsolete files)

58 bytes, text/x-review-board-request
sebastian
: review+
Details
58 bytes, text/x-review-board-request
sebastian
: review+
Details
58 bytes, text/x-review-board-request
sebastian
: review+
Details
58 bytes, text/x-review-board-request
sebastian
: review+
Details
Nexus 7 2013 running CyanogenMod 11. Fennec 39.

I'm a bit of a tab hoarder, so I probably had a good ~80 tabs or so in my session. Fennec wasn't loading pages properly; it would make partial progress and then just get stuck.

I pressed Android's "Recents" on-screen navigation button which displayed all of my running apps, swiped right on Fennec to close it, and then restarted Fennec. When I did so, the tabs that were previously available in my session were gone.
Did you check the "Recent tabs" panel? Normally your previous session should show up there.
Flags: needinfo?(aklotz)
Yeah, it was empty, saying "Your recent tabs show up here."
Flags: needinfo?(aklotz)
From IRC (http://logs.glob.uno/?c=mozilla%23mobile&s=20+Apr+2016&e=20+Apr+2016#c582608), this time not after killing Firefox but after a phone crash while Firefox was running:
> 08:40	zeorin	Hi, I had my android phone crash and after restarting
> all of my tabs in firefox were gone. Is there any way I can recover
> the list of tabs that were open?
> My phone is synced to my firefox sync account.
> 10:50 JanH what's your restore tabs setting?
> 10:51 zeorin "Always restore"

I think I've seen this happening myself on my phone, but not for a very long time now.

Regular async session store writes are using OS.File.writeAtomic with a temp file which is then moved into the actual destination file, so a damaged sessionstore.js should be pretty unlikely, but unfortunately not completely impossible. An interrupted sync write (which currenty writes directly to sessionstore.js, something I'm going to address as part of bug 1256277) seems unlikely, as those only happen after an "application-background" notification and in the above case the phone simply crashed (it could have played a role in the original bug report, though).
A solution could be to try creating a "known-good" backup copy of the sessionstore.js at regular intervals, which could then be used as a fallback in the case of a corrupted sessionstore.js.
Assignee: nobody → jh+bugzilla
(In reply to Jan Henning [:JanH] from comment #3)
> I think I've seen this happening myself on my phone, but not for a very long
> time now.

I spoke too soon. I'm waiting for a replacement battery for my phone and in the meantime, it's prone to going into under-voltage and a boot loop long before the battery level is approaching 0 %. I've just had that happen while opening a new tab in Firefox and had a hunch of looking into my profile directory before starting Firefox again - I was unfortunately being greeted by a sessionstore.js of 0 bytes size.
Review commit: https://reviewboard.mozilla.org/r/56094/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/56094/
Attachment #8757718 - Flags: review?(margaret.leibovic)
Attachment #8757719 - Flags: review?(margaret.leibovic)
Attachment #8757720 - Flags: review?(margaret.leibovic)
Attachment #8757721 - Flags: review?(margaret.leibovic)
Attachment #8757722 - Flags: review?(margaret.leibovic)
Attachment #8757723 - Flags: review?(margaret.leibovic)
Attachment #8757724 - Flags: review?(margaret.leibovic)
Currently, sync writes go directly to the destination file, so an interrupted write will leave the session store data in an inconsistent state. To minimise the incidence of this occurring as far as possible, we now mimic the behaviour of atomicWrite when a tmpPath is set and write to a temporary file which is then renamed to the actual destination file after writing has finished.

Review commit: https://reviewboard.mozilla.org/r/56096/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/56096/
Currently, it is possible for a second write (sync or async) to be requested while a previous async write operation is still in progress. This might lead to undesired results if the second write is then completed before the first write, or if a sync write is interfering with a parallel async write operation. The only guard against a second async write is the minimum delay of 2 s between successive async writes enforced in saveStateDelayed(); there is no guard against sync writes.

To avoid data loss when the application is backgrounded, it is desirable to reduce or completely eliminate this minimum delay (see Part 3), therefore we need to devise alternative means of ensuring that successive writes won't interfere with each other.

With this patch, only one save operation is allowed to execute within _saveState() at the same time. Successive calls to _saveState() will be deferred, coalesced into one operation and executed once the previous async write returns from _writeFile()'s promise callback. Sync writes take priority, so if any of the deferred calls to _saveState() is a sync write, the resulting operation will be a sync write, too.

This has the slight drawback that we can't execute truly synchronously within Android's onPause() call if an async state save is already in progress, however this should occur only very occasionally and is probably still more desirable than a possible write collision with a previous async state save.

Review commit: https://reviewboard.mozilla.org/r/56098/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/56098/
When we are backgrounded and Android's onPause() handler runs, we try to synchronously flush out any pending session store to storage. If however some tab events (e.g. tab closing) have been dispatched shortly before the application backgrounding, it is possible that they'll arrive at the session store after the "application-background" event. In this case, we need to process and write them to storage as fast as possible, as we can be killed at any moment now.

Therefore the delay between successive writes is completely abolished while the application is in background.
The minimum delay between a call to saveStateDelayed() and a write operation however is not completely eliminated and instead only reduced to 200 ms, so as to allow for closely following tab events (e.g. closing a tab involves both a TabSelect and TabClose event) to be batched together in one write operation.

Review commit: https://reviewboard.mozilla.org/r/56100/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/56100/
Currently, despite its name sessionstore.bak isn't actually a backup copy, but simply contains the last session if we aren't restoring tabs automatically and is used for powering the "Tabs from last time" section of the Recent Tabs panel.

This patch changes its filename to sessionstore.old, which frees up sessionstore.bak to be used for an actual backup copy of the current session store data.

If we are not restoring tabs automatically, sessionstore.old will be freshly recreated during each app startup by copying from sessionstore.js's contents, whereas if we *are* restoring automatically, any sessionstore.old file older than a day will be expired anyway, therefore no special migration logic is necessary for this change.

Review commit: https://reviewboard.mozilla.org/r/56102/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/56102/
We now do a backup copy of the session store data at regular intervals to guard against interrupted write operations damaging the main session data file.

We don't use writeAtomic()'s backupTo option, because that one works by first moving the old data to the backup file before attempting to write the new data, which might still leave us vulnerable against data loss if Firefox crashes or is otherwise forcibly terminated at precisely that moment.

Review commit: https://reviewboard.mozilla.org/r/56104/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/56104/
For Part 6, do we possibly want some telemetry to see how often people are actually hitting this (this = sessionstore.js gives an exception, but sessionstore.bak is read successfully)?
Flags: needinfo?(margaret.leibovic)
Comment on attachment 8757718 [details]
Bug 1190627 - Part 0 - Fix session store logging logic.

https://reviewboard.mozilla.org/r/56094/#review53696
Attachment #8757718 - Flags: review?(margaret.leibovic) → review+
Comment on attachment 8757719 [details]
Bug 1190627 - Part 1 - Use temp file for synchronous writes, too.

https://reviewboard.mozilla.org/r/56096/#review53700
Attachment #8757719 - Flags: review?(margaret.leibovic) → review+
(In reply to Jan Henning [:JanH] from comment #12)
> For Part 6, do we possibly want some telemetry to see how often people are
> actually hitting this (this = sessionstore.js gives an exception, but
> sessionstore.bak is read successfully)?

Sure, I think this would be good information. My main concern is that someone needs to actually look at this data, but if you will be that person to take a look, then we can add the probe. This sounds like something for a histogram probe.
Flags: needinfo?(margaret.leibovic)
Comment on attachment 8757718 [details]
Bug 1190627 - Part 0 - Fix session store logging logic.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56094/diff/1-2/
Comment on attachment 8757719 [details]
Bug 1190627 - Part 1 - Use temp file for synchronous writes, too.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56096/diff/1-2/
Comment on attachment 8757720 [details]
Bug 1190627 - Part 2 - Defer writes if there's already an async write in progress.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56098/diff/1-2/
Comment on attachment 8757721 [details]
Bug 1190627 - Part 3 - Reduce session store save delays when in background.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56100/diff/1-2/
Comment on attachment 8757722 [details]
Bug 1190627 - Part 1 - Reorganise session store file names.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56102/diff/1-2/
Comment on attachment 8757723 [details]
Bug 1190627 - Part 2 - Do regular session data backups.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56104/diff/1-2/
Comment on attachment 8757724 [details]
Bug 1190627 - Part 3 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56106/diff/1-2/
Since this also involved a rebase, the relevant change is in part 5 (https://reviewboard.mozilla.org/r/56104/diff/1-2/) around lines 672-682 (old) respectively 832-841 (new).
Comment on attachment 8757718 [details]
Bug 1190627 - Part 0 - Fix session store logging logic.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56094/diff/2-3/
Attachment #8757718 - Attachment description: MozReview Request: Bug 1190627 - Part 0 - Fix session store logging logic. r=margaret → Bug 1190627 - Part 0 - Fix session store logging logic.
Attachment #8757719 - Attachment description: MozReview Request: Bug 1190627 - Part 1 - Use temp file for synchronous writes, too. r=margaret → Bug 1190627 - Part 1 - Use temp file for synchronous writes, too.
Attachment #8757720 - Attachment description: MozReview Request: Bug 1190627 - Part 2 - Defer writes if there's already an async write in progress. r=margaret → Bug 1190627 - Part 2 - Defer writes if there's already an async write in progress.
Attachment #8757721 - Attachment description: MozReview Request: Bug 1190627 - Part 3 - Reduce session store save delays when in background. r=margaret → Bug 1190627 - Part 3 - Reduce session store save delays when in background.
Attachment #8757722 - Attachment description: MozReview Request: Bug 1190627 - Part 4 - Reorganise session store file names. r=margaret → Bug 1190627 - Part 4 - Reorganise session store file names.
Attachment #8757723 - Attachment description: MozReview Request: Bug 1190627 - Part 5 - Do regular session data backups. r=margaret → Bug 1190627 - Part 5 - Do regular session data backups.
Attachment #8757724 - Attachment description: MozReview Request: Bug 1190627 - Part 6 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead. r=margaret → Bug 1190627 - Part 6 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead.
Attachment #8758860 - Attachment description: MozReview Request: Bug 1190627 - Part 7 - Add telemetry probe for measuring how often we restore from the backup session copy. r=margaret → Bug 1190627 - Part 7 - Add telemetry probe for measuring how often we restore from the backup session copy.
Comment on attachment 8757719 [details]
Bug 1190627 - Part 1 - Use temp file for synchronous writes, too.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56096/diff/2-3/
Comment on attachment 8757720 [details]
Bug 1190627 - Part 2 - Defer writes if there's already an async write in progress.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56098/diff/2-3/
Comment on attachment 8757721 [details]
Bug 1190627 - Part 3 - Reduce session store save delays when in background.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56100/diff/2-3/
Comment on attachment 8757722 [details]
Bug 1190627 - Part 1 - Reorganise session store file names.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56102/diff/2-3/
Comment on attachment 8757723 [details]
Bug 1190627 - Part 2 - Do regular session data backups.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56104/diff/2-3/
Comment on attachment 8757724 [details]
Bug 1190627 - Part 3 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56106/diff/2-3/
Comment on attachment 8758860 [details]
Bug 1190627 - Part 4 - Add telemetry probe for measuring how often we restore from the backup session copy.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56976/diff/1-2/
Part 4 needs updating now that bug 1251362 has landed, and histogram.json in part 7 has some embarrassing errors on my part which need fixing - updated patch follows shortly once I'm sure this actually builds as expected.
Comment on attachment 8757718 [details]
Bug 1190627 - Part 0 - Fix session store logging logic.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56094/diff/3-4/
Comment on attachment 8757719 [details]
Bug 1190627 - Part 1 - Use temp file for synchronous writes, too.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56096/diff/3-4/
Comment on attachment 8757720 [details]
Bug 1190627 - Part 2 - Defer writes if there's already an async write in progress.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56098/diff/3-4/
Comment on attachment 8757721 [details]
Bug 1190627 - Part 3 - Reduce session store save delays when in background.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56100/diff/3-4/
Comment on attachment 8757722 [details]
Bug 1190627 - Part 1 - Reorganise session store file names.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56102/diff/3-4/
Comment on attachment 8757723 [details]
Bug 1190627 - Part 2 - Do regular session data backups.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56104/diff/3-4/
Comment on attachment 8757724 [details]
Bug 1190627 - Part 3 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56106/diff/3-4/
Comment on attachment 8758860 [details]
Bug 1190627 - Part 4 - Add telemetry probe for measuring how often we restore from the backup session copy.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56976/diff/2-3/
Comment on attachment 8757720 [details]
Bug 1190627 - Part 2 - Defer writes if there's already an async write in progress.

https://reviewboard.mozilla.org/r/56098/#review55920

I would be interested to see how desktop handles this, although lifecycle constraints are different there, so the design there may not apply well to mobile. Probably a reason we have a lot of the problems we do is that this originally stemmed from desktop logic.
Attachment #8757720 - Flags: review?(margaret.leibovic) → review+
Comment on attachment 8757721 [details]
Bug 1190627 - Part 3 - Reduce session store save delays when in background.

https://reviewboard.mozilla.org/r/56100/#review55924
Attachment #8757721 - Flags: review?(margaret.leibovic) → review+
https://reviewboard.mozilla.org/r/56102/#review55932

::: mobile/android/base/java/org/mozilla/gecko/home/RecentTabsAdapter.java:132
(Diff revision 4)
>          GeckoProfile.get(context).waitForOldSessionDataProcessing();
>  
>          ThreadUtils.postToBackgroundThread(new Runnable() {
>              @Override
>              public void run() {
> -                final String jsonString = GeckoProfile.get(context).readSessionFile(true);
> +                final String jsonString = GeckoProfile.get(context).readPreviousSessionFile();

With this change, the only consumer of `readSessionFile` calls it with `false` as a parameter, so does anything actually read the backup file? Can we get rid of that method parameter?
https://reviewboard.mozilla.org/r/56104/#review55934

::: mobile/android/components/SessionStore.js:840
(Diff revision 4)
> +        this._sessionFileBackup.remove(false);
> +      }
> +
> +      log("_saveState() backing up session data");
> +      this._sessionFile.copyTo(null, this._sessionFileBackup.leafName);
> +      this._lastBackupTime = Date.now();

This looks like a synchronous file write. Will this block the JS thread?
https://reviewboard.mozilla.org/r/56106/#review55936

::: mobile/android/base/java/org/mozilla/gecko/GeckoApp.java:1302
(Diff revision 4)
> -                        Log.e(LOGTAG, "An error occurred during restore", e);
> +                        Log.e(LOGTAG, "An error occurred during restore, switching to backup file", e);
> +                        try {
> +                            restoreMessage = restoreSessionTabs(isExternalURL, true);
> +                        } catch (SessionRestoreException ex) {
> +                            // If this fails, too, do a normal startup.
> +                            Log.e(LOGTAG, "An error occurred during restore", ex);

How often is this a problem in practice? Can you first add a telemetry probe here to see how often users experience this? I worry about the extra cost of always writing a backup file, and if this is a very rare edge case, I don't know that it's worth it.

If you file a separate bug/patch for the telemetry probe, we can uplift it to beta to get more data. You could use a histogram probe for this to compare the number of succesful restores to failed restores.
https://reviewboard.mozilla.org/r/56976/#review55940

::: mobile/android/base/java/org/mozilla/gecko/GeckoApp.java:1300
(Diff revision 3)
>                      } catch (SessionRestoreException e) {
>                          // If restore failed, try reading the backup file.
>                          Log.e(LOGTAG, "An error occurred during restore, switching to backup file", e);
>                          try {
>                              restoreMessage = restoreSessionTabs(isExternalURL, true);
> +                            Telemetry.addToHistogram("FENNEC_SESSIONSTORE_RESTORING_FROM_BACKUP", 1);

Oops, didn't see this commit before my last comment!

Let's create a separate probe that we land first and uplift, before landing the rest of this bug.
https://reviewboard.mozilla.org/r/56106/#review55936

> How often is this a problem in practice? Can you first add a telemetry probe here to see how often users experience this? I worry about the extra cost of always writing a backup file, and if this is a very rare edge case, I don't know that it's worth it.
> 
> If you file a separate bug/patch for the telemetry probe, we can uplift it to beta to get more data. You could use a histogram probe for this to compare the number of succesful restores to failed restores.

Speaking anecdotally, there is this bug report, somebody [was complaining on IRC](http://logs.glob.uno/?c=mozilla%23mobile&s=20+Apr+2016&e=20+Apr+2016#c582608) about this a few weeks ago, somebody else was complaining in a support forum about loosing all open tabs and I personally have definitively hit this at least twice in the last few weeks - and possibly previously as well.

If you're still worried, I could dial the backup interval back down to e.g. 5 minutes until we get some more telemetry data. But I'll check whether some telemetry can be done easily without having a backup file in place.
https://reviewboard.mozilla.org/r/56104/#review55934

> This looks like a synchronous file write. Will this block the JS thread?

Unless I'm mistaken, it is. My thinking behind this was that I want to be sure that the backup file is in place before doing any write operations on the main session store file.

My initial solution was to simply use `writeAtomic`'s `backupTo` option, but that works by simply renaming the original file before writing the new data, which isn't really different from the current procedure of writing to a temp file and still leaves us vulnerable to data loss if the renaming is interrupted. So instead, I've opted for the above solution.
https://reviewboard.mozilla.org/r/56102/#review55932

> With this change, the only consumer of `readSessionFile` calls it with `false` as a parameter, so does anything actually read the backup file? Can we get rid of that method parameter?

This is going to get used again in [Part 6](https://reviewboard.mozilla.org/r/56106/diff/4#index_header). While it's a good idea that each commit should be able to stand on it's own, ripping this out here only to be reintroducing it again in Part 6 feels a bit pointless.
https://reviewboard.mozilla.org/r/56976/#review55940

> Oops, didn't see this commit before my last comment!
> 
> Let's create a separate probe that we land first and uplift, before landing the rest of this bug.

I just need to check that we don't regularly hit this exception for users who are clearing their history or something like that.
Depends on: 1284013
Depends on: 1284017
Comment on attachment 8757724 [details]
Bug 1190627 - Part 3 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead.

Cancelling pending a slight rework and telemetry results.
Attachment #8757724 - Flags: review?(margaret.leibovic)
Attachment #8758860 - Flags: review?(margaret.leibovic)
Attachment #8757723 - Flags: review?(margaret.leibovic)
Attachment #8757722 - Flags: review?(margaret.leibovic)
https://reviewboard.mozilla.org/r/56104/#review55934

> Unless I'm mistaken, it is. My thinking behind this was that I want to be sure that the backup file is in place before doing any write operations on the main session store file.
> 
> My initial solution was to simply use `writeAtomic`'s `backupTo` option, but that works by simply renaming the original file before writing the new data, which isn't really different from the current procedure of writing to a temp file and still leaves us vulnerable to data loss if the renaming is interrupted. So instead, I've opted for the above solution.

Hmm, [according to the desktop session store](https://dxr.mozilla.org/mozilla-central/rev/6b65dd49d4f045c0a9753ce60bdb4b7b4aaedcf8/browser/components/sessionstore/SessionFile.jsm#25), the OS.File API guarantees that multiple calls regarding the same file are executed sequentially and in order, so switching to OS.File.copy() wouldn't be a problem in combination with the asynchronous file writes.
There still remains the problem of the synchronous write operations during backgrounding/quitting, though - we can sidestep some of the issue by simply not attempting to backup if aAsync = false, but even if we do, there's nothing stopping a following call attempting a synchronous write while an async copy and subsequent write operation might still be in progress.

Hence doing a synchronous copy operation on the main thread isn't nice, but at least it ensures that no other write operation can be started while the copy is still in progress.
https://reviewboard.mozilla.org/r/56106/#review55936

> Speaking anecdotally, there is this bug report, somebody [was complaining on IRC](http://logs.glob.uno/?c=mozilla%23mobile&s=20+Apr+2016&e=20+Apr+2016#c582608) about this a few weeks ago, somebody else was complaining in a support forum about loosing all open tabs and I personally have definitively hit this at least twice in the last few weeks - and possibly previously as well.
> 
> If you're still worried, I could dial the backup interval back down to e.g. 5 minutes until we get some more telemetry data. But I'll check whether some telemetry can be done easily without having a backup file in place.

According to the telemetry data we've got so far, we hit what is presumably a damaged session store file on 0.1 - 0.2 % of our startups. There might be a slight chance of some false positives (I've only now realised that it's possible to close all normal tabs if you've got at least one private tab open), plus for proper interpretation it would be interesting to know
- How many of our users are actually automatically restoring their tabs on startup (should be larger now that this is the default setting)?
- How many app startups (and therefore also app kills, where the actual problem lies) the average user goes through per day?

In any case it means that over the course of several weeks, we most probably end up with a non-trivial chance of encountering a damaged session file, which means loosing all the user's open tabs - and besides, we've also received some more anecdotal reports of this happening since the last activity on this bug.
Comment on attachment 8757722 [details]
Bug 1190627 - Part 1 - Reorganise session store file names.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56102/diff/4-5/
Attachment #8757722 - Attachment description: Bug 1190627 - Part 4 - Reorganise session store file names. → Bug 1190627 - Part 1 - Reorganise session store file names.
Attachment #8757723 - Attachment description: Bug 1190627 - Part 5 - Do regular session data backups. → Bug 1190627 - Part 2 - Do regular session data backups.
Attachment #8757724 - Attachment description: Bug 1190627 - Part 6 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead. → Bug 1190627 - Part 3 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead.
Attachment #8758860 - Attachment description: Bug 1190627 - Part 7 - Add telemetry probe for measuring how often we restore from the backup session copy. → Bug 1190627 - Part 4 - Add telemetry probe for measuring how often we restore from the backup session copy.
Attachment #8757722 - Flags: review?(s.kaspari)
Attachment #8757723 - Flags: review?(s.kaspari)
Attachment #8757724 - Flags: review?(s.kaspari)
Attachment #8758860 - Flags: review?(s.kaspari)
Comment on attachment 8757723 [details]
Bug 1190627 - Part 2 - Do regular session data backups.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56104/diff/4-5/
Comment on attachment 8757724 [details]
Bug 1190627 - Part 3 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56106/diff/4-5/
Comment on attachment 8758860 [details]
Bug 1190627 - Part 4 - Add telemetry probe for measuring how often we restore from the backup session copy.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/56976/diff/3-4/
Attachment #8757718 - Attachment is obsolete: true
Attachment #8757719 - Attachment is obsolete: true
Attachment #8757720 - Attachment is obsolete: true
Attachment #8757721 - Attachment is obsolete: true
Attachment #8757720 - Flags: review+
Attachment #8757721 - Flags: review+
@Jan: I read the patches but I have some questions before continuing:

* Is it a problem if there's still an old sessionstore.bak from before this patch?
* Can we guarantee that the backup is close to the (broken) last session store? -> Can it happen that we restore an old backup that has significantly changed since saving the last session? I'm wondering if restoring an old state is always better than no state.
Flags: needinfo?(jh+bugzilla)
(In reply to Sebastian Kaspari (:sebastian) from comment #59)
> @Jan: I read the patches but I have some questions before continuing:
> 
> * Is it a problem if there's still an old sessionstore.bak from before this
> patch?

That depends how much of a problem it would be if after encoutering a damaged sessionstore.js (currently 0.13 % rate per session on Beta, slightly higher on Aurora/Nightly) the next-to-last session got restored instead of the true backup. This would mostly be a problem for users that are never restoring, because I think we turn on session restoring for the first startup after an update, in which case sessionstore.bak will contain the next-to-last session as it is currently the case. So it would be interesting to know what percentage of people is using which tab restore setting now that the default has been flipped.

For users who are always restoring their tabs automatically, because of bug 1266339 this is only a problem if they've
a) switched their tab restore setting from "Never" to "Always",
b) experienced multiple successive crashes, or
c) closed all their tabs during the last session, so the session store file contained only history-less about:home tabs

*and* the first startup after updating to a build containing this patch happened within 24 hours or else was the first startup after more than 24 hours had passed since the last occurrence of one the events mentioned above (or else they're directly updating from Firefox 48 or earlier, meaning they skip any builds that contain bug 1266339 but not this one) - *and* of course we need to have a defective sessionstore.js at the same time, too.

Additionally, case c) doesn't really matter because in that case the "empty" session file ends up as sessionstore.bak, in which trying to restore from it doesn't cause anything interesting to happen anyway.

So here it's only cases a) and b) that are really relevant. Considering that in February (the last month for which the old UI telemetry dashboard has data available - is there anything publicly available for the new UI telemetry data anywhere?), while tab restoring was still off by default, apparently only ~ 3 % of users had bothered to switch that setting, so I don't think that there are dramatically more people even now with the reversed default - and here we're only counting people who have actively switched from "Never" to "Always" at just the right/wrong time.
Regarding b), it might be interesting to have some telemetry how often we temporarily turn off session restoring after successive crashes, but with a crash rate of ~ 1.3 % per day and install it shouldn't be too likely, either.

> * Can we guarantee that the backup is close to the (broken) last session
> store? -> Can it happen that we restore an old backup that has significantly
> changed since saving the last session? I'm wondering if restoring an old
> state is always better than no state.

Not absolutely, but pretty close I think. The tabs opened during startup (even if it's just the default single about:home tab when we aren't restoring anything of interest) automatically queue the first session store write soon (10 s) after Gecko startup, and once that has gone through and a sessionstore.js is in place for certain, the next session store write (after clicking a link, opening or closing a tab, editing an input form, or even just scrolling/zooming) will create an up-to-date backup file.
After that, we always update the backup file if more than two minutes have passed since the last backup [1] - while you could still open/close quite a number of tabs in that time span, I wouldn't call that terribly out of date.

[1] Unless by chance we have ended up with multiple writes in flight at the same time, but in that case the backup gets updated at the next possible write, which because of scroll position recording should happen pretty soon as long as the user is actively browsing.
Flags: needinfo?(jh+bugzilla)
Comment on attachment 8757722 [details]
Bug 1190627 - Part 1 - Reorganise session store file names.

https://reviewboard.mozilla.org/r/56102/#review69396
Attachment #8757722 - Flags: review?(s.kaspari) → review+
Comment on attachment 8757723 [details]
Bug 1190627 - Part 2 - Do regular session data backups.

https://reviewboard.mozilla.org/r/56104/#review69398
Attachment #8757723 - Flags: review?(s.kaspari) → review+
Comment on attachment 8757724 [details]
Bug 1190627 - Part 3 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead.

https://reviewboard.mozilla.org/r/56106/#review69400

::: mobile/android/base/java/org/mozilla/gecko/GeckoApp.java:1404
(Diff revision 5)
> +                            try {
> +                                restoreMessage = restoreSessionTabs(isExternalURL, true);
> +                            } catch (SessionRestoreException ex) {
> +                                // If this fails, too, do a normal startup.
> +                                Log.e(LOGTAG, "An error occurred during restore", ex);
> -                        mShouldRestore = false;
> +                                mShouldRestore = false;
> -                    }
> +                            }

Does it make sense to add additional telemetry here? To see how often we recovered a session from a backup and how often we failed to read the session file and the backup?
Comment on attachment 8757724 [details]
Bug 1190627 - Part 3 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead.

https://reviewboard.mozilla.org/r/56106/#review69402
Attachment #8757724 - Flags: review?(s.kaspari) → review+
Comment on attachment 8758860 [details]
Bug 1190627 - Part 4 - Add telemetry probe for measuring how often we restore from the backup session copy.

https://reviewboard.mozilla.org/r/56976/#review69406

Oh, here it is.. :)
Attachment #8758860 - Flags: review?(s.kaspari) → review+
Just rebasing to latest central to be on the safe side, no changes otherwise.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=04b05ba3f7ef
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/a2878a534a27
Part 1 - Reorganise session store file names. r=sebastian
https://hg.mozilla.org/integration/autoland/rev/98c09af699dc
Part 2 - Do regular session data backups. r=sebastian
https://hg.mozilla.org/integration/autoland/rev/5a90d6e4209b
Part 3 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead. r=sebastian
https://hg.mozilla.org/integration/autoland/rev/21a57c18d7f3
Part 4 - Add telemetry probe for measuring how often we restore from the backup session copy. r=sebastian
Keywords: checkin-needed
Looking at the telemetry, the rate for instances of sessionstore.js the are present but unreadable continues hovering at around 0.15 - 0.2 % per browsing session across all channels for which we have data. [1]
The backup file however seems to get used somewhat more often than that - during 0.6 % of all startups the tabs are restored from the backup copy. [2]

We're only switching to the backup path if we encounter a SessionRestoreException in the first place, so I think the biggest uncertainty potential for false positives is how reliable our distinction between damaged and deliberately empty files is - although I'm fairly confident that the approach I've chosen is sound. If so, that would indicate that an interrupted write can also lead to sessionstore.js completely disappearing from the file system and that this scenario is actually twice as common as that of a "merely" damaged sessionstore.js (although damaged in this context can also mean that the file still exists on the file system, but completely empty, i.e. 0 bytes size).

[1] https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-09-04&keys=__none__!__none__!__none__&max_channel_version=nightly%252F51&measure=FENNEC_SESSIONSTORE_DAMAGED_SESSION_FILE&min_channel_version=null&product=Fennec&sanitize=1&sort_keys=submissions&start_date=2016-08-01&table=0&trim=1&use_submission_date=0
[2] https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-09-04&keys=__none__!__none__!__none__&max_channel_version=nightly%252F51&measure=FENNEC_SESSIONSTORE_RESTORING_FROM_BACKUP&min_channel_version=null&product=Fennec&sanitize=1&sort_keys=submissions&start_date=2016-08-18&table=0&trim=1&use_submission_date=0
Comment on attachment 8757722 [details]
Bug 1190627 - Part 1 - Reorganise session store file names.

Approval Request Comment
[Feature/regressing bug #]: mobile session restoring
[User impact if declined]: All open tabs might be lost if Firefox is terminated in an untimely manner (e.g. when Firefox or the phone crashes)
[Describe test coverage new/current, TreeHerder]: manual, 2½ weeks on Nightly with some monitoring via telemetry
[Risks and why]: Mid to lowish - the session store code has potential for introducing subtle failures, however most of the patches themselves were relatively straightforward - do regular backup copies and then use them if a failure is detected on startup. The biggest risk is probably reliably distinguishing between a session store file that is deliberately empty and one that is actually damaged, meaning we might either restore the backup when we shouldn't (the backup copy would be a few minutes out of date), or we don't restore the backup even though we should (we would be no worse off than today). However the telemetry numbers so far seem to indicate a reasonable rate of backup file usage.
[String/UUID change made/needed]: none
Attachment #8757722 - Flags: approval-mozilla-aurora?
Comment on attachment 8757723 [details]
Bug 1190627 - Part 2 - Do regular session data backups.

see above
Attachment #8757723 - Flags: approval-mozilla-aurora?
Comment on attachment 8757724 [details]
Bug 1190627 - Part 3 - If an error occurs parsing the regular session store data file on startup, attempt to read the tabs from the backup copy instead.

see above
Attachment #8757724 - Flags: approval-mozilla-aurora?
Comment on attachment 8758860 [details]
Bug 1190627 - Part 4 - Add telemetry probe for measuring how often we restore from the backup session copy.

see above
Attachment #8758860 - Flags: approval-mozilla-aurora?
Hi Sebastian, I'd like to get a second opinion on whether the risk associated with this change is worth it for uplifting to Aurora50. What do you think? It's unclear whether this is a recent regression. Is that true? If not, I'd prefer to let this one bake for longer in Nightly and ride the 51 train to release.
Flags: needinfo?(s.kaspari)
(In reply to Ritu Kothari (:ritu) from comment #78)
> Hi Sebastian, I'd like to get a second opinion on whether the risk
> associated with this change is worth it for uplifting to Aurora50. What do
> you think? It's unclear whether this is a recent regression. Is that true?
> If not, I'd prefer to let this one bake for longer in Nightly and ride the
> 51 train to release.

Considering the age of the bug I don't think we lose much if we just let it ride the trains. Merge day is next week anyways, so we can start looking at Aurora telemetry soon.
Flags: needinfo?(s.kaspari)
Comment on attachment 8757722 [details]
Bug 1190627 - Part 1 - Reorganise session store file names.

Given the risk and the fact that this was potentially triggered during a session on Fennec with 80 tabs open (not a common scenario IMO), I'd prefer to let this fix ride the 51 train.
Attachment #8757722 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Attachment #8757723 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Attachment #8757724 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Attachment #8758860 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Depends on: 1345534
Hello,

Is the qe-verify+ flag still valid?

Thank you!
Flags: needinfo?(rkothari)
Redirect to Ioana.
Flags: needinfo?(rkothari) → needinfo?(ioana.chiorean)
This was not reproduced in the last couple of years. I think we can take it out - For sure was verified at landing but not followed up in the bug. 
Thanks Mira and Ritu for the follow up!
Flags: qe-verify-
Flags: qe-verify+
Flags: needinfo?(ioana.chiorean)
Flags: qe-verify-
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: