during session restoration, window tab list is incorrectly written as empty to sessionstore.js

RESOLVED FIXED in Firefox 24

Status

()

defect
--
critical
RESOLVED FIXED
8 years ago
6 years ago

People

(Reporter: brinks7, Assigned: ttaubert)

Tracking

({dataloss, reproducible})

Trunk
Firefox 26
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox24 fixed, firefox25 fixed, firefox26 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

Reporter

Description

8 years ago
User-Agent:       Mozilla/5.0 (Windows NT 5.2; WOW64; rv:2.0b12) Gecko/20100101 Firefox/4.0b12
Build Identifier: Mozilla/5.0 (Windows NT 5.2; WOW64; rv:2.0b12) Gecko/20100101 Firefox/4.0b12

The issue happened to me when upgrading from 4.0b11 to 4.0b12, although the issue probably has nothing to do with the update system.

1. I had 15 windows open in 4.0b11, each having many tabs open.
2. Firefox prompts me to restart to upgrade to 4.0b12 and I accept.
3. The upgrade fails because the Firefox process was, for some reason, still running when the updater started.
4. Firefox 4.0b11 goes on with the restoration of my 15 windows.
5. During restoration I kill the Firefox process so that I can restart it with 4.0b12. Before I killed it the 15 windows were (AFAICT) all open, but the tabs were not fully loaded.
6. When starting Firefox again, the update to 4.0b12 occurs successfully.
7. But only 2 of the 15 windows have been restored!

Expected result: all 15 windows should have been restored.

I immediately made a copy of sessionstore.js and sessionstore.bak. Both of them were valid JSON files (not truncated).

sessionstore.js only held the 2 windows, aside from some _closedWindows that I had closed myself in the previous session and were therefore not relevant.
sessionstore.bak held all 15 windows. Unfortunately, for each of the lost windows the "_closedTabs" and "selected" field were complete and correct, but the "tabs" entry was an empty array. It was usually true that "w.selected > w.tabs.length", which doesn't make much sense.

I could understand data loss on JSON truncation. But here, it was clear that a serialization command was issued even though the session restore data was in an incomplete state. And because the data *looked* valid, it caused a silent override of the old backup file, leading to irremediable data loss.

I'm not familiar with the session store low-level architecture, but here a few solutions I can think of:
1. Never write sessionstore.js if it would lead to write a "tabs" entry containing less tabs than the actual number of shown tabs.
2. If a tab is shown in a window's tab bar, it MUST be written to the "tabs" entry of the corresponding window, even if some data about it will be incomplete (such as the title).
3. Keep an extra backup of sessionstore.bak to allow manual restoration in similar situations.

Solution 1 could be a bit dangerous because it could lead to sessionstore.js never being written if a tab remains in an incomplete state forever.

If feasible, the best way would be to implement both solution 2 and solution 3: solution 2 solves this specific bug transparently for the user, and solution 3 is a safeguard against data loss for any similar bugs that could happen in the future.


I'm flagging this as critical since this is a data loss issue and Firefox has, in my experience, always been very careful about ensuring session data is kept even in the case of crashes or forceful termination.

Reproducible: Didn't try

Updated

8 years ago
Version: unspecified → Trunk
(In reply to comment #0)
> Reproducible: Didn't try

Can you try? This shouldn't be happening.
Reporter

Comment 2

8 years ago
I did some testing (on 4.0b12), and the bug should be easily reproducible.

I wrote a script that continuously watches for writes to sessionstore.js and outputs the number of tabs for each window.

Here is the output for my current Firefox session:

[12, 24, 81, 1, 38, 50, 18, 58, 1, 3, 7]

I restart Firefox, and immediately upon restart sessionstore.js gets written again with the same content:

[12, 24, 81, 1, 38, 50, 18, 58, 1, 3, 7]

Everything is loading properly and Firefox is usable (although still loading some pages), but 15 seconds later the faulty write happens:

[12, 0, 0, 0, 0, 0, 0, 58, 1, 3, 7]

15 seconds later, everything goes back to normal:

[12, 24, 81, 1, 38, 50, 18, 58, 1, 3, 7]


With a dummy session containing 15 windows and 75 tabs per window, each tab pointing to a different URL on a single-threaded dummy HTTP server on localhost, I get the same result:

[75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75]

and in the time frame between 15 and 30 seconds after restart:

[75, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 75]

Of course you may need a different number of tabs depending on the speed of your HTTP server (with 10 tabs per window I don't see the bug).

I assume you already have this kind of tools, but if necessary I can post Python scripts to generate the dummy session or monitor sessionstore.js.


Finally, it may or may not be of any use to you but I noticed that the window's "selected" field gets temporarily decreased by 1 for each window where "tabs" is written as empty (unless "selected" was already equal to 1). Here is a list of the pairs (w.tabs.length, w.selected), in the correct and incorrect cases:

[(75, 1), (75, 2), (75, 3), (75, 4), (75, 5), (75, 6), (75, 7), (75, 8), (75, 9), (75, 10), (75, 11), (75, 12), (75, 13), (75, 14), (75, 15)]

[(75, 1), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (0, 10), (0, 11), (0, 12), (0, 13), (75, 15)]

Notice how (75, 14) is written as (0, 13), for example.
(In reply to comment #2)
> I did some testing (on 4.0b12), and the bug should be easily reproducible.

Thanks for looking at this. I'll look into this more tomorrow.
Hmm, so I haven't been able to reproduce loading 10 windows with 20 tabs each of about:mozilla on OSX. Obviously not exactly your situation, but still a decently sized session.

My best guess so far is a timing issue and that save is just catching at an inopportune time, though I'm pretty sure we guard against that case to avoid this specifically.

I'd like to keep an eye on this. I'll try again on another machine with a different session.
Ah! I did reproduce it on this machine - I had browser.sessionstore.max_concurrent_tabs == 0 in the profile I was testing with and that seems to make a difference.
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 6

7 years ago
Related:

Bug 558425 - Make backups of sessionstore.bak
It has happened to me 2-3 times in the past, firefox is opened with 5-6 windows with multiple tabs each, segfaults for whatever reason, and when I reopen it has lost /some/ windows together with all their tabs.

This just happened today and I immediately copied and looked at sessionstore.{js,bak}. Both files were not truncated, and they had windows entries with "tabs":[], i.e. empty.

I am on Linux with FF17, and brinks7 was on windows, so this seems like a system-independent bug. BTW, brinks7 thanks for the analysis, it was very useful to me! What tool did you use to monitor file accesses on windows?

Comment 8

6 years ago
This is a duplicate of bug 421521, which I reported five (5) years ago.  The bug has not been fixed and is still causing dataloss.

Comment 9

6 years ago
Still exists in Firefox 19.
Keywords: dataloss
OS: Windows Server 2003 → All
Hardware: x86 → All
Duplicate of this bug: 421521
Keywords: reproducible
_getCurrentState() doesn't collect data for windows that haven't been restored, yet:

http://hg.mozilla.org/mozilla-central/file/1fb5d14e8348/browser/components/sessionstore/src/SessionStore.jsm#l2545

It collects data for those windows later from _statesToRestore:

http://hg.mozilla.org/mozilla-central/file/1fb5d14e8348/browser/components/sessionstore/src/SessionStore.jsm#l2562

restoreHistoryPrecursor() however, just wipes that data but the window still seems to be not loaded, yet.

http://hg.mozilla.org/mozilla-central/file/1fb5d14e8348/browser/components/sessionstore/src/SessionStore.jsm#l3007

If we now call _getCurrentState() again those windows have an empty list of tabs. Will need to get a more thorough understanding of that part of the code and will try to come up with a proper fix.
Assignee: nobody → ttaubert
No wait, it's something different.

restoreHistoryPrecursor() sets the window to be restored. It basically removes the cached restore data because we should read everything from the window now.

http://hg.mozilla.org/mozilla-central/file/1fb5d14e8348/browser/components/sessionstore/src/SessionStore.jsm#l3007

The problem is that _getCurrentState() only updates the active window by default. Or dirty ones. The solution is to set the window's cache state to dirty of course as long as we didn't even once collect data for it.

http://hg.mozilla.org/mozilla-central/file/1fb5d14e8348/browser/components/sessionstore/src/SessionStore.jsm#l2547

This makes a lot more sense and can actually happen in the wild. All it needs is that we try to collect state for a *restored* window that hasn't actually finished loading a single tab, yet. The window is safe after something set its state to dirty but before it would just be lost if the browser crashes.
Status: NEW → ASSIGNED
With the window being marked as dirty as soon as it is marked not-loading-anymore I can't reproduce the data loss anymore. I added a test and made sure it fails without the fix applied.
Attachment #786713 - Flags: feedback?(smacleod)
Duplicate of this bug: 696684
Comment on attachment 786713 [details] [diff] [review]
Invalidate windows after they have been restored to ensure their data is collected the first time we save

Review of attachment 786713 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good overall.

::: browser/components/sessionstore/test/browser_637020.js
@@ +1,1 @@
> +/* Any copyright is dedicated to the Public Domain.

Just a nit: might be nice to describe exactly what this is testing somewhere in this file. I could see it being obvious down the road.
Attachment #786713 - Flags: feedback?(smacleod) → feedback+
(In reply to Steven MacLeod [:smacleod] from comment #16)
> in this file. I could see it being obvious down the road.
I could see it NOT being obvious...
(In reply to Steven MacLeod [:smacleod] from comment #16)
> Just a nit: might be nice to describe exactly what this is testing somewhere
> in this file. I could see it being obvious down the road.

Yeah that's a good idea, will do.
Comment on attachment 786963 [details] [diff] [review]
Invalidate windows after they have been restored to ensure their data is collected the first time we save, v2

Review of attachment 786963 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/sessionstore/src/SessionStore.jsm
@@ +3009,5 @@
>        delete this._windows[aWindow.__SSi]._restoring;
> +
> +      // It's important to set the window state to dirty so that
> +      // we collect their data for the first time when saving state.
> +      this._dirtyWindows[aWindow.__SSi] = true;

Side-note: we should improve the documentation of _dirtyWindows. Eventually, we may want to replace it with a WeakMap.

::: browser/components/sessionstore/test/browser_637020.js
@@ +48,5 @@
> +
> +  // The window has now been opened. Check the state that is returned,
> +  // this should come from the cache while the window isn't restored, yet.
> +  info("the window has been opened");
> +  checkWindows();

Just to be sure I understand: that's where things used to fail, isn't it?
Attachment #786963 - Flags: review?(dteller) → review+
(In reply to David Rajchenbach Teller [:Yoric] from comment #20)
> > +
> > +      // It's important to set the window state to dirty so that
> > +      // we collect their data for the first time when saving state.
> > +      this._dirtyWindows[aWindow.__SSi] = true;
> 
> Side-note: we should improve the documentation of _dirtyWindows. Eventually,
> we may want to replace it with a WeakMap.

Yeah, I agree. Didn't want to do it in there, though.

> > +  // The window has now been opened. Check the state that is returned,
> > +  // this should come from the cache while the window isn't restored, yet.
> > +  info("the window has been opened");
> > +  checkWindows();
> 
> Just to be sure I understand: that's where things used to fail, isn't it?

It's actually the check after that. As long as the window data comes out of the _statesToRestore cache everything's fine. But as soon was kick it out of the cache and expect data to be collected from the actual window for the first time, we failed previously to do so without any "external" invalidation.
Thank you everybody in this bug for the great help in providing suggestions on how to reproduce this issue! It really helped us fix it a lot faster, well, after we rediscovered it.
(In reply to Tim Taubert [:ttaubert] from comment #21)
> > Side-note: we should improve the documentation of _dirtyWindows. Eventually,
> > we may want to replace it with a WeakMap.
> 
> Yeah, I agree. Didn't want to do it in there, though.

Filed bug 902721.
Blocks: 690374
Blocks: 764302
https://hg.mozilla.org/mozilla-central/rev/b4dd81559271
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 26
Comment on attachment 786963 [details] [diff] [review]
Invalidate windows after they have been restored to ensure their data is collected the first time we save, v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): ??? Long-standing bug.
User impact if declined: Possible data loss.
Testing completed (on m-c, etc.): Landed on m-c, has test.
Risk to taking this patch (and alternatives if risky): Very low-risk one-line change with a test.
String or IDL/UUID changes made by this patch: None

I definitely think we should uplift this to Beta and Aurora as this fixes a serious data loss issue that occurs in the wild. It's a really simple one-line fix and it even has nice test.
Attachment #786963 - Flags: approval-mozilla-beta?
Attachment #786963 - Flags: approval-mozilla-aurora?
This bug looks a lot like the nasty bug 696684 I have encountered. If the session restore is really fixed now, big thank you Tim and Dave !

(In reply to Adam Porter from comment #8)

> This is a duplicate of bug 421521, which I reported five (5) years ago.  The
> bug has not been fixed and is still causing dataloss.

But the bug 696684 I have encountered has occurred when restoring sessions of several days (that is : the quit or crash did not happen during a session restoration, but way later).
Comment on attachment 786963 [details] [diff] [review]
Invalidate windows after they have been restored to ensure their data is collected the first time we save, v2

Given the low risk and we are still early in the cycle, approving this for uplift.
Attachment #786963 - Flags: approval-mozilla-beta?
Attachment #786963 - Flags: approval-mozilla-beta+
Attachment #786963 - Flags: approval-mozilla-aurora?
Attachment #786963 - Flags: approval-mozilla-aurora+
Keywords: verifyme
Flags: needinfo?(ttaubert)
I'm not able to reproduce this issue on FF 4 beta 12 with STR from comment 4 and browser.sessionstore.max_concurrent_tabs different from 0, all the windows and tabs are restored properly. Also tried with STR from bug 421521 and the result is the same - unable to reproduce on Mac OS X 10.7.5 and Windows 7 x64; the same with FF 24 beta 4 (Build ID: 20130819170952). 
Am I missing something in order to reproduce this issue?
Flags: needinfo?
I needed like 20 windows and 75 tabs per window to reproduce this. Also I had a single-threaded HTTP server with a small HTML file on my machine that all tabs pointed to.

It's really not that easy to reproduce but with the patch applied I couldn't anymore. It also helps that we have a test here that easily reproduces the issue and I ensured the test fails without the fix.
Flags: needinfo?
(In reply to Tim Taubert [:ttaubert] from comment #33)
It also helps that we have a test here that easily reproduces the
> issue and I ensured the test fails without the fix.

Marking as in-testsuite +. 
Since it's not that easy to reproduce this issue and a test is available, is there any need for manual verification or it's enough to do some exploratory testing on session restore to make sure that nothing is broken?
Flags: in-testsuite+
Based on your verifyme keyword, is there anything that has to be tested manually as long as there is a test available?
Flags: needinfo?(bbajaj)
(In reply to Mihai Morar, QA (:MihaiMorar) [needinfo? me for any requests] from comment #35)
> Based on your verifyme keyword, is there anything that has to be tested
> manually as long as there is a test available?

Nope, comment #34 sounds good. Thank you !
Flags: needinfo?(bbajaj)
Keywords: verifyme

Updated

6 years ago
Blocks: 919668

Comment 37

6 years ago
I'm using Firefox 24 and I'm still experiencing this bug--or, at least, the same effect as this bug.  Just a moment ago, Firefox crashed.  When I restarted it, it opened all the tabs I had open at the time of the crash, but all the tabs I had opened in the past hour or so did not load when I clicked on them.  The tab title and favicon are all there, but the URL is empty; reloading the tab goes to about:blank.

However, when I dig the URLs out of sessionstore.js, all of the URLs are listed, even the ones for the tabs that loaded empty.

I don't know if this bug is not fixed, or if this is a new bug caused by fixing this bug, or if this is an old and unfixed bug--but the outcome is exactly the same.

This is very frustrating.  Firefox is supposed to be reliable and not lose open tabs.

Here's the Python script that gets the URLs:

----
#!/usr/bin/python

import json, sys
f = open(sys.argv[1], "r")
jdata = json.loads(f.read())
f.close()
for win in jdata.get("windows"):
    for tab in win.get("tabs"):
        i = tab.get("index") - 1
        print tab.get("entries")[i].get("url")
----
Adam, this looks like a different bug. Could you file a new bug, please?
You need to log in before you can comment on or make changes to this bug.