Closed Bug 1553417 Opened 5 years ago Closed 5 years ago

Session store corrupted after a shutdown hang crash

Categories

(Firefox :: Session Restore, defect, P1)

67 Branch
defect

Tracking

()

RESOLVED DUPLICATE of bug 1553413
Tracking Status
firefox-esr68 --- fixed
firefox68 - fixed
firefox69 - fixed

People

(Reporter: glandium, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

I don't have large amounts of details yet, and I'll try to get more tomorrow, but with Firefox 67, I get shutdown hang crashes, for some reason, and after that, restarting Firefox leaves me with only half of my tabs, with a session store jsonlz4 file that is 1.8MB instead of >8MB.

That's bad and worst thing is that I've heard this before - but nobody has been able to give me an STR... that would be the absolute most useful thing here - I need to figure out what's stopping the session data collection. JS errors (like you gave me in bug 1553413 - thank you!!), anything in the console, etc.

You can get more console log spew by setting browser.sessionstore.debug to true.

When you can get back to me with more info, I'll bump this to a P1 of course, because then I can take it on.

Priority: -- → P2

So far, I've had it happen 3 times, but now that I think I found how I got shutdown hangs (kind of self inflicted from a custom module that apparently creates problems), I'm failing to reproduce the sessionstore corruption :(

You can get more console log spew by setting browser.sessionstore.debug to true.

Did this, as well as set devtools.console.stdout.chrome to true
Hopefully, if that happens again, even during shutdown, I should have some (persisted) logs (stdout is supposed to be available in the xsession logs).

(stdout is supposed to be available in the xsession logs).

Except it seems it's not :( stderr is.

I should note that bug 1551366 may of may not be related, but happens on a different OS in different circumstances (but I do have observed shutdown crashes before upgrade on the machine that got bug 1551366)

So it happened again to me, not even involving the self inflicted problem from the custom module, but I got absolutely no log out despite all the settings I switched. Because of comment 3, essentially :(

Relatedly, I saved my entire profile before quitting, in case this would happen, because I didn't want to lose my session, then quit firefox, which triggered this bug, restarted, which left me with less tabs (like, this time, I got 15 tabs back... from > 3000), then quit firefox again, and restored my profile. On the next startup, the restored session was definitely not what I had when I was saving my profile, but something much older, like from > 12 hours ago. The history, however, was fresh.

I subsequently tried to reproduce this "wrong restore" issue, by opening new tabs, saving the profile again, killing firefox, opening firefox, quit firefox, restore profile, restart firefox. And that didn't reproduce it. So I'm starting to wonder if that is a separate symptom of this bug.

I'll also note that in all these extra attempts, I've been running firefox from a terminal, and still wasn't able to see anything related to sessionstore show up on stdout. In fact, it seems devtools.console.stdout.chrome only redirects console.log from chrome to stdout, and doesn't cover anything that would use the xpcom interfaces to log.

So perhaps it'd be useful to share that profile with me at this point - if you still have it? I can only promise that I won't look at the data it contains, but merely at code as it steps through, but I'm not sure if that's enough assurance.
(BTW, from the story you're telling here, it seems more likely that some async process is not being resolved, halting the restore. Not a JS error or another thrown exception.)

Flags: needinfo?(mh+mozilla)
See Also: → 1551366

Ok, so I had it happen again, and I now have:

  • a profile backup before quitting
  • a profile backup after quitting
  • a profile backup after restarting

(and restoring the first backup had the same effect of taking me back to a tab list of a few hours ago)

I'd rather not share any of those, but I can certainly look at the differences. What specific files should I focus on?

Flags: needinfo?(mh+mozilla) → needinfo?(mdeboer)

I should note that these two last occurrences were a little different in that when I started Firefox again, I got a about:sessionrestore page (because of crash), which I didn't originally, and that restoring from there gave me a 15-tabs session, instead of > 3000.

The good news is that I can reproduce reliably starting from the second backup, so eventually it should be possible to find out what's wrong about it.

Differing files between the first two backups:

 SiteSecurityServiceState.txt                                                                               |   18 +++++-----
 cookies.sqlite                                                                                             |binary
 cookies.sqlite-wal                                                                                         |binary
 datareporting/aborted-session-ping                                                                         |    1 
 datareporting/archived/2019-05/1559121783202.189bcd8e-9510-4152-b682-b8910464074c.main.jsonlz4             |binary
 favicons.sqlite                                                                                            |binary
 favicons.sqlite-wal                                                                                        |binary
 places.sqlite                                                                                              |binary
 places.sqlite-wal                                                                                          |binary
 prefs.js                                                                                                   |    2 -
 sessionCheckpoints.json                                                                                    |    2 -
 sessionstore-backups/recovery.baklz4                                                                       |binary
 sessionstore-backups/recovery.jsonlz4                                                                      |binary
 storage/default/https+++twitter.com/.metadata-v2                                                           |binary
 storage/default/https+++www.youtube.com/.metadata-v2                                                       |binary
 storage/default/moz-extension+++2f1ef354-2018-4f19-a247-46a7afa99b3c^userContextId=4294967295/.metadata-v2 |binary
 storage/default/moz-extension+++c4ffd6fc-8c3f-41c9-a138-1f01d51c04cb^userContextId=4294967295/.metadata-v2 |binary
 storage/default/moz-extension+++d67428eb-c0e7-4ce9-90be-d760a6448268^userContextId=4294967295/.metadata-v2 |binary
 storage/default/moz-extension+++e0daebab-e175-4d3f-acc4-53b69cd06b71^userContextId=4294967295/.metadata-v2 |binary
 storage/permanent/chrome/idb/1657114595AmcateirvtiSty.sqlite                                               |binary
 webappsstore.sqlite                                                                                        |binary
 webappsstore.sqlite-wal                                                                                    |binary
 xulstore.json                                                                                              |    2 -

Missing in the list is the lock file, because it's not a real file, but it's not there in the second backup, so I don't know why it starts with about:sessionrestore.

The second backup (so, after quitting), has this in sessionCheckpoints.json:

{"profile-after-change":true,"final-ui-startup":true,"sessionstore-windows-restored":true,"quit-application-granted":true,"quit-application":true,"profile-change-net-teardown":true,"profile-change-teardown":true,"profile-before-change":true}

Should I maybe try to instrument SessionStoreInternal._needsRestorePage to figure out why it starts with about:sessionrestore?

So it's because there's no sessionstore-final-state-write-complete in sessionCheckpoints.json that it starts with about:sessionrestore, but it's irrelevant, really, because the session is already busted when reaching SessionStoreInternal._needsRestorePage.

As for the contents of the about:sessionrestore, they're read from sessionstore-backups/recovery.jsonlz4 (because sessionstore.jsonlz4 doesn't exist), which is only 323k in that second backup (after quitting), while it was 8.8M in the first backup (and recovery.baklz4 is that size in the second backup, since that's a copy of that file).

sessionstore-backups/recovery.jsonlz4 is well formed, which is why it's loaded successfully, but only contains data for 72 tabs spread over 4 windows, while sessionstore-backups/recovery.baklz4 has 3006 over 5 windows. So it's a case of something bad happening during shutdown, rather than at startup.

That's also why starting from the first backup gives me a session that is very old: sessionstore-backups/recovery.jsonlz4 in that profile is 8 hours older than the most recent files in the profile.

If I look at my currently running profile (a continuation of that first backup), which I restarted late yesterday, and sessionstore-backups/recovery.jsonlz4 is fresh (and regularly updated).

This suggests that at some point, for some reason the session store stops being updated. Then, whether that's related or not is not clear, but when shutting down, the recovery json is saved incomplete.

Some of this could be related to the other bugs I filed about error messages I was seeing in the browser console (bug 1553412, bug 1553413).

(In reply to Mike Hommey [:glandium] from comment #16)

Some of this could be related to the other bugs I filed about error messages I was seeing in the browser console (bug 1553412, bug 1553413).

I've confirmed half of it: I observed both set of errors in the browser console starting a few minutes ago, and the recovery file stopped being updated. I then proceeded to open some new tabs and manually crash Firefox (kill -11) to see how it would go on a restart, and it went on restoring the state from before I opened the new tabs (well, from when the errors started happening).

I don't know whether I would have lost part of the session had I quit the browser rather than kill it. I'm waiting for the next time I see the errors in the browser console. An interesting observation from that last experience is that it seemed to have started occurring after I opened a new tab via a third party application executing firefox url, but after restarting, doing that didn't trigger anything.

Okay, so I had this happen again. That is, I observed the messages from bug 1553412 and bug 1553413 in the browser console, and double checked that recover.jsonlz4 had stopped being updated (which it had). This time, I quit Firefox normally (Ctrl+Q), after which recovery.jsonlz4 was truncated again.

I restored recovery.jsonlz4 from recovery.baklz4, then restarted Firefox, which then presented me with about:sessionrestore again. I didn't check sessionCheckpoints.json this time, but I assume sessionstore-final-state-write-complete was missing.

While the fix in bug 1553413 would certainly help my case, it feels something should be done about the fact that apparently, something bad happening shouldn't leave the session store in a bad state. For one, recovery.jsonlz4 should still update during the Firefox session, and it also shouldn't be truncated when exiting.

New data. I was able to reproduce with the latest nightly, by doing the following:

  • patched modules/sessionstore/PrivacyFilter.jsm in omni.ja with:
@@ -9,6 +9,8 @@
 ChromeUtils.defineModuleGetter(this, "PrivacyLevel",
   "resource://gre/modules/sessionstore/PrivacyLevel.jsm");
 
+const {Services} = ChromeUtils.import("resource://gre/modules/Services.jsm");
+
 /**
  * A module that provides methods to filter various kinds of data collected
  * from a tab by the current privacy level as set by the user.
@@ -43,6 +45,10 @@
    * @return object
    */
   filterFormData(data) {
+    if (Services.prefs.getBoolPref("browser.foo", false)) {
+        throw new Error("foo");
+    }
+
     // If the given form data object has an associated URL that we are not
     // allowed to store data for, bail out. We explicitly discard data for any
     // children as well even if storing data for those frames would be allowed.
  • started this modified Firefox with my profile
  • Open some tabs, and witness that recovery.jsonlz4 is updated
  • Go to about:config, create the browser.foo boolean pref and set it to true
  • Open Browser console
  • Open some more tabs
  • Wait for "Error: foo" to show up in the Browser console
  • See that recovery.jsonlz4 hasn't changed.
  • Quit Firefox
  • See that recovery.jsonlz4 is now truncated.

Everything up to the last step is reproducible with a fresh profile. Only the truncation doesn't happen, and might be related to something in my profile. I'll try to figure that one out.

Actually, correction. I did lose a tab in an attempt with a fresh profile, creating a few tabs. Which does feel similar to my last experiences with bug 1551366.

Aha. So, the problem only happens when there's an about:config tab (and possibly some other about:something tab. EDIT: this may require the page to contain a form), presumably because that's what it takes to trigger the PrivacyFilter.jsm code. And the tabs that disappear are all the ones after (and including) the about:config tab.
So what seems to be happening is that under normal circumstances, an exception happening during session store prevents the entire store from happening, but on shutdown, the store still happens, with the state built until the exception happened.

Alright, that sounds very interesting, Mike, and something that Alphan needs to check out, because this is likely due to his changes!

Alphan, Mike has given quite a number of hints in comment 19 - 22, which you can use to analyze what kind of value for formdata would make PrivacyFilter.jsm choke.

Flags: needinfo?(mdeboer) → needinfo?(alchen)
Assignee: nobody → alchen
Status: NEW → ASSIGNED
Priority: P2 → P1

It could be related to bug 1549973.
I will check it in detail.

Flags: needinfo?(alchen)

(In reply to Alphan Chen [:alchen] from comment #24)

It could be related to bug 1549973.
I will check it in detail.
Right now, I cannot reproduce the symptom.

Since the symptom happens on 67 and after 67, so it is not related to
Bug 1474130(Firefox 68) and Bug 1549973(Firefox 69).

In 67, what I did is rewrite the sessionStore collect() and restore() from JS to C++.
The related patch I have are the followings:
Bug 1497147 - - Rewrite "mapFrameTree" implementation to C++
Bug 1507286 - Convert content-sessionStore.jsm to C++
Bug 1497146 part 3 - Convert FormData.jsm to C++ [restore() part]
Bug 1497146 part 2 - Convert FormData.jsm to C++ [collect() part]
Bug 1497146 part 1 - Add SessionStoreUtils.webidl
Bug 1497144 part 2 - Get rid of ScrollPosition.jsm
Bug 1497144 - Rewrite DocShellCapabilities.jsm and ScrollPosition.jsm into C++

Update status:

I cannot reproduce the symptom by the steps provided by comment 19.

Then I try to simulate the symptom Mike met.
By adding a "formdata:null" into the TabState purposely, I can see the failed symptom.
After applying the patch of bug 1553413, the symptom disappears.

Hi Mike,
there are somethings that needed your help.

  1. Could you patch "PrivacyFilter.jsm": in filterFormData(data), do an early return when data is null.
    Just the same as the patch of bug 1553413?
  2. Could you share the steps of comment 22?
    Look like your behavior could trigger a tabState data with "formdata:null" and the tab is "about:config".
    I cannot produce a TabState with "formdata:null".
Flags: needinfo?(mh+mozilla)

(In reply to Alphan Chen [:alchen] from comment #26)

Then I try to simulate the symptom Mike met.
By adding a "formdata:null" into the TabState purposely, I can see the failed symptom.

One more thing, I can see the log as bug 1553412 and bug 1553413 described in this case.

Also, it is also weird that we will have "formdata:null" in TabState.

In theory, we will delete the entry if the value is null when updating TabState.
https://searchfox.org/mozilla-central/rev/f8b11433159cbc9cc80500b3e579d767473fa539/browser/components/sessionstore/TabStateCache.jsm#165

(In reply to Alphan Chen [:alchen] from comment #26)

Update status:

I cannot reproduce the symptom by the steps provided by comment 19.

Then I try to simulate the symptom Mike met.
By adding a "formdata:null" into the TabState purposely, I can see the failed symptom.
After applying the patch of bug 1553413, the symptom disappears.

Hi Mike,
there are somethings that needed your help.

  1. Could you patch "PrivacyFilter.jsm": in filterFormData(data), do an early return when data is null.
    Just the same as the patch of bug 1553413?

I already applied the patch from bug 1553413 and I haven't had the problem since.

  1. Could you share the steps of comment 22?
    Look like your behavior could trigger a tabState data with "formdata:null" and the tab is "about:config".
    I cannot produce a TabState with "formdata:null".

comment 22 is a followup to comment 19. The patch to purposefully and pseudo-manually trigger the problem was still involved. I'm still not sure exactly how I ended up with formdata:null in filterFormData, but an about:config tab was how I got in filterFormData at all when I tried to reproduce it.
Now, the problem is that since the session store is never saved when it started happening, the best I have is the last state of my session before bug 1553413 happened, as well as the truncated state after quitting. I suppose the tabs might be in the same order in both stores and maybe that could give me a hint as to what was broken.

In any case, I don't think relying on nothing going awry when getting one tab's state is a good thing, as apparently hitting this or similar issues on two different machines with two very different profiles shows. I guess what bugs me the most is the fact that a truncated state is being saved (and also the discrepancy... why is no state saved at all before quitting?)

Flags: needinfo?(mh+mozilla)

(In reply to Mike Hommey [:glandium] from comment #29)

Now, the problem is that since the session store is never saved when it started happening, the best I have is the last state of my session before bug 1553413 happened, as well as the truncated state after quitting. I suppose the tabs might be in the same order in both stores and maybe that could give me a hint as to what was broken.

In any case, I don't think relying on nothing going awry when getting one tab's state is a good thing, as apparently hitting this or similar issues on two different machines with two very different profiles shows. I guess what bugs me the most is the fact that a truncated state is being saved (and also the discrepancy... why is no state saved at all before quitting?)

Hi Mike,
The current symptom we focused on is why a truncated state is being saved when the error happened.
It is more related to error handling.
My patch is about collect() and restore() rewriting.
Could you have some inputs here?

Assignee: alchen → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(mdeboer)

I don't know what's going on here, but I just lost all my tabs/windows for about the 5th time since FF 67. Extremely frustrating. This time on linux, but it's happened on windows too.

[Tracking Requested - why for this release]:

Losing user data is bad; would be good to track this issue.

I agree that this bug isn't great, but it's also not a new issue in 69 nor does it seem to have very solid steps to consistently reproduce. While I'd be open to accepting a patch for uplift, I don't see much value in tracking either.

Regressed by: 1564412
Keywords: regression

(In reply to Jonathan Nicol from comment #31)

I don't know what's going on here, but I just lost all my tabs/windows for about the 5th time since FF 67. Extremely frustrating. This time on linux, but it's happened on windows too.

Can you please file a new bug so with a precise STR so that we can investigate and track your issue properly?

Mike's problem went away after bug 1553413 was fixed, so it's good to consider this one a dupe. Thanks so much for joining us in the investigation, Mike, you discovered and helped resolve an important issue!

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(mdeboer)
Resolution: --- → DUPLICATE
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.