Closed Bug 1260461 Opened 4 years ago Closed 4 years ago

Firefox fails to exit on system shutdown

Categories

(Firefox :: Session Restore, defect)

45 Branch
x86_64
Windows
defect
Not set

Tracking

()

VERIFIED FIXED
Tracking Status
firefox45 --- affected
firefox46 + verified
firefox47 + verified
firefox48 + verified
firefox-esr38 --- unaffected
firefox-esr45 46+ verified

People

(Reporter: dqeswn, Assigned: mconley)

References

Details

(Keywords: regression)

Attachments

(1 file)

Hi!

After every shutdown where firefox was running I get a crash session shown in session manager the next time I start firefox.
It happens since FF45, so now all channels with e10s disabled.

I don't get a prompt to kill firefox or go back to windows as I normally do when programs fail to exit in time.
Please report your issue to https://support.mozilla.org/questions/firefox instead, or follow https://developer.mozilla.org/docs/Mozilla/QA/Bug_writing_guidelines to file a new bug.
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
I can reproduce the problem

Steps To Reproduce:
0: Create New profile and disabled e10s.
1. Start Firefox and Open several tabs
2. Logout Windows
3. Login Windows
4. Start Firefox

Actual Results:
Firefox has not been closed properly.
I.e, the following embarrassing tab will open.  (Or Sometimes, automatically restored the previous opened tabs.)

----------------------------
Well, this is embarrassing.

Nightly is having trouble recovering your windows and tabs. This is usually caused by a recently opened web page.

*snip*

----------------------------


Expected Results:
Firefox should be closed properly.
So, there should be about:home tab only (Firefox should honor setting of "When Firefox starts:" in about:preferences#general)


Regression window:
https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=45708ae19e28b92403549a6e42324f6cfdcb8f67&tochange=15ded62d213cb2e01feebcd21af52be74a535983

Regressed by: Bug 1177310
Blocks: 1177310
Status: RESOLVED → REOPENED
Component: Untriaged → Session Restore
Ever confirmed: true
Flags: needinfo?(wmccloskey)
Flags: needinfo?(mconley)
Flags: needinfo?(dteller)
Keywords: regression
OS: Windows 8.1 → Windows
Resolution: INCOMPLETE → ---
Version: unspecified → 45 Branch
Status: REOPENED → NEW
(In reply to Alice0775 White from comment #2)
> I can reproduce the problem
> 
> Steps To Reproduce:
> 0: Create New profile and disabled e10s.

It will occur the same problem with enabled e10s.
And confirmed same regression window.


This happens regardless of e10s on/off.
So we handle the WM_ENDSESSION message from Windows which fires the quit-application-granted notification, which should do the work of having SessionStore flush all windows and save the state to disk.

Something must be going wrong in there. Perhaps Windows is getting impatient and killing our process because we're taking so long.

I've found a tool that should help me debug this: http://stackoverflow.com/questions/520910/how-to-simulate-windows-shutdown-for-debugging
Tracking, regression from 45. I wonder if this may also account for some of our shutdown hang/crash issues.
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #4)

> Something must be going wrong in there. Perhaps Windows is getting impatient
> and killing our process because we're taking so long.

That shouldn't be the case. It doesn't seem to take a long time for this to happen. Also windows shows a prompt kill apps (or go back to the desktop) that are taking too long. Some other programs, sometimes appear here for various reasons, but not firefox. Not since 45.
> That shouldn't be the case. It doesn't seem to take a long time for this to happen.

That's interesting. How long does it seem to take? If it's < 1minute, it's not an AsyncShutdown crash, so it's probably a data corruption somewhere else (SessionCheckpoints?)
Flags: needinfo?(dteller)
Flags: needinfo?(dqeswn)
Flags: needinfo?(alice0775)
within 10sec.
Flags: needinfo?(alice0775)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #4)
> 
> I've found a tool that should help me debug this:
> http://stackoverflow.com/questions/520910/how-to-simulate-windows-shutdown-
> for-debugging

The tool does not work. I cannot reproduce if I used the tool.
In that case, this probably means that there is a problem with sessionCheckpoints.json.
(In reply to David Rajchenbach-Teller [:Yoric] (please use "needinfo") from comment #7)
> That's interesting. How long does it seem to take? If it's < 1minute, it's
> not an AsyncShutdown crash, so it's probably a data corruption somewhere
> else (SessionCheckpoints?)

It's usually about 10-20 seconds here, depending on how much stuff I have running.
Flags: needinfo?(dqeswn)
I debugged this a little bit. We finish notifying quit-application, but we never finish profile-change-net-teardown. Not sure why. Anyway, that means we don't get to profile-before-change, which I think is when we write the session to disk.
Flags: needinfo?(wmccloskey)
Debugged a bit more. We spin the event loop a bunch in the NotifyObserver calls and some event that we process in a nested event loop is causing us to exit early. Not sure why or how. It's tricky to debug because the only way I can reproduce is by actually logging out, so I can't use a debugger afaik.
According to http://stackoverflow.com/questions/520910/how-to-simulate-windows-shutdown-for-debugging, there's some Microsoft document somewhere that states:

"
    Restart Manager shutdown messages are:

    a. WM_QUERYENDSESSION with LPARAM = ENDSESSION_CLOSEAPP(0x1): GUI applications must respond (TRUE) immediately to prepare for a restart.

    b. WM_ENDSESSION with LPARAM = ENDSESSION_CLOSEAPP(0x1): The application must shutdown within 5 seconds (20 seconds for services).

    c. CTRL_SHUTDOWN_EVENT: Console applications must shutdown immediately.
"

Here's where we handle WM_ENDSESSION:

https://hg.mozilla.org/mozilla-central/file/494289c72ba3/widget/windows/nsWindow.cpp#l4784

I wonder if the async flushing is taking too much time. It's completing, but that by the time we reach something like profile-change-net-teardown, Windows has had enough and just terminates us.

Are we able to detect when Windows terminates us? Does Windows log that information somewhere?
Flags: needinfo?(mconley) → needinfo?(jmathies)
"add an AsyncShutdown blocker that'll spin the event loop"

This is likely the problem, processing native events while handling WM_ENDSESSION triggers something in Windows that terminates the process. The nsWindow code was designed to avoid spinning an event loop during this.

What we probably want to do here is rely on the existing cached state we have in session store.
Flags: needinfo?(jmathies)
(In reply to Jim Mathies [:jimm] from comment #15)
> "add an AsyncShutdown blocker that'll spin the event loop"
> 
> This is likely the problem, processing native events while handling
> WM_ENDSESSION triggers something in Windows that terminates the process. The
> nsWindow code was designed to avoid spinning an event loop during this.
> 
> What we probably want to do here is rely on the existing cached state we
> have in session store.

Yeah, I think we can do that for SessionStore - though this is going to be a problem if anybody ever uses AsyncShutdown for any of the observer notifications being sent out from WM_ENDSESSION. It doesn't sound like AsyncShutdown and WM_ENDSESSION are generally compatible in the log-off case, unless we can somehow avoid consuming UI events, but just spin some loop that let's us continue executing the AsyncShutdown Blockers.

I think skipping the async shutdown for SessionStore on quit-application-granted and just using the last snapshot in the log-off case should be pretty straight forward. I'll try putting something together.
Assignee: nobody → mconley
(In reply to Jim Mathies [:jimm] from comment #15)
> "add an AsyncShutdown blocker that'll spin the event loop"
> 
> This is likely the problem, processing native events while handling
> WM_ENDSESSION triggers something in Windows that terminates the process.

Do you know anything more about this? There's a lot of code that spins the event loop during shutdown.

> What we probably want to do here is rely on the existing cached state we
> have in session store.

I think that could lead to data loss.

> I wonder if the async flushing is taking too much time. It's completing, but that by the
> time we reach something like profile-change-net-teardown, Windows has had enough and just
> terminates us.

I'm pretty sure it's not that. We get killed after ~100ms in most cases. It really seems to be about spinning the event loop.

I wonder if we could avoid processing native events but still process Gecko events.
(In reply to Bill McCloskey (:billm) from comment #17)
> (In reply to Jim Mathies [:jimm] from comment #15)
> > "add an AsyncShutdown blocker that'll spin the event loop"
> > 
> > This is likely the problem, processing native events while handling
> > WM_ENDSESSION triggers something in Windows that terminates the process.
> 
> Do you know anything more about this? There's a lot of code that spins the
> event loop during shutdown.
> 
> > What we probably want to do here is rely on the existing cached state we
> > have in session store.
> 
> I think that could lead to data loss.
> 

With the STR in this bug, I think we've already got data loss.

So perhaps using the existing cached state would be a good near-term solution to avoid having users see the "Well, that's embarrassing" page.

For a longer term solution, see below...

> I wonder if we could avoid processing native events but still process Gecko
> events.

Here's where AsyncShutdown spins the event loop:

https://hg.mozilla.org/mozilla-central/file/494289c72ba3/toolkit/components/asyncshutdown/AsyncShutdown.jsm#l551

Snooping around in xpcom/threads/, I can't find a great way of getting us to not consume native events. I think we'd have to build one in, and mucking about in how our event loops work doesn't sound upliftable if we're hoping to fix this.

I don't know too too much about how the Gecko event loop interacts with the native event loop, beyond what I've seen when dealing with some BHR stuff, but yeah - if we do what billm suggests, and built in some infrastructure that allowed AsyncShutdown to spin only Gecko events in this log-off case (which would, presumably, allow the JS of the blockers to execute), that might be the best long term solution.
Blocking win32 messages doesn't fix the problem. I wonder if we're blocking an event that causes us to close early. Some of the win32 messages that come in are WM_DESTROY, so we seem to be closing our own windows. I wonder if we're trying to quit from within our own shutdown code.
I cannot reproduce the bug with the build in comment 21. Can you, avada?

Still uncertain if this is the approach we want to go with - but it's very simple.
Flags: needinfo?(dqeswn)
It seems to have fixed it for this circumstance.

However I was hoping it would also fix bug 1260462, the e10s counterpart of this bug.
Which it only partially did. I don't get crashed content processes, but FF didn't exit properly every time. There was one logout when the windows alert didn't appear, and I got a crashed session when I started FF again.


Also I'd like to add that I normally use the Session Manager addon. With it the problem is aggravated. For the e10s=off case it seems to work normally now. However it always shows a crashed session at startup after a logout/restart. (I don't get content processes crashing here anymore, so that's good.) Historically (before ff45) this issue also never came up while I was using Session Manager.
Flags: needinfo?(dqeswn)
(In reply to avada from comment #23)
> It seems to have fixed it for this circumstance.
> 
> However I was hoping it would also fix bug 1260462, the e10s counterpart of
> this bug.
> Which it only partially did. I don't get crashed content processes, but FF
> didn't exit properly every time. There was one logout when the windows alert
> didn't appear, and I got a crashed session when I started FF again.
> 

Alright, I've followed up with a question in bug 1260462.

> 
> Also I'd like to add that I normally use the Session Manager addon. With it
> the problem is aggravated. For the e10s=off case it seems to work normally
> now. However it always shows a crashed session at startup after a
> logout/restart.

That doesn't sound like it's working normally at all. Do you mean to say that with non-e10s, with the patch in this bug, you don't see the crashed session at startup after a logout/restart, but that you still see it with e10s?
Flags: needinfo?(dqeswn)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #24)
> (In reply to avada from comment #23)
> > 
> > Also I'd like to add that I normally use the Session Manager addon. With it
> > the problem is aggravated. For the e10s=off case it seems to work normally
> > now. However it always shows a crashed session at startup after a
> > logout/restart.
> 
> That doesn't sound like it's working normally at all. Do you mean to say
> that with non-e10s, with the patch in this bug, you don't see the crashed
> session at startup after a logout/restart, but that you still see it with
> e10s?

Sorry. I misspoke. Yes. It seems to works normally with e10s off.
It is with e10s=on that I get a crashed session at startup. Though it improved in that I don't get content process crashes during the logout/shutdown. So at least the shutdown process isn't halted.
Flags: needinfo?(dqeswn)
Hey billm, how do you feel about the band-aid fix in my patch landing until we can prioritize and sort out what exactly is causing our process to get killed by the OS?
Flags: needinfo?(wmccloskey)
Or jimm? Perhaps you'd like to weigh in on this band-aid fix?
Flags: needinfo?(jmathies)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #28)
> Or jimm? Perhaps you'd like to weigh in on this band-aid fix?

Looks like the right fix to me. We have no choice but to force the sync shutdown here.
Flags: needinfo?(jmathies)
Comment on attachment 8737365 [details]
MozReview Request: Bug 1260461 - Don't flush windows when shutting down due to a Windows log-out. r?jimm

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/43933/diff/1-2/
Attachment #8737365 - Attachment description: MozReview Request: Bug 1260461 - [WIP] Don't flush windows when shutting down due to a Windows log-out. → MozReview Request: Bug 1260461 - Don't flush windows when shutting down due to a Windows log-out. r?jimm
Attachment #8737365 - Flags: review?(jmathies)
Comment on attachment 8737365 [details]
MozReview Request: Bug 1260461 - Don't flush windows when shutting down due to a Windows log-out. r?jimm

https://reviewboard.mozilla.org/r/43933/#review41833
Attachment #8737365 - Flags: review?(jmathies) → review+
Let's keep this open since the behavior with this patch is definitely bad. I'm sorry I didn't get back to this. I had been hoping to track down the problem further, but I didn't have time.
(In reply to Bill McCloskey (:billm) from comment #33)
> Let's keep this open since the behavior with this patch is definitely bad.

Am I wrong when I presume that this patch puts us in the same boat as before the patch dataloss-wise, but that it allows the user to get a restored session instead of about:sessionrestore on startup?
Keywords: leave-open
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #34)
> (In reply to Bill McCloskey (:billm) from comment #33)
> > Let's keep this open since the behavior with this patch is definitely bad.
> 
> Am I wrong when I presume that this patch puts us in the same boat as before
> the patch dataloss-wise, but that it allows the user to get a restored
> session instead of about:sessionrestore on startup?

My understanding is that we're still worse dataloss-wide than we were pre-bug 1177310. We need to have a bug open on that. We could open a separate one if you like.
Flags: needinfo?(wmccloskey)
I hope this gets pushed to aurora soon. Central is just unusable for everyday browsing, so I can't test it. (It freezes on loading github right now... )
(In reply to Bill McCloskey (:billm) from comment #35)
> (In reply to Mike Conley (:mconley) - Needinfo me! from comment #34)
> > (In reply to Bill McCloskey (:billm) from comment #33)
> > > Let's keep this open since the behavior with this patch is definitely bad.
> > 
> > Am I wrong when I presume that this patch puts us in the same boat as before
> > the patch dataloss-wise, but that it allows the user to get a restored
> > session instead of about:sessionrestore on startup?
> 
> My understanding is that we're still worse dataloss-wide than we were
> pre-bug 1177310. We need to have a bug open on that. We could open a
> separate one if you like.

Can we file a separate bug on this, this is currently tracked and has a patch that landed.
Flags: needinfo?(mconley)
Keywords: leave-open
Status: NEW → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Sure - filed bug 1264036.
Flags: needinfo?(mconley)
Comment on attachment 8737365 [details]
MozReview Request: Bug 1260461 - Don't flush windows when shutting down due to a Windows log-out. r?jimm

Approval Request Comment
[Feature/regressing bug #]:

Bug 1177310 

[User impact if declined]:

Windows users who shut down or log out of Windows before shutting down Firefox will not have their sessions properly restored on startup - instead, they'll see the about:sessionrestore "Well this is embarassing" page.

[Describe test coverage new/current, TreeHerder]:

Unfortunately, this case is not easy to test at all except manually, since it relies on the OS actually exiting. So it was only verified using manual testing.

[Risks and why]: 

We're touching Windows shutdown code, but it's a small change. We're sending a special argument around with the shutdown observer notification, so if there were any add-ons out there assuming nothing would be passed as data for the quit-application-granted notification (I think that'd be odd), we'd be breaking that assumption.

[String/UUID change made/needed]:

None.
Attachment #8737365 - Flags: approval-mozilla-beta?
Attachment #8737365 - Flags: approval-mozilla-aurora?
Comment on attachment 8737365 [details]
MozReview Request: Bug 1260461 - Don't flush windows when shutting down due to a Windows log-out. r?jimm

A little risky sounding but this may fix a bad shutdown issue, and I'd rather we test it now than next week. Please uplift to aurora and beta. 
This should end up in beta 11 later this week.
Attachment #8737365 - Flags: approval-mozilla-beta?
Attachment #8737365 - Flags: approval-mozilla-beta+
Attachment #8737365 - Flags: approval-mozilla-aurora?
Attachment #8737365 - Flags: approval-mozilla-aurora+
Wait, is this e10s only? If so then we should not uplift to beta.
Flags: qe-verify+
Flags: needinfo?(mconley)
Andrei, can your team help check if/when 46 beta 10 is affected? Is it still reproducible with e10s disabled?
Flags: needinfo?(andrei.vaida)
This affects both e10s and non-e10s.
Flags: needinfo?(mconley)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #43)
> Andrei, can your team help check if/when 46 beta 10 is affected? Is it still
> reproducible with e10s disabled?

It's indeed affecting both e10s and non-e10s. I've managed to reproduce the issue on Windows 10 x64 using the following builds:
 - 48.0a1 (2016-03-29, e10s disabled)
 - 46.0b10-build1 (20160411042519, e10s disabled)
Flags: needinfo?(andrei.vaida)
Hoping a sheriff can uplift my patch if possible to aurora and beta - my local tree isn't exactly in a state to do that properly just now.
Keywords: checkin-needed
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #46)
> Hoping a sheriff can uplift my patch if possible to aurora and beta - my
> local tree isn't exactly in a state to do that properly just now.

np,anytime! took care of this and landed on aurora and beta, so its in time for the beta 11 

https://hg.mozilla.org/releases/mozilla-aurora/rev/7d39fe1e571c
 https://hg.mozilla.org/releases/mozilla-beta/rev/d4312bf2632b
do we care about esr45 here ? noticed its marked as affected but so far no uplift approval/request
Flags: needinfo?(mconley)
Flags: needinfo?(lhenry)
[Tracking Requested - why for this release]:
i'd hope that a fix is considered for 45esr as well, since this is a user facing and highly annoying bug, as users who tend to shutdown windows while firefox is running will be greeted with about:sessionstore each day.
seems this is more for sylvestre
Flags: needinfo?(lhenry) → needinfo?(sledru)
Ditto - waiting on sledru's input.
Flags: needinfo?(mconley)
Comment on attachment 8737365 [details]
MozReview Request: Bug 1260461 - Don't flush windows when shutting down due to a Windows log-out. r?jimm

Taking it in esr45 too. Should be in 45.1.0
Flags: needinfo?(sledru)
Attachment #8737365 - Flags: approval-mozilla-esr45+
Requesting checkin-needed for the ESR 45 branch.
Keywords: checkin-needed
approval-mozilla-esr45+ is enough! :)
Keywords: checkin-needed
I verified this issue on:
- 46.0b11 build1 (20160414152344),
- 47.0a2 (2016-04-18), 
- 48.0a1 (2016-04-18),
using 
- Windows 10 x64, 
- Windows 7 x64, 
- Windows 8.1 x86,
so these builds are verified fixed. 
I will continue verifying the issue on the affected builds, as soon they are fixed.
Blocks: 1260462
I verified this issue on 45.1.0esr build1 (20160420142932), too, using 
- Windows 10 x64, 
- Windows 7 x64, 
- Windows 8.1 x86.
Updating the bug since it was verified on each version.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Flags: needinfo?(iulia.cristescu)
You need to log in before you can comment on or make changes to this bug.