Closed Bug 662444 Opened 13 years ago Closed 4 years ago

Shutdown by exit(0), skip GCs and other unneeded operations

Categories

(Core :: XPCOM, task)

task
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1606879

People

(Reporter: taras.mozilla, Unassigned)

References

(Depends on 4 open bugs, Blocks 3 open bugs)

Details

(Keywords: addon-compat, Whiteboard: [Snappy:p1])

Firefox shutdown is slow and sometimes deadlocks. Explicit exit is encouraged by microsoft. We should only do the full shutdown in debug builds.
Assignee: nobody → kevin.gadd
Except for the parts where we save user data during shutdown, right?
Remove profile locks, etc. :)
Given the current architecture, we at least need to get through profile-before-change. We also need to audit statics so that destructors don't expect a full shutdown sequence.
A completely non technical user remark. Shutting down seems to get slower the longer Firefox (using the latest 5 from the beta channel) is open: A 1 day old firefox takes 950MB virtual memory and about 60 seconds to shutdown on my machine. During this shutdown, virtual memory usage peaks to 1020 MB (all monitored with Process Explorer on windows). Restarting Firefox with the same set of tabs, is about 660 MB virtual, takes only 15 seconds for shutdown and peaks only to 680 MB.
Sorry, wandering through the buglists, I realize, that my comment could also be a unqualified duplicate of bug 661820.
Blocks: zombieproc
Reassigning to Rafael since Kevin moved on to other projects. I'm hoping to have at-least a proof of concept implementation by end of the year.
Assignee: kevin.gadd → respindola
Related, OS X 10.6 supports an API called "Sudden Termination", where you can indicate to the OS that it's okay to SIGKILL your process instead of asking for a graceful shutdown:
http://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSProcessInfo_Class/Reference/Reference.html#//apple_ref/doc/uid/20000316-SW3

You can toggle it on and off at any time, so that if you have outstanding data that needs to be written you aren't killed while that's happening.
Whiteboard: [Snappy[
Whiteboard: [Snappy[ → [Snappy]
Whiteboard: [Snappy] → [Snappy:p1]
Isn't bug 407981 about saving user data on shutdown being slow?
This seems to be about something else.
(In reply to Martijn Wargers [:mw22] (QA - IRC nick: mw22) from comment #10)
> Isn't bug 407981 about saving user data on shutdown being slow?
> This seems to be about something else.

No that bug is about doing "unneeded operations" on shutdown which this bug aims to solve. In particular it described symptoms of running GC/Cycle-collection when firefox is paged out... That wont happen once this bug is fixed.
Would it be possible to save some user data ahead of time? Perhaps during idle? That might also help a little. And is there any way to discourage user data from being cached out in the first place?

I've had ideas on this for quite a while, but, since I'm not actually a programmer, I wasn't sure if I should say anything.
Depends on: 702815
Depends on: 706647
OS: Linux → All
Hardware: x86 → All
If the shutdown can deadlock (per comment 0), fixing this could help those tons of bugs filed about firefox/thunderbird not properly exiting after shutdown (users finding that out by seeing the process running or subsequest start telling them profile is locked (ff running)).
Yes, quite possibly.
Several add-ons depend on xpcom* shutdown observers, so I'm flagging this for addon-compat. Is there an expected milestone for this to land?
Keywords: addon-compat
(In reply to Jorge Villalobos [:jorgev] from comment #17)
> Several add-ons depend on xpcom* shutdown observers, so I'm flagging this
> for addon-compat. Is there an expected milestone for this to land?

I optimistically aim for Firefox 14.
I was checking if the code needed to be refactored to allow us to quit after profile-before-change, but it looks like it is already in a good shape from the notification side at least.

The shutdown looks like:

~ScopedXPCOMStartup():
   calls nsXREDirProvider::DoShutdown()
   calls NS_ShutdownXPCOM()

nsXREDirProvider::DoShutdow():
   sends the profile-* messages
   NS_ShutdownXPCOM sends the xpcom-* messages and does its own shutdown

So I guess we can call this bug done when ~ScopedXPCOMStartup() looks like

~ScopedXPCOMStartup():
   calls nsXREDirProvider::DoShutdown()
   if (FastExit)
     _exit(0);
   calls NS_ShutdownXPCOM()

I will open a bug about what exactly the "if (FastExit)" should be and try to start moving it up.
No longer depends on: 733358
Depends on: 711076
Depends on: 753461
IMHO
can we save bookmarks when updated/changed(saves a lot of shutdown time)
With this we can increase the bookmarks(nested)to unlimited(yes some people use it & currently i have over 1k so too 50+ of my colleagues which makes FF more slow)
Moreover, the file should NOT include:
a. Favicons
b. Bookmark Description, which is typically very long and useless! & saving them at shutdown causes more slowness
& preferably without favicons or bookmark description field, both of which dramatically increase the size of the bookmarks file.
(In reply to magnumarchonbasileus from comment #20)
> IMHO
> can we save bookmarks when updated/changed(saves a lot of shutdown time)

We are already doing this, unless you mean the json or html backup. The latter is mostly deprecated and not enabled by default, the former usually happens at idle.
(In reply to Marco Bonardo [:mak] from comment #21)
> (In reply to magnumarchonbasileus from comment #20)
> > IMHO
> > can we save bookmarks when updated/changed(saves a lot of shutdown time)
> 
> We are already doing this, unless you mean the json or html backup. The
> latter is mostly deprecated and not enabled by default, the former usually
> happens at idle.

Yes precisely
Thank-you for the info :)
This also needs to be done?

Moreover, the file should NOT include:
a. Favicons
b. Bookmark Description, which is typically very long and useless! & saving them at shutdown causes more slowness
& preferably without favicons or bookmark description field, both of which dramatically increase the size of the bookmarks file.
Depends on: 722243
Jorge, the changes addon authors may need to make for this bug are based on the ordering at https://wiki.mozilla.org/XPCOM_Shutdown:

* The following observer topics will usually not happen in release builds:
** xpcom-will-shutdown
** xpcom-shutdown
** xpcom-shutdown-threads
** xpcom-shutdown-gc
** xpcom-shutdown-loaders

Extension authors *should* make sure that all user data persists to disk by the end of the profile-before-change notification.

Extension authors *should not* perform memory-cleanup activities during profile-before-change, but can do so later in the xpcom-shutdown-* notifications.
Rafael, we should probably make the Storage threads merge (the spinning you added recently) earlier, right now they merge at xpcom-shutdown-threads, so we may lose important work. Should not be a big deal, especially once we have bug 722243.
The problem is that we'd need to do those just after profile-before-change, but the first notification there is xpcom-will-shutdown, that looks like is going away.
(In reply to Marco Bonardo [:mak] from comment #25)
> Rafael, we should probably make the Storage threads merge (the spinning you
> added recently) earlier, right now they merge at xpcom-shutdown-threads, so
> we may lose important work. Should not be a big deal, especially once we
> have bug 722243.
> The problem is that we'd need to do those just after profile-before-change,
> but the first notification there is xpcom-will-shutdown, that looks like is
> going away.

So, the spinning as is is a debug check.

The idea is that db users are still responsible for calling AsyncClose and making sure any scheduled important writes hit the disk before or during profile-before-change.

If there is some part of the code that is not waiting for its writes, the spinning will make sure the writes do happen late and write poisoning (once enabled) will catch them. In other words, the spinning prevents us from exiting without even starting executing an async statement that would write to disk.

Your proposal, if I understand it correctly, is to change the semantics to say that a db user is only responsible for calling AsyncClose. Firefox will guarantee that all async statements not explicitly marked as cancellable will be executed.

I agree that that is a much easier to use API, but would like some more comments on it if possible. Would you mind opening a bug for just this change?
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #26)
> So, the spinning as is is a debug check.

I meant the previous one, in bug 744294.

> The idea is that db users are still responsible for calling AsyncClose and
> making sure any scheduled important writes hit the disk before or during
> profile-before-change.

I partially disagree here, in the sense I don't think we should ask each user to spin the events loop on his own.  What they should do is mark work as cancelable.

> Your proposal, if I understand it correctly, is to change the semantics to
> say that a db user is only responsible for calling AsyncClose. Firefox will
> guarantee that all async statements not explicitly marked as cancellable
> will be executed.

Right.

> I agree that that is a much easier to use API, but would like some more
> comments on it if possible. Would you mind opening a bug for just this
> change?

Sure.
Depends on: 758343
I removed bug 751899 from the dependencies because it is OK to skip shutting down NSS.
No longer depends on: 751899
Some comments on this bug talk about exit(0), some about _exit(0). As bug 826029 shows, _exit(0) can be quiet a bit harder, so lets restrict this bug to exit(0).

I have reported 826143 to track _exit(0).
I don't understand comment 31. That bug seems to be a write from exit() not from _exit(). _exit should be the more abrupt (and therefore safer) function which does not call C++ static destructors or module _fini functions. I think we should only target _exit and should not target exit ever.
(In reply to comment #32)
> I don't understand comment 31. That bug seems to be a write from exit() not
> from _exit(). _exit should be the more abrupt (and therefore safer) function
> which does not call C++ static destructors or module _fini functions. I think
> we should only target _exit and should not target exit ever.

_exit will not cause C++ destructors, atexit functions, etc to be run, therefore code that relies on such mechanisms for persisting data to disk will not run and the result could be a dataloss.
Note we currently rely on atexit to delete the profile lock file, so using _exit would effectively prevent re-running firefox.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #32)
> I don't understand comment 31. That bug seems to be a write from exit() not
> from _exit(). _exit should be the more abrupt (and therefore safer) function
> which does not call C++ static destructors or module _fini functions. I
> think we should only target _exit and should not target exit ever.

Bug 826029 found a write that happens during exit() but not during _exit(). The point of having write poisoning is precisely to make sure we don't skip writes, and that one looks a hard to move earlier.
(In reply to Mike Hommey [:glandium] from comment #34)
> Note we currently rely on atexit to delete the profile lock file, so using
> _exit would effectively prevent re-running firefox.

Interesting! I wonder if we should poison unlink too.
(In reply to comment #36)
> (In reply to Mike Hommey [:glandium] from comment #34)
> > Note we currently rely on atexit to delete the profile lock file, so using
> > _exit would effectively prevent re-running firefox.
> 
> Interesting! I wonder if we should poison unlink too.

I think we should, it's a write like any other!
(In reply to Mike Hommey [:glandium] from comment #34)
> Note we currently rely on atexit to delete the profile lock file, so using
> _exit would effectively prevent re-running firefox.

Bad recollection, it's a static destructor now (it used to be atexit, but that made problems with dlclosing libxul). But that doesn't change much from the perspective of _exit.
Which static destructor is that? Under normal (non-exit) operation, the profile should be unlocked explicitly: http://hg.mozilla.org/mozilla-central/annotate/6955309291ee/toolkit/xre/nsAppRunner.cpp#l3920 and there shouldn't be any subsequent I/O from static destructors or atexit handlers related to the profile lock.

Any patches to exit() early should also make sure that the profile is explicitly unlocked and that we're not relying on static destructors/atexit handlers.

I'm really nervous about calling C++ static destructors while we're in the C++ stack... that's why I've been talking about _exit() or the Windows equivalent TerminateProcess, specifically so that we *will* skip C++ static destructors which potentially assume things about the stack being unwound.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #39)
> Which static destructor is that? Under normal (non-exit) operation, the
> profile should be unlocked explicitly:
> http://hg.mozilla.org/mozilla-central/annotate/6955309291ee/toolkit/xre/
> nsAppRunner.cpp#l3920 and there shouldn't be any subsequent I/O from static
> destructors or atexit handlers related to the profile lock.

http://hg.mozilla.org/mozilla-central/annotate/6955309291ee/profile/dirserviceprovider/src/nsProfileLock.cpp#l374
Yes, but... why is mPidLockList non-empty by the time we get to that destructor? It should be a no-op (it should be being removed at http://hg.mozilla.org/mozilla-central/annotate/6955309291ee/profile/dirserviceprovider/src/nsProfileLock.cpp#l659)
Depends on: 834945
Blocks: 823262
Hey folks, what's the status here? Is this project still prioritized?
Also, anyone know why bug 911820 (worker doing IO via ctypes) doesn't trigger some kind of abort with write poisoning?
I suspect that bug 916078 will be necessary to make exit(0) work consistently. Please inform me if you need to prioritize some components.
Not quite sure where to stick this, so I'm putting it here on the meta bug.

I just recently learned that Firefox does not write directly to places.sqlite, but to other files. Perhaps we could save some time on shutdown by merging those files not during shutdown but at idle.
See Also: → 580007
Status: REOPENED → NEW
Blocks: 1129455
We're a bit behind on that proof of concept implementation.
Assignee: respindola → nobody
We do already do this, to an extent, in content processes.
On the upside, now, we have AsyncShutdown, that should help keep this safe.

Just noting for any passers-by: the fxperf team is currently looking into working on this.

Type: defect → task
Status: NEW → RESOLVED
Closed: 10 years ago4 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.