The application should perform a normal shutdown upon receiving SIGTERM/SIGINT/SIGHUP

NEW
Unassigned

Status

()

defect
P5
normal
13 years ago
3 months ago

People

(Reporter: dietrich, Unassigned)

Tracking

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 4 obsolete attachments)

The SIGTERM signal is the default signal sent by the `kill command, and is intended to allow applications to perform normal shutdown procedures prior to exiting.

When Firefox closes due to receiving the SIGTERM signal, normal shutdown procedures are not followed. For example, none of the following global notifications are sent:

quit-application-requested
quit-application-granted
quit-application
xpcom-shutdown
I don't know a lot about unix signal handling, but don't unix signals need to be handled synchronously? Or can we watch for SIGTERM, call nsIAppStartup.quit(), and return from the signal handler immediately, allowing the eventloop and stack to unwind gracefully later?

Should SIGTERM cause eAttemptQuit (prompting the user if there is unsaved work/multiple tabs), or eForceQuit?
I believe SIGTERM can be synchronous or asynchronous. It's a signal that can by hanled by the target process, with the basic intent of: "Please terminate yourself, whatever that may entail". This is in contrast to SIGKILL, which cannot be handled by the target process: "Hey kernel, immediately terminate that process over there".

For example, IIRC, in the case of SIGTERM Apache returns from the signal, and then goes about shutting down it's child processes.
If the app is left running when the machine is shut down then shutdown actually hangs. In practice the OS steps in and asks the user what to do about the app which is not responding (Firefox 2). 

I've seen this problem with 10.4.8 and 10.3.9
FWIW: That's the same on Linux.
People are closing their session often w/o closing every single application.

(In reply to comment #1)
> I don't know a lot about unix signal handling, but don't unix signals need to
> be handled synchronously? Or can we watch for SIGTERM, call
> nsIAppStartup.quit(), and return from the signal handler immediately, allowing
> the eventloop and stack to unwind gracefully later?

You can watch for SIGTERM since that signal should do the same as a normal exit. So you can catch the signal, start the shutdown process and return from the signal handler.
 
> Should SIGTERM cause eAttemptQuit (prompting the user if there is unsaved
> work/multiple tabs), or eForceQuit?

That's a harder question and I think there is no rule for that. I tend to vote for eForceQuit since if the application gets a SIGTERM the user may not be in front of the application (or display) at all.
Blocks: 365749
There is a bug report in ubuntu related to this bug. I'm just providing a link to it here:
https://bugs.launchpad.net/firefox/+bug/73536
Duplicate of this bug: 381828
(In reply to comment #4)
> > Should SIGTERM cause eAttemptQuit (prompting the user if there is unsaved
> > work/multiple tabs), or eForceQuit?
> 
> That's a harder question and I think there is no rule for that. I tend to vote
> for eForceQuit since if the application gets a SIGTERM the user may not be in
> front of the application (or display) at all.

I also think eForceQuit is better, although it would be fine to have at least some SIGTERM handling at all. I often just start ./firefox from the commandline when developing my extension and ctrl-c it when done with the result of getting a "do you want to restore tabs" message at next start :(
(In reply to comment #7)
> (In reply to comment #4)
> > > Should SIGTERM cause eAttemptQuit (prompting the user if there is unsaved
> > > work/multiple tabs), or eForceQuit?
> > 
> > That's a harder question and I think there is no rule for that. I tend to vote
> > for eForceQuit since if the application gets a SIGTERM the user may not be in
> > front of the application (or display) at all.
> 
> I also think eForceQuit is better, although it would be fine to have at least
> some SIGTERM handling at all. I often just start ./firefox from the commandline
> when developing my extension and ctrl-c it when done with the result of getting
> a "do you want to restore tabs" message at next start :(
> 

There is sent SIGINT on Ctrl+C instead of SIGTERM.
(In reply to comment #8)
> There is sent SIGINT on Ctrl+C instead of SIGTERM.

Right, my mistake. But usually it is handled the same as SIGTERM in most UNIX
programs, and so should Firefox to also "feel" like a true UNIX program which
can also be run/terminated from the command line.
Is this bug really restricted to Mac OS?

Running Debian GNU/Linux, I find it annoying that when I shutdown my computer via my window manager, Firefox thinks it crashed and asks me if I want to restore tabs the next time I boot up my computer. So I suppose I think that eAttemptQuit sounds better -- when you shut down your computer normally, you don't expect your programs to think that they crashed, rather you want them to exit normally. When you shut down your computer normally, it's okay for applications to ask you if you want to save your work.
Component: General → Startup and Profile System
OS: Mac OS X → All
QA Contact: general → startup
Hardware: Macintosh → All
Version: 2.0 Branch → Trunk
My vote: SIGTERM should force quit.  I just used "kill" because Firefox was doing its spinlocking thing and taking 5min to get to each window to ask if I really wanted to close it.  (I'll file a separate bug if that doesn't exist... quit should ask ONCE, not once for every window.)  Anyway, leaving aside the CPU usage problem, "kill" bypasses all that.  It should definitely force a quit.

However: when it receives a SIGTERM it should NOT ask me on next start that it quit "unexpectedly".  No, I killed you on purpose.  Don't make me kill you again.
@comment 11: I think 'kill' should do the same as just clicking the "X" in most window managers. For a really forceful quit when firefox is hanging in an infinite loop we still have 'kill -9'
Product: Firefox → Toolkit
Duplicate of this bug: 459282
Blocks: 468835
Posted patch patch (obsolete) — Splinter Review
Here's a patch which appears to DTRT on Linux.  I don't have a Mac machine and I even less familiar with the OS X event loop than I am with the Linux one, so that bit would have to come from someone else.
Attachment #607268 - Flags: feedback?(benjamin)
Comment on attachment 607268 [details] [diff] [review]
patch

Actually, this does the right thing, but it would drastically inflate the version of GTK we require; g_unix_signal_add is said in the docs to be available from 2.30 onwards (though this version isn't actually available from the website...?), and we require 2.14.  So I guess the better way to do it is to roll our own pipes and such.
Attachment #607268 - Flags: feedback?(benjamin)
Posted patch patch (obsolete) — Splinter Review
Here's a better, somewhat more verbose patch that doesn't require anything beyond what we already use.  When receiving SIG{INT,TERM} and restarting, we now display about:home with an option to restore your last session instead of the "this is embarrassing" page.

As with any modular app using signals, I am unsure of how this interacts with code that already tries to deal with signals:

http://mxr.mozilla.org/mozilla-central/source/profile/dirserviceprovider/src/nsProfileLock.cpp#440

nsAppShell appears to install its handlers after this.

There's also some code in breakpad, which I assume doesn't come into play here.
Attachment #607268 - Attachment is obsolete: true
Attachment #608074 - Flags: feedback?(benjamin)
Breakpad doesn't handle SIGTERM, so you should be okay there. You will probably need to make sure that your code interacts properly with the profile locking code, since the purpose of that signal handler is to clean up the profile lock correctly upon exit.
Assignee: nobody → nfroyd
(In reply to Ted Mielczarek [:ted] from comment #19)
> Breakpad doesn't handle SIGTERM, so you should be okay there. You will
> probably need to make sure that your code interacts properly with the
> profile locking code, since the purpose of that signal handler is to clean
> up the profile lock correctly upon exit.

Doing a clean exit (nsAppShell::Exit) from handling the signal won't cause the profile unlocking code to be run?  I do still see a .parentlock in my profile.

This is going to be tricky: we can't run the old signal handler before nsAppShell::Exit, because nsProfileLock's signal handlers run _exit or the SIG_DFL handler, which would exit.  But we can't run the old signal handler after nsAppShell::Exit because we've exited.
I suppose the right thing to do here then is either:
a) Drop SIGTERM support from nsProfileLock, make your new code call into the nsProfileLock code to unlock the profile before exiting.
or
b) Forget about your new code, just tweak nsProfileLock to special-case SIGTERM and do the handling you're doing.
Two things:

1. You should probably use nsAppStartup::Quit because it will do the Right Things. This includes sending the proper shutdown notifications to code that needs to clean up.
2. Should probably make this generic enough to work on all unixish systems instead of just GTK2. Sounds like ted's second suggestion does that.
(In reply to Michael Wu [:mwu] from comment #22)
> 1. You should probably use nsAppStartup::Quit because it will do the Right
> Things. This includes sending the proper shutdown notifications to code that
> needs to clean up.

Mmm, good point.

> 2. Should probably make this generic enough to work on all unixish systems
> instead of just GTK2. Sounds like ted's second suggestion does that.

It does, but it runs rather a lot of code from inside the signal handler, which seems like a Bad Idea.  To make sure things run from outside the signal handler, we need integration with the event loop.  I guess we could have nsAppShell::PostSignalEvent(int signum) with a default do-nothing implementation?
Posted patch patch (obsolete) — Splinter Review
New patch using nsIAppStartup; also fewer printfs.
Attachment #608074 - Attachment is obsolete: true
Attachment #609331 - Flags: feedback?(benjamin)
Attachment #608074 - Flags: feedback?(benjamin)
I am not the right person to review this, having never written anything useful with signal handlers. If I understand this correctly, the signal handler itself is signaling a pipe which wakes up the event loop. Is this because it is not safe to make any GTK calls from within the signal handler? I do think that eForceQuit is correct for SIGTERM, but I wonder if we shouldn't be using eAttemptQuit for SIGQUIT, so that it can properly prompt for unsaved work.
Attachment #609331 - Flags: feedback?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #15)
> ok, fix it! I'd happily review a patch!

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #25)
> I am not the right person to review this, having never written anything
> useful with signal handlers.

I am confused. :)

> If I understand this correctly, the signal
> handler itself is signaling a pipe which wakes up the event loop. Is this
> because it is not safe to make any GTK calls from within the signal handler?

Doing almost anything non-trivial from a signal handler is a Bad Idea, unless you write your code very very carefully.  I do not think GTK code follows the necessary rules for being callable from a signal handler.

> I do think that eForceQuit is correct for SIGTERM, but I wonder if we
> shouldn't be using eAttemptQuit for SIGQUIT, so that it can properly prompt
> for unsaved work.

Wikipedia suggests that SIGQUIT is sent "when the user requests that the process perform a core dump."  The glibc info pages agree, but also suggest that "certain kinds of cleanups are best omitted in handling SIGQUIT."  (e.g. you might want to examine temporary files along with the core dump you just received.)  Given this information, I think eForceQuit is acceptable for both signals...and I doubt people SIGQUIT'ing applications is all that common anyway.
Oh, I confused SIGINT with SIGQUIT. SIGINT should presumably do normal prompting, I guess SIGQUIT doesn't need to as you said.

Somebody who knows signal handlers and GTK well should be the primary code reviewer for this patch (which isn't me). I suggest karlt.
Posted patch patchSplinter Review
Redirecting f? to karlt as bsmedberg suggests.  Updated patch with SIGQUIT handling as well.

I left SIGINT doing a eForceQuit; from comments in this bug, it sounds like people start firefox from the command line and then C-c it.  For those sorts of usecases, it would be annoying to click a dialog.  (Firefox should just save the data always, without asking, but that's a separate discussion.)
Attachment #609331 - Attachment is obsolete: true
Attachment #618671 - Flags: feedback?(karlt)
Comment on attachment 618671 [details] [diff] [review]
patch

Communicating via pipe is the only reasonable async-signal-safe way that I
know to signal the event loop.
eForceQuit on SIGINT sounds sensible to me.
I wouldn't implement this for SIGQUIT for the reasons you list in comment 26.

It should be possible to share the single existing pipe by sending the
appropriate tokens down the same mPipeFDs.

Is there a reason you chose to use SA_RESTART?  Not sure it makes much
difference in practice, but I would have expected an interrupt signal to try
to abort ASAP rather than restarting a system call.

These signal handlers should by removed once one is called.  Once they've sent
the eForceQuit, sending that again will not achieve much.  When they remove
themselves, they should restore the previous signal handler.  See how this is
done in nsProfileLock.cpp, and nsProfileLock is really the reason why the
previous handler needs to be restored.

I assume embedders have their own event loop and so this code won't be used.
That is good as this could interfere with their signal handlers.

The app is meant to exit by reraising the signal.  The glibc page says "should
end by specifying the default action for the signal that happened and then
reraising it", but I think it is actually more correct use the previous action
to chain up to other signal handlers, which will eventually raise with the
default action.  If there is no appropriate place to raise the right signal,
then this may be more trouble than it is worth.  If we don't do this, then
that is another good reason not to handle SIGQUIT.
Attachment #618671 - Flags: feedback?(karlt) → feedback+
Think also about what happens when a signal is received after the nsAppShell is deleted.
(In reply to Karl Tomlinson (:karlt, offline til July 2) from comment #29)
> Is there a reason you chose to use SA_RESTART?  Not sure it makes much
> difference in practice, but I would have expected an interrupt signal to try
> to abort ASAP rather than restarting a system call.

I would expect that too, but since we're not actually quitting immediately, enabling automatic restart of anything still in-progress seems reasonable, giving those calls a chance to complete, etc.  I'd be really surprised if all the code in Gecko &co. was prepared to handle interrupted system calls anyway.

> The app is meant to exit by reraising the signal.  The glibc page says
> "should
> end by specifying the default action for the signal that happened and then
> reraising it", but I think it is actually more correct use the previous
> action
> to chain up to other signal handlers, which will eventually raise with the
> default action.  If there is no appropriate place to raise the right signal,
> then this may be more trouble than it is worth.  If we don't do this, then
> that is another good reason not to handle SIGQUIT.

I assume you mean re-raising once we've handled application shutdown?  Oof, that's quite some complexity.  I appreciate the philosophical point behind trying to exit with the proper codes, but I'm not sure it makes much pragmatic difference here.  And for things like nsProfileLock, it seems like a proper invocation of the shutdown process, as triggered by this patch, makes much more sense than trying to invoke nsProfileLock's signal handlers.
Assignee: nfroyd → nobody
(In reply to Nathan Froyd (:froydnj) from comment #31)
> I assume you mean re-raising once we've handled application shutdown?

Yes.

> Oof, that's quite some complexity.

Yes, it could well be too much trouble.  It can be attacked separately if/when that turns out to be important for anything.

> And for things like nsProfileLock, it seems like
> a proper invocation of the shutdown process, as triggered by this patch,
> makes much more sense than trying to invoke nsProfileLock's signal handlers.

Yes, a proper shutdown is definitely the appropriate response to the first signal.
If the lock has been removed, then subsequently running nsProfileLock's handlers should be harmless.
Getting this fixed should fix some issues with coverage measurement and external applications using the browser. My current problem is that an external testing application uses SIGTERM to signal the browser to terminate but GCOV data is not written out because no regular exit is performed.

Are there any remaining problems with the attached patch?
I have the same problem, essentially with JS Test Driver trying to close Firefox (I assume using SIGTERM). Sometimes when JS Test Driver tries to restart the browser, the This is embarrassing dialog pops up and JS Test Driver is unable to capture the browser to run unit tests. I see this problem on my Mac and we think we see this problem when we try to run headlessly in a Solaris environment using XVFB. Any word on when this might be fixed? It sounds like there is already a potential patch for it.
Still not resolved?  FWIW, I would deal with SIGINT and SIGTERM using the same handler.  Functionally this should be no different than whatever happens when the user chooses File/Quit or hits Ctrl-Q.
Blocks: 1032010
Duplicate of this bug: 1088008
(In reply to Christian Holler (:decoder) from comment #33)
> 
> Are there any remaining problems with the attached patch?

What happened back in December 2012? Did the patch make it to the official source?

The issue is still present in the current version of Firefox for Linux.
Someone needs to rebase the patch here and push it to try. I'd be willing to mentor this, but I don't have the time to do it myself.
Posted patch bug336193.patch (obsolete) — Splinter Review
Summary:

 * nsProfileLock is already rigged for handling signals and is already terminating (fatally) so it is only a small change to make it clean, this is preferable to a GTK-specific implementation.
 * A clean quit should be performed for SIGINT and SIGTERM, but not SIGQUIT as it is not intended to be clean.
 * nsAppStartup::Quit calls nsIAppShell::Exit asynchronously which does most of the work so it doesn't block the signal handler.
 * raise() needs to be called at some point after calling Quit().
 * To avoid SessionStore attempting to restore a crashed session, SessionFile needs to be closed properly, complete a final write, notify CrashMonitor with "sessionstore-final-state-write-complete" which then needs to write a checkpoint to a file asynchronously. This appears to happen before the observer notification "profile-change-teardown", even if it takes a long time, however documentation suggests the later notification "profile-before-change" is more appropriate.

This patch avoids SessionStore problems with SIGINT/SIGTERM by calling Quit(eForceQuit) in the nsProfileLock signal handler and observing "profile-before-change" for later raising the signal. As usual, if termination takes too long a second SIGINT/SIGTERM can be sent which causes immediate termination via the default handler and most systems have a timeout that will also eventually kill the process.
Attachment #8626516 - Flags: review?(jmathies)
Comment on attachment 8626516 [details] [diff] [review]
bug336193.patch

Review of attachment 8626516 [details] [diff] [review]:
-----------------------------------------------------------------

Karl, mind reviewing this? I'm not qualified to review unix signal handling code.
Attachment #8626516 - Flags: review?(jmathies) → review?(karlt)
Assignee: nobody → kestrel
Comment on attachment 8626516 [details] [diff] [review]
bug336193.patch

Unfortunately, the last paragraph of comment 23 is still relevant.

(In reply to Kestrel from comment #39)
>  * nsAppStartup::Quit calls nsIAppShell::Exit asynchronously which does most
> of the work so it doesn't block the signal handler.

It is more complicated than just not blocking the signal handler.
See man 7 signal:
  "Async-signal-safe functions
       A signal handler function must be very careful, since processing
       elsewhere may be interrupted at some arbitrary point in the execution
       of the program.  POSIX has the concept of "safe function".  If a
       signal interrupts the execution of an unsafe function, and handler
       calls an unsafe function, then the behavior of the program is
       undefined."

One of the most likely problems is taking a lock, if the process is interrupted with the lock already held.  For example, nsComponentManagerImpl::GetService() uses locks, and memory allocations are also likely to use locks.

Also, unless the signal mask of every thread is carefully managed, the signals may be received on any thread.  Therefore any methods called that have state would need to be thread-safe, but locks can't be used for the reason described above.

"A process-directed signal may be delivered to any one of the threads that
 does not currently have the signal blocked.  If more than one of the threads
 has the signal unblocked, then the kernel chooses an arbitrary thread to
 which to deliver the signal."
Attachment #8626516 - Flags: review?(karlt) → review-
The SignalPipeWatcher class defined in xpcom/base/nsDumpUtils.cpp might be of interest — it's used to trigger memory reports from a signal handler (via the usual pipe-to-self trick), which sounds like the same basic problem as this bug.
What about the attempt in this patch? I intercept SIGTERM using "SignalPipeWatcher" as suggested, then I pass a task to the main thread so it calls "nsIAppStartup::Quit(nsIAppStartup::eForceQuit)".

That may seem convoluted, it would look better if it was done in one go. 

I don't have any experience on the Firefox codebase, so I may completely misunderstand the issues. If "Quit" is executed by the main thread (like it would be after a keyboard key for example), wouldn't it avoid the aforementioned problems with the locks.
This is a problem for geckodriver and for anything else that's using marionette to control the browser (e.g. various test harnesses including wpt). If we don't shut down the browser gracefully we lose out on the ability to do various things that depend on clean shutdown (certainly leak logging, perhaps code coverage). Marionette has a method to invoke shutdown, but sending the response races with closing the marionette connection, so it isn't reliable. Signalling the process with SIGTERM would be the obvious solution, except for this bug. So we either need a way to ensure that the marionette response is sent before the corresponding socket is closed, or a fix here.
Comment on attachment 8750107 [details] [diff] [review]
Bug 336193 handle SIGTERM in SignalPipeWatcher then in the main thread to quit

Let's at least review all the patches here.
Attachment #8750107 - Flags: review?(karlt)
Priority: -- → P5
Comment on attachment 8750107 [details] [diff] [review]
Bug 336193 handle SIGTERM in SignalPipeWatcher then in the main thread to quit

I think this approach will work well, thanks.

SIGTERM will have no effect if Firefox gets stuck during shutdown, but I guess
that's reasonable because SIGTERM is intended to effect normal shutdown.  If
Firefox shutdown is broken in such a way that it gets stuck, then other
signals are available to terminate the process.  The alternative of
reinstating the previous handler after the first SIGTERM would lead to the
disadvantage that two SIGTERMs would terminate the app before shutdown
completes even if shutdown would otherwise complete as expected.

> #endif
> 
>+
>   *aRetVal = true;

No extra newline here please.

>+    die_cb(NULL, NULL);

nullptr
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#CC_practices

>+namespace {
>+  void termSignalHandler(const uint8_t aRecvSig) {

Please use static instead of the anonymous namespace.
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#Anonymous_namespaces

Gecko capitalizes the first letter of C++ method names (distinct from variable
names).

>+      RefPtr<QuitTask> task = new QuitTask();
>+      NS_DispatchToMainThread(task);

NS_DispatchToMainThread(task.forget()); to save unnecessary ref count toggle.

Please also remove the trailing whitespace in this patch.

Some changes will be needed to merge with changes for bug 1372405.
NS_NewRunnableFunction() may be useful.
Attachment #8750107 - Flags: review?(karlt) → feedback+
(In reply to James Graham [:jgraham] from comment #45)
> Marionette has a method to invoke shutdown,
> but sending the response races with closing the marionette connection, so it
> isn't reliable. Signalling the process with SIGTERM would be the obvious
> solution, except for this bug. So we either need a way to ensure that the
> marionette response is sent before the corresponding socket is closed, or a
> fix here.

What kind of response are you expecting from SIGTERM that you would not get
from a marionette shutdown?
(In reply to Karl Tomlinson (:karlt) from comment #48)

> (In reply to James Graham [:jgraham] from comment #45)
> 
> > Marionette has a method to invoke shutdown, but sending the
> > response races with closing the marionette connection, so it
> > isn't reliable. Signalling the process with SIGTERM would be
> > the obvious solution, except for this bug. So we either need a
> > way to ensure that the marionette response is sent before the
> > corresponding socket is closed, or a fix here.
> 
> What kind of response are you expecting from SIGTERM that you
> would not get from a marionette shutdown?

As I understand it, handling SIGTERM would ensure a graceful
shutdown allowing leak logs to be generated before the process
exits.

The problem with the Marionette shutdown command (Marionette:Quit)
is that the TCP socket is not guaranteed to be flushed before
calling Services.startup.quit causes the process to end.  If there
was a way through the nsIServerSocket XPCOM interface to set
SO_LINGER or similar this would block process termination, but the
Marionette issue here is essentially irrelevant to the context of
this bug.
Summary: The application should perform a normal shutdown upon receiving SIGTERM → The application should perform a normal shutdown upon receiving SIGTERM/SIGINT/SIGHUP
No longer blocks: 365749
Duplicate of this bug: 365749
No longer blocks: 1032010
Duplicate of this bug: 1032010
Attachment #8626516 - Attachment is obsolete: true
Here's a revised version of the SignalPipeWatcher patch.

* SIGTERM/SIGINT/SIGHUP all do a graceful exit. I tested Chromium, VLC and GIMP using strace with these signals and they all exit normally returning exit codes and not signals.

* SignalPipeWatcher does not chain to previous handler and registers after nsProfileLock so it takes over for these signals. The nsProfileLock signal handler is unwanted since we are doing a normal exit which unlocks the profile in the destructor.

* After the first termination attempt the previous handler is restored so subsequent attempts can unlock the profile and kill the process immediately like before.
Comment on attachment 8959924 [details]
Bug 336193 - P1: Use SignalPipeWatcher to detect signals SIGTERM/SIGINT/SIGHUP and quit application gracefully.

https://reviewboard.mozilla.org/r/228682/#review235302

(I'm not an XPCOM peer and so I considered passing the nsDumpUtils part to
Nathan, but the initial implementation of SignalPipeWatcher was not reviewed
by an XPCOM peer either, and so I guess it doesn't really matter who reviews.)

::: toolkit/xre/nsNativeAppSupportUnix.cpp:692
(Diff revision 1)
> +      static bool dispatchQuit = false;
> +      if (!dispatchQuit) {
> +        dispatchQuit = true;
> +        ForceAppQuitAsync();
> +      }

Is the |dispatchQuit| single quit logic necessary?
It looks like nsAppStartup::mShuttingDown already ensures that multiple Quit() calls will be harmless.

https://searchfox.org/mozilla-central/rev/b29daa46443b30612415c35be0a3c9c13b9dc5f6/toolkit/components/startup/nsAppStartup.cpp#393

::: xpcom/base/nsDumpUtils.h:146
(Diff revision 1)
>    PipeCallback mCallback;
> +  struct sigaction oldAction;

Please follow existing convention with |mOldAction|.

::: xpcom/base/nsDumpUtils.cpp:145
(Diff revision 1)
>    }
> -  SignalInfo signalInfo = { aSignal, aCallback };
> +
> +  // Save current signal action so it can be restored later
> +  struct sigaction oldAction = { 0 };
> +  if (sigaction(aSignal, nullptr, &oldAction) != 0 ||
> +      oldAction.sa_handler == SIG_IGN) {



::: xpcom/base/nsDumpUtils.cpp:145
(Diff revision 1)
>    }
> -  SignalInfo signalInfo = { aSignal, aCallback };
> +
> +  // Save current signal action so it can be restored later
> +  struct sigaction oldAction = { 0 };
> +  if (sigaction(aSignal, nullptr, &oldAction) != 0 ||
> +      oldAction.sa_handler == SIG_IGN) {

Why prevent registering callbacks for signals where the old action is SIG_IGN?

I guessing this is not necessary.

::: xpcom/base/nsDumpUtils.cpp:163
(Diff revision 1)
> +  // Restore previous signal action
> +  MutexAutoLock lock(mSignalInfoLock);
> +  for (SignalInfoArray::index_type i = 0; i < mSignalInfo.Length(); i++) {
> +    if (aSignal == mSignalInfo[i].mSignal) {
> +      struct sigaction* oldAction = &mSignalInfo[i].oldAction;
> +      if (oldAction->sa_handler &&

SIG_DFL is typically 0.

I assume this is intended to detect when oldAction has not been set, but mSignalInfo elements are added only when the old action is successfully retrieved.

Can this check just be dropped?
Attachment #8959924 - Flags: review?(karlt)
Comment on attachment 8959924 [details]
Bug 336193 - P1: Use SignalPipeWatcher to detect signals SIGTERM/SIGINT/SIGHUP and quit application gracefully.

https://reviewboard.mozilla.org/r/228682/#review235302

> Is the |dispatchQuit| single quit logic necessary?
> It looks like nsAppStartup::mShuttingDown already ensures that multiple Quit() calls will be harmless.
> 
> https://searchfox.org/mozilla-central/rev/b29daa46443b30612415c35be0a3c9c13b9dc5f6/toolkit/components/startup/nsAppStartup.cpp#393

It makes sense that it was designed for handling repeated calls and I can't find any other code that does this extra check so it is indeed redundant.

> Why prevent registering callbacks for signals where the old action is SIG_IGN?
> 
> I guessing this is not necessary.

I based this off nsProfileLock's CATCH_SIGNAL() which does not register for signal handlers that have already been set to SIG_IGN. I guess the logic is that if some other component has already chosen to ignore the signal then there is probably a good reason for that and we shouldn't interfere. It would also be inconsistent if we treated it like normal, started shutdown and then restored SIG_IGN which would not follow through with killing the process for subsequent termination attempts. It is always possible to change the signal handler to something else before calling RegisterCallback() to get around this if need be.

> SIG_DFL is typically 0.
> 
> I assume this is intended to detect when oldAction has not been set, but mSignalInfo elements are added only when the old action is successfully retrieved.
> 
> Can this check just be dropped?

Yeah this evolved into something bad, check is redundant and SIG_DFL needs to be restorable (oops!).
Comment on attachment 8959924 [details]
Bug 336193 - P1: Use SignalPipeWatcher to detect signals SIGTERM/SIGINT/SIGHUP and quit application gracefully.

https://reviewboard.mozilla.org/r/228682/#review235302

> I based this off nsProfileLock's CATCH_SIGNAL() which does not register for signal handlers that have already been set to SIG_IGN. I guess the logic is that if some other component has already chosen to ignore the signal then there is probably a good reason for that and we shouldn't interfere. It would also be inconsistent if we treated it like normal, started shutdown and then restored SIG_IGN which would not follow through with killing the process for subsequent termination attempts. It is always possible to change the signal handler to something else before calling RegisterCallback() to get around this if need be.

SignalPipeWatcher is more general than just handling termination signals (although the SignalPipeWatcher interface is not documented and so it is not clear what it is suitable for, and the whole situation with others also calling sigaction for the same signals is already delicate), and so I wouldn't treat the SIG_IGN case special if I were writing this myself.

Changing the signal handler to something else before calling RegisterCallback, doesn't work so well with UnregisterCallback.

But I checked "The default action for an unhandled real-time signal is to terminate the receiving process." and so the existing client, nsMemoryInfoDumper should not trigger this case.  That makes this harmless for now at least, and so feel free to drop this issue if you prefer.  At least we have a record of the reason for the check, so that someone can re-evaluate later if required.
Comment on attachment 8959924 [details]
Bug 336193 - P1: Use SignalPipeWatcher to detect signals SIGTERM/SIGINT/SIGHUP and quit application gracefully.

https://reviewboard.mozilla.org/r/228682/#review235956

r+ with or without the SIG_IGN case.
Attachment #8959924 - Flags: review?(karlt) → review+
Comment on attachment 8963553 [details]
Bug 336193 - P2: Use a second SIGTERM in killPid() to ensure parent process is killed.

https://reviewboard.mozilla.org/r/232486/#review237940

r+ with improved logging.

It seems odd that this is the only place in our test automation requiring this change. Other test harnesses have similar measures in place to ensure the browser is killed. Do none of them use sigterm?

::: testing/mochitest/runtests.py:344
(Diff revision 1)
>              for p in gone:
>                  log.info('psutil found pid %s dead' % p.pid)
>              for p in alive:
>                  log.info('failed to kill pid %d after 30s' % p.pid)
> +                # Try again, second SIGTERM is more forceful
> +                p.send_signal(signal.SIGTERM)

It is important for debugging intermittent failures that logging accurately identifies when a process cannot be killed.
I suggest you add another wait_procs call after the second sigterm and log its results before exiting killPid.
Attachment #8963553 - Flags: review?(gbrown) → review+
Comment on attachment 8963553 [details]
Bug 336193 - P2: Use a second SIGTERM in killPid() to ensure parent process is killed.

https://reviewboard.mozilla.org/r/232486/#review237940

Harnesses built on top of Marionette client as test runner will use SIGKILL to force killing the browser process. It is only used if the normal shutdown as requested through the application itself via `Services.startup.quit` did not work.
Does this bug also include the session not being properly counted as regularly closed when I shut down the computer normally with firefox open? (And firefox offering me a misguided tab restore after a supposed "crash" on next run) Because that's what I'm seeing here as a normal Linux desktop user. Or should I file another bug for that?
(In reply to jonas from comment #63)

That is a bug (IMO) in most distros' shutdown/reboot sequences, where they give applications no time to shut down before sending SIGKILL. I would report to your distro.
'killall firefox' has no timeout as far as I know, so I think that is completely irrelevant
(sorry, by which I meant to say more specifically: 'killall firefox' has the same issue, it prompts a session crash dialog for me on next startup. It's just not something I use often since I usually use the close button which works, but on shutdown the same mechanism is used - of course with a timeout as you correctly state, but that doesn't help when it doesn't even work correctly with no timeout)
Any news on this?
killall firefox         does not work at all (Ubuntu).
pkill -x firefox-bin    gives a crash dialog when starting firefox next time.
A simple and gentle way to end firefox from Linux commandline would be nice.

I'd like this feature, however if this is difficult to implement then a workaround for my use case would be if the firefox command could support a '--close' or similar option to exit gracefully, even if handled asynchronously and I had to poll to wait for exit.

Assignee: ke5trel → nobody
Status: ASSIGNED → NEW
You need to log in before you can comment on or make changes to this bug.