Closed Bug 1250901 Opened 5 years ago Closed 4 years ago

Firefox hangs, force-quit stops process but Firefox icon remains in dock

Categories

(Firefox :: General, defect)

x86_64
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 51
Tracking Status
firefox47 --- affected
firefox51 --- fixed

People

(Reporter: spohl, Assigned: spohl)

References

Details

(Whiteboard: workaround in comment 8)

Attachments

(3 files, 3 obsolete files)

Attached file Sample of Firefox.txt (obsolete) —
This is similar to bug 1242888 but doesn't seem to depend on a pending update.

This issue reproduces on a Mac that I have access to (I'm not the primary user) and occurs at least once a day. At seemingly random points in time, Firefox becomes unresponsive (Activity Monitor displays "not responding"). Force-quitting Firefox by right-clicking on the dock icon will stop the process, but the icon remains in the dock. Identically to bug 1242888, Firefox cannot be reopened by simply clicking the dock icon. It has to either be started from within the .app bundle or the following commands have to be executed in Terminal to remove the icon from the dock first:

sudo killall launchservicesd
sudo killall Dock

I was able to get a process sample from Activity Monitor when Firefox was in the hanging state (see attachment). Markus, does anything jump out at you or is there anything else I could try to isolate this?
Flags: needinfo?(mstange)
The sample looks fairly normal, I think. It only covers a time span of 52ms, and all it really says is that we're in shutdown and just received some data from the network.

I still don't know what could cause the dock icon to be stuck like that. I think the last time I saw something like that was when I had lldb attached to the process and forgotten about it. Maybe there is another process attached to our Firefox process? (Updater? Crash reporter?)
I don't know how to find that out.
Flags: needinfo?(mstange)
Thanks, Markus. I'm hoping I can get some "alone time" with this system in the coming days when it's in the hanging state. I'll report back if I find anything noteworthy.
I have encountered what seems to be the same problem. Somehow, there are two Firefox icons in the dock. Force Quit in the dock does not remove them (I had not noticed that the process had been stopped, but
that makes sense as I could not see the process with ps to kill it manually). Neither does Force Quit
from the Apple menu work.

Worse though, attempting to shut down the computer fails. I have had to stop the machine using the power button.

Today when the two icons appeared, it was immediately after installation of the latest update to 45.0.2.
I can't be sure this was true on previous occasions, but I have certainly seen this happen three times at
least now.

OS X El Capitan 10.11.4 on a MacBook Pro.
I've seen this maybe 5 or so times in the last month. Like the other reports, Firefox remains in the dock even though there are no Firefox processes actually running, and I have to use the power button to kill the computer (because it just sits waiting for Firefox to exit, presumably, and it can't Force Quit it because there are no processes).
Just saw this happen on Jim Cook's mac in the MV office as well.
FWIW, it seems like having Firefox in the Mac's Login Items was the issue on Jim's machine. Will update if it still occurs after removing it.
Note that we have a workaround in Terminal that doesn't require to restart the system:

sudo killall launchservicesd
sudo killall Dock

Doesn't fix the issue, but hopefully we can keep people from killing their entire system when this occurs.
Whiteboard: workaround in comment 8
Johnny, can you give a quick synopsis of what you saw on Jim Cook's mac?
Flags: needinfo?(jst)
Attached patch Patch (obsolete) — Splinter Review
Although the updater has switched from posix_spawnp to NSTask methods to launch applications (see bug 394984 attachment 8747283 [details] [diff] [review]), Firefox continued to use posix_spawnp for this purpose. posix_spawnp may be the cause for issues like the one reported in this bug, as well as the ones outlined in bug 996056 comment 0 and the bottom of bug 996056 comment 4.

For reference, the discussion about the move from posix_spawnp to NSTask for the updater occurred in bug 394984 comment 174, bug 394984 comment 180 and bug 394984 comment 188. We now also know that there has been no regression from this change, aside from an issue in the firefox-ui tests that has since been fixed (bug 1276220). Bug 1276220 comment 9 and bug 1276220 comment 10 may actually further confirm that the previous behavior with posix_spawnp was unintended and may be the culprit for bugs like the one reported here.

Everything considered, I believe there is a good chance that the move to NSTask will fix this bug. As discussed in bug 394984, the reasons why we used to use posix_spawnp are no longer applicable (we no longer support OSX 10.5) and since the updater has switched to NSTask with virtually no regression, I'm confident that at worst, nothing changes and at best, we can knock out a bunch of weird bugs.
Assignee: nobody → spohl.mozilla.bugs
Status: NEW → ASSIGNED
Attachment #8783782 - Flags: review?(mstange)
(In reply to Jim Mathies [:jimm] from comment #9)
> Johnny, can you give a quick synopsis of what you saw on Jim Cook's mac?

I have few details, but Potch and Lonnen I believe got to witness what happened here, needinfo on them.
Flags: needinfo?(thepotch)
Flags: needinfo?(jst)
Flags: needinfo?(chris.lonnen)
From what I observed, Firefox got into a state that necessitated a Force Quit (a chrome process hang), and after being Force Quit, Firefox remained both active in the Dock and in the Force Quit menu, despite no longer having a living process (verified with ps -ax). I inspected Jim's Login Items on his computer and found Firefox in there. I've seen issues with having Firefox be auto-started by launchctl, so I removed this item. The issue seemed to go away, but I haven't observed his machine since. Will follow up with Jim to see if the issue has happened since.
Flags: needinfo?(thepotch)
Flags: needinfo?(chris.lonnen)
Comment on attachment 8783782 [details] [diff] [review]
Patch

Review of attachment 8783782 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, let's try this. Even if it doesn't fix the bug, it's good code cleanup.

::: toolkit/xre/MacLaunchHelper.mm
@@ +25,4 @@
>  
> +  @try {
> +    NSString* launchPath = [NSString stringWithUTF8String:aArgv[0]];
> +    NSMutableArray* arguments = [NSMutableArray arrayWithCapacity:aArgc];

That's one more than we need, but it doesn't matter. (We only need aArgc - 1.)

@@ -59,5 @@
> -
> -  cpu_type_t *wanted_type = pref_cpu_types;
> -  size_t attr_count = ArrayLength(pref_cpu_types);
> -
> -  if (aRestartType & nsIAppStartup::eRestarti386) {

Can you file a follow-up bug on removing these constants? Or on documenting them as non-functional, if we want to keep them for API compatibility.
Attachment #8783782 - Flags: review?(mstange) → review+
Blocks: 1298930
Attached patch PatchSplinter Review
Updated for current trunk.

(In reply to Markus Stange [:mstange] from comment #13)
> ::: toolkit/xre/MacLaunchHelper.mm
> @@ +25,4 @@
> >  
> > +  @try {
> > +    NSString* launchPath = [NSString stringWithUTF8String:aArgv[0]];
> > +    NSMutableArray* arguments = [NSMutableArray arrayWithCapacity:aArgc];
> 
> That's one more than we need, but it doesn't matter. (We only need aArgc -
> 1.)

Since I had to update the patch for current trunk anyway, I went ahead and fixed this here and in |LaunchChild| in launchchild_osx.mm.

> @@ -59,5 @@
> > -
> > -  cpu_type_t *wanted_type = pref_cpu_types;
> > -  size_t attr_count = ArrayLength(pref_cpu_types);
> > -
> > -  if (aRestartType & nsIAppStartup::eRestarti386) {
> 
> Can you file a follow-up bug on removing these constants? Or on documenting
> them as non-functional, if we want to keep them for API compatibility.

Good point. Filed bug 1298930 and marked as good-first-bug.
Attachment #8783782 - Attachment is obsolete: true
Attachment #8786161 - Flags: review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/4a42968b790dc16b481c5f0bb3a4b5b4fab3c7fb
Bug 1250901: Relaunch applications using NSTask instead of posix_spawnp on OSX. r=mstange
Sorry had to back out this for Mochitest M(oth) failures on OS X, e.g., https://treeherder.mozilla.org/logviewer.html#?job_id=34940102&repo=mozilla-inbound#L15759
Flags: needinfo?(spohl.mozilla.bugs)
Backout by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/00ec82c217ff
Backed out changeset 4a42968b790d for Mochitest M(oth) failures on OS X
Attached patch Replace waitpid (obsolete) — Splinter Review
Turns out that waitpid can't be used with a process identifier returned by an NSTask.

Robert, I wanted to run this change by you. As far as I can tell, the only time we pass a pid argument to LaunchChildMac is when we want to wait for the process to quit, so I've rewritten this to wait in LaunchChildMac directly instead of ProcessHasTerminated. Doing this in LaunchChildMac is much easier when using NSTask than trying to keep this logic in ProcessHasTerminated. ProcessHasTerminated is executed on a separate thread, but since LaunchChildMac is already executing on a secondary thread, I couldn't see a benefit of keeping it in ProcessHasTerminated instead of moving it to LaunchChildMac, especially since it would involve a lot more code.

Markus, if you could have a look at the Obj-C side of things, I would appreciate it.
Flags: needinfo?(spohl.mozilla.bugs)
Attachment #8786443 - Flags: review?(robert.strong.bugs)
Attachment #8786443 - Flags: review?(mstange)
Comment on attachment 8786443 [details] [diff] [review]
Replace waitpid

Looks fine to me but I'm not familiar with obj-c enough to give r+ so giving f+. The review from Marcus is enough for me and thanks!
Attachment #8786443 - Flags: review?(robert.strong.bugs) → feedback+
Comment on attachment 8786443 [details] [diff] [review]
Replace waitpid

Review of attachment 8786443 [details] [diff] [review]:
-----------------------------------------------------------------

This is really confusing. Can you refactor it to make it clearer? I don't like the implicit API of "if you want to know the pid, then we wait until the task has stopped, so that we can give you a pid for a process that is no longer running".
Either we need a function named LaunchChildMacAndWaitForCompletion and somehow shuffle around the caller, or you need to find a way to return the NSTask itself (or some opaque wrapper handle) from LaunchChildMac.
It also looks like waitUntilExit spawns a nested event loop. This may create problems if some of the callers in the callstack are not re-entrant. Could you instead do something like the Linux code, where you sleep(1) and then check [task isRunning]?
(In reply to Markus Stange [:mstange] from comment #22)
> Comment on attachment 8786443 [details] [diff] [review]
> Replace waitpid
> 
> Review of attachment 8786443 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is really confusing. Can you refactor it to make it clearer? I don't
> like the implicit API of "if you want to know the pid, then we wait until
> the task has stopped, so that we can give you a pid for a process that is no
> longer running".

It actually goes a bit in the reverse: "If you request that we tell you the pid of the launched process, you always want to wait for the process to terminate anyway, so we're doing this for you in LaunchChildMac even though you used to handle 'wait-for-termination' by yourself in nsUpdateProcessor::StartStagedUpdate via nsUpdateProcessor::WaitForProcess." We still return the pid because this info is also used in nsUpdateProcessor::StartStagedUpdate[1] to tell whether or not we were successful in launching the process.

So yes, I could refactor nsUpdateProcessor::StartStagedUpdate and special-case OSX. Do you just want me to return in the |if (mUpdaterPID)| block if we're on OSX?

> Either we need a function named LaunchChildMacAndWaitForCompletion and
> somehow shuffle around the caller, or you need to find a way to return the
> NSTask itself (or some opaque wrapper handle) from LaunchChildMac.
> It also looks like waitUntilExit spawns a nested event loop. This may create
> problems if some of the callers in the callstack are not re-entrant. Could
> you instead do something like the Linux code, where you sleep(1) and then
> check [task isRunning]?

I tried this by crudely returning a void* to the NSTask and then passing it back to a new obj-c function called WaitForTaskCompletion. But I couldn't get it to work. [task isRunning] always returned true (confirmed via lldb) even though the NSTask had clearly terminated. I failed to figure out if this was because we spawn a new thread and calling methods on an NSTask across threads isn't allowed or what the issue might be. I stopped investigating this when I realized that the caller only ever requests a pid if he wants to wait for the process to complete anyway and I changed LaunchChildMac to do so directly.

Another option is to register an observer to be notified when the process terminates[2], but I couldn't think of a clean way to do this because of our dance between cross-platform and obj-c code.

While I was working on this, I also realized that Linux might be susceptible to hangs should waitpid ever fail to properly tell us when a process has terminated. We just changed that code recently and I assumed that waitpid wouldn't be susceptible to such problems, but obviously passing it a pid returned via [task processIdentifier] for example could get it into this state. So I would prefer avoiding this approach on OSX.

[1] https://dxr.mozilla.org/mozilla-central/source/toolkit/xre/nsUpdateDriver.cpp#1260
[2] https://developer.apple.com/library/mac/technotes/tn2050/_index.html#//apple_ref/doc/uid/DTS10003081-CH1-SUBSECTION3
Flags: needinfo?(mstange)
(In reply to Stephen A Pohl [:spohl] from comment #23)
> We still return the pid because this info is also used in
> nsUpdateProcessor::StartStagedUpdate[1] to tell whether or not we were
> successful in launching the process.

Ah, I see, that seems more reasonable then.

> So yes, I could refactor nsUpdateProcessor::StartStagedUpdate and
> special-case OSX. Do you just want me to return in the |if (mUpdaterPID)|
> block if we're on OSX?

Hmm, no... actually, I can't come up with a nice way to rewrite it at the moment. Let's just keep what you have, but please add a comment to the LaunchChildMac definition that describes the pid argument's behavior.

> I tried this by crudely returning a void* to the NSTask and then passing it
> back to a new obj-c function called WaitForTaskCompletion.

Did you retain the object? Otherwise it would probable have been freed when the MacAutoreleasePool went out of scope. Anyway, it doesn't really matter now.
Flags: needinfo?(mstange)
Attachment #8786443 - Flags: review?(mstange) → review+
Attached patch Replace waitpidSplinter Review
Thanks, Markus! Added comment to the definition of LaunchChildMac. Carrying over r+ and f+.
Attachment #8786443 - Attachment is obsolete: true
Attachment #8786984 - Flags: review+
Attachment #8786984 - Flags: feedback+
https://hg.mozilla.org/integration/mozilla-inbound/rev/4f4cd84b3d6585988640cbbfa8d0f7528f827d0b
Bug 1250901: Relaunch applications using NSTask instead of posix_spawnp on OSX. r=mstange

https://hg.mozilla.org/integration/mozilla-inbound/rev/42265eb434e4581ba61d5aaa8ca72327f8947a61
Bug 1250901: Replace waitpid with NSTask's waitUntilExit. r=mstange f=rstrong
https://hg.mozilla.org/mozilla-central/rev/4f4cd84b3d65
https://hg.mozilla.org/mozilla-central/rev/42265eb434e4
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 51
It would be great to hear if anyone is still seeing this with recent versions of nightly.
I haven't seen it recently, but I was only hitting it about once a week.
You need to log in before you can comment on or make changes to this bug.