https://treeherder.mozilla.org/logviewer.html#?job_id=21468923&repo=mozilla-inbound
https://treeherder.mozilla.org/logviewer.html#?job_id=21457206&repo=mozilla-inbound
https://treeherder.mozilla.org/logviewer.html#?job_id=21415200&repo=mozilla-inbound

Sounds similar to bug 603147, but that got resolved WORKSFORME three years ago...
Can we just inject crashes into these zombies to see what they're doing?
We try to do that already: https://dxr.mozilla.org/mozilla-central/rev/6ea654cad929c9bedd8a4161a182b6189fbeae6a/testing/mochitest/runtests.py#1833 I'm not sure why this isn't working, I don't see any interesting errors in the log.
The log that allows us to do this check was just fixed in bug 1193861.
Yeah, these are all Windows, mostly in e10s tests.
Component: Plug-ins → General
This just failed on https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&selectedJob=899074 , e10s shutdown hang I guess.
Unfortunately, bug 1262898 didn't help with this one.
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
I can see that happening in:

Version: 52.0a1
Build ID: 20160919065232
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0
OS: Windows_NT 10.0

I'm not exactly sure how to replicate it, so I can't be sure whether disabling extensions solves the issue or not. It kind of happens at random. However, I did notice that whenever I run Firefox for the first time after a boot, it just spawns a single process in Task Manager which gets stuck at ~2.5 MB RAM usage and stays like that forever, until I force close and re-open it. Not sure if it has any relation to the above issue. Also, trying to force close that process via taskkill in Windows, with either the process name or the PID, results in a "couldn't find process" error. Not sure which of those topics is active, so I also updated here: https://bugzilla.mozilla.org/show_bug.cgi?id=959544
Here are a couple of screenshots. The issue just happened again. Task Manager shows the process using 2.6 MB RAM without making any progress: http://i.imgur.com/tYvP7w8.png http://i.imgur.com/T2OR9pR.png Every time I clicked the Firefox icon to open it, it spawned another zombie process similar to the above: http://i.imgur.com/iEnT3ba.png When I tried to kill the process via the command line with taskkill: http://i.imgur.com/SWeWTm9.png Not sure if I should open a new bug for this or if it's related enough to this one. Let me know.
Erdin, this bug is about a specific failure that occurs within our automated test harness. Please file a new bug for the issue you're finding so that it can be properly investigated. Thanks!
(In reply to Ryan VanderMeulen [:RyanVM] from comment #40) > Erdin, this bug is about a specific failure that occurs within our automated > test harness. Please file a new bug for the issue you're finding so that it > can be properly investigated. Thanks! Sorry about that, submitted here: https://bugzilla.mozilla.org/enter_bug.cgi#h=bugForm%7CFirefox
Whiteboard: [stockwell needswork]
Flags: needinfo?(sdeckelmann) → needinfo?(nfroyd)
Windows documentation indicates that it's invalid to WaitForSingleObject on a process handle unless you request the SYNCHRONIZE access right. And indeed, we see errors in the logs like:

09:58:28 WARNING - mozcrash kill_pid(): wait failed (-1) terminating pid 6340: error 5

That "error 5" is an ERROR_ACCESS_DENIED code. Such errors should go away once we request the proper access rights. Credit to dmajor for noticing the discrepancy.
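For reference, a minimal ctypes sketch of the idea (not the actual patch; the constants are the documented Win32 values, and wait_for_pid is a hypothetical stand-in for the mozcrash helper):

```python
import ctypes

# Documented Win32 constants.
SYNCHRONIZE = 0x00100000        # required for WaitForSingleObject
PROCESS_TERMINATE = 0x0001
WAIT_OBJECT_0 = 0x00000000
ERROR_ACCESS_DENIED = 5

def wait_for_pid(pid, timeout_ms=30000):
    """Wait for a process to exit; return True if it exited in time.

    Opening the handle with SYNCHRONIZE is the key point: without that
    access right, WaitForSingleObject fails and GetLastError() reports
    ERROR_ACCESS_DENIED (error 5), matching the log line above.
    """
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    handle = kernel32.OpenProcess(SYNCHRONIZE | PROCESS_TERMINATE, False, pid)
    if not handle:
        return False
    try:
        return kernel32.WaitForSingleObject(handle, timeout_ms) == WAIT_OBJECT_0
    finally:
        kernel32.CloseHandle(handle)
```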
Attachment #9039792 - Flags: review?(gbrown)
We're seeing a lot of cases where our "check for zombie child processes" check is finding live processes, but the minidumps that we get from such processes are nonsense, and don't even feature Firefox symbols.

The working theory at this point, courtesy of bobowen, is that child processes that we launch are getting closed during the test runs, completely normally, and then we are finding other (non-Firefox) live processes with the PIDs that were used for Firefox child processes at the end of the test run. This scenario is plausible due to Windows's aggressive reuse of PIDs. We don't see the same behavior on our Unix test machines because Linux (and OS X, apparently) are not nearly as aggressive in reusing PIDs.

Since we should be ensuring that any live processes are actually Firefox processes anyway, let's add the appropriate check. If the check works to reduce the incidence of zombiecheck failures, that's great! If not, we've at least made our test runner more robust and can investigate other possibilities for these intermittent failures.
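A sketch of what such a check could look like (a hypothetical helper, not the actual patch; it uses the documented QueryFullProcessImageNameW API, and conservatively treats a live-but-unnameable process as Firefox):

```python
import ctypes

# Documented Win32 access right sufficient for QueryFullProcessImageNameW.
PROCESS_QUERY_LIMITED_INFORMATION = 0x1000

def is_firefox_process(pid):
    """Guard against Windows PID reuse: verify that a still-live PID
    actually belongs to a Firefox process before calling it a zombie."""
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    handle = kernel32.OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
    if not handle:
        # Couldn't open it; it has probably exited already.
        return False
    try:
        buf = ctypes.create_unicode_buffer(260)
        size = ctypes.c_ulong(len(buf))
        if not kernel32.QueryFullProcessImageNameW(handle, 0, buf,
                                                   ctypes.byref(size)):
            # Still an active process but we can't get its name, so
            # conservatively assume it's Firefox.
            return True
        return buf.value.lower().endswith(("firefox.exe",
                                           "plugin-container.exe"))
    finally:
        kernel32.CloseHandle(handle)
```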
Attachment #9039794 - Flags: review?(gbrown)
Comment on attachment 9039792 [details] [diff] [review] part 1 - request synchronization privileges in kill_pid Review of attachment 9039792 [details] [diff] [review]: ----------------------------------------------------------------- Aha! So glad you/dmajor found this - thanks!
Attachment #9039792 - Flags: review?(gbrown) → review+
Comment on attachment 9039794 [details] [diff] [review] part 2 - be more stringent in checking for live pids Review of attachment 9039794 [details] [diff] [review]: ----------------------------------------------------------------- ::: testing/mochitest/runtests.py @@ +385,5 @@ > + if namelen == 0: > + # Still an active process, so conservatively assume it's Firefox. > + return True > + > + return pName.value.endswith(('firefox.exe', 'plugin-container.exe')) I'm sure you know more definitively than I do: are those the only possibilities for Windows Firefox processes?
Attachment #9039794 - Flags: review?(gbrown) → review+
Assignee: nobody → nfroyd
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/072ba80a8f3e part 1 - request synchronization privileges in kill_pid; r=gbrown https://hg.mozilla.org/integration/mozilla-inbound/rev/9fb802aba6b9 part 2 - be more stringent in checking for live pids; r=gbrown
Depends on: 1525152
You need to log in before you can comment on or make changes to this bug.