Closed
Bug 1329528
Opened 8 years ago
Closed 8 years ago
OSError: [Errno 1] Operation not permitted exception when killing a zombie process.
Categories
(Testing :: Mozbase, defect)
Tracking
(firefox52 wontfix, firefox-esr52 wontfix, firefox53 fixed, firefox54 fixed)
RESOLVED
FIXED
mozilla54
People
(Reporter: gw, Assigned: sgiles)
References
Details
Attachments
(1 file)
On OSX, calling the kill() method on the Process class can result in an unhandled OSError exception. This occurs when the process in question has exited and is in the zombie state. An example stack trace from [1] is:

Tests with unexpected results:
  ▶ ERROR [expected OK] /html/browsers/windows/noreferrer.html
  └   → Traceback (most recent call last):
      File "/Users/servo/buildbot/slave/mac-rel-wpt1/build/tests/wpt/harness/wptrunner/executors/base.py", line 149, in run_test
        result = self.do_test(test)
      File "/Users/servo/buildbot/slave/mac-rel-wpt1/build/tests/wpt/harness/wptrunner/executors/executorservo.py", line 143, in do_test
        self.proc.kill()
      File "/Users/servo/buildbot/slave/mac-rel-wpt1/build/python/_virtualenv/lib/python2.7/site-packages/mozprocess/processhandler.py", line 766, in kill
        self.proc.kill(sig=sig)
      File "/Users/servo/buildbot/slave/mac-rel-wpt1/build/python/_virtualenv/lib/python2.7/site-packages/mozprocess/processhandler.py", line 172, in kill
        send_sig(signal.SIGTERM)
      File "/Users/servo/buildbot/slave/mac-rel-wpt1/build/python/_virtualenv/lib/python2.7/site-packages/mozprocess/processhandler.py", line 159, in send_sig
        os.killpg(pid, sig)
    OSError: [Errno 1] Operation not permitted

When a process (group) is in zombie state, it will remain that way until waitpid() is called on it. I tested catching the exception when os.killpg() is called, and then calling os.waitpid(pid). This appears to fix the problem locally for my test case, but I'm not sure if this is the correct fix.

[1] https://github.com/servo/servo/pull/14818
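For illustration, the workaround described above (catch the failure from os.killpg(), then reap with os.waitpid()) might look roughly like the sketch below. This is not the patch that landed in mozprocess. The function name kill_process_group and the killpg/waitpid keyword arguments are hypothetical, added only so the logic can be exercised without a real zombie.

```python
import errno
import os
import signal


def kill_process_group(pid, sig=signal.SIGTERM,
                       killpg=os.killpg, waitpid=os.waitpid):
    """Signal process group `pid`; reap it if the kill fails with EPERM.

    Returns True if the signal was delivered, False if the kill failed
    with EPERM and the group was reaped instead. `killpg` and `waitpid`
    are hypothetical hooks so callers can substitute fakes.
    """
    try:
        killpg(pid, sig)
        return True
    except OSError as e:
        # Anything other than EPERM is a genuine error; let it propagate.
        if e.errno != errno.EPERM:
            raise

    # EPERM on OS X can mean the group only contains zombies.
    # A negative pid asks waitpid() for any child in that process group.
    try:
        waitpid(-pid, 0)
    except OSError as e:
        # ECHILD means there was nothing left to reap, which is fine here.
        if e.errno != errno.ECHILD:
            raise
    return False
```

Because the blocking waitpid() only runs after the kill has already failed with EPERM, a healthy process group never reaches it.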
Comment 1•8 years ago
This could be the reason for various test failures we see in Marionette restart tests.
There's a reproducible test case here (including the fix): https://github.com/servo/servo/pull/15579#issuecomment-280215792
Will add a patch for mozprocess tomorrow.
Comment hidden (mozreview-request) |
It would've been great to get a test case for this, but Python multiprocessing is too good at making sure you don't end up with Zombies.. :P
Comment 5•8 years ago
mozreview-review |
Comment on attachment 8837939 [details]
Bug 1329528 - Reap zombie processes on Mac OS if killing the process group initially fails with EPERM;

https://reviewboard.mozilla.org/r/112940/#review114466

::: testing/mozbase/mozprocess/mozprocess/processhandler.py:169
(Diff revision 1)
> +                    # before continuing
> +                    # Note: A negative pid refers to the entire process
> +                    # group
> +                    if retries < 1 and getattr(e, "errno", None) == errno.EPERM:
> +                        try:
> +                            os.waitpid(-pid, 0)

What will happen if we do this call by default before sending killpg()? Would that cause a freeze?

I just wonder if we could get rid of all the extra code the patch will add.
Comment 6•8 years ago
(In reply to sgiles from comment #4)
> It would've been great to get a test case for this, but Python
> multiprocessing is too good at making sure you don't end up with Zombies.. :P

We have at least a forking proc written in C in the tests folder of mozprocess. Maybe you could add just another one?
Comment 7•8 years ago
mozreview-review |
Comment on attachment 8837939 [details]
Bug 1329528 - Reap zombie processes on Mac OS if killing the process group initially fails with EPERM;

https://reviewboard.mozilla.org/r/112940/#review114520

Thanks for digging into this! Looks reasonable to me
Attachment #8837939 -
Flags: review?(ahalberstadt) → review+
Comment on attachment 8837939 [details]
Bug 1329528 - Reap zombie processes on Mac OS if killing the process group initially fails with EPERM;

https://reviewboard.mozilla.org/r/112940/#review114466

> What will happen if we do this call by default before sending killpg()? Would that cause a freeze?
>
> I just wonder if we could get rid of all the extra code the patch will add.

Yep, we need to send the kill signal first, otherwise we'll end up waiting on non-zombie processes and potentially hanging.
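To make the ordering concern concrete, here is a small POSIX-only sketch (not taken from the patch; the helper names is_still_running and demo are made up for this example). It uses os.WNOHANG to show that a live child has nothing to reap yet, which is exactly the case where a blocking os.waitpid(pid, 0) would sit until the child exits on its own. Once the signal has been sent, the blocking wait returns promptly.

```python
import os
import signal
import subprocess
import sys


def is_still_running(pid):
    """Non-blocking check: waitpid() returns pid 0 while the child is alive.

    Note that if the child has already exited, this call reaps it.
    """
    reaped_pid, _status = os.waitpid(pid, os.WNOHANG)
    return reaped_pid == 0


def demo():
    # Spawn a child that would otherwise stay alive for 30 seconds.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])

    # A blocking waitpid() here would hang until the sleep finishes.
    running_before = is_still_running(child.pid)

    # Signal first, then wait: the wait now completes right away.
    os.kill(child.pid, signal.SIGKILL)
    reaped_pid, status = os.waitpid(child.pid, 0)

    # Popen bookkeeping: the child was already reaped above,
    # so this returns immediately.
    child.wait()
    return running_before, reaped_pid == child.pid, os.WIFSIGNALED(status)
```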
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2b4830caedd4
Reap zombie processes on Mac OS if killing the process group initially fails with EPERM; r=ahal
Comment 10•8 years ago
bugherder |
https://hg.mozilla.org/mozilla-central/rev/2b4830caedd4
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Comment 11•8 years ago
This issue also affects branches down to current beta (52.0). We might want to wait some days but then request an uplift of the patch. It would be great to see it also fixed in the next ESR release.
status-firefox52:
--- → affected
status-firefox53:
--- → affected
status-firefox-esr52:
--- → affected
Comment 12•7 years ago
bugherder uplift |
No known fallout I'm aware of so far. Going to give this a go on Aurora for a bit before deciding on Beta.
https://hg.mozilla.org/releases/mozilla-aurora/rev/63baf28e129e
Comment 13•7 years ago
Bah, this needs to be rebased around bug 1309060 if we want to uplift this to 52.
Flags: needinfo?(sgiles)
Comment 14•7 years ago
Would still consider a rebased patch for ESR52 if it's practical to do so, but it's too late for Fx52 at this point.
Updated•6 years ago
Flags: needinfo?(sgiles)