Closed Bug 1329528 Opened 8 years ago Closed 8 years ago

OSError: [Errno 1] Operation not permitted exception when killing a zombie process.

Tracking

(firefox52 wontfix, firefox-esr52 wontfix, firefox53 fixed, firefox54 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla54

Tracking Flags:

Flag            Tracking   Status
firefox52       ---        wontfix
firefox-esr52   ---        wontfix
firefox53       ---        fixed
firefox54       ---        fixed

People

(Reporter: gw, Assigned: sgiles)

Attachments

(1 file)

Bug 1329528 - Reap zombie processes on Mac OS if killing the process group initially fails with EPERM; (8 years ago, sgiles, 59 bytes, text/x-review-board-request, ahal: review+)

Glenn Watson [:gw]

Reporter

Description

8 years ago

On OSX, calling the kill() method on the Process class can result in an unhandled OSError exception.

This occurs when the process in question has exited and is in the zombie state.

An example stack trace from [1] is:

Tests with unexpected results:
  ▶ ERROR [expected OK] /html/browsers/windows/noreferrer.html
  └   → Traceback (most recent call last):
  File "/Users/servo/buildbot/slave/mac-rel-wpt1/build/tests/wpt/harness/wptrunner/executors/base.py", line 149, in run_test
    result = self.do_test(test)
  File "/Users/servo/buildbot/slave/mac-rel-wpt1/build/tests/wpt/harness/wptrunner/executors/executorservo.py", line 143, in do_test
    self.proc.kill()
  File "/Users/servo/buildbot/slave/mac-rel-wpt1/build/python/_virtualenv/lib/python2.7/site-packages/mozprocess/processhandler.py", line 766, in kill
    self.proc.kill(sig=sig)
  File "/Users/servo/buildbot/slave/mac-rel-wpt1/build/python/_virtualenv/lib/python2.7/site-packages/mozprocess/processhandler.py", line 172, in kill
    send_sig(signal.SIGTERM)
  File "/Users/servo/buildbot/slave/mac-rel-wpt1/build/python/_virtualenv/lib/python2.7/site-packages/mozprocess/processhandler.py", line 159, in send_sig
    os.killpg(pid, sig)
OSError: [Errno 1] Operation not permitted

When a process (or process group) is in the zombie state, it remains that way until waitpid() is called on it. I tested catching the exception raised by os.killpg() and then calling os.waitpid() on the pid. This appears to fix the problem locally for my test case, but I'm not sure whether it is the correct fix.

[1] https://github.com/servo/servo/pull/14818
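
The workaround described in this report can be sketched roughly as follows. This is a minimal illustration and not the mozprocess patch: `kill_process_group` and its `_retries` parameter are hypothetical names, and the sketch assumes the caller is the parent of the processes in the group, so that `os.waitpid()` is allowed to reap them.

```python
import errno
import os
import signal


def kill_process_group(pgid, sig=signal.SIGTERM, _retries=0):
    """Signal a process group; on EPERM, reap zombies once and retry.

    Hypothetical helper for illustration only.
    """
    try:
        os.killpg(pgid, sig)
    except OSError as e:
        if _retries < 1 and e.errno == errno.EPERM:
            # Assumption: EPERM here means the group contains a zombie.
            # A negative pid makes waitpid() wait for any child whose
            # process group ID equals pgid.
            try:
                os.waitpid(-pgid, 0)
            except OSError:
                pass  # nothing left to reap (for example ECHILD)
            return kill_process_group(pgid, sig, _retries + 1)
        # ESRCH means the group is already gone, which is acceptable here.
        if e.errno != errno.ESRCH:
            raise
```

Retrying only once keeps a genuine permission error from looping forever: it is re-raised on the second attempt.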

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 1

8 years ago

This could be the reason for various test failures we see in Marionette restart tests.

Blocks: 1336177, 1276220, 1305435

sgiles

Assignee

Updated

8 years ago

Assignee: nobody → sgiles

sgiles

Assignee

Comment 2

8 years ago

There's a reproducible test case here (including the fix): https://github.com/servo/servo/pull/15579#issuecomment-280215792

Will add a patch for mozprocess tomorrow.

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/112940/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/112940/

sgiles

Assignee

Comment 4

8 years ago

It would've been great to get a test case for this, but Python multiprocessing is too good at making sure you don't end up with Zombies.. :P

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 5

8 years ago

mozreview-review

Comment on attachment 8837939 [details]
Bug 1329528 - Reap zombie processes on Mac OS if killing the process group initially fails with EPERM;

https://reviewboard.mozilla.org/r/112940/#review114466

::: testing/mozbase/mozprocess/mozprocess/processhandler.py:169
(Diff revision 1)
> +                            # before continuing
> +                            # Note: A negative pid refers to the entire process
> +                            # group
> +                            if retries < 1 and getattr(e, "errno", None) == errno.EPERM:
> +                                try:
> +                                    os.waitpid(-pid, 0)

What will happen if we do this call by default before sending killpg()? Would that cause a freeze?

I just wonder if we could get rid of all the extra code the patch will add.

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 6

8 years ago

(In reply to sgiles from comment #4)
> It would've been great to get a test case for this, but Python
> multiprocessing is too good at making sure you don't end up with Zombies.. :P

We have at least a forking proc written in C in the tests folder of mozprocess. Maybe you could add just another one?
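
A Python-only reproduction might bypass multiprocessing and call `os.fork()` directly. The following is a rough sketch under that assumption, not an existing mozprocess test: `make_zombie_group_leader` is a hypothetical name, and the fixed sleep is a crude stand-in for proper synchronization. Whether `os.killpg()` fails on the zombie depends on the platform, so the sketch records the outcome instead of asserting a particular error.

```python
import errno
import os
import signal
import time


def make_zombie_group_leader():
    """Fork a child that leads its own process group and exits at once.

    Until the parent calls waitpid(), the exited child remains a zombie.
    Hypothetical helper for illustration only.
    """
    pid = os.fork()
    if pid == 0:
        # Child: become a process-group leader, then exit immediately
        # without running any Python cleanup handlers.
        try:
            os.setpgid(0, 0)
        finally:
            os._exit(0)
    return pid


pid = make_zombie_group_leader()
time.sleep(0.5)  # crude: give the child time to exit and become a zombie

try:
    os.killpg(pid, signal.SIGTERM)
    outcome = "killpg succeeded"
except OSError as e:
    outcome = "killpg failed: " + errno.errorcode.get(e.errno, str(e.errno))

# Reaping the child removes the zombie from the process table.
reaped_pid, status = os.waitpid(pid, 0)
print(outcome)
```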

Andrew Halberstadt [:ahal]

Comment 7

8 years ago

mozreview-review

Comment on attachment 8837939 [details]
Bug 1329528 - Reap zombie processes on Mac OS if killing the process group initially fails with EPERM;

https://reviewboard.mozilla.org/r/112940/#review114520

Thanks for digging into this! Looks reasonable to me

Attachment #8837939 - Flags: review?(ahalberstadt) → review+

sgiles

Assignee

Comment 8

8 years ago

mozreview-review-reply

Comment on attachment 8837939 [details]
Bug 1329528 - Reap zombie processes on Mac OS if killing the process group initially fails with EPERM;

https://reviewboard.mozilla.org/r/112940/#review114466

> What will happen if we do this call by default before sending killpg()? Would that cause a freeze?
> 
> I just wonder if we could get rid of all the extra code the patch will add.

Yep, we need to send the kill signal first, otherwise we'll end up waiting on non-zombie processes and potentially hanging.
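
The ordering concern can be illustrated with a single child process. This is a minimal sketch, not code from the patch; the 60-second sleeping child is a hypothetical stand-in for a process that is still alive when we want to stop it.

```python
import os
import signal
import subprocess
import sys

# Hypothetical long-running child, used only for illustration.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Non-blocking check: (0, 0) means the child has not exited yet.
# A blocking os.waitpid(child.pid, 0) at this point would not return
# until the child finished on its own.
still_running = os.waitpid(child.pid, os.WNOHANG)

# Send the signal first, then wait: the blocking call now returns promptly.
os.kill(child.pid, signal.SIGTERM)
reaped_pid, status = os.waitpid(child.pid, 0)
```

Waiting before signalling is only safe when every process being waited on has already exited, which is what the EPERM check is used to infer.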

Pulsebot

Comment 9

8 years ago

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2b4830caedd4
Reap zombie processes on Mac OS if killing the process group initially fails with EPERM; r=ahal

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 10

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/2b4830caedd4

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox54: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla54

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 11

8 years ago

This issue also affects branches down to current beta (52.0). We might want to wait some days but then request an uplift of the patch. It would be great to see it also fixed in the next ESR release.

status-firefox52: --- → affected

status-firefox53: --- → affected

status-firefox-esr52: --- → affected

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Comment 12

7 years ago

bugherder uplift

No known fallout I'm aware of so far. Going to give this a go on Aurora for a bit before deciding on Beta.
https://hg.mozilla.org/releases/mozilla-aurora/rev/63baf28e129e

status-firefox53: affected → fixed

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Comment 13

7 years ago

Bah, this needs to be rebased around bug 1309060 if we want to uplift this to 52.

Flags: needinfo?(sgiles)

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Comment 14

7 years ago

Would still consider a rebased patch for ESR52 if it's practical to do so, but it's too late for Fx52 at this point.

status-firefox52: affected → wontfix

Henrik Skupin [:whimboo][⌚️UTC+1]

Updated

7 years ago

Blocks: 1299173

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Updated

6 years ago

status-firefox-esr52: affected → wontfix

Flags: needinfo?(sgiles)

Categories

(Testing :: Mozbase, defect)
