925408 - [mozprocess] mozprocess should have the ability to kill a specific process

Reporter

Description

•

11 years ago

in porting talos I am stuck on the ability to kill a specific process.  While this might not be the normal case, it is the case when the browser is hung, or a secondary process is launched unrelated to the core mozprocess.  

I found the pid.py file which has utilities for getting a pid based on process name.  killing a process based on pid or name would fit right into that.

Jeff Hammel

Comment 1

•

11 years ago

I'm +1 for adding this functionality to pid.py . While there has been historically some sentiment as Joel echos that we shouldn't have to do this, but in practice we do do this, so I'd rather have this code in mozbase in one place, and mozprocess.pid seems as good a place as any.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 2

•

11 years ago

here is an example on linux:
http://hg.mozilla.org/build/talos/file/473c3513c97e/talos/ffprocess_linux.py#l50

and osx:
http://hg.mozilla.org/build/talos/file/473c3513c97e/talos/ffprocess_mac.py#l62

and windows:
https://github.com/mozilla/mozbase/blob/master/mozprocess/mozprocess/wpk.py#L50 (already in there)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Updated

•

11 years ago

Blocks: 923770

Jeff Hammel

Updated

•

11 years ago

Summary: mozprocess should have the ability to kill a specific process → [mozprocess] mozprocess should have the ability to kill a specific process

Whiteboard: [good first bug]

Ryan

Comment 3

•

11 years ago

Hello,

I've been looking at this for a little while and I seem to be snagged on a little detail. I'm not sure how to check if the process given its pid is currently running. Do you think you could point me in the right direction?

Thanks!
Ryan

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 4

•

11 years ago

the above references for linux and osx reference is_running:
http://hg.mozilla.org/build/talos/file/tip/talos/utils.py#l118

this in turn references mozprocess:
https://github.com/mozilla/mozbase/blob/master/mozprocess/mozprocess/pid.py#L21

does that help out?

Ryan

Comment 5

•

11 years ago

Unfortunately not :/
I'm having a tough time understanding what ps() is doing and how to get a process from it

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 6

•

11 years ago

here is an example of how to use mozprocess.pid.ps():
from mozprocess import pid as mozpid
pids = mozpid.ps('axwww')
for pid in pids:
    if pid['COMMAND'].find('firefox') > 0:
        print pid['PID']


While this is a simple example, it should help you figure out how to use mozpid.  Basically we collect the output from 'ps <args>' and put it in an object:
{'COMMAND': '/usr/lib/firefox-trunk/firefox-trunk', 'TTY': '?', 'PID': '8462', 'STAT': 'Sl', 'TIME': '1:17'}

now we can find a matching process and get the pid without too much trouble.

Ryan

Comment 7

•

10 years ago

Okay, thanks. I think I now see what I should be doing, I'll attach what I think it should be.

Ryan

Comment 8

•

10 years ago

Attached patch proposed solution (obsolete) — Details — Splinter Review

Please let me know if I'm on the right track, or, if not, let me know where I went wrong or should be going! Thanks a lot!

Attachment #8355728 - Flags: feedback?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Updated

•

10 years ago

Attachment #8355728 - Attachment is patch: true

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 9

•

10 years ago

Comment on attachment 8355728 [details] [diff] [review]
proposed solution

Review of attachment 8355728 [details] [diff] [review]:
-----------------------------------------------------------------

This is a good start, we need to support osx, linux, and windows (already done with wpk.kill_pid(pid))

You should be able to determine the os via:
https://github.com/mozilla/mozbase/blob/master/mozprocess/mozprocess/pid.py#L14

that could sort out the psargs and the signals to iterate through.

Thanks for working through this, keep up the good work!

::: mozprocess/mozprocess/pid.py
@@ +86,5 @@
> +       Args:
> +               pid: integer process id of the process to terminate
> +       """
> +       ret = ''
> +    pids = ps()

I suspect this won't get us the pids, we probably need the args='axwww' for linux and '-Acj' for osx.

here is how we do it for osx:
http://hg.mozilla.org/build/talos/file/tip/talos/ffprocess_mac.py#l22

@@ +97,5 @@
> +                    ret = "killed with %s" % sig
> +    except OSError, (errno, strerror):
> +        print "WARNING: failed os.kill: %s : %s" % (errno, strerror)
> +    return ret
> +

minor detail: we do 4 space indentation in the files, no tabs just spaces

Attachment #8355728 - Flags: feedback?(jmaher) → feedback-

Ryan

Comment 10

•

10 years ago

Thanks! I understand why I would need to check the OS for the signals I should be iterating through, but since the function ps() has a default argument of psarg would I need to pass it anything? It looks to me like the correct argument is passed to the function by default and in following the DRY principle then terminate_process function shouldn't check that again. Thanks for your help!

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 11

•

10 years ago

we would need to assure that the psargs defined at the top of pid.py will be used.  We could defined the signals up there as well.

Ryan

Comment 12

•

10 years ago

Attached patch proposed solution (obsolete) — Details — Splinter Review

Okay, thanks for clearing that up. I just fixed the things you pointed out, please let me know if there's anything you see that needs to be looked at again with this one. 

After the patch is finished should I add some tests to mozprocess/tests/?

Attachment #8355728 - Attachment is obsolete: true

Attachment #8356182 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 13

•

10 years ago

Comment on attachment 8356182 [details] [diff] [review]
proposed solution

Review of attachment 8356182 [details] [diff] [review]:
-----------------------------------------------------------------

really close, I think this is all that is left for making this solid.  After this, we really need some tests as you mentioned.

::: mozprocess/mozprocess/pid.py
@@ +80,4 @@
>      else:
>          return [pid for pid,_ in running_processes(name)]
>  
> +def terminate_process(pid, timeout):

please make timeout optional, with a default of 2 seconds.

@@ +88,5 @@
> +    """
> +    ret = ''
> +    pids = ps(psarg)
> +    if mozinfo.isMac:
> +        signals = ('SIGTERM', 'SIGKILL') 

please remove the trailing whitespace at the end of all lines, you can see them all in bugzilla review as pink characters :)

@@ +97,5 @@
> +            for p in pids:
> +                if p['PID'] == pid:
> +                    os.kill(pid, getattr(signal, sig))
> +                    time.sleep(timeout)
> +                    ret = "killed with %s" % sig

why do we iterate through the pids?  Could we just verify the pid is in the list and if so, then try to kill it with each signal?  That should reduce the double for loop.

Attachment #8356182 - Flags: review?(jmaher) → review-

Ryan

Comment 14

•

10 years ago

Attached patch Proposed Solution (obsolete) — Details — Splinter Review

Thanks for the help! Sorry it's taken me a while, I've been a little busy. How does this look?

Attachment #8356182 - Attachment is obsolete: true

Attachment #8358629 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 15

•

10 years ago

Comment on attachment 8358629 [details] [diff] [review]
Proposed Solution

Review of attachment 8358629 [details] [diff] [review]:
-----------------------------------------------------------------

this is really close, your code is clean, just one small logic flaw.

::: mozprocess/mozprocess/pid.py
@@ +98,5 @@
> +                for sig in signals:
> +                    os.kill(pid, getattr(signal, sig))
> +                    time.sleep(timeout)
> +                    ret = "killed with %s" % sig
> +                    return ret

right now we loop through the signals, we need to check if the process is running prior to sending each signal.  Also notice the return is outside the loop.

for sig in signals:
    if not isrunning(pid):
        break

    os.kill(pid, getattr(signal, sig))
    time.sleep(timeout)
    ret = "killed with %s" % sig

return ret

Attachment #8358629 - Flags: review?(jmaher) → review-

Ryan

Comment 16

•

10 years ago

Thanks for the review
So my problem right now is that I'm not sure how to check if a process is running given its pid. I know I could use the function running_processes() to get the list of processes that are running or even slim it down using get_pids() to get the pids of the running processes. However, I don't know the name of the process I am trying to kill and I don't know how to get the name. Would my local list "pids = ps(psarg)" not necessarily be the running processes?

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 17

•

10 years ago

good question, if you look at code that does this in the talos framework, we check if a process is running via:
http://hg.mozilla.org/build/talos/file/tip/talos/utils.py#l116

Given what we have, in terminate_process, we should be able to do that.

Ryan

Comment 18

•

10 years ago

Ah, okay. So maybe I could just do that from the outset when I initialize the pids list by saying:
pids = [p for p in ps(psarg) if pid == int(p['PID'])]. Then check if the list is empty, if not I know that the list contains the process I want to terminate and wouldn't have to check later if the process is running. Do you think that would work?
Then this raises the question: are all PIDs unique for each process that would be running? (this is just to clarify for testing purposes later on)

Ryan

Comment 19

•

10 years ago

Also, would you mind going ahead and marking me in the 'assigned to' field?

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Updated

•

10 years ago

Assignee: nobody → ryan.da.baker

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 20

•

10 years ago

as far as I know all pids will be unique on all systems we need to run this code on.  With that said, we just need to verify before sending os.kill() that the process is in the *current* process list.

When this gets in, I can remove a lot of mostly duplicated code, exciting times!

Ryan

Comment 21

•

10 years ago

Ah, I see what you mean. So instead of just initializing the "pids" list once, I should reinitialize it every time I try to os.kill it with a certain process to see if it has been killed yet. I'll upload what I have.

Ryan

Comment 22

•

10 years ago

Attached patch Proposed Solution (obsolete) — Details — Splinter Review

I think this seems right. Please let me know!

Attachment #8358629 - Attachment is obsolete: true

Attachment #8359253 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 23

•

10 years ago

Comment on attachment 8359253 [details] [diff] [review]
Proposed Solution

Review of attachment 8359253 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozprocess/mozprocess/pid.py
@@ +99,5 @@
> +                    break
> +                os.kill(pid, getattr(signal, sig))
> +                time.sleep(timeout)
> +                ret = "killed with %s" % sig
> +                pids = [p for p in ps(psarg) if pid == int(p['PID'])]

I still don't like this.  We are given a single pid and want to kill it.

we should:
* verify pid is running
* if running, try to kill with signals
** os.kill/wait
** get list of pids and if pid is not in list break

going this route, we have a single for loop, and it is simpler.

really the check could be:
if not [p for p in ps(psarg) if pid == int(p['PID'])]:
    break

Attachment #8359253 - Flags: review?(jmaher) → review-

Ryan

Comment 24

•

10 years ago

Attached patch Proposed Solution (obsolete) — Details — Splinter Review

Thanks for the feedback. As I was refactoring to make it correct I didn't give enough thought to how I could make it more efficient. I got rid of unnecessary lines and loops and came down to this process:
- check if the pid is part of a running process
- try to kill it with each signal
-- os kill/wait
-- make sure process given its pid was properly terminated

Attachment #8359253 - Attachment is obsolete: true

Attachment #8359334 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 25

•

10 years ago

Comment on attachment 8359334 [details] [diff] [review]
Proposed Solution

Review of attachment 8359334 [details] [diff] [review]:
-----------------------------------------------------------------

great, thanks!  Now for the second half- lets write a couple tests for this.

Attachment #8359334 - Flags: review?(jmaher) → review+

Ryan

Comment 26

•

10 years ago

Okay great! So here are some ideas for test cases that I'm going to write, tell me if you see any that are not necessary or can think of any that should be added.
- pid for a non-running process
- a process to kill for each signal (including SIGABRT for Linux)
    |- would this be different for each OS as well? Should that be tested do you think?
- a process that cannot be killed for whatever reason, and instead throws an OSError
- three test which will kill a process correctly but have different values for timeout
  including: default value, 0 value, large value

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 27

•

10 years ago

I would send in a None value for the pid, likewise the timeout.

If we could make a zombie process, then we could probably get to the error condition where the signals do not terminate the process.  I wouldn't go overboard on defining different process conditions and types, especially for the different OS types.

I look forward to your patch with some tests.

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 28

•

10 years ago

In mozmill which is based on mozrunner and mozprocess we actually see zombie processes of Firefox and it's something I'm currently investigating and trying to fix. See bug 959551. Not sure how this would help but what I see is, when Firefox gets restarted a new process with the same pid gets started and another one still lays around until Firefox gets completely closed. Not sure if that would help you for a test, but can't we create a zombie by not using process wait()?

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 29

•

10 years ago

Ryan- following up to see if you have had a chance to work on some of the unittests?

Ryan

Comment 30

•

10 years ago

Hi Joel,
Sorry it's taken me a while, school started again and I got stuck and became very busy. I'll attach what I have started but I feel I don't know how to go about doing this (setting up a ghost process, etc.)

Ryan

Comment 31

•

10 years ago

Attached patch first test case attempt (obsolete) — Details — Splinter Review

Attachment #8365112 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Updated

•

10 years ago

Attachment #8365112 - Attachment is patch: true

Attachment #8365112 - Attachment mime type: text/x-python-script → text/plain

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 32

•

10 years ago

Comment on attachment 8365112 [details] [diff] [review]
first test case attempt

>#!/usr/bin/env python
>
>import os
>import time
>import unittest
>import proctest
>from mozprocess import pid
>
>class TestKillProcess(proctest.ProcTest):
>    """ Class to test various processes to kill """
>
>    def test_not_running_kill(self):
>        """try to kill a non running process"""
>        process_id = -1
>        should_return = ""
>        ret = pid.terminate_process(process_id)
>        self.assertEqual(ret, should_return)
this is looking good

>
>    def test_cannot_kill(self):
>        """fail to kill a process"""
>        # make and try to kill an unkillable process
good placeholder

>
>    def test_none_pid(self):
>        """passing in None for a pid"""
>        ret = pid.terminate_process(None)
>        # what should I expect it to return?
can you test this? I assume it will return ''

>
>    def test_none_timeout(self):
>        """passing in None for a timeout"""
>        ret = pid.terminate_process(4, None)
>        
another great start


Not going to mark it as +/-, I think if we can get asserts in here for all tests this is r+.  Now to figure out how to create a process that cannot be killed.

Attachment #8365112 - Flags: review?(jmaher)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 33

•

10 years ago

(In reply to Joel Maher (:jmaher) from comment #32)
> >    def test_cannot_kill(self):
> >        """fail to kill a process"""
> >        # make and try to kill an unkillable process
> good placeholder

If this is not part of your work on this bug and it will be checked-in that way, it would be better to mark this method as skipped via unitest.skip(), and file a follow-up bug which can be referenced in the skip comment.

For an example how I did this for mozrunner tests:
https://github.com/mozilla/mozbase/blob/master/mozrunner/tests/test_start.py#L33

Ryan

Comment 34

•

10 years ago

Attached patch second test case attempt (obsolete) — Details — Splinter Review

Here is the progress I have made so far. As I mentioned, progress is very slow due to school work and other obligations, sorry about that. I replaced a couple placeholders but am unsure of two things:
- in test_cannot_kill I am very unsure of how to make a ghost process
- in test_none_timeout I am unsure what the pid input should be (if anything), any suggestions for this?

Attachment #8365112 - Attachment is obsolete: true

Attachment #8369768 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 35

•

10 years ago

Comment on attachment 8369768 [details] [diff] [review]
second test case attempt


>
>    def test_none_timeout(self):
>        """passing in None for a timeout"""
>        ret = pid.terminate_process(4, None) #some arbitrary process
>        should_not_return = ''
>        self.assertNotEqual(ret, should_not_return) 

we should be able to launch a program here, how about we add a support file which is basically a python script:
while 1:
  sleep 1

we could launch that with mozprocess and get the pid, then we have a pid.

as for a zombie process, lets do what :whimboo suggested in comment 33 and use unitest.skip() and file a bug as a work item to write that test case.  This way we can get this functionality landed.

Attachment #8369768 - Flags: review?(jmaher) → review+

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 36

•

10 years ago

Shouldn't we make this method a class method? That way we could call terminate_process even without a pid instance.

Ryan

Comment 37

•

10 years ago

Okay I'll do what you suggest, then. In setting up a helper file (the python script) I assume creating a script (e.g. helper.py) in the tests/ directory is sufficient? If so, how do I launch such a process using mozprocess? Is filing for a bug as simple as putting 'pass' in the function body with a comment that it should be filed as a bug?

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 38

•

10 years ago

There is no need for an additional Python module here. Just mark the above mentioned method as a class method. We have a couple of other instances in mozbase already. So please have a look at this link:

https://github.com/mozilla/mozbase/search?q=classmethod&source=c

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 39

•

10 years ago

I think proclaunch:
https://github.com/mozilla/mozbase/blob/master/mozprocess/tests/proclaunch.c

will be good here.  There are examples of using it in the source- I think we are really close.  We can always file a bug and put a comment in for a test case that we need to write but don't have the time, skills, or other pre-requisites for.

Ryan

Comment 40

•

10 years ago

Henrik, if I made the method into a class method would we have access to a pid at all? As far as I understand we would only have access to a pid if the class method was for a process class (instead of a ProcTest class, like it is now). 

Joel, I've been looking through the code for examples on launching a process with proclaunch but am still unsure as to how exactly to do that.

While this is a good discussion I feel we've deviated from the main focus of this bug and should file another bug which would be to finish the test cases so we can get this bug resolved and move the discussion of using proctest vs. a classmethod to a more appropriate bug. I'll upload what I have so far and hopefully we can move forward from there.

Ryan

Comment 41

•

10 years ago

Attached patch Proposed test cases (obsolete) — Details — Splinter Review

Attachment #8369768 - Attachment is obsolete: true

Attachment #8373483 - Flags: review?(jmaher)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 42

•

10 years ago

(In reply to Ryan from comment #40)
> Henrik, if I made the method into a class method would we have access to a
> pid at all? As far as I understand we would only have access to a pid if the
> class method was for a process class (instead of a ProcTest class, like it
> is now). 

Ups, sorry. I should have actually taken a full look at the code before mentioning that. I did that now and I have seen that this module only holds methods, that's all. So we are indeed fine and do not have to change something. 

> While this is a good discussion I feel we've deviated from the main focus of
> this bug and should file another bug which would be to finish the test cases
> so we can get this bug resolved and move the discussion of using proctest

We should not land new features without tests. Same applies to bug fixes. If we want to have sane and working code, tests are always required. So I see this crucial as part of this bug.

Also a note. Please try to upload a single patch for all, means the code changes and the test additions. That way it is easier to test and to review.

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 43

•

10 years ago

Comment on attachment 8359334 [details] [diff] [review]
Proposed Solution

Review of attachment 8359334 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the drive-by review but I have some concerns with the patch. Please find my comments inline.

::: mozprocess/mozprocess/pid.py
@@ +89,5 @@
> +    ret = ''
> +    if mozinfo.isMac:
> +        signals = ('SIGTERM', 'SIGKILL')
> +    elif mozinfo.isLinux:
> +        signals = ('SIGTERM', 'SIGKILL', 'SIGABRT')

What about Windows? You will get a signals used before declared failure.

@@ +95,5 @@
> +        for sig in signals:
> +            if not [p for p in ps(psarg) if pid == int(p['PID'])]:
> +                break
> +            os.kill(pid, getattr(signal, sig))
> +            time.sleep(timeout)

You should not sleep here but call wait() if necessary with a timeout. Otherwise you will return too early.

@@ +96,5 @@
> +            if not [p for p in ps(psarg) if pid == int(p['PID'])]:
> +                break
> +            os.kill(pid, getattr(signal, sig))
> +            time.sleep(timeout)
> +            ret = "killed with %s" % sig

Why can't we simply return the real exit code here? I don't think it is helpful to have a string like 'killed with -9'. If kill was not successful None should be returned

Attachment #8359334 - Flags: feedback-

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 44

•

10 years ago

windows already has a kill method, we are just adding this for linux/osx.  In fact this code was originally taken from existing code that works.  I am fine making this better- mozprocess should be a best of breed.  This bug was slated as a good first bug, and I want to keep it that way- lets all work together to wrap this up and get this feature added to mozprocess!

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 45

•

10 years ago

Comment on attachment 8373483 [details] [diff] [review]
Proposed test cases

lets move these tests into the same patch as the fix for the code.  That is reasonable.  I think adding a test case for test_cannot_kill is good, but a bit out of scope.  We are gaining adequate test coverage here.  If you can file a bug (clone this bug for the most part) for that specific test case, that would make me happy.

r- as we should make a single patch.

Attachment #8373483 - Flags: review?(jmaher) → review-

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 46

•

10 years ago

Comment on attachment 8373483 [details] [diff] [review]
Proposed test cases

>    @unittest.skip("We must first make an unkillable process")
>    def test_cannot_kill(self):
>        """fail to kill a process"""
>        # bug: make and try to kill an unkillable process
>        pass

I wonder if we can do this at all. How would you clean-up the test so this process doesn't hang around forever? We might be able to have a process which you cannot kill via SIGTERM but SIGKILL. But that would also only work on Linux and OS X. Not sure what to do on Windows. If we unskip here, please do it in a way that we know which bug will add this. Means please file the follow-up bug, and reference it here like 'Bug XYZ - Create test for unkkillable process'.

Attachment #8373483 - Flags: review- → review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 47

•

10 years ago

Comment on attachment 8373483 [details] [diff] [review]
Proposed test cases

good point- lets just forget the hung/dead process.  I like the patch and tests then, :whimboo, do you have other requirements for this?

Attachment #8373483 - Flags: review?(jmaher) → review+

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 48

•

10 years ago

Beside a sane API which is not that clear as of right now, not. We really shouldn't return a string but the returncode form kill() so we can handle it appropriately. I had to fix a couple of issues lately which also handled those parts a bit loosely.

Flags: needinfo?(hskupin)

Ryan

Comment 49

•

10 years ago

Attached patch Proposed tests and terminate_process function (obsolete) — Details — Splinter Review

This is the git diff for both the terminate_process function and the tests thereof. Here are the changes:
- os.kill() call is now in a try block in the event that the process cannot be killed for whatever reason
- os.waitpid() call has been added within a try block, it's in a try block to make sure that if it already killed the OSError is caught
- time.sleep() is still called with a value of 'timeout'. This will allow the timeout to still exist, since any os.wait() operations don't take a timeout value (as far as I know, please correct me if I'm wrong)
- os.kill() returns None (always, I believe) and makes the exit code equal to the second input value (getattr(signal, sig), in this case) so I kept the return value of 'ret' as a string and added the exit code as well as the signal (would you think just the value of the exit code would be better?)
- return value: if 'ret' is an empty string None is returned and if not the string with the exit code and terminate signal is returned instead

in the test file:
- in test_not_running_kill() 'process_id' has been changed to -2 because of how waitpid() works
- all 'should_return' or 'should_not_return' values have been changed to None instead of the empty string

Sorry I haven't been working on this lately, but I have spring break this week so I'd like to finish this up soon! Let me know what you think and I'll get right to work!

Attachment #8359334 - Attachment is obsolete: true

Attachment #8373483 - Attachment is obsolete: true

Attachment #8387883 - Flags: review?(jmaher)

Attachment #8387883 - Flags: review?(hskupin)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 50

•

10 years ago

Comment on attachment 8387883 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8387883 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for uploading a new patch.  I think we were pretty much in agreement that we were near the end previously.  This patch that you uploaded only has changes to the pid.py file.  Can you refresh the patch with all the changes?

Attachment #8387883 - Flags: review?(jmaher) → review-

Ryan

Comment 51

•

10 years ago

Attached patch Proposed solution and tests (obsolete) — Details — Splinter Review

Oops, sorry about that! I thought I uploaded the correct file. Anyway, this should be the correct file. Once you tell me the patch looks good I'll file a bug and put the bug ID number into the test_cannot_kill() method and upload the full diff (like this one) but with the bug ID included

Attachment #8387883 - Attachment is obsolete: true

Attachment #8387883 - Flags: review?(hskupin)

Attachment #8389199 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 52

•

10 years ago

Comment on attachment 8389199 [details] [diff] [review]
Proposed solution and tests

Review of attachment 8389199 [details] [diff] [review]:
-----------------------------------------------------------------

really close.

::: mozprocess/tests/test_mozprocess_kill_process.py
@@ +30,5 @@
> +        self.assertEqual(ret, should_return)
> +
> +    def test_none_timeout(self):
> +        """passing in None for a timeout"""
> +        ret = pid.terminate_process(some arbitrary process, None)

"some arbitrary process" doesn't seem to be valid, do we need to launch a process here so we can kill it?

also what should it return?

Attachment #8389199 - Flags: review?(jmaher) → review-

Ryan

Comment 53

•

10 years ago

I think it would be best if I were to launch another process. It seems like to do that the following steps are necessary (please correct me if I'm wrong!):
1 - import proclaunch.py
2 - create a new process of class ProcessLauncher
3 - get the pid of the process that was just launched (I don't know how to do this)
4 - use the returned pid to kill the process

It would seem that most of the tests here are using ProcessHandler and not ProcessLauncher, would it be better if I were to use that instead? 

The return value should be a string which indicates the signal it was killed with and the exit code it returned upon being killed.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 54

•

10 years ago

yes, this is what I was suggesting.  instead of importing proclaunch, we could reference it as it is done here:
http://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozprocess/tests/test_mozprocess_wait.py#17

whatever you feel is most effective.  

we could use mozprocess to get the pid, and then try killing it.

Thanks for sticking with this bug and making it a reality!

Ryan

Comment 55

•

10 years ago

Attached patch Proposed tests and terminate_process function (obsolete) — Details — Splinter Review

Joel,

I think I may have finished it. I am unsure of only two things in this patch:

1 - in test_not_running_kill() I am not sure if -2 is a safe value to use for a pid that is guaranteed to not be running. Do you know anything about this? I changed it to -2 from -1 after reading that -1 is a special value for that function (see here: http://docs.python.org/2/library/os.html#os.kill). 

Perhaps I should also launch a process here, kill it, then try to kill it again.

2 - in test_none_timeout() I am not sure as to the exact syntax of launching a process then getting its pid. I gave it my best shot but please show me any mistakes you see.

Attachment #8389199 - Attachment is obsolete: true

Attachment #8391439 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 56

•

10 years ago

Comment on attachment 8391439 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8391439 [details] [diff] [review]:
-----------------------------------------------------------------

the tests still fail a lot even if I add some stuff locally, can you test locally and get the tests passing?

::: mozprocess/tests/test_mozprocess_kill_process.py
@@ +3,5 @@
> +import os
> +import time
> +import unittest
> +import proctest
> +from mozprocess import pid

we need to import processhandler (see other tests for examples)

@@ +30,5 @@
> +        self.assertEqual(ret, should_return)
> +
> +    def test_none_timeout(self):
> +        """passing in None for a timeout"""
> +		 p = processhandler.ProcessHandler([self.python, self.proclaunch, "process_simple_tokill.ini"], cwd=(os.path.dirname(os.path.abspath(__file__))))

the spacing here looks odd, could just be what I see in the patch.

@@ +33,5 @@
> +        """passing in None for a timeout"""
> +		 p = processhandler.ProcessHandler([self.python, self.proclaunch, "process_simple_tokill.ini"], cwd=(os.path.dirname(os.path.abspath(__file__))))
> +        p.run()
> +        p.wait()
> +        ret = pid.terminate_process(p.proc.pid, None)

I think p.wait() waits until the process is done.  Did you run this locally and did it work?

@@ +35,5 @@
> +        p.run()
> +        p.wait()
> +        ret = pid.terminate_process(p.proc.pid, None)
> +        should_not_return = None
> +        self.assertNotEqual(ret, should_not_return)

we need to add:
if __name__ == '__main__':
    unittest.main()

Attachment #8391439 - Flags: review?(jmaher) → review-

Ryan

Comment 57

•

10 years ago

Attached patch Proposed tests and terminate_process function (obsolete) — Details — Splinter Review

After reviewing what you said I went over some things and made a few changes. You'll notice the biggest change is in test_none_timeout() where it runs and kills a process and then checks to see if the exit signal is contained in the returned string.
After running these locally they all work as expected. Let me know if you see anything else wrong!

Attachment #8391439 - Attachment is obsolete: true

Attachment #8391630 - Flags: review?(jmaher)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 58

•

10 years ago

Comment on attachment 8391630 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8391630 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozprocess/mozprocess/pid.py
@@ +102,5 @@
> +            except OSError:
> +                break
> +            if timeout:
> +                time.sleep(float(timeout))
> +            ret = "killed with %s exit code %d" % (sig, getattr(signal, sig))

This will not kill the process immediately. Sometimes it will still take a couple of milliseconds until it is gone. I don't think consumers should have to poll os.poll() until it is not None. So the sleep below should better be a loop with that exit condition.

@@ +104,5 @@
> +            if timeout:
> +                time.sleep(float(timeout))
> +            ret = "killed with %s exit code %d" % (sig, getattr(signal, sig))
> +            try:
> +                os.waitpid(pid, 0)

I think we should file a follow-up bug to have a conform way of handling the wait calls:

http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozprocess/mozprocess/processhandler.py#545

@@ +112,5 @@
> +        print "WARNING: failed os.kill: %s : %s" % (errno, strerror)
> +    if not ret:
> +        return None
> +    else:
> +        return ret

As mentioned earlier on this bug, why do we have to return a string? Each consumer of this method would have to parse it. Why can't we just return the exit code?

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 59

•

10 years ago

I agree with Henrik's comment about the return code being a string.  We should do that and make things better.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 60

•

10 years ago

Comment on attachment 8391630 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8391630 [details] [diff] [review]:
-----------------------------------------------------------------

the tests work great, lets fix the return value to not be a string and we should be ready to complete this.  Thanks for sticking with it!

Attachment #8391630 - Flags: review?(jmaher) → review-

Ryan

Comment 61

•

10 years ago

Attached patch Proposed tests and terminate_process function (obsolete) — Details — Splinter Review

Changes:
- now returns either an int (the exit code) or None
- tests now account for that

Henrik,

When you mention having a conform way to handle wait calls, what do you mean? They way I call the waitpid() function is the same way it's called in the reference you provided.

As for the sleep loop, if it happens that the signal that was used to attempt to kill the process did not work, the loop would run infinitely because it would not have been able to kill the process. I agree that it would be better as a loop with a solid exit condition as you mentioned, but I don't see a way around making an infinite loop. Please let me know what you think.

Thanks for your input!

Attachment #8391630 - Attachment is obsolete: true

Attachment #8394584 - Flags: review?(jmaher)

Flags: needinfo?(hskupin)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 62

•

10 years ago

Comment on attachment 8394584 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8394584 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for putting this together!  This patch looks good, lets see what :whimboo says.

Attachment #8394584 - Flags: review?(jmaher) → review+

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 63

•

10 years ago

Comment on attachment 8394584 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8394584 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozprocess/mozprocess/pid.py
@@ +104,5 @@
> +            if timeout:
> +                time.sleep(float(timeout))
> +            ret = getattr(signal, sig)
> +            try:
> +                os.waitpid(pid, 0)

What you want here is to store the state which is returned by waitpid. That we can assume as the final exit code, and we should return. Please check the other code I have mentioned how to interpret the status.

As a final solution we most likely want to have a shared function for this kind of waitpid() call, so we don't have duplicated code. But we could do that in a follow-up bug.

Henrik Skupin [:whimboo][⌚️UTC+1]

Updated

•

10 years ago

Flags: needinfo?(hskupin)

Ryan

Comment 64

•

10 years ago

Attached patch Proposed tests and terminate_process function (obsolete) — Details — Splinter Review

Changes:
- changed the return value from a supposed exit code (found from calling getattr()) to a confirmed exit code (found from os.waitpid())

I ran the tests locally and they all passed. 

Henrik,

I tried my best to replicate what you showed me and adapt it to this function. Please let me know if there's more to clean/finish up. 

Just to make sure we're on the right page: once the bug is declared "resolved" I should file a follow up bug for the test_cannot_kill() test and put the bug number and description as a comment in the file and then submit that as a patch, correct?

Attachment #8394584 - Attachment is obsolete: true

Attachment #8394760 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 65

•

10 years ago

Comment on attachment 8394760 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8394760 [details] [diff] [review]:
-----------------------------------------------------------------

I am going to pass the review to :whimboo, I am fine with this patch, but my expertise is not in process handling.

Attachment #8394760 - Flags: review?(jmaher) → review?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 66

•

10 years ago

Comment on attachment 8394760 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8394760 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozprocess/mozprocess/pid.py
@@ +103,5 @@
> +            if timeout:
> +                time.sleep(float(timeout))
> +            try:
> +                exitCode = os.waitpid(pid, 0)[1]
> +                return exitCode

Please check the documentation in-front of the code I have referenced. The exit code is in the high 16bit part, so you also have to do the transformation.

Best is if you update your test to check that e.g. SIGTERM causes terminate_process() to return the correct exit code. Right now you don't get this.

Attachment #8394760 - Flags: review?(hskupin) → review-

Ryan

Comment 67

•

10 years ago

Attached patch Proposed tests and terminate_process function (obsolete) — Details — Splinter Review

Changes:
-returns the high bit from the exit code 

Henrik,

The only change I implemented was to fix the return value from the full exitCode to the high bit of the exit code if it's greater than 255.

Attachment #8394760 - Attachment is obsolete: true

Attachment #8396920 - Flags: review?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 68

•

10 years ago

Comment on attachment 8396920 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8396920 [details] [diff] [review]:
-----------------------------------------------------------------

I know that we want to get this patch landed soon. Also I don't want to be the person to hold off landing it, but I still have some little concerns about the current API implementation. I would be ok, to get all of them addressed in a follow-up bug, but for that please those needed bugs and mark them as dependent on this one. At least please address the first and third inline comment and update the patch. With that you have my r+.

::: mozprocess/tests/test_mozprocess_kill_process.py
@@ +18,5 @@
> +
> +    @unittest.skip("We must first make an unkillable process")
> +    def test_cannot_kill(self):
> +        """fail to kill a process"""
> +        # bug: make and try to kill an unkillable process

Can you please create the bug, so we can have a valid reference here? Thanks.

@@ +26,5 @@
> +        """passing in None for a pid
> +            i.e. killing a nonexistent process"""
> +        ret = pid.terminate_process(None)
> +        should_return = None
> +        self.assertEqual(ret, should_return)

I think it would be better to not return None in case of a non-existent process but raise an exception. None would mean that the process is still running, so it can cause wrong assumptions.

@@ +30,5 @@
> +        self.assertEqual(ret, should_return)
> +
> +    def test_none_timeout(self):
> +        """passing in None for a timeout"""
> +        p = processhandler.ProcessHandler([self.python, self.proclaunch, "process_simple_tokill.ini"], cwd=(os.path.dirname(os.path.abspath(__file__))))

Mind doing wrapping here? It's good practice to try to keep the 80 char limit for a line.

@@ +33,5 @@
> +        """passing in None for a timeout"""
> +        p = processhandler.ProcessHandler([self.python, self.proclaunch, "process_simple_tokill.ini"], cwd=(os.path.dirname(os.path.abspath(__file__))))
> +        p.run()
> +        ret = pid.terminate_process(p.proc.pid, None)
> +        self.assertTrue(isinstance(ret, (int,long)))

I think we should enhance the tests and check for the correct exit code. Means if we kill with e.g. sigkill it should be equal sigkill (-9). But I'm happy if we do this in a follow-up bug too.

Attachment #8396920 - Flags: review?(hskupin) → review+

Henrik Skupin [:whimboo][⌚️UTC+1]

Updated

•

10 years ago

Whiteboard: [good first bug] → [mentor=jmaher][lang=py][good first bug]

Ryan

Comment 69

•

10 years ago

Attached patch Proposed tests and terminate_process function — Details — Splinter Review

Changes:
-registered bug #988716
-terminate_process() now raises an exception upon None value for pid
-test cases reflect the newly raised exception
-wrapped lines to be <=80 characters

If there are any problems let me know. Thanks for helping me through this process!

Attachment #8396920 - Attachment is obsolete: true

Attachment #8397602 - Flags: review?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+1]

Updated

•

10 years ago

Blocks: 988716

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 70

•

10 years ago

Comment on attachment 8397602 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8397602 [details] [diff] [review]:
-----------------------------------------------------------------

Ryan, I hope you are not disheartened when I bring up some more stuff. But I think that those are crucial, and should be fixed. You did great work so far, and I think you can still learn a lot from the remaining things to do on this bug. I can only tell from myself when I was working on a couple of mozprocess issues in the last weeks, that all that process handling is kinda complex.

So if some of the inline comments are not clear enough for you, please let me know. I'm totally happy to assist you here.

@Joel, shall we change the mentor flag?

::: mozprocess/mozprocess/pid.py
@@ +88,5 @@
> +    Args:
> +        pid: integer process id of the process to terminate
> +    """
> +	 if not pid:
> +	 	 raise Exception("No process id provided")

This will help you to react accordingly to a pid set to None, but it will not be able to detect if the pid as specified is valid process id. Here the exception raised os.kill() would be important.

@@ +96,5 @@
> +        signals = ('SIGTERM', 'SIGKILL', 'SIGABRT')
> +    try:
> +        for sig in signals:
> +            if not [p for p in ps(psarg) if pid == int(p['PID'])]:
> +                break

If we do not find the given pid can we assume the pid is not correct, and should raise our own e.g. ProcessNotKilledError exception? That way we would not return None, and the caller could correctly handle the failure, and wouldn't assume that the process is still running.

@@ +100,5 @@
> +                break
> +            try:
> +                os.kill(pid, getattr(signal, sig))
> +            except OSError:
> +                break

I think here is a logical misbehavior. We should not break the loop as long as we haven't called kill() for each available signal, e.g. SIGTERM could fail but SIGNKILL succeed. Right now we would break the loop but not continue with the next signal. At the very end we might also raise a ProcessNotKilledError exception.

Sorry that I haven't seen this before.

@@ +104,5 @@
> +                break
> +            if timeout:
> +                time.sleep(float(timeout))
> +            try:
> +                exitCode = os.waitpid(pid, 0)[1]

While checking os.waitpid() I wonder if the call to sleep above is actually necessary. Wouldn't waitpid() wait anyway? With the extra timeout we only loose valuable time, e.g. if you specify 5s we would wait 5+x seconds while without it only x seconds.

@@ +109,5 @@
> +                if exitCode > 255:
> +                    return exitCode >> 8
> +                return -exitCode
> +            except OSError:
> +                break

We might not want to silently drop this exception. We should at least log what went wrong.

@@ +111,5 @@
> +                return -exitCode
> +            except OSError:
> +                break
> +    except OSError, (errno, strerror):
> +        print "WARNING: failed os.kill: %s : %s" % (errno, strerror)

This is a no-op given that such an exception will never reach this level. As of now you catch all of them inside the loop.

Also with our own exception type we wouldn't need it given that the caller should handle it accordingly.

Also in case of success we should never reach this line, but only in case of a failure. And then an exception should be raise. A situation would be (as mentioned above) when we tried each signal without success and finished the loop.

::: mozprocess/tests/test_mozprocess_kill_process.py
@@ +30,5 @@
> +    def test_none_timeout(self):
> +        """passing in None for a timeout"""
> +        p = processhandler.ProcessHandler([self.python, self.proclaunch, 
> +						 "process_simple_tokill.ini"], 
> +						 cwd=(os.path.dirname(os.path.abspath(__file__))))

nit: the indentation is off here and you have trailing spaces in two of the above lines.

Attachment #8397602 - Flags: review?(hskupin) → review-

Ryan

Comment 71

•

10 years ago

Henrik,

I see this as a process to improve my coding and an experience to write production level code. I plan on seeing this bug to the end! Thank you for sticking with me through this long process. I'll get to work on this and try to fix it to the best of my ability.

Ryan

Comment 72

•

10 years ago

Henrik,

I've been working on this for a little while and have a couple questions:
- if we remove the lines which use the 'timeout' variable then it becomes an unnecessary variable, should it be removed from the function? This is an important because if it should be removed, the tests need to be refactored. 
- If we decide to raise our own exception, a class needs to be made for that exception. Where should that file go? Also, is there a Mozilla standard for creating error classes? If so, could you please provide a reference?
- In the try block with the waitpid() call, when an error is caught should a different error be raised than ProcessNotKilledError? I would guess it would be something like ProcessAlreadyKilledError.

I'll keep you posted if I have any more questions.

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 73

•

10 years ago

(In reply to Ryan from comment #72)
> - if we remove the lines which use the 'timeout' variable then it becomes an
> unnecessary variable, should it be removed from the function? This is an
> important because if it should be removed, the tests need to be refactored. 

That's correct. As pinted out for the process manager module we also don't have a timeout in the wait() method. I would propose we keep the behavior, and file a follow-up bug to introduce a timeout for both in the future. The other options would be to use a while loop for `pid.poll() == None` and do the check for timeout in there, or use a signal handler which raises an exception after the timeout occurred. The latter might be good if we really have an non-killable process.

> - If we decide to raise our own exception, a class needs to be made for that
> exception. Where should that file go? Also, is there a Mozilla standard for
> creating error classes? If so, could you please provide a reference?

I would suggest to create an errors.py file under mozprocess and export it by default in __init__.py. You can have a look at mozrunner, where I already did that.

> - In the try block with the waitpid() call, when an error is caught should a
> different error be raised than ProcessNotKilledError? I would guess it would
> be something like ProcessAlreadyKilledError.

The method you are working on is targeted to kill a process. At the beginning you are checking that the pid exists. So whatever we do in killing it and we don't succeed, we should have a ProcessNotKilledError.

Status: NEW → ASSIGNED

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 74

•

10 years ago

Hi Ryan, it's been a while since the last update on this bug. Maybe you are covered with other work. Would you mind to give us an update for this bug? Thanks a lot.

Ryan

Comment 75

•

10 years ago

Hi Henrik, I'm sorry it's been so long, but I haven't worked on this bug as I've been very busy with work for my internship. I don't think I'll have time to devote to finishing this bug. Sorry for leaving this unfinished.

Nobody; OK to take it and work on it

Updated

•

10 years ago

Mentor: jmaher

Whiteboard: [mentor=jmaher][lang=py][good first bug] → [lang=py][good first bug]

Daniel Y Lee

Assignee

Comment 76

•

10 years ago

Hi,

Seems like this bug needs just a little bit more from what Ryan has built on.

Is this still a relevant bug? If so, I am interested in completing what Ryan has [almost] finished.

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 77

•

10 years ago

Hi Daniel. Yes, this bug is still relevant and it would be great to get it finished. You are welcome to help us here. If you need any kind of information don't hesitate to ask.

One thing to note is that the canonical repository has been moved from github into mozilla-central. So the filenames in the patch will need an update. Please see the URL field for all the details in handling the code and reviews.

Assignee: ryan.da.baker → devty1023

URL: https://wiki.mozilla.org/Auto-tools/P...

Mentor: hskupin

Whiteboard: [lang=py][good first bug] → [lang=py]

Daniel Y Lee

Assignee

Comment 78

•

10 years ago

Hi Henrik,

Thanmks for the pointers! Will start working on the ticket.

Daniel

Daniel Y Lee

Assignee

Comment 79

•

10 years ago

Hi Henrik,

Sorry! It took a while for me to read through this thread to digest the state of this ticket.

I have an unresolved issue I like to discuss with you before I submit a patch for a reivew. 

From my understanding, we are looping through a set of signals, each with increasing "strength" to make sure that we try all means possible to terminate the process. Say the process we are trying to kill handles SIGTERM differently. Wouldn't using os.waitpid() hang? If so, what would be the point of trying out different sets of signal when the loop proceeds only if SIGTERM worked?

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 80

•

10 years ago

That's an interesting question. Maybe you could simulate that by not sending SIGTERM and check how the code would behave then? If that call would hang, we indeed would have to change it. Maybe the solution as described on that stackoverflow page would be better:

http://stackoverflow.com/questions/13399734/how-to-find-out-when-subprocess-has-terminated-after-using-os-kill

Daniel Y Lee

Assignee

Comment 81

•

10 years ago

Hi Henrik,

Some initial investigation.

1) os.waitpid(pid, option) only works if the pid we are waiting for is a child process of the system thats calling it - that is, we cannot wait for a termination of external process. I find this extremely restrictive - need your thoughts on this.

====================
shell> sleep 1000 &
[1] 16170
shell> python
...
>>>import os
>>> os.waitpid(16170,0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 10] No child processes
====================

2) If the process we are trying to kill is one of our child process. waitpid() will hang if we do not send any signal to kill it.
====================
>>> import os
>>> import subprocess
>>> p = subprocess.Popen(["sleep", "1000"])
>>> p.pid
16192
>>> os.waitpid(16192, 0)
...
====================

3) what we can do instead is to check to see if the process has responded to SIGTERM (i.e. it died) by sending in signal 0 with os.kill() after some timeout. If the process did not die by SIGTERM, we will try a stronger signal (i.e. SIGKILL)

I will implement the above approach once I get your comments.

Our of curioisity - why do we try to send SIGABRT after SIGKILL for OSX? Is it possible for OSX application to ignore SIGKILL? Possible reference I found here: (https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages_iPhoneOS/man3/signal.3.html)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 82

•

10 years ago

(In reply to Daniel Y Lee from comment #81)
> 1) os.waitpid(pid, option) only works if the pid we are waiting for is a
> child process of the system thats calling it - that is, we cannot wait for a
> termination of external process. I find this extremely restrictive - need
> your thoughts on this.
> 
> ====================
> shell> sleep 1000 &
> [1] 16170
> shell> python
> ...
> >>>import os
> >>> os.waitpid(16170,0)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OSError: [Errno 10] No child processes
> ====================

Ok, so that's how it has been reported on the linked document. That sounds fair, and what the current version of the patch has been implemented.

> 2) If the process we are trying to kill is one of our child process.
> waitpid() will hang if we do not send any signal to kill it.
> ====================
> >>> import os
> >>> import subprocess
> >>> p = subprocess.Popen(["sleep", "1000"])
> >>> p.pid
> 16192
> >>> os.waitpid(16192, 0)
> ...
> ====================

Ok, so that's indeed a problem. Looks like we can't use waitpid then.

> 3) what we can do instead is to check to see if the process has responded to
> SIGTERM (i.e. it died) by sending in signal 0 with os.kill() after some
> timeout. If the process did not die by SIGTERM, we will try a stronger
> signal (i.e. SIGKILL)
> 
> I will implement the above approach once I get your comments.

That sounds perfect! Good investigation Daniel! I fully support this idea.

> Our of curioisity - why do we try to send SIGABRT after SIGKILL for OSX? Is
> it possible for OSX application to ignore SIGKILL? Possible reference I
> found here:
> (https://developer.apple.com/library/ios/documentation/System/Conceptual/
> ManPages_iPhoneOS/man3/signal.3.html)

Good question. As I also read on various other pages SIGABRT is mostly used inside an application or by a library in use when calling the abort() function. So external applications should not send such a signals to a target process. SIGKILL should indeed be totally enough.

Daniel Y Lee

Assignee

Comment 83

•

10 years ago

One last thing - I am observing something quite tricky while I'm implementing this.

Say I'm given a pid to kill. We planned to do something like

>>>os.kill(pid, signal.SIGTERM)

to kill it. The problem is, when a process sends a SIGTERM/SIGKILL signal to a child process that is NOT is own, the process we've "killed" becomes a zombie/defunct process. It stays like this until the parent who spawned the process is also killed OR the parent finishes waiting on the child process to terminate.

This brings about 2 issues:

1) Is this okay? Zombie processes might be everywhere?

2) Having zombie process is troublesome since we need to come up with a new way determine if the process is really dead. The talos way of calling ps and seeing whether pid exists is no longer valid (the process is effectively dead, but it is still in the process table). My way of checking whether a process is dead by sending signal 0 also does not work (somehow, a zombie process can still respond to any signal we send to it.)

So I'm going to have to come up with a new way of checking whether a process is dead/is zombie. Tricky, but I can investigate.

Please share what how think about this issue.

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 84

•

10 years ago

Good question, where I don't really have a straight answer for. I never had to handle processes spawned by other applications. So maybe you should play a bit with the SIGCHLD handler, and its different options? Maybe we can get the child process to completely quit, with that way. See http://www.microhowto.info/howto/reap_zombie_processes_using_a_sigchld_handler.html for the C variant. I wonder how the `kill` command line tool works, given that it does it correctly.

Which specific external tool were you trying to kill? Usually the parent process should read the child state, and react on ended child processes. Maybe the above mentioned SIGCHILD handling might work? Otherwise how long does it take until this zombie goes away for your case?

Daniel Y Lee

Assignee

Comment 85

•

10 years ago

I was working on writing a suite of testcases for my code, using processHandler module to spawn a process. It was a bit puzzling since I would not have considered process spawned via processHandler to be an "external application's process".

For clarity, I've implemented two test caes, both of which fail!

    def test_terminate_process(self):
        p = processhandler.ProcessHandler([self.python, self.proclaunch,
                        "process_simple_tokill.ini"],
                        cwd=(os.path.dirname(os.path.abspath(__file__))))
        p.run()
        ret = pid.terminate_process(p.proc.pid,10)
        self.assertEqual(ret, p.proc.pid)

    def test_terminate_process_2(self):
        p = subprocess.Popen(["sleep", "1000"])
        ret = pid.terminate_process(p.pid,10)
        self.assertEqual(ret, p.pid)

Here is a typical output:

terminate_process called with: (13400,10)
trying out sig 15 for process 13400
## result of ps
[{'COMMAND': '[python] <defunct>', 'TTY': 'pts/3', 'PID': '13400', 'STAT': 'Z', 'TIME': '0:00'}]
sig SIGTERM failed to kill process 13400

trying out sig 9 for process 13400
## result of ps
[{'COMMAND': '[python] <defunct>', 'TTY': 'pts/3', 'PID': '13400', 'STAT': 'Z', 'TIME': '0:00'}]
sig SIGKILL failed to kill process 13400

Notice the "<defunct>" keyword. Also 'STAT' notes a state of 'Z' - consultation with man pages reveal that 'Z' stands for 'Zombie'.

If I'm understanding SIG_CHLD correclty, it is signal sent from the child process its parent when it exits. Not sure if it will have any meaningful impact for the child process when it receives (unless the child process has spawned a process and is waiting for it. Then what we are doing is introducing a bug?).

My next bet is to check whether (a) process exists in ps and (2)it is not a zombie state as the predicate to my is_running(pid) function. I'm not 100% confident this is what is desired (again, I don't guarantee that zombie process will actually be cleaned up once the parent gets its SIGCHLD signal). I'm leaning towards the opinion that it will be acceptable.

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 86

•

10 years ago

Would you mind delaying the `ps` call or running it in a loop for a while? I wonder if the process gets cleaned up by init a short while after. That's how it should usually work.

Daniel Y Lee

Assignee

Comment 87

•

10 years ago

The current implementation does something like that. Here's a snippet:

        for sig in signals:
            try:
                print "trying out sig %s for process %s" % (getattr(signal,sig), str(pid))
                os.kill(int(pid), getattr(signal, sig))
                time.sleep(float(timeout)) ## LOOK HERE
                if not is_running(pid):
                    return pid
                else:
                    print [p for p in ps(psarg) if pid == int(p['PID'])]
                    print "sig %s failed to kill process %s" % (sig, str(pid))

            except OSError as e:
                return None
            except TypeError as e:
                timeout=3.0
                time.sleep(float(timeout))
                if not is_running(pid):
                    return pid

I've tried waiting upto 30 seconds, still no cigar!

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 88

•

10 years ago

So what's the parent process, of the child you kill here? Looks like it's not the init process.

Would you mind to create a minimized script which perfectly illustrates that? I could also have a look on my box.

Daniel Y Lee

Assignee

Comment 89

•

10 years ago

Attached file test_mozprocess_terminate_process.py — Details

Daniel Y Lee

Assignee

Comment 90

•

10 years ago

Attached file pid.py — Details

Daniel Y Lee

Assignee

Comment 91

•

10 years ago

Basically what I'm doing is to craft and run and test the functionality. 

I've rooted out the important snippets on the comments above - so I'm not 100% sure if this suits your need.

Daniel Y Lee

Assignee

Updated

•

10 years ago

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 92

•

10 years ago

Daniel, I'm so sorry. That needinfo request totally fall through in my inbox. I have seen it right now. So I hope we can still continue in fixing it?

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+1]

Updated

•

10 years ago

Attachment #8486514 - Attachment mime type: text/x-python → text/plain

Henrik Skupin [:whimboo][⌚️UTC+1]

Updated

•

10 years ago

Attachment #8486515 - Attachment mime type: text/x-python → text/plain

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 93

•

10 years ago

So I tested the snippet you gave above after modifying the main routine a bit:

if __name__ == '__main__':
    for i in sys.argv[1:]:
        for pid in sorted(get_pids(i)):
            terminate_process(pid)

        print 'Remaining pids: %s' % get_pids(i)


When I run the script now, I'm always able kill processes, and terminate_process returns without a failure after the 3s timeout. A check if pids still exist for the given process name returns an empty array. Here an example for two running Firefox instances on my Ubuntu box:

$ python pid.py firefox
terminate_process called with: (3849,3)
trying out sig 15 for process 3849
terminate_process called with: (16328,3)
trying out sig 15 for process 16328
Remaining pids: []

Both pids do no longer exist, and I also don't see a zombie process. Am I doing something wrong to not see your problem? Please let me know. Otherwise we should try to finish the patch and add the test cases to the same patch as the code changes are contained in. Thanks.

Flags: needinfo?(devty1023)

Daniel Y Lee

Assignee

Comment 94

•

10 years ago

I'll get on to it :)

Flags: needinfo?(devty1023)

Daniel Y Lee

Assignee

Comment 95

•

10 years ago

Okay! Took me a while to get myself into this ticket again.

I want to ask you to try to create a process using ProcessHandler and kill it using terminate_process.

Here is the one that I'm having trouble with:

class TestKillProcess(proctest.ProcTest):
    """Class to test terminating a process given pid"""
    def test_terminate_process(self):
        p = processhandler.ProcessHandler([self.python, self.proclaunch,
                                            "process_normal_finish_python.ini"],
                                            cwd=(os.path.dirname(os.path.abspath(__file__))))
        p.run()
        ret = pid.terminate_process(p.proc.pid,10)
        self.assertEqual(ret, p.proc.pid)

Or even simpler, using Pythhon's subprocess module.

    def test_terminate_process_2(self):
        p = subprocess.Popen(["sleep", "1000"])
        ret = pid.terminate_process(p.pid,10)
        self.assertEqual(ret, p.pid)


Both emits the following output

terminate_process called with: (10395,10)
trying out sig 15 for process 10395
[{'COMMAND': '[python] <defunct>', 'TTY': 'pts/7', 'PID': '10395', 'STAT': 'Z', 'TIME': '0:00'}]
sig SIGTERM failed to kill process 10395
trying out sig 9 for process 10395
[{'COMMAND': '[python] <defunct>', 'TTY': 'pts/7', 'PID': '10395', 'STAT': 'Z', 'TIME': '0:00'}]
sig SIGKILL failed to kill process 10395


So SIGTERM did kill the process - unfortunately it's a zombie process. 

Good chance that this has to do something with how ProcessHandler is spawning the processes I am trying to kill - since ProcessHandler object has yet to claim the process that is now effectively dead, we are left with zombie classes?

In all honesty, I have no idea why this I am getting zombie processes - I looks to me that the "parent process" (ProcessHandler or subprocess.popen) is the same one that is calling terminate_process. 

Would like to see if you are able to replicate the issue I am seeing

Flags: needinfo?(hskupin)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 96

•

9 years ago

whimboo, can you address the question here?  It is a few months overdue!

Andrew Halberstadt [:ahal]

Comment 97

•

9 years ago

I'm really sorry to jump in and say this after so much effort has gone into this bug.. but why can't we just use something else to kill the process?

In my mind mozprocess should only be concerned with the processes (and child processes) that it spawns itself. I'm afraid that adding API's for manipulating arbitrary 3rd party processes is opening a can of worms. There are already tons of libraries for manipulating processes, including psutil which is already checked into the tree. Here is an example of killing a process using psutil:
https://dxr.mozilla.org/mozilla-central/source/python/psutil/examples/killall.py#25

William Lachance (:wlach)

Comment 98

•

9 years ago

I agree with Andrew. I filed bug 1139487 to remove the mozprocess.pid code altogether. psutil works great, there is no good reason for us to try reinventing it. If there is a need for something specific in talos and it isn't possible to bring psutil there, we should just do a local solution until we can. I'm going to resolve this as wontfix; apologies to everyone involved for the wasted time/energy.

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

Flags: needinfo?(hskupin)

Resolution: --- → WONTFIX

proposed solution 10 years ago Ryan 970 bytes, patch	jmaher : feedback-	Details \| Diff \| Splinter Review
proposed solution 10 years ago Ryan 1.03 KB, patch	jmaher : review-	Details \| Diff \| Splinter Review
Proposed Solution 10 years ago Ryan 1.06 KB, patch	jmaher : review-	Details \| Diff \| Splinter Review
Proposed Solution 10 years ago Ryan 1.14 KB, patch	jmaher : review-	Details \| Diff \| Splinter Review
Proposed Solution 10 years ago Ryan 1.01 KB, patch	jmaher : review+ whimboo : feedback-	Details \| Diff \| Splinter Review
first test case attempt 10 years ago Ryan 841 bytes, patch		Details \| Diff \| Splinter Review
second test case attempt 10 years ago Ryan 1014 bytes, patch	jmaher : review+	Details \| Diff \| Splinter Review
Proposed test cases 10 years ago Ryan 1.14 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review
Proposed tests and terminate_process function 10 years ago Ryan 1.01 KB, patch	jmaher : review-	Details \| Diff \| Splinter Review
Proposed solution and tests 10 years ago Ryan 2.35 KB, patch	jmaher : review-	Details \| Diff \| Splinter Review
Proposed tests and terminate_process function 10 years ago Ryan 2.78 KB, patch	jmaher : review-	Details \| Diff \| Splinter Review
Proposed tests and terminate_process function 10 years ago Ryan 3.54 KB, patch	jmaher : review-	Details \| Diff \| Splinter Review
Proposed tests and terminate_process function 10 years ago Ryan 3.31 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review
Proposed tests and terminate_process function 10 years ago Ryan 3.25 KB, patch	whimboo : review-	Details \| Diff \| Splinter Review
Proposed tests and terminate_process function 10 years ago Ryan 3.33 KB, patch	whimboo : review+	Details \| Diff \| Splinter Review
Proposed tests and terminate_process function 10 years ago Ryan 3.36 KB, patch	whimboo : review-	Details \| Diff \| Splinter Review
test_mozprocess_terminate_process.py 10 years ago Daniel Y Lee 1.62 KB, text/plain		Details
pid.py 10 years ago Daniel Y Lee 3.82 KB, text/plain		Details