Closed Bug 925408 Opened 11 years ago Closed 9 years ago

[mozprocess] mozprocess should have the ability to kill a specific process

Categories

(Testing :: Mozbase, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jmaher, Assigned: devty1023, Mentored)

References

()

Details

(Whiteboard: [lang=py])

Attachments

(3 files, 15 obsolete files)

3.36 KB, patch
whimboo
: review-
Details | Diff | Splinter Review
1.62 KB, text/plain
Details
3.82 KB, text/plain
Details
in porting talos I am stuck on the ability to kill a specific process.  While this might not be the normal case, it is the case when the browser is hung, or a secondary process is launched unrelated to the core mozprocess.  

I found the pid.py file which has utilities for getting a pid based on process name.  killing a process based on pid or name would fit right into that.
I'm +1 for adding this functionality to pid.py . While there has been historically some sentiment as Joel echos that we shouldn't have to do this, but in practice we do do this, so I'd rather have this code in mozbase in one place, and mozprocess.pid seems as good a place as any.
Blocks: 923770
Summary: mozprocess should have the ability to kill a specific process → [mozprocess] mozprocess should have the ability to kill a specific process
Whiteboard: [good first bug]
Hello,

I've been looking at this for a little while and I seem to be snagged on a little detail. I'm not sure how to check if the process given its pid is currently running. Do you think you could point me in the right direction?

Thanks!
Ryan
the above references for linux and osx reference is_running:
http://hg.mozilla.org/build/talos/file/tip/talos/utils.py#l118

this in turn references mozprocess:
https://github.com/mozilla/mozbase/blob/master/mozprocess/mozprocess/pid.py#L21

does that help out?
Unfortunately not :/
I'm having a tough time understanding what ps() is doing and how to get a process from it
here is an example of how to use mozprocess.pid.ps():
from mozprocess import pid as mozpid
pids = mozpid.ps('axwww')
for pid in pids:
    if pid['COMMAND'].find('firefox') > 0:
        print pid['PID']


While this is a simple example, it should help you figure out how to use mozpid.  Basically we collect the output from 'ps <args>' and put it in an object:
{'COMMAND': '/usr/lib/firefox-trunk/firefox-trunk', 'TTY': '?', 'PID': '8462', 'STAT': 'Sl', 'TIME': '1:17'}

now we can find a matching process and get the pid without too much trouble.
Okay, thanks. I think I now see what I should be doing, I'll attach what I think it should be.
Attached patch proposed solution (obsolete) — Splinter Review
Please let me know if I'm on the right track, or, if not, let me know where I went wrong or should be going! Thanks a lot!
Attachment #8355728 - Flags: feedback?(jmaher)
Attachment #8355728 - Attachment is patch: true
Comment on attachment 8355728 [details] [diff] [review]
proposed solution

Review of attachment 8355728 [details] [diff] [review]:
-----------------------------------------------------------------

This is a good start, we need to support osx, linux, and windows (already done with wpk.kill_pid(pid))

You should be able to determine the os via:
https://github.com/mozilla/mozbase/blob/master/mozprocess/mozprocess/pid.py#L14

that could sort out the psargs and the signals to iterate through.

Thanks for working through this, keep up the good work!

::: mozprocess/mozprocess/pid.py
@@ +86,5 @@
> +       Args:
> +               pid: integer process id of the process to terminate
> +       """
> +       ret = ''
> +    pids = ps()

I suspect this won't get us the pids, we probably need the args='axwww' for linux and '-Acj' for osx.

here is how we do it for osx:
http://hg.mozilla.org/build/talos/file/tip/talos/ffprocess_mac.py#l22

@@ +97,5 @@
> +                    ret = "killed with %s" % sig
> +    except OSError, (errno, strerror):
> +        print "WARNING: failed os.kill: %s : %s" % (errno, strerror)
> +    return ret
> +

minor detail: we do 4 space indentation in the files, no tabs just spaces
Attachment #8355728 - Flags: feedback?(jmaher) → feedback-
Thanks! I understand why I would need to check the OS for the signals I should be iterating through, but since the function ps() has a default argument of psarg would I need to pass it anything? It looks to me like the correct argument is passed to the function by default and in following the DRY principle then terminate_process function shouldn't check that again. Thanks for your help!
we would need to assure that the psargs defined at the top of pid.py will be used.  We could defined the signals up there as well.
Attached patch proposed solution (obsolete) — Splinter Review
Okay, thanks for clearing that up. I just fixed the things you pointed out, please let me know if there's anything you see that needs to be looked at again with this one. 

After the patch is finished should I add some tests to mozprocess/tests/?
Attachment #8355728 - Attachment is obsolete: true
Attachment #8356182 - Flags: review?(jmaher)
Comment on attachment 8356182 [details] [diff] [review]
proposed solution

Review of attachment 8356182 [details] [diff] [review]:
-----------------------------------------------------------------

really close, I think this is all that is left for making this solid.  After this, we really need some tests as you mentioned.

::: mozprocess/mozprocess/pid.py
@@ +80,4 @@
>      else:
>          return [pid for pid,_ in running_processes(name)]
>  
> +def terminate_process(pid, timeout):

please make timeout optional, with a default of 2 seconds.

@@ +88,5 @@
> +    """
> +    ret = ''
> +    pids = ps(psarg)
> +    if mozinfo.isMac:
> +        signals = ('SIGTERM', 'SIGKILL') 

please remove the trailing whitespace at the end of all lines, you can see them all in bugzilla review as pink characters :)

@@ +97,5 @@
> +            for p in pids:
> +                if p['PID'] == pid:
> +                    os.kill(pid, getattr(signal, sig))
> +                    time.sleep(timeout)
> +                    ret = "killed with %s" % sig

why do we iterate through the pids?  Could we just verify the pid is in the list and if so, then try to kill it with each signal?  That should reduce the double for loop.
Attachment #8356182 - Flags: review?(jmaher) → review-
Attached patch Proposed Solution (obsolete) — Splinter Review
Thanks for the help! Sorry it's taken me a while, I've been a little busy. How does this look?
Attachment #8356182 - Attachment is obsolete: true
Attachment #8358629 - Flags: review?(jmaher)
Comment on attachment 8358629 [details] [diff] [review]
Proposed Solution

Review of attachment 8358629 [details] [diff] [review]:
-----------------------------------------------------------------

this is really close, your code is clean, just one small logic flaw.

::: mozprocess/mozprocess/pid.py
@@ +98,5 @@
> +                for sig in signals:
> +                    os.kill(pid, getattr(signal, sig))
> +                    time.sleep(timeout)
> +                    ret = "killed with %s" % sig
> +                    return ret

right now we loop through the signals, we need to check if the process is running prior to sending each signal.  Also notice the return is outside the loop.

for sig in signals:
    if not isrunning(pid):
        break

    os.kill(pid, getattr(signal, sig))
    time.sleep(timeout)
    ret = "killed with %s" % sig

return ret
Attachment #8358629 - Flags: review?(jmaher) → review-
Thanks for the review
So my problem right now is that I'm not sure how to check if a process is running given its pid. I know I could use the function running_processes() to get the list of processes that are running or even slim it down using get_pids() to get the pids of the running processes. However, I don't know the name of the process I am trying to kill and I don't know how to get the name. Would my local list "pids = ps(psarg)" not necessarily be the running processes?
good question, if you look at code that does this in the talos framework, we check if a process is running via:
http://hg.mozilla.org/build/talos/file/tip/talos/utils.py#l116

Given what we have, in terminate_process, we should be able to do that.
Ah, okay. So maybe I could just do that from the outset when I initialize the pids list by saying:
pids = [p for p in ps(psarg) if pid == int(p['PID'])]. Then check if the list is empty, if not I know that the list contains the process I want to terminate and wouldn't have to check later if the process is running. Do you think that would work?
Then this raises the question: are all PIDs unique for each process that would be running? (this is just to clarify for testing purposes later on)
Also, would you mind going ahead and marking me in the 'assigned to' field?
Assignee: nobody → ryan.da.baker
as far as I know all pids will be unique on all systems we need to run this code on.  With that said, we just need to verify before sending os.kill() that the process is in the *current* process list.

When this gets in, I can remove a lot of mostly duplicated code, exciting times!
Ah, I see what you mean. So instead of just initializing the "pids" list once, I should reinitialize it every time I try to os.kill it with a certain process to see if it has been killed yet. I'll upload what I have.
Attached patch Proposed Solution (obsolete) — Splinter Review
I think this seems right. Please let me know!
Attachment #8358629 - Attachment is obsolete: true
Attachment #8359253 - Flags: review?(jmaher)
Comment on attachment 8359253 [details] [diff] [review]
Proposed Solution

Review of attachment 8359253 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozprocess/mozprocess/pid.py
@@ +99,5 @@
> +                    break
> +                os.kill(pid, getattr(signal, sig))
> +                time.sleep(timeout)
> +                ret = "killed with %s" % sig
> +                pids = [p for p in ps(psarg) if pid == int(p['PID'])]

I still don't like this.  We are given a single pid and want to kill it.

we should:
* verify pid is running
* if running, try to kill with signals
** os.kill/wait
** get list of pids and if pid is not in list break

going this route, we have a single for loop, and it is simpler.

really the check could be:
if not [p for p in ps(psarg) if pid == int(p['PID'])]:
    break
Attachment #8359253 - Flags: review?(jmaher) → review-
Attached patch Proposed Solution (obsolete) — Splinter Review
Thanks for the feedback. As I was refactoring to make it correct I didn't give enough thought to how I could make it more efficient. I got rid of unnecessary lines and loops and came down to this process:
- check if the pid is part of a running process
- try to kill it with each signal
-- os kill/wait
-- make sure process given its pid was properly terminated
Attachment #8359253 - Attachment is obsolete: true
Attachment #8359334 - Flags: review?(jmaher)
Comment on attachment 8359334 [details] [diff] [review]
Proposed Solution

Review of attachment 8359334 [details] [diff] [review]:
-----------------------------------------------------------------

great, thanks!  Now for the second half- lets write a couple tests for this.
Attachment #8359334 - Flags: review?(jmaher) → review+
Okay great! So here are some ideas for test cases that I'm going to write, tell me if you see any that are not necessary or can think of any that should be added.
- pid for a non-running process
- a process to kill for each signal (including SIGABRT for Linux)
    |- would this be different for each OS as well? Should that be tested do you think?
- a process that cannot be killed for whatever reason, and instead throws an OSError
- three test which will kill a process correctly but have different values for timeout
  including: default value, 0 value, large value
I would send in a None value for the pid, likewise the timeout.

If we could make a zombie process, then we could probably get to the error condition where the signals do not terminate the process.  I wouldn't go overboard on defining different process conditions and types, especially for the different OS types.

I look forward to your patch with some tests.
In mozmill which is based on mozrunner and mozprocess we actually see zombie processes of Firefox and it's something I'm currently investigating and trying to fix. See bug 959551. Not sure how this would help but what I see is, when Firefox gets restarted a new process with the same pid gets started and another one still lays around until Firefox gets completely closed. Not sure if that would help you for a test, but can't we create a zombie by not using process wait()?
Ryan- following up to see if you have had a chance to work on some of the unittests?
Hi Joel,
Sorry it's taken me a while, school started again and I got stuck and became very busy. I'll attach what I have started but I feel I don't know how to go about doing this (setting up a ghost process, etc.)
Attached patch first test case attempt (obsolete) — Splinter Review
Attachment #8365112 - Flags: review?(jmaher)
Attachment #8365112 - Attachment is patch: true
Attachment #8365112 - Attachment mime type: text/x-python-script → text/plain
Comment on attachment 8365112 [details] [diff] [review]
first test case attempt

>#!/usr/bin/env python
>
>import os
>import time
>import unittest
>import proctest
>from mozprocess import pid
>
>class TestKillProcess(proctest.ProcTest):
>    """ Class to test various processes to kill """
>
>    def test_not_running_kill(self):
>        """try to kill a non running process"""
>        process_id = -1
>        should_return = ""
>        ret = pid.terminate_process(process_id)
>        self.assertEqual(ret, should_return)
this is looking good

>
>    def test_cannot_kill(self):
>        """fail to kill a process"""
>        # make and try to kill an unkillable process
good placeholder

>
>    def test_none_pid(self):
>        """passing in None for a pid"""
>        ret = pid.terminate_process(None)
>        # what should I expect it to return?
can you test this? I assume it will return ''

>
>    def test_none_timeout(self):
>        """passing in None for a timeout"""
>        ret = pid.terminate_process(4, None)
>        
another great start


Not going to mark it as +/-, I think if we can get asserts in here for all tests this is r+.  Now to figure out how to create a process that cannot be killed.
Attachment #8365112 - Flags: review?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #32)
> >    def test_cannot_kill(self):
> >        """fail to kill a process"""
> >        # make and try to kill an unkillable process
> good placeholder

If this is not part of your work on this bug and it will be checked-in that way, it would be better to mark this method as skipped via unitest.skip(), and file a follow-up bug which can be referenced in the skip comment.

For an example how I did this for mozrunner tests:
https://github.com/mozilla/mozbase/blob/master/mozrunner/tests/test_start.py#L33
Attached patch second test case attempt (obsolete) — Splinter Review
Here is the progress I have made so far. As I mentioned, progress is very slow due to school work and other obligations, sorry about that. I replaced a couple placeholders but am unsure of two things:
- in test_cannot_kill I am very unsure of how to make a ghost process
- in test_none_timeout I am unsure what the pid input should be (if anything), any suggestions for this?
Attachment #8365112 - Attachment is obsolete: true
Attachment #8369768 - Flags: review?(jmaher)
Comment on attachment 8369768 [details] [diff] [review]
second test case attempt


>
>    def test_none_timeout(self):
>        """passing in None for a timeout"""
>        ret = pid.terminate_process(4, None) #some arbitrary process
>        should_not_return = ''
>        self.assertNotEqual(ret, should_not_return) 

we should be able to launch a program here, how about we add a support file which is basically a python script:
while 1:
  sleep 1

we could launch that with mozprocess and get the pid, then we have a pid.

as for a zombie process, lets do what :whimboo suggested in comment 33 and use unitest.skip() and file a bug as a work item to write that test case.  This way we can get this functionality landed.
Attachment #8369768 - Flags: review?(jmaher) → review+
Shouldn't we make this method a class method? That way we could call terminate_process even without a pid instance.
Okay I'll do what you suggest, then. In setting up a helper file (the python script) I assume creating a script (e.g. helper.py) in the tests/ directory is sufficient? If so, how do I launch such a process using mozprocess? Is filing for a bug as simple as putting 'pass' in the function body with a comment that it should be filed as a bug?
There is no need for an additional Python module here. Just mark the above mentioned method as a class method. We have a couple of other instances in mozbase already. So please have a look at this link:

https://github.com/mozilla/mozbase/search?q=classmethod&source=c
I think proclaunch:
https://github.com/mozilla/mozbase/blob/master/mozprocess/tests/proclaunch.c

will be good here.  There are examples of using it in the source- I think we are really close.  We can always file a bug and put a comment in for a test case that we need to write but don't have the time, skills, or other pre-requisites for.
Henrik, if I made the method into a class method would we have access to a pid at all? As far as I understand we would only have access to a pid if the class method was for a process class (instead of a ProcTest class, like it is now). 

Joel, I've been looking through the code for examples on launching a process with proclaunch but am still unsure as to how exactly to do that.

While this is a good discussion I feel we've deviated from the main focus of this bug and should file another bug which would be to finish the test cases so we can get this bug resolved and move the discussion of using proctest vs. a classmethod to a more appropriate bug. I'll upload what I have so far and hopefully we can move forward from there.
Attached patch Proposed test cases (obsolete) — Splinter Review
Attachment #8369768 - Attachment is obsolete: true
Attachment #8373483 - Flags: review?(jmaher)
(In reply to Ryan from comment #40)
> Henrik, if I made the method into a class method would we have access to a
> pid at all? As far as I understand we would only have access to a pid if the
> class method was for a process class (instead of a ProcTest class, like it
> is now). 

Ups, sorry. I should have actually taken a full look at the code before mentioning that. I did that now and I have seen that this module only holds methods, that's all. So we are indeed fine and do not have to change something. 

> While this is a good discussion I feel we've deviated from the main focus of
> this bug and should file another bug which would be to finish the test cases
> so we can get this bug resolved and move the discussion of using proctest

We should not land new features without tests. Same applies to bug fixes. If we want to have sane and working code, tests are always required. So I see this crucial as part of this bug.

Also a note. Please try to upload a single patch for all, means the code changes and the test additions. That way it is easier to test and to review.
Comment on attachment 8359334 [details] [diff] [review]
Proposed Solution

Review of attachment 8359334 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the drive-by review but I have some concerns with the patch. Please find my comments inline.

::: mozprocess/mozprocess/pid.py
@@ +89,5 @@
> +    ret = ''
> +    if mozinfo.isMac:
> +        signals = ('SIGTERM', 'SIGKILL')
> +    elif mozinfo.isLinux:
> +        signals = ('SIGTERM', 'SIGKILL', 'SIGABRT')

What about Windows? You will get a signals used before declared failure.

@@ +95,5 @@
> +        for sig in signals:
> +            if not [p for p in ps(psarg) if pid == int(p['PID'])]:
> +                break
> +            os.kill(pid, getattr(signal, sig))
> +            time.sleep(timeout)

You should not sleep here but call wait() if necessary with a timeout. Otherwise you will return too early.

@@ +96,5 @@
> +            if not [p for p in ps(psarg) if pid == int(p['PID'])]:
> +                break
> +            os.kill(pid, getattr(signal, sig))
> +            time.sleep(timeout)
> +            ret = "killed with %s" % sig

Why can't we simply return the real exit code here? I don't think it is helpful to have a string like 'killed with -9'. If kill was not successful None should be returned
Attachment #8359334 - Flags: feedback-
windows already has a kill method, we are just adding this for linux/osx.  In fact this code was originally taken from existing code that works.  I am fine making this better- mozprocess should be a best of breed.  This bug was slated as a good first bug, and I want to keep it that way- lets all work together to wrap this up and get this feature added to mozprocess!
Comment on attachment 8373483 [details] [diff] [review]
Proposed test cases

lets move these tests into the same patch as the fix for the code.  That is reasonable.  I think adding a test case for test_cannot_kill is good, but a bit out of scope.  We are gaining adequate test coverage here.  If you can file a bug (clone this bug for the most part) for that specific test case, that would make me happy.

r- as we should make a single patch.
Attachment #8373483 - Flags: review?(jmaher) → review-
Comment on attachment 8373483 [details] [diff] [review]
Proposed test cases

>    @unittest.skip("We must first make an unkillable process")
>    def test_cannot_kill(self):
>        """fail to kill a process"""
>        # bug: make and try to kill an unkillable process
>        pass

I wonder if we can do this at all. How would you clean-up the test so this process doesn't hang around forever? We might be able to have a process which you cannot kill via SIGTERM but SIGKILL. But that would also only work on Linux and OS X. Not sure what to do on Windows. If we unskip here, please do it in a way that we know which bug will add this. Means please file the follow-up bug, and reference it here like 'Bug XYZ - Create test for unkkillable process'.
Attachment #8373483 - Flags: review- → review?(jmaher)
Comment on attachment 8373483 [details] [diff] [review]
Proposed test cases

good point- lets just forget the hung/dead process.  I like the patch and tests then, :whimboo, do you have other requirements for this?
Attachment #8373483 - Flags: review?(jmaher) → review+
Flags: needinfo?(hskupin)
Beside a sane API which is not that clear as of right now, not. We really shouldn't return a string but the returncode form kill() so we can handle it appropriately. I had to fix a couple of issues lately which also handled those parts a bit loosely.
Flags: needinfo?(hskupin)
This is the git diff for both the terminate_process function and the tests thereof. Here are the changes:
- os.kill() call is now in a try block in the event that the process cannot be killed for whatever reason
- os.waitpid() call has been added within a try block, it's in a try block to make sure that if it already killed the OSError is caught
- time.sleep() is still called with a value of 'timeout'. This will allow the timeout to still exist, since any os.wait() operations don't take a timeout value (as far as I know, please correct me if I'm wrong)
- os.kill() returns None (always, I believe) and makes the exit code equal to the second input value (getattr(signal, sig), in this case) so I kept the return value of 'ret' as a string and added the exit code as well as the signal (would you think just the value of the exit code would be better?)
- return value: if 'ret' is an empty string None is returned and if not the string with the exit code and terminate signal is returned instead

in the test file:
- in test_not_running_kill() 'process_id' has been changed to -2 because of how waitpid() works
- all 'should_return' or 'should_not_return' values have been changed to None instead of the empty string

Sorry I haven't been working on this lately, but I have spring break this week so I'd like to finish this up soon! Let me know what you think and I'll get right to work!
Attachment #8359334 - Attachment is obsolete: true
Attachment #8373483 - Attachment is obsolete: true
Attachment #8387883 - Flags: review?(jmaher)
Attachment #8387883 - Flags: review?(hskupin)
Comment on attachment 8387883 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8387883 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for uploading a new patch.  I think we were pretty much in agreement that we were near the end previously.  This patch that you uploaded only has changes to the pid.py file.  Can you refresh the patch with all the changes?
Attachment #8387883 - Flags: review?(jmaher) → review-
Attached patch Proposed solution and tests (obsolete) — Splinter Review
Oops, sorry about that! I thought I uploaded the correct file. Anyway, this should be the correct file. Once you tell me the patch looks good I'll file a bug and put the bug ID number into the test_cannot_kill() method and upload the full diff (like this one) but with the bug ID included
Attachment #8387883 - Attachment is obsolete: true
Attachment #8387883 - Flags: review?(hskupin)
Attachment #8389199 - Flags: review?(jmaher)
Comment on attachment 8389199 [details] [diff] [review]
Proposed solution and tests

Review of attachment 8389199 [details] [diff] [review]:
-----------------------------------------------------------------

really close.

::: mozprocess/tests/test_mozprocess_kill_process.py
@@ +30,5 @@
> +        self.assertEqual(ret, should_return)
> +
> +    def test_none_timeout(self):
> +        """passing in None for a timeout"""
> +        ret = pid.terminate_process(some arbitrary process, None)

"some arbitrary process" doesn't seem to be valid, do we need to launch a process here so we can kill it?

also what should it return?
Attachment #8389199 - Flags: review?(jmaher) → review-
I think it would be best if I were to launch another process. It seems like to do that the following steps are necessary (please correct me if I'm wrong!):
1 - import proclaunch.py
2 - create a new process of class ProcessLauncher
3 - get the pid of the process that was just launched (I don't know how to do this)
4 - use the returned pid to kill the process

It would seem that most of the tests here are using ProcessHandler and not ProcessLauncher, would it be better if I were to use that instead? 

The return value should be a string which indicates the signal it was killed with and the exit code it returned upon being killed.
yes, this is what I was suggesting.  instead of importing proclaunch, we could reference it as it is done here:
http://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozprocess/tests/test_mozprocess_wait.py#17

whatever you feel is most effective.  

we could use mozprocess to get the pid, and then try killing it.

Thanks for sticking with this bug and making it a reality!
Joel,

I think I may have finished it. I am unsure of only two things in this patch:

1 - in test_not_running_kill() I am not sure if -2 is a safe value to use for a pid that is guaranteed to not be running. Do you know anything about this? I changed it to -2 from -1 after reading that -1 is a special value for that function (see here: http://docs.python.org/2/library/os.html#os.kill). 

Perhaps I should also launch a process here, kill it, then try to kill it again.

2 - in test_none_timeout() I am not sure as to the exact syntax of launching a process then getting its pid. I gave it my best shot but please show me any mistakes you see.
Attachment #8389199 - Attachment is obsolete: true
Attachment #8391439 - Flags: review?(jmaher)
Comment on attachment 8391439 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8391439 [details] [diff] [review]:
-----------------------------------------------------------------

the tests still fail a lot even if I add some stuff locally, can you test locally and get the tests passing?

::: mozprocess/tests/test_mozprocess_kill_process.py
@@ +3,5 @@
> +import os
> +import time
> +import unittest
> +import proctest
> +from mozprocess import pid

we need to import processhandler (see other tests for examples)

@@ +30,5 @@
> +        self.assertEqual(ret, should_return)
> +
> +    def test_none_timeout(self):
> +        """passing in None for a timeout"""
> +		 p = processhandler.ProcessHandler([self.python, self.proclaunch, "process_simple_tokill.ini"], cwd=(os.path.dirname(os.path.abspath(__file__))))

the spacing here looks odd, could just be what I see in the patch.

@@ +33,5 @@
> +        """passing in None for a timeout"""
> +		 p = processhandler.ProcessHandler([self.python, self.proclaunch, "process_simple_tokill.ini"], cwd=(os.path.dirname(os.path.abspath(__file__))))
> +        p.run()
> +        p.wait()
> +        ret = pid.terminate_process(p.proc.pid, None)

I think p.wait() waits until the process is done.  Did you run this locally and did it work?

@@ +35,5 @@
> +        p.run()
> +        p.wait()
> +        ret = pid.terminate_process(p.proc.pid, None)
> +        should_not_return = None
> +        self.assertNotEqual(ret, should_not_return)

we need to add:
if __name__ == '__main__':
    unittest.main()
Attachment #8391439 - Flags: review?(jmaher) → review-
After reviewing what you said I went over some things and made a few changes. You'll notice the biggest change is in test_none_timeout() where it runs and kills a process and then checks to see if the exit signal is contained in the returned string.
After running these locally they all work as expected. Let me know if you see anything else wrong!
Attachment #8391439 - Attachment is obsolete: true
Attachment #8391630 - Flags: review?(jmaher)
Comment on attachment 8391630 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8391630 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozprocess/mozprocess/pid.py
@@ +102,5 @@
> +            except OSError:
> +                break
> +            if timeout:
> +                time.sleep(float(timeout))
> +            ret = "killed with %s exit code %d" % (sig, getattr(signal, sig))

This will not kill the process immediately. Sometimes it will still take a couple of milliseconds until it is gone. I don't think consumers should have to poll os.poll() until it is not None. So the sleep below should better be a loop with that exit condition.

@@ +104,5 @@
> +            if timeout:
> +                time.sleep(float(timeout))
> +            ret = "killed with %s exit code %d" % (sig, getattr(signal, sig))
> +            try:
> +                os.waitpid(pid, 0)

I think we should file a follow-up bug to have a conform way of handling the wait calls:

http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozprocess/mozprocess/processhandler.py#545

@@ +112,5 @@
> +        print "WARNING: failed os.kill: %s : %s" % (errno, strerror)
> +    if not ret:
> +        return None
> +    else:
> +        return ret

As mentioned earlier on this bug, why do we have to return a string? Each consumer of this method would have to parse it. Why can't we just return the exit code?
I agree with Henrik's comment about the return code being a string.  We should do that and make things better.
Comment on attachment 8391630 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8391630 [details] [diff] [review]:
-----------------------------------------------------------------

the tests work great, lets fix the return value to not be a string and we should be ready to complete this.  Thanks for sticking with it!
Attachment #8391630 - Flags: review?(jmaher) → review-
Changes:
- now returns either an int (the exit code) or None
- tests now account for that

Henrik,

When you mention having a conform way to handle wait calls, what do you mean? They way I call the waitpid() function is the same way it's called in the reference you provided.

As for the sleep loop, if it happens that the signal that was used to attempt to kill the process did not work, the loop would run infinitely because it would not have been able to kill the process. I agree that it would be better as a loop with a solid exit condition as you mentioned, but I don't see a way around making an infinite loop. Please let me know what you think.

Thanks for your input!
Attachment #8391630 - Attachment is obsolete: true
Attachment #8394584 - Flags: review?(jmaher)
Flags: needinfo?(hskupin)
Comment on attachment 8394584 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8394584 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for putting this together!  This patch looks good, lets see what :whimboo says.
Attachment #8394584 - Flags: review?(jmaher) → review+
Comment on attachment 8394584 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8394584 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozprocess/mozprocess/pid.py
@@ +104,5 @@
> +            if timeout:
> +                time.sleep(float(timeout))
> +            ret = getattr(signal, sig)
> +            try:
> +                os.waitpid(pid, 0)

What you want here is to store the state which is returned by waitpid. That we can assume as the final exit code, and we should return. Please check the other code I have mentioned how to interpret the status.

As a final solution we most likely want to have a shared function for this kind of waitpid() call, so we don't have duplicated code. But we could do that in a follow-up bug.
Flags: needinfo?(hskupin)
Changes:
- changed the return value from a supposed exit code (found from calling getattr()) to a confirmed exit code (found from os.waitpid())

I ran the tests locally and they all passed. 

Henrik,

I tried my best to replicate what you showed me and adapt it to this function. Please let me know if there's more to clean/finish up. 

Just to make sure we're on the right page: once the bug is declared "resolved" I should file a follow up bug for the test_cannot_kill() test and put the bug number and description as a comment in the file and then submit that as a patch, correct?
Attachment #8394584 - Attachment is obsolete: true
Attachment #8394760 - Flags: review?(jmaher)
Comment on attachment 8394760 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8394760 [details] [diff] [review]:
-----------------------------------------------------------------

I am going to pass the review to :whimboo, I am fine with this patch, but my expertise is not in process handling.
Attachment #8394760 - Flags: review?(jmaher) → review?(hskupin)
Comment on attachment 8394760 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8394760 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozprocess/mozprocess/pid.py
@@ +103,5 @@
> +            if timeout:
> +                time.sleep(float(timeout))
> +            try:
> +                exitCode = os.waitpid(pid, 0)[1]
> +                return exitCode

Please check the documentation in-front of the code I have referenced. The exit code is in the high 16bit part, so you also have to do the transformation.

Best is if you update your test to check that e.g. SIGTERM causes terminate_process() to return the correct exit code. Right now you don't get this.
Attachment #8394760 - Flags: review?(hskupin) → review-
Changes:
-returns the high bit from the exit code 

Henrik,

The only change I implemented was to fix the return value from the full exitCode to the high bit of the exit code if it's greater than 255.
Attachment #8394760 - Attachment is obsolete: true
Attachment #8396920 - Flags: review?(hskupin)
Comment on attachment 8396920 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8396920 [details] [diff] [review]:
-----------------------------------------------------------------

I know that we want to get this patch landed soon. Also I don't want to be the person to hold off landing it, but I still have some little concerns about the current API implementation. I would be ok, to get all of them addressed in a follow-up bug, but for that please those needed bugs and mark them as dependent on this one. At least please address the first and third inline comment and update the patch. With that you have my r+.

::: mozprocess/tests/test_mozprocess_kill_process.py
@@ +18,5 @@
> +
> +    @unittest.skip("We must first make an unkillable process")
> +    def test_cannot_kill(self):
> +        """fail to kill a process"""
> +        # bug: make and try to kill an unkillable process

Can you please create the bug, so we can have a valid reference here? Thanks.

@@ +26,5 @@
> +        """passing in None for a pid
> +            i.e. killing a nonexistent process"""
> +        ret = pid.terminate_process(None)
> +        should_return = None
> +        self.assertEqual(ret, should_return)

I think it would be better to not return None in case of a non-existent process but raise an exception. None would mean that the process is still running, so it can cause wrong assumptions.

@@ +30,5 @@
> +        self.assertEqual(ret, should_return)
> +
> +    def test_none_timeout(self):
> +        """passing in None for a timeout"""
> +        p = processhandler.ProcessHandler([self.python, self.proclaunch, "process_simple_tokill.ini"], cwd=(os.path.dirname(os.path.abspath(__file__))))

Mind doing wrapping here? It's good practice to try to keep the 80 char limit for a line.

@@ +33,5 @@
> +        """passing in None for a timeout"""
> +        p = processhandler.ProcessHandler([self.python, self.proclaunch, "process_simple_tokill.ini"], cwd=(os.path.dirname(os.path.abspath(__file__))))
> +        p.run()
> +        ret = pid.terminate_process(p.proc.pid, None)
> +        self.assertTrue(isinstance(ret, (int,long)))

I think we should enhance the tests and check for the correct exit code. Means if we kill with e.g. sigkill it should be equal sigkill (-9). But I'm happy if we do this in a follow-up bug too.
Attachment #8396920 - Flags: review?(hskupin) → review+
Whiteboard: [good first bug] → [mentor=jmaher][lang=py][good first bug]
Changes:
-registered bug #988716
-terminate_process() now raises an exception upon None value for pid
-test cases reflect the newly raised exception
-wrapped lines to be <=80 characters

If there are any problems let me know. Thanks for helping me through this process!
Attachment #8396920 - Attachment is obsolete: true
Attachment #8397602 - Flags: review?(hskupin)
Comment on attachment 8397602 [details] [diff] [review]
Proposed tests and terminate_process function

Review of attachment 8397602 [details] [diff] [review]:
-----------------------------------------------------------------

Ryan, I hope you are not disheartened when I bring up some more stuff. But I think that those are crucial, and should be fixed. You did great work so far, and I think you can still learn a lot from the remaining things to do on this bug. I can only tell from myself when I was working on a couple of mozprocess issues in the last weeks, that all that process handling is kinda complex.

So if some of the inline comments are not clear enough for you, please let me know. I'm totally happy to assist you here.

@Joel, shall we change the mentor flag?

::: mozprocess/mozprocess/pid.py
@@ +88,5 @@
> +    Args:
> +        pid: integer process id of the process to terminate
> +    """
> +	 if not pid:
> +	 	 raise Exception("No process id provided")

This will help you to react accordingly to a pid set to None, but it will not be able to detect if the pid as specified is valid process id. Here the exception raised os.kill() would be important.

@@ +96,5 @@
> +        signals = ('SIGTERM', 'SIGKILL', 'SIGABRT')
> +    try:
> +        for sig in signals:
> +            if not [p for p in ps(psarg) if pid == int(p['PID'])]:
> +                break

If we do not find the given pid can we assume the pid is not correct, and should raise our own e.g. ProcessNotKilledError exception? That way we would not return None, and the caller could correctly handle the failure, and wouldn't assume that the process is still running.

@@ +100,5 @@
> +                break
> +            try:
> +                os.kill(pid, getattr(signal, sig))
> +            except OSError:
> +                break

I think here is a logical misbehavior. We should not break the loop as long as we haven't called kill() for each available signal, e.g. SIGTERM could fail but SIGNKILL succeed. Right now we would break the loop but not continue with the next signal. At the very end we might also raise a ProcessNotKilledError exception.

Sorry that I haven't seen this before.

@@ +104,5 @@
> +                break
> +            if timeout:
> +                time.sleep(float(timeout))
> +            try:
> +                exitCode = os.waitpid(pid, 0)[1]

While checking os.waitpid() I wonder if the call to sleep above is actually necessary. Wouldn't waitpid() wait anyway? With the extra timeout we only loose valuable time, e.g. if you specify 5s we would wait 5+x seconds while without it only x seconds.

@@ +109,5 @@
> +                if exitCode > 255:
> +                    return exitCode >> 8
> +                return -exitCode
> +            except OSError:
> +                break

We might not want to silently drop this exception. We should at least log what went wrong.

@@ +111,5 @@
> +                return -exitCode
> +            except OSError:
> +                break
> +    except OSError, (errno, strerror):
> +        print "WARNING: failed os.kill: %s : %s" % (errno, strerror)

This is a no-op given that such an exception will never reach this level. As of now you catch all of them inside the loop.

Also with our own exception type we wouldn't need it given that the caller should handle it accordingly.

Also in case of success we should never reach this line, but only in case of a failure. And then an exception should be raise. A situation would be (as mentioned above) when we tried each signal without success and finished the loop.

::: mozprocess/tests/test_mozprocess_kill_process.py
@@ +30,5 @@
> +    def test_none_timeout(self):
> +        """passing in None for a timeout"""
> +        p = processhandler.ProcessHandler([self.python, self.proclaunch, 
> +						 "process_simple_tokill.ini"], 
> +						 cwd=(os.path.dirname(os.path.abspath(__file__))))

nit: the indentation is off here and you have trailing spaces in two of the above lines.
Attachment #8397602 - Flags: review?(hskupin) → review-
Henrik,

I see this as a process to improve my coding and an experience to write production level code. I plan on seeing this bug to the end! Thank you for sticking with me through this long process. I'll get to work on this and try to fix it to the best of my ability.
Henrik,

I've been working on this for a little while and have a couple questions:
- if we remove the lines which use the 'timeout' variable then it becomes an unnecessary variable, should it be removed from the function? This is an important because if it should be removed, the tests need to be refactored. 
- If we decide to raise our own exception, a class needs to be made for that exception. Where should that file go? Also, is there a Mozilla standard for creating error classes? If so, could you please provide a reference?
- In the try block with the waitpid() call, when an error is caught should a different error be raised than ProcessNotKilledError? I would guess it would be something like ProcessAlreadyKilledError.

I'll keep you posted if I have any more questions.
(In reply to Ryan from comment #72)
> - if we remove the lines which use the 'timeout' variable then it becomes an
> unnecessary variable, should it be removed from the function? This is an
> important because if it should be removed, the tests need to be refactored. 

That's correct. As pinted out for the process manager module we also don't have a timeout in the wait() method. I would propose we keep the behavior, and file a follow-up bug to introduce a timeout for both in the future. The other options would be to use a while loop for `pid.poll() == None` and do the check for timeout in there, or use a signal handler which raises an exception after the timeout occurred. The latter might be good if we really have an non-killable process.

> - If we decide to raise our own exception, a class needs to be made for that
> exception. Where should that file go? Also, is there a Mozilla standard for
> creating error classes? If so, could you please provide a reference?

I would suggest to create an errors.py file under mozprocess and export it by default in __init__.py. You can have a look at mozrunner, where I already did that.

> - In the try block with the waitpid() call, when an error is caught should a
> different error be raised than ProcessNotKilledError? I would guess it would
> be something like ProcessAlreadyKilledError.

The method you are working on is targeted to kill a process. At the beginning you are checking that the pid exists. So whatever we do in killing it and we don't succeed, we should have a ProcessNotKilledError.
Status: NEW → ASSIGNED
Hi Ryan, it's been a while since the last update on this bug. Maybe you are covered with other work. Would you mind to give us an update for this bug? Thanks a lot.
Hi Henrik, I'm sorry it's been so long, but I haven't worked on this bug as I've been very busy with work for my internship. I don't think I'll have time to devote to finishing this bug. Sorry for leaving this unfinished.
Mentor: jmaher
Whiteboard: [mentor=jmaher][lang=py][good first bug] → [lang=py][good first bug]
Hi,

Seems like this bug needs just a little bit more from what Ryan has built on.

Is this still a relevant bug? If so, I am interested in completing what Ryan has [almost] finished.
Hi Daniel. Yes, this bug is still relevant and it would be great to get it finished. You are welcome to help us here. If you need any kind of information don't hesitate to ask.

One thing to note is that the canonical repository has been moved from github into mozilla-central. So the filenames in the patch will need an update. Please see the URL field for all the details in handling the code and reviews.
Assignee: ryan.da.baker → devty1023
Mentor: hskupin
Whiteboard: [lang=py][good first bug] → [lang=py]
Hi Henrik,

Thanmks for the pointers! Will start working on the ticket.

Daniel
Hi Henrik,

Sorry! It took a while for me to read through this thread to digest the state of this ticket.

I have an unresolved issue I like to discuss with you before I submit a patch for a reivew. 

From my understanding, we are looping through a set of signals, each with increasing "strength" to make sure that we try all means possible to terminate the process. Say the process we are trying to kill handles SIGTERM differently. Wouldn't using os.waitpid() hang? If so, what would be the point of trying out different sets of signal when the loop proceeds only if SIGTERM worked?
That's an interesting question. Maybe you could simulate that by not sending SIGTERM and check how the code would behave then? If that call would hang, we indeed would have to change it. Maybe the solution as described on that stackoverflow page would be better:

http://stackoverflow.com/questions/13399734/how-to-find-out-when-subprocess-has-terminated-after-using-os-kill
Hi Henrik,

Some initial investigation.

1) os.waitpid(pid, option) only works if the pid we are waiting for is a child process of the system thats calling it - that is, we cannot wait for a termination of external process. I find this extremely restrictive - need your thoughts on this.

====================
shell> sleep 1000 &
[1] 16170
shell> python
...
>>>import os
>>> os.waitpid(16170,0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 10] No child processes
====================

2) If the process we are trying to kill is one of our child process. waitpid() will hang if we do not send any signal to kill it.
====================
>>> import os
>>> import subprocess
>>> p = subprocess.Popen(["sleep", "1000"])
>>> p.pid
16192
>>> os.waitpid(16192, 0)
...
====================

3) what we can do instead is to check to see if the process has responded to SIGTERM (i.e. it died) by sending in signal 0 with os.kill() after some timeout. If the process did not die by SIGTERM, we will try a stronger signal (i.e. SIGKILL)

I will implement the above approach once I get your comments.

Our of curioisity - why do we try to send SIGABRT after SIGKILL for OSX? Is it possible for OSX application to ignore SIGKILL? Possible reference I found here: (https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages_iPhoneOS/man3/signal.3.html)
(In reply to Daniel Y Lee from comment #81)
> 1) os.waitpid(pid, option) only works if the pid we are waiting for is a
> child process of the system thats calling it - that is, we cannot wait for a
> termination of external process. I find this extremely restrictive - need
> your thoughts on this.
> 
> ====================
> shell> sleep 1000 &
> [1] 16170
> shell> python
> ...
> >>>import os
> >>> os.waitpid(16170,0)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OSError: [Errno 10] No child processes
> ====================

Ok, so that's how it has been reported on the linked document. That sounds fair, and what the current version of the patch has been implemented.

> 2) If the process we are trying to kill is one of our child process.
> waitpid() will hang if we do not send any signal to kill it.
> ====================
> >>> import os
> >>> import subprocess
> >>> p = subprocess.Popen(["sleep", "1000"])
> >>> p.pid
> 16192
> >>> os.waitpid(16192, 0)
> ...
> ====================

Ok, so that's indeed a problem. Looks like we can't use waitpid then.

> 3) what we can do instead is to check to see if the process has responded to
> SIGTERM (i.e. it died) by sending in signal 0 with os.kill() after some
> timeout. If the process did not die by SIGTERM, we will try a stronger
> signal (i.e. SIGKILL)
> 
> I will implement the above approach once I get your comments.

That sounds perfect! Good investigation Daniel! I fully support this idea.

> Our of curioisity - why do we try to send SIGABRT after SIGKILL for OSX? Is
> it possible for OSX application to ignore SIGKILL? Possible reference I
> found here:
> (https://developer.apple.com/library/ios/documentation/System/Conceptual/
> ManPages_iPhoneOS/man3/signal.3.html)

Good question. As I also read on various other pages SIGABRT is mostly used inside an application or by a library in use when calling the abort() function. So external applications should not send such a signals to a target process. SIGKILL should indeed be totally enough.
One last thing - I am observing something quite tricky while I'm implementing this.

Say I'm given a pid to kill. We planned to do something like

>>>os.kill(pid, signal.SIGTERM)

to kill it. The problem is, when a process sends a SIGTERM/SIGKILL signal to a child process that is NOT is own, the process we've "killed" becomes a zombie/defunct process. It stays like this until the parent who spawned the process is also killed OR the parent finishes waiting on the child process to terminate.

This brings about 2 issues:

1) Is this okay? Zombie processes might be everywhere?

2) Having zombie process is troublesome since we need to come up with a new way determine if the process is really dead. The talos way of calling ps and seeing whether pid exists is no longer valid (the process is effectively dead, but it is still in the process table). My way of checking whether a process is dead by sending signal 0 also does not work (somehow, a zombie process can still respond to any signal we send to it.)

So I'm going to have to come up with a new way of checking whether a process is dead/is zombie. Tricky, but I can investigate.

Please share what how think about this issue.
Good question, where I don't really have a straight answer for. I never had to handle processes spawned by other applications. So maybe you should play a bit with the SIGCHLD handler, and its different options? Maybe we can get the child process to completely quit, with that way. See http://www.microhowto.info/howto/reap_zombie_processes_using_a_sigchld_handler.html for the C variant. I wonder how the `kill` command line tool works, given that it does it correctly.

Which specific external tool were you trying to kill? Usually the parent process should read the child state, and react on ended child processes. Maybe the above mentioned SIGCHILD handling might work? Otherwise how long does it take until this zombie goes away for your case?
I was working on writing a suite of testcases for my code, using processHandler module to spawn a process. It was a bit puzzling since I would not have considered process spawned via processHandler to be an "external application's process".

For clarity, I've implemented two test caes, both of which fail!

    def test_terminate_process(self):
        p = processhandler.ProcessHandler([self.python, self.proclaunch,
                        "process_simple_tokill.ini"],
                        cwd=(os.path.dirname(os.path.abspath(__file__))))
        p.run()
        ret = pid.terminate_process(p.proc.pid,10)
        self.assertEqual(ret, p.proc.pid)

    def test_terminate_process_2(self):
        p = subprocess.Popen(["sleep", "1000"])
        ret = pid.terminate_process(p.pid,10)
        self.assertEqual(ret, p.pid)

Here is a typical output:

terminate_process called with: (13400,10)
trying out sig 15 for process 13400
## result of ps
[{'COMMAND': '[python] <defunct>', 'TTY': 'pts/3', 'PID': '13400', 'STAT': 'Z', 'TIME': '0:00'}]
sig SIGTERM failed to kill process 13400

trying out sig 9 for process 13400
## result of ps
[{'COMMAND': '[python] <defunct>', 'TTY': 'pts/3', 'PID': '13400', 'STAT': 'Z', 'TIME': '0:00'}]
sig SIGKILL failed to kill process 13400

Notice the "<defunct>" keyword. Also 'STAT' notes a state of 'Z' - consultation with man pages reveal that 'Z' stands for 'Zombie'.

If I'm understanding SIG_CHLD correclty, it is signal sent from the child process its parent when it exits. Not sure if it will have any meaningful impact for the child process when it receives (unless the child process has spawned a process and is waiting for it. Then what we are doing is introducing a bug?).

My next bet is to check whether (a) process exists in ps and (2)it is not a zombie state as the predicate to my is_running(pid) function. I'm not 100% confident this is what is desired (again, I don't guarantee that zombie process will actually be cleaned up once the parent gets its SIGCHLD signal). I'm leaning towards the opinion that it will be acceptable.
Would you mind delaying the `ps` call or running it in a loop for a while? I wonder if the process gets cleaned up by init a short while after. That's how it should usually work.
The current implementation does something like that. Here's a snippet:

        for sig in signals:
            try:
                print "trying out sig %s for process %s" % (getattr(signal,sig), str(pid))
                os.kill(int(pid), getattr(signal, sig))
                time.sleep(float(timeout)) ## LOOK HERE
                if not is_running(pid):
                    return pid
                else:
                    print [p for p in ps(psarg) if pid == int(p['PID'])]
                    print "sig %s failed to kill process %s" % (sig, str(pid))

            except OSError as e:
                return None
            except TypeError as e:
                timeout=3.0
                time.sleep(float(timeout))
                if not is_running(pid):
                    return pid

I've tried waiting upto 30 seconds, still no cigar!
So what's the parent process, of the child you kill here? Looks like it's not the init process.

Would you mind to create a minimized script which perfectly illustrates that? I could also have a look on my box.
Attached file pid.py
Basically what I'm doing is to craft and run and test the functionality. 

I've rooted out the important snippets on the comments above - so I'm not 100% sure if this suits your need.
Flags: needinfo?(hskupin)
Daniel, I'm so sorry. That needinfo request totally fall through in my inbox. I have seen it right now. So I hope we can still continue in fixing it?
Flags: needinfo?(hskupin)
Attachment #8486514 - Attachment mime type: text/x-python → text/plain
Attachment #8486515 - Attachment mime type: text/x-python → text/plain
So I tested the snippet you gave above after modifying the main routine a bit:

if __name__ == '__main__':
    for i in sys.argv[1:]:
        for pid in sorted(get_pids(i)):
            terminate_process(pid)

        print 'Remaining pids: %s' % get_pids(i)


When I run the script now, I'm always able kill processes, and terminate_process returns without a failure after the 3s timeout. A check if pids still exist for the given process name returns an empty array. Here an example for two running Firefox instances on my Ubuntu box:

$ python pid.py firefox
terminate_process called with: (3849,3)
trying out sig 15 for process 3849
terminate_process called with: (16328,3)
trying out sig 15 for process 16328
Remaining pids: []

Both pids do no longer exist, and I also don't see a zombie process. Am I doing something wrong to not see your problem? Please let me know. Otherwise we should try to finish the patch and add the test cases to the same patch as the code changes are contained in. Thanks.
Flags: needinfo?(devty1023)
I'll get on to it :)
Flags: needinfo?(devty1023)
Okay! Took me a while to get myself into this ticket again.

I want to ask you to try to create a process using ProcessHandler and kill it using terminate_process.

Here is the one that I'm having trouble with:

class TestKillProcess(proctest.ProcTest):
    """Class to test terminating a process given pid"""
    def test_terminate_process(self):
        p = processhandler.ProcessHandler([self.python, self.proclaunch,
                                            "process_normal_finish_python.ini"],
                                            cwd=(os.path.dirname(os.path.abspath(__file__))))
        p.run()
        ret = pid.terminate_process(p.proc.pid,10)
        self.assertEqual(ret, p.proc.pid)

Or even simpler, using Pythhon's subprocess module.

    def test_terminate_process_2(self):
        p = subprocess.Popen(["sleep", "1000"])
        ret = pid.terminate_process(p.pid,10)
        self.assertEqual(ret, p.pid)


Both emits the following output

terminate_process called with: (10395,10)
trying out sig 15 for process 10395
[{'COMMAND': '[python] <defunct>', 'TTY': 'pts/7', 'PID': '10395', 'STAT': 'Z', 'TIME': '0:00'}]
sig SIGTERM failed to kill process 10395
trying out sig 9 for process 10395
[{'COMMAND': '[python] <defunct>', 'TTY': 'pts/7', 'PID': '10395', 'STAT': 'Z', 'TIME': '0:00'}]
sig SIGKILL failed to kill process 10395


So SIGTERM did kill the process - unfortunately it's a zombie process. 

Good chance that this has to do something with how ProcessHandler is spawning the processes I am trying to kill - since ProcessHandler object has yet to claim the process that is now effectively dead, we are left with zombie classes?

In all honesty, I have no idea why this I am getting zombie processes - I looks to me that the "parent process" (ProcessHandler or subprocess.popen) is the same one that is calling terminate_process. 

Would like to see if you are able to replicate the issue I am seeing
Flags: needinfo?(hskupin)
whimboo, can you address the question here?  It is a few months overdue!
I'm really sorry to jump in and say this after so much effort has gone into this bug.. but why can't we just use something else to kill the process?

In my mind mozprocess should only be concerned with the processes (and child processes) that it spawns itself. I'm afraid that adding API's for manipulating arbitrary 3rd party processes is opening a can of worms. There are already tons of libraries for manipulating processes, including psutil which is already checked into the tree. Here is an example of killing a process using psutil:
https://dxr.mozilla.org/mozilla-central/source/python/psutil/examples/killall.py#25
I agree with Andrew. I filed bug 1139487 to remove the mozprocess.pid code altogether. psutil works great, there is no good reason for us to try reinventing it. If there is a need for something specific in talos and it isn't possible to bring psutil there, we should just do a local solution until we can. I'm going to resolve this as wontfix; apologies to everyone involved for the wasted time/energy.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: needinfo?(hskupin)
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: