Closed Bug 701755 Opened 13 years ago Closed 12 years ago

be over aggressive in killing off servers during the setup scripts for a test run

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: bear)

References

Details

(Keywords: intermittent-failure, Whiteboard: [mobile_unittests][android_tier_1])

Attachments

(1 file, 2 obsolete files)

try and remove all related child processes for a tegra during cleanup 12 years ago Mike Taylor [:bear] 3.34 KB, patch	mozilla : review+ hwine : feedback+	Details \| Diff \| Splinter Review
try and remove all related child processes for a tegra during cleanup 12 years ago Mike Taylor [:bear] 3.62 KB, patch	mozilla : review-	Details \| Diff \| Splinter Review
try and remove all related child processes for a tegra during cleanup 12 years ago Mike Taylor [:bear] 3.77 KB, patch	mozilla : review+ bear : checked-in+	Details \| Diff \| Splinter Review

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Description

•

13 years ago

Last quarter we landed code to ensure our test harnesses will cleanup properly on errors.  This works great, but there are scenarios where the device loses connectivity and we kill the entire job from buildbot/twisted error handling.  The end result is we still have leftover webservers sitting around.

We need to be super aggressive before each test run while we are installing the .apk and look for anything running on the ports that we care about and kill -9 those processes.

Mike Taylor [:bear]

Assignee

Updated

•

13 years ago

Assignee: nobody → bear

(no longer active)

Updated

•

13 years ago

Blocks: 438871

Mike Taylor [:bear]

Assignee

Comment 1

•

12 years ago

Attached patch try and remove *all* related child processes for a tegra during cleanup (obsolete) — Details — Splinter Review

Attachment #587118 - Flags: review?(aki)

Mike Taylor [:bear]

Assignee

Updated

•

12 years ago

Attachment #587118 - Flags: feedback?(hwine)

Hal Wine [:hwine] use NI!

Comment 2

•

12 years ago

Comment on attachment 587118 [details] [diff] [review]
try and remove *all* related child processes for a tegra during cleanup

Looks as if the return value from checkStalled is:
 False - no processes were found to be killed
 True - some processes were found to be killed from 'ps' OR
        some "well known" processes were found and could not be killed
(i.e. it's not related to whether there are still processes running after function completes)

That seems an odd enough contract to be worth a comment.

Attachment #587118 - Flags: feedback?(hwine) → feedback+

Mike Taylor [:bear]

Assignee

Comment 3

•

12 years ago

(In reply to Hal Wine [:hwine] from comment #2)
> Comment on attachment 587118 [details] [diff] [review]
> try and remove *all* related child processes for a tegra during cleanup
> 
> Looks as if the return value from checkStalled is:
>  False - no processes were found to be killed
>  True - some processes were found to be killed from 'ps' OR
>         some "well known" processes were found and could not be killed
> (i.e. it's not related to whether there are still processes running after
> function completes)
> 
> That seems an odd enough contract to be worth a comment.

yea, I'm torn on that also - do I just assume that now with the more aggressive bits that it will always clean up and not return anything or make the True just be informational in the cleanup.py calling step

Aki Sasaki (not active)

Comment 4

•

12 years ago

(In reply to Mike Taylor [:bear] from comment #3)
> (In reply to Hal Wine [:hwine] from comment #2)
> > Comment on attachment 587118 [details] [diff] [review]
> > try and remove *all* related child processes for a tegra during cleanup
> > 
> > Looks as if the return value from checkStalled is:
> >  False - no processes were found to be killed
> >  True - some processes were found to be killed from 'ps' OR
> >         some "well known" processes were found and could not be killed
> > (i.e. it's not related to whether there are still processes running after
> > function completes)
> > 
> > That seems an odd enough contract to be worth a comment.
> 
> yea, I'm torn on that also - do I just assume that now with the more
> aggressive bits that it will always clean up and not return anything or make
> the True just be informational in the cleanup.py calling step

Maybe something like

CLEAN = 1
DIRTY_BUT_KILLED = 2
DIRTY_NOT_KILLED = 3

and return one of those?

Aki Sasaki (not active)

Comment 5

•

12 years ago

Comment on attachment 587118 [details] [diff] [review]
try and remove *all* related child processes for a tegra during cleanup

Hm, should you be checking the return value of the killPID call in checkStalled ?
I can see wanting to have more info about what's happening, either in output or in the return value or logs or w/e.

Attachment #587118 - Flags: review?(aki) → review+

Mike Taylor [:bear]

Assignee

Comment 6

•

12 years ago

Attached patch try and remove *all* related child processes for a tegra during cleanup (obsolete) — Details — Splinter Review

changed result from checkStalled to reflect if stalled pids were found and handled or found and not handled

Attachment #587118 - Attachment is obsolete: true

Attachment #596680 - Flags: review?(aki)

Aki Sasaki (not active)

Comment 7

•

12 years ago

Comment on attachment 596680 [details] [diff] [review]
try and remove *all* related child processes for a tegra during cleanup

>+if checkStalled(os.env['SUT_NAME']) > 1:
>+    setFlag(errorFile, "Remote Device Error: process from previous test run present")
>+    sys.exit(2)

I like how you put this in a method.
Why are we setting an error flag if we found old cruft but cleaned it up?
I would guess we'd want a warning at 2 and an error at 3.
Possibly an "all clear" message at 1 depending how verbose we want to be.

>+    for pid in pids:
>+        if not killPID(pid, sig=signal.SIGKILL, includeChildren=True):
>+            result := 3

I thought this was some sort of special python command, but on Python 2.6.7 this appears to be a typo:

>>> a = 2
>>> a := 3
  File "<stdin>", line 1
    a := 3
      ^
SyntaxError: invalid syntax

We're close though.

Attachment #596680 - Flags: review?(aki) → review-

Mike Taylor [:bear]

Assignee

Comment 8

•

12 years ago

Attached patch try and remove *all* related child processes for a tegra during cleanup — Details — Splinter Review

changed response to checkStalled to show a warning if the result was > 1 and removed errant Delphi-style assignment that snuck in

Attachment #596680 - Attachment is obsolete: true

Attachment #596711 - Flags: review?(aki)

Aki Sasaki (not active)

Comment 9

•

12 years ago

Comment on attachment 596711 [details] [diff] [review]
try and remove *all* related child processes for a tegra during cleanup

I almost want a final else: to catch any weirdness when errcode > 1 but is not 2 or 3.  But this will work without it.  r=me.

Attachment #596711 - Flags: review?(aki) → review+

Mike Taylor [:bear]

Assignee

Comment 10

•

12 years ago

Comment on attachment 596711 [details] [diff] [review]
try and remove *all* related child processes for a tegra during cleanup

committed changeset 2226:4ef60f593587

Attachment #596711 - Flags: checked-in+

Mike Taylor [:bear]

Assignee

Comment 11

•

12 years ago

overnight cronjob runs of this new code has removed all but 1 hung CP tegra \o/

marking as fixed

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

12 years ago

Keywords: intermittent-failure

Nobody; OK to take it and work on it

Updated

•

12 years ago

Whiteboard: [orange][mobile_unittests][android_tier_1] → [mobile_unittests][android_tier_1]

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Release Engineering

You need to log in before you can comment on or make changes to this bug.

try and remove all related child processes for a tegra during cleanup 12 years ago Mike Taylor [:bear] 3.34 KB, patch	mozilla : review+ hwine : feedback+	Details \| Diff \| Splinter Review
try and remove all related child processes for a tegra during cleanup 12 years ago Mike Taylor [:bear] 3.62 KB, patch	mozilla : review-	Details \| Diff \| Splinter Review
try and remove all related child processes for a tegra during cleanup 12 years ago Mike Taylor [:bear] 3.77 KB, patch	mozilla : review+ bear : checked-in+	Details \| Diff \| Splinter Review